
Ask HN: How do you separate intentional test boilerplate from real duplication?

Why This Matters

This discussion highlights a significant challenge in code duplication detection tools: distinguishing between intentional test boilerplate and genuine code duplication. Addressing this issue is crucial for improving the accuracy of automated tools, which benefits developers by reducing false positives and streamlining code maintenance. As open-source projects grow, refining such detection methods becomes increasingly vital for maintaining clean, efficient codebases.

Key Takeaways

I maintain an open-source project (a deterministic duplicate-code detector), and a user asked for a feature that I don’t have a clear answer on how to implement.

This seems like a very hard problem to solve:

- Tests repeat the same scenario. A structural detector flags this as duplication. However, tests are not something people want to delete from their codebases.

- The intentional repetition in tests ends up looking like undesired code duplication, and the tool cannot tell which is which.

- One way to solve this would be a human in the loop, similar to how linters let the user accept a finding once while keeping the default first run zero-config.
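
To make the human-in-the-loop idea concrete, here is a minimal sketch of a baseline file of accepted findings. It is not how dupehound works: `DuplicateGroup`, `fingerprint`, `accept`, `unreviewed`, and the JSON baseline format are all hypothetical names chosen for illustration, and it assumes the detector can produce a normalized body for each duplicate group.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DuplicateGroup:
    """Hypothetical detector finding: one repeated shape and where it occurs."""
    normalized_body: str         # structure-normalized text of the clone (assumed)
    locations: tuple[str, ...]   # e.g. ("tests/test_a.py:10", "tests/test_b.py:42")


def fingerprint(group: DuplicateGroup) -> str:
    """Derive an ID from the normalized body only, so it survives line shifts."""
    digest = hashlib.sha256(group.normalized_body.encode("utf-8"))
    return digest.hexdigest()[:16]


def load_accepted(baseline: Path) -> set[str]:
    """A missing baseline means nothing is accepted: the first run stays zero-config."""
    if not baseline.exists():
        return set()
    return set(json.loads(baseline.read_text(encoding="utf-8")))


def accept(baseline: Path, group: DuplicateGroup) -> None:
    """Record a user's one-time decision that this duplication is intentional."""
    accepted = load_accepted(baseline)
    accepted.add(fingerprint(group))
    baseline.write_text(json.dumps(sorted(accepted), indent=2), encoding="utf-8")


def unreviewed(groups: list[DuplicateGroup], baseline: Path) -> list[DuplicateGroup]:
    """Return only the findings the user has not already accepted."""
    accepted = load_accepted(baseline)
    return [g for g in groups if fingerprint(g) not in accepted]
```

Because the fingerprint ignores file paths and line numbers, an accepted test pattern stays accepted when the tests move. The trade-off is that the same shape appearing later in production code would also be suppressed, so a real design might want to mix the file path or a tests-only scope into the fingerprint.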

I wonder how you have seen this handled, and whether anyone has any ideas.

Here is the repo: https://github.com/Rafaelpta/dupehound

And here is the issue with more detail: https://github.com/Rafaelpta/dupehound/issues/23