Ask HN: How do you separate intentional test boilerplate from real duplication?

8•rafaepta•2d ago
I maintain an open-source project (a deterministic duplicate-code detector), and a user asked for a feature that I don’t have a clear idea how to implement.

This seems a very hard problem to solve:

- Tests repeat the same scenario. To a structural detector, this looks like repetition (duplication). However, tests are not something people want to delete from their codebases.

- The intentional repetition in tests ends up looking like undesired code duplication, and the tool cannot tell which is which.

- One way to solve this would be something like a human in the loop (similar to how linters let the user accept a finding once, while keeping the default first run zero-config).

I wonder how you have seen this handled, and whether anyone has any ideas.

Here is the repo: https://github.com/Rafaelpta/dupehound

And here is the issue with more detail: https://github.com/Rafaelpta/dupehound/issues/23

Comments

ambicapter•2h ago
What is a "structural detector"?
nagaiaida•2h ago
clicking through to the repo linked at the end it appears to be rolling-hash-style ast structural pattern matching that ignores things like what names identifiers concretely have
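
To make "ignores what names identifiers concretely have" concrete, here is a minimal Python sketch. It is not dupehound's code, and it is not a rolling hash over windows: it simply hashes one whole snippet after replacing names and literals with placeholders. The names `_Normalize` and `structural_fingerprint` are made up for this illustration.

```python
import ast
import hashlib


class _Normalize(ast.NodeTransformer):
    """Replace concrete names and literal values with placeholders."""

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None
        return node

    def visit_Attribute(self, node):
        self.generic_visit(node)
        node.attr = "_"
        return node

    def visit_Constant(self, node):
        node.value = 0
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_"
        return node


def structural_fingerprint(source: str) -> str:
    """Hash the shape of the syntax tree, not the spelling of the code."""
    tree = _Normalize().visit(ast.parse(source))
    dumped = ast.dump(tree, annotate_fields=False, include_attributes=False)
    return hashlib.sha256(dumped.encode()).hexdigest()[:16]
```

Two functions that differ only in identifier names get the same fingerprint, which is exactly why repeated test scenarios light up in a detector of this kind.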
echoangle•2h ago
Maybe I don’t quite understand the question, but can you not just define a function that sets up the shared test state and use that in every test?
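
A small Python sketch of that suggestion, with a toy `Cart` class and a `make_cart_with_basics` helper that are both invented for the example:

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Toy system under test; hypothetical, for illustration only."""
    items: dict = field(default_factory=dict)

    def add(self, name: str, price: float) -> None:
        self.items[name] = price

    def total(self) -> float:
        return sum(self.items.values())


def make_cart_with_basics() -> Cart:
    """Shared setup: every test starts from the same two-item cart."""
    cart = Cart()
    cart.add("pen", 2.0)
    cart.add("notebook", 5.0)
    return cart


def test_total_of_basics():
    assert make_cart_with_basics().total() == 7.0


def test_adding_item_raises_total():
    cart = make_cart_with_basics()
    cart.add("ruler", 3.0)
    assert cart.total() == 10.0
```

This removes the repeated setup lines, but the tests may still share a call-then-assert shape that a structural detector could flag.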
dezgeg•2h ago
Detect tests somehow (e.g. in Rust you could check for #[test]) and just skip the analysis for that function?
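
As a rough sketch of the marker check, here is a regex-based scan of Rust source text written in Python. It is an assumption-heavy shortcut: a real tool would more likely use a parser, and this pattern does not handle things like `#[tokio::test]` or whole `#[cfg(test)]` modules. `TEST_FN` and `rust_test_function_names` are hypothetical names.

```python
import re

# `#[test]`, optionally followed by more attributes, then the function name.
TEST_FN = re.compile(
    r"#\[test\]\s*(?:#\[[^\]]*\]\s*)*(?:pub\s+)?(?:async\s+)?fn\s+(\w+)"
)


def rust_test_function_names(source: str) -> list[str]:
    """Return the names of functions directly marked with #[test]."""
    return TEST_FN.findall(source)
```

A detector could then drop any duplicate group whose members all fall inside these functions.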
rafaepta•1h ago
Yeah, that is pretty much what it does already: it tries to recognize test files and skip them. Dupehound is available for 12 languages today.

Some languages, like Rust as you mentioned, have a clear tag that says "this is a test," but others do not, so the tool has to guess from file names and ends up both missing some tests and skipping too much.

Also, as I mentioned in the answer below, sometimes you actually do want to see the repeats inside tests, and normal code sometimes repeats on purpose too. So I am leaning toward letting users wave off one specific case by hand instead of skipping everything blindly.
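
One way the "wave off one specific case" idea could look is a small file of accepted findings, sketched below in Python. Nothing here is dupehound's actual design: the file format, the `fingerprint` and `locations` keys, and all function names are assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def finding_id(fingerprint: str, locations: list[str]) -> str:
    """Stable ID for one duplicate group: its structure plus where it occurs."""
    key = fingerprint + "|" + "|".join(sorted(locations))
    return hashlib.sha256(key.encode()).hexdigest()[:12]


def load_accepted(path: Path) -> set[str]:
    """A missing file means nothing was accepted, so the first run needs no config."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def accept(path: Path, fid: str) -> None:
    """Record one finding as intentional."""
    accepted = load_accepted(path)
    accepted.add(fid)
    path.write_text(json.dumps(sorted(accepted), indent=2))


def unaccepted(findings: list[dict], path: Path) -> list[dict]:
    """Keep only the findings nobody has waved off yet."""
    accepted = load_accepted(path)
    return [
        f for f in findings
        if finding_id(f["fingerprint"], f["locations"]) not in accepted
    ]
```

The trade-off sits in what goes into the ID: including line numbers in `locations` makes an acceptance expire when the code moves, while using only file paths makes it stickier but less precise.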

peterabbitcook•1h ago
I’ve dealt with a question that rhymes with this.

SonarQube or CodeQL reports might tell me what percentage of a repo is duplicated code, and a large percentage of that is in src/test/java.

I find that a lot of the time this is not just some flippant observation but a clue that I should be using a mechanism like @ParameterizedTest instead of @Test, so I rewrite those tests in a way that makes it easier to set up and define parameters/constraints, inputs, and outputs. Sometimes it does get a little convoluted, as you either use a lot of naked Arguments.of() or define test-class-scoped nested records to encapsulate test parameters, inputs, expected outputs, etc.
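
For readers outside the JUnit world, a rough Python analogue of the same move uses `unittest`'s `subTest` with a table of cases. This is a stand-in for @ParameterizedTest, not the JUnit API itself, and `is_leap_year` is a toy function invented for the example.

```python
import unittest


def is_leap_year(year: int) -> bool:
    """Toy function under test; hypothetical, for illustration only."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTest(unittest.TestCase):
    # One table of (input, expected) rows replaces several near-identical tests.
    CASES = [
        (2000, True),
        (1900, False),
        (2024, True),
        (2023, False),
    ]

    def test_is_leap_year(self):
        for year, expected in self.CASES:
            # subTest reports each failing row separately.
            with self.subTest(year=year):
                self.assertEqual(is_leap_year(year), expected)
```

The duplicated test bodies collapse into data rows, which a structural detector has much less reason to flag.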

bilbo-b-baggins•55m ago
I would say since your focus is on structural or programmatic detection, and not LLM heuristics, the problem depends on language a lot.

In Rust or Go there are super clear test markers or filenames.

In JavaScript it would have to detect the framework in use, then detect test files and tests embedded in program files.

And so forth.

Are you doing any call sequencing heuristics? Like if the same 5 calls (with different args) appear in the same order in multiple places (even in test files) that might be a strong signal for deduplication. Or even if the same 5 calls are in the same order with a couple different interleaved calls - the fuzziness of the heuristic might be something tunable to a language, or particular codebase, or framework, etc.
