HACKER Q&A
📣 SnowHill9902

How can one ever prove if code was copied and not written from scratch?



  👤 rsclient Accepted Answer ✓
I got hired as an expert witness to prove this very thing, with a twist: Person P wrote software A in language RS/S. Person P then wrote software B in language VB/B. The question was whether they had improperly used software A when they wrote software B (they had sold software A).

And I could prove that they had! The key was that we could find vestiges of the A code in the B code in a way that wasn't just person P's own personal style. For example, software A would set a logging variable as it did some processing, and under specific conditions would actually write to a log file. Software B also had the logging variable, and set it in exactly the same way, even though the underlying code was different and the variable was never written out. And there were date-stamped comments in software A, and the same comments were present in software B, even though software B was written long after software A.
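
To make the comment-matching idea concrete, here is a rough sketch (not the method actually used in that case). It assumes both languages use single-line comments with a known prefix; the function names and the 10-character cutoff are made up for illustration, and the prefix search is naive enough to be fooled by a prefix inside a string literal.

```python
import re

def extract_comments(source, prefix):
    """Collect normalized line-comment text, given the language's comment prefix."""
    comments = set()
    for line in source.splitlines():
        _, sep, text = line.partition(prefix)
        if sep:
            # Collapse whitespace and case so trivial edits don't hide a match.
            normalized = re.sub(r"\s+", " ", text).strip().lower()
            if len(normalized) >= 10:  # arbitrary cutoff: skip trivial comments
                comments.add(normalized)
    return comments

def shared_comments(src_a, prefix_a, src_b, prefix_b):
    """Comments that appear in both sources, even across languages."""
    return extract_comments(src_a, prefix_a) & extract_comments(src_b, prefix_b)

# Hypothetical inputs: two languages with different comment prefixes.
src_a = "x = 1  # 1998-03-02 fixed rounding bug\ny = 2 # tmp\n"
src_b = "Dim x ' 1998-03-02  Fixed rounding bug\nDim y ' other thing here\n"
print(shared_comments(src_a, "#", src_b, "'"))
```

A date-stamped comment that survives into a program written years later is hard to explain as coincidence, which is why this kind of match carries more weight than similar variable names.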


👤 t-3
You can't. With human language, the wordspace is so large that plagiarism is often identifiable, but idiom and such are very common, so even there it's very hard to tell derivative from original (and that's not even going into patterns and archetypes). With programming languages, the wordspace is usually tiny, the idioms are very many, and most programs closely follow an archetypical pattern. If whole sections are copied, you can't tell, because almost everyone uses those same operations in the same way, the same variable naming patterns, and the same formatting style.

Lazy copiers might be easy to spot if there are inconsistencies in formatting throughout a program, comments that match the copied-from source, or lots of extraneous glue code that suggests the author didn't understand the techniques well enough to factor and adapt them.
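
As a rough illustration of the formatting-inconsistency check, the sketch below flags regions of a file whose indentation style (tabs vs. spaces) differs from the rest. The function names and the 20-line window are arbitrary assumptions, and a hit is only a hint: plenty of honest code bases mix styles.

```python
from collections import Counter

def indent_style(line):
    """Return 'tab', 'space', or None based on the line's first character."""
    if line.startswith("\t"):
        return "tab"
    if line.startswith(" "):
        return "space"
    return None

def odd_regions(source, window=20):
    """Return (start_line, style) for windows whose dominant indent style
    differs from the dominant style of the whole file."""
    lines = source.splitlines()
    overall = Counter(s for s in map(indent_style, lines) if s)
    if not overall:
        return []
    dominant = overall.most_common(1)[0][0]
    flagged = []
    for start in range(0, len(lines), window):
        chunk = lines[start:start + window]
        local = Counter(s for s in map(indent_style, chunk) if s)
        if local and local.most_common(1)[0][0] != dominant:
            flagged.append((start + 1, local.most_common(1)[0][0]))
    return flagged

# Hypothetical file: 40 space-indented lines followed by 20 tab-indented ones.
sample = "\n".join(["    a = 1"] * 40 + ["\tb = 2"] * 20)
print(odd_regions(sample))
```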


👤 mixmastamyk
There are probably services you can hire to do this for you. But it will simply be a more sophisticated version of Google/GitHub-searching the code fragments.

Also, everyone uses some Stack Overflow snippets, so don't expect things to be absolutely from scratch.
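
For a sense of what a "more sophisticated" fragment search could look like, here is a minimal sketch that compares two sources by overlapping token sequences (shingles) and reports a Jaccard similarity. This is an assumption about the general approach, not a description of any particular service; the tokenizer regex and `k=5` are arbitrary choices for illustration.

```python
import re

def shingles(source, k=5):
    """Set of every run of k consecutive tokens; whitespace is ignored."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a, b, k=5):
    """Fraction of shingles shared by both sources (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical fragments: same code reformatted, then unrelated code.
original = "total = total + price * qty"
reformatted = "total=total+price*qty"
unrelated = "print('hello world')"
print(jaccard(original, reformatted))
print(jaccard(original, unrelated))
```

Because common idioms and borrowed snippets produce some overlap in honest code too, a score like this only means something relative to a baseline for similar, unrelated programs.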