If a story hasn't had significant attention in the last year or so, then we don't treat it as a dupe, because it's important for good articles to get multiple chances at getting attention. Otherwise the randomness of what gets noticed on /newest would be even more dominant than it already is.
Presumably an exact URL match, and maybe within a timescale?
Let the community moderate itself. It's old school and maybe dumb, but not everything needs smart-ass AI.
I propose automatic de-dupe: whatever the title says, if the exact same URL has already been submitted, just count it as an upvote on the existing story... optional: if the submitted caption is different, then add a comment that says "also submitted by X with the caption `Y`."
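For what it's worth, here is a rough sketch of that proposal in Python. It is only an illustration: `Story`, `Board`, and `submit` are hypothetical names, the store is an in-memory dict keyed by the exact URL string, and I am assuming the original submitter counts as the first upvote. It says nothing about how the site actually handles duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """Hypothetical story record; field names are assumptions."""
    url: str
    title: str
    submitter: str
    upvoters: set = field(default_factory=set)
    comments: list = field(default_factory=list)


class Board:
    """Hypothetical in-memory store, keyed by the exact URL string."""

    def __init__(self):
        self.stories = {}

    def submit(self, url, title, user):
        existing = self.stories.get(url)
        if existing is None:
            # First time this exact URL is seen: create the story.
            # Assumption: the submitter counts as the first upvote.
            story = Story(url=url, title=title, submitter=user, upvoters={user})
            self.stories[url] = story
            return story

        # Exact URL match: treat the resubmission as an upvote.
        # A set means the same user cannot upvote twice by resubmitting.
        existing.upvoters.add(user)

        # Optional part of the proposal: record a differing caption.
        if title != existing.title:
            existing.comments.append(
                f"also submitted by {user} with the caption `{title}`"
            )
        return existing
```

Usage would look like `board = Board()` followed by `board.submit(url, title, user)` for each submission. Note that an exact string match will miss trivially different URLs (tracking parameters, trailing slashes), and this sketch has no time window, so an old story would absorb resubmissions indefinitely.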