HACKER Q&A
📣 aaronharnly

Securely finding the intersection of two lists?


I've now three or four times run into the situation where I want to securely and privately find the intersection of a list I have, with a list someone else has – but without either of us seeing one another's full list.

For example, I want to find discover customers with a prospective partner. But they don't want to send us their complete customer list, and we don't want to send ours to them.

There is CS research into the topic of "Private Set Intersection Protocols", but I haven't found a working implementation. And I could pay an accounting or legal firm to do this manually, but that will cost more money and time than it's usually worth.

Have you ever heard of somebody doing an escrow service for this, where party A and party B each send their lists to the service, the esrvice verify the size of each list (so we're not just spamming the service), and then they send back the matches to both parties? Or a tool simple enough to send to someone else to run locally?


  👤 klavinski Accepted Answer ✓
What about exchanging the hashes of the customers? A common hash would mean a common customer.

"What if he sends back my list of hashes, although he has no common customer?"

I would do as follows:

1. Both pick a hashing function (f and g).

2. Both share their hashing function.

3. Both hash their list with two functions.

4. Both send the list of hashes computed with the other's function.

5. Common hashes <=> common customer.

For example, my list: Amanda, Patrick

His list: Patrick, Stanley

I choose f, he chooses g.

My list |> f = 12, 54 (mine)

My list |> g = 36, 2 (public)

His list |> f = 54, 7 (public)

His list |> g = 2, 68 (his)

I compare both lists with f, and he does the same with g. I conclude the only common customer is Patrick. He would not be able to send 54 without knowing Patrick or reverse-engineering the function (or being pretty lucky).


👤 aaronharnly
This is one of the many papers on the topic of algorithms to compute set intersection while revealing little or no additional information - but I haven't seen anyone running this as a service or writing a tool to make it possible for two parties:

https://pdfs.semanticscholar.org/389d/1af665a5111306e1f118ff...


👤 gwittel
I haven’t seen an off the shelf thing. It might be possible with PKI incantations but the first that comes to mind without a third party is homomorphic encryption.

Libs like MS SEAL only do limited numeric operations. Without reading the research, that seems fine. Hash each name to a number. Encrypt, then do a compare. (Lots of hand waving here).

Another possible scheme are things like the Google safe browsing update/client side protocol. There you’re working against hashes only. Depending on needs it may be good enough.


👤 eb0la
If you can tolerate some error, a bloom filter might work.

Both parties make their own bloom filter and send the results to the other party.

The good part is no personal data goes inside. I mean, you cannot tell beforehand if someone is a customer until you try to store their data into the filter.

Bad news: bloom filters have erros. So, before doing this better guess the monetary value of the error.