HACKER Q&A
📣 jonmchan

Is managing user PII in analytics a pain point for anyone?


Dealing with personally identifiable information (PII) in analytics can be tricky, especially if you want to respect people's privacy and comply with recent privacy laws like GDPR and CCPA. Anonymizing and de-identifying the data is a good way to handle this: it gives you much more flexibility to use the data freely and protects business analysts from accidentally violating privacy policies. But what do you do when you want to engage with cohorts of users that you found through those anonymized, de-identified datasets? Is this a challenge anyone else faces?
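One common way to make de-identified data still usable for cohort building is keyed pseudonymization. Note that under GDPR, pseudonymized data is still personal data, unlike truly anonymized data. The minimal sketch below replaces each raw user ID with a stable HMAC-SHA256 token and drops direct identifiers. The key, the field names (`user_id`, `email`, `phone`, `name`), and the helper names are hypothetical assumptions for illustration; in a real setup, the key would live only in the PII store and never be shared with analysts.

```python
import hashlib
import hmac

# Hypothetical secret held only by the PII store; analysts never see it.
SECRET_KEY = b"replace-with-a-key-from-your-secret-manager"

# Hypothetical set of direct identifiers to strip from analytics events.
DIRECT_IDENTIFIERS = {"email", "phone", "name"}


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a raw identifier to a stable token that cannot be reversed without the key."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify_events(events: list, key: bytes = SECRET_KEY) -> list:
    """Drop direct identifiers and replace user_id with its pseudonymous token."""
    cleaned = []
    for event in events:
        row = {k: v for k, v in event.items() if k not in DIRECT_IDENTIFIERS}
        row["user_id"] = pseudonymize(event["user_id"], key)
        cleaned.append(row)
    return cleaned
```

Because the same user always maps to the same token, analysts can still join tables and build cohorts on the tokens without ever handling contact details.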

I'm trying to gauge interest in creating a PII management system that would clearly separate the roles of analysts and automated engagement platforms:

1. It would allow analysts and analytics platforms to operate on anonymized datasets.

2. It would provide role-based authorization for engagement platforms to query contact info for cohorts generated from the anonymized datasets, using that information solely to send predefined, automated messages via email, text, snail mail, etc.
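The two roles described above could be sketched as a small "vault" that owns the token-to-contact mapping. The vault only hands contact info to the engagement role, and only for a pre-approved message template. This is a minimal illustration, not a design for any existing product. The role names, template IDs, `PIIVault` class, and audit-log format are all hypothetical assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical roles and pre-approved message templates; a real system would load these from config.
ANALYST = "analyst"
ENGAGEMENT = "engagement_platform"
APPROVED_TEMPLATES = {"win_back_email", "renewal_reminder_sms"}


@dataclass
class PIIVault:
    # token -> contact details; only the vault ever holds this mapping
    contacts: dict = field(default_factory=dict)
    # records (role, template_id, cohort_size) for every granted lookup
    audit_log: list = field(default_factory=list)

    def resolve_cohort(self, role: str, cohort_tokens: list, template_id: str) -> list:
        """Return contact info for a cohort, but only to engagement platforms sending an approved template."""
        if role != ENGAGEMENT:
            raise PermissionError(f"role {role!r} may not read contact info")
        if template_id not in APPROVED_TEMPLATES:
            raise PermissionError(f"template {template_id!r} is not pre-approved")
        self.audit_log.append((role, template_id, len(cohort_tokens)))
        # Unknown tokens are skipped rather than revealing which ones exist.
        return [self.contacts[t] for t in cohort_tokens if t in self.contacts]
```

Analysts export only token lists. An engagement platform passes those tokens plus a template ID to `resolve_cohort` and never receives contact details it could repurpose outside an approved message.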

Does a system like this sound useful to anybody? I'd be especially interested to hear from someone with firsthand experience in this area. Thanks.


  👤 iknownothow Accepted Answer ✓
We use dbt at work. The entire analytics lineage (not the data) is laid bare for all to see, and no analytics is done outside of dbt. If PII accidentally enters the lineage, it's easy to spot and remove. You can also see which downstream models have used that PII.
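Finding "which downstream models have used this PII" is a reachability question on the lineage graph. dbt's own graph selector (e.g. `my_model+`) already selects a model and its descendants. The sketch below shows the underlying idea as a plain breadth-first search. It assumes you already have a parent-to-children mapping of model names, and the model names used are hypothetical.

```python
from collections import deque


def downstream_models(child_map: dict, source: str) -> set:
    """Return every model that depends, directly or transitively, on `source`."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: stg_users is the model where PII slipped in.
lineage = {
    "stg_users": ["int_user_sessions", "dim_users"],
    "int_user_sessions": ["fct_engagement"],
    "dim_users": ["fct_engagement", "rpt_churn"],
}
affected = downstream_models(lineage, "stg_users")
```

Here `affected` contains `int_user_sessions`, `dim_users`, `fct_engagement`, and `rpt_churn`: the full set of models to audit once the PII is removed upstream.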

👤 donutshop
Would love something like this. The behemoth in this space is Microsoft Purview and more competition would be good.