The output of the model will be shared with all clients.
The clients require that the model only sees data that has been stripped of sensitive information. We could crudely remove the columns we consider sensitive, but this would hurt the model's performance.
Has anyone got experience or thoughts on how to approach this?
Any software, open-source or not, that could help?
Txs MD
I know this won't sit well with privacy-minded folks, but if you just need the data de-identified rather than anonymized, you could pick the fields that might contain sensitive data and do a character-for-character swap. This way you retain the information without storing the personal information in raw form.
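To make the character-for-character swap concrete, here is a minimal sketch in Python. The function names and the seed value are made up for illustration. It shuffles lowercase letters, uppercase letters and digits separately, so the masked value keeps its length, case pattern and punctuation. Note that this is a plain substitution cipher: anyone who has the seed (or enough sample data for frequency analysis) can reverse it, so it only de-identifies and does not anonymize.

```python
import random
import string


def build_swap_table(secret_seed: int) -> dict:
    """Build a fixed one-to-one character mapping from a secret seed.

    Letters map to letters of the same case and digits map to digits,
    so the shape of each field is preserved.
    """
    rng = random.Random(secret_seed)
    table = {}
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase, string.digits):
        shuffled = list(alphabet)
        rng.shuffle(shuffled)
        table.update(str.maketrans(alphabet, "".join(shuffled)))
    return table


def swap_chars(value: str, table: dict) -> str:
    """Apply the mapping; characters not in the table pass through unchanged."""
    return value.translate(table)


# Hypothetical seed and sample value, for illustration only.
table = build_swap_table(secret_seed=12345)
masked = swap_chars("John Smith, AC-00417", table)
print(masked)
```

Because the same seed always yields the same mapping, equal values in different rows or tables stay equal after masking, so joins and group-bys on the masked columns still work.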
Names, addresses, ID numbers and account numbers can be easily randomized. Dates and numbers (what kind of system is it?) are trickier since they are used in calculations. Finally, the hard part is making sure that the anonymized data can't be traced back to real entities.
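As a rough illustration of both points, the sketch below replaces identifiers with keyed pseudonyms and shifts dates by a fixed per-entity offset. This is one possible approach rather than a recommendation from the thread: it uses HMAC-SHA256 instead of pure randomization so that the same input always gets the same pseudonym, and the key, function names and the 180-day window are all assumptions. Shifting every date of one entity by the same offset keeps the intervals between that entity's events intact, which is usually what calculations rely on. It does not by itself guarantee that records cannot be linked back to real entities.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key; in practice it would be stored outside the dataset.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonym(value: str, prefix: str = "ID") -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def shift_date(d: date, entity_id: str, max_days: int = 180) -> date:
    """Shift a date by an offset in [-max_days, max_days] fixed per entity."""
    digest = hmac.new(
        SECRET_KEY, b"date:" + entity_id.encode("utf-8"), hashlib.sha256
    ).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


# Made-up account number and dates, for illustration only.
account = "ACC-00417"
opened, closed = date(2020, 1, 15), date(2020, 3, 1)
print(pseudonym(account))
print(shift_date(opened, account), shift_date(closed, account))
```

The gap between the two shifted dates is the same as between the originals, so durations computed per account are unaffected, while comparisons of absolute dates across different accounts are not preserved.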