HACKER Q&A
📣 hcxl

Are privacy concerns with GenAI services overblown?


A lot of people I've talked to bring up privacy as the biggest issue when it comes to using a GenAI service like OpenAI, Anthropic, etc. They always say they "cannot upload confidential data" to the service and are wary of what the AI provider will do with that data. I'm not sure this concern is well founded, because: 1) They are already using cloud services: email, Slack, cloud storage. Literally all company data is on the cloud. 2) These AI providers have made it clear that they do not use data submitted via their APIs for training.

Are they being contradictory, or am I just not seeing the potential security issue with a cloud GenAI service? I'm really curious what the consensus is outside the circle of people I've talked to, and also what critical company data, if any, can somehow live in every cloud service except a GenAI one.


  👤 hypoxia Accepted Answer ✓
Yes, they are overblown, with some caveats.

In terms of API usage, OpenAI has never used prompts for training, but this is very poorly understood among enterprise CEOs and CIOs. Executives heard about the Samsung incident early on (confidential information submitted by employees via the ChatGPT interface, which trained on user data by default at the time), and their trust was shaken in a fundamental way.

The email analogy is very apt - companies send all of their secrets to other people's computers for processing (cloud compute, email, etc.) without any issue. BUT there's a big caveat: abuse moderation. Prompts, including API calls, are normally stored by OpenAI/MS/etc. for a certain period and may be reviewed by a human to check for abuse (e.g. using the system to generate phishing emails). This causes significant issues for certain types of data. Worth noting that the moderation-by-default approach is in the process of being dialed down, and there are now top-tier enterprise plans that are no longer moderated by third parties by default.

TL;DR: The concern stems from an early loss of trust (Samsung); there is a valid issue for certain types of data (abuse moderation), but there are ways around it if you have enough money (enterprise plans).