HACKER Q&A
📣 MattyRad

Would denormalizing a string prevent AI/LLM consumption?


Hi. With burgeoning AI, I don't particularly like the idea of my persona being unwittingly scraped into an AI corpus.

Would denormalizing a string to unicode help prevent AI from matching text in a prompt? For example, changing "The quick brown fox" to "𝓣𝓱𝓮 𝓺𝓾𝓲𝓬𝓴 𝓫𝓻𝓸𝔀𝓷 𝓯𝓸𝔁" or "apple" to "ÁÞÞlé". Since the obfuscated strings use different tokens, they wouldn't match in a prompt, correct?

Note that I'm not suggesting that an AI couldn't produce obfuscated unicode; it can. This question is only about preventing one's text from contributing to a corpus.
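
To make the idea concrete, here is a minimal Python sketch of the first kind of substitution, assuming the stylized letters come from the Unicode "Mathematical Bold Script" block. The helper names `obfuscate` and `deobfuscate` are hypothetical, chosen only for illustration. It shows that the obfuscated string no longer compares equal to the original, and also that standard NFKC compatibility normalization (Python's `unicodedata.normalize`) folds these particular letters back to plain ASCII. Whether any given scraping pipeline applies such normalization is an assumption, not something established here.

```python
import unicodedata

# Start of the Mathematical Bold Script letters in Unicode.
BOLD_SCRIPT_UPPER = 0x1D4D0  # MATHEMATICAL BOLD SCRIPT CAPITAL A
BOLD_SCRIPT_LOWER = 0x1D4EA  # MATHEMATICAL BOLD SCRIPT SMALL A


def obfuscate(text: str) -> str:
    """Replace ASCII letters with their bold-script look-alikes (hypothetical helper)."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_SCRIPT_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SCRIPT_LOWER + ord(ch) - ord("a")))
        else:
            out.append(ch)  # leave spaces, digits, punctuation untouched
    return "".join(out)


def deobfuscate(text: str) -> str:
    """Apply NFKC compatibility normalization (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)


plain = "The quick brown fox"
fancy = obfuscate(plain)

print(fancy == plain)               # the raw strings differ
print(deobfuscate(fancy) == plain)  # normalization recovers the original
```

The accented-letter style of substitution used for "apple" is a different case: NFKC alone does not map those characters back to the ASCII word, so undoing it would need something beyond this sketch, such as a look-alike character table.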


  👤 PaulHoule Accepted Answer ✓
I was working on foundation models for business and we had done some work on character embeddings that would counteract that back in 2017.

Pro Tip: people whose ideas were worth stealing were worried 10 years ago that Google's web scraping, and the whole economy around it, was unfair and exploitative. Suddenly the people whose ideas aren't worth stealing are up in arms about it.

Think more about having ideas that are worth stealing (e.g. leading the herd, not following it) instead of worrying about getting your ideas stolen.