How to publish online without it being used for ML model training?

Question

Is there any way to publish somewhere such that it's unlikely (a guarantee would be great, but that seems impossible) to be used for model training?

themodelplumber · Accepted Answer

If by online you mean web publishing, it seems there's some consensus around these tags:

https://twitter.com/globalcomix/status/1604279726985474048
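The linked tweet isn't quoted in this thread, so the following is only a rough sketch of what adding such tags could look like. It assumes the tags in question are the informal `noai` and `noimageai` values for the robots meta tag that DeviantArt proposed in late 2022. The helper name `add_noai_meta` is made up for illustration. Note that this is a voluntary signal: a crawler has to choose to honor it.

```python
import re

# Assumption: the "tags" are the informal noai / noimageai robots directives.
# They are a convention, not a standard, and scrapers may ignore them.
NOAI_META = '<meta name="robots" content="noai, noimageai">'

# Matches an opening <head> tag (with or without attributes), but not <header>.
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_noai_meta(html: str) -> str:
    """Insert the opt-out meta tag right after the opening <head> tag.

    Hypothetical helper for illustration; returns the input unchanged
    if no <head> tag is found.
    """
    match = _HEAD_OPEN.search(html)
    if match is None:
        return html
    return html[: match.end()] + "\n" + NOAI_META + html[match.end():]


page = "<html><head><title>My comic</title></head><body>...</body></html>"
print(add_noai_meta(page))
```

If you use a static site generator or CMS, it's probably easier to paste the `<meta>` line into the head template directly than to post-process the HTML like this.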

Another thing to potentially look into is the headers being sent:

https://twitter.com/stealcase/status/1605736262949687296
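For the header side, here's a minimal sketch, assuming the header meant is `X-Robots-Tag` carrying the same `noai, noimageai` values (again an informal convention that scrapers are free to ignore, and the tweet itself isn't quoted here). It uses Python's built-in `http.server` to serve the current directory. The names `NoAIRequestHandler` and `make_server` are made up for this example, and for a real site you'd more likely set the header in your web server or CDN config.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Assumption: same informal directives as the meta tag, sent as a header.
NOAI_HEADER_VALUE = "noai, noimageai"


class NoAIRequestHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, adding the opt-out header."""

    def end_headers(self):
        # Add the header to every response just before the headers are sent.
        self.send_header("X-Robots-Tag", NOAI_HEADER_VALUE)
        super().end_headers()


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a local server; hypothetical helper."""
    return ThreadingHTTPServer(("127.0.0.1", port), NoAIRequestHandler)


# To run it: make_server().serve_forever()
```

The advantage of a header over a meta tag is that it also covers non-HTML files such as images, which have nowhere to put a `<meta>` element.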

smoldesu · Answer

Watermarking it might be a good start. The people who go through training data are probably flagging pictures that have unsightly artifacts or unrealistic destructive changes in the image. If you add a Shutterstock-style watermark it would probably get removed from most sets, and a prominent signature in the bottom corner would probably also work pretty well.

As for text, I guess your best bet is to either limit its exposure or intentionally poison the data. It's a little bit harder to do this in writing, but you could still try by creating unusable fictitious accounts of characters named "Biden" or "Boris" doing increasingly ridiculous things. Any politically sensitive moderator would probably remove your data before it hits the model, and if it does get through, there's a good chance it will be flagged as problematic.
