HACKER Q&A
📣 ausudhz

Do ChatGPT and similar AI services reward data owners?


As you're all aware, ChatGPT has been a buzzword for a few weeks now and, before that, similar services like DALL-E, GitHub Copilot and so on received similar attention.

These services rely heavily on publicly available data crawled from the internet to build highly scalable models, offering a big benefit compared to existing approaches (e.g. spending time searching across multiple websites).

The question is, are they rewarding the owners of the data that was used to train these models? While you might like to brush it off by saying that they use public information that is freely available on the internet, I'd like you to consider that:

First, it has already been shown that some services, like GitHub Copilot, used GPL code to train their models (which should force share-alike licensing terms).

Second, existing licenses don't cover these use cases yet, given that they're pretty new and only started to pop up recently.

Why is this important? Well, first because if content creators are not rewarded for their work, they won't create content anymore, and that would heavily reduce the amount of quality content available to train new versions of the model. While there are instances of quality content freely available (e.g. Wikipedia), most of these are run by non-profit organizations that get funding one way or another.

Currently a big part of the web works in a very simple way: content creators create content, place some ads and, thanks to traffic from search engines, earn money from the ad revenue. In other cases there's a revenue-sharing model (e.g. it's what made YouTube's fortune).

In the future, if ChatGPT takes off, people could get information without ever visiting a website, providing no benefit at all to the content creators, who would likely stop producing content (think about recipe websites).

Now that ChatGPT is going to charge $20 per month per user, how much of that is going to reward those who (indirectly) provided the content, rather than benefiting only one company?


  👤 sharemywin Accepted Answer ✓
How would such a model work?

Let's take your recipe example. I don't think grandma got compensated for her original chicken soup recipe that someone posted online, and which then got tweaked into 100 versions that 30,000 people upvoted or liked.

And then the AI comes along and outputs a new version of the recipe that's not identical to any of the previous versions, because you told it to replace pepper with some other ingredient.

Who should get compensated, and how much?


👤 nickfromseattle
I read a good take on this.

Ethically, it's probably a good thing to compensate the rights holders.

But it also means the only companies that can afford this licensing are going to be the existing incumbents, e.g. Facebook, Amazon, Google, Microsoft.

And society is probably worse off if these companies gatekeep access to good AI and the vast majority of the productivity gains AI brings.


👤 altdataseller
Content creators are already compensated for their work (through ads, donations, etc.) and ChatGPT won't affect that.

👤 LunarAurora
Copyright law is already so convoluted; I'm curious how they're gonna apply it to AI models in the coming decades.

For established libraries (photos/books...) controlled by corporations, it would be possible to licence in bulk for inclusion in models. But I don’t see how you can get royalties per generated item (like with streaming).

How about public funds? Too “socialist”? Well, brace yourself for revolution: open-source models trained on everything under the sun are coming (soon). What are you gonna do about it?