> The model was trained using text databases from the internet. This included a whopping 570GB of data obtained from books, webtexts, Wikipedia, articles and other pieces of writing on the internet. To be even more exact, 300 billion words were fed into the system.
I believe it's unfair to these sources that ChatGPT drives away their clicks, and in turn the ad income that would come with them.
Scraping data seems fine in contexts where clicks aren't driven away from the very site the data was scraped from. But in ChatGPT's case, it seems really unfair to these sources and to the work the authors put in, as people would no longer even attempt to visit them.
Can this start breaking the ad-based model of the internet, where a lot of sites rely upon the ad income to run servers?
People get hostile at me online for this question, but it really is that simple. They've automated you, and that's definitely going to be a problem; but if it's acceptable for your brain to do the same thing, you're going to have to find a different angle of attack than "fairness".
1. You have a widely read spouse named Joe. He's got a good memory, and typically if you have a question you just ask him instead of searching for it yourself. Are you depriving Joe's sources of your eyeballs?
2. Many books summarize and restate other books. If I read Cliff's Notes on a book, for example, I can learn a lot about the original book without buying it. Is this depriving the author?
3. I have a website that proxies requests to other websites and summarizes them while stripping out ads.
So which of these examples is the better metaphor for what an LLM does?
I don't know. The fact is, LLMs are a new thing in our tech and culture and they don't quite fit into any of our existing cultural intuitions or norms. Of course it's ambiguous! But it's also exciting.
Yesterday: 1) You do research, you publish a book, you write some posts. 2) People discover your work and you personally, they visit your posts and subscribe to you. 3) You have an opportunity to upsell your book and make money on ads to sustain your future work; more importantly, you get to see traffic stats and see what is in demand, you get thank-you emails and feel valued.
Tomorrow: 1) You do research, write posts, publish a book. 2) It is all ingested by a for-profit LLM. 3) People ask the LLM for answers, and have no reason, or even opportunity, to buy your book or learn you exist.
What exactly are the incentives to publish information openly in that world?
(Will they even believe you if you say you're the one who did the niche research powering some specific ChatGPT answer, in a world where everyone knows you can just ask an LLM?)
Artists are already in full rebellion against this, as they should be: they are nearly eclipsed by AI, except when it comes to inventing new styles and hand-crafting samples for the models to train on. Those samples, I assume, are either scraped off the web or signed away under the unfair ToS of various online publishing platforms.
Since the damage is individually small (they took some code from me without attribution, OK) but collectively enormous, in my opinion it is the role of government to step in and soften the blow if necessary.
If you have a problem with ChatGPT's "scraped data", then you have more fundamental issues with how the internet is as it is today.
Please, people, learn how to focus your thoughts. Go read up on copyright law in the United States. If you go into learning about copyright law trying to justify your own preconceived notions you will gain nothing.
I'm actually not really sure I have an opinion on the ethics of it. Same argument as Adblock. You don't get to control how people consume your content if you put it out in the world for free. That goes for profiles, articles, Reddit posts, StackOverflow answers, etc. The only thing that's ironic is that large tech companies throw a fit whenever you want to turn the tables and scrape them.
For now, I have removed my existing works, both technical and creative, from the internet and won't be adding more while I try to work out what to do.
On the other hand, the focus on the potential of ChatGPT's natural language processing capabilities highlights the significance of learning and using LLMs (large language models) in data handling. LLMs could lead to a future where traditional databases become obsolete, replaced by advanced language models. As such, integrating LLMs into our daily lives and processes can bring many benefits and possibilities.
At some point participating in the internet means your stuff is going to be seen. I wear glasses to read web content. I don't think the glasses company should pay royalties for what I read. ChatGPT is a tool that allows me to understand and use the information people put onto the internet better.
Far from a matter of fairness, this is simply another way that selfish people are trying to monetize the future, to make it more and more difficult and expensive for others to participate.
"I've always wished I could charge everyone on earth. ChatGPT looks like the future. If I can tap the money flow there I will get mo' money."
I'm against it.
Maybe it was unfair to telephone operators when connection automation was implemented, since it made operators obsolete, but the older model couldn't scale, the same way reading text from the source doesn't scale for human productivity.
E.g. Summary of How to Win Friends and Influence People: Effective Steps to Better Interpersonal Relationships by Book Lyte
ChatGPT does more of a mashup with the learned data than humans need to; that'll do me.
We can only hope. It's unfair to someone that my browser can ask your server for a page, I see an ad for random bullshit nobody would ever care about, money changes hands behind the scenes, and that counts as an economic transaction which boosts GDP. It's unfair (in my favour) that I can piggyback off this to get things for free.
And when I say "someone" I suspect "everyone". Sadly, spending money advertising "Yorkshire woman finds guaranteed way to win on the horses" doesn't seem to have caused anyone to run out of money and the whole thing to collapse yet. And it's unfair on real small businesses with products, paying for adverts which people don't see, or which are clicked by bots, or which are misreported, while all they can do is throw money at Google and Facebook and hope.
Clearly, ownership of ideas runs out, because we all use linked lists or binary trees, or paper, or turbines or the list goes on. We don't pay money to the inventors of linked lists, or the heirs or successors-in-interest to the inventor of paper. Why not? When does ownership of an idea expire? Why do we unconsciously accept copyright or patent limits of today?
There's also an issue with simultaneous invention, but that's out of scope here. Clearly ChatGPT is just regurgitating or otherwise emitting previously-ingested material.
Discussion is pointless because everyone already has an opinion and it's very firm.
Google has been doing this in search results for years, and so does Bing. Apple also does this in its built-in dictionary.
Why rant about ChatGPT, which, at least currently, is a small company in comparison?
question: How could the people who generate content used in an AI language model be paid for their work?
answer: There are several ways in which the people who generate content for an AI language model could be paid for their work:
Royalty-based payment: Content creators could receive a percentage of the revenue generated from the use of their content in the AI language model.
Token-based payment: If the AI language model is built on a blockchain, content creators could be paid in tokens that could be traded for cryptocurrency or fiat currency.
Partnership with content publishers: The developers of the AI language model could partner with content publishers to compensate the creators of the scraped content.
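The royalty-based option above can be sketched concretely. A minimal pro-rata split, assuming the operator could attribute which sources contributed to answers (a big assumption in practice); the creator names, the 30% pool, and the usage counts below are all made up for illustration:

```python
def royalty_split(revenue, usage_counts, creator_share=0.30):
    """Split a creator pool pro rata by attributed usage.

    revenue: total revenue for the period.
    usage_counts: dict mapping creator -> number of attributed uses.
    creator_share: fraction of revenue set aside for creators.
    """
    pool = revenue * creator_share
    total = sum(usage_counts.values())
    if total == 0:
        # No attributed usage this period: nothing to pay out.
        return {creator: 0.0 for creator in usage_counts}
    return {creator: pool * n / total for creator, n in usage_counts.items()}

# Hypothetical numbers for illustration only.
payouts = royalty_split(1_000_000, {"nyt": 500, "wikipedia": 300, "blogger": 200})
# payouts: {"nyt": 150000.0, "wikipedia": 90000.0, "blogger": 60000.0}
```

The hard part is not the arithmetic but the attribution: an LLM does not record which training documents produced a given answer, which is why this stays a sketch.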
2 - Code was trained from GitHub. GitHub is Microsoft. OpenAI is Microsoft money. So Microsoft trained its AI on Microsoft code. You disagree? Then GTFO from GitHub and don't feed Microsoft your code anymore.
3 (the most important point) - Q: "Can this start breaking the ad-based model of the internet, where a lot of sites rely upon the ad income to run servers?"
Fuck YEAH!! Please do so. I hope that shit show of an ad model crashes and burns to the ground. You can't use the internet without solid armor on: uBlock Origin and/or NoScript (or Pi-hole if you want the same readable experience on the rest of your household devices).
Hopefully. This would be the best outcome I can think of for the Internet.
Obviously storage is not a major factor here.
> What's the New York Times scrambled egg recipe?
GPT returns the exact recipe. If I were NYT I'd be frustrated. Their content is now showing without the ad views or paywall.
Is there something analogous to saliency maps for LLM?
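There is a rough analogue: occlusion-based attribution, where you drop each input token and measure how much the model's score changes. A toy, model-agnostic sketch; the `toy_score` function and its word list below are stand-ins I made up, not a real LLM:

```python
def occlusion_saliency(tokens, score_fn):
    """Saliency of token i = score(full input) - score(input without token i)."""
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

# Toy stand-in for a model score: counts "recipe-like" words.
RECIPE_WORDS = {"egg", "butter", "whisk"}

def toy_score(tokens):
    return sum(1.0 for t in tokens if t in RECIPE_WORDS)

sal = occlusion_saliency(["whisk", "the", "egg", "gently"], toy_score)
# "whisk" and "egg" get saliency 1.0; "the" and "gently" get 0.0
```

The same recipe works with a real LLM by using the model's log-probability of its answer as `score_fn`, at the cost of one forward pass per token; gradient-based methods (gradient x input, integrated gradients) are the cheaper alternatives the interpretability literature uses.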
1) AI is open sourced and we adapt stably. Either everybody has the opportunity to be their own business, or there is UBI.
2) AI is open sourced but it is unfairly distributed. Only some people are suited to BTOB, and/or UBI is shit.
3) AI is not open sourced, the wealthy edge out mankind and a planet scale genocide occurs.
4) none of it matters because the looming war between the US & China explodes or climate change wipes us out in any meaningful capacity that could pursue AI.
Given the track record of our species, #1 feels like wishful thinking.
It’s really just building a better model.
Will LLMs drive interest/activity away from wikipedia.org? Will it put its own sources of high-quality ad-supported content -- wikihow.com, for example (though I can't be totally sure it scraped from there) -- out of business? Or is there an earth-shattering copyright suit against OpenAI in the works as we speak?
> Can this start breaking the ad-based model of the internet
Is the alternative that everything is behind some kind of paywall by default, to block scraping? Is that where we're heading?
"Copyright" "ingenuity of thought" etc are concepts that need to be overhauled since a lot more people now have access to higher education.
How could training an AI on the works of someone who has already been paid for them be unfair? Possibly because it affects their future marketability and income.
Current authors, artists, and internet commenters clearly have a stake in the results of their creative endeavors being used for gain that they won't benefit from. This is very similar to the extractive monopolies of YouTube and the rest of social media: their profit at our expense.