Prompt: "king of belgium giving a speech to an audience, but the audience members are cucumbers"
All 4 results (none of them any good as far as the prompt is concerned): https://ibb.co/gz5RDkB
Full-size version of the one with the watermark: https://ibb.co/DzGR063
In the United States, two pieces of case law are widely cited and relevant. Kelly v. Arriba Soft Corp. (9th Cir.) found that making thumbnails of images for use in a search engine was sufficiently "transformative" to be fair use. Another case, Perfect 10 v. Amazon (9th Cir.), found that thumbnails for image search and cached pages were also transformative.
OTOH, cases like Infinity Broadcast Corp. v. Kirkwood found that retransmission of radio broadcasts over telephone lines is not transformative.
If I understand correctly, the US courts' fair-use test (transformativeness falls under the first factor) has four parts: (1) the purpose and character of the use, (2) the nature of the copyrighted work, (3) the amount and substantiality of the copying, and (4) the effect on the market.
I'd think that training a neural network on artwork, including copyrighted stock photos, is almost certainly transformative. However, as you show, a neural network can be overtrained on a specific image and reproduce it too closely; that output probably wouldn't fall under fair use.
There are also questions of whether they violated the CFAA or some agreement by crawling the images (though hiQ v. LinkedIn [0] makes it seem very possible to do legally), and whether they reproduced Getty's logo in a way that violates trademark (though are they using it in trade in a way that could cause confusion?).
When this is finally tried in court, if it fails to any meaningful extent at all (and it will doubtless be appealed all the way up to the supreme courts), then Copilot is dead, DALL·E is dead, GPT-3 is dead; all of these things will be immediately discontinued in at least the affected jurisdictions, at least until the laws are changed or the judgments overturned.
The dynamics in play are highly questionable. Countless artists and photographers put effort into creating their works. They put their work online to get some attention and recognition. A company comes along, scrapes all of it, and starts selling access to a model that generates something that looks highly derivative. The original cohort of artists and photographers not only gets zero money or attention from this new endeavor; they are now in competition with the resulting model.
In short, someone whose work was essential to building a thing gets no benefits and possibly even gets (financially) harmed by that thing. Just because this gets verbally labeled "fair use" doesn't make it fair.
Additional point:
Just a few years ago a bunch of tech companies were talking about "data dignity". Somehow, magically, this (marketing) term is no longer used anywhere.
Considering how strict and heavy-handed copyright enforcement has been otherwise, this has added to my belief that copyright in practice is really just the enforcement of the interests of whatever industry has the most power at a given time: when entertainment and content generation was the biggest revenue generator, copyright couldn't be strict enough; now all the money is in AI, and suddenly loopholes the size of barn doors pop up.
They aren't hosting the infringing content. Training on the data is probably covered under fair use. Generations are of _learned_ representations of the dataset, not the dataset itself. This makes it closer to outputting original works (probably owned by the person who used the model).
The players involved here are known for being litigious, however. I wouldn't be surprised if OpenAI did in fact pay some hefty fee upfront to get full permission to use these images.
https://www.reddit.com/r/KidsAreFuckingStupid/comments/8tgxs...
[0] https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/1...
BTW you can add 'royalty free' to the prompt to get rid of those most of the time (lol?).
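For what it's worth, a minimal sketch of applying that prompt trick programmatically, assuming access through the pre-1.0 OpenAI Python SDK's image endpoint (the prompt suffix, key placeholder, and parameters here are illustrative, not a documented workaround):

```python
import openai

openai.api_key = "sk-..."  # your API key

# Appending "royalty free" to nudge the model away from the
# stock-photo-style training images that tend to carry watermarks.
prompt = ("king of belgium giving a speech to an audience, "
          "but the audience members are cucumbers, royalty free")

response = openai.Image.create(prompt=prompt, n=4, size="1024x1024")
for image in response["data"]:
    print(image["url"])
```

No guarantee it works every time, of course; it just shifts which part of the training distribution the model samples from.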
That being said, arguments about copyright are just a fig leaf as far as I am concerned. The outcome of whether this is allowed or not will depend on the net impact of using those models on the job market and whether society will be willing to tolerate it.
You'll get a public link, at `labs.openai.com` rather than some random image-sharing site, which will show the image & the prompt used to generate it (including a credit to "your-first-name × DALL·E").
Say you were an artist who went to every art show and museum and studied all the art there.
If you produced a work of art solely from memory that contained large portions of other people's copyrighted art, would that still fall under copyright/require licensing?
We don't know what licensing has happened behind closed doors; my guess is that OpenAI has an agreement with Getty. Take a look at the image licensing in this Observer piece: it's been licensed from Getty, which would indicate that Getty is happy with the scraping.
https://www.theguardian.com/commentisfree/2022/aug/20/ai-art...
Besides, this is not infringement in principle; the AI has simply been trained to think that high-quality news images have watermarks.
If a company reverse engineers a competitor's product, they still buy the product to tear it apart and figure out how it works.
If a student learns from their teacher, then goes on to sell a similar kind of work as what their teacher makes, at least the student paid for the classes.
This arrangement offers none of that. As long as theft is illegal, this should be too. I'd call it parasitic, but that undersells it: this is a parasite whose sole intent is to kill the host.
You'd be surprised...
They probably already have specialized filtering models built to filter out censorable terms. They may be imperfect, but they are there. A watermark remover might be an easy addition.
When Stable Diffusion released their model playground, I used the prompt "Peter at the pearly gates dressed as a security guard" and got three images, two of which were censored and one that was an ordinary image. So the capability is there already; it's just a matter of time before they get good at watermark removal.
There are lots of photos with watermarks circulating on the web, for example in memes and on unfinished webpages (when finished, these will be replaced with the paid variant without the watermark).
BTW, Copilot also ignored all licenses of the source code it memorized.
Datasets are the new capital. If they could, most employees would probably also object to their company using the result of their work to replace their job. But they can't. It's the same with artists here.
Could be great for featured images for blog posts.
You are wasting CO2 even discussing it
The last time I checked was when Copilot went public; they could have trained it only on GPL code. The source license, copyright, and all the rest don't matter.
This makes me think back to the controversy over GitHub Copilot; if these AIs are going to be trained on other people's IP, then somebody needs to be held accountable when they commit plagiarism.
Otherwise, I'm sure Microsoft won't mind my new "game-maker AI" that I trained on that new Halo game last year, or this "OS AI" that I trained on Windows 11.
By the very meaning of artificial intelligence, we must accept that a mind, or an intelligence, is free to perceive external elements and use every stimulus to execute its own creative process.
The world is a perpetual iteration cycle amongst human beings. Good artists borrow, great artists steal.