Presumably these companies are scraping public websites to check if their images are being used without attribution.
If someone were to take the image and run it through Stable Diffusion to generate a new image, using that image as a source, should this also require attribution if it was only used as a starting point?
I'm sincerely curious about people's thoughts from both an ethical AND a legal perspective (with all the usual disclaimers).
For example, one perspective is that the generated image may not resemble the original, yet the original was in some sense used to get there, much as an artist might see a copyrighted image and decide to put a creative spin on it.
A further perspective is that Stable Diffusion may have been trained on copyrighted images in the first place, even though it may not exactly reproduce an image in its training corpus.
Ethically, I don't see remixing and derivative works as so different from what humans naturally do.
There are no prerequisites for fair use; i.e., licenses don't apply in such cases.
You can read more about it here: https://en.wikipedia.org/wiki/Fair_use
P.S. I am not a lawyer.
> A further perspective is that Stable Diffusion may have been trained on copyrighted images in the first place, even though it may not exactly reproduce an image in its training corpus.
Given that watermarked copyrighted images sometimes come up in the outputs, I'm sure that is the case. Had it been trained on copyrighted music, the company would have been sued into the ground, and the model would never have been released without permission or attribution.
So for Stability AI, it is fine to break the copyright of digital artists and use their work without their permission, but not fine to do the same to musicians, so instead they generate music from public-domain sources [0]. I'm sure voice cloning requires permission from the person as well; otherwise there would be more legal issues.
The same goes for Copilot training on AGPL code outside of GitHub, including Stack Overflow, etc., but IANAL.
[0] https://techcrunch.com/2022/10/07/ai-music-generator-dance-d...
The interesting question to me is how relevant these input images will be in the future. I've already seen demos where people who can't draw for shit paint some sloppy strokes in MS Paint to hint the AI in the right direction. This is how a kid's drawing transforms into a Hollywood-class rendered scene. If this is the future direction, we might not need "fancy" input images by human artists, and the question becomes less relevant.
The very concept of "human authorship" is going to be challenged. It's not as simple as prompt->image. People are combining AI with post-processing and independently generated layers; it's all going to be a hybrid mess.
As a human artist, I can study Picasso's paintings my whole life and paint in the style of Picasso, but as long as I don't copy an existing work of Picasso then how can what I do be copyright infringement?
Copyright doesn't protect style, afaik, though IANAL.
Incidentally, human artists get copied all the time. Walk outside any major museum and you'll see endless Van Gogh imitators peddling their wares on the sidewalk. Somehow people don't get all hot and bothered about it. But when an AI does it suddenly everyone's up in arms over copyright violation.
How many human artists have copied Picasso's style? Probably thousands. How many have tried to paint like Da Vinci? Probably millions. Where is the outrage?
If it doesn’t actually matter, it doesn’t actually matter.
The easiest way to make it not matter is to avoid the gray area by creating images through established means with clear legal precedent.
Good luck.
* the fact that the AI was trained on a set of images doesn't make its output a derivative work, the same way my painting isn't a derivative work just because a bunch of images I saw online influenced my style
* if an image was used as a direct input (img2img), then the result is a derivative work even if it doesn't closely resemble the source
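How much of the source actually survives img2img depends on a strength setting: the source image is first noised to a chosen depth, and only then denoised toward the prompt, so at high strength almost no pixel information from the source remains. A minimal numpy sketch of that forward-noising step, assuming the standard DDPM linear schedule (function names and schedule values here are illustrative, not Stable Diffusion's actual code):

```python
import numpy as np

def noise_source_image(x0, strength, num_steps=1000, rng=None):
    """Sketch of img2img's starting point: noise the source image x0 to a
    depth set by `strength` (0 = keep source intact, 1 = pure noise),
    using the standard DDPM forward process q(x_t | x_0)."""
    rng = rng or np.random.default_rng(0)
    # Linear beta schedule, as in the original DDPM paper (illustrative values).
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    # Map strength to a timestep: higher strength -> deeper into the noise schedule.
    t = min(int(strength * num_steps), num_steps - 1)
    signal = np.sqrt(alpha_bar[t])            # fraction of source signal kept
    noise_scale = np.sqrt(1.0 - alpha_bar[t])
    x_t = signal * x0 + noise_scale * rng.standard_normal(x0.shape)
    return x_t, signal

x0 = np.ones((4, 4))                      # stand-in for a source image
_, s_low = noise_source_image(x0, 0.2)    # most of the source survives
_, s_high = noise_source_image(x0, 0.9)   # almost pure noise
print(f"signal kept at strength 0.2: {s_low:.3f}")
print(f"signal kept at strength 0.9: {s_high:.3f}")
```

Under this schedule, strength 0.2 keeps most of the source's signal while strength 0.9 leaves only a few percent, which is why the "derivative work" question arguably bites harder at low strength than at high.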
BTW I don't think the same reasoning holds for all AI output; for example, if Copilot basically "copy-pastes" an exact snippet of code, that's derivative of the original (unless the snippet is trivial or the only way to achieve the result, i.e., non-copyrightable).
As for your question, just replace the machine learning algorithm with a human in the loop and ask yourself the same question, and you have your answer according to our existing norms.