• Riskable@programming.dev · 10 days ago

    I can’t take anyone seriously that says it’s “trained on stolen images.”

    Stolen, you say? Well, I guess we’re going to have to force those AI companies to put those images back! Otherwise, nobody will be able to see them!

    …because that’s what “stolen” means. And no, I’m not being pedantic. It’s a really fucking important distinction.

    The correct term is “copied,” but that doesn’t sound quite as severe. Also, if we want to get really specific, the images are presently on the Internet. Right now. Because that’s what ImageNet (and similar datasets) are: databases of URLs pointing to images that people are offering up for free to anyone on the Internet who wants them.

    Did you ever upload an image anywhere publicly, for anyone to see? Chances are someone could’ve annotated it and included it in an AI training database. If it’s on the Internet, it will be copied and used without your consent or knowledge. That’s the lesson we learned back in the 90s, and if you think that’s not OK, go try to get hired by the MPAA/RIAA and you can try to bring the world back to the days when you had to pay $10 for a ringtone, and pay again if you got a new phone (because, to the big media companies, copying is stealing!).

    Now that that’s clear, let’s talk about the ethics of training an AI on such data: there are none to discuss. It’s an N/A situation! Why? Because until the AI models are actually used for any given purpose, they’re just data on a computer somewhere.

    What about legally? Judges in multiple countries have already ruled that training AI in this way is fair use. There’s no copyright violation going on, because copyright only covers the distribution of copyrighted works, not what you actually do with them internally (like training an AI model).

    So let’s talk about the real problems with AI generators so people can take you seriously:

    • Humans using AI models to generate fake nudes of people without their consent.
    • Humans using AI models to copy works that are still under copyright.
    • Humans using AI models to generate shit-quality stuff for the most minimal effort possible, saying it’s good enough, then not hiring an artist to do the same thing.

    The first one seems impossible to solve (to me). If someone generates a fake nude and never distributes it… Do we really care? It’s like a tree falling in the forest with no one around. If they (or someone else) distribute it though, that’s a form of abuse. The act of generating the image was a decision made by a human—not AI. The AI model is just doing what it was told to do.

    The second is—again—something a human has to willingly do. If you try hard enough, you can make an AI image model get pretty close to a copyrighted image… But it’s not something that is likely to occur by accident. Meaning, the human writing the prompt is the one actively seeking to violate someone’s copyright. Then again, it’s not really a copyright violation unless they distribute the image.

    The third one seems likely to solve itself over time, as more and more idiots are exposed for making the very poor decision to just “throw it at the AI” and then publish the result without checking/fixing it. Like Coca-Cola’s idiotic mistake last Christmas.

      • 90s_hacker@reddthat.com · 10 days ago

        I do agree with them that stealing may not be the right word; isn’t plagiarism more accurate? But plagiarism is generally considered theft, so it probably doesn’t matter. I just find it really interesting that I personally haven’t given much thought to the semantics of theft when no physical object is involved, even though it’s been discussed for centuries at this point.

        • Riskable@programming.dev · 10 days ago

          Plagiarism isn’t correct either. For something to be plagiarism, it needs to be both copied exactly and passed off under a false claim of authorship (i.e., you claim you wrote something you didn’t).

          The output of large language models is similar in some ways to plagiarism, as when someone claims they wrote something that was actually just the output of an LLM. However, that really isn’t the same thing, because an LLM isn’t a legal entity that’s capable of owning anything.

          LLMs are also just a tool: an advanced tool that can generate all sorts of text and software, but one that still requires a human to tell it what to do.

          If some human asks ChatGPT to write something in the style of Stephen King, what even is that? It’s not against the law (you can’t copyright a writing style). It’s basically “not a thing.” Is it even a bad thing? I honestly don’t think so, because if I put myself in those same shoes (“write a comment in the style of Riskable”), all I can do is 🤷. It’s of no consequence.

          I’d also argue that it’s of no consequence to authors either. What impact does it have on them? None. It doesn’t affect their book/whatever sales. It doesn’t hurt the market for their works; if anything, it makes the market for their works greater, because their works won’t be total shit like the output of some LLM (LOL).

      • Riskable@programming.dev · 9 days ago

        I find it telling that the best rebuttal anyone can come up with to my comment is to say it’s a “shit take.”

        I mean, wow.