Most AI-generated images with photorealistic and 3D elements have obvious defects, but I’m curious whether anyone has analyzed flat, cartoon-style AI images. Cartoons, comics, and 2D artwork usually aren’t meant to be photorealistic, yet I can tell something is off at a glance. What exactly is it?

  • SuluBeddu@feddit.it
    8 days ago

    I think generators have some kind of inherent style that we somehow learn to recognise.

    Sure, they have been trained on thousands of styles for each type of image, and you have some control over the style through the prompt, but one issue with the transformer decoder model (the principles of which back almost all genAI at this point) is that at each generation step it receives everything generated so far as input.

    This feedback loop might induce repeated choices in the later stages of generation, even across different prompts. This is not apparent in images because they are viewed all at once, but it is pretty evident in Suno (at least v3): later parts of different songs may share sounds, at least in my experiments generating EDM. I can now spot the synth it often ends up creating.
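
    To make the feedback-loop idea concrete, here is a toy sketch. It is not how Suno or any transformer actually works: it swaps in a first-order Markov chain, where each step conditions only on the previous token instead of the whole context, and the token names and transition probabilities are made up for illustration. It shows that when every step feeds on earlier output, two different starting prompts drift toward the same late-stage distribution, measured here by total variation distance.

```python
# Toy autoregressive "generator" as a first-order Markov chain (an assumption:
# real transformer decoders condition on the full context, not just one token).
# TOKENS and TRANSITIONS are hypothetical numbers chosen for illustration.

TOKENS = ["bright_pad", "saw_lead", "supersaw", "pluck"]

# TRANSITIONS[i][j] = probability of emitting token j right after token i.
# Each row sums to 1.
TRANSITIONS = [
    [0.1, 0.2, 0.6, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.3, 0.1, 0.5, 0.1],
]


def step(dist):
    """Push a probability distribution over tokens through one generation step."""
    n = len(dist)
    return [sum(dist[i] * TRANSITIONS[i][j] for i in range(n)) for j in range(n)]


def distribution_at(prompt_token, steps):
    """Distribution over the token emitted `steps` steps after the prompt token."""
    dist = [1.0 if t == prompt_token else 0.0 for t in TOKENS]
    for _ in range(steps):
        dist = step(dist)  # feedback: the output of one step is the next step's input
    return dist


def total_variation(p, q):
    """Total variation distance between two distributions (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    for steps in (1, 2, 5, 10):
        p = distribution_at("bright_pad", steps)
        q = distribution_at("pluck", steps)
        print(f"step {steps:2d}: distance between prompts = {total_variation(p, q):.6f}")
```

    With these numbers, every row gives `supersaw` a probability of at least 0.5, so after the first step it is the most likely token whichever prompt you start from, and the distance between the two prompts' distributions keeps shrinking toward zero.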

    For pictures and videos, that might be one reason generated content is consistently uncanny across image types.

    • DanVctr@sh.itjust.works
      8 days ago

      I 2nd this, especially with Suno. As soon as a generated song comes on my Spotify, I recognize the specific synths used by the Suno model.