• Lvxferre@lemmy.ml
      1 year ago

      I think that image models are a completely different beast from language models, and I’m simply not informed enough about image models. So take what I’m going to say with a grain of salt.

      I think it’s possible that image models do some sort of abstraction that resembles how humans handle images, including modelling a third dimension not present in a 2D picture, or abstractions like foreground vs. background. Whether they actually do this, I don’t know.

      And unlike with language models, image model hallucinations (e.g. people with six fingers) don’t seem to contradict the idea that the model recognises individual objects.