• Lvxferre@lemmy.ml
    1 year ago

    I think that image models are a completely different beast from language models, and I’m simply not informed enough about image models. So take what I’m going to say with a grain of salt.

    I think it’s possible that image models do some sort of abstraction that resembles how humans handle images, including modelling a third dimension not present in a 2D picture, or abstractions like foreground vs. background. Whether they actually do, I don’t know.

    And unlike with language models, the hallucinations of image models (e.g. people with six fingers) don’t seem to contradict the idea that the model still recognises individual objects.