...

Lvxferre@lemmy.ml · 1 year ago

I think that image models are a completely different beast from language models, and I’m simply not informed enough about image models. So take what I’m going to say with a grain of salt.

I think it’s possible that image models do some sort of abstraction that resembles how humans handle images, including modelling a third dimension not present in a 2D picture, or abstractions like foreground vs. background. Whether they actually do, I don’t know.

And unlike with language models, image model hallucinations (e.g. people with six fingers) don’t seem to contradict the idea that the model still recognises individual objects.

Even_Adder@lemmy.dbzer0.com · 1 year ago

This video gives a decent explanation of what might be going on with the hands if you’re interested.

Lvxferre@lemmy.ml · 1 year ago

Thanks - I’ll check it out.