Suing Writers Seethe at OpenAI's Excuses in Court

floofloof@lemmy.ca · 2 years ago

mindbleach@sh.itjust.works · 2 years ago

I don’t care what works a neural network gets trained on. How else are we supposed to make one?

Should I care more about modern eternal copyright bullshit? I’d feel more nuance if everything a few decades old was public-domain, like it’s fucking supposed to be. Then there’d be plenty of slightly-outdated content to shovel into these statistical analysis engines. But there’s not. So fuck it: show the model absolutely everything, and the impact of each work becomes vanishingly small.

Models don’t get bigger as you add more stuff. Training only twiddles the numbers in each layer. There are two-gigabyte networks that have been trained on hundreds of millions of images. If you tried to store those images verbatim, they would each weigh barely a dozen bytes. And the network gets better as that number goes down.
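The "dozen bytes" figure is simple division: model size over number of training images. Below is a minimal back-of-envelope sketch of that arithmetic. The inputs are assumptions for illustration, not figures from the thread: "two gigabytes" is taken as 2 × 10^9 bytes, and the image counts are sample values standing in for "hundreds of millions". The helper name `bytes_per_image` is hypothetical.

```python
# Back-of-envelope: how many bytes of model weights exist per training image?
# ASSUMPTIONS (illustrative, not figures from the thread): "two gigabytes" is
# taken as 2 * 10**9 bytes, and the image counts below are sample values
# standing in for "hundreds of millions".
MODEL_SIZE_BYTES = 2 * 10**9


def bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Average model capacity available per training image, in bytes (hypothetical helper)."""
    return model_bytes / num_images


# More training images spread the same fixed-size model even thinner.
for num_images in (160_000_000, 500_000_000, 900_000_000):
    per_image = bytes_per_image(MODEL_SIZE_BYTES, num_images)
    print(f"{num_images:>13,} images -> {per_image:5.2f} bytes per image")
```

Under these assumptions, "barely a dozen bytes" corresponds to roughly 160 million images (12.5 bytes each), and the per-image share keeps shrinking as the training set grows while the model stays the same size.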

The entire point is to force the distillation of high-level concepts from raw data. We’ve tried doing it the smart way and we suck at it. “AI winter” and “good old-fashioned AI” were half a century of fumbling toward the acceptance that we don’t understand how intelligence works. This brute-force approach isn’t chosen for cost or ease or simplicity. This is the only approach that works.

anachronist@midwest.social · 2 years ago

Models don’t get bigger as you add more stuff.

They will get less coherent and/or “forget” the earlier data if you don’t increase the parameters with the training set.

There are two-gigabyte networks that have been trained on hundreds of millions of images

You can take a huge TIFF of an image, put it through JPEG with the quality cranked all the way down, and get a tiny file out the other side, which is still a recognizable derivative of the original. LLMs are extremely lossy compression of their training set.

mindbleach@sh.itjust.works · 2 years ago

which is still a recognizable derivative of the original

Not in twelve bytes.

Deep models are a statistical distillation of a metric shitload of data. Smaller models with more training on more data don’t get worse; they get more abstract, and in adversarial uses they often kick big networks’ asses.

DeathsEmbrace@lemmy.ml · 2 years ago

Which is why we shouldn’t be using something we don’t understand and can’t use properly.

mindbleach@sh.itjust.works · 2 years ago

Right, copyright law.

DeathsEmbrace@lemmy.ml · 2 years ago

No, this will benefit capitalism and the wealthiest people the most. The rest of us will suffer because of this. People can only think of the positives of AI and never the negatives; this is weed all over again.

mindbleach@sh.itjust.works · 2 years ago

Motivation to discuss anything with you goes flying out the window if you think ending marijuana prohibition is anything but positive for the common people. And you’re going to drop that turd in a completely unrelated punchbowl.

DeathsEmbrace@lemmy.ml · 2 years ago

Marijuana is always characterized by its positives, and people forget the negatives in every conversation. This is the exact same shit. Weed shouldn’t even be illegal, but those dumb racist white men in the 60s-80s with their paranoia decided to outlaw it. Fuck, the very doctors and psychologists who “analyzed” it said everything was bullshit, so they had a professional too, you dumbass. I’m not getting into racist history with you, but take my first sentence as the argument.

mindbleach@sh.itjust.works · 2 years ago

Talk less.

DeathsEmbrace@lemmy.ml · 2 years ago

I should. The average human is stupid.