• smoothbrain coldtakes@lemmy.ca
    10 months ago

    It’s also only valuable if people keep contributing to it. It’s highly likely that most existing Reddit data had already been incorporated into many LLMs before the API access limits came in. The 60 million dollars Google is paying them is a hilarious pittance for continued access to training data, given how much money AI services will likely generate off of it.

    I don’t actively use Reddit anymore, but when I need an answer to something that isn’t programming-related, it’s usually the top source on any given web search. That kind of content is basically the only stuff I would give a shit about. I can’t imagine how much absolute garbage you have to sift through on the platform to get reliable training data. Maybe the ratio is terrible and that’s why Google paid so little.