• Communist@beehaw.org
    5 upvotes · 1 year ago

    It’s not. This method of analysis is terrible: they’re just asking GPT-4 to grade the responses, not actually testing anything beyond that.

    • aponigricon@beehaw.org
      2 upvotes · 1 year ago

      That doesn’t necessarily invalidate the point they’re making. Other forms of analysis, strikingly, give pretty much equivalent results.