• A_A@lemmy.world
    2 days ago

    It’s not word completion; it’s so far from it:

    (…) He told the BBC of his shock when he found what it had done, given his research was not published so could not have been found by the AI system in the public domain. (…)

    (…) “It’s not just that the top hypothesis they provide was the right one,” he said. “It’s that they provide another four, and all of them made sense. And for one of them, we never thought about it, and we’re now working on that.” (…)

    • DarkCloud@lemmy.world
      2 days ago

      Assuming that OpenAI and the others only use data from the public domain is stupid (and contrary to most news reporting on the matter). He has literally no idea what the AI was trained on (not even the developers know, because there’s far too much of it for humans to review). They’ve undoubtedly bought vast amounts of data that isn’t readily searchable by public search engines.

      He sounds very ill-informed about data collection, and he probably just had his info/data on a cloud service somewhere whose text ended up among the trillions of tokens LLMs have accessed and trained on.