Copyright Suit Against Meta Platforms Regarding AI Training Techniques
Prominent writers, including Michael Chabon and Sarah Silverman, are suing Meta Platforms, the parent company of Facebook and Instagram, for copyright infringement. The central allegation is that Meta used thousands of copyrighted books without authorization to train Llama, its artificial intelligence language model.
The company apparently moved forward with the controversial dataset even after Meta's legal team issued strong warnings about the potential legal ramifications of using pirated books for AI training. The situation grew more complicated when chat logs emerged showing Meta researcher Tim Dettmers discussing the dataset's procurement in a Discord server.
The chat logs show that Dettmers corresponded with Meta's legal department about whether book files could permissibly be used as training data. Citing concerns about "books with active copyrights," the legal team recommended against using the material. Participants in the chat also discussed whether training on such data could be justified under the fair use doctrine, a U.S. legal principle that shields certain unauthorized uses of copyrighted works.
The case, first filed over the summer, recently consolidated two separate lawsuits against Meta. Last month, a California judge dismissed a portion of the Silverman lawsuit, prompting the authors to revise their claims and signaling a shifting legal landscape.
Beyond Meta, the dispute could reverberate across the wider AI sector. If these lawsuits succeed, training AI models on large datasets could become more expensive, exposing companies to greater scrutiny and to payment demands from content creators. New European regulations may also force AI companies such as Meta to disclose the data used to train their models, putting them at further legal risk.
