CoinRSS: Bitcoin, Ethereum, Crypto News and Price Data

Zuckerberg Knowingly Used Pirated Data to Train Meta AI, Authors Allege

CoinRSS
Last updated: January 12, 2025 2:23 pm
CoinRSS Published January 12, 2025

Mark Zuckerberg approved using pirated books to train Meta AI, even after his own team warned the material was illegally obtained, a group of authors allege in a recent court filing.

The allegations come from a copyright infringement lawsuit filed by a group of authors including the comedian Sarah Silverman, Christopher Golden, and Richard Kadrey in a California federal court in July 2023. The group claimed Meta misused their books to train its Llama LLM, and they’re asking for damages and an injunction to stop Meta from using their works. The judge in the case dismissed most of the authors’ claims in November of that same year, but these recent allegations may breathe new life into the legal dispute.

“Meta’s CEO, Mark Zuckerberg, approved Meta’s use of the LibGen dataset notwithstanding concerns within Meta’s AI executive team (and others at Meta) that LibGen is ‘a dataset we know to be pirated,’” lawyers for the plaintiffs said in a Wednesday filing. Despite these red flags, the lawsuit alleges that, “after escalation,” Zuckerberg gave the green light for Meta’s AI team to proceed with using the controversial dataset.

Representatives for Meta did not immediately respond to Decrypt’s request for comment.

LibGen, short for Library Genesis, is an online platform that provides free access to books, academic papers, articles, and other written publications without properly abiding by copyright laws. It operates as a “shadow library,” offering these materials without authorization from publishers or copyright holders. It currently hosts over 33 million books and over 85 million articles.

The lawsuit alleges Meta tried to keep this under wraps until the last possible moment. Just two hours before the fact discovery deadline on December 13, 2024, the company dumped what plaintiffs describe as “some of the most incriminating internal documents it has produced to date.”

Meta’s own engineers seemed uncomfortable with the plan, according to statements in court filings. The group of authors allege internal messages show Meta engineers hesitated to download the pirated material, with one noting that “torrenting from a [Meta-owned] corporate laptop doesn’t feel right (smile emoji).” Nevertheless, they proceeded to not only download the books but also systematically strip out copyright information to prepare them for AI training, the lawsuit claims.

The latest filings in the lawsuit paint a picture of a company fully aware of the risks: One internal memo warned that “media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, may undermine our negotiating position with regulators.” Yet Meta went ahead anyway, both downloading and distributing (or “seeding”) the pirated content through torrenting networks by January 2024, according to the lawsuit.

When questioned about these activities in a deposition, Zuckerberg appeared to distance himself from the decision, testifying that such piracy would raise “lots of red flags” and “seems like a bad thing.”

The court documents also suggest that Meta’s handling of copyrighted material prioritized model training over copyright rules. According to the filing, one engineer “filtered […] copyright lines and other data out of LibGen to prepare a CMI-stripped version of it to train Llama.” This systematic removal of copyright information could strengthen the authors’ claims that Meta knowingly tried to hide its use of pirated materials.

The revelations come at a crucial time for Meta’s AI ambitions. The company has been pushing hard to compete with OpenAI and Google in the AI space, with Llama 3.2 being the most popular open source LLM, and Meta AI being a solid free competitor to ChatGPT with similar features.

Most of these AI companies are facing legal battles over how they train their large language models. Meta was already sued by another group of authors for copyright infringement, OpenAI is currently facing several lawsuits for training its LLMs on copyrighted material, and Anthropic is also facing accusations from authors and songwriters.

In general, tech entrepreneurs and creators have been up in arms ever since generative AI exploded in popularity. There are currently dozens of lawsuits against AI companies for willingly using copyrighted material to train their models. But as with most things on the bleeding edge, we’ll have to wait and see what the courts have to say about it all.
