Amazon flouted copyright laws to win AI arms race - former AI lead

It appears that the Amazon team working on improving Alexa's search results and AI functionality has been illegally using copyrighted data for training purposes. (Image source: Amazon - edited)

Dr Viviane Ghaderi, an ex-Amazon executive and lead AI engineer, has revealed in a California lawsuit that she was told to ignore company protocol and copyright law in order to beat the competition in an AI arms race.

Julian van der Merwe (translated by Ninh Duy), Published 04/26/2024

Amazon isn't the first or the last tech giant to hop onto the AI train, but it looks like it might be in some hot water after an ex-executive accused the company of flouting copyright law and internal guidance in a bid to speed up the development of an AI it planned to use for Alexa.

In a recent California lawsuit (via The Register) [PDF] filed by Dr Viviane Ghaderi against Amazon over a number of unrelated complaints relating to alleged workplace discrimination, retaliation, harassment, and wrongful termination, the AI researcher and former executive alleges that Amazon targeted and punished her for speaking out against the company breaking its own rules about using copyrighted training data in its AI models.

Upon her return from leave, Ms. Ghaderi conveyed to the Legal Department that her leaders had directed her to violate internal copyright policies and applicable law. Her concern was later proven reasonable when the Times sued OpenAI, Inc., Microsoft Corporation, and OpenAI’s affiliate corporations for copyright infringement.

In March 2023, Styskin met with Ms. Ghaderi to understand why Defendants were not meeting goals on a project relating to search quality on the Alexa team. Ms. Ghaderi outlined the challenges she had faced because of Amazon’s internal copyright-related policies—which she had fully complied with—and that she had met with a representative from the Legal Department to explain her concerns and the tension they posed with the direction she had received from upper management, which advised her to violate the direction from Legal. Styskin rejected Ms. Ghaderi’s concerns about Amazon’s internal policies and instructed her to ignore those policies in pursuit of better results because “everyone else”—i.e., other AI companies—“is doing it.”

One of the biggest criticisms of modern AI systems, like ChatGPT and DALL-E, is the use and alleged abuse of copyrighted material to train image generators and text transformers. The information and allegations divulged in these court documents, if true, would confirm that Amazon, and likely other AI companies, have been knowingly disobeying copyright laws in order to train their models and get ahead of the competition.

Currently, there is a growing list of lawsuits against AI developers over data scraping. The New York Times seemingly has a pretty convincing court case open against both Microsoft and OpenAI, while Getty last year took Stability AI to court over use of its stock images. Google, on the other hand, reportedly managed to back itself into a corner after it scraped YouTube for training data.