New York Times lawyers claim OpenAI accidentally deleted evidence in copyright case

Probably not intentional, but '150 person-hours' of work were still lost

The New York Times has filed a letter in its copyright infringement case against OpenAI and Microsoft, alerting the court that the ChatGPT maker accidentally deleted a bunch of data that may have been evidence.

The letter [PDF], filed yesterday in the Southern District of New York by lawyers for the Times, asserts that OpenAI engineers deleted "all of News Plaintiffs' programs and search result data" from one of two virtual machines set up for the purpose of allowing the plaintiffs to scour OpenAI training data for copyrighted material. 

The lawsuit in question was filed in late 2023, alleging that OpenAI and Microsoft used articles from the Times to train ChatGPT and other models and readily displayed the content of articles from the newspaper when asked - all without permission, the Times claimed. 

"OpenAI has provided the News Plaintiffs with two dedicated virtual machines with improved computing resources for performing their searches, and News Plaintiffs have spent an additional 150 person-hours (and even more computing hours) since November 1 searching OpenAI's training data," lawyers Ian Crosby and Steven Lieberman said in the letter. 

"While OpenAI was able to recover much of the data that it erased, the folder structure and file names of the News Plaintiffs' work product have been irretrievably lost," the document continued. "Without the folder structure and original file names, the recovered data is unreliable and cannot be used to determine where the News Plaintiffs' copied articles were used to build Defendants' models."

As a result, the plaintiffs have been forced to redo "an entire week's worth of its experts' and lawyers' work," the letter asserted. There's no assertion that OpenAI deleted the data on purpose, mind you, with the Times' lawyers saying that they "have no reason to believe [it] was intentional." 

Crosby and Lieberman did note in the letter that the incident "underscore[s] that OpenAI is in the best position to search its own datasets for the News Plaintiffs' works using its own tools and equipment," but argued that OpenAI hasn't been receptive to such requests.

"Since the last hearing, the News Plaintiffs have sent OpenAI information for OpenAI to perform two separate searches on the News Plaintiffs' behalf," the lawyers wrote, adding that the requests were sent on November 4 and 13. "To date, the News Plaintiffs have not received results from either those searches, or confirmation that OpenAI has started them."

Because OpenAI hasn't committed to conducting searches "in a timely manner," the Times' lawyers are requesting that the court order OpenAI "to identify and admit which of the News Plaintiffs' works it used," sparing the plaintiffs the burden of digging through the digital stacks themselves.

"We disagree with the characterizations made and will file our response soon," OpenAI told The Register, while declining to elaborate on which portion it disagreed with - the deletion claim or the non-response to query requests. 

A response hasn't been filed as of writing. ®

Updated to add on November 26

OpenAI has filed a letter [PDF] in response to The Times’ claims, denying that it deleted any evidence and blaming the newspaper for the whole fiasco.

“OpenAI did not delete any evidence. What happened? plaintiffs requested a configuration change [that] resulted in removing the folder structure and some file names on one hard drive,” OpenAI’s lawyers explained. “A drive that was supposed to be used as a temporary cache for storing OpenAI data, but evidently was also used by plaintiffs to save some of their search results (apparently without backups).”

OpenAI added that the folks doing the searching on behalf of The Times had numerous issues using the provided machine, “repeatedly running flawed code that overwhelmed and crashed the file system.”

OpenAI also rejected the suggestion that it had refused to collaborate with the Times on searches for copyrighted material, saying it had offered to take over the search efforts “provided plaintiffs supply clear and reasonable proposals,” which it claims it hasn’t received.

“OpenAI stands ready to conduct reasonable searches,” the lawyers said. “Plaintiffs’ focus, however, must shift from manufacturing conflicts to achieving solutions.”

OpenAI has asked the judge to order both parties to “meet and confer on search proposals” to resolve the issue.
