OpenAI Legal Issues: Data Retention After Chat Deletion

When The New York Times (NYT) and a handful of other publishers filed a sweeping copyright infringement complaint against OpenAI, they also fired off a demand that OpenAI freeze every consumer chat log and API request forever, even those that users have deleted.

This review was prepared by Dmitry Baraishuk, a partner and Chief Innovation Officer (CINO) at Belitsoft, an AI software development company. Nowadays, businesses need custom-designed AI solutions that intelligently adapt to their data, boost sales conversions with AI recommendations, process complex data quickly for valuable insights into customers and employees, and improve the accuracy of business decisions through machine learning models.

OpenAI’s public response, delivered through a series of blog posts, court filings, and Q&As, tries to walk a tightrope. It needs to reassure hundreds of millions of users that privacy remains “at the core” of its products while also persuading judges that the NYT’s data preservation request is a legal overreach. Beneath the legalese is a high-stakes clash between modern privacy norms, decades-old discovery rules, and the emerging question of whether training a large language model on copyrighted text is fair use or digital trespass.

Privacy as the opening move

OpenAI reminds everyone that for years it has offered simple toggles for opting out of model training and straightforward ways to delete chats. Internally, engineers designed the systems so that once a conversation is deleted, the corresponding rows in the database are marked for permanent erasure within 30 days. For businesses, OpenAI markets more stringent options: Enterprise and Edu workspaces where administrators set custom retention windows, and special “Zero Data Retention” (ZDR) API endpoints that log nothing at all. The point of rehearsing these facts is not only PR – it lays the foundation for a legal argument that the NYT’s demand collides with long-standing, well-publicized privacy promises. OpenAI frames that collision in moral as well as technical terms. Abandoning deletion is not a neutral compliance step. It is a breach of trust that weakens every user’s privacy shield.
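
The deletion behavior described here follows the familiar soft-delete pattern: a deletion request marks the row, and a later purge job permanently erases anything past the retention window. The sketch below is a generic illustration of that pattern, assuming a toy SQLite schema and the 30-day window mentioned above; it is not OpenAI's actual implementation.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # assumed window before permanent erasure

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE chats (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        deleted_at TEXT  -- NULL means the chat is still live
    )
""")

def delete_chat(chat_id: int) -> None:
    """User-initiated deletion: mark the row rather than dropping it immediately."""
    db.execute(
        "UPDATE chats SET deleted_at = ? WHERE id = ?",
        (datetime.now(timezone.utc).isoformat(), chat_id),
    )

def purge_expired() -> int:
    """Scheduled job: permanently erase rows marked more than 30 days ago."""
    cutoff = (datetime.now(timezone.utc) - RETENTION).isoformat()
    cur = db.execute(
        "DELETE FROM chats WHERE deleted_at IS NOT NULL AND deleted_at < ?",
        (cutoff,),
    )
    return cur.rowcount

# A litigation hold like the one discussed in this article effectively
# suspends the purge step for the affected tiers.
```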

The preservation order lands

Into that curated privacy story drops a court-ordered legal hold. The plaintiffs insist that, because ChatGPT occasionally spits out something that looks suspiciously like a paragraph from a Times article, OpenAI must preserve every prompt and every completion so that discovery can later sift for infringing snippets. OpenAI’s counsel brands the request “vastly overbroad,” “unnecessary,” and “an overreach,” stressing that the plaintiffs have offered no concrete evidence that relevant logs are in danger of disappearing. Nonetheless, in U.S. civil litigation, discovery is famously expansive, and magistrate judges often issue broad preservation orders to forestall spoliation fights down the road. So, on paper at least, the hold took effect.

OpenAI reacted on multiple fronts. First, it segregated the affected data – all consumer ChatGPT chats, plus ordinary (non-ZDR) API traffic – into a dedicated, access-controlled repository. Only a “small, audited legal and security team” can touch it, and then only to satisfy the court. Second, the company filed motions seeking reconsideration and, after partial relief, appealed to a district judge. A key early victory: on May 27 the court clarified that ChatGPT Enterprise traffic is explicitly exempt – a carve-out that offers at least one privacy-first tier for corporate customers. Even so, for all other consumer services, deletion is suspended “going forward,” with no sunset clause.

Who is, and is not, affected

The hold cleanly divides OpenAI’s user base. Impacted are:

  • ChatGPT Free, Plus, and Pro individual accounts.
  • ChatGPT Team workspaces (unless they upgrade to Enterprise).
  • Every developer who uses the standard API endpoints without a ZDR addendum.

Not impacted are:

  • ChatGPT Enterprise tenants.
  • The new ChatGPT Edu tier aimed at universities.
  • API customers approved for ZDR traffic.

The distinction matters, because privacy-sensitive organizations – from hospitals to banks – can still promise their regulators that chats can be deleted. Everyone else must take OpenAI at its word that the legal-hold bucket is sealed off from analytics teams, model training pipelines, and even casual internal snooping.
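
For developers auditing their own exposure, the split above reduces to a simple lookup. The tier labels and the mapping in this sketch are drawn from this article alone; they are not fields or values exposed by any OpenAI API.

```python
# Which tiers the preservation hold reaches, per the lists above (assumed labels).
HOLD_APPLIES = {
    "chatgpt_free": True,
    "chatgpt_plus": True,
    "chatgpt_pro": True,
    "chatgpt_team": True,         # unless the workspace upgrades to Enterprise
    "api_standard": True,         # standard endpoints without a ZDR addendum
    "chatgpt_enterprise": False,
    "chatgpt_edu": False,
    "api_zdr": False,             # traffic approved for Zero Data Retention
}

def subject_to_hold(tier: str) -> bool:
    """Return True if chats or API requests on this tier are retained under the hold."""
    return HOLD_APPLIES[tier]

assert subject_to_hold("chatgpt_team") is True
assert subject_to_hold("api_zdr") is False
```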

Alternatives

Outside the formal filings, engineers and legal tech pundits brainstorm alternatives. Why not, they ask, store fuzzy hashes (ssdeep) or apply content-defined chunking? A hash preserves enough signal that, if a generated answer later seems suspiciously Times-like, lawyers could hash that answer too and check for a match – without ever having kept the raw user text. OpenAI hints that it floated “workable filtering schemes,” though critics retort that its proposals were thin, tardy, or opaque. One meta-theme emerges: technical literacy on the bench. Some judges (they cite William Alsup in Oracle v. Google) happily dissect API calls and write BASIC. Others glaze over when counsel mentions n-grams. Whose responsibility is it to educate the court on content-aware hashing? Opinion splits: either OpenAI failed to translate technical detail into legal English, or the plaintiffs insisted on full logs precisely because they feared clever shortcuts would thwart them.
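
To make the hashing idea concrete, here is a minimal Python sketch of similarity matching via hashed word shingles and Jaccard overlap. It is a toy stand-in for what ssdeep or content-defined chunking would do in practice (ssdeep uses context-triggered piecewise hashing); the function names and the five-word window are illustrative assumptions, not anything proposed in the case.

```python
# Reduce text to a set of hashed word n-grams ("shingles") and compare the
# sets later. Only the digests of an article would need to be retained; a
# suspect completion is hashed the same way and checked for overlap, without
# ever storing the raw user text.
import hashlib

def shingle_digests(text: str, n: int = 5) -> set[str]:
    """Hash every n-word window of the text; keep only the digests."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode()).hexdigest()[:16]
        for i in range(max(len(words) - n + 1, 1))
    }

def similarity(digests_a: set[str], digests_b: set[str]) -> float:
    """Jaccard overlap of the two digest sets (1.0 = near-identical text)."""
    if not digests_a or not digests_b:
        return 0.0
    return len(digests_a & digests_b) / len(digests_a | digests_b)

article = shingle_digests("the quick brown fox jumps over the lazy dog today")
output = shingle_digests("the quick brown fox jumps over the lazy dog again")
print(round(similarity(article, output), 2))  # high overlap flags a near-copy
```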

The compliance dilemma: law versus policy

Even if OpenAI’s privacy policy proclaims 30-day deletion as a sacred norm, a court order outranks corporate policy. The company concedes as much: it “must comply because it is the law.” At the same time, it flags a latent conflict with GDPR principles. European privacy law allows litigation holds, yet regulators expect data minimization and purpose limitation. An EU watchdog could well conclude that hoarding billions of chats – most totally irrelevant to a U.S. copyright suit – fails the proportionality test. For now, OpenAI sidesteps the clash by saying the order “conflicts with our privacy standards,” without flatly calling it illegal under GDPR. The implication is plain: if Brussels and New York issue contradictory commands, fines on at least one shore are inevitable.

Court-mandated retention normalizes surveillance

Privacy advocates shrug that a legal hold is not a confidentiality guarantee. Data that exists is data that leaks. A single treasure chest containing the musings of senators, soldiers, therapists, and teenagers is catnip for hackers and governments alike. The first rule of information security is that information you never store cannot be exfiltrated. OpenAI counters that the vault is isolated, access-logged, and read-only, but critics recall countless “secure” archives that wandered onto public Internet buckets or were subpoenaed in unrelated probes. Some pessimists go further: they posit that every prompt already lands in an AWS S3 bucket mirrored to government tape libraries, because the Snowden era taught them to assume bulk collection. Whether that is paranoia or cynicism, it underscores a broader risk: court-mandated retention normalizes surveillance, because once data exists, someone eventually demands it.

The copyright battlefield

OpenAI calls the NYT suit “baseless” and “meritless” in classic defendant boilerplate. Detractors say that is corporate spin: demonstration videos show ChatGPT reciting Times passages verbatim when coaxed. Neutral observers note that U.S. copyright law has never squarely answered whether non-expressive ingestion of a text corpus to train a statistical model is fair use. Whatever the outcome, the case is a precedent setter: if wholesale scraping is ruled infringement, every LLM trained on the open web faces existential risk.  

What happens next?

OpenAI’s immediate strategy is procedural. It will press its appeal to the district judge, carve out more exceptions, or persuade the court to adopt a hash-only compromise. If it wins, normal 30-day deletion resumes, and the privacy narrative holds. If it loses, the hold could last through trial and post-trial motions – a span measured in years.

For businesses considering ChatGPT integrations, the safest harbor remains Enterprise with ZDR. Apple reportedly insisted on ZDR behavior for the upcoming iOS integration, a deal that signals how powerful customers can impose their own retention rules. At the other extreme, ordinary users may adopt guerrilla tactics: flood ChatGPT with cookie-recipe queries or append random public domain paragraphs, hoping to dilute the intelligence value of any future subpoena.

Conclusion

OpenAI’s public stance distills to a single declarative sentence: we will obey the law, fight the overreach, and shield user privacy as best we can. For end users, convenience often trumps abstract privacy worries until a headline reminds them their secrets might live forever on a discovery server.

As the appeal winds on, one outcome is certain. Any eventual settlement or ruling will echo far beyond these litigants, shaping the default retention policies of AI platforms, the comfort level of regulators, and the risk calculus of enterprises deciding whether to build atop a cloud model or to keep their words at home. 

About the Author

Dmitry Baraishuk is a partner and Chief Innovation Officer at Belitsoft (a Noventiq company), a software development company. He has led a department specializing in custom software development for 20 years. The department has delivered hundreds of successful projects in AI software development, healthcare and finance IT consulting, application modernization, cloud migration, data analytics implementation, and more for startups and enterprises in the US, UK, and Canada.

