AI & Your Data: Who Logs, Trains On, and Can Subpoena Your Chats

Does ChatGPT train on your chats? Storage vs training, the deletion myth, trackers, subpoenas, who trains by default, and the most private way to use AI.

If you’ve ever typed something into ChatGPT you wouldn’t say out loud — a medical worry, a legal question, a relationship problem, a half-formed idea you’re ashamed of — there’s a fair question lurking underneath: where does that go, and who can read it later?

The honest answer is more complicated than a single privacy toggle. “Does ChatGPT train on your chats?” is really three separate questions wearing one coat: Is it stored? Is it used to train models? Can someone else — a lawyer, a court, an advertiser — get to it? Those have different answers, and the marketing copy blurs them on purpose. This page pulls them apart, names which providers do what by default, and gives you a concrete plan based on how private you actually need to be.

What “opt out” actually does (and doesn’t)

The single most important distinction in AI privacy is storage vs. training. They are not the same lever.

Training = your conversations get fed into the next version of the model. The “improve the model for everyone” toggle controls this.
Storage = your conversations sit on the provider’s servers regardless. This is almost always on, because it powers chat history, abuse monitoring, and legal compliance.

When you flip the “don’t use my data for training” switch, you’ve addressed the training question. You have done nothing about the storage question. Your messages are still on a server, still associated with your account, still retained, and still reachable by the provider, their staff under policy, and anyone with a legal instrument to compel them.

This is the trap. People toggle one setting, feel safe, and keep typing secrets into a system that’s still logging every word. Opt-out is a real improvement over the default, but it’s the floor, not the ceiling. Logging stays on after you opt out for structural reasons, not by accident; the related guide on why cloud AI logs and censors you unpacks the business model behind it.

The deletion myth: backups, legal holds, and the 30-day window

“I deleted the conversation, so it’s gone.” Usually false, and the reasons are baked into how production systems work.

The 30-day window. Most major providers describe a soft-delete model: you delete a chat, it disappears from your view, and the underlying data is purged from active systems within roughly 30 days. That’s a sensible policy — but read it carefully. Up to 30 days means it isn’t gone the moment you click delete.

Backups. Production databases are backed up. Backups have their own retention schedules, and “delete my account” rarely reaches into every encrypted backup snapshot instantly. The data ages out; it doesn’t vanish on command.

Legal holds. This is the big one. If a provider is under a court order to preserve data (a “litigation hold”), that order overrides their normal deletion schedule. In that situation, your deleted chats may be retained indefinitely, regardless of what your settings say, until the legal matter resolves. You typically won’t be notified individually. This isn’t hypothetical: preservation orders are standard practice across the tech industry when litigation touches user data, and AI providers are no exception.

The takeaway: deletion is a request to the operator, not a physical guarantee. The only chat that’s truly unrecoverable is one that was never stored on someone else’s machine.
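To make the timeline concrete, here is a minimal sketch of how a soft-delete window, a backup schedule, and a legal hold interact. The 30-day figure comes from the policies described above; the 90-day backup retention, the StoredChat class, and the last_copy_expires function are hypothetical, chosen only for illustration. Real schedules vary by provider.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed numbers for illustration only; real schedules vary by provider.
SOFT_DELETE_WINDOW = timedelta(days=30)   # purge from active systems
BACKUP_RETENTION = timedelta(days=90)     # hypothetical backup schedule


@dataclass
class StoredChat:
    deleted_at: Optional[datetime] = None  # when the user clicked delete
    legal_hold: bool = False               # set by the operator, not the user


def last_copy_expires(chat: StoredChat) -> Optional[datetime]:
    """Worst-case time at which the last copy ages out, or None if no purge is scheduled."""
    if chat.deleted_at is None:
        return None  # never deleted, so nothing is scheduled
    if chat.legal_hold:
        return None  # a hold suspends the normal deletion schedule
    # A backup taken just before the active purge can outlive it by the
    # full backup retention period, so the two windows add up.
    return chat.deleted_at + SOFT_DELETE_WINDOW + BACKUP_RETENTION
```

Under these assumed numbers, a chat deleted on 1 January could still exist in a backup for 120 days, and with a legal hold in place the function returns None because no expiry date exists at all.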

Trackers inside chatbots: the pixels you didn’t think about

Here’s a layer most people miss entirely. The conversation with the AI is one privacy surface. The website or app wrapped around it is another — and it often leaks more, faster.

Many AI products, especially consumer chatbots and AI-companion apps, embed third-party analytics and advertising trackers: Meta (Facebook) pixels, Google Analytics/Ads tags, TikTok pixels, and similar. These don’t necessarily read your message text, but they routinely report that you visited, which pages, how long, what you clicked, your device, and an advertising identifier back to those ad networks. For an AI-girlfriend or uncensored-chat app, the mere fact that you use it is sensitive data — and a tracker firing on page load can hand that signal to advertisers.

You can verify this yourself. Open the app in a browser, open the developer tools (F12 in most desktop browsers), switch to the Network tab, reload the page, and look for requests to facebook.com, google-analytics.com, analytics.tiktok.com, and the like. It’s an eye-opening five minutes: the front end you load can reveal that you’re a user before you’ve typed a single word.
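If you would rather not eyeball the request list, most browsers let you export the Network tab as a HAR file (a JSON log of every request). The sketch below counts requests to a handful of tracker domains in such an export. The domain list is illustrative and far from complete, and the function names are made up for this example.

```python
import json
from collections import Counter
from urllib.parse import urlparse

# Illustrative list only; extend it with whichever ad or analytics hosts you care about.
TRACKER_DOMAINS = (
    "facebook.com",
    "google-analytics.com",
    "analytics.tiktok.com",
)


def tracker_hits(har: dict, domains=TRACKER_DOMAINS) -> Counter:
    """Count requests in a parsed HAR log whose host is a tracker domain or a subdomain of one."""
    hits = Counter()
    for entry in har.get("log", {}).get("entries", []):
        host = urlparse(entry["request"]["url"]).hostname or ""
        for domain in domains:
            if host == domain or host.endswith("." + domain):
                hits[domain] += 1
    return hits


def scan_har_file(path: str) -> Counter:
    """Load a HAR export from disk and count tracker requests in it."""
    with open(path, encoding="utf-8-sig") as f:
        return tracker_hits(json.load(f))
```

Calling scan_har_file on your export returns a Counter keyed by tracker domain. An empty result only means none of the listed domains appeared; it does not prove the page is tracker-free.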

The general principle: a cloud product’s privacy isn’t just its data policy — it’s every third party its front end loads. Local AI has none of this surface, because there’s no website and no server call to instrument.

Can your chats be subpoenaed? The 2026 legal reality

Yes. If a chat is stored on a provider’s server, it is discoverable — meaning a court can compel the provider to produce it in litigation, and law enforcement can request it through the appropriate legal process.

The mechanics, in plain English:

Civil lawsuits can subpoena relevant records, and “relevant” is interpreted broadly. Your AI chats are records.
Criminal investigations use warrants and court orders to obtain stored communications.
Legal holds (above) mean even deleted chats can be preserved specifically so they can be produced later.
There is no recognized “AI confidentiality privilege.” Talking to a chatbot is not legally protected the way talking to a lawyer, doctor, or spouse can be. Sensitive disclosures to an AI enjoy no special shield.

The simple model to carry in your head: anything stored on a server you don’t control is potentially producible to a third party — by subpoena, warrant, breach, or rogue insider — no matter what the privacy policy says. Policies govern intended use. They don’t repeal the law of discovery. The way to make a chat un-subpoenable is to make sure it never existed anywhere but on your own hardware.

Which providers train on you by default

Defaults matter more than capabilities, because most people never change them. Here’s the landscape as of 2026, based on each provider’s published policies. Read this as general posture — exact terms change, vary by plan, and differ between consumer and enterprise/API tiers, so always confirm against the current policy for your specific account.

Provider	Trains on consumer chats by default?	Notes (per their published policies)
ChatGPT (OpenAI)	Yes, on free/Plus — opt-out available	Settings let you disable “improve the model”; business/enterprise/API tiers are documented as not trained on by default.
Gemini (Google)	Yes, with human review of sampled chats	Google’s policy describes reviewers reading sampled conversations; you can turn off Gemini Apps Activity, but reviewed samples can be retained for an extended period.
Grok (xAI)	Yes, default-on, tied to X data	Positioned as trained on platform data; opt-outs exist but the default leans toward use.
Claude (Anthropic)	Historically not trained on by default	Anthropic has publicly stated consumer chats aren’t used for training by default; recent policy updates added opt-in choices, so check your current setting.
Copilot (Microsoft)	Varies by product	Consumer Copilot and enterprise Microsoft 365 Copilot have different data terms; enterprise tiers carry stronger no-training commitments.

Two honest caveats. First, enterprise and API tiers are consistently more protective than the free consumer app — that’s an industry-wide pattern, not a quirk. Second, every “we don’t train on you” promise is still a promise: the data is on their server, governed by a policy they can revise. Which is the whole point of the next section.

The privacy-tier ladder

Not all “private” is equal. Here’s the hierarchy, weakest to strongest:

1. Default cloud: stored, often trained on, tracked, subpoenable. The baseline most people are on.
2. Cloud + opt-out: training off, but still stored, still tracked, still subpoenable. Better, not private.
3. Zero-retention hosted (ZDR): the provider contractually commits not to retain your chats; they’re processed and dropped. Strong, if the operator honors it.
4. Fully local: the model runs on your own machine; there is no server, so there’s nothing to store, train on, track, or subpoena. The ceiling.

Each rung removes a class of risk. The jump from rung 2 to rung 3 removes retention. The jump from rung 3 to rung 4 removes trust — because you stop depending on anyone keeping a promise.
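One way to keep the ladder straight is to write it down as data. The sketch below encodes each rung as the set of risks still present, then computes what a step up removes. The risk labels and names are invented for this example, and one entry is an assumption rather than something the ladder states: whether a zero-retention service still loads trackers depends on its front end, so it is marked as possible here.

```python
from enum import IntEnum


class Tier(IntEnum):
    DEFAULT_CLOUD = 1
    CLOUD_OPT_OUT = 2
    ZERO_RETENTION = 3
    FULLY_LOCAL = 4


# Risks still present at each rung, using illustrative labels.
RISKS = {
    Tier.DEFAULT_CLOUD: {"training", "retention", "subpoena", "trackers", "operator_trust"},
    Tier.CLOUD_OPT_OUT: {"retention", "subpoena", "trackers", "operator_trust"},
    # Assumption: trackers depend on the product's front end, so they stay listed.
    Tier.ZERO_RETENTION: {"trackers", "operator_trust"},
    Tier.FULLY_LOCAL: set(),
}


def removed_by_step(lower: Tier, higher: Tier) -> set:
    """Return the risks present at the lower rung that are gone at the higher one."""
    return RISKS[lower] - RISKS[higher]
```

With this encoding, removed_by_step(Tier.CLOUD_OPT_OUT, Tier.ZERO_RETENTION) yields retention and the subpoena exposure that comes with it, while the last step is the one that drops operator_trust.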

Contractual zero vs. architectural zero

This is the most important idea on the page, so it gets its own section.

Contractual zero-data-retention means a hosted service promises not to keep your data. This is genuinely valuable — a good ZDR provider gives you a fast, no-setup experience with a real commitment behind it. But notice what you’re relying on: a policy, an honest operator, and correct implementation. The data still flows through their server. The promise is the protection.

Architectural zero means there is no remote server to retain anything. The model runs on your own machine and answers on 127.0.0.1, your own loopback address, so no prompt ever leaves your machine. Privacy isn’t enforced by a policy you have to trust; it’s enforced by the architecture itself. There’s nothing on a provider’s server to subpoena, breach, or train on, because the data never went anywhere.

Contractual zero is a promise not to look. Architectural zero is the absence of anywhere to look.

Both are legitimate, and they serve different people. If you want zero setup and you’re comfortable trusting a clean operator, contractual ZDR is a huge upgrade over default cloud. If you want the kind of privacy that survives a subpoena, a breach, or a change of corporate heart, only the local, architectural version delivers it. This same divide, trusting a server versus owning the machine, is the throughline of the related guide on why cloud AI logs and censors you.

Your action plan by threat model

Privacy isn’t one-size-fits-all. Match the tier to what you’re actually protecting against.

“I just don’t want to be ad-targeted or train the next model.” Flip the training opt-out on your provider, and use a browser with tracker blocking (uBlock Origin) so the pixels can’t phone home. Cheap, fast, meaningfully better. You’re still stored — fine for everyday questions.

“This is genuinely sensitive — health, legal, financial, personal — but I need it now and have no GPU.” Use a zero-retention hosted service, not the default consumer app. You’re trusting an operator, but a ZDR operator who doesn’t store chats is a categorically safer bet than a free tier that logs and trains.

“I need this to be un-subpoenable, un-breachable, and mine forever.” Go fully local. Install Ollama (on Linux, curl -fsSL https://ollama.com/install.sh | sh; macOS and Windows use the installers from ollama.com), run a model with ollama run <model>, and your conversations physically never leave your machine. No retention window, no legal hold, no tracker, no policy to trust. If you want a model that also won’t refuse or lecture you, the best uncensored local models run exactly the same way, and because they’re on your disk, you set the boundaries.
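If you script against a local model rather than typing into the terminal, the same guarantee can be checked in code. The sketch below assumes Ollama is running on its default port (11434) and uses its /api/generate endpoint. The helper names are made up for this example, and the guard is deliberately strict: it accepts only literal loopback IP addresses and does not resolve hostnames, so even "localhost" is rejected.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse

# Assumes a default Ollama install listening on the loopback interface.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def is_loopback_url(url: str) -> bool:
    """True only if the URL's host is a literal loopback IP such as 127.0.0.1 or ::1."""
    host = urlparse(url).hostname or ""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # hostnames are not resolved, so they never pass


def ask_local(prompt: str, model: str, url: str = OLLAMA_URL) -> str:
    """Send a prompt to a local Ollama server, refusing any non-loopback address."""
    if not is_loopback_url(url):
        raise ValueError(f"refusing to send prompt to non-loopback host: {url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    # An empty ProxyHandler ignores proxy settings from the environment,
    # so the request cannot be routed through a proxy on another machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["response"]
```

Call it as ask_local("your question", model="<model>"), substituting a model you have already pulled. The guard is a convenience against typos and misconfiguration, not a substitute for checking what else on the machine talks to the network.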

The deciding question is simple: do you want privacy you have to trust, or privacy you can prove? Opt-out gives you neither. Contractual ZDR gives you the first. Only local gives you the second.

If you’ve decided you’re done trusting a server with your most private conversations, you have two honest paths forward — a hosted companion for instant, no-setup use, and a fully local one that runs entirely on your own hardware so the data has nowhere to go. Both beat the default cloud; pick the one that matches your threat model below.

Ember — own it

Freya — no setup

Related guides

Why Cloud AI Censors You — and What Local AI Does Differently

Are AI Girlfriend Apps Safe? The 2026 Breach Map & Private Alternatives

Is Ollama Actually Private? What Leaves Your Machine (and the One Setting That Doesn't)