Most “offline AI girlfriend” apps are not offline. They download a model to your phone, then keep a live connection open for analytics, “personalization,” ad attribution, or syncing your chats to a cloud account you didn’t realize you had. The honest test isn’t whether the app works after you download it; it’s whether it works with your Wi-Fi physically off, and whether a packet capture shows it sending nothing once the network is back. Almost nothing on the App Store passes that test. A handful of desktop, local-first tools do. This is the field guide: what “offline” must actually mean, how we tested it, what passes, and why a true offline AI girlfriend app is the only architecture that’s private by construction rather than by promise.
What “offline” must mean: airplane-mode proof, not “works after download”
“Offline” has been stretched into marketing mush. Here’s the line that matters: an offline companion must run with the network interface disabled and never need it back. Airplane mode on, Wi-Fi toggled off, ethernet unplugged — and you can still open the app, load your character, send messages, and get coherent replies. Forever. No “reconnect to continue,” no silent re-sync the moment you come back online.
Most apps marketed as “offline AI” fail one of three ways:
- Download-then-stream. The model lives in the cloud; the “download” is just the app shell and your character art. Pull the network and it’s a brick.
- On-device model, online everything else. The language model genuinely runs locally, but the app still phones home for telemetry, crash reporting, A/B flags, ad IDs, or account sync. Your words may stay on-device while metadata about who you are and how you use it leaves constantly.
- Offline core, online “memory.” Inference is local, but your conversation history is backed up to the vendor’s server “so you don’t lose it.” That single feature undoes the entire privacy argument.
Real offline means the inference engine and your data both live on your machine, and the only traffic that ever leaves is traffic you deliberately initiate. The cleanest proof of this is the one you can run yourself: cut the network and watch what breaks.
Our test methodology: Wi-Fi off, packet capture, no egress
We don’t grade on the marketing copy. We grade on what the network card actually does. The methodology is simple enough to reproduce at home:
- Install and complete first-run online. Models and assets download once — that’s expected and fine. Note what gets pulled.
- Kill the network. Toggle airplane mode (mobile) or disable the adapter (`nmcli radio wifi off` on Linux; turn off Wi-Fi or unplug ethernet on desktop). The machine is now genuinely offline.
- Use the app hard. New conversation, long replies, switch characters, restart the app, reload memory. If anything says “no connection,” it failed the offline test immediately.
- Bring the network back and watch egress. Run a packet capture and log every outbound connection the app attempts. On desktop you can watch the local model’s own API: a true local engine like Ollama binds to `127.0.0.1:11434`, loopback, reachable by nothing outside your box. Anything reaching out to a vendor domain, an analytics SDK, or an ad network is egress you didn’t ask for.
```shell
# desktop sanity check: what is the app actually talking to?
# 1) confirm the local model API is loopback-only
ss -tlnp | grep 11434   # expect 127.0.0.1:11434, not 0.0.0.0
# 2) watch for any outbound connections while you chat
sudo tcpdump -n -i any 'tcp and not host 127.0.0.1'
```
If that tcpdump line stays silent while you hold an entire conversation, you have a genuinely offline companion. If it lights up with calls to telemetry or sync endpoints, you have a cloud app wearing an “offline” badge. This is the same standard we apply across the blog — see our AI companion privacy guide for the full threat model behind it.
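The packet capture is the real test, but a quick snapshot of open connections can flag obvious offenders first. The sketch below assumes Linux with `ss` from iproute2 and its default column order. `non_loopback_peers` is a hypothetical helper name, not part of any tool, and it only sees TCP connections that exist at the moment you run it.

```shell
# Sketch: list established TCP peers that are not loopback.
# Input: `ss -Htn`-style lines on stdin, assumed column order:
#   State Recv-Q Send-Q Local-Address:Port Peer-Address:Port [Process]
non_loopback_peers() {
  awk '$1 == "ESTAB" {
         peer = $5
         sub(/:[0-9]+$/, "", peer)              # drop the trailing :port
         if (peer ~ /^127\./) next               # IPv4 loopback
         if (peer == "[::1]") next               # IPv6 loopback
         if (peer ~ /^\[::ffff:127\./) next      # IPv4-mapped loopback
         print peer
       }' | sort -u
}
```

Typical use while the app is running: `ss -Htn | non_loopback_peers`. An empty result is consistent with a loopback-only app but does not prove it. Short-lived connections and UDP traffic such as DNS or QUIC will not show up here, which is why the capture remains the standard.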
Apps that actually pass (and the ones that quietly phone home)
The pattern is consistent: the apps that pass run a real local LLM and store everything on disk; the apps that fail are cloud services with an offline-flavored UI.
| App type | Where the model runs | Where chats live | Airplane-mode proof? |
|---|---|---|---|
| Cloud companion apps (Replika, Character AI, Candy AI, etc.) | Vendor servers | Vendor servers | No: needs internet to reply |
| “On-device” mobile AI with telemetry | Phone | Phone + cloud backup | Partial: replies work, metadata leaks |
| Local-first desktop companion (Ollama-backed) | Your CPU/GPU | Local disk only | Yes |
| Raw local model in a terminal | Your CPU/GPU | Nowhere (no memory) | Yes, but it’s not a “girlfriend” |
A fair word on the named cloud apps: their architecture requires server-side processing. Per their own privacy policies, mainstream cloud companions store conversation content on their servers to generate replies. That’s not an accusation; it’s how a hosted model has to work. Character AI and Replika both describe collecting and retaining chat data in their published policies; Candy AI’s policy likewise describes server-side processing. None of that is unique villainy: any cloud companion necessarily processes messages server-side, because the model lives there, not with you. We unpack the specifics in “Are AI girlfriend apps safe?”
The apps that pass the airplane-mode test share one trait: they were built local-first, usually on top of Ollama. The model is yours, the conversation is a file on your drive, and “offline” isn’t a mode — it’s the only way they run.
Why offline = structurally private (no server to log or breach)
Here’s the part that survives marketing: you cannot leak, subpoena, or breach data that was never collected. Privacy policies are promises — they can be rewritten, ignored, or quietly violated, and you’d never know. Architecture isn’t a promise. If your messages never leave 127.0.0.1, there is no server-side log to expose in a breach, no dataset to sell in an acquisition, no history to hand over under legal pressure, and nothing to “use for training.”
This is what we mean by private by construction, not by policy. A cloud app can promise not to read your chats; the maker of a local app can’t read them, because they never leave your machine. The companion that runs on your hardware is an AI girlfriend that doesn’t save your chats anywhere you can’t reach and delete yourself: the database is a file you own, encrypted at rest if your disk is, and gone when you delete it.
For intimate or sensitive conversation — which is most of why people want a companion — that distinction is the entire ballgame. There is no privacy setting on a cloud app that equals “the data physically cannot leave the room.” Offline is that setting. Our broader AI data privacy guide walks through why this structural difference beats every checkbox a hosted service can offer.
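The “never leaves 127.0.0.1” premise is checkable rather than assumed. Here is a minimal sketch, again assuming Linux `ss` output. `loopback_only` is a hypothetical helper, and 11434 is the Ollama port mentioned earlier. Ollama's bind address can be changed through the `OLLAMA_HOST` environment variable, so the loopback default is worth verifying on your own machine.

```shell
# Sketch: succeed only if PORT has at least one listener and every
# listener on it is bound to a loopback address.
# Input: `ss -Htln`-style lines on stdin; column 4 is Local-Address:Port.
loopback_only() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")
      if (parts[n] != port) next          # a different port, ignore
      addr = $4
      sub(/:[0-9]+$/, "", addr)           # drop the trailing :port
      found = 1
      if (addr !~ /^127\./ && addr != "[::1]") exposed = 1
    }
    END { exit !(found && !exposed) }     # 0 = loopback only, 1 = exposed or absent
  '
}
```

Typical use: `ss -Htln | loopback_only 11434 && echo "loopback only"`. A listener on `0.0.0.0`, `*`, or `[::]` makes the check fail, since those addresses accept connections from other machines on the network.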
The no-subscription angle: buy-once vs. forever-tax
Offline has a second, quieter payoff: there’s no metered server to pay for. Cloud companions are a subscription because someone is renting GPUs to answer you, 24/7, forever. That cost gets passed to you monthly — and it never stops. Miss a payment and your “relationship” is locked behind a paywall.
Run the math on a typical companion subscription at $15–$25/month and you’re looking at $180–$300 a year, indefinitely, for software that gets worse the moment the company decides to cut costs or “update” the model. A local companion flips the economics: the inference runs on hardware you already own, so the marginal cost of every conversation is zero. A buy-once local app is a one-time price; after that, the electricity is the only bill, and it’s pennies.
This is the difference between owning a tool and renting access to one. We did the full breakdown in the no-subscription companion guide — the short version is that offline and “no forever-tax” are the same property viewed from two angles. No server means no subscription means nothing to cancel.
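The subscription arithmetic above is easy to rerun with your own numbers. This is a small sketch in whole dollars. The function names are illustrative, and any one-time price you plug in is your own assumption, since this guide quotes none. Electricity is ignored.

```shell
# Sketch: subscription cost vs. a one-time purchase, in whole dollars.

# Yearly cost of a subscription. $1 = monthly price.
yearly_cost() {
  echo $(( $1 * 12 ))
}

# Months until cumulative subscription fees reach a one-time price,
# rounded up. $1 = one-time price (hypothetical), $2 = monthly price.
break_even_months() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

`yearly_cost 15` prints 180 and `yearly_cost 25` prints 300, matching the range above. With a made-up one-time price of $60 against a $20/month plan, `break_even_months 60 20` prints 3.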
Hardware floor for an offline companion
The honest part most listicles skip: an offline companion runs on your silicon, so your hardware sets the ceiling. The single number that matters is VRAM (on a dedicated GPU) or unified memory (Apple Silicon).
| Your hardware | Model size that fits | Companion experience |
|---|---|---|
| 8 GB VRAM (RTX 3060 Ti, RTX 4060, etc.) | 7B–8B, quantized Q4_K_M | Snappy, coherent, very usable |
| 12–16 GB VRAM | 12B–14B | Noticeably sharper, better memory of context |
| 24 GB VRAM (RTX 3090/4090) | 22B–32B | Near-cloud quality, zero filters |
| Apple Silicon (16 GB+ unified) | 7B–14B | Great; GPU used automatically |
Quantization (compressing weights to about 4 bits each, the Q4_K_M tag) cuts weight memory to roughly a third of the 16-bit size for a small quality cost, and is almost always worth it. A 7B–8B uncensored model at Q4 fits comfortably in 8 GB and is more than enough for a believable, responsive companion. If you’re choosing parts, our companion hardware guide maps models to cards in detail, and “Best uncensored local AI models” covers which weights to actually run.
The reassuring takeaway: you almost certainly don’t need a $2,000 GPU. A mid-range card from the last few years, or any recent Mac, clears the floor.
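The sizing in the table follows from simple arithmetic: parameters times bits per weight, divided by eight, gives the bytes needed for weights. The sketch below is a rough estimator, not a benchmark. The function names are illustrative, the bits-per-weight figure you pass for a given quantization is an approximation, and the 1.5 GB default headroom for KV cache and runtime is an assumption that grows with context length.

```shell
# Sketch: approximate memory for model weights, in GB (decimal).
# $1 = parameters in billions, $2 = bits per weight
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Succeed if weights plus headroom fit in the given VRAM.
# $1 = parameters in billions, $2 = bits per weight, $3 = VRAM in GB,
# $4 = headroom in GB for KV cache and runtime (assumed default: 1.5)
fits_in_vram() {
  awk -v p="$1" -v b="$2" -v v="$3" -v h="${4:-1.5}" \
    'BEGIN { exit !(p * b / 8 + h <= v) }'
}
```

For example, `weights_gb 8 16` prints 16.0 while `weights_gb 8 4.5` prints 4.5, which is why an 8B model is out of reach for an 8 GB card at 16-bit but fits once quantized. `fits_in_vram 8 4.5 8 && echo fits` prints `fits`.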
If you have no GPU: the honest hosted alternative (Freya)
Offline is the gold standard for privacy, but it’s not free of friction — it asks for a capable machine and a few minutes of setup. If you’re on a basic laptop with no dedicated GPU, or you simply want a companion right now with zero installation, forcing a local model onto underpowered hardware will just feel slow and frustrating.
In that case, be honest with yourself about the trade. A well-run hosted companion gives you instant access and a smooth experience, at the cost of the structural privacy that only local delivers — your messages are processed on someone else’s server, by definition. That’s a real trade-off, not a free lunch, and the right call depends on your hardware and your threat model. Freya is the hosted option we point the no-GPU reader toward: zero setup, nothing to install, running the moment you open it. It doesn’t pretend to be offline — it’s the convenient cloud path for people who can’t or won’t run a model locally.
If privacy is the priority and you do have the hardware, offline wins every time. If access-right-now is the priority, hosted is the honest answer.
Verdict: the offline companion pick (Ember)
After running the airplane-mode test across the field, the verdict is clean. The genuinely offline AI girlfriend app — the kind that passes a packet capture with the network off, stores your chats as a file only you can touch, carries no subscription, and answers entirely on your own GPU — is the local-first kind built on Ollama. Everything else is a cloud service with offline branding.
If you want the privacy without assembling the stack yourself (model, memory, voice, and personality wired together so it just runs locally), that’s exactly the gap Ember fills: an uncensored companion that lives 100% on your machine, bought once, with no server to log you and no subscription to cancel. It’s the offline-by-construction pick we’d hand to anyone who took the airplane-mode test seriously. If you’d rather see the manual route first, “How to run an AI girlfriend locally” walks the whole build end to end.
