Is Ollama Actually Private? What Leaves Your Machine (and the One Setting That Doesn't)

Short answer: yes — but only if you never type the wrong model name. Ollama runs large language models on your own hardware, and when it’s doing pure local inference, your prompts genuinely never leave the machine. The catch is that recent versions of Ollama also ship a way to run models on Ollama’s servers, and it’s invoked with a model name that differs from a local one by a single suffix. That one setting is the difference between “100% private” and “your chat just went to a datacenter.” This page walks through exactly what Ollama sends, what it doesn’t, how to prove zero egress yourself with a packet capture, and how to lock the whole thing down offline.

This matters most if you’re using a local model as a companion, a journal, a therapist-stand-in, or anything you’d never paste into a cloud chatbot. Privacy you can’t verify isn’t privacy — so let’s verify it.

The promise: local inference = no egress

When you run ollama run llama3.1 and the model is already downloaded, here’s the literal data path: your text goes to the Ollama server process listening on your own machine, the model weights (sitting on your disk) compute a response, and the tokens stream back. No part of that round trip requires a network. You can pull the Ethernet cable, turn off Wi-Fi, and keep chatting.

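Per Ollama’s own FAQ, the server “binds 127.0.0.1 port 11434 by default.” That 127.0.0.1 address is loopback, and it settles who can reach the API. Loopback traffic never hits your network card; the operating system shortcuts it internally. A conversation between your terminal and 127.0.0.1:11434 cannot be reached or observed from another machine, the same way talking to yourself in an empty room can’t be overheard. Be clear about what this proves: the bind address controls inbound access, not whether the server process opens outbound connections of its own. That second question is what the packet capture later on this page answers.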
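To check the bind address on your own machine rather than taking the FAQ’s word for it, you can look at the listening socket directly. The sketch below is illustrative only: `is_loopback_addr` is a hypothetical helper name, and it assumes the `address:port` form that tools like `ss -ltn` (Linux) or `lsof -nP -iTCP:11434 -sTCP:LISTEN` (macOS) print for a listener.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ollama): succeeds when a listener
# address in "host:port" form is bound to loopback only.
is_loopback_addr() {
  local host="${1%:*}"   # drop the trailing ":port"
  host="${host#\[}"      # drop IPv6 brackets, e.g. [::1] -> ::1
  host="${host%\]}"
  case "$host" in
    127.*|::1) return 0 ;;   # IPv4 and IPv6 loopback
    *)         return 1 ;;   # 0.0.0.0, *, [::], or a LAN address
  esac
}

# Example: classify addresses as a socket listing would show them.
for addr in "127.0.0.1:11434" "0.0.0.0:11434"; do
  if is_loopback_addr "$addr"; then
    echo "$addr: loopback only"
  else
    echo "$addr: reachable from other machines"
  fi
done
```

A loopback-only result tells you who can connect to the API. It says nothing about outbound connections made by the server itself, which is a separate check.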

Ollama’s stated position matches this architecture. Their privacy language is blunt: “Ollama runs locally. We don’t see your prompts or data when you run locally.” That’s not a marketing promise you have to take on faith — it’s a claim you can independently confirm, and we’ll do exactly that below. If you’re new to the tooling, our how to install Ollama guide and the broader how to run AI locally walkthrough cover the basics; this article assumes you’ve got it running and want to audit it.

The -cloud models trap (and how to avoid it)

Here’s the single most important thing on this page. In late 2025, Ollama added cloud models, a preview feature that runs very large models on “datacenter-grade hardware” on Ollama’s side instead of yours. The point is legitimate: it lets people run a 120B-parameter model that won’t fit in typical consumer VRAM.

The risk is how you opt in. Cloud models are distinguished from local ones by a -cloud suffix on the model tag. Compare:

Command	Where it runs	Does your prompt leave?
`ollama run gpt-oss:120b`	Your machine (if you have the VRAM)	No
`ollama run gpt-oss:120b-cloud`	Ollama’s cloud service	Yes
`ollama run llama3.1`	Your machine	No

One word — -cloud — flips the privacy model entirely. To Ollama’s credit, this isn’t a silent default. Running a cloud model requires you to authenticate first with ollama signin and an ollama.com account; you cannot accidentally trigger it without having signed in. Per Ollama’s docs, cloud-model requests are “automatically offloaded to Ollama’s cloud service.”

How to avoid it: never append -cloud to a model name, and if you want belt-and-suspenders assurance, simply never run ollama signin. Without an authenticated account, cloud offload isn’t available. Treat any model tag ending in -cloud as “this is a hosted API call wearing a local-looking command.” For what Ollama publicly says about handling those hosted requests, read their current privacy policy rather than trusting any third-party summary — for cloud inference you’re now in the same trust model as any cloud AI service, where retention is a policy promise, not a physical guarantee.
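If you would rather not rely on eyeballing the tag every time, a small shell wrapper can refuse cloud tags for you. This is a minimal sketch, not an Ollama feature: `is_cloud_tag` and `ollama_local_run` are hypothetical names, and the pattern assumes cloud models are marked by the `-cloud` suffix described above. It also matches a bare `:cloud` tag as a precaution, which is an assumption beyond what this article covers.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not an Ollama feature): succeeds when a model tag
# looks like a cloud model.
is_cloud_tag() {
  case "$1" in
    *-cloud|*:cloud) return 0 ;;  # ":cloud" is a precautionary assumption
    *)               return 1 ;;
  esac
}

# Hypothetical wrapper: refuse cloud tags, pass everything else to ollama.
ollama_local_run() {
  if is_cloud_tag "$1"; then
    echo "refusing to run cloud model: $1" >&2
    return 1
  fi
  ollama run "$@"
}
```

With this in your shell profile, `ollama_local_run llama3.1` behaves like `ollama run llama3.1`, while `ollama_local_run gpt-oss:120b-cloud` stops before Ollama is invoked. Treat it as a seatbelt rather than a security boundary, since nothing prevents typing `ollama run` directly.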

Does Ollama phone home? Telemetry reality

This is the question most “ollama telemetry” searches are really asking: even during normal local use, does the binary quietly send analytics?

The honest, verifiable answer: Ollama’s local inference path does not transmit your prompts or generations anywhere — and you don’t have to believe me, because a packet capture (next section) will show it directly. There’s no prompt-level telemetry riding along with your chats.

What does legitimately make outbound connections, and why:

The desktop apps check for updates. Per the FAQ, “Ollama on macOS and Windows will automatically download updates.” That’s a version check against Ollama’s update servers — metadata about your app version, not your conversations. On Linux there’s no auto-updater; you re-run the install script manually, so there’s nothing background-phoning at all.
Account/usage metadata exists only if you use cloud features. Ollama’s policy describes collecting basic account info and limited usage metadata for the cloud service. If you never sign in, you have no account and that path is moot.

So “does Ollama phone home?” has a precise answer: not with your data during local use. The only routine outbound chatter on a desktop install is an update check, and you can disable even that by blocking the binary at the firewall — covered below. None of this is your prompt content.

How to verify zero-egress with a packet capture

Don’t trust a blog. Watch the wire. Here’s how to verify Ollama’s zero egress on your own machine in about two minutes.

Option A: tcpdump (Linux/macOS). Open a terminal and start a capture that excludes loopback traffic and your local API port, so you only see packets that are actually trying to leave:

sudo tcpdump -n -i any 'not host 127.0.0.1 and not port 11434'

Now, in a second terminal, run a fully local model and have a long conversation:

ollama run llama3.1

Watch the tcpdump window while you chat. If inference is truly local, you’ll see nothing generated by your prompts: no packets to ollama.com, and no packets anywhere that line up with your messages. (You may see unrelated background OS traffic; that’s your system, not Ollama.) The decisive test: a model you’ve already pulled still answers with airplane mode on. If it does, inference has no network dependency, and while the network is off there is physically nowhere for a prompt to go.

Option B: per-process firewall. Tools like OpenSnitch (Linux) or Little Snitch (macOS) ask for approval on every outbound connection attempt, process by process. Install one, then use Ollama normally. During local chat, the Ollama process should trigger zero connection alerts. The only times it asks to connect are when you run ollama pull (a download) or when the desktop app checks for an update, which are exactly the two cases we expect.

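Option C: pull the plug. The lowest-tech, highest-confidence test. Disable networking entirely, then ollama run <a-model-you-already-have>. If a full conversation works, inference has no network dependency, and with the network down nothing can leave. Pair it with Option A or B to confirm the same silence once you reconnect. This is the test that ends the argument.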
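For a quick snapshot between full captures, you can also list the sockets the Ollama process currently holds and filter out loopback peers. This is a rough sketch, not a replacement for tcpdump: `non_loopback_peers` is a hypothetical helper, it assumes the `local->remote` form that `lsof -nP -i` prints for connected sockets, and it assumes the process name starts with `ollama` (the desktop apps may use a different name or capitalization).

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `lsof -nP -i` style lines on stdin and print
# the remote endpoint of every connection whose peer is not loopback.
non_loopback_peers() {
  awk '{
    for (i = 1; i <= NF; i++) {
      n = index($i, "->")
      if (n == 0) continue               # not a connected-socket field
      peer = substr($i, n + 2)           # text after "->"
      if (peer ~ /^127\./ || peer ~ /^\[::1\]/) continue
      print peer
    }
  }'
}

# Intended use (assumes lsof is installed; may need sudo to see sockets
# owned by a system service):
#   lsof -nP -i -a -c ollama | non_loopback_peers
```

Empty output during a local chat is the expected result. Output during ollama pull is the download connection. Because this only samples one moment, a capture that runs for the whole conversation remains the stronger evidence.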

What still touches the network (model downloads, updates)

Being precise is the whole point of an honest audit. Three things legitimately require the internet, and none of them involve your conversations:

Model downloads (ollama pull / first ollama run). Weights are fetched over HTTPS from Ollama’s registry. The FAQ notes you can route these through a proxy with HTTPS_PROXY, and explicitly warns to “avoid setting HTTP_PROXY” because pulls use HTTPS only. This is a one-time download per model — what travels is the model to you, never your data away from you.
Update checks (macOS/Windows desktop apps). Version metadata, as described above.
Cloud models, if and only if you opted in. The -cloud path. By definition this sends prompts out — that’s the feature.

That’s the complete list. Everything else — every token of every local chat — stays on 127.0.0.1. For a fuller mental model of which architectures can and can’t leak, our AI data privacy guide maps the trust boundaries end to end.

Locking it down fully offline

If you want Ollama to be provably airtight — no update pings, no possibility of accidental cloud offload — here’s the hardening checklist:

Pre-pull every model, then go offline. Download what you need once (ollama pull <model>), then operate with networking disabled or with the Ollama binary firewalled. Cached models run forever without a connection.
Never run ollama signin. No account, no cloud offload path. Simple and total.
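Block the binary at the firewall. Add an egress rule denying outbound connections for the Ollama process. Per-process tools such as OpenSnitch or Little Snitch do this directly. Plain packet filters like ufw and pf match on addresses, ports, and interfaces rather than on a binary, so with those you would block at a coarser level, for example the whole machine while you chat. Downloads will fail until you re-allow it, which is the point: you re-enable only to pull a new model, then re-block.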
Keep the bind address on loopback. Leave OLLAMA_HOST at its 127.0.0.1 default unless you have a specific reason to expose the API to your LAN. If you do expose it, you’re now responsible for who on the network can reach port 11434.
Audit your model names. Before running anything, eyeball the tag. No -cloud, no surprise. If you compare runtimes, Ollama vs LM Studio vs Jan breaks down how each handles networking and cloud features differently.

Do all five and Ollama is a sealed box: your prompts physically cannot leave, because there’s no enabled path for them to take.
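The name-audit step in particular is easy to script. The sketch below is a hypothetical preflight check, assuming `ollama list` prints a header row followed by one model per line with the tag in the first column, and assuming cloud-tagged entries show up in that listing. `cloud_tags_in_list` is an illustrative name, not an Ollama command, and matching a bare `:cloud` tag alongside `-cloud` is a precaution rather than something this article documents.

```shell
#!/usr/bin/env bash
# Hypothetical audit: read `ollama list` style output on stdin and print
# any model tag that ends in -cloud (or :cloud, as a precaution).
cloud_tags_in_list() {
  awk 'NR > 1 && $1 ~ /[-:]cloud$/ { print $1 }'   # NR > 1 skips the header row
}

# Intended use (requires a running local Ollama server):
#   if [ -n "$(ollama list | cloud_tags_in_list)" ]; then
#     echo "cloud-tagged models present; consider: ollama rm <tag>" >&2
#   fi
```

Running the check before you go offline gives you a short list to review instead of relying on memory of what you have pulled.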

Why this matters for a private companion

For coding help, the stakes of a leaked prompt are low. For a companion — the thing you talk to at 2am, the running journal of your actual inner life — the stakes are the whole ballgame. Cloud companion apps, by their architecture, necessarily store your messages server-side to function; that’s not an accusation about any one product, it’s just what “the model runs on their computer” means. Retention, training use, and breach exposure then become policy promises you can’t independently verify, which is also a big part of why cloud AI censors you — a model someone else hosts answers to their rules, not yours.

Local inference inverts that completely. There’s no server to subpoena, no logs to leak, no terms-of-service change that retroactively claims your conversations. The packet capture you ran above isn’t a one-time stunt — it’s the kind of guarantee a cloud app structurally cannot offer you, at any price.

The vetted, egress-audited option (Ember)

Ollama gives you the engine and the proof tools. But wiring an engine into an actual companion — personality, persistent memory, a clean interface — and making sure that whole stack respects the same zero-egress promise is its own job, and it’s easy to get one layer wrong.

If you want the privacy you just learned to verify, already assembled and audited, Ember is a one-time-purchase companion built to run 100% on your own machine through Ollama — same loopback guarantee, no account, no -cloud surprises, nothing to phone home.

