There is a version of ChatGPT that nobody can read but you. It has no usage caps, no “I can’t help with that” wall, no training on your conversations, and it costs nothing per message. The catch is that you build it yourself — and it’s genuinely a two-coffee afternoon, not a weekend. The stack is two pieces: Ollama runs the model on your hardware, and Open WebUI gives you the clean, ChatGPT-style browser interface on top of it. This guide walks the whole thing end to end, with the exact commands, what each one does, and where the rough edges are.

What you’re building: your own private ChatGPT

The mental model is simple. Ollama is the engine — a small background service that downloads open-weight language models and runs them, exposing a local API at 127.0.0.1:11434. Open WebUI is the dashboard — a self-hosted web app that looks and feels almost identical to ChatGPT (chat history, multiple models, system prompts, document upload, voice), but talks to Ollama instead of OpenAI’s servers.

Put together, you get a private ChatGPT you run yourself:

  • Nothing leaves your machine. Prompts and replies stay on localhost. No vendor logs your chats, trains on them, or hands them to anyone.
  • No subscription, no token meter. Run it ten hours a day for free; the only cost is electricity.
  • No content gatekeeper. You choose the model, including uncensored ones that won’t lecture or refuse.
  • It works offline. Once models are pulled, you can unplug the internet and keep chatting.

The trade-off is honest: quality scales with your hardware. A laptop runs small, capable models; a desktop with a 24GB GPU runs near-frontier ones. We’ll size that as we go.

Install Ollama and pull a model

Start with the engine. On macOS or Linux, one command installs Ollama and starts it as a background service:

curl -fsSL https://ollama.com/install.sh | sh

On Windows, download the installer from ollama.com and run it — same result. (For a deeper walkthrough including GPU drivers and troubleshooting, see how to run AI locally.)

Once it’s running, pull a model. ollama run downloads it on first use and drops you into a chat to confirm it works:

ollama run llama3.1:8b

Type a question, get an answer, then /bye to exit. The model now lives on disk and is served at 127.0.0.1:11434 for Open WebUI to use.

Pick your first model by VRAM, since that’s the hard limit on what fits:

Your hardwareSensible starting modelWhy
8GB VRAM / Apple M-series 16GBllama3.1:8b, qwen2.5:7bFast, fluent, fits comfortably at Q4
12–16GB VRAMqwen2.5:14b, mistral-nemoNoticeably smarter, still snappy
24GB VRAMqwen2.5:32b, gemma2:27bNear-frontier reasoning locally
No GPU (CPU only)llama3.2:3b, phi3Slower but usable for short chats

Those :8b / :14b tags are parameter counts. The download is quantized by default (typically Q4_K_M — a 4-bit compression that cuts memory roughly in half with minimal quality loss). If you’re unsure what your machine can handle, the guide to running AI locally maps machines to model sizes in detail.

Run Open WebUI with Docker (the exact command, explained)

Open WebUI ships as a Docker container, which is the cleanest way to run it — no Python environment to manage, easy to update, easy to remove. Install Docker Desktop (Mac/Windows) or Docker Engine (Linux), then run:

docker run -d \
  --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Here’s what every flag does, because you should never paste a command you don’t understand:

  • -d — detached; runs in the background so you get your terminal back.
  • --network=host — lets the container reach Ollama on the host’s 127.0.0.1:11434 directly. This is the simplest setup on Linux. Note there’s deliberately no -p here: host networking ignores published ports, so -p would have no effect. The consequence matters for the next step — Open WebUI listens on the host’s port 8080, so on Linux the UI is at :8080, not :3000. On macOS/Windows, --network=host behaves differently — instead drop it and use -p 3000:8080 to publish the port (which is what puts the UI at :3000 there), and set OLLAMA_BASE_URL=http://host.docker.internal:11434 so the container can find Ollama on the host.
  • -v open-webui:/app/backend/data — a named volume that persists your accounts, chat history, and uploaded documents. Without it, everything vanishes when the container restarts.
  • -e OLLAMA_BASE_URL=... — tells Open WebUI where Ollama lives.
  • --name open-webui — names the container so you can docker stop open-webui later.
  • --restart always — brings it back automatically after a reboot.

The cross-platform version for Mac/Windows looks like this:

docker run -d \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Now open the UI — and the URL depends on which command you actually ran:

  • On Mac/Windows (the -p 3000:8080 command): open http://localhost:3000.
  • On Linux (the --network=host command): the -p mapping doesn’t apply, so Open WebUI listens directly on the host’s port 8080 — open http://localhost:8080 instead.

The first account you create becomes the admin — there’s no cloud signup, that account lives only in your local database. You’re looking at your own private ChatGPT.

To update later: docker pull ghcr.io/open-webui/open-webui:main, then stop, remove, and re-run the container. Your volume keeps all your data.

Loading models and basic config

Inside Open WebUI, your Ollama models should already appear in the model dropdown at the top of the chat. If they don’t, go to Settings → Admin Settings → Connections and confirm the Ollama URL is reachable, then Settings → Models to pull new ones directly from the UI — type a model name like qwen2.5:14b and it downloads through Ollama without touching the terminal.

A few settings worth setting on day one:

  • System prompt — under a model’s settings or per-chat, this defines persona and behavior (“You are a blunt, technical assistant. No disclaimers.”). This is where a local model’s lack of a corporate filter really shows.
  • Temperature — lower (0.2–0.4) for factual work, higher (0.8+) for brainstorming and creative writing.
  • Default model — pin your favorite so new chats start with it.
  • Multiple models — you can register several and switch per chat, or even query two at once and compare.

That’s the whole basic loop: pick a model, set a system prompt, chat. Everything is stored in your local volume.

Adding an uncensored model

This is the part cloud tools can’t do. Hosted assistants apply a safety layer you can’t turn off; a local model does exactly what its weights do. If you want an assistant that won’t refuse legal, medical, security, fiction, or frank adult-adjacent topics, pull an uncensored or abliterated model.

“Abliterated” means the refusal direction has been surgically removed from an otherwise normal model — it keeps its intelligence but loses the reflex to say no. Pull one the same way as any other:

ollama run hf.co/<author>/<model-gguf>

Ollama can pull GGUF models straight from Hugging Face using the hf.co/... path. Browse trusted community uploads, match the quantization to your VRAM (Q4_K_M is the usual sweet spot), and set a system prompt that tells it to answer directly. For the full picture — which families exist, how abliteration works, and how to vet a download — read the rundown of the best uncensored local AI models and why cloud AI censors you. The short version: ownership of the model means ownership of its boundaries.

RAG / document upload basics

Open WebUI has retrieval-augmented generation (RAG) built in, so you can chat with your own files without any external service seeing them. Two ways to use it:

  1. Per-chat upload — click the + in the message box, attach a PDF, .txt, .md, or .docx, and ask questions about it. Open WebUI chunks the document, embeds it locally, and feeds the relevant pieces to the model.
  2. Knowledge collections — under Workspace → Knowledge, build a persistent library of documents you can attach to any chat by typing # and selecting the collection. Good for a contract pile, a codebase, research papers, or personal notes.

Because the embedding and retrieval happen inside the container on your machine, your documents never leave home — the entire reason to do this locally instead of uploading sensitive files to a cloud assistant.

Optional remote access (Tailscale) for the homelab crowd

By default your private ChatGPT is only reachable on the machine running it. If you want it on your phone or laptop while away — without exposing it to the open internet — don’t port-forward. Use Tailscale, a zero-config WireGuard mesh VPN that puts all your devices on one private network.

Install Tailscale on the host and on your phone/laptop, sign in to both with the same account, and they get stable 100.x.y.z IPs that only your devices can reach. Then browse to your host’s Tailscale IP from anywhere — using the same port the UI listens on locally: http://<host-tailscale-ip>:3000 if you ran the Mac/Windows command, or http://<host-tailscale-ip>:8080 if you ran the Linux --network=host command. Either way it’s encrypted end to end and never published to the public internet. Tailscale’s MagicDNS even lets you use a hostname instead of an IP. This keeps the whole point intact: the AI runs on your hardware, and only your devices can talk to it. The broader pattern is covered in the guide to running AI locally.

The DIY version vs done-for-you

This stack is the right answer when you want a general-purpose private assistant: a ChatGPT replacement for work, code, research, and writing that you fully control. It’s powerful, free to run, and infinitely tweakable — and it does ask you to manage Docker, models, and updates yourself.

If what you actually want is narrower — a persistent AI companion with memory, personality, and voice rather than a research console — there are two cleaner paths:

DIY (Open WebUI + Ollama)EmberFreya
Runs onYour machineYour machine (Ollama)Our hosted option
Setup~1 afternoonOne-time installInstant signup
PrivacyTotal (localhost)Total (localhost)Hosted
Best forGeneral assistant, RAG, codeOwned, uncensored companion”Want it now, no GPU”
CostFreeSold once, $49Subscription

Ember is the companion version of everything above — it runs 100% locally on Ollama, so the same privacy and no-censorship benefits apply, but it’s purpose-built as an uncensored companion with memory and a personality, bought once rather than assembled. Freya is our hosted option for the reader who wants that experience with no GPU and no terminal at all — a companion that works the moment you sign in.

Build the Open WebUI stack if you want the keys to the whole machine. If you’d rather skip the assembly and just have a private companion that already works — locally with Ember, or in the cloud with Freya — that’s the door on the other side of this guide.