Most “AI chatbot” tutorials end at ollama run llama3.1 and call it a day. That’s a toy, not a home server. A real self-hosted AI chatbot at home runs 24/7, has a clean web UI your whole household can open from a phone, survives reboots, reaches you securely when you’re away, and costs nothing per month after the hardware. This guide builds that full stack — Ollama for the model engine, Open WebUI for the chat front end, a systemd service to keep it alive, a reverse proxy with HTTPS, Tailscale for safe remote access, multi-user accounts, and a backup routine so a dead drive doesn’t erase your history. Every command is real, every piece is open-source, and the finished result is a private ChatGPT clone that answers to you and only you.
The self-host AI stack overview (Ollama + Open WebUI)
The whole thing is two layers. Ollama is the inference engine: it downloads open-weight models (typically already quantized), loads them, and serves them behind a local HTTP API on 127.0.0.1:11434. Open WebUI is the front end: a polished, ChatGPT-style web app with conversation history, multiple models in a dropdown, document chat, and user accounts. Open WebUI talks to Ollama; you talk to Open WebUI.
Install Ollama first:
curl -fsSL https://ollama.com/install.sh | sh
Pull a model sized to your VRAM. An 8 GB card is comfortable with an 8B-class model at Q4_K_M; 12–16 GB opens up the 12–14B range; 24 GB handles ~32B comfortably. Quantization tags like Q4_K_M trade a little quality for a lot less memory — almost always the right call at home.
ollama pull llama3.1:8b
ollama run llama3.1:8b
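The sizing rule above can be turned into a small lookup. This is only a sketch: the function name `suggest_model_class` is hypothetical, the cut-offs are one reading of the tiers in the paragraph above, and the fallback for cards under 8 GB is an assumption rather than something this guide states.

```shell
#!/bin/sh
# Hypothetical helper: map VRAM (whole GB) to the model class suggested above.
# The thresholds are one reading of the guide's tiers, not hard limits.
suggest_model_class() {
  vram_gb=$1
  if [ "$vram_gb" -ge 24 ]; then
    echo "~32B at Q4_K_M"
  elif [ "$vram_gb" -ge 12 ]; then
    echo "12-14B at Q4_K_M"
  elif [ "$vram_gb" -ge 8 ]; then
    echo "8B at Q4_K_M"
  else
    # Assumption: below 8 GB, fall back to something smaller than 8B.
    echo "smaller than 8B"
  fi
}

suggest_model_class 16   # prints: 12-14B at Q4_K_M
```

Treat the result as a starting point; context length and whatever else is using the GPU both eat into the same memory.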
If ollama run answers in the terminal, the engine works. Now add the web layer. The cleanest path is Docker:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui --restart always \
ghcr.io/open-webui/open-webui:main
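Open http://localhost:3000 and create the first account (it becomes the admin). Your local models appear in the model dropdown as soon as the container can reach Ollama; on Linux that usually requires the OLLAMA_HOST override described in the next section, because an Ollama bound to loopback is not reachable from inside Docker. If you want the deeper walkthrough (model management, RAG, and tuning inside the UI), we cover it in Open WebUI setup with Ollama, and the engine-only details live in how to install Ollama.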
| Layer | Job | Listens on |
|---|---|---|
| Ollama | Runs the model, serves the API | 127.0.0.1:11434 |
| Open WebUI | Chat UI, accounts, history, RAG | :3000 (or :8080 in-container) |
| Reverse proxy | TLS, clean hostname | :443 |
| Tailscale | Encrypted remote access | mesh VPN |
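Before adding the proxy and the tailnet, it can help to confirm that the first two layers in the table actually answer. A minimal sketch, assuming curl is installed and the ports match the commands above; the helper name `probe` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "up" if an HTTP endpoint answers successfully,
# otherwise "down" (connection refused, timeout, or an HTTP error status).
probe() {
  if curl -fsS --max-time 2 -o /dev/null "$1" 2>/dev/null; then
    echo "up"
  else
    echo "down"
  fi
}

# /api/tags is the Ollama endpoint that lists locally pulled models.
echo "ollama:     $(probe http://127.0.0.1:11434/api/tags)"
# Port 3000 is the host port mapped in the docker run command above.
echo "open-webui: $(probe http://127.0.0.1:3000/)"
```

If Ollama reports up here but Open WebUI shows no models, the likely culprit is the container's path to the host rather than the engine itself.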
Running it as an always-on service
A home chatbot you have to restart by hand isn’t a server. The Ollama installer already registers a systemd service on Linux, so the engine survives reboots out of the box. Confirm it:
systemctl status ollama
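By default Ollama binds to loopback only. If Open WebUI runs in Docker (or on another box), let Ollama listen on the LAN by overriding the unit. Ollama’s API has no authentication of its own, so keep port 11434 closed to anything outside your network: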
sudo systemctl edit ollama
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then sudo systemctl daemon-reload && sudo systemctl restart ollama. The Open WebUI container already carries --restart always, so Docker brings it back after a reboot or crash. With both layers under a service manager, the stack comes up cold after a power cut with zero keystrokes — which is the whole point of a homelab. A small, quiet, low-watt box is ideal for this duty; see the best mini PCs for local AI if you’re picking hardware for an always-on node rather than tying up your main desktop.
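To check that the override took effect, you can look at which address the port is bound to. The sketch below parses `ss -ltn` output; the helper name `bind_scope` and its three result labels are hypothetical, and it assumes the usual column layout where the fourth field is the local address and port.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltn` output on stdin and report how a TCP
# port is bound: "loopback", "non-loopback", or "not-listening".
bind_scope() {
  awk -v port="$1" '
    # Field 4 of `ss -ltn` is Local Address:Port.
    $4 ~ (":" port "$") {
      found = 1
      if ($4 !~ /^(127\.|\[::1\])/) open = 1
    }
    END {
      if (!found)    print "not-listening"
      else if (open) print "non-loopback"
      else           print "loopback"
    }'
}

# Run on the server; ss ships with iproute2 on most Linux distributions.
if command -v ss >/dev/null 2>&1; then
  ss -ltn | bind_scope 11434
fi
```

After the restart you would expect non-loopback; loopback means the drop-in was not picked up.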
Reverse proxy and HTTPS
Hitting http://192.168.1.50:3000 works, but it’s ugly, unencrypted, and impossible to remember. A reverse proxy gives you a real hostname and TLS. Caddy is the path of least resistance because it fetches and renews Let’s Encrypt certificates automatically. A three-line Caddyfile is the entire config:
ai.example.com {
reverse_proxy localhost:3000
}
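Run caddy run (or install it as its own service) and Caddy obtains the certificate when it loads the config, provided ai.example.com resolves to this machine and ports 80 and 443 are reachable for the ACME challenge. Now https://ai.example.com proxies cleanly to Open WebUI with a valid cert and no browser warnings.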
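A quick way to catch typos before starting the proxy is to validate the file first. This sketch writes the same Caddyfile into a scratch directory and runs `caddy validate` on it if Caddy is installed; the scratch directory is an assumption for illustration, and ai.example.com is the placeholder hostname from above.

```shell
#!/bin/sh
# Write the Caddyfile from above into a scratch directory.
# ai.example.com is a placeholder; substitute your own hostname.
workdir=$(mktemp -d)
cat > "$workdir/Caddyfile" <<'EOF'
ai.example.com {
	reverse_proxy localhost:3000
}
EOF

if command -v caddy >/dev/null 2>&1; then
  # `caddy validate` loads and checks the config without starting the server.
  caddy validate --config "$workdir/Caddyfile" --adapter caddyfile
else
  echo "caddy not installed; skipping validation"
fi
```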
If you only ever use this inside your house and over Tailscale (next section), you can skip a public DNS record entirely and rely on Tailscale’s built-in HTTPS instead: fewer moving parts, nothing exposed to the open internet. The rule of thumb: only open a port to the public internet if you have a concrete reason to. A login page on the raw web is an attack surface anyone can probe; a Tailscale-only service is reachable only from devices already on your tailnet.
Remote access with Tailscale
This is where a home stack starts to feel like a real product: your private chatbot, on your phone, from anywhere — without port-forwarding, without a static IP, without exposing anything to the public internet. Tailscale builds an encrypted WireGuard mesh between your devices. Install it on the server and on your phone/laptop, sign in to the same account, and they can reach each other directly.
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
Each device gets a stable 100.x.y.z address and a name on your private tailnet. From your phone you can now open http://<server-name>:3000 over the encrypted tunnel. Better, Tailscale can terminate TLS for you and hand the service a real HTTPS URL on your tailnet — no public DNS, no certificates to manage:
sudo tailscale serve --bg 3000
That command publishes Open WebUI over HTTPS to your own devices only. Nobody outside your tailnet can route to it, which means your Tailscale remote access to a local LLM carries roughly the security posture of a private VPN rather than a public website. For travelers and multi-location households, this combination — local inference plus a private mesh — is the sweet spot.
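Tailscale hands out device addresses from the 100.64.0.0/10 range, which is where the 100.x.y.z form comes from. The sketch below checks that an address falls in that range and, if the CLI is present, prints the server's tailnet address and current serve configuration. The helper name `is_tailnet_ip` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an IPv4 address lies in 100.64.0.0/10,
# i.e. first octet 100 and second octet between 64 and 127.
is_tailnet_ip() {
  old_ifs=$IFS
  IFS=.
  # Split the address on dots into $1..$4.
  set -- $1
  IFS=$old_ifs
  [ "$#" -eq 4 ] && [ "$1" -eq 100 ] 2>/dev/null &&
    [ "$2" -ge 64 ] 2>/dev/null && [ "$2" -le 127 ] 2>/dev/null
}

if command -v tailscale >/dev/null 2>&1; then
  # `tailscale ip -4` prints this machine's tailnet IPv4 address.
  addr=$(tailscale ip -4 2>/dev/null)
  if is_tailnet_ip "$addr"; then
    echo "tailnet address: $addr"
  fi
  # Show what `tailscale serve` is currently publishing.
  tailscale serve status
fi
```

An address outside that range on the interface you are testing usually means you are hitting the LAN IP rather than the tunnel.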
Multi-user setup
Open WebUI is genuinely multi-user self-hosted AI out of the box, which is what separates a home server from a personal toy. The first account you created is the admin. From the admin panel you can:
- Enable or gate signups. Leave registration open on a trusted tailnet, or switch new accounts to require admin approval so nobody self-registers uninvited.
- Assign roles. Admins manage models and settings; regular users just chat. Each user gets isolated conversation history — your partner’s chats aren’t in your sidebar and vice versa.
- Scope models per group. You can expose different models to different people, useful if you keep a heavyweight 70B-class model for yourself and a lighter one for the household.
One Ollama backend serves every user; Open WebUI handles the accounts, sessions, and per-user history on top. For a family or a small team, that’s a single quiet box replacing several individual ChatGPT subscriptions — and unlike a cloud seat, adding a user costs you nothing.
Backups and model management
Two kinds of state matter, and they live in different places.
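Models live under Ollama’s store. With the systemd service set up by the Linux installer, that is /usr/share/ollama/.ollama/models; if you run ollama serve by hand as your own user, it is ~/.ollama/models. These files are large but reproducible (you can always re-pull them), so you generally don’t need to back them up, just track which ones you run. Housekeeping: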
ollama list # what you have, and how big
ollama rm llama3.1:8b # reclaim disk
ollama pull <model> # add or update
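Since models are re-pullable, tracking them can be as simple as saving their names. A minimal sketch, assuming `ollama list` prints a header row followed by one model per line with the name in the first column; the helper name `model_names` and the file name models.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ollama list` output on stdin and print one model
# name per line, skipping the header row and blank lines.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Save the inventory next to your backups:
#   ollama list | model_names > models.txt
# Re-pull everything on a fresh machine:
#   while read -r m; do ollama pull "$m"; done < models.txt
```

A few hundred bytes of text then stand in for tens of gigabytes of weights in your backup set.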
User data — accounts, settings, and every conversation — is the irreplaceable part. With the Docker setup above it lives in the open-webui named volume. Back that up on a schedule:
docker run --rm \
-v open-webui:/data \
-v "$(pwd)":/backup alpine \
tar czf /backup/open-webui-$(date +%F).tar.gz -C /data .
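Put that in a small script and call it from cron, then ship the tarball to a second drive or a NAS, and a failed SSD costs you a restore, not your history. Two cron details matter: a literal % in a crontab line must be escaped as \%, and $(pwd) will not be the directory you expect, so use an absolute backup path. This is a quiet advantage of self-hosting that cloud apps can’t match: your data is a file you own, not a row in someone else’s database you can never export.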
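Scheduled backups pile up, and a backup is only useful if you can restore it. The sketch below covers both ends under stated assumptions: the function names `prune_backups` and `restore_backup` are hypothetical, archives follow the open-webui-YYYY-MM-DD.tar.gz naming from the command above, and paths contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest N dated archives in a directory.
# ISO dates in the file names sort chronologically as plain text.
prune_backups() {
  dir=$1
  keep=$2
  ls -1 "$dir"/open-webui-*.tar.gz 2>/dev/null | sort -r |
    awk -v keep="$keep" 'NR > keep' |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}

# Hypothetical helper: unpack an archive back into the open-webui volume.
# Stop the container first so nothing writes to the volume mid-restore.
restore_backup() {
  archive=$1   # absolute path to a tarball made by the backup command
  docker run --rm \
    -v open-webui:/data \
    -v "$(dirname "$archive")":/backup alpine \
    tar xzf "/backup/$(basename "$archive")" -C /data
}

# Example: keep the 14 most recent archives on the NAS mount.
#   prune_backups /mnt/nas/open-webui 14
```

It is worth running a restore into a throwaway volume once, so the first real restore is not also the first test.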
Why $0/month and fully yours beats subscriptions
Run the numbers. A cloud AI subscription is roughly $20/month per person — about $240/year, and more for a couple or a family. A capable used GPU or a mini PC built for local AI pays for itself inside a year or two, then runs at the cost of electricity. But the economics are only half the case.
The other half is ownership and privacy. When the chatbot runs on your hardware, your conversations never leave the building. There’s no provider retention policy, no training-on-your-data question, no account that can be suspended, no model that quietly gets more restrictive in the next update. We get into what “private” really means at the engine level in is Ollama really private, and into why hosted assistants refuse or filter requests in why cloud AI censors you. The short version: a local stack answers to your rules, stays available offline, and can’t be changed out from under you. That’s worth more than the saved $240.
| | Self-hosted (this stack) | Cloud subscription |
|---|---|---|
| Monthly cost | ~$0 (electricity) | ~$20/user |
| Data location | Your machine | Provider servers |
| Works offline | Yes | No |
| Per-user cost to add | Free | Another seat |
| Who sets the limits | You | The provider |
The turnkey local companion option (Ember)
The stack above is the right project if you enjoy the homelab — services, proxies, a tailnet, the satisfaction of a box that’s entirely yours. It’s also a few evenings of work, and Open WebUI is a general assistant, not a character with memory and a personality.
If what you actually want is a private AI companion that runs locally with none of the assembly — same privacy posture, same on-your-machine inference through Ollama, but a finished app with persistent memory and a real personality instead of a config file — that’s exactly what Ember is built to be. It’s a one-time purchase that lives on your computer, so the conversations stay yours the same way this whole guide intends.
You’ve seen how to self-host the open stack; if you’d rather skip straight to a polished local companion on the same foundation, Ember is the turnkey version.
