TL;DR. “Local-first” has become a polite lie in personal AI. Most products marketed as local actually keep the heavy lifting in the cloud and only the cache on your device. Ostler runs every component on a single Mac, the one the customer already owns. This costs more to build, and it is the only honest answer to the privacy promise the category keeps making. There is also a quieter financial point: cloud-LLM inference is heavily subsidised today and is widely expected to get more expensive; local inference costs the customer nothing per query, today or ever.

“Local-first” has become a polite lie.

Read the marketing pages for the current crop of personal AI products and you will find the phrase everywhere. Then read the architecture diagram, when the company is generous enough to publish one, and you will find a quiet round-trip. Your messages are summarised on someone else’s GPU. Your photos are embedded in a managed vector service. Your “memory layer” is a paid API somewhere in us-east-1. The local part is the cache.

That is not local-first. It is local-cache-first. The difference matters, because the part that goes over the wire is the part that matters: your private words, in plain English, being read by a model running on a machine you do not own.

This post is about what we did instead, and the engineering cost of doing it that way.

What single-machine actually means

When we say Ostler runs on a single machine, we mean the obvious thing. Every component that touches your data lives on your Mac. Not “lives on your Mac and also on our servers”. Not “lives on your Mac unless the model is too big”. One Mac. The one you bought. The one in front of you.

Concretely, here is the surface that ships:

  • Qdrant. The vector database. Embeddings of every email, message, and note – indexed for semantic recall. Runs as a local process. Listens on localhost. Talks to nothing else.
  • Oxigraph. The RDF graph store. Structured triples linking people, organisations, topics, dates, places – your knowledge graph as fact, not vibes. Local process. Localhost.
  • Redis. Cache and an internal message bus for the ingest pipeline. Local process. Localhost.
  • Ollama. The model runtime. Hosts the local LLM (Qwen 3.5 9B, about 6.6 GB on disk, comfortable on Apple Silicon) and the embedding model (nomic-embed-text). Inference happens on your machine’s own GPU.
  • Whisper. Speech-to-text for voice notes and call recordings. Local. The audio never leaves.
  • The wiki compiler. Builds your private wiki from the graph. Markdown out, in a folder you can open in Finder. Twenty-one page types, fully rendered locally.
  • The agent runtime. A Rust process that orchestrates retrieval, tool use, and reply. Listens to your channels (iMessage, WhatsApp, email, the iOS app) and routes to the local model.
  • An encrypted SQLite store. User state, preferences, audit log. SQLCipher, on disk, in your home directory.
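
One way to check the “localhost only” claim for the network-facing services above is to confirm that each is configured with a loopback address and to see whether anything answers there. The sketch below is illustrative and is not Ostler’s code: the port numbers are the upstream defaults for each project (Qdrant 6333, Oxigraph 7878, Redis 6379, Ollama 11434), and whether Ostler keeps those defaults is an assumption.

```python
import ipaddress
import socket

# Assumed endpoints: upstream default ports on the loopback interface.
# Ostler's real configuration may differ.
SERVICES = {
    "qdrant": ("127.0.0.1", 6333),
    "oxigraph": ("127.0.0.1", 7878),
    "redis": ("127.0.0.1", 6379),
    "ollama": ("127.0.0.1", 11434),
}


def is_loopback(host: str) -> bool:
    """Return True if host is 'localhost' or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as non-local.
        return False


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report() -> dict:
    """Map each service to (configured on loopback?, currently listening?)."""
    return {
        name: (is_loopback(host), is_listening(host, port))
        for name, (host, port) in SERVICES.items()
    }
```

This checks the configured address only. It does not prove that a process has not also bound to another interface; for that, inspect the listening sockets on the machine directly.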

That is the entire stack. There is no remote inference service. There is no hosted vector database. There is no synchronisation daemon shipping deltas to a backend. Pull the network cable out of the back of your Mac and every one of those services keeps running. The assistant keeps answering. The wiki keeps rendering. The graph keeps growing as new local sources are ingested.
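
For a sense of what “no remote inference service” looks like from the caller’s side, here is a minimal sketch of a request to Ollama’s HTTP API on its default local port. The model tag is a placeholder, and none of this is taken from Ostler’s agent runtime, which is a Rust process; Python is used here only to keep the example short.

```python
import json
import urllib.request

# Ollama's default local endpoint; assumed unchanged here.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
# Placeholder tag: substitute a model you have actually pulled.
MODEL_TAG = "local-model-tag"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request addressed to localhost."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt to the local runtime and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

The only address in the request is a loopback one, so the prompt never reaches a network interface that leads off the machine.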

The only paths off the machine are ones the customer switches on. Public web search, with no personal context attached. Apple’s own iCloud Drive, for customers who want backups stored in the visible filesystem. Each is opt-in, each is named on the settings screen, and all of them are off by default.
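
The opt-in rule can be sketched as a settings object whose every outbound path defaults to off, plus a guard that refuses to build an outbound request unless the matching switch is on. The class, field, and function names below are hypothetical and are not Ostler’s settings schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressSettings:
    """Hypothetical opt-in switches; every path off the machine defaults to off."""
    web_search: bool = False
    icloud_backup: bool = False


def web_search_payload(settings: EgressSettings, query: str, personal_context: str) -> dict:
    """Build the outbound search payload, or refuse if search is not enabled.

    personal_context is accepted only to show that it is deliberately dropped:
    the payload carries the bare query and nothing else.
    """
    if not settings.web_search:
        raise PermissionError("web search is disabled (the default)")
    return {"q": query}
```

With default settings the call raises, so nothing leaves the machine. With `EgressSettings(web_search=True)` the payload contains the query alone.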

The honesty cost

This is harder to build than the hybrid alternative. It is fair to say so out loud.

Reach for a managed cloud model and the inference problem disappears. Someone else runs the data centre, you pay a per-query fee, and you do not have to think about whether the customer’s laptop can keep a sizeable model running. Reach for a hosted vector database and the retrieval problem disappears too. Reach for a graph-as-a-service and so does the relationship modelling problem. Choose all three and you are running a thin macOS shell on top of a stack of paid online services, with marketing copy that hopes nobody reads the network log.

The single-machine path means shipping the runtimes ourselves. Bundling Ollama. Picking a model that fits on a consumer Mac and still produces good tool calls. Running Qdrant as a subprocess of the Hub installer. Versioning Oxigraph. Writing the migrations. Owning every dependency in the supply chain, because there is no managed escape hatch when something breaks. It is more engineering. We think it is the engineering worth doing.

Why the alternatives leak

There are three patterns the adjacent personal-AI category uses today, and each of them leaks the part that matters.

The hybrid pattern. A local model handles “low-level tasks” (summarising, autocomplete, routing) and a remote model handles the meaningful reasoning. openhuman, one of the more honest products in this space, ships exactly this shape: local LLM for the easy stuff, thirty-plus cloud providers bundled for the hard stuff. Their own positioning page is clear about it. The architectural problem with this pattern is that the “hard stuff” is the part that uses your private data. The cheap, local model gets your shopping list. The expensive, remote model gets your therapy transcript. Your hardest, most personal queries are precisely the ones that leave the machine. That inverts the privacy promise.
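
The inversion is easy to see in a toy router. The sketch below is not any vendor’s actual routing logic: the difficulty scores, the threshold, and the two example queries are invented for illustration. It routes on difficulty alone, as the hybrid pattern does, and then lists which sensitive queries end up off-device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    text: str
    difficulty: float  # 0.0 (trivial) to 1.0 (hard); hypothetical score
    sensitive: bool    # does the prompt carry private context?


def hybrid_route(q: Query, threshold: float = 0.5) -> str:
    """Difficulty-based routing: hard queries go to the remote model."""
    return "remote" if q.difficulty >= threshold else "local"


def single_machine_route(q: Query) -> str:
    """Single-machine routing: there is only one destination."""
    return "local"


def sensitive_leaving_machine(queries, route) -> list:
    """Return the text of every sensitive query the router sends off-device."""
    return [q.text for q in queries if q.sensitive and route(q) == "remote"]


# Invented examples mirroring the shopping list and the therapy transcript.
QUERIES = [
    Query("tidy up my shopping list", difficulty=0.1, sensitive=False),
    Query("summarise my therapy transcript", difficulty=0.9, sensitive=True),
]
```

Under `hybrid_route` the therapy transcript is the query that leaves the machine; under `single_machine_route` nothing does.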

The cloud-workspace pattern. A graph database with agents on top, hosted in a SaaS. Cloud-shaped competitors in this category aim at knowledge workers building shared workspaces, not at individuals trying to keep their lives on their own hardware. The architectural problem is that the graph is fed from documents you upload. The product structurally cannot reach the data on your Mac the way Ostler can. To use it, you have to first hand the data over.

The multi-machine local pattern. One box for storage, another for compute, a private network between them, a homelab in your spare bedroom. NovaStation is the impressive example of this pattern: a personal AI command centre split across a MacBook, an iMac, a Dell, and a ThinkPad. The architectural problem is twofold. First, it adds a private network to maintain and secure where none was needed. Second, it asks the customer to do the job of a sysadmin. I ran a two-machine version of this myself for a year before productising. It was a wonderful research environment. It is not what we want most people to live inside.

There is one more cost worth naming, and it is financial rather than architectural. Cloud-model inference is priced today as if compute were a loss leader. Every prompt sent to a hosted model costs the provider real money, and that cost is currently passed on to consumers at heavily subsidised rates designed to build habit. Industry commentary increasingly expects those prices to rise once enough users are committed to a given product. Local inference has no per-query cost. The model runs on a machine the customer already owns; the marginal cost of one more prompt is zero, today and at every renewal date thereafter.
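
The arithmetic behind that point is simple enough to sketch. Every number a caller passes in below is a made-up input, not a real price or a forecast: the function only shows how per-query fees compound if the price is multiplied by some factor each year, against a local per-query fee that stays at zero whatever the volume.

```python
def fees_by_year(queries_per_day: float, price_per_query: float,
                 yearly_multiplier: float, years: int) -> list:
    """Per-query fees paid in each year, with the price scaled once per year.

    yearly_multiplier is a hypothetical knob: 1.0 means flat pricing,
    2.0 means the per-query price doubles every year.
    """
    fees = []
    price = price_per_query
    for _ in range(years):
        fees.append(queries_per_day * price * 365)
        price *= yearly_multiplier
    return fees


def local_fees_by_year(years: int) -> list:
    """Per-query fees for local inference: none are charged in any year."""
    return [0.0] * years
```

For example, `fees_by_year(100, 0.01, 2.0, 3)` models 100 queries a day at an invented starting price of 0.01 per query that doubles annually; the yearly bill doubles with it, while `local_fees_by_year(3)` stays at zero.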

Single-machine sidesteps all three architectural patterns above, and the cost trajectory besides. There is no remote inference, so there is no inversion of the privacy promise. There is no cloud workspace, so there is no upload step. There is no homelab network to maintain, so there is no extra attack surface and no sysadmin tax.

Single source of truth

There is a quieter benefit to the single-machine shape that does not show up in the marketing.

When there is one machine, there is one copy of your data. Your memory is the disk in front of you, and nothing else. There is no sync conflict. There is no eventual-consistency window. There is no “the laptop says one thing and the cloud says another, which one is current?” The disk is the answer.

This sounds trivial until you have lived in the other world. Anyone who has tried to keep notes in sync across three devices, or has had a calendar event silently un-cancel itself because two replicas disagreed, knows the cost of distributed truth. Personal data is uniquely sensitive to this class of bug. The right inference on stale memory is still a wrong answer.
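
The un-cancelled calendar event is a classic outcome of a naive last-writer-wins merge. The sketch below is a generic illustration of that failure, not a description of any particular product’s sync engine; the type, the timestamps, and the scenario are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventVersion:
    status: str        # "scheduled" or "cancelled"
    written_at: float  # wall-clock timestamp reported by the writing device


def last_writer_wins(a: EventVersion, b: EventVersion) -> EventVersion:
    """Naive merge: keep whichever version carries the later timestamp."""
    return a if a.written_at >= b.written_at else b


# The phone cancels the event. The laptop, still holding a stale copy,
# saves an unrelated edit slightly later and writes "scheduled" back.
phone = EventVersion(status="cancelled", written_at=100.0)
laptop = EventVersion(status="scheduled", written_at=105.0)

merged = last_writer_wins(phone, laptop)
```

Here `merged.status` is `"scheduled"`: the cancellation is silently lost. With a single machine there is one writer and no merge step, so this class of bug has nowhere to occur.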

One machine. One copy. One source of truth. The system can be wrong about what you wanted, but it cannot be confused about what it has.

The close

The world does revolve around you.™ So should your knowledge.

Not your knowledge in someone else’s data centre, helpfully cached on your laptop. Not your knowledge in a hybrid pipeline that whispers the sensitive bits over the wire. Your knowledge, on your machine, where you can pull the cable out and watch it keep working.

That is the bar. Single machine. Single customer. Single source of truth. Everything else is local-cache-first wearing a costume.

Questions, corrections, disagreements – [email protected].