This list will rot. We're publishing it anyway, because the question comes up every time we sit down with a new client, and a frozen snapshot is more useful than a vague answer. Caveat applied. Onward.
For writing and thinking.
For everyday writing, drafting, and thinking-out-loud, we use a mix of ChatGPT and Gemini. ChatGPT for short-form drafting and quick reframes; Gemini when we want a second opinion or a different angle on the same prompt. Watching where they disagree is often more useful than either answer on its own. For structured tasks — extraction, classification, anything where we know the schema — whichever model is cheapest and fast enough usually wins. Quality differences flatten when the problem is well-shaped.
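What "well-shaped" looks like in practice: the schema does the heavy lifting, and the model just fills it in. A minimal sketch, assuming a hypothetical `call_model` function standing in for whichever cheap endpoint wins; the fixed label set and the strict validation are the point, not the provider.

```python
import json

# Illustrative schema: a fixed label set and a strict validator.
# Nothing here is tied to any particular model or vendor.
LABELS = {"bug", "feature_request", "question"}

def validate(raw: str) -> dict:
    """Reject any model output that doesn't match the schema exactly."""
    out = json.loads(raw)
    if set(out) != {"label", "confidence"}:
        raise ValueError("unexpected fields")
    if out["label"] not in LABELS:
        raise ValueError(f"unknown label: {out['label']}")
    if not 0.0 <= out["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return out

def classify(ticket: str, call_model) -> dict:
    """call_model is whatever cheap, fast endpoint you've wired up."""
    prompt = (
        "Classify the ticket. Reply with JSON only: "
        f'{{"label": one of {sorted(LABELS)}, "confidence": 0..1}}\n\n'
        + ticket
    )
    return validate(call_model(prompt))
```

Once the problem is pinned down this tightly, swapping the model behind `call_model` is a one-line change, which is exactly why the cheapest adequate one wins.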
For code.
Our default is Claude Code running on Opus, with a long context window. We have it open in a terminal alongside the editor most of the day. We've stopped using autocomplete-style assistants as the primary interface — the agentic loop, where the tool reads files, runs the tests, and iterates, is meaningfully better than completion-by-completion. The work has shifted from writing code to specifying intent.
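The agentic loop is simple enough to sketch. This is the shape of the loop, not Claude Code itself: `propose_patch` and `apply_patch` are stand-ins for the model call and the file edit, and the test command is whatever your project runs.

```python
import subprocess

def agent_loop(propose_patch, apply_patch, test_cmd, max_iters=5):
    """The shape of an agentic coding loop: run the tests, feed the
    failure output back to the model, apply its patch, repeat until
    green or out of budget. propose_patch and apply_patch are
    placeholders for the model call and the file edit."""
    for _ in range(max_iters):
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass; stop iterating
        patch = propose_patch(result.stdout + result.stderr)
        apply_patch(patch)
    return False  # budget exhausted without a passing run
```

The contrast with autocomplete is visible in the signature: the tool owns the run-observe-edit cycle, and your job collapses into specifying what "tests pass" means.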
For research, voice, and sensitive inputs.
For research and reading, we lean on Gemini. The web integration is strong, the long context handles whole reports without summarization games, and the citations tend to land where they should. For sensitive inputs, we drop to a smaller local model — slower and worse, used anyway when the data shouldn't leave the machine.
For voice, transcription, and meeting notes, we've cycled through three tools in a year and don't have a strong recommendation. Pick anything modern. The category is unsettled and the differences are small.
What we've stopped using.
Prompt-engineering platforms that wrap a chat interface in extra UI. Most no-code AI builders. Agent frameworks heavier than a few hundred lines of code. The middle layer between "the model" and "your application" keeps thinning. The frameworks that solved real problems eighteen months ago solve smaller problems now, and adding them increases your switching cost without buying you much.
What surprised us.
Cost is no longer a meaningful constraint for most of what we build. A long agent run that would have been a budget conversation two years ago is now a rounding error. We've also been surprised by how much of the work is plumbing — context windows, retrieval, evaluation, observability — and how little of it is "the prompt." The interesting work moved away from the model.
A note on commitment.
We change tools quickly when something better shows up. Anything on this list could be replaced in a quarter. The principle that stays steady: pick the smallest stack you can defend, expect to swap pieces of it often, and don't fall in love with any provider.
If we update this list, you'll see it. If it's been more than six months since the date at the top, assume the specifics are stale and the principles aren't.