What's Next, LLM Token Experiments, & When AI Magic Actually Works
2025-WK44: New chapter at 🤗, building format comparison tools, and learning what's production-ready
Life update: I’m joining Rocket Money as VP of AI Engineering! 🥳
You can read more about it on my blog, but the tl;dr is:
I’m leading a new engineering group that’s building with AI at Rocket Money.
I start Monday, November 3rd, 2025.
Most of the Plumb team is also joining! 🚀 🚀 🚀
I’m maintaining my 10+ year streak of working remotely.
What I’ve Been Up To
Friday (10/31) was my last day of shutting down Murmur Labs/Plumb. It was also the last week of the “pseudo-sabbatical” I took while figuring out what was next.
I started a Discord server for AI-forward thinkers exploring how AI transforms the way we work, build, and create. If you’re interested, reach out to me at stomatiq@crca.fyi.
I did a LOT of onboarding-type things this week, and I’m glad to say I’m officially starting at Rocket Money on Monday.
What I’ve Been Writing
This week was a low-velocity writing week, but I built a playground for exploring LLM token efficiency.
AI Coding & Development Tools - A comprehensive guide to AI-powered coding tools for local development.
Tokenization Experiment: Format Comparison - I wanted a way to compare LLM token efficiency across CSV, JSON (pretty/compressed), YAML, and TOON formats, so I built this little playground.
Running Claude Agents in Parallel with Git Worktrees - Learn to use git worktrees to run multiple Claude agents on separate features without conflicts.
What I’ve been thinking about
I’ve been thinking a lot about reliability this week. Not the boring kind—the kind that matters when you’re deciding whether to build with AI or just build around it.
There’s so much hype around agents and MCP right now, but after spending the week stress-testing these tools against actual workflows, I keep coming back to the same question: when does the magic actually work, and when are we just making things harder than they need to be?
This week taught me that the gap between “this is cool” and “this is ready for production” is where the real work lives.
Why companies are obsessed with browsers
Someone asked, “Why are AI companies so obsessed with browsers?”
The most important thing you can know about AI companies (particularly the ones building foundation models) is that the “moat”1 these companies have is largely their data differentiation.
Building a browser gives a company access to private data that most people wouldn’t otherwise connect to ChatGPT or Claude, along with access to user intent.
Capturing user intent lets these companies differentiate not just by having the data, but by discovering patterns in how we traverse the internet.
I’m a little bit suspicious
On Sunday, someone shared “Meet TOON, the Token‑Oriented Object Notation.” It was originally pitched as a superior alternative to JSON (it has 6k stars on GitHub), but I was a little suspicious.
So I ended up building a Format Tokenization Exploration to demonstrate that the only place it’s more efficient than JSON is uniform tabular data; the playground includes the full comparison table.
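If you want to reproduce the gist of the experiment, here’s a minimal sketch. It assumes tiktoken for token counting and PyYAML for the YAML case; the `to_toonish` encoder is my own rough stand-in for TOON’s tabular layout, not the official library.

```python
# A minimal sketch of the format comparison (pip install tiktoken pyyaml).
# The TOON encoding below is a hand-rolled approximation for illustration,
# not the official TOON library.
import csv
import io
import json

import tiktoken
import yaml

rows = [
    {"id": 1, "name": "Ada", "role": "engineer"},
    {"id": 2, "name": "Grace", "role": "admiral"},
    {"id": 3, "name": "Alan", "role": "mathematician"},
]

def to_csv(data):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)
    return buf.getvalue()

def to_toonish(data):
    # Rough approximation of TOON's tabular form: a header declaring the
    # row count and field names, then one comma-joined line per record.
    fields = list(data[0].keys())
    header = f"rows[{len(data)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(r[f]) for f in fields) for r in data]
    return "\n".join([header, *lines])

formats = {
    "json-pretty": json.dumps(rows, indent=2),
    "json-compact": json.dumps(rows, separators=(",", ":")),
    "yaml": yaml.safe_dump(rows),
    "csv": to_csv(rows),
    "toon-ish": to_toonish(rows),
}

# Count tokens for each serialization with the same encoding.
enc = tiktoken.get_encoding("cl100k_base")
for name, text in formats.items():
    print(f"{name:>12}: {len(enc.encode(text)):>4} tokens")
```

On uniform rows like these, the tabular encodings come out leanest; make the records ragged or nested and the advantage over compact JSON disappears, which is the pattern the playground demonstrates.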
It’s so important to have the ability and desire to verify claims people make on the internet. After I shared this with the author of TOON, he updated the README to reflect my findings, and I really appreciate his willingness to receive feedback.
How could Apple miss AI?!
Shaan Puri tweeted:

> What the hell is Apple doing?
> - Failed to make a car
> - No AI investments
> - Siri still sucks
> - Just releasing the same phone over and over again
>
> How can you miss AI? How is that even possible?
My answer: Apple didn’t miss AI.
I don’t know what Apple is up to, but in most of my conversations about AI over the past two weeks, what keeps coming up for me is that Apple has been quietly shipping local AI processors all along, in the form of Apple Silicon chips in their devices.
The day they figure out how to give users access to a really good local LLM, they’ll have something OpenAI and Anthropic don’t: free GPUs for inference.
Hyperlinks and Notes
A list of things I’ve found on the internet.
Switch to Amp from Claude Code ↗ - I’ve been using Amp Code for a lot of my work lately and asked Quinn Slack how Claude Code and Amp map to each other. This is a great guide for that.
Continuous Planning in Linear ↗ - I love this piece from Linear on continuous planning. It replaces the quarterly scramble with an always-current backlog of candidate projects. They’re continuously triaged (now with AI) and ready to prioritize by impact and goals. It’s interesting because it eliminates blank-page whiplash and maintains a seamless flow from discovery to planning to execution through a single project-centric workflow.
How to use Claude Code like the people who built it ↗ - A short episode (from the folks at Every) explores practical techniques, workflows, and real-world tips to get more from Claude Code. It’s interesting because you learn builder-level methods to improve productivity and write better software with AI.
A Power-user guide to Obsidian Personal Knowledge Management ↗ - A pragmatic Obsidian setup: minimal folders, consistent front‑matter properties (source, type, status), and lightweight templates, to keep notes, tasks, and daily logs organized without rigid schedules. It’s interesting because it shows how to build a durable, markdown-first workflow that scales with plugins like Bases and TaskNotes while avoiding shiny‑tool rabbit holes. Similar to my Obsidian Setup.
Cartesia Sonic-3 is a new text-to-speech model that “sounds human,” with a way to express laughter and emotion in real time. I’ve seen claims that it’s more realistic than ElevenLabs, but my testing so far hasn’t convinced me: it has a “laughter” non-verbalism, but I haven’t gotten it to sound natural.
Speaking of ElevenLabs, I recently explored their sound effects generator with a friend. It’s a really neat way to create sound effects, and we’re planning to use the soundboard for a tabletop RPG experience.
The Big Pattern
This week I decided to make a big career move.
The transition has me doing something I didn’t expect: stress-testing everything I believed about building with AI.
All week I’ve been poking at agents, playing with MCP, and watching where the hype diverges from what actually works in production.
On Thursday I tried turning one of our Plumb flows into a Claude Code agent/skill and it confirmed my suspicions: agents are inconsistent and unreliable compared to a deterministic workflow.
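To make that contrast concrete, here’s a minimal sketch of what I mean by a deterministic workflow: explicit steps, validated outputs, and the same input producing the same path every time. The step functions are hypothetical placeholders for illustration, not actual Plumb nodes.

```python
# A minimal sketch of a deterministic workflow: fixed steps, validated
# outputs. The step functions are hypothetical placeholders, not real
# Plumb nodes.
from dataclasses import dataclass

@dataclass
class Receipt:
    merchant: str
    amount_cents: int

def extract_fields(raw: str) -> Receipt:
    # Deterministic parsing: same input, same output, every time.
    merchant, amount = raw.rsplit(" ", 1)
    return Receipt(merchant=merchant, amount_cents=int(float(amount) * 100))

def validate(receipt: Receipt) -> Receipt:
    # Failures surface loudly at a known stage.
    if receipt.amount_cents <= 0:
        raise ValueError(f"bad amount for {receipt.merchant}")
    return receipt

def categorize(receipt: Receipt) -> str:
    # A fixed lookup table, not a model deciding on the fly.
    table = {"Blue Bottle": "coffee", "AWS": "infrastructure"}
    return table.get(receipt.merchant, "uncategorized")

def run_pipeline(raw: str) -> str:
    # The step ordering is fixed in code, not chosen at inference time.
    return categorize(validate(extract_fields(raw)))

print(run_pipeline("Blue Bottle 6.50"))  # -> coffee
```

An agent asked to do the same job decides its step ordering at inference time, so two runs on identical input can take different paths; that variance is exactly what made the agent/skill version feel unreliable.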
That insight is becoming the north star for how I’ll approach AI in this next chapter.
I shipped a lot through the transition: a number of articles and playground-style projects.
I wrote LinkedIn posts on building with AI rather than around it, plus a number of Twitter threads and articles about leveraging AI to augment how we work.
The work hasn’t stopped, but the context around it has shifted completely.
At Plumb I could feel every decision. I built the thing with my hands.
Now I’m moving into a role where I need to understand the existing landscape and establish standards and direction for others to build with.
This week was the intellectual work of translating “what I learned building” into “what we should build next”.
It became a week of figuring out which parts of the startup playbook scale and which parts need complete rethinking.
Next week the real work starts: turning this week’s insights into something a team can build with.
Stay Curious,
Chase
1. Moat, in software, is “a sustainable competitive advantage that protects a company’s long-term market share and profitability from competitors.”