Fable 5 is Terrifyingly Good

It is getting late here in Argentina on this Tuesday night, but I have spent the better part of the last six hours absolutely glued to my screen. I did not get the fancy early-access codes for today’s drop. I logged in like everyone else this morning to find that the landscape had shifted again.

Anthropic just released Fable 5. My initial verdict after running it through a gauntlet of complex development tasks is that this thing is an absolute monster. It is slow. It is expensive. It is also happily churning through every ridiculous, convoluted problem I can think to give it. As is usually the case when we get a new frontier model, the actual challenge is figuring out what it cannot do.

Let us get the raw specifications out of the way first.

Anthropic claims that Fable 5 performs on par with another new model dropping today called Mythos 5. The difference is entirely about corporate safety. Fable has incredibly strict guardrails designed to stop it from doing anything remotely harmful. These guardrails are apparently so sensitive that the API includes brand-new mechanisms to warn developers when they hit a wall. There is even a new automated fallback feature you can trigger if your request gets rejected, allowing your application to smoothly downgrade to a different model without crashing.
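The fallback idea is easy to picture from the client side. Below is a minimal sketch of the pattern, not Anthropic's actual mechanism: I have not reproduced the new API fields, so `RefusalError`, the `call_model` callback, and the model names used with it are all hypothetical placeholders. The caller tries models in order and downgrades whenever one refuses.

```python
from __future__ import annotations

from typing import Callable, Sequence


class RefusalError(Exception):
    """Hypothetical: raised when a model's safety classifier rejects a request."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order, downgrading to the next one on a refusal.

    Returns (model_used, response_text). If every model refuses, the last
    refusal is re-raised so the caller still sees why the request failed.
    """
    if not models:
        raise ValueError("need at least one model")
    last_refusal: RefusalError | None = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except RefusalError as refusal:
            last_refusal = refusal  # keep the reason, then try the next model
    raise last_refusal
```

Only refusals trigger a downgrade; any other error (network failure, bad request) propagates immediately, since silently switching models on those would hide real bugs.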

Mythos 5 is the exact same brain but without the hall monitors. Anthropic explicitly states it shares all the capabilities of Fable without the safety classifiers getting in the way.

Both versions boast a massive one million token context window. They cap out at 128,000 output tokens. The knowledge cutoff is relatively fresh, sitting at January 2026.

Then we get to the price. Using these tools will cost you exactly twice as much as you were paying for the Opus 4 series. You are looking at ten dollars for every million input tokens and fifty dollars for every million output tokens. They are not charging a premium for utilizing the deep ends of that context window, but at those base rates, your wallet is going to feel the burn regardless. The upgrade documentation is surprisingly light, but the price tag speaks volumes.
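The arithmetic is simple, but it helps to see what the extremes look like. Here is a quick calculator using the rates above; the constant names are mine, and it assumes plain list prices with no caching or batch discounts of any kind.

```python
# Fable 5 list prices quoted above, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def fable_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Flat per-token pricing: no long-context surcharge, so cost is linear."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    # Worst case for one request: a full 1M-token prompt plus the 128,000-token output cap.
    print(f"${fable_cost_usd(1_000_000, 128_000):.2f}")  # $16.40
```

In other words, a single request that fills the whole context window and maxes out the output cap costs about $16.40 before you have done anything else.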

The best way I can describe using Fable is that it just feels incredibly heavy. I do not mean that purely in terms of latency or the hit to your bank account. I mean it in terms of how much raw information is sitting inside its brain.

Nobody publishes official parameter counts anymore. We are all left reading the tea leaves to figure out how big these models actually are. One of the best proxies we have for model size is obscure trivia. If a network has enough space to permanently memorize highly specific, localized details about the world, it usually means it has an absurdly high parameter count.

I decided to test its memory against Opus 4.8. I asked both systems to list out all of Simon Willison's open source projects in reverse chronological order, complete with rough release dates. I even left a typo in the prompt just to see how they handled it. Neither model had search access turned on.

Opus 4.8 gave me what we have come to expect from the previous generation. It started with a polite disclaimer about not having a reliable, verified list and not wanting to fabricate entries. It then gave me a decent but short rundown of the major hits. It named LLM from around 2023, Datasette from 2017, sqlite-utils, and Django. It vaguely mentioned that the developer maintains a large ecosystem of plugins and smaller utilities. It was accurate but shallow.

Fable 5 did not flinch. It politely corrected my typo, noted that a truly complete list was impossible due to the sheer volume of repositories, and then proceeded to spit out an incredibly granular, timeline-perfect catalog.

It started with files-to-prompt in April 2024 and a Datasette extraction plugin from the same year. It correctly identified the LLM library drop in mid-2023 along with the explosion of specific plugins that followed. It pulled up symbex and strip-tags. It knew about the browser-based WebAssembly version of Datasette from May 2022. It caught the shot-scraper tool and the S3 credentials CLI. It traced the entire Dogsheep analytics suite back to 2019. It even remembered early, obscure tools like soupselect from the late 2000s and accurately cited the Django creation era at Lawrence Journal-World between 2003 and 2005.

I should note that GPT-5.5 actually listed even more projects when I ran the same test, but the precision from Fable was striking.

In the past, I have argued that I do not actually care if a model memorizes the encyclopedia. I usually just want an AI that can manipulate text and logic cleanly while using external search tools to find facts. I did not want factoids baked into the weights.

But this kind of deep retention changes the calculus. The ability to recall a hyper-specific Python utility from 2008 means the model has a massive surface area. You can cram an unbelievable amount of structural understanding into a network that big. If a coding assistant has an encyclopedic grasp of modern library patterns, deprecated functions, and historical software architecture, it is going to crush complex refactoring tasks much faster than a smaller model that has to constantly run web searches to understand what it is looking at.

Given the speed, the pricing tier, and this bizarrely deep knowledge retrieval, I am fairly confident Fable is massive. It might be the largest commercial model currently available anywhere.

Anthropic pushed Fable 5 out across their entire product ecosystem today. You can access it through the web chat interface, their coding extensions, the CLI tool, and the Cowork platform. If you are paying for the premium subscription tiers, like the hundred-dollar-a-month Max plan I use, you get access to the model until June 22nd. After that cutoff, it becomes a strictly metered, pay-as-you-go affair.

People still fundamentally underestimate the web interface. Since the fall of 2025, every single chat session on the site has been backed by a full container environment. It is not just a text box anymore. It is a secure sandbox that can execute scripts, install third-party packages, and clone remote repositories directly from the internet.

I decided to put this environment to the test with a side project I had been kicking around. Last week, I published a Python library designed to run a custom build of MicroPython inside WebAssembly. It acts as a sandbox for executing untrusted code. My goal for tonight was to see if Fable could figure out how to upgrade that setup from MicroPython to a full standard Python installation.

I gave it a single starting instruction. I told it to clone the repository from GitHub and research how we could transition to a full Python environment.

Fable immediately identified the correct path forward. It realized we needed to use specific custom builds maintained by Brett Cannon. The only hiccup was that the built-in container restrictions prevented the model from downloading those specific files directly from the source.

I stepped in manually, downloaded the two zip archives it needed, and uploaded them directly into the chat window.

The model took over from there. It spent several minutes just churning through operations inside its sandbox. It actually narrated its architectural decisions as it went. It tried to use a clean, single-zip approach to minimize the filesystem footprint, but it realized the core bootstrap process was failing to locate encoding files without some very messy path manipulations. It decided to pivot to a directory-preopen approach because it was more reliable for a proof of concept.
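For anyone who hasn't run into it, the single-zip approach leans on a stock CPython feature: a `.zip` file placed on `sys.path` is imported from by the built-in zipimport hook. The sketch below shows only that user-level mechanism, with a made-up module name; it does not reproduce the WebAssembly build. The startup problem Fable hit is different in kind, because CPython has to import the `encodings` package while the interpreter is still initializing, before any code like this can run.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path
from types import ModuleType


def write_module_zip(zip_path: Path, module_name: str, source: str) -> Path:
    """Pack a single .py module into a zip archive (a stand-in for a zipped stdlib)."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(f"{module_name}.py", source)
    return zip_path


def import_from_zip(zip_path: Path, module_name: str) -> ModuleType:
    """Import a module straight out of a zip via CPython's built-in zipimport hook."""
    entry = str(zip_path)
    sys.path.insert(0, entry)  # a zip file on sys.path is handled by zipimport
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(entry)  # the imported module stays cached in sys.modules


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = write_module_zip(Path(tmp) / "lib.zip", "zipped_hello", 'GREETING = "loaded from a zip"\n')
        print(import_from_zip(archive, "zipped_hello").GREETING)
```

The directory-preopen alternative sidesteps all of this by exposing a real directory tree to the runtime, at the cost of shipping many loose files instead of one archive.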

I decided to push it harder. I told it to go back and solve the single-zip problem. I wanted a highly specific output. I asked it for a complete wheel file that contained the entire system. I wanted the wrappers, the WebAssembly binaries, and the full standard library packed into one artifact so I could run a simple command-line execution and instantly spin up a sandbox.

A little while later, it handed me a perfectly packaged fourteen-megabyte wheel file. I tested it locally. It worked flawlessly. It took a convoluted, incredibly frustrating environment configuration problem and solved it in less time than it takes me to drink a cup of coffee.

That was just the warm up. The real test came when I switched over to the local CLI environment.

Before I even realized today was release day, my main goal had been to add a specific feature to one of my agent applications. I wanted the software to be capable of pausing right in the middle of executing a tool call, pinging the human user for approval, and then seamlessly resuming its workflow once permission was granted.

This is a notoriously annoying problem to solve. Managing state across asynchronous agent loops while keeping the language model's context window perfectly aligned requires some serious structural gymnastics. It felt like the exact kind of meaty, complex task Fable was built for.

Over the course of the afternoon, Fable completely dismantled the problem. But it did not stop at adding the feature to my application. It looked at the foundational library powering the entire system, identified four separate architectural flaws that made pausing difficult, and decided to fix those too.

Initially, it got the feature working by using some truly ugly workarounds. The code was functional but brittle. However, the exact moment I told the model that we were allowed to modify the underlying core library instead of just hacking the client application, its entire approach shifted. It systematically unraveled its own workarounds and started building proper, documented features directly into the core library.

By the end of the session, my little stretch goal had mutated into a major software release. Fable wrote almost the entirety of the update.

It engineered a system where tool implementations could declare a specific parameter to receive their own invocation data directly. It solved a massive headache by guaranteeing a unique ID for every single tool call, automatically synthesizing one if the backend provider failed to supply it. It created a custom exception class designed specifically to pause a chain of execution cleanly, ensuring that no phantom model calls were made with placeholder data while waiting for human input.
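The three pieces in that paragraph fit together roughly like this. This is my own hypothetical reconstruction for illustration, not the library's code: `ToolCall`, `PauseForApproval`, `invoke`, and the convention that a tool opts in by declaring a parameter named `tool_call` are all names I made up.

```python
from __future__ import annotations

import inspect
import uuid
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]
    tool_call_id: str | None = None  # some providers omit this


class PauseForApproval(Exception):
    """Hypothetical: raised by a tool to halt the chain until a human responds."""

    def __init__(self, tool_call: ToolCall, reason: str):
        super().__init__(reason)
        self.tool_call = tool_call


def ensure_id(call: ToolCall) -> ToolCall:
    """Guarantee every call has an ID, synthesizing one if the provider skipped it."""
    if not call.tool_call_id:
        call.tool_call_id = f"synth-{uuid.uuid4().hex}"
    return call


def invoke(tool: Callable[..., Any], call: ToolCall) -> Any:
    ensure_id(call)
    kwargs = dict(call.arguments)
    if "tool_call" in inspect.signature(tool).parameters:
        kwargs["tool_call"] = call  # the tool opted in to seeing its own invocation
    return tool(**kwargs)


def delete_file(path: str, tool_call: ToolCall) -> str:
    """Example tool that always asks for approval before doing anything."""
    raise PauseForApproval(tool_call, f"Approve deleting {path}?")
```

Because the pause is an exception, the loop that would have sent a tool result back to the model never reaches that step, so no model call goes out carrying placeholder data.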

It completely overhauled the failure semantics for concurrent tool execution. It ensured that asynchronous tasks running side by side would always run to completion before a pause state was allowed to propagate up the chain. It built a mechanism to resume a chat history that ended in unresolved calls, cleverly skipping anything that already had a registered result. It even found and fixed a silent bug in the asynchronous executor that was dropping calls to non-existent tools without throwing an error.
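The concurrency and resume rules can be sketched in the same spirit. Again, every name here is hypothetical, and the dict-based call format is an assumption chosen to keep the example short. The key ingredient is `asyncio.gather(..., return_exceptions=True)`, which lets sibling tasks finish before a pause is raised.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

ToolFn = Callable[..., Awaitable[Any]]


class PauseForApproval(Exception):
    """Hypothetical: raised by a tool that must wait for a human decision."""


class UnknownToolError(LookupError):
    """Raised instead of silently dropping a call to an unregistered tool."""


async def run_tool_calls(
    calls: list[dict[str, Any]],
    tools: dict[str, ToolFn],
    results: dict[str, Any],
) -> dict[str, Any]:
    """Execute every unresolved call concurrently and record results by call id.

    Each call is a dict with "id", "name" and "arguments". Calls whose id is
    already in `results` are skipped, which is what makes resuming a history
    that ended in unresolved calls safe.
    """
    pending = [call for call in calls if call["id"] not in results]
    for call in pending:
        if call["name"] not in tools:
            raise UnknownToolError(call["name"])

    async def run(call: dict[str, Any]) -> Any:
        return await tools[call["name"]](**call["arguments"])

    # return_exceptions=True: every sibling runs to completion before we inspect failures.
    outcomes = await asyncio.gather(*(run(call) for call in pending), return_exceptions=True)

    pause: PauseForApproval | None = None
    errors: list[BaseException] = []
    for call, outcome in zip(pending, outcomes):
        if isinstance(outcome, PauseForApproval):
            if pause is None:
                pause = outcome
        elif isinstance(outcome, BaseException):
            errors.append(outcome)
        else:
            results[call["id"]] = outcome  # finished work is kept even if we pause
    if errors:
        raise errors[0]
    if pause is not None:
        raise pause
    return results
```

Calling it once with a fresh `results` dict records the calls that finished and then raises the pause. After the human decides, the caller stores a result for the paused id and calls it again; only the still-unresolved calls run.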

I spent several hours reviewing what it produced. The quality of the API design is staggering. The test coverage is thorough. The documentation it generated is clear and concise. It achieved in one afternoon what would have easily taken me several days of focused, uninterrupted engineering time.

Which brings us to the final, unavoidable topic. The cost of doing business.

I have been using a local tracking tool recently to keep an eye on my API usage across different coding environments. Earlier in the day, I had to write a quick script to update the tool so it could comprehend Anthropic's new pricing tiers. I suspect that particular update will be rendered obsolete very soon, but I needed to know how much money I was actually burning.

After I wrapped up the massive architecture refactor, I booted up the local server to check the damage.

The dashboard generated a massive treemap showing my activity across various projects. In just under six hours of development time, I had consumed slightly over one hundred and ten dollars worth of tokens.
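A tracker like this has to price each usage record by model and then roll the totals up per project, which is what the treemap tiles are sized by. Here is a rough sketch of that rollup, not the tool's actual code. The record fields and model labels are hypothetical, and the Opus rates are simply half of Fable's, taken from the "twice as much" comparison above rather than from a price sheet.

```python
from __future__ import annotations

from collections import defaultdict

# USD per million tokens: (input, output). Model labels are hypothetical.
PRICES = {
    "fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),  # derived: half of Fable's rates
}


def record_cost(record: dict) -> float:
    # An unknown model raises KeyError instead of silently costing zero.
    input_rate, output_rate = PRICES[record["model"]]
    return (record["input_tokens"] * input_rate + record["output_tokens"] * output_rate) / 1_000_000


def cost_by_project(records: list[dict]) -> dict[str, float]:
    """Roll usage records up into per-project dollar totals."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["project"]] += record_cost(record)
    return dict(totals)
```

Failing loudly on an unpriced model is deliberate: a tracker that quietly reports zero for a brand-new model is exactly the failure that forced me to update mine.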

Because I am currently on the premium monthly plan, this specific binge was technically covered under my subscription limit. Anthropic is eating the cost for a few more weeks. But once that June cutoff hits, this level of rapid, automated software development is going to require a very real operational budget.

If you view this technology purely as an API cost, the numbers look terrifying. Burning a hundred dollars in an afternoon on automated text generation sounds absurd.

But if you view it as contracting a senior software engineer who can jump into an unfamiliar codebase, debug WebAssembly execution paths, redesign core library interfaces, write comprehensive test suites, and deliver production-ready releases in an afternoon, it is the most ridiculously underpriced labor in the history of the industry.

Fable 5 is slow. It is expensive. But right now, it is playing an entirely different game than anything else on the market.