Mourning a Model

For a few months this year, I had a favourite LLM model. I don’t normally get attached to a model. They are tools. You pick the one that fits the job, and when a better one turns up you put the old one down without ceremony.

It was cheap, fast, slightly obscure, and open-source, and it quietly did most of the thinking across half my side projects while asking for almost nothing in return. I built it into things. I leaned on it.

Then one day it was gone. No notice, no deprecation warning, no farewell. Just an endpoint that used to answer and now didn’t. I found out the way you find out about all the important things, too late and by accident.

This is partly a warning about that. But it’s mostly a thank-you note to a model nobody else seems to have noticed.

How I found it

It arrived under a false name, which should probably have told me something about how it would leave.

Back in March two odd entries appeared on OpenRouter, the place I go to try models without committing to any of them. They had codenames instead of proper names, Hunter Alpha and Healer Alpha. No documentation, no announcement, nobody owning up to having made them. For a few days the going theory was that one of the big Chinese labs was stealth-testing its next flagship.

I did what I always do with anything new, which is throw a pile of awkward jobs at it and watch for the cracks. Output has to come back perfectly formatted on the first try, and reasoning cannot be faked with a confident tone.

The answers came back clean, properly formed, no fuss wrapped around them. The reasoning followed the actual shape of the problem instead of pattern-matching its way to a plausible shrug. I noted that it was unusually good, and then got distracted and forgot about it entirely.

It came back to me weeks later, when I was knee-deep in one of those systems that hands a problem to several models at once and lets them argue it out. I remembered those two strange codenames had been better than they had any right to be, and went looking for whatever they had grown into. That is how I found MiMo, Xiaomi’s open-source reasoning family. And it turned out the one I actually fell for was a smaller, cheaper cousin nobody was talking about: MiMo V2 Flash.

Why I kept it

Flash was not the powerful one. It was the one that fit.

Most of my projects do not want a single brilliant oracle. They want a reliable workhorse they can ask the same kind of question a few hundred times a day without drama and without quietly emptying my wallet. Flash was exactly that. It was fast enough that I stopped noticing the wait. It was cheap to the point of being a rounding error. And it was honest in a way that does not show up on a spec sheet: it did the thing I asked, in the shape I asked for, without the little rebellions. I have lost more hours than I will admit to models that simply cannot stop being helpful. Flash just answered.

There is a particular pleasure in finding something good before anyone else is talking about it. I went looking for other people running MiMo in anger and reporting back, and found essentially nobody. For a while it felt like a private discovery, which is an absurd thing to feel about a model shipped by one of the largest electronics companies on the planet, but there it is.

The morning it was gone

I cannot tell you why it went, because I do not know. Endpoints come and go. A provider drops a model, a route gets pulled, a free tier quietly closes, a name gets folded into its successor. From the outside it all looks identical: the thing that answered yesterday returns an error today, and the error does not explain itself.

The systems that lean on Flash did not fall over in a satisfying, alarm-raising heap. They just got a little worse. Decisions came back slightly emptier. The arguing got quieter, less of an argument and more of a few voices nodding along, because so much of the disagreement I depended on had simply walked out of the room. My two daftest projects, the one that picks horses and the one that picks shares spent the better part of a day running on a quietly diminished brain before I noticed anything was wrong at all.

That is the part worth flagging, and it is the same lesson my data keeps teaching me this year in different outfits. Things do not fail loudly. They fail silently, and silence reads exactly like calm right up until the moment you actually check. I had built systems that quietly depended on one cheap model for most of their thinking, and I had built precisely nothing that would tap me on the shoulder to mention the cheap model had ceased to exist. I noticed because the answers felt off.

What it leaves behind

I have patched the hole. Other models hold the seats now, some pricier and some merely adequate, and the projects are back to their usual standard of being wrong in interesting rather than alarming ways. Life goes on. It always does.

But under the small grievance about how it ended there is a sharper note I should have heard much sooner. There is a difference between depending on a model and depending on the fact that the model will keep existing. The first is unavoidable, and the entire point of building on other people’s work. The second is an assumption you make without ever deciding to, and it is the one that bit me. I had not lost an API. I had lost the thing I had silently decided would always be there.

So if there is a moral, it is the dull, durable kind. Build it so that losing any single piece is a shrug and not a wound. Watch the parts you quietly rely on, because those are exactly the ones that go dark on you. And when you stumble onto something cheap and unglamorous and quietly excellent, by all means lean on it. Just keep one eye on the door. The good ones do not always tell you they are leaving.

Here’s to you MiMo V2 Flash. The light that burns twice as bright…

None of the systems described here run on real money. They are paper-trading experiments, built to learn from and to be wrong in public with.

The views expressed in this article are my own and do not represent the views of my employer.