I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.
I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh, no, Devstral 2 is definitely called Devstral 2, followed by a page of AI-generated instructions on how to set it up in IntelliJ. The screens it refers to don't exist and never did.
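For what it's worth, the guessing can at least be scripted. A toy sketch, using the model IDs listed above; the heuristic (a YYMM-style suffix like 2512 marks the newest snapshot, and "-latest" should alias it) is my own assumption, not anything Mistral documents:

```python
# Model IDs copied from the API listing above; the matching rule is a guess.
MODEL_IDS = [
    "devstral-2512",
    "devstral-latest",
    "devstral-medium-latest",
    "devstral-medium-2507",
    "devstral-small",
    "devstral-small-2507",
]

def devstral2_candidates(ids):
    # "devstral-2512" reads like a Dec-2025 snapshot, i.e. newer than the
    # 2507-dated medium/small variants, so it and the "-latest" alias are
    # the plausible "Devstral 2" candidates.
    return [m for m in ids if m in ("devstral-2512", "devstral-latest")]

print(devstral2_candidates(MODEL_IDS))
```

Of course, none of this should be necessary if the marketing name mapped cleanly onto the API IDs.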
It depends. The key for their vibe-cli is actually different. You need to get a separate key if you have a subscription and don't want to pay API usage prices.
I have a general impression they are not too interested in individual devs and making it suit their workflow. They want to be a B2B company and deliver a custom workflow per company.
Or it can just be a Google-like problem, where one part of a big company doesn't talk to the other.
But wouldn't winning devs be a neat selling point in winning B2B contracts? Or do they think golf courses are enough for success? Okay, they might be right here, but still, they make it so confusing for no obvious reason.
In my experience devs rarely have anything to say in B2B contracts. At best they can recommend a solution to the decision maker, but in almost all deals I was a part of they didn't have any influence on the final decision.
I wish it were otherwise but alas
> But wouldn't winning devs be a neat selling point in winning B2B contracts?
How? The largest providers that are trying to win devs are locked in a competition to get the devs to continue using the models for free!
The best way to win B2B contracts is to solve the problems that plague business, not those that plague devs. The devs are fickle, have no stickiness and will jump providers to the next free provider, to self-hosted, etc.
Selling to business using Mistral's approach is, I feel, just a good business plan.
"Giving away some credits for free, then making a loss on subscribers" is an absolutely terrible business plan.
Well different discussion, but look at the Mercosur agreement and all the opposition from farmers in the EU. They are extremely protectionist when it comes to agriculture, at least.
Well, I can certainly understand them. Based on price they would not be able to compete and still have half-decent living wages, so protectionism AND subsidies is a decent strategy to maintain local production, which I feel allows a country / area not to lose a lever in international negotiations.
Well, if every big company gets a giant EU fine for, say, preinstalling a web browser in an OS, except for EU companies, that could make it easier for the EU companies.
Well yes, but because there are approximately zero EU tech companies that can be affected by these fines and regulations there is very little political pushback against them.
In a certain sense it's a way for the EU to claw back at least a small slice of all that money flowing to the US.
Apparently you aren't aware of the EU's deep regulatory protectionism and subsidies at both EU and country level. A small portion is legitimately about protecting consumers, but ultimately this stuff is all designed by and for EU industry.
Basically all economic regions get highly protectionist when it comes to key areas like agriculture, banking, steel production, energy, automotive manufacturing, etc.
On tariffs, the US is now higher, but tariffs are a tax that passes through overwhelmingly onto the consumer (by like 95%+). Given there are essentially no fully domestic US manufacturing supply chains and the US imports everything, it's a de facto VAT from the perspective of the consumer. The EU has VAT levels that are still much higher than the average US tariff level, which is essentially a dampener on consumption.
To me it's obvious because of the size of the companies they are targeting (ASML being an obvious one). I think golf course marketing works well in the EU context when decisions are being made not purely on tech reasons.
> I think golf course marketing works well in the EU context when decisions are being made not purely on tech reasons.
It's not like b2b sales is more technical merit based, individual contributor led, elsewhere.
It's always the same, depending on the field individual contributors can have some flexibility on picking tools (so a developer in a mid sized company would be able to pick whatever, an accountant probably would be more constrained, meanwhile a developer at a big bank would not have any choice). But for strategic software choices, that impact the whole company, where standardisation makes sense or is even mandatory to get actual value out of it, you need to sell to high level decision makers, not individual contributors. A CTO or a VP of X can decide to buy and mandate the implementation of something as impactful, workflow changing and potentially time and money saving as a company wide AI platform. A dev can't.
You might be correct. For example, they have an IntelliJ plugin that allows integration without the AI Assistant, but it is only available for Enterprise customers.
Don't sleep on Mistral. Highly underrated as a general service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted access stores that can benefit from their approach. Especially in highly regulated EU.
Not everyone is obsessed with code generation. There is a whole world out there.
I also think that this is the best approach for businesses wanting to adopt AI to automate, streamline, etc their business.
The problem they have is that this is not a moat - their approach is easily reproducible.
If they can pull ahead in having the most number of pre-trained models (one for this ERP, one for that CRM, etc) and then being able to close sales to companies using these products and sell them on post-trained (give us your specific ERP customisations and we'll give you access to a model that is tailored to your business), then THAT is a moat.
But they need to do this without fanfare. Just close sales, and keep closing, basically. After all, even if other AI providers copy the process, the moat would already have been established for Mistral.
> My 2ct: Currently the moat may be that they are not US-American which is not reproducible by any of the US alternatives.
I hope you are right (I am in the process of finalising a product and one of the top-5 selling points contains "outside the jurisdiction of the US"), but in my experience, companies only pay lip service to ethics unless it hits their bottom line.
> but in my experience, companies only pay lip service to ethics unless it hits their bottom line.
Sure, Mistral AI is certainly not the market leader and probably never will be but we're not talking about being a market leader but about having a moat.
I instantly believe you when you tell me that many companies do not care. On the other hand there are companies that do. At least partially: ASML, Stellantis, AXA, BNP Paribas, the French ministry of defense, Helsing, SNCF, ... are all Mistral AI customers.
This moat doesn't seem to be much of a moat considering a non-US model doesn't even crack the top 5 by usage - except DeepSeek, which would be a strange choice for Europeans looking for data sovereignty.
> This moat doesn't seem to be much of a moat considering a non-US model doesn't even crack the top 5 by usage - except DeepSeek, which would be a strange choice for Europeans looking for data sovereignty.
Hang on, where are you getting the numbers from? I looked and I couldn't find any numbers on enterprises who opened their wallets for custom-trained models.
I looked, and because I believed that it might be a good business opportunity to explore, I did spend a bit of time trying to find numbers. I came away with the feeling that the winner in the AI space is going to be whoever successfully whitelabels their offering.
> considering a non-US model doesn't even crack the top 5 by usage
How do you measure "usage" in an enterprise/commercial context where no data on usage is available to you? I don't expect Mistral AI to make its money on OpenRouter.
They offer self-hosted models for big corporate customers. I would also expect those serious about the security of their data to use that option.
So you would never see the usage of those customers.
If you are a company based in Europe, it is silly to entrust your data security and privacy to a company also based in Europe.
If you are in Iran, you don't want to give your data to your government.
If you are in France, you don't want to give your data to your government.
etc
If you are in France and you host your e-mails in a datacenter in Hong Kong, well, good luck to the authorities getting at them.
If you host them in "secure France", on paper you will have more privacy and laws behind you, but in reality you are jumping into the mouth of the shark.
This is why governments are promoting: "yes yes, host here don't worry, we will protect you"
This flat out isn't true. Police forces / investigative authorities have been collaborating with one another since 1923: https://en.wikipedia.org/wiki/Interpol . We have tons of examples of this working for the digital world as well (like Proton complying with Swiss legal orders at the behest of non-Swiss police forces for illegal activities in other countries).
The trick is to host your data in a country with a strong rule of law, and avoid illegal / geopolitical lines. If you're an American company hosting stuff in Russia, you can bet the GRU/SVR would be very happy to abuse it. If you're running a torrent site in Ukraine, you can bet the US would be very happy to claim extraterritorial magic jurisdiction and get you extradited from Poland.
As a French company, you're already beholden to French law and French legal decisions. "Data is hosted in Hong Kong" doesn't matter in the slightest, it only exposes you to more risk.
Mistral is still hosted on US providers, their EU centers are only in planning. Data access aside, if AWS or Azure (or Cloudflare) are ordered to pull the plug, it's still goodbye Mistral. Unless you use a third party hoster that is, or do it yourself of course - already possible.
To extend on that a little bit: they use data centers located in the EU, but owned by US cloud providers.
They can still pull the plug ofc, so it's only a small difference, but still
> Except the evidence today rather points to SOTA model + harness than fine tuned models.
I have not seen that, actually. I still see most companies who want to jump into AI for the business sort of try RAG, but more often they just buy Chat accounts for their users.
The only place that harnesses appear to be used is in software development, but most companies aren't doing that either.
> Their emphasis on bespoke modelling over generalized megaliths will pay off.
Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?
ChatGPT is already a small agent that receives your message and decides which agent needs to respond. Within those, agents can have sub agents (like when it does research).
When generating images most services will have a small agent that rewrites your request and hands it off to the generative image model.
So from the treasure trove point of view, optimized agents have their place. From companies building pipelines, they also have their place.
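A minimal sketch of that router pattern. The handler names and routing rules here are invented for illustration; a real product would presumably use a small classifier model rather than keyword matching:

```python
# Toy "router agent": a cheap first step decides which specialized handler
# (image rewriter, research sub-agents, fast model, reasoning model) sees
# the request. All names and rules are illustrative, not any vendor's API.
def route(message: str) -> str:
    text = message.lower()
    if any(k in text for k in ("image", "picture", "illustration")):
        return "image"       # rewrite the prompt, hand off to the image model
    if any(k in text for k in ("research", "sources", "cite")):
        return "research"    # spawn sub-agents that browse and summarize
    if len(text) < 200 and "?" in text:
        return "fast"        # cheap model for short, simple questions
    return "reasoning"       # expensive model for everything else

print(route("Make me a picture of a cat"))  # -> image
print(route("What's 2+2?"))                 # -> fast
```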
> ChatGPT is already a small agent that receives your message and decides which agent needs to respond.
Right, but this was done to value-optimize the product, i.e. try to always give you the shittiest (cheapest) model you can bear, because otherwise people would always choose the smartest (most expensive) model for any query.
Taking away the model choice from the user introduces a lot of ways to cut down costs, but one thing it does not do is make the product give users better/more reliable answers.
> Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?
Think of it as a base model (the megalith) which then has the weights adjusted towards a specific use-case (SAP, for example).
Agreed. I've used their platform to train smaller, specialized models. Something I could have done in Colab or some other tool, but their platform allows me to just upload a training set, and as soon as it finishes I have a hosted model available at an endpoint. It obviously has some constraints compared to running the training yourself, but it also opens up the opportunity to way more people.
Indeed, but even for coding use cases, Vibe is more of a focused "refactor / write this function" aid than "write me an app", and it can work locally. For me that's a lot more valuable as an accelerator to my workflow, where the developer stays in control and fully involved in the process.
It's definitely a topic of conversation in Reddit, etc... However I agree that the push to reduce US dependence by EU companies (and countries) is hampered by the fact that US stuff is already embedded (Microsoft but also Google, etc...) and that many of these companies are transnational anyway (very few European companies are solely inside the EU) and finally and most importantly just about every company will choose the option that does the job best for the right price (sovereignty is a distant second for most decision makers).
That's the public sector. I can also give examples of schools in Denmark, cities in France, education system in France, cities in Spain too, but they said "big EU companies".
While few companies announce this publicly, I know from personal experience with corporate clients that many companies are preparing for Trump to use Big Tech as a bargaining chip.
And they should. Because the US is not behaving rationally at all.
>While few companies announce this publicly, I know from personal experience with corporate clients
Well I have even more personal experience that contradicts yours, and this isn't true at all. Everyone uses Claude / Gemini / OpenAI. Mistral isn't even on the table.
Come on, compared to Google Workspace / Microsoft's whatever-it's-called-these-days, the cost of switching from one LLM provider to another is pretty much zero.
Having an option at the back of your mind is all it takes right now, until push comes to shove of course.
Not entirely, but putting more eggs in that basket would certainly be considered lack of planning. Why increase your risk even further when everyone has seen how volatile things can get quickly?
Proof: Most big EU companies use Claude or Gemini or OpenAI, not Mistral. That choice was made recently.
Things have changed in the loud echo chambers of the internet, maybe (but not really, since people were saying that EU data sovereignty was happening any time now since 2016).
I consult for various companies and have definitely seen a trend. It's not quite the rupture that some expect but clearly not nothing either. Until very recently, the risk assessment of using US providers was considered very hypothetical. Today it still doesn't feel imminent, but it does feel very real.
Of course, it will be slow and painful and Europeans will need to use their own services for them to grow and mature.
My _feeling_ is that a lot of EU/European politicians have talked a lot more about the need to be independent from the US after Trump threatened Greenland. At least in the Nordic countries. Not only concerning data & privacy, but defence, communications, space, etc. All areas. The wheel has started to turn. You will not see it if you just look around, but in 10 years' time, maybe more, Europe will have stopped depending on the US. And that will hit the US hard. We pay a lot of money in services to the US.
The politicians can talk, but they needed to set up an environment that would've let a European company have a decent shot at competing with the best AI models. But they didn't. Should've thought of that before being proud of setting up those strict tech regulations.
> Proof: Most big EU companies use Claude or Gemini or OpenAI, not Mistral. That choice was made recently.
Is a statement with no supporting facts considered "proof"? Just the public list of Mistral customers (https://mistral.ai/customers) is proof alone that quite a few big EU companies are _not_ in fact using OpenAI or Claude or Gemini at the strategic level.
Or OpenAI's customers, of which the only big European ones I can spot are Scania and Philips: https://openai.com/stories/
Note: I'm talking about strategic enterprise AI deployments for the company or at least a division, not individual developers being allowed to use Claude Code etc. The moat and the money will be in the former, not latter.
This sounds like an ideology based reply. Grok is underrated and I think has a better chance of long term success than most. The current growth strategy means (for me) their chat harness is not up to par for serious work.
Their API is consistently among the most used on OpenRouter. While I can't vouch for it myself, I think this is a decent proxy for capability. You can definitely see glimmers of greatness in their chat interface, it just feels like the system prompts are focused on something that doesn't interest me.
That's one of the possible benchmarks, not the only one. Being 59th there, on a list enriched with every variation of Model_Name X.Y (March 2025 Preview) Pro-Thinking, translates to being in the top 10 providers worldwide, which is a very interesting mark of "failure" considering that coincidentally they're also number 1 from their economic area. If you don't know why the last part is important, go read some news.
I am rooting for Mistral with their different approach: not really competing on the largest and advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.
I found it to be the best model if you want to talk about philosophical topics. It has no problem going deep and technical, while other models tend to be afraid of overshooting the comprehension of the reader.
Did they make significant improvements in OCR 3? The quality I was getting from Mistral OCR 2 was nowhere near as good as what I could get from just sending the same files to Claude Sonnet via an API call.
> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
Pre-training means exposing an already-trained model to more raw text, like PDF extracts etc. (aka continued pre-training). You wouldn't be starting from scratch, but it's still pre-training because the objective is just next-token prediction of the text you expose it to.
Post-training means everything else: SFT, DPO, RL, etc. Anything that involves things like prompt/response pairs, reward models, or benefits from human feedback of any kind.
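To make the contrast concrete, here is a toy sketch using token lists instead of real tensors (the tokens and helper names are made up). In continued pre-training, every next-token prediction contributes to the loss; in SFT, the same next-token objective is used but the loss is masked to the response tokens only:

```python
# Continued pre-training: (input, target) pairs over the entire document.
def pretraining_targets(tokens):
    return list(zip(tokens[:-1], tokens[1:]))

# SFT: same next-token objective, but only positions that land on response
# tokens (i.e. after the prompt) contribute to the loss.
def sft_targets(prompt, response):
    tokens = prompt + response
    pairs = list(zip(tokens[:-1], tokens[1:]))
    return [(x, y) for i, (x, y) in enumerate(pairs) if i + 1 >= len(prompt)]

doc = ["the", "pump", "spec", "is", "PDF-123"]
print(pretraining_targets(doc))                          # loss on every token
print(sft_targets(["Q:", "spec?"], ["A:", "PDF-123"]))   # loss on response only
```

RL-style post-training then goes beyond fixed targets entirely, scoring sampled outputs with a reward signal instead.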
Er, then what is the "already trained" model? I thought pre-training was the gradient descent through the internet part of building foundational models.
I can imagine that, as usual, you start with a few examples and then instruct an LLM to synthesize more examples out of that, and train using that. Sounds horrible, but actually works fairly well in practice.
Mistral is doing some really great stuff lately. Sure, it's hard to compete with OpenAI and Anthropic and their models, but they are taking up some interesting takes and designing their product in unique ways.
I like a lot what they are doing and I'll be watching them a lot more closely. I'd love to work for them btw!
Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new Unsloth release, training feels more in reach.
How many proprietary use cases truly need pre-training or even fine-tuning, as opposed to a RAG approach? And at what point does it make sense to pre-train/fine-tune? Curious.
You could take a model like the one referenced in the article, retool it with Forge for oh I don't know, compost, and use it to flag batches that contain too much paper for instance.
These kinds of applications would work across industries, basically anywhere where you have a documented process and can stand to have automated oversight.
You can fine-tune small, very fast and cheap-to-run specialized models, e.g. to react to logs, tool use, and domain knowledge, possibly removing network LLM calls altogether, etc.
RAG basically gives the LLM a bunch of documents to search through for the answer.
What it doesn't do is make the model any better. Pre-training and fine-tuning improve the LLM's ability to reason about your task.
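The retrieval step being described can be sketched in a few lines. Here bag-of-words cosine similarity stands in for a real embedding model, and the documents are invented:

```python
# Minimal RAG retrieval: score documents against the query, prepend the best
# one to the prompt. A real system would use learned embeddings, not word
# counts; this only illustrates the mechanism.
from collections import Counter
import math

def score(query, doc):
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[w] * d[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

docs = [
    "reset procedure for the packaging line conveyor",
    "holiday calendar for the Berlin office",
    "conveyor belt maintenance schedule and torque specs",
]
query = "how do I reset the conveyor"
best = max(docs, key=lambda d: score(query, d))
prompt = f"Context:\n{best}\n\nQuestion: {query}"
print(best)
```

Note how the model's weights never change: the knowledge arrives through the prompt, which is exactly the "doesn't make the model any better" point above.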
For coding use cases you may want a way to search for symbols themselves or do a plain text exact match for the name of a symbol to find the relevant documents to include. There is more to searching than building a basic similarity search.
Sorry, but who mentioned coding as a use case? My comment was general and not specific to coding, and I don't understand where you got the idea that I am arguing that a similarity-search engine would be a substitute for a symbol-search engine, or that symbol search is inferior to similarity search. Please don't put words into my mouth. My question was genuine, without making any presumptions.
Even with the coding use-case you would still likely want to build a similarity search engine because searching through plain symbols isn't enough to build a contextual understanding of higher-level concepts in the code.
I mentioned coding as a use case in my comment you replied to. You were asking for an example for when one wouldn't use vector search and I provided one. I did not say similarity search would be a substitute. I said that for the coding case you do not need it.
>you would still likely want to build a similarity search engine
In practice, tools like Claude Code, Codex, Gemini, Kimi Code, etc. are getting away with searching for code with grep / find and understanding code by loading a sufficient amount of it into the context window. That is enough to understand higher-level concepts in the code. Maintaining a vector database on top of this is not free and adds complexity.
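The grep-style approach those agents rely on is essentially exact-match symbol search, no index required. A toy version (the file contents and symbol names are made up):

```python
# Exact symbol search over source files, the way a coding agent shells out
# to grep: a word-boundary regex finds both definitions and call sites.
import re

FILES = {
    "billing.py": "def charge_customer(cust_id):\n    ...\n",
    "report.py":  "total = charge_customer(cid) + fees\n",
}

def grep_symbol(symbol):
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    return [path for path, src in FILES.items() if pattern.search(src)]

print(grep_symbol("charge_customer"))  # definition and call site
```

No embeddings to compute, no index to keep in sync with the repo, which is the cost trade-off the comment above is pointing at.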
And yet your blog says you think NFTs are alive. Curious.
But seriously, RAG/retrieval is thriving. It'll be part of the mix alongside long context, reranking, and tool-based context assembly for the foreseeable future.
The issue I had with RAG when I tried building our own internal chat/knowledge bot was pulling in the relevant knowledge before sending to the LLM. Domain questions like "What is Cat Block B?" are common and, for a human, provide all the context that is needed for someone to answer within our org. But vectorizing that and then finding matching knowledge produced so many false positives. I tried to circumvent that by adding custom weighting based on keywords, source (Confluence, Teams, Email), but it just seemed unreliable. This was probably a year ago and, admittedly, I was diving in head first without truly understanding RAG end to end.
Being able to just train a model on all of our domain knowledge would, I imagine, produce much better results.
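The "custom weighting" workaround described above can be sketched as a reranking step that blends the vector score with keyword overlap and a per-source boost. All weights and sources here are invented for illustration; tuning them is exactly the unreliable part the comment complains about:

```python
# Hybrid rerank: combine vector-store similarity with keyword overlap and a
# boost by source. The 0.6/0.3/0.1 weights and source values are arbitrary.
SOURCE_WEIGHT = {"confluence": 1.0, "teams": 0.6, "email": 0.4}

def rerank(query, hits):
    # hits: (text, source, vector_score) triples from the vector store
    q_terms = set(query.lower().split())
    scored = []
    for text, source, vec in hits:
        overlap = len(q_terms & set(text.lower().split())) / max(len(q_terms), 1)
        final = 0.6 * vec + 0.3 * overlap + 0.1 * SOURCE_WEIGHT[source]
        scored.append((final, text))
    return [t for _, t in sorted(scored, reverse=True)]

hits = [
    ("Cat Block B is the staging area for category B parts", "confluence", 0.55),
    ("cat photos from the office party", "teams", 0.70),
]
print(rerank("what is cat block b", hits)[0])
```

Here the keyword and source terms rescue the right document even though the raw vector score preferred the false positive, which mirrors the behaviour described above.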
I don't think RAG is dead, and I don't think NFTs have any use and think that they are completely dead.
But the OP's blog is more about ZK than about NFTs, and crypto is the only place funding work on ZK. It's kind of a devil's bargain, but I've taken crypto money to work on privacy preserving tech before and would again.
> Of course you would have to set a temperature of 0 to prevent abuse from the operator, and also assume that an operator has access to the pre-prompt
Doesn't the fact that LLMs are still non-deterministic at a temperature of 0 render all of this moot? And why was I compelled to read a random blog post on the unsolved issue of validating natural language? It's SQL injection except without a predetermined syntax to validate against, and thus an NP problem we've yet to solve.
I have no interest in anything crypto, but they are making a proposal about NFTs tied to AI (LLMs and verifiable machine learning) so they can make ownership decisions.
So it'd be alive in the making decisions sense, not in a "the technology is thriving" sense.
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
I think it's interesting what this approach suggests about who will profit from AI. I'm sceptical that having huge numbers of GPUs is a moat. After all, real humans (even geniuses) are trained on much, much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It's hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies' proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.
> Forge enables enterprises to build models that internalize their domain knowledge. Organizations can train models on large volumes of internal documentation, codebases, structured data, and operational records. During training, the model learns the vocabulary, reasoning patterns, and constraints that define that environment.
I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
Huh. I initially thought this is just another fine-tuning endpoint, but apparently they are partnering up with customers on the pretraining side as well. But RL too? Jeez, RL envs are really hard to get right. Best wishes, I guess.
Interesting how Mistral is investing in training models for industry-specific use cases. With the commoditization of intelligence by base models, they're probably looking to create value from specialized verticals.
I find the Mistral "middle" between small LMs and 1T-parameter LMs compelling. Models that are sufficiently big to be performant but specialised for domains and tasks: this is what I assumed we'd always head towards.
Note that any supervised fine-tuning following the pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning rather than post-RL fine-tuning, and not about swapping out the pretraining stage entirely.
My bet is that the solution to continuous learning is with external storage. There is a lot of talk about context engineering - but I have not seen anyone taking context as the main bottleneck and building a system around that.
This would show that even "context engineering" is kind of the wrong term, because context does not enter the LLM in some mysterious way: it goes through the prompt, and the whole model of passing chat history back and forth is not the most efficient way of using the limited prompt.
"External Storage" whatever that is can not be the same as continous learning as it does not have the strong connections/capture the interdepencies of knowledge.
That said, I think we will see more efforts on the business side too, to have models that can help you build a knowledge base in some kind of standardized way that the model is trained to read, or that synthesize some sort of instructions on how to navigate your knowledge base.
Currently e.g. Copilot tries to navigate a hot mess of a MS knowledge graph that is very different for each company. And due to its amnesia it has to repeat the discovery in every session. No wonder that does not work. We have to either standardize or store somewhere (model, instructions) how to find information efficiently.
The key to make Copilot useful is to take the limited context problem seriously enough. There are many dimensions to it: https://zby.github.io/commonplace/notes/context-efficiency-i... and it should be the starting point for designing the systems that extensively use llms.
A knowledge base - something where the LLM knows how to find the knowledge it needs for a given task. I am working on this idea in https://zby.github.io/commonplace/
The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
Ironically that was also the past of AI. In 2016 it was all about specialized models (not just training data, everything including architecture and model class/type) for specific tasks and that's the way things had been for a long time.
Are you suggesting that it's an aberration that from ~2019 to ~2026 the AI field has been working on general intelligence (I assume this is what you mean by "achieving benevolent knowledge")?
Personally I think it's remarkable how much a simple transformer model can do when scaled up in size. LLMs are an incredible feat of generalization. I don't see why the trajectory should change back towards specialization now.
To be more specific, I think the future is local and specialized. IBM among others thought the same way with their giant centralized mainframe computers and the original way people utilized software in the 70s. It's an interesting parallel to today's cloud if you think about it. It's just not scalable from a resource (hardware), energy, and cost perspective. I think we're living in a unique time, but it's going to change. Without continued massive funding, or a pivot to something sustainable, things will (and should) change.
Don't get me wrong, general intelligence will always be important and should be a part of specialist models to a degree for understanding, but it doesn't make sense to use an 800B+ parameter model to help write an email or do research on company trends. Hell, look at what China has been able to do. Qwen 3.5 9B exceeds Claude 3.5 Haiku and nears Sonnet 3.5 levels. The 27B variation of Qwen 3.5 is superior to both in many ways and even rivals newer models. There is obviously an inherent lag behind, but we will gradually see a shift as these models become more capable.
Right now we are chasing 1-2% improvements at the cost of billions. Local models are already absurdly capable (more so by the day, same with cloud of course) and smarter than most people in specific areas. To do most jobs, can we honestly say a PhD-level or higher understanding is required? We're chasing something that matters less and less from a day-to-day perspective. AGI is outstanding, but not practical (at least today). I think we'll get there anyway at our current trajectory (though it's dangerous), but I suspect things will shift.
lol the AI-generated support reply about their own AI model is peak 2026
the naming mess is wild though. i ran into similar confusion trying to set up mistral for a side project - ended up just guessing which endpoint was the right one
Interesting. Does this actually scale, though? I've never seen enterprises whose "internal knowledge" exists in proper readable form - it's often in code, and more importantly in the people who wrote it.
I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read code or to reach out to those who wrote them. I don't know how it works in places that work with the "real world" like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
My sense is that it sounds amazing in theory to executives who have never had to themselves look at internal data. In reality the internal knowledge base is a mix of incomplete, inaccurate, self serving lies, out of date and so on. At worst, the data is explicitly biased to hide reality from executives so the AI will look extra good to executives. Of course, a business that makes all tactical decisions based on lies is not going to do well.
Looks interesting. But how to explore or test or use? The product page (https://mistral.ai/products/forge) also does not contain anything useful. Just "Contact us"
I cannot keep up with their products, model names and releases.
What is what for? Their marketing texts do not make sense for me.
Is there a nice overview somewhere?
I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)
I thought that for pretraining to work and reasoning to emerge you need internet-scale data. How can Forge achieve that with just internal company data (unless said company is AT&T or something)?
> Mistral AI has already partnered with world-leading organizations, like ASML, DSO National Laboratories Singapore, Ericsson, European Space Agency, Home Team Science and Technology Agency (HTX) Singapore, and Reply to train models on the proprietary data that powers their most complex systems and future-defining technologies.
When you can actually represent somebody like the ESA get in touch with them. Otherwise, uh, gtfo.
I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.
I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh, no, devstral 2 is definitely called devstral 2, followed by a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it is referring to don't exist and never did.
I got really lost on their site, but to help a bit, according to their model page:
devstral-2512, devstral-latest, and devstral-medium-latest are all Devstral 2: https://docs.mistral.ai/models/devstral-2-25-12
labs-devstral-small-2512 and devstral-small-latest are Devstral Small 2
devstral-medium-2507 is Devstral 1.0
and devstral-small-2507 is Devstral Small 1.1
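If it helps, the aliases can also be sanity-checked programmatically: the dated suffixes can be compared to find the newest snapshot per family. A small sketch with the IDs from this thread hardcoded; the alias-to-version logic is my own assumption, not anything official from Mistral:

```python
# Model IDs as listed in this thread (not fetched live).
MODEL_IDS = [
    "devstral-2512",
    "devstral-latest",
    "devstral-medium-latest",
    "devstral-medium-2507",
    "devstral-small-2507",
    "labs-devstral-small-2512",
    "devstral-small-latest",
]

def dated_models(ids):
    """Return {family: newest dated ID}, treating the numeric YYMM suffix as a version."""
    out = {}
    for mid in ids:
        base, _, suffix = mid.rpartition("-")
        if suffix.isdigit():  # dated snapshot like -2512 or -2507
            family = base.removeprefix("labs-")
            if family not in out or suffix > out[family].rpartition("-")[2]:
                out[family] = mid
    return out

print(dated_models(MODEL_IDS))
# {'devstral': 'devstral-2512', 'devstral-medium': 'devstral-medium-2507',
#  'devstral-small': 'labs-devstral-small-2512'}
```

In practice you'd pull the list from Mistral's model-listing endpoint with your API key instead of hardcoding it.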
wow, thank you, this is great. I was thinking they should have a page like this, but I couldn't find it myself.
I had the same experience. It's even more confusing when you want to create an API key because they are separated by product, maybe?
no, the key is actually universal, you can't choose a specific product
It depends. The key for their vibe-cli is actually different. You need to get a separate key if you have a subscription and don't want to pay API usage prices.
That's the same everywhere. At least with the Chinese coding plans
I have a general impression they are not too interested in individual devs and making it suit their workflow. They want to be a B2B company and deliver a custom workflow per company.
Or it can just be a Google-like problem, where one part of a big company doesn't talk to the other.
But wouldn't winning devs be a neat helping point in winning b2b contracts? Or do they think golf courses are enough for success? Okay, they might be right there, but still, they make it so confusing for no obvious reason.
In my experience devs rarely have anything to say in B2B contracts. At best they can recommend a solution to the decision maker, but in almost all deals I was a part of they didn't have any influence on the final decision. I wish it were otherwise, but alas.
> But wouldn't winning devs be a neat helping point in winning b2b contracts?
How? The largest providers that are trying to win devs are locked in a competition to get the devs to continue using the models for free!
The best way to win B2B contracts is to solve the problems that plague business, not those that plague devs. The devs are fickle, have no stickiness and will jump providers to the next free provider, to self-hosted, etc.
Selling to business using Mistral's approach is, I feel, just a good business plan.
"Giving away some credits for free, then making a loss on subscribers" is an absolutely terrible business plan.
As far as I understood, the French president is pushing France's most valuable companies to use Mistral. There can't be a more top-down strategy :)
Also EU protectionism itself might be enough.
Like American protectionism? Heck, America even prohibits its own companies from selling to the government if the president doesn't like them enough.
Where is the EU protectionist?
I feel we are way less protectionist than most other economic regions, including the USA, which is very protectionist but always claims otherwise.
Well different discussion, but look at the Mercosur agreement and all the opposition from farmers in the EU. They are extremely protectionist when it comes to agriculture, at least.
Yes the farmers are a very vocal and powerful minority.
They get more than 50% of their income from subsidies, are quite well off, but always find a reason to complain.
I was thinking more about stuff like "Buy American"-Regulations for public tenders. Stuff like that doesn't exist here
Well, I can certainly understand them. Based on price they would not be able to compete and still have half-decent living wages, so protectionism AND subsidies are a decent strategy to maintain local production, which I feel allows a country/area to not lose a lever in international negotiations.
Well, if every big company gets a giant EU fine for, say, preinstalling a web browser in an OS, except for EU companies, that could make it easier for the EU companies.
Every company would get fined for anticompetitive behaviour, regardless of where they are based.
Well yes, but because there are approximately zero EU tech companies that can be affected by these fines and regulations there is very little political pushback against them.
In a certain sense it's a way for the EU to claw back at least a small slice of all that money flowing to the US.
Why should there be pushback against antitrust measures?
It's what keeps markets alive
https://www.edpb.europa.eu/news/national-news/2023/personali...
Apparently you aren't aware of the EU's deep regulatory protectionism and subsidies at both EU and country level. A small portion is legitimately about protecting consumers, but ultimately this stuff is all designed by and for EU industry.
Basically all economic regions get highly protectionist when it comes to key areas like agriculture, banking, steel production, energy, automotive manufacturing, etc.
On tariffs, the US is now higher, but tariffs are a tax that passes through overwhelmingly onto the consumer (by like 95%+). Given there are essentially no fully domestic US manufacturing supply chains and the US imports everything, it's a de facto VAT from the perspective of the consumer. The EU has VAT levels that are still much higher than the average US tariff level, which is essentially a dampener on consumption.
But VAT applies to all goods regardless of where they are produced, so it's not a protectionist measure.
To me it's obvious because the size of companies they are targeting (ASML being an obvious one). I think golf course marketing works well in the EU context when decisions are being made not purely on tech reasons.
> being made not purely on tech reasons.
As if that's not true in the US (not just government contracts but VC in general as well)...
> I think golf course marketing works well in the EU context when decisions are being made not purely on tech reasons.
It's not like b2b sales is more technical merit based, individual contributor led, elsewhere.
It's always the same; depending on the field, individual contributors can have some flexibility in picking tools (so a developer at a mid-sized company can pick whatever, an accountant would probably be more constrained, and a developer at a big bank would have no choice at all). But for strategic software choices that impact the whole company, where standardisation makes sense or is even mandatory to get actual value out of it, you need to sell to high-level decision makers, not individual contributors. A CTO or a VP of X can decide to buy and mandate the implementation of something as impactful, workflow-changing and potentially time- and money-saving as a company-wide AI platform. A dev can't.
you might be correct. For example, they have an IntelliJ plugin that allows integration without the AI Assistant, but it is only available for Enterprise customers.
>data staying in the EU
This is really why Mistral has any support.
The models are bottom of the barrel, but it's the best Europe has...
Although you could use Chinese models on European servers.
Don't sleep on Mistral. Highly underrated as a general service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted access stores that can benefit from their approach. Especially in highly regulated EU.
Not everyone is obsessed with code generation. There is a whole world out there.
I also think that this is the best approach for businesses wanting to adopt AI to automate, streamline, etc their business.
The problem they have is that this is not a moat - their approach is easily reproducible.
If they can pull ahead in having the most number of pre-trained models (one for this ERP, one for that CRM, etc) and then being able to close sales to companies using these products and sell them on post-trained (give us your specific ERP customisations and we'll give you access to a model that is tailored to your business), then THAT is a moat.
But they need to do this without fanfare. Just close sales, and keep closing, basically. After all, even if other AI providers copy the process, the moat would already have been established for Mistral.
> The problem they have is that this is not a moat - their approach is easily reproducible.
My 2ct: Currently the moat may be that they are not US-American which is not reproducible by any of the US alternatives.
> My 2ct: Currently the moat may be that they are not US-American which is not reproducible by any of the US alternatives.
I hope you are right (I am in the process of finalising a product and one of the top-5 selling points contains "outside the jurisdiction of the US"), but in my experience, companies only pay lip service to ethics unless it hits their bottom line.
> but in my experience, companies only pay lip service to ethics unless it hits their bottom line.
Sure, Mistral AI is certainly not the market leader and probably never will be but we're not talking about being a market leader but about having a moat.
I instantly believe you when you tell me that many companies do not care. On the other hand there are companies that do. At least partially: ASML, Stellantis, AXA, BNP Paribas, the French ministry of defense, Helsing, SNCF, ... are all Mistral AI customers.
Meh, I feel like we are in the "cloud is bad phase" all over again.
Companies will use US AI models without issues in a few years.
This moat doesn't seem to be much of a moat considering a non-US model doesn't even crack the top 5 by usage - except DeepSeek, which would be a strange choice for Europeans looking for data sovereignty.
> This moat doesn't seem to be much of a moat considering a non-US model doesn't even crack the top 5 by usage - except DeepSeek, which would be a strange choice for Europeans looking for data sovereignty.
Hang on, where are you getting the numbers from? I looked and I couldn't find any numbers on enterprises who opened their wallets for custom-trained models.
I looked, and because I believed that it might be a good business opportunity to explore, I did spend a bit of time trying to find numbers. I came away with the feeling that the winner in the AI space is going to be whoever successfully whitelabels their offering.
Right now that is Mistral, I think.
> considering a non-US model doesn't even crack the top 5 by usage
How do you measure "usage" in an enterprise/commercial context where no data on usage is available to you? I don't expect Mistral AI to make its money on OpenRouter.
They offer self-hosted models for big corporate customers. I would also expect those serious about the security of their data to use that option. So you would never get the usage of those customers
If you are a company based in Europe, it is silly to give your data security and privacy to a company based in Europe.
If you are in Iran, you don't want to give your data to your government.
If you are in France, you don't want to give your data to your government.
etc
If you are in France and you host your e-mails in a datacenter in Hong Kong, well, good luck to the authorities getting at it.
If you host it in "secure France", on paper you will have more privacy and laws behind you, but in reality you are jumping into the mouth of the shark.
This is why governments are promoting: "yes yes, host here don't worry, we will protect you"
> well good luck for the authorities to get it.
"We want your data on X, here;'s a warrant."
"No."
"You are now under arrest for contempt of court."
People have some oddly silly views on what government can and can't do to people living in their territories.
And companies really really don't care if the government has their data.
> host your e-mails in a datacenter in Hong Kong
Now China has it, gives it a competitor in China and your market share drops like a stone. Congrats! Great choice!
It's not about government but about trade secrets...
This flat out isn't true. Police forces / investigative authorities have been collaborating with one another since 1923: https://en.wikipedia.org/wiki/Interpol . We have tons of examples of this working for the digital world as well (like Proton complying with Swiss legal orders at the behest of non-Swiss police forces for illegal activities in other countries).
The trick is to host your data in a country with a strong rule of law, and avoid illegal / geopolitical lines. If you're an American company hosting stuff in Russia, you can bet the GRU/SVR would be very happy to abuse it. If you're running a torrent site in Ukraine, you can bet the US would be very happy to claim extraterritorial magic jurisdiction and get you extradited from Poland.
As a French company, you're already beholden to French law and French legal decisions. "Data is hosted in Hong Kong" doesn't matter in the slightest, it only exposes you to more risk.
Mistral is still hosted on US providers; their EU data centers are only in the planning stage. Data access aside, if AWS or Azure (or Cloudflare) is ordered to pull the plug, it's still goodbye Mistral. Unless you use a third-party hoster, that is, or do it yourself of course - already possible.
To extend on that a little: they use data centers located in the EU but owned by US cloud providers. Those can still pull the plug of course, so it's only a small difference, but still.
Except the evidence today rather points to SOTA model + harness than fine tuned models.
> Except the evidence today rather points to SOTA model + harness than fine tuned models.
I have not seen that, actually. I still see most companies who want to jump into AI for the business sort of try RAG, but more often they just buy Chat accounts for their users.
The only place that harnesses appear to be used is in software development, but most companies aren't doing that either.
> Their emphasis on bespoke modelling over generalized megaliths will pay off.
Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?
ChatGPT is already a small agent that receives your message and decides which agent needs to respond. Within those, agents can have sub agents (like when it does research).
When generating images most services will have a small agent that rewrites your request and hands it off to the generative image model.
So from the treasure trove point of view, optimized agents have their place. From companies building pipelines, they also have their place.
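The router-plus-sub-agents idea above can be sketched in a few lines. Everything here (model names, routing rules) is made up for illustration:

```python
# Toy dispatcher in the spirit of "a small agent decides which agent responds".
def route(message: str) -> str:
    """Pick a (hypothetical) backend for a user message."""
    text = message.lower()
    if any(k in text for k in ("draw", "image", "picture")):
        return "image-model"          # hand off to a generative image model
    if any(k in text for k in ("research", "compare", "sources")):
        return "research-agent"       # may spawn its own sub-agents
    if len(text.split()) > 100:
        return "long-context-model"
    return "small-fast-model"         # default: cheapest capable model

print(route("draw me a cat"))            # image-model
print(route("research EU AI vendors"))   # research-agent
```

Real routers use a classifier model rather than keyword rules, but the pipeline shape is the same: cheap triage first, expensive specialist second.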
> ChatGPT is already a small agent that receives your message and decides which agent needs to respond.
Right, but this was done to value-optimize the product, i.e. try to always give you the shittiest (cheapest) model you can bear, because otherwise people would always choose the smartest (most expensive) model for any query.
Taking away the model choice from the user introduces a lot of ways to cut down costs, but one thing it does not do is make the product give users better/more reliable answers.
> Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?
Think of it as a base model (the megalith) which then has the weights adjusted towards a specific use-case (SAP, for example).
The companies I work with want on-prem models, and no Chinese ones. Does Mistral support on-prem (for a price)?
Agreed. I've used their platform to train smaller, specialized models. Something I could have done in Colab or some other tool, but their platform lets me just upload a training set, and as soon as it finishes I have a hosted model available at an endpoint. It obviously has some constraints compared to running the training yourself, but it also opens up the opportunity to way more people.
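For the curious, "just upload a training set" usually means a JSONL file of chat examples. A sketch of preparing one; the schema below follows the JSONL chat format fine-tuning docs (Mistral's included) commonly describe, but verify against the current docs, and the ticket data is invented:

```python
import json
import pathlib
import tempfile

# Hypothetical training examples; each line of the JSONL file is one record.
examples = [
    {"messages": [
        {"role": "user", "content": "Summarize ticket #123"},
        {"role": "assistant", "content": "Customer reports a login timeout on the EU portal."},
    ]},
]

# Write one JSON object per line (the upload-ready format).
path = pathlib.Path(tempfile.gettempdir()) / "train.jsonl"
with path.open("w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

print(f"wrote {len(examples)} example(s) to {path.name}")
```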
Indeed, but even for coding use cases, Vibe is more of a focused "refactor / write this function" aid than "write me an app", and it can work locally. For me that's a lot more valuable as an accelerator to my workflow, where the developer stays in control and fully involved in the process.
I agree. Just started using it. Can you give some examples of fields where you maybe even prefer Mistral?
I use a pretty lightweight local Mistral model in LM Studio for both creative and technical writing/iterating and it's fantastic.
Yes, since it's not American, it will be the de-facto choice for most big European companies.
Why would that be? Most big EU companies use MS Teams or Google Workspace, for example.
They use those because the decision to use them was made years ago. Things have changed since then
I want to believe... but I also need proof of that "trend". Any reference I could read up on, please?
It's definitely a topic of conversation in Reddit, etc... However I agree that the push to reduce US dependence by EU companies (and countries) is hampered by the fact that US stuff is already embedded (Microsoft but also Google, etc...) and that many of these companies are transnational anyway (very few European companies are solely inside the EU) and finally and most importantly just about every company will choose the option that does the job best for the right price (sovereignty is a distant second for most decision makers).
Multiple Government organisations ditching Microsoft? Including entire German states?
My University also migrated to OpenExchange
That's the public sector. I can also give examples of schools in Denmark, cities in France, education system in France, cities in Spain too, but they said "big EU companies".
While few companies announce this publicly, I know from personal experience with corporate clients that many companies are preparing for Trump to use Big Tech as a bargaining chip.
And they should. Because the US is not behaving rationally at all.
https://nltimes.nl/2026/02/10/rabobank-ing-abn-amro-seek-eur...
https://www.theregister.com/2025/11/13/gartner_cio_cloud_sov...
https://www.independent.co.uk/news/world/europe/europe-zoom-...
https://www.theglobeandmail.com/business/commentary/article-...
https://sherwood.news/tech/europe-wants-to-break-up-with-us-...
>While few companies announce this publicly, I know from personal experience with corporate clients
Well I have even more personal experience that contradicts yours, and this isn't true at all. Everyone uses Claude / Gemini / OpenAI. Mistral isn't even on the table.
Just a sample: https://mistral.ai/customers
And you can Google for "We use Mistral" to find thousands of usecases by startups and other companies.
Come on, compared to Google Workspace / Microsoft's whatever-it's-called-these-days, the cost of switching from one LLM provider to another is pretty much zero.
Having an option at the back of your mind is all it takes right now, until push comes to shove of course.
I don't think big business is genuinely planning for a world where US tech becomes completely unavailable.
2 years ago I would have agreed with you, but after Greenland the vibe is very different. And it's not like the situation is improving.
Not entirely, but putting more eggs in that basket would certainly be considered lack of planning. Why increase your risk even further when everyone has seen how volatile things can get quickly?
Not at all. We continue taking that decision today.
No they haven't. Every company just buys ChatGPT Enterprise.
No they haven't.
Proof: Most big EU companies use Claude or Gemini or OpenAI, not Mistral. That choice was made recently.
Things have changed in the loud echo chambers of the internet, maybe (but not really, since people were saying that EU data sovereignty was happening any time now since 2016).
I consult for various companies and have definitely seen a trend. It's not quite the rupture that some expect but clearly not nothing either. Until very recently, the risk assessment of using US providers was considered very hypothetical. Today it still doesn't feel imminent, but it does feel very real.
Of course, it will be slow and painful and Europeans will need to use their own services for them to grow and mature.
My _feeling_ is that a lot of EU/European politicians have talked much more about the need to be independent from the US after Trump threatened Greenland, at least in the Nordic countries. Not only concerning data & privacy, but defence, communications, space, etc. All areas. The wheel has started to turn. You will not see it if you look around today, but in 10 years' time, maybe more, Europe will have stopped depending on the US. And that will hit the US hard. We pay a lot of money for services to the US.
The politicians can talk, but they needed to set up an environment that would've let a European company have a decent shot at competing with the best AI models. But they didn't. Should've thought of that before being proud of setting up those strict tech regulations.
> Proof: Most big EU companies use Claude or Gemini or OpenAI, not Mistral. That choice was made recently.
Is a statement with no supporting facts considered "proof"? Just the public list of Mistral customers (https://mistral.ai/customers) is proof alone that quite a few big EU companies are _not_ in fact using OpenAI or Claude or Gemini at the strategic level.
Contrast with Anthropic's Europe-based customers, the majority of which are small companies (the only big one I can identify from a skim is L'Oreal): https://claude.com/customers?f80ce999_sort_date=desc&f80ce99...
Or OpenAI's customers, of which the only big European ones I can spot are Scania and Philips: https://openai.com/stories/
Note: I'm talking about strategic enterprise AI deployments for the company or at least a division, not individual developers being allowed to use Claude Code etc. The moat and the money will be in the former, not latter.
Is this the best Grok alternative?
Any model is.
This sounds like an ideology based reply. Grok is underrated and I think has a better chance of long term success than most. The current growth strategy means (for me) their chat harness is not up to par for serious work.
Their API is consistently among the most used on OpenRouter. While I can't vouch for it myself, I think this is a decent proxy for capability. You can definitely see glimmers of greatness in their chat interface, it just feels like the system prompts are focused on something that doesn't interest me.
Grok is not SOTA, but it's so obviously better than Mistral. Mistral is just some European patriotism or something.
Grok is nice for asking morally gray questions. ChatGPT will lie in these cases.
What lies have you seen? ChatGPT is the most censored one, but I've only seen rejections, not lies.
My other complaint is that ChatGPT ends every response with a teaser to ask more questions.
Ask game theory questions about situations with real humans where it's best to defect.
> Grok is nice for asking morally gray questions. ChatGPT will lie in these cases.
Are you really that oblivious to the painfully cringy manipulation tactics by the man who partied at Epstein's island? https://www.theguardian.com/technology/2025/nov/21/elon-musk...
If you couldn't use the word Europe to describe why you'd choose Mistral, you'd have no good reasons to choose Mistral.
It's just not good. It's bottom floor for LLMs.
> It's bottom floor for LLMs.
What? That's just demonstrably false. The market doesn't consist of 5 providers.
You know about LMarena? I just looked it up, Mistral is number 59 on the list.
Free Chinese models are better than it.
That's one of the possible benchmarks, not the only one. Being 59th there, on a list padded with every variation of Model_Name X.Y (March 2025 Preview) Pro-Thinking, translates to being in the top 10 providers worldwide, which is a very interesting mark of failure, considering that coincidentally they're also number 1 in their economic area. If you don't know why the last part is important, go read some news.
I am rooting for Mistral with their different approach: not really competing on the largest and advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.
I found it to be the best model if you want to talk about philosophical topics. It has no problem going deep and technical, while other models tend to be afraid of overshooting the reader's comprehension.
their ocr model is goated
Did they make significant improvements in OCR 3? The quality I was getting from Mistral OCR 2 was nowhere near as good as what I could get from just sending the same files to Claude Sonnet via an API call.
I have been finding Voxtral useful though.
Better than Qwen? I guess the best overall is Gemini, right?
Gemini? Not anywhere near.
Gemini is the worst
Really? This article was gushing about it:
https://generativehistory.substack.com/p/gemini-3-solves-han...
Which one's the best?
probably yes. considering that even some of their non-ocr models can recognize my shitty handwritten math
also offering support for local deployments
Go Mistral !
first, there was .ai
next, it sounds like it's going to be .eu
but what about ai.eu
> but what about ai.eu
oh, .. why?
> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
Pre-training means exposing an already-trained model to more raw text, like PDF extracts etc. (aka continued pre-training). You wouldn't be starting from scratch, but it's still pre-training because the objective is just next-token prediction on the text you expose it to.
Post-training means everything else: SFT, DPO, RL, etc. Anything that involves things like prompt/response pairs, reward models, or benefits from human feedback of any kind.
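A toy illustration of the data-shape difference: continued pre-training consumes raw text under a next-token objective, while post-training (SFT) consumes structured prompt/response pairs. Both records below are invented:

```python
# Continued pre-training input: just raw extracted text, no structure.
pretraining_record = "Section 4.2: The valve assembly must be torqued to 12 Nm..."

# SFT input: a structured pair; loss is computed on the response given the prompt.
sft_record = {
    "prompt": "What torque does the valve assembly require?",
    "response": "12 Nm, per section 4.2 of the maintenance manual.",
}

def pretraining_tokens(text):
    # Next-token objective: the model predicts each token from the ones before it
    # (whitespace split stands in for a real tokenizer).
    return text.split()

def sft_pair(rec):
    # Supervised objective: separate the conditioning text from the target text.
    return rec["prompt"], rec["response"]

print(len(pretraining_tokens(pretraining_record)), "tokens of raw text")
print("target:", sft_pair(sft_record)[1])
```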
Er, then what is the "already trained" model? I thought pre-training was the gradient descent through the internet part of building foundational models.
Probably marketing speak for full fine-tuning vs PEFT/LoRA.
I would guess:
Pre-training: refining the weights in an existing model using more training data.
Post-training: Adding some training data to the prompt (RAG, basically).
I think they are referring to ācontinued pretrainingā.
I can imagine that, as usual, you start with a few examples and then instruct an LLM to synthesize more examples out of that, and train using that. Sounds horrible, but actually works fairly well in practice.
Probably just means SFT fine-tuning a base model, vs behavioural DPO and/or SFT fine-tuning an instruction model.
Mistral is doing some really great stuff lately. Sure, it's hard to compete with OpenAI and Anthropic and their models, but they are taking up some interesting takes and designing their product in unique ways.
I really like what they are doing and I'll be watching them a lot more closely. I'd love to work for them btw!
Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new Unsloth release, training feels more in reach.
How many proprietary use cases truly need pre-training or even fine-tuning as opposed to RAG approach? And at what point does it make sense to pre-train/fine tune? Curious.
I'm thinking stuff like this:
https://denverite.com/2026/03/12/ai-recycling-facility-comme...
You could take a model like the one referenced in the article, retool it with Forge for oh I don't know, compost, and use it to flag batches that contain too much paper for instance.
These kinds of applications would work across industries, basically anywhere where you have a documented process and can stand to have automated oversight.
You can fine-tune small, very fast and cheap-to-run specialized models, e.g. to react to logs, tool use and domain knowledge, possibly removing network LLM comms altogether, etc.
RAG basically gives the LLM a bunch of documents to search through for the answer. What it doesn't do is make the model any better. Pre-training and fine-tuning improve the LLM's ability to reason about your task.
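To make that distinction concrete, here's a bare-bones sketch: retrieval picks documents to put into the prompt but never touches the model's weights. Word overlap stands in for a real embedding search, and the documents are made up:

```python
# Tiny invented corpus.
DOCS = {
    "hr-policy": "Vacation days accrue monthly and roll over one year.",
    "erp-guide": "Purchase orders above 10k EUR require two approvals.",
    "onboarding": "New hires get laptop access on day one.",
}

def retrieve(query, k=1):
    """Rank documents by naive word overlap with the query."""
    qwords = set(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(qwords & set(kv[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

def build_prompt(query):
    # The retrieved text is stuffed into the prompt; the model itself is unchanged.
    context = "\n".join(DOCS[d] for d in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(retrieve("how many approvals for purchase orders"))  # ['erp-guide']
```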
RAG is dead
Using tools and skills to retrieve data or files is anything but dead.
I think people just mean "using vector databases to enable RAG".
Even that doesn't make sense. Why would you not build a vector database to complement your RAG engine?
For coding use cases you may want a way to search for symbols themselves or do a plain text exact match for the name of a symbol to find the relevant documents to include. There is more to searching than building a basic similarity search.
Sorry, but who mentioned coding as a use case? My comment was general and not specific to coding, and I don't understand where you got the idea that I am arguing that a similarity-search engine would be a substitute for a symbol-search engine, or that symbol search is inferior to similarity search. Please don't put words into my mouth. My question was genuine, without any presumptions.
Even with the coding use-case you would still likely want to build a similarity search engine because searching through plain symbols isn't enough to build a contextual understanding of higher-level concepts in the code.
I mentioned coding as a use case in my comment you replied to. You were asking for an example for when one wouldn't use vector search and I provided one. I did not say similarity search would be a substitute. I said that for the coding case you do not need it.
>you would still likely want to build a similarity search engine
In practice, tools like Claude Code, Codex, Gemini, Kimi Code, etc. are getting away with searching for code with grep / find and understanding code by loading a sufficient amount of it into the context window. That is enough to understand higher-level concepts in the code. Maintaining a vector database on top of this is not free and adds extra complexity.
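The grep-style retrieval those agents rely on fits in a few lines. This is a hedged sketch, not how any of those tools is actually implemented (they typically shell out to grep/ripgrep); it just shows that exact symbol matching plus a few lines of surrounding context needs no embeddings at all.

```python
# Minimal grep-style code retrieval: exact-match a symbol, return the
# matching line plus a little surrounding context. The symbol name and
# file layout in any usage are hypothetical.
import os

def grep_symbol(root, symbol, context=2):
    hits = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                lines = f.read().splitlines()
            for i, line in enumerate(lines):
                if symbol in line:  # plain-text exact match, no vectors
                    lo, hi = max(0, i - context), i + context + 1
                    hits.append((path, i + 1, "\n".join(lines[lo:hi])))
    return hits
```

An agent can then paste the returned snippets straight into its context window, which is exactly the "load enough code and read it" approach described above.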
And yet your blog says you think NFTs are alive. Curious.
But seriously, RAG/retrieval is thriving. It'll be part of the mix alongside long context, reranking, and tool-based context assembly for the foreseeable future.
The issue I had with RAG when I tried building our own internal chat/knowledge bot was pulling in the relevant knowledge before sending to the LLM. Domain questions like "What is Cat Block B?" are common and, for a human, provide all the context that is needed for someone to answer within our org. But vectorizing that and then finding matching knowledge produced so many false positives. I tried to circumvent that by adding custom weighting based on keywords, source (Confluence, Teams, Email), but it just seemed unreliable. This was probably a year ago and, admittedly, I was diving in head first without truly understanding RAG end to end.
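The custom-weighting approach described above (blending vector similarity with keyword and source signals) can be sketched roughly like this. Everything here is hypothetical: the weights, the source priors, and the candidate format are invented for illustration, not taken from any real system.

```python
# Hedged sketch of re-weighting retrieval candidates: blend a vector
# similarity score with a keyword-overlap boost and a per-source prior.
# All weights and source names are made up for illustration.
SOURCE_WEIGHT = {"confluence": 1.0, "teams": 0.7, "email": 0.5}

def keyword_boost(query, doc_text):
    # Fraction of query terms that literally appear in the document.
    q = set(query.lower().split())
    d = set(doc_text.lower().split())
    return len(q & d) / len(q) if q else 0.0

def rescore(query, candidates):
    # candidates: list of (doc_text, source, vector_score) tuples, where
    # vector_score came from the upstream similarity search.
    scored = []
    for text, source, vec_score in candidates:
        score = (0.4 * vec_score
                 + 0.4 * keyword_boost(query, text)
                 + 0.2 * SOURCE_WEIGHT.get(source, 0.5))
        scored.append((score, text))
    return [t for _, t in sorted(scored, reverse=True)]
```

With a query like "what is cat block b", the exact-phrase keyword boost can rescue a Confluence page that embeds poorly while demoting a superficially similar false positive, which is the behaviour the comment was trying to get by hand-tuning weights.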
Being able to just train a model on all of our domain knowledge would, I imagine, produce much better results.
I don't think RAG is dead, and I don't think NFTs have any use and think that they are completely dead.
But the OP's blog is more about ZK than about NFTs, and crypto is the only place funding work on ZK. It's kind of a devil's bargain, but I've taken crypto money to work on privacy preserving tech before and would again.
Not OP, but...
> Of course you would have to set a temperature of 0 to prevent abuse from the operator, and also assume that an operator has access to the pre-prompt
Doesn't the fact that LLMs are still non-deterministic at temperature 0 render all of this moot? And why was I compelled to read a random blog post on the unsolved issue of validating natural language? It's SQL injection except without a predetermined syntax to validate against, and thus an NP problem we've yet to solve.
I have no interest in anything crypto, but they are making a proposal about NFTs tied to AI (LLMs and verifiable machine learning) so they can make ownership decisions.
So it'd be alive in the making decisions sense, not in a "the technology is thriving" sense.
Wait, what do NFTs have to do with RAG?
I, for one, find NFT-shilling to be a strong signal that I should downgrade my trust in everything else a person says.
Nothing, I think they're just pointing out a seeming lack of awareness of what really is or isn't dead.
Is it??
In what, X's hype circles? Embeddings are used in production constantly.
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
I think it's interesting what this approach suggests about who will profit from AI. I'm sceptical that having huge numbers of GPUs is a moat. After all, real humans (even geniuses) are trained on much, much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It's hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies' proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.
> After all, real humans (even geniuses) are trained on much, much less data than the whole Internet.
It's certainly different data, but one could argue that real humans have been trained on 3.5 billion years of evolution data.
> Forge enables enterprises to build models that internalize their domain knowledge. Organizations can train models on large volumes of internal documentation, codebases, structured data, and operational records. During training, the model learns the vocabulary, reasoning patterns, and constraints that define that environment.
I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
I was under this impression as well - I'd love to hear from someone who's deeper in the know about this!
Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL too? Jeez, RL environments are really hard to get right. Best wishes, I guess.
Interesting how Mistral is investing in training models for industry-specific use cases. With the commoditization of intelligence by base models, they're probably looking to create value from specialized verticals.
ASML and ESA as clients means something. I don't expect to see the first name on anyone else's logo list.
I find the Mistral "middle" between small LMs and 1T LMs compelling. Models that are sufficiently big to be performant but specialised for domains and tasks: this is what I assumed we'd always head towards.
They mention pretraining too, which surprises me. I thought that was prohibitively expensive?
It's feasible for small models, but I thought small models were not reliable for factual information?
Typical stages of training for these models are:
Foundational:
- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)
And sometimes...
- Some more customer-specific fine-tuning.
Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
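The "swapping the dataset" point can be made concrete with a toy. This is purely schematic, assuming nothing about Mistral's actual stack: the "model" below is a single scalar fit by gradient descent, standing in for an LLM, and the datasets are lists of numbers standing in for web text and instruction pairs. The point it illustrates is that pretraining and SFT reuse the same training loop; only the data and the learning rate change.

```python
# Schematic toy: SFT after pretraining reuses the same loop and optimiser;
# what's swapped is the dataset (and typically a smaller learning rate).
# ToyModel is a single scalar, not a real LLM.
class ToyModel:
    def __init__(self):
        self.w = 0.0

    def loss_grad(self, target):
        # Gradient of 0.5 * (w - target)^2 with respect to w.
        return self.w - target

    def step(self, grad, lr):
        self.w -= lr * grad

def train(model, dataset, lr, epochs=200):
    # The one training loop, used unchanged for every stage.
    for _ in range(epochs):
        for target in dataset:
            model.step(model.loss_grad(target), lr)
    return model

pretrain_data = [1.0, 1.2, 0.8]   # stand-in for web-scale text
sft_data = [2.0, 2.1, 1.9]        # stand-in for instruction pairs

model = ToyModel()
model = train(model, pretrain_data, lr=0.1)   # "pretraining"
model = train(model, sft_data, lr=0.01)       # same loop, new dataset
```

After the second call the weight has moved to fit the SFT data, which is the whole mechanism: customer-specific fine-tuning is just a third call with a third dataset.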
My bet is that the solution to continuous learning is external storage. There is a lot of talk about context engineering, but I have not seen anyone treating context as the main bottleneck and building a system around that. Even "context engineering" is arguably the wrong term: context does not enter the LLM in some mysterious way, it goes through the prompt, and passing the whole chat history back and forth is not the most efficient use of the limited prompt.
"External Storage" whatever that is can not be the same as continous learning as it does not have the strong connections/capture the interdepencies of knowledge.
That said, I think we will see more efforts on the business side to have models that can help you build a knowledge base in some kind of standardized way that the model is trained to read. Or synthesize some sort of instructions on how to navigate your knowledge base.
Currently e.g. Copilot tries to navigate a hot mess of a MS knowledge graph that is very different for each company. And due to its amnesia it has to repeat the discovery in every session. No wonder that does not work. We have to either standardize or store somewhere (model, instructions) how to find information efficiently.
The key to make Copilot useful is to take the limited context problem seriously enough. There are many dimensions to it: https://zby.github.io/commonplace/notes/context-efficiency-i... and it should be the starting point for designing the systems that extensively use llms.
What do you mean when you say "external storage?"
A knowledge base - something where the LLM knows how to find the knowledge it needs for a given task. I am working on this idea in https://zby.github.io/commonplace/
A form of context engineering
The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
Ironically that was also the past of AI. In 2016 it was all about specialized models (not just training data, everything including architecture and model class/type) for specific tasks and that's the way things had been for a long time.
Are you suggesting that it's an aberration that from ~2019 to ~2026 the AI field has been working on general intelligence (I assume this is what you mean by "achieving benevolent knowledge")?
Personally I think it's remarkable how much a simple transformer model can do when scaled up in size. LLMs are an incredible feat of generalization. I don't see why the trajectory should change back towards specialization now.
I don't think that's true. Nothing points to specialized LLMs being better. General purpose LLMs are just much more useful in daily work.
To be more specific, I think the future is local and specialized. IBM among others thought the same way with their giant mainframe centralized computers and the original way people would utilize software in the 70s. It's an interesting parallel to today's cloud if you think about it. It's just not scalable from a resource (hardware), energy, and cost perspective. I think we're living a unique time, but it's going to change. Without continued massive funding and a pivot to sustainable, things will (and should) change.
Don't get me wrong, general intelligence will always be important and should be a part of specialist models to a degree for understanding, but it doesn't make sense to use an 800B+ parameter model to help write an email or do research on company trends. Hell, look at what China has been able to do. Qwen 3.5 9B exceeds Claude 3.5 Haiku and nears Sonnet 3.5 levels. The 27B variant of Qwen 3.5 is superior to both in many ways and even rivals newer models. There is obviously an inherent lag, but we will gradually see a shift as these models become more capable.
Right now we are chasing 1-2% improvements at the cost of billions. Local models are already absurdly capable (more so by the day, same with cloud of course) and smarter than most people in specific areas. Can we honestly say that most jobs require a PhD or higher level of understanding to perform? We're chasing something that is becoming less and less needed from a general day-to-day perspective. AGI is outstanding, but not practical (at least today). I think we'll get there anyway at our current trajectory (though dangerous), but I suspect things will shift.
lol the AI-generated support reply about their own AI model is peak 2026
the naming mess is wild though. i ran into similar confusion trying to set up mistral for a side project; ended up just guessing which endpoint was the right one
The fine tuning endpoint is deprecated according to the API docs. Is this the replacement?
https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning
Interesting to see. I thought they were promoting fine tuning
Interesting. Does this actually scale, though? I've never seen enterprises whose "internal knowledge" is in proper readable form. It's often in code, and more importantly in the people who wrote it.
I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read code or to reach out to those who wrote them. I don't know how it works in places that work with the "real world" like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
My sense is that it sounds amazing in theory to executives who have never had to themselves look at internal data. In reality the internal knowledge base is a mix of incomplete, inaccurate, self serving lies, out of date and so on. At worst, the data is explicitly biased to hide reality from executives so the AI will look extra good to executives. Of course, a business that makes all tactical decisions based on lies is not going to do well.
Looks interesting. But how to explore or test or use? The product page (https://mistral.ai/products/forge) also does not contain anything useful. Just "Contact us"
Disappointing.
I cannot keep up with their products, model names and releases. What is what for? Their marketing texts do not make sense for me. Is there a nice overview somewhere?
I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)
I was enthusiastic, but it's "contact us" priced for now. I was expecting a classic cloud LLM forge with public pricing.
This looks good but how much money are we talking here? Are we 'retraining' an entire model but adding enterprise data to the public data set?
I thought that for pretraining to work and reasoning to emerge, you need internet-scale data. How can Forge achieve it with just internal company data (unless said company is AT&T or something)?
I wasn't able to find a way to access this. Is this something accessible only to enterprises?
Would love to take it for a spin, if that is even possible.
Good for them. Really hope they find market fit
Go EU!
How does this compare to fine tuning?
It seems to me that it is broadly the same thing, except they give you the resources and expert knowledge to do it.
can i use mistral to read my source code and teach it, so i don't need to inject the whole doc and consume tokens every single time?
> Code agents are becoming the primary users of developer tools, so we built Forge for them first, not
... for humans.
Is training or FT > context? Anyone have experience?
Is it possible to retrain daily or hourly as info changes?
where sample notebook/script? where github? where signup?
...learn a thing or two from NVIDIA or gtfo
lol
> Mistral AI has already partnered with world-leading organizations, like ASML, DSO National Laboratories Singapore, Ericsson, European Space Agency, Home Team Science and Technology Agency (HTX) Singapore, and Reply to train models on the proprietary data that powers their most complex systems and future-defining technologies.
When you can actually represent somebody like the ESA get in touch with them. Otherwise, uh, gtfo.