Anyone got a success story they can share about fine-tuning an LLM? I'm looking for examples that produced commercial value beyond what could be achieved by prompting an existing hosted model, or by waiting a month for the next generation of hosted models to solve the same problem.

Here's a great example of the kind of story I'm looking for:

Ravin Thambapillai
@ravin_tham

Oct 17 25

Vlad Bukhin from Checkr worked on a really successful fine-tuning project there that was written up: linkedin.com/pulse/genai-ar. I haven't seen a writeup, but I also heard from friends at Ramp that, a while ago, they got a lot of lift out of fine-tuning an open-source model for extraction.

LLMOPs Micro-Summit, San Francisco youtube.com/watch?v=IPDqbfZT1qg&t=1s As the demand for more efficient, accurate, and cost-effective background checks grows, the ability to transform messy,...

GenAI Architecture Series: Streamlining Background Checks with Fine-Tuned Language Models

On cost: how much did it cost in research time and training, and how much would it need to save you in production to deliver a good ROI? And how do you think about the risk that something like a GPT-5.1 might come along and outperform it?

Another good one: fine-tuning a model to work better with a niche programming language. Jane Street did this as well, for OCaml: youtube.com/watch?v=0ML7ZL

Brendan Hogan
@brendanh0gan

Oct 17 25

we fine-tuned a model for a niche programming language in finance called q: x.com/brendanh0gan/s

Getting <500ms response times for a UI that updates as you type seems like a very strong justification for fine-tuning a small, fast custom model.

Tim Brown
@_brimtown

Oct 18 25

We built Datadog’s natural language querying features (a variant of text-to-SQL) using a fine-tuned model, replacing prompted OpenAI models. We did this explicitly for latency and cost reasons: the feature translates as you type in the UI, which both required <500ms latency and would have been wasteful to run on pay-per-token pricing like the hosted providers'. We run it on our own pay-per-hour GPUs, which allows real-time translation. Any UX that's trying to feel like a tab-completion model (fast, user accept/reject) would likely benefit from similar approaches: docs.datadoghq.com/logs/explorer/
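The cost half of that argument is break-even arithmetic: a pay-per-token API bill grows linearly with request volume, while a self-hosted GPU costs a flat amount per hour whether it is busy or idle. Here is a minimal Python sketch of that comparison. Every price, token count and request rate in it is a hypothetical assumption, not a figure from Datadog, and it assumes the GPU can actually sustain the request rate within the latency budget.

```python
# Hypothetical break-even sketch: pay-per-token API vs. a GPU billed per hour.
# All numbers are illustrative assumptions, not real vendor or Datadog figures.


def api_cost_per_hour(requests_per_hour: float, tokens_per_request: float,
                      usd_per_million_tokens: float) -> float:
    """Cost of serving one hour of traffic on a pay-per-token API."""
    return requests_per_hour * tokens_per_request * usd_per_million_tokens / 1_000_000


def break_even_requests_per_hour(gpu_usd_per_hour: float, tokens_per_request: float,
                                 usd_per_million_tokens: float) -> float:
    """Request rate above which a flat hourly GPU is cheaper than per-token pricing.

    Ignores throughput limits: assumes one GPU can serve this rate under the latency target.
    """
    cost_per_request = tokens_per_request * usd_per_million_tokens / 1_000_000
    return gpu_usd_per_hour / cost_per_request


if __name__ == "__main__":
    # Hypothetical assumptions only.
    TOKENS_PER_REQUEST = 1_000      # prompt + schema context + short completion
    USD_PER_MILLION_TOKENS = 2.0    # hypothetical blended API price
    GPU_USD_PER_HOUR = 4.0          # hypothetical hourly GPU rental

    # Translating as you type can fire a request per (debounced) keystroke,
    # so request volume is a multiple of a submit-only UI.
    for requests_per_hour in (500, 2_000, 10_000):
        api = api_cost_per_hour(requests_per_hour, TOKENS_PER_REQUEST, USD_PER_MILLION_TOKENS)
        print(f"{requests_per_hour:>6} req/h: API ${api:.2f}/h vs GPU ${GPU_USD_PER_HOUR:.2f}/h")

    be = break_even_requests_per_hour(GPU_USD_PER_HOUR, TOKENS_PER_REQUEST, USD_PER_MILLION_TOKENS)
    print(f"break-even at ~{be:,.0f} requests/hour")
```

With these made-up numbers the crossover is about 2,000 requests per hour. Above that, the flat hourly GPU wins, and as-you-type traffic reaches that volume much faster than a submit-only UI does. The sketch leaves out research and training costs, which is where the ROI question earlier in this thread comes in.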

v0 runs on some credible-looking fine-tuned models that specialize in their Next.js stack.

Max Leiter
@max_leiter

Oct 18 25

v0 uses a fine-tuned LLM to fix various issues models have with Next.js, and with code generation in general. Some details here: vercel.com/blog/v0-compos

Learn how v0's composite AI models combine RAG, frontier LLMs, and AutoFix to build accurate, up-to-date web app code with fewer errors and faster output.

Introducing the v0 composite model family - Vercel

Here's a neat one: warehouse automation, using a vision LLM (a fine-tuned Gemini 2.5 Flash, a big cost saving over 2.5 Pro) to check that containers on a conveyor belt are carrying the expected items.