Simon Willison

@simonw

Oct 17, 2025
8 tweets

Anyone got a success story they can share about fine-tuning an LLM? I'm looking for examples that produced commercial value beyond what could be achieved by prompting an existing hosted model - or waiting a month for the next generation of hosted models to solve the same problem

Here's a great example of the kind of story I'm looking for
Vlad Bukhin from Checkr worked on a really successful fine-tuning project there that was written up (linkedin.com/pulse/genai-ar). I haven't seen a writeup, but I also heard from friends at Ramp that they got a lot of lift out of fine-tuning an OS model for extraction a while ago.
And for cost, how much did it cost in research time and training costs, and how much would it need to save you in production to have a good ROI - plus how do you think about the risk that something like a GPT-5.1 might come along that outperforms it?
Another good one: fine-tuning a model to work better with a niche programming language. Jane Street did this as well for OCaml: youtube.com/watch?v=0ML7ZL
Brendan Hogan

@brendanh0gan

we fine-tuned a model for a niche programming language in finance called q: x.com/brendanh0gan/s
Getting <500ms response times for a UI that updates as you type seems like a very strong justification for fine-tuning a small, fast custom model
Tim Brown

@_brimtown

We built Datadog’s natural language querying features (a variant of text->SQL) using a fine-tuned model, replacing prompted OpenAI models. We did this explicitly for latency and cost purposes: the feature translates as you type in the UI, which required <500ms latency and would have been wasteful to run on a pay-per-token model like those of the hosted providers. We run it on our own pay-per-hour GPUs, allowing real-time translation. Any UX that’s trying to feel like a tab-completion model (fast, user accept/reject) would likely benefit from similar approaches. docs.datadoghq.com/logs/explorer/
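The pay-per-token versus pay-per-hour trade-off described above comes down to a break-even request rate. Here is a minimal sketch of that arithmetic; the function name and every number in it are hypothetical placeholders, not figures from Datadog or any provider.

```python
def breakeven_requests_per_hour(
    gpu_cost_per_hour: float,
    tokens_per_request: float,
    price_per_million_tokens: float,
) -> float:
    """Requests per hour above which a flat-rate GPU is cheaper than pay-per-token.

    All inputs are illustrative assumptions supplied by the caller.
    """
    # What one request would cost on a hosted, pay-per-token model.
    hosted_cost_per_request = tokens_per_request * price_per_million_tokens / 1_000_000
    # The GPU costs the same whether it serves one request or thousands,
    # so divide its hourly price by the hosted per-request cost.
    return gpu_cost_per_hour / hosted_cost_per_request


if __name__ == "__main__":
    # Hypothetical inputs: a $2.00/hour GPU, 500 tokens per request,
    # and a hosted price of $1.00 per million tokens.
    rate = breakeven_requests_per_hour(2.00, 500, 1.00)
    print(f"Break-even at roughly {rate:,.0f} requests per hour")
```

A translate-as-you-type UI sends a request on most keystrokes, so it can reach a high request rate with few users. That is where flat hourly pricing starts to pay off, on top of the latency benefit.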
v0 is running on some credible-looking fine-tuned models, specializing in their Next.js stack
Max Leiter

@max_leiter

v0 uses a fine-tuned LLM to fix various issues models have with Next.js and code generation in general. Some details here: vercel.com/blog/v0-compos
Here's a neat one: warehouse automation, using a vision LLM (fine-tuned Gemini 2.5 Flash, a big cost saving over 2.5 Pro) to check that containers on a conveyor belt are carrying the expected items
Jelmer Borst

@japborst

We fine-tuned a Gemini VLM to get 2.5 Pro performance at 2.5 Flash speed & cost: blog.picnic.nl/adding-eyes-to
Shopify have deployed fine-tuned vision LLMs based on "LLaVA 1.5 7B, LLaMA 3.2 11B, and currently Qwen2VL 7B" to help process product photos at scale