Why Sommo Is Getting a Knowledge Graph

A quick note before I start. This is not the usual Sommo post. Most weeks I write about pairings, new producers, or what is happening in a specific wine region. Today I want to do something different and share a piece of research I have just finished. It directly shapes the next version of the app you have on your phone.

It is also the kind of work I want to be more open about in general. Sommo is a one-person project, and if the AI that powers it is going to make suggestions about what you drink, the least I can do is tell you when and how that AI is getting better.

The short version

I ran a small controlled study at the University of Oxford as a pilot for broader research on grounding language models in structured knowledge. The question was simple: does giving Sommo a verified database of real wineries, real regions, and real grape-to-place relationships meaningfully reduce the number of bottles it invents?

The answer was yes, by a lot.

Condition	Wineries named	Wineries invented	Hallucination rate
Without the knowledge graph	60	16	26.7%
With the knowledge graph	52	4	7.7%

A 3.5x reduction in the rate of invented wineries (26.7% down to 7.7%), with no change to the model itself. Same questions, same temperature, same model. The only difference was a short list of real candidate wines, pulled from a structured graph, attached to each prompt.
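For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the rates from the counts in the table. The function and variable names are mine, chosen for illustration.

```python
def hallucination_rate(invented: int, named: int) -> float:
    """Share of named wineries that do not exist in the reference list."""
    return invented / named

# Counts copied from the table above.
without_kg = hallucination_rate(invented=16, named=60)
with_kg = hallucination_rate(invented=4, named=52)

# How many times lower the rate is once the graph is attached.
reduction = without_kg / with_kg

print(f"{without_kg:.1%} -> {with_kg:.1%}, about {reduction:.1f}x lower")
```

The ratio is taken between the two rates rather than the raw counts, because the model named a different number of wineries in each condition (60 versus 52).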

For Sommo, that is the difference between “this might be a real bottle” and “this is a real bottle you can walk into a shop and ask for”.

If you want the rigorous version, the full paper is available as a PDF, the experiment code is open-source on GitHub, and there is a 12-minute technical walkthrough on my personal blog.

A brief history

To understand why I am doing this, the version history helps:

Sommo 7B v1 was the original experimental model: a fine-tuned open-source 7B-parameter model built in a weekend. It hallucinated more than I liked, but it proved a small model could hold its own on wine conversation. (“Building a Sommelier in a Weekend” tells that story.)
Sommo v2 is the private model powering the iOS app today. For a model of its size, v2 is genuinely strong in the wine niche, and that is not by accident. Behind v2 sits a custom evaluation harness with domain-specific test sets, deliberate work on prompt design and instruction tuning, careful curation of training data, and the kind of LLM best practices I would apply to any production system: structured evals, regression tests on previous failures, calibration against expert wine references. Every release is measured before it ships.
Sommo v3 is the next version. It pairs the model with a knowledge graph. That is what this post is about.

The pattern across the three versions is the same: each step makes Sommo less of a guesser and more of a reliable assistant. v2 took the language model approach about as far as testing and tuning alone can take it, and that is meaningfully far. v3 is the step that adds structured truth on top, because there is a class of mistake (an almost-real winery name, a vintage just out of range) that evals struggle to catch on their own. That is exactly the class of mistake a knowledge graph is built to eliminate.

What a knowledge graph actually is

A language model is a very good guesser. Give it enough patterns and it learns what a wine critic would say next about a Burgundy. That is wonderful for explaining what terroir means or recommending something rich for a cold evening.

A knowledge graph is the opposite. It is a list of true things and the relationships between them. Château Latour exists. It is in Pauillac. Pauillac is in Bordeaux. Bordeaux is in France. Latour makes Cabernet Sauvignon-dominant blends. No guessing.

v3 combines the two. The language model handles the conversation. The graph handles the facts. When Sommo recommends a wine, that wine is real, that producer is real, and that pairing has structural reasons behind it.
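The Château Latour facts above can be written down as subject-relation-object triples, which is one common way to represent a knowledge graph. This is only a sketch: the relation names (`located_in`, `main_grape`) and the helper functions are assumptions for illustration, not the schema Sommo uses.

```python
# Facts from the paragraph above, as (subject, relation, object) triples.
# The relation names are illustrative assumptions.
TRIPLES = {
    ("Château Latour", "located_in", "Pauillac"),
    ("Pauillac", "located_in", "Bordeaux"),
    ("Bordeaux", "located_in", "France"),
    ("Château Latour", "main_grape", "Cabernet Sauvignon"),
}

def objects(subject: str, relation: str) -> set[str]:
    """Everything the graph states about `subject` under `relation`."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}

def exists(entity: str) -> bool:
    """True if the entity appears anywhere in the graph."""
    return any(entity in (s, o) for s, _, o in TRIPLES)

print(objects("Château Latour", "located_in"))
print(exists("Château Imaginaire"))  # a made-up name, so not in the graph
```

Looking something up either finds a stated fact or finds nothing. There is no in-between answer that merely sounds plausible.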

What the experiment looked like

I took the public WineEnthusiast wine reviews dataset (130,000 critic reviews) and built a knowledge graph from it. After cleaning, the graph holds:

34,189 wines from France, Italy and Spain
6,181 wineries (after merging duplicates like Château Latour, Chateau Latour and so on)
363 grape varieties
806 wine regions, organised under 29 provinces and 3 countries

Then I asked an off-the-shelf frontier language model to recommend three wines similar to each of 20 query wines, in two conditions:

Without the knowledge graph. Just the question.
With the knowledge graph. The same question, plus a short list of eight candidate wines pulled from the graph for context.

For every winery the model named, I checked whether it actually appears in the 6,181-winery list (with sensible normalisation for accents and prefixes). That is where the 3.5x came from.
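Here is a rough sketch of what that check could look like. The prefix list and the exact normalisation steps are assumptions on my part, kept simple for illustration; the real rules live in the experiment code.

```python
import unicodedata

# Assumed list of prefixes to ignore when comparing names (illustrative only).
PREFIXES = ("chateau ", "domaine ", "bodegas ", "tenuta ")

def normalise(name: str) -> str:
    """Strip accents, case, extra spaces, and one leading prefix."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = " ".join(no_accents.lower().split())
    for prefix in PREFIXES:
        if cleaned.startswith(prefix):
            return cleaned[len(prefix):]
    return cleaned

def count_invented(named: list[str], known: list[str]) -> int:
    """Number of named wineries with no match in the known-winery list."""
    known_keys = {normalise(k) for k in known}
    return sum(1 for n in named if normalise(n) not in known_keys)

known_wineries = ["Château Latour"]
model_output = ["Chateau Latour", "Château Imaginaire"]  # second name is made up
print(count_invented(model_output, known_wineries))
```

One trade-off worth knowing about: dropping prefixes makes matching more forgiving, but it can also merge two genuinely different producers that share a name, so a stricter version might keep the prefix and only fold accents and case.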

What else the graph does

Hallucination reduction is the user-visible win, but the graph quietly does more.

It knows the geography. Bordeaux is a region in France. Pauillac is a sub-region in Bordeaux. The graph encodes this hierarchy explicitly, so when you ask for a wine from “France” Sommo can include Pauillac wines without having to be told the relationship. Three lines of logic do what a language model needs millions of training examples to approximate.
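To show how little logic the hierarchy needs, here is a minimal sketch that walks up a parent map. The dictionary and function names are illustrative assumptions, and the sketch assumes each place has at most one parent and that there are no cycles.

```python
# Each place points to the place that contains it (from the example above).
PARENT = {"Pauillac": "Bordeaux", "Bordeaux": "France"}

def is_within(place: str, ancestor: str) -> bool:
    """True if `place` is `ancestor` or sits anywhere beneath it."""
    current = place
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None once we reach the top
    return False

print(is_within("Pauillac", "France"))
print(is_within("France", "Pauillac"))
```

A request for wines from France then becomes a filter on `is_within(wine_region, "France")`, with no need to list every sub-region by hand.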

It can derive new things. From the rule “a winery is premium if at least three of its wines are rated Outstanding or Classic”, the graph automatically labels 247 wineries as premium, including names you would expect (Zind-Humbrecht, Louis Jadot, Leflaive, Latour) and a few you might not. That label was never in the source data; the rule found it.
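The premium rule can be sketched as a simple count over rated wines. The winery names below are hypothetical placeholders and the data layout is an assumption; only the rule itself (at least three wines rated Outstanding or Classic) comes from the text above.

```python
from collections import Counter

TOP_TIERS = {"Outstanding", "Classic"}

def premium_wineries(rated_wines: list[tuple[str, str]], minimum: int = 3) -> set[str]:
    """Wineries with at least `minimum` wines in the top quality tiers."""
    top_counts = Counter(winery for winery, tier in rated_wines if tier in TOP_TIERS)
    return {winery for winery, count in top_counts.items() if count >= minimum}

# Hypothetical (winery, quality tier) pairs, one per wine.
sample = [
    ("Winery A", "Outstanding"),
    ("Winery A", "Classic"),
    ("Winery A", "Outstanding"),
    ("Winery B", "Outstanding"),
    ("Winery B", "Very Good"),
    ("Winery B", "Classic"),
]
print(premium_wineries(sample))
```

Nothing in `sample` says "premium". The label only exists once the rule has been applied, which is the sense in which the graph derives new facts.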

It catches inconsistencies. When the model wants to recommend a Spanish-only grape variety for a Bordeaux wine, the graph can simply reject the suggestion. No retraining needed.
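A consistency check of this kind can be sketched as a lookup against what the graph already knows. "Grape X" is a hypothetical stand-in for a Spanish-only variety, and the two small tables are illustrative assumptions rather than real graph contents.

```python
# Countries in which each grape appears in the graph (illustrative data).
GRAPE_COUNTRIES = {
    "Grape X": {"Spain"},  # hypothetical Spanish-only variety
    "Cabernet Sauvignon": {"France", "Italy", "Spain"},
}

# Country each region belongs to (illustrative data).
REGION_COUNTRY = {"Bordeaux": "France", "Rioja": "Spain"}

def is_consistent(grape: str, region: str) -> bool:
    """Accept a grape/region pairing only if the graph supports it."""
    country = REGION_COUNTRY.get(region)
    # Unknown regions or grapes are rejected rather than guessed at.
    return country in GRAPE_COUNTRIES.get(grape, set())

print(is_consistent("Grape X", "Bordeaux"))
print(is_consistent("Cabernet Sauvignon", "Bordeaux"))
```

Rejecting unknown grapes and regions outright is a deliberately strict choice for the sketch. A production version might fall back to a warning instead.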

It supports better search. The same graph that grounds the model also powers cleaner filters (by region, by price band, by quality tier, by grape) because the relationships are explicit and queryable rather than baked into millions of model weights.
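As a rough illustration of what "explicit and queryable" means in practice, here is a small filter over structured wine records. The field names, the wines, and the prices are all made up for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Wine:
    name: str
    region: str
    grape: str
    price: float
    tier: str

def search(wines, region=None, grape=None, max_price=None, tier=None):
    """Return wines matching every filter that was supplied."""
    results = []
    for wine in wines:
        if region is not None and wine.region != region:
            continue
        if grape is not None and wine.grape != grape:
            continue
        if max_price is not None and wine.price > max_price:
            continue
        if tier is not None and wine.tier != tier:
            continue
        results.append(wine)
    return results

# Hypothetical catalogue entries.
catalogue = [
    Wine("Wine 1", "Bordeaux", "Merlot", 25.0, "Very Good"),
    Wine("Wine 2", "Bordeaux", "Merlot", 60.0, "Outstanding"),
    Wine("Wine 3", "Rioja", "Tempranillo", 18.0, "Very Good"),
]
print([w.name for w in search(catalogue, region="Bordeaux", max_price=30)])
```

Each filter is an exact comparison on a named field, so the results are predictable in a way that a free-text request to a language model is not.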

What this means for the app

For users, the main change is that things get more reliable.

When Sommo scans a label and tells you what is in the bottle, the producer name will be one that exists. When it suggests a wine to pair with your dinner, the suggestion will be a wine you can actually buy. When it explains why a Riesling works with Thai food, the underlying chain of facts is grounded in a structured source rather than reconstructed each time from scratch.

I will be rolling this out gradually. The label scanner will see the graph first, since that is where invented producers are most visible, followed by the recommendation engine, then the food-pairing flow. Existing app features keep working throughout; the graph runs alongside the model rather than replacing anything.

A note on what is not changing

Sommo’s voice is not changing. The warm, knowledgeable, approachable tone that people tell me they like comes from the model and the curated training data, not from the underlying retrieval. The graph just makes sure that when Sommo says “try a 2018 Crozes-Hermitage from such-and-such”, that producer actually exists and that wine actually fits.

The free tier stays free. The graph is infrastructure. It does not unlock anything that was previously paywalled.

Sommo’s purpose has not changed either: make wine approachable for people who want to enjoy it without pretending they understand what “grippy tannins with notes of pencil shavings” means. v3 is a step toward making sure the wines I recommend are wines you can actually find.

Try Sommo today

If you have not tried it yet, Sommo is a free download on iOS. Scan a label, get a recommendation, ask a question. It works straight away, no account required. The knowledge-graph work above is the next step. What is in your hand today is what I am most proud of right now.

Download Sommo on the App Store →

sommo.app · Technical write-up · Read the paper (PDF) · Source code on GitHub