"HI" is Home Improvement, not Hawaii

No. Not Hawaii. For the third time today: "HI" means home improvement.
If you've used an AI for more than a week, you've typed this. Or something like it. And maybe with expletives.
You ask for "total loans for HI this quarter." The agent does something reasonable: it finds a state column, filters state = 'HI', sums the balances, and hands you Hawaii. You correct it. Tomorrow it does the exact same thing, because the correction lived in a chat thread, not in your data.
The worst kind of error is the one that never shows up in flashing red text. You got a result back, and it might even look right. What you actually have is an expensive (token$), well-formatted, completely wrong answer, with nothing to suggest otherwise.
This is the most dangerous failure mode in AI analytics.
This isn't hypothetical: Sunlight Financial is a lender. If you ask anyone on their team what "HI" means, 100% will answer "Home Improvement".
Why LLMs guess "Hawaii"
When you point an LLM at a database and ask about "HI", the model has to map two letters onto something in your schema. It has two things working against it.
First, its prior. Across everything an LLM has ever read, "HI" in a data context overwhelmingly means Hawaii. The model isn't confused. It's confident(ly wrong) here!
Second, your schema cooperates with the mistake. Your dataset has a state column! It probably has dozens of them. "HI" finds a perfect logical home. The model filters on it and moves on. It never had a reason to doubt itself.
That second point is what makes business vocabulary so treacherous for AI. The failures aren't obscure terms the model has never seen. They're short, ordinary tokens that land in a filter or group-by slot where the wrong reading is just as grammatical as the right one. "Loans for HI" reads as "Loans made in the state of Hawaii".
Here are a few more we've seen in state abbreviations alone:
| You ask | The AI assumes | You actually meant |
|---|---|---|
| "Loss ratio for PA" | Pennsylvania | Personal Auto (line of business) |
| "Claims in CA" | California | Commercial Auto |
| "Procedures in OR yesterday" | Oregon | Operating Room |
| "Patients per MD" | Maryland | the attending doctor |
| "Orders shipped ND" | North Dakota | Next-Day shipping |
Every one of these is an official internal code that happens to collide with a US state abbreviation. Every one appears in millions of rows. And in every case the lakehouse has a state column, so the model finds a plausible place to filter and hands back a number that looks right.
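To make the silent failure concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the rows, the `balance` column, and the `AUTO` product code are all made up for illustration; only the `state` and `product_code` column names come from the examples in this post. Both queries run cleanly, and nothing in the output says which one answered the question you asked.

```python
import sqlite3

# Hypothetical loans table. The rows and the balance column are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (id INTEGER, state TEXT, product_code TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        (1, "HI", "AUTO", 10_000.0),  # a loan made in Hawaii, not home improvement
        (2, "TX", "HI", 25_000.0),    # home improvement loan in Texas
        (3, "OH", "HI", 15_000.0),    # home improvement loan in Ohio
        (4, "CA", "AUTO", 30_000.0),
    ],
)


def total_balance(where_clause: str) -> float:
    """Sum balances for rows matching a hard-coded, trusted WHERE clause."""
    sql = f"SELECT COALESCE(SUM(balance), 0) FROM loans WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]


# What the model guesses: "HI" is a state.
hawaii_total = total_balance("state = 'HI'")

# What the business means: "HI" is a product line.
home_improvement_total = total_balance("product_code = 'HI'")

print(hawaii_total)            # 10000.0
print(home_improvement_total)  # 40000.0
```

Neither query raises an error or a warning, which is the point: the database cannot tell you that the filter landed on the wrong column.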
The vocabulary problem isn't limited to acronyms, either. The same confident-wrong pattern shows up whenever a business word has one specific meaning at your company:
- Ask for "revenue" and the AI sums retail_price (the list price) instead of sale_price (what you actually collected). The chart looks fine. The number is inflated.
- Ask "how many customers do we have" and the AI runs COUNT(*) over an order-items table, counting line items instead of people. You get a number 10x too high.
- Ask for "churn" and the AI picks one of logo churn, gross revenue churn, or net revenue churn at random, because you never told it which one your company means.
None of these throw an error. All of them are wrong. Better prompts don't fix it, and neither will a bigger model. The information the AI needs isn't in the question or in the schema. It's in your team's heads.
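The first two traps above are easy to reproduce. The sketch below uses sqlite3 with a tiny invented dataset; the `products.retail_price` and `order_items.sale_price` names follow the examples in this post, while `customer_id`, the other keys, and every value are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, retail_price REAL);
    CREATE TABLE order_items (
        order_item_id INTEGER PRIMARY KEY,
        customer_id   INTEGER,  -- assumed column, not from the post
        product_id    INTEGER,
        sale_price    REAL
    );
    INSERT INTO products VALUES (1, 100.0), (2, 50.0);
    INSERT INTO order_items VALUES
        (1, 1, 1, 80.0),
        (2, 1, 2, 50.0),
        (3, 2, 1, 90.0),
        (4, 2, 2, 40.0);
    """
)


def scalar(sql: str):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# "Revenue" read as list price: plausible, but inflated.
list_price_revenue = scalar(
    "SELECT SUM(p.retail_price) FROM order_items oi "
    "JOIN products p ON p.product_id = oi.product_id"
)
# "Revenue" as what was actually collected.
realized_revenue = scalar("SELECT SUM(sale_price) FROM order_items")

# "Customers" read as rows in the order-items table.
line_items = scalar("SELECT COUNT(*) FROM order_items")
# "Customers" as distinct people.
customers = scalar("SELECT COUNT(DISTINCT customer_id) FROM order_items")

print(list_price_revenue, realized_revenue)  # 300.0 260.0
print(line_items, customers)                 # 4 2
```

All four queries are valid SQL against a valid schema. Which pair is correct depends entirely on a definition that lives outside the database.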
The missing layer: ontology
The fix is to write that knowledge down in a place the AI is required to check. At Definite, that place is the ontology layer.
We've had a semantic layer since we launched, and it does its job. Think of ontology as the layer directly above it. The semantic layer tells an agent how to compute a metric: the SQL, the joins, the filters. The ontology layer tells it what the business words mean and where to go for each one. It's the top of the meaning stack:
Raw tables → Semantic layer (certified metrics and dimensions) → Ontology (the business vocabulary that points to them)
An ontology is made of concepts. A concept is one business term: revenue, customer, churn, home_improvement. Each concept carries three things that matter here:
- Aliases: every way people say it. revenue might carry [sales, gmv, gross revenue]. home_improvement carries [HI]. When someone says "HI," the agent resolves it to the concept, not the state column.
- Guidance: plain-language steering. What the term is preferred for, and what to avoid. "Don't filter on borrower state when someone asks for HI." "Don't use retail_price as realized revenue."
- Links: soft, typed pointers to whatever actually backs the concept. A certified measure, a dimension, a raw column, a transformation script, a doc, or another concept.
Here's the home-improvement concept that fixes our opening example:
    name: home_improvement
    kind: concept
    label: Home Improvement
    aliases: [HI, home improvement loans, reno]
    description: Loans originated for the home improvement product line.
    guidance:
      preferred_for:
        - Loan volume and balances for the home improvement product.
      avoid:
        - Filtering on borrower state when someone asks for "HI".
    links:
      - relation_type: source_column
        target: column:lending.loans.product_code
      - relation_type: measured_by
        target: measure:loans.total_originated
Save that, and "total loans for HI" stops being a coin flip. The agent searches the ontology, lands on home_improvement, reads the guidance that explicitly steers it off the state column, and routes to the certified measure. Same question, right answer, every time.
And here's the revenue concept, the one that quietly protects every "what was revenue" question from the list-price trap:
    name: revenue
    kind: concept
    label: Revenue
    aliases: [sales, gmv, gross revenue]
    description: Realized sales from order item sale prices.
    guidance:
      preferred_for: [sales reporting, revenue trends]
      avoid: [products.retail_price as realized revenue]
    links:
      - relation_type: measured_by
        target: measure:order_items.total_revenue
      - relation_type: source_column
        target: column:order_items.sale_price
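The alias lookup these two concepts rely on can be sketched in a few lines of Python. This is an illustration of the idea, not Definite's implementation: the `Concept` class, `normalize`, `build_alias_index`, and `resolve` are hypothetical names, and the matching rule (case-insensitive exact match on a name or alias) is an assumption. One useful side effect of building an index is that it can refuse two concepts claiming the same alias.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for one ontology concept."""
    name: str
    aliases: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)


def normalize(term: str) -> str:
    """Lowercase, treat underscores as spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("_", " ").split())


def build_alias_index(concepts: list[Concept]) -> dict[str, Concept]:
    """Map every name and alias to its concept, rejecting collisions."""
    index: dict[str, Concept] = {}
    for concept in concepts:
        for term in (concept.name, *concept.aliases):
            key = normalize(term)
            owner = index.setdefault(key, concept)
            if owner is not concept:
                raise ValueError(
                    f"alias {key!r} is claimed by both {owner.name} and {concept.name}"
                )
    return index


def resolve(term: str, index: dict[str, Concept]) -> Optional[Concept]:
    """Return the concept a business term refers to, or None if unknown."""
    return index.get(normalize(term))


index = build_alias_index([
    Concept(
        name="home_improvement",
        aliases=["HI", "home improvement loans", "reno"],
        avoid=['Filtering on borrower state when someone asks for "HI".'],
    ),
    Concept(name="revenue", aliases=["sales", "gmv", "gross revenue"]),
])

print(resolve("HI", index).name)   # home_improvement
print(resolve("GMV", index).name)  # revenue
print(resolve("Hawaii", index))    # None
```

In this sketch "HI" resolves to the concept before anyone looks at a schema, so the state column never gets a chance to look like the right answer.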
Soft links, so you can start today
You don't need a fully modeled warehouse to start. Links in the ontology are soft references: when you save a concept, Definite validates the shape of each link, not whether the target exists yet. You can name measure:loans.total_originated before that measure is built. The vocabulary can lead; the data catches up.
This unlocks the most underrated capability of an ontology: teaching the AI to admit when it doesn't know.
A concept can have zero links. It's still first-class.
    name: territory
    kind: concept
    label: Sales Territory
    aliases: [region, patch]
    description: A sales territory. Real business term, not yet modeled.
    links: []
Now when someone asks to "break revenue down by territory," the agent finds the concept, sees there's nothing backing it, and says so: "Territory is a real concept but it isn't modeled in the warehouse yet. Want me to use state as a proxy?" Compare that to the alternative, where the AI invents a territory column and silently returns a wrong breakdown. The bravest thing an AI analyst can do is tell you it doesn't know, and a standalone concept is how it learns to.
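Checking the shape of a link without checking that its target exists might look something like the sketch below. The actual rules Definite applies are not described here, so everything in this example is an assumption: the `kind:dotted.path` pattern is inferred from targets like `measure:loans.total_originated`, the set of allowed kinds is a guess, and `link_shape_errors` and `is_backed` are hypothetical helpers.

```python
import re

# Assumed target kinds. Only "measure" and "column" appear in the examples in this post.
ALLOWED_KINDS = {"measure", "dimension", "column", "concept"}

# Assumed target shape: kind, a colon, then a dotted identifier path.
TARGET_PATTERN = re.compile(
    r"^(?P<kind>[a-z_]+):(?P<path>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)$"
)


def link_shape_errors(link: dict) -> list[str]:
    """Check a link's shape only. Never looks up whether the target exists."""
    errors = []
    relation = link.get("relation_type")
    if not isinstance(relation, str) or not relation:
        errors.append("relation_type must be a non-empty string")

    target = link.get("target")
    match = TARGET_PATTERN.match(target) if isinstance(target, str) else None
    if match is None:
        errors.append("target must look like kind:dotted.path")
    elif match.group("kind") not in ALLOWED_KINDS:
        errors.append(f"unknown target kind: {match.group('kind')}")
    return errors


def is_backed(concept: dict) -> bool:
    """A concept with zero links is valid, but nothing backs it yet."""
    return bool(concept.get("links"))


# A measure that may not be built yet still passes: only the shape is checked.
ok_link = {"relation_type": "measured_by", "target": "measure:loans.total_originated"}
bad_link = {"relation_type": "measured_by", "target": "loans.total_originated"}
territory = {"name": "territory", "aliases": ["region", "patch"], "links": []}

print(link_shape_errors(ok_link))   # []
print(link_shape_errors(bad_link))  # ['target must look like kind:dotted.path']
print(is_backed(territory))         # False
```

The `is_backed` check is what lets an agent distinguish "I know this term and nothing backs it yet" from "I have never heard of this term."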
How Fi uses it
Fi, Definite's AI analyst, is told to go to the ontology first. The flow is:
- A business term comes in ("revenue," "HI," "churn").
- Fi searches the ontology for a matching concept.
- It reads the concept's guidance and follows the links to a certified measure or dimension.
- Only if nothing matches does it fall back to writing raw SQL against the tables.
That single redirection, business term to certified definition before touching a raw table, is what turns the failure modes above into non-events. The disambiguation, the routing, and the honest "not modeled yet" all happen before a single byte of wrong SQL gets written.
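The ontology-first flow reduces to a small routing decision. The sketch below is a toy model of that decision, not Fi's actual logic: the in-memory `ONTOLOGY` dict, the `route` function, its outcome labels, and the rule that `measure:` and `dimension:` targets count as certified are all assumptions made for illustration.

```python
from typing import Optional

# Toy ontology. Link targets are copied from the concept examples in this post.
ONTOLOGY = {
    "home_improvement": {
        "aliases": ["hi", "home improvement loans", "reno"],
        "links": [
            "column:lending.loans.product_code",
            "measure:loans.total_originated",
        ],
    },
    "territory": {"aliases": ["region", "patch"], "links": []},
}

# Assumption: these target kinds point at certified semantic-layer objects.
CERTIFIED_PREFIXES = ("measure:", "dimension:")


def find_concept(term: str) -> Optional[str]:
    """Case-insensitive exact match on a concept name or alias."""
    needle = term.strip().lower()
    for name, concept in ONTOLOGY.items():
        if needle == name or needle in concept["aliases"]:
            return name
    return None


def route(term: str) -> tuple[str, Optional[str]]:
    """Decide where a business term should be answered from."""
    name = find_concept(term)
    if name is None:
        # Nothing in the ontology: only now fall back to raw SQL.
        return ("raw_sql_fallback", None)

    links = ONTOLOGY[name]["links"]
    certified = [target for target in links if target.startswith(CERTIFIED_PREFIXES)]
    if certified:
        return ("certified", certified[0])
    if links:
        return ("uncertified_link", links[0])
    # Known concept with zero links: say so instead of guessing.
    return ("not_modeled", name)


print(route("HI"))         # ('certified', 'measure:loans.total_originated')
print(route("territory"))  # ('not_modeled', 'territory')
print(route("widgets"))    # ('raw_sql_fallback', None)
```

The ordering is the design choice that matters: raw SQL is the last branch, reached only after the ontology has had a chance to either answer or admit that it cannot.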
Building your ontology
You define concepts as YAML, one file per concept, and manage them from the CLI or the UI:
    definite ontology list              # every concept
    definite ontology search HI         # find by name, alias, guidance, or link
    definite ontology get revenue       # show one concept as YAML
    definite ontology save -f hi.yaml   # create or replace a concept
Start with the ten words your team argues about most, or the ten that an AI would most obviously get wrong, and write those down first. Each concept you add is one more confidently-wrong answer your AI will never give again.
When Fi makes a mistake, tell her. She'll probably (give her a break, she's non-deterministic) suggest updating the ontology, and she can do it for you from chat.
The vocabulary your company speaks is real, specific, and almost never written down anywhere a machine can read it. Your AI analyst is only as good as the meaning you give it. The ontology layer is where that meaning lives.
Want to see it on your own data? Book a demo and we'll map your first ten concepts together.