Your AI Analyst Is Only as Good as Your Metric Definitions
Mike Ritchie

Three people on your team calculate revenue. One includes refunds. One excludes trial accounts. One counts recognized revenue on a cash basis. All three are "right" according to their own logic. All three produce different numbers.
This is metric drift. It's older than any AI trend. But in 2026, it's urgent for a new reason: AI assistants are inheriting your confusion.
When you point an LLM at a database and ask "what was revenue last quarter?", the model has to decide which tables, which filters, and which business rules to apply. If your team can't agree on the answer, why would an AI do any better?
The answer is a semantic layer. Not a new concept, but one that's suddenly central to how modern analytics works. This post explains what a semantic layer does, why it matters more now than ever, and how to build one that makes both humans and AI smarter.
The metric drift problem
Every growing company hits this wall. Marketing says the customer count is 4,200. Sales says 3,800. Finance says 3,950. Nobody is lying. They're just counting differently.
Marketing includes free trial signups. Sales only counts accounts with a signed contract. Finance counts accounts that have generated at least one invoice. Each definition is reasonable in context. But when the CEO asks "how many customers do we have?" in a board meeting, nobody agrees.
This isn't a data quality problem. The data is fine. It's a definitions problem. "Customer," "revenue," "churn" mean different things to different teams because nobody wrote down the canonical definition and enforced it across every query, dashboard, and report.
The traditional fix was a data dictionary or a wiki page that nobody read. More rigorous teams built dbt models or created views in the warehouse. But these were advisory. Nothing stopped a sales ops person from writing a one-off query that defined "active customer" differently.
This is where a semantic layer changes the game.
What a semantic layer actually does
A semantic layer is a rule book that sits between your raw data and everyone who queries it.
Your database stores rows and columns: orders.total_amount, users.created_at, subscriptions.status. A semantic layer translates those raw fields into business concepts: "Monthly Recurring Revenue," "Active Customers," "Net Revenue Retention." It defines exactly which tables, joins, filters, and calculations produce each metric.
Here's a concrete example. Your database has an orders table with columns like amount, currency, refunded_at, and tax. Your finance team defines revenue as:
Sum of amount where refunded_at is null, converted to USD, excluding tax, for orders with status = 'completed'.
In a semantic layer, that definition gets encoded once. Every dashboard, every report, every API call, and every AI query uses the same definition. Nobody can accidentally include refunded orders or forget the currency conversion. The rule book enforces it.
Three things make this different from just writing a SQL view:
- It's declarative. You describe what "revenue" means, not how to compute it for every possible query. The semantic layer engine handles the SQL generation, optimization, and caching.
- It's multi-dimensional. You can slice revenue by region, product, time period, or customer segment without rewriting the logic. The semantic layer knows how to join the right tables and apply the right filters.
- It governs access. The semantic layer can enforce who sees what (row-level security, column masking) and ensure that every consumer of data, whether human or AI, plays by the same rules (see the sketch below).
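To make that third point concrete, here's a minimal sketch in Cube-style YAML (the engine Definite uses, covered below). Marking a member public: false hides it from every consumer; the cube and the customer_email column are hypothetical:

```yaml
cubes:
  - name: orders
    sql_table: public.orders

    measures:
      - name: revenue
        sql: amount
        type: sum

    dimensions:
      # Hypothetical sensitive column: hidden from every query consumer,
      # human or AI, by the semantic layer itself.
      - name: customer_email
        sql: customer_email
        type: string
        public: false
```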
A semantic layer turns tribal knowledge into executable code.
Why AI assistants give wrong answers without one
This is why semantic layers are suddenly everywhere in 2026.
The Metabase 2025 Community Data Stack Report surveyed 330+ teams and found that average confidence in AI query results was just 5.5 out of 10. People in more technical roles trusted AI results even less. Nearly everyone is trying AI-powered analytics. Almost nobody trusts the output.
Why? Because most AI-to-SQL systems work like this:
1. User asks a question in plain English.
2. The LLM reads the database schema (table names, column names, types).
3. The LLM generates SQL.
4. The system runs the SQL and returns results.
Step 3 is where things fall apart. The LLM sees a column called amount and has to guess: does that include tax? Is it in cents or dollars? Does it represent gross or net? Should refunds be subtracted? The model has no way to know. It picks something plausible and generates SQL that looks correct but produces the wrong number.
This is "garbage in, garbage out" applied to AI analytics. The garbage isn't bad data; it's missing definitions. The AI doesn't know that your team defines "active customer" as "has logged in within 30 days AND has an active subscription." It sees users.last_login and subscriptions.status and guesses.
A semantic layer solves this by giving the AI a curated menu instead of a raw ingredient list. Instead of reading 200 tables and 3,000 columns, the AI reads a set of defined metrics and dimensions: "Revenue (monthly, USD, excludes refunds)," "Active Customers (logged in within 30 days with active subscription)," "Churn Rate (monthly, by cohort)."
The AI's job shifts from "figure out how to calculate revenue" to "pick the right pre-defined metric and apply the right filters." That's a much easier problem, and it's why semantic layers are the key to trustworthy AI analytics.
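Here's a hedged sketch of what that curated menu can look like, again in Cube-style YAML. The table, the column names, and the 30-day rule are illustrative assumptions, not a prescribed schema:

```yaml
cubes:
  - name: customers
    sql_table: public.users

    measures:
      # "Active Customers" with the business rule spelled out,
      # so the AI never has to guess what "active" means.
      - name: active_customers
        type: count
        description: >
          Customers who logged in within the last 30 days
          and have an active subscription.
        filters:
          - sql: "{CUBE}.last_login >= CURRENT_DATE - INTERVAL '30 days'"
          - sql: "{CUBE}.subscription_status = 'active'"

    dimensions:
      - name: region
        sql: region
        type: string
```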
How Definite's semantic layer works
At Definite, the semantic layer isn't optional. It's the foundation.
Definite uses Cube as its semantic layer engine. Cube is an open-source framework for defining metrics, dimensions, joins, and access rules in YAML files. When you connect a data source to Definite, many sources come with prebuilt Cube models. For custom sources, you define models yourself (or our data team does it for you).
Here's what a simple Cube model looks like:
```yaml
cubes:
  - name: orders
    sql_table: public.orders

    measures:
      - name: revenue
        sql: amount
        type: sum
        filters:
          - sql: "{CUBE}.refunded_at IS NULL"
          - sql: "{CUBE}.status = 'completed'"

      - name: order_count
        type: count

    dimensions:
      - name: created_at
        sql: created_at
        type: time

      - name: region
        sql: region
        type: string
```
This model says: "Revenue is the sum of amount where the order is not refunded and has a completed status." Every query that touches revenue, whether from a dashboard, an API call, or Fi (Definite's AI assistant), uses this exact definition.
Fi is built to work with the semantic layer first. When you ask Fi a question, it doesn't freestyle SQL against raw tables. It looks at the defined metrics and dimensions, picks the right ones, and generates a query through Cube. This means:
- Fi can't accidentally include refunded orders in revenue.
- Fi can't join tables incorrectly, because joins are predefined (see the sketch after this list).
- Fi always applies the same filters, aggregations, and business rules that your team agreed on.
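For example, here's a hedged sketch of a predefined join in Cube YAML; the user_id foreign key is an assumption for illustration:

```yaml
cubes:
  - name: orders
    sql_table: public.orders

    joins:
      # Assumed foreign key: orders.user_id -> users.id.
      # Every query that combines orders and users reuses this join;
      # neither an analyst nor Fi can improvise a different one.
      - name: users
        sql: "{CUBE}.user_id = {users}.id"
        relationship: many_to_one
```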
The result is that Fi gives the same answer your finance team would give. Not because the AI is smarter, but because it's constrained to the right definitions.
This is deliberate. Most analytics tools treat the semantic layer as optional or advanced. At Definite, you can't query data without going through the semantic layer (or consciously choosing to write raw SQL). That constraint is what makes the AI reliable.
Comparison: semantic layer approaches in 2026
The semantic layer market has exploded. Here's how the major approaches stack up:
Looker (LookML)
Looker pioneered the "semantic layer as code" approach with LookML, a proprietary modeling language. LookML is powerful and mature, with strong governance features. The catch: it's proprietary to Google Cloud, requires specialized expertise, and locks you into the Looker ecosystem. If you want a different BI tool, your LookML models don't travel with you. Google recently introduced Looker Modeler to decouple the semantic layer from Looker's BI interface, but adoption is early.
dbt Semantic Layer (MetricFlow)
dbt Labs open-sourced MetricFlow under the Apache 2.0 license at Coalesce 2025, making it the first widely adopted open-source metric engine. If you already use dbt for transformations, adding metric definitions is a natural extension. The strength is ecosystem integration: tools like Hex, Lightdash, and others can consume dbt metrics natively. The limitation is that MetricFlow is still primarily a metric definition layer, not a full query engine with caching, access control, and API delivery. You need additional infrastructure around it.
Cube
Cube is an open-source semantic layer that doubles as a query engine. It handles metric definitions, caching, access control (row-level and column-level), and serves data via REST, GraphQL, and SQL APIs. In 2025, Cube launched its D3 platform with native AI agents and MCP (Model Context Protocol) integration, making it one of the most AI-ready semantic layers available. Cube's strength is that it's both the definition layer and the execution layer. This is why Definite chose it.
AtScale
AtScale targets enterprise-scale deployments with a "universal semantic layer" that sits across multiple data platforms (Snowflake, Databricks, BigQuery). It virtualizes data without moving it and provides an MDX/DAX-compatible interface for tools like Excel and Tableau. In January 2026, AtScale joined the Open Semantic Interchange (OSI) initiative and introduced its Semantic Modeling Language (SML), the first open-source language designed for defining semantic models. AtScale is the right choice for large enterprises that need to layer governance across multiple warehouses without ripping and replacing existing tools.
Hex (Semantic Authoring)
Hex introduced Semantic Authoring in 2025, letting teams define measures, dimensions, and joins directly inside Hex's notebook environment. It also syncs with external semantic models from dbt, Cube, and Snowflake. The strength is the workflow: analysts can build and iterate on models in the same tool where they do analysis. The limitation is that Hex's semantic layer is primarily consumed within Hex itself (though they're building push-to-warehouse capabilities).
Warehouse-native (Snowflake Semantic Views, Databricks Metric Views)
Both Snowflake and Databricks made bets in 2025 that the semantic layer should live inside the warehouse. Snowflake launched Semantic Views; Databricks introduced Metric Views. The appeal is simplicity: no external tool to manage. The risk is vendor lock-in and the fact that these are still early, with fewer features than dedicated semantic layer tools.
How to choose
| If you... | Consider |
|---|---|
| Already use dbt heavily | dbt Semantic Layer (MetricFlow) |
| Need full query engine + caching + access control | Cube |
| Have multiple warehouses at enterprise scale | AtScale |
| Want semantic modeling inside a notebook | Hex Semantic Authoring |
| Are locked into Google Cloud / Looker | LookML |
| Want everything in one platform (warehouse + BI + AI) | Definite (Cube-based) |
Getting started: your first five metrics
You don't need to model your entire warehouse on day one. Start with five metrics that matter to your business. Here's a framework:
1. Revenue. The number everyone asks about first. Define it precisely: gross or net? Including or excluding refunds? Which currency? What date counts (order date, payment date, recognition date)?
2. Active users/customers. Define what "active" means. Logged in within 30 days? Made a purchase? Has an active subscription? Pick one definition and make it canonical.
3. Conversion rate. From what to what? Free trial to paid? Visitor to signup? Be specific about the numerator and denominator.
4. Churn rate. Monthly or annual? By customer count or revenue? Does a downgrade count as churn? Define the edge cases.
5. Average deal size or order value. Simple but often inconsistent. Include or exclude discounts? One-time or recurring? Define it once and move on.
For each metric, write down:
- Name: What you call it (e.g., "Monthly Recurring Revenue")
- Definition: One sentence, plain English (e.g., "Sum of all active subscription amounts, normalized to monthly, in USD")
- Source tables: Which database tables feed into it
- Filters: What gets excluded (trials, refunds, internal accounts)
- Dimensions: How it can be sliced (by region, product, time period)
This exercise takes an afternoon. It will save you months of "wait, which revenue number are we looking at?" conversations.
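To see how the checklist maps to code, here's a hedged sketch encoding the checklist's example metric (Monthly Recurring Revenue) as a Cube model. The subscriptions table and the pre-normalized amount_usd_monthly column are assumptions; your schema will differ:

```yaml
cubes:
  - name: subscriptions
    # Hypothetical source table; swap in your own.
    sql_table: public.subscriptions

    measures:
      - name: mrr
        title: Monthly Recurring Revenue
        description: >
          Sum of all active subscription amounts, normalized
          to monthly, in USD. Excludes trials.
        # Assumes the amount is already normalized to monthly USD upstream.
        sql: amount_usd_monthly
        type: sum
        filters:
          - sql: "{CUBE}.status = 'active'"
          - sql: "{CUBE}.is_trial = false"

    dimensions:
      - name: region
        sql: region
        type: string
```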
How this enables agentic analytics
The industry has been talking about "agentic analytics" since late 2025: AI systems that don't just answer questions but proactively monitor metrics, surface anomalies, and recommend actions. Cube launched AI agents with MCP integration. dbt open-sourced MetricFlow explicitly to "power trustworthy AI and agents." Gartner's 2025 guidance identifies semantic technology as non-negotiable for AI success.
The part that gets lost in the hype: agentic analytics is impossible without a semantic layer.
An AI agent that monitors your business needs to know what "revenue dropped 15%" actually means. It needs to know whether a 15% drop is unusual (requires historical context and the right calculation). It needs to know who to alert (requires understanding of team structure and metric ownership). And it needs to explain itself in terms the recipient understands (requires business-friendly metric names, not SQL expressions).
Without a semantic layer, an AI agent is just a bot running arbitrary SQL against your warehouse and hoping for the best. With a semantic layer, it's an analyst that speaks your company's language.
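In Cube, that context can live on the metric itself. A hedged sketch: title and description are standard parameters, while the meta keys (owner, alert channel, threshold) are hypothetical conventions an agent could be built to read:

```yaml
cubes:
  - name: orders
    sql_table: public.orders

    measures:
      - name: revenue
        sql: amount
        type: sum
        # Business-friendly name an agent can use when it explains itself.
        title: Net Revenue (USD)
        description: Completed, non-refunded orders in USD, excluding tax.
        # meta accepts arbitrary key-values; the keys below are invented
        # conventions for ownership and alerting, not a Cube feature.
        meta:
          owner: finance-team
          alert_channel: "#finance-alerts"
          anomaly_threshold_pct: 15
```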
At Definite, Fi uses the semantic layer to do exactly this. When you ask Fi a question, it doesn't improvise. It queries governed, pre-defined metrics through Cube. That means:
- Consistency: Fi gives the same answer your dashboards give.
- Auditability: Every query Fi runs can be traced back to a specific metric definition.
- Trust: When Fi says revenue is $1.2M, you know exactly how that number was calculated, because the definition is in your Cube model.
This is what separates a party trick from a production-grade AI analyst. The AI isn't smarter; it's better informed.
The bottom line
Semantic layers are not new. What's new is their role. In 2024, a semantic layer was a nice-to-have for data teams that valued consistency. In 2026, it's the foundation that makes AI analytics trustworthy.
The thesis is simple: your AI analyst is only as good as your metric definitions. If those definitions live in people's heads, scattered across SQL files and Slack threads, your AI will give inconsistent answers and your team won't trust it. If those definitions are codified in a semantic layer and enforced across every query, your AI becomes a reliable extension of your data team.
You don't need to boil the ocean. Start with five metrics. Write them down. Encode them in a semantic layer. Point your AI at the semantic layer instead of raw tables. The difference in output quality will be immediate and obvious.
If you want to see this in practice, try Definite. The semantic layer is built in, Fi is ready to query it, and your first five metrics are an afternoon's work.
- See connectors: definite.app/connector-db
- View pricing: definite.app/pricing