Sync Shopify to a Data Warehouse: 4 Options, 3 Dead Ends, 1 Gotcha

You search for "sync Shopify to a data warehouse" and the SERP is twelve tabs of vendor landing pages that all promise the same three-step flow. None of them mention what it costs at 50K orders a month. None of them mention the Shopify plan tier that silently kills half the streams. One of the top results is a 404.
You want to see blended CAC across Meta and Google, contribution margin by SKU with COGS from NetSuite, and refunds joined back to the original line items — and Shopify's own reporting can't answer any of it. This guide cuts through the AI-Overview-grade answer ("just use Fivetran or Airbyte") and walks the four paths that actually ship, plus the three dead ends you're going to find anyway. Most searchers use warehouse and lake interchangeably — this covers both (and explains the distinction below if you want it).
Here's what actually works in 2026, who each option is for, and how to choose.
For most teams with multi-source analytics needs and no existing warehouse, Definite is the shortest path. If you're already running Fivetran or Airbyte into a warehouse you're happy with, just add Shopify as a source. Custom GraphQL scripts only pay off with dedicated engineering.
- No warehouse yet? Definite handles Shopify sync, storage, and analytics in one platform — no Snowflake or BigQuery bill to add on top. Fastest path from zero to queryable data.
- Already have a warehouse (Snowflake, BigQuery) or lake (S3/Iceberg, Delta)? Fivetran is the zero-maintenance managed pipe. Airbyte is the open-source alternative (Cloud or self-hosted). A custom GraphQL Bulk script makes sense only if you've got engineering headcount and a specific reason to own the code.
- Shopify-only reporting good enough? ShopifyQL plus the in-admin Analytics editor can cover pipeline-only questions — just not cross-source analytics.
Which Shopify sync method fits your setup? The decision tree starts with one question: do you already have a data warehouse?
What Doesn't Work (and Why You'll Find It Anyway)
Before the real options, three dead ends that eat an afternoon if you don't know to skip them.
Shopify's Admin CSV export works once, then dies. You can pull a one-time CSV of orders, products, or customers from the Shopify admin. That's it. There's no scheduling, no incremental sync, no way to consolidate across two storefronts, and nothing joins to Stripe or ad platforms. Fine for a one-off audit. Not a pipeline.
Hightouch and Census go the wrong direction. Both are reverse ETL — they push data from a warehouse into Shopify, not out of it. If you land on a Hightouch or Census page about Shopify and think you've found your answer, you haven't. Also relevant: Census was acquired by Fivetran, and Census accounts had to migrate to Fivetran Activations by April 1, 2026. If a 2024 blog post recommends Census for Shopify extraction, it was wrong then and it's even more wrong now.
Google's BigQuery Data Transfer Service for Shopify — useful only if you're already on GCP. Two things worth getting right here, because most blogs get them wrong: (1) this is a Google product, not a Shopify product; (2) it's in Preview / pre-GA as of 2026, and Shopify Plus is required only for the GiftCards object — not the connector itself. Even if you qualify, it lands in BigQuery only (not a lake), and does nothing for the multi-source problem. If "get Shopify into BigQuery" is the entire scope, it's worth a look. Otherwise, skip.
One more heads-up: Airbyte's Shopify → Apache Iceberg landing page currently 404s (checked April 2026). The connector path officially exists; the marketing page doesn't resolve. Validate the current state before you put it in a planning doc.
With those cleared, here are the four paths that actually work.
Lake vs. Warehouse — Does It Matter for Shopify?
Lake and warehouse get used interchangeably, so it's worth pinning down what each means in 2026 and whether the distinction actually matters at Shopify scale.
A data warehouse (Snowflake, BigQuery, Redshift) is an integrated product — compute, storage, catalog, and SQL engine in one bundled offering. A data lake in 2026 is an open table format (Iceberg, Delta, or DuckLake) sitting on object storage (S3, GCS, Azure Blob), plus a catalog, plus a query engine that can read the format. The common misconception is that "lake" means "S3 bucket with Parquet files" — it doesn't. Without a catalog and a query engine, you just have files no one can reliably query. (The what-is-a-data-lakehouse primer goes deeper if this is fuzzy.)
For a Shopify-sized workload — tens to hundreds of millions of rows across orders, line items, metafields, and events — both architectures technically work. At your volume, the choice is more about your query-engine ecosystem than raw capability. DuckDB-native shops pick DuckLake or Iceberg. Databricks shops pick Delta. Trino shops pick Iceberg. If you've already got BigQuery or Snowflake running well and the bill isn't hurting, "lake-izing" isn't worth a migration — the question is whether adding clickstream, ad-platform event data, or other cheap-but-high-volume sources later will change the cost math. Lake architectures win for cheap event-scale data; warehouses win for fast, small-query analytics out of the box.
Shopify itself migrated an internal 1.5 PB analytics table to Iceberg on Trino in 2023, dropping a query from 3 hours to 1.78 minutes — but that's their internal infrastructure, not a merchant-facing feature. If you want the same Iceberg + Trino architecture without building it, Starburst Galaxy Icehouse (a managed lakehouse that launched in April 2024) is a reasonable option. For the Databricks-alternatives lake question in general, see the comparison piece. For the Iceberg query-engine deep-dive, this one.
Stay in Shopify: enough for pipeline-only questions
Some readers genuinely land here and never need more — worth naming up front because the rest of this guide assumes multi-source, and not everyone is.
Shopify's in-admin Analytics is better than it gets credit for. You can write ShopifyQL directly in the Analytics editor, save custom reports, and schedule email delivery. Sidekick (Shopify's AI assistant) translates natural language into ShopifyQL on your behalf, so the "write SQL against my store" UX is already there, inside the admin.
The critical caveat: the ShopifyQL Admin GraphQL API was sunset in API version 2024-07. ShopifyQL is an in-admin reporting tool only now — you cannot extract ShopifyQL query results over the API into a lake or warehouse. If you're remembering a guide that had you pulling ShopifyQL results via GraphQL, that path is gone. For actual pipeline extraction, use the standard Admin GraphQL Orders / Web Pixels / Customers APIs — which is what every managed connector below is already doing.
When this is enough:
- Pipeline-only questions: order velocity, top products, customer segments by recency, basic cohort retention
- Your business lives inside Shopify and the only "outside" number is ad spend you can squint at in the Meta and Google dashboards
- You don't have the team bandwidth to run a real pipeline yet
When you'll outgrow it:
- Blended CAC across ad platforms, contribution margin by SKU (needs NetSuite or QuickBooks for COGS), multi-store consolidated reporting, clickstream attribution
- Anything that joins Shopify with Stripe, Klaviyo, or ad platforms
- The CFO asking a question that sounds simple ("What's our payback on the August Meta spend?") but structurally requires four systems joined together
Verdict: Strong for Shopify-native questions. Not designed for cross-source analytics. If you've got two Shopify stores and need them in one table, you're already past this.
Definite: when you don't already have a warehouse
If you don't already have a warehouse, Definite handles Shopify sync, storage, and analytics end-to-end. You paste a Shopify API access token; Definite auto-catalogs the streams into its built-in DuckLake (an open, Iceberg-compatible table format built on DuckDB — a modern warehouse on a lake foundation), and you can write SQL, build dashboards, or ask Fi — the AI assistant — conversational questions a few minutes later.
DuckLake v1.0 ships "data inlining," which stores small updates directly in the catalog — eliminating the small-files compaction work that Shopify's constant order/refund/metafield updates otherwise create in Iceberg or Delta workflows.
(Video: a one-minute walkthrough of connecting Shopify and querying your data.)
What Definite's Shopify connector captures:
- orders and transactions as flat tables. Line items, discount allocations, and refund line items land as JSON columns inside orders and order_refunds — you UNNEST or json_extract for per-line analysis. Refund line items are nested inside order_refunds alongside transactions and order_adjustments.
- Dedicated metafield tables for products, product variants, customers, and orders (plus collections, locations, shops, product images, and smart collections — nine metafield tables total). No shop-level-only compromise.
- Multi-store: one schema per brand (e.g., WOLF_SHOPIFY, CEB_SHOPIFY, CLEANOMIC_SHOPIFY). A unified analytics view exposes brand, source_schema, and source_platform columns so cross-store queries go through one view instead of hand-written UNION ALL joins.
- Historical backfill: 7+ years of history confirmed on live stores (earliest order 2018-11-26).
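Because line items land as a JSON column, the UNNEST/json_extract step above is the one piece of SQL you end up writing yourself. Here's a minimal sketch of the pattern using Python's stdlib sqlite3, whose json_each plays the UNNEST role; the mini-schema is illustrative, not Definite's exact one:

```python
import sqlite3, json

# Illustrative mini-schema: orders flat, line items as a JSON column
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, line_items TEXT)")
con.execute(
    "INSERT INTO orders VALUES (1, ?)",
    (json.dumps([{"sku": "A-1", "qty": 2, "price": 19.0},
                 {"sku": "B-2", "qty": 1, "price": 5.0}]),),
)

# json_each unnests the array into one row per line item
# (DuckDB would use UNNEST; Snowflake, LATERAL FLATTEN)
rows = con.execute("""
    SELECT o.id,
           json_extract(li.value, '$.sku')   AS sku,
           json_extract(li.value, '$.qty')   AS qty,
           json_extract(li.value, '$.price') AS price
    FROM orders o, json_each(o.line_items) li
""").fetchall()
print(rows)  # one row per line item
```

The same query shape works against the real orders table; only the JSON paths change.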
Who maintains it when something breaks. Definite handles the Shopify API version upgrades (including the 2026-04 → 2026-07 returns_* → sales_reversals_* rename), connector schema migrations, and lake catalog maintenance. You don't file a ticket with Fivetran, plus a ticket with Snowflake, plus a ticket with your BI vendor when something misfires at 11pm — one vendor shipped all the pieces, and that's who you page. Stripe, Klaviyo, Meta Ads, Google Ads, and NetSuite all land in the same DuckLake alongside Shopify, so blended CAC is a real query instead of a three-week project. DuckLake is open format — Parquet underneath, Iceberg-compatible, exportable. No proprietary trap.
Cost framing. Definite is credit-based with a real free tier and no per-seat or per-tool licensing — one bill replacing ETL + warehouse + BI contracts. Published plans: Free (5 credits/mo, 2 connectors), Standard $250/mo (or $230/mo annual, with 100 credits + $1/credit + $0.05/GB overage), and Enterprise (contact). For a realistic 50K-orders-per-month Shopify workload plus four or five other sources, expect Standard plus some credit overage — typically in the $250–$800/mo range, against a Fivetran-plus-Snowflake-plus-BI stack that usually runs $2,000+ at the same volume. Run the cost calculator with your actual numbers.
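To sanity-check the $250–$800 range, here's the published Standard-plan arithmetic as a tiny function. The usage figures plugged in are illustrative assumptions, not measurements:

```python
# Definite Standard plan math, per the published numbers above:
# $250 base, 100 credits included, $1 per extra credit, $0.05/GB overage.
def standard_monthly(credits_used: int, overage_gb: float) -> float:
    base, included = 250.0, 100
    return base + max(0, credits_used - included) * 1.00 + overage_gb * 0.05

# Illustrative usage, not a measurement:
print(standard_monthly(credits_used=320, overage_gb=400))  # 490.0
```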
Best for: Teams with no existing warehouse. Teams who want one bill and one vendor. Teams where the ops person is quietly worried about becoming the pipeline person.
Not for: Teams already running Fivetran into a Snowflake they love. If that's you, skip to the Fivetran section.
Try Definite free — the free tier handles two connectors, enough to sync Shopify alongside Stripe and see the shape of queries before you commit to a real pilot on Standard.
Fivetran: if you already have a warehouse and a contract
If you already have a warehouse (Snowflake, BigQuery, Redshift) or a lake (S3 with Iceberg, Databricks Delta, a Trino cluster) and you already run Fivetran, adding Shopify is a one-click source.
How it works: Fivetran's Shopify connector is a Standard-tier managed ELT pipeline. You authenticate with Shopify, pick a destination (S3 + Iceberg, Delta on Databricks, Snowflake, BigQuery, etc.), and Fivetran handles the extraction, normalization, and incremental syncing. It's the quietest option in this list — once it works, you forget it exists.
When it fits: You already have a warehouse (or lake) and a Fivetran contract. You're adding sources, not building infrastructure.
When it doesn't: You don't have a warehouse yet. Fivetran is only the pipe — you still need the destination (Snowflake, BigQuery, or a full lake stack: object storage + catalog + query engine) and a BI layer on top. Teams routinely underestimate the total assembly cost here; that's half the reason the "I built a modern data stack and it's killing us" posts exist.
Pricing reality: Fivetran is Monthly Active Rows (MAR) with a $5 base per connection (1 to 1M MAR). Two things that specifically bite Shopify pipelines:
- Deletes now count toward paid MAR when Capture Deletes is enabled — a change that caught many Shopify users off guard. Shopify generates a lot of delete events (order cancellations, refund creations, draft-order cleanup, abandoned-checkout purges) and this inflates bills. If yours suddenly doubled, this is often why.
- MAR is hard to estimate for Shopify because every order update, refund, and metafield write is a MAR. Public benchmarks at 50K orders/month land in the $500–$3,000/month range depending on sync cadence and data churn, but don't trust a round number — run Fivetran's calculator with your actual update frequency.
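If you want a starting point before opening Fivetran's calculator, a back-of-envelope MAR sketch looks like this. Every multiplier is an assumption to replace with your store's observed churn, not a Fivetran number:

```python
# Back-of-envelope MAR estimate. All multipliers are assumptions; replace
# them with your store's observed churn before trusting the output.
def estimate_mar(orders_per_month: int,
                 updates_per_order: float = 3.0,   # edits, fulfillments, tags
                 refund_rate: float = 0.05,        # each refund touches ~2 rows
                 deletes_per_order: float = 0.2):  # paid if Capture Deletes is on
    order_rows = orders_per_month * (1 + updates_per_order)
    refund_rows = orders_per_month * refund_rate * 2
    delete_rows = orders_per_month * deletes_per_order
    return int(order_rows + refund_rows + delete_rows)

print(estimate_mar(50_000))  # 215000 under these assumptions; line items
                             # and metafields multiply it further
```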
Shopify fidelity: Solid on core streams — orders, line items, customers, products, inventory. Metafield coverage, refund handling, and the backfill window by tier are not cleanly documented in the public docs as of this writing, so spot-check your own data before scoping downstream models against a Fivetran schema you haven't seen yet.
Best for: Teams already running Fivetran plus a working warehouse or lake.
Not for: Teams assembling from scratch. Read the Fivetran vs. Airbyte vs. Definite breakdown or the cost-effective Fivetran alternatives piece before you sign.
Airbyte: open-source leverage, Cloud or self-hosted
Airbyte is the open-source-flavored option — the connector ecosystem is broad, the Shopify source is maintained, and you can run it two ways depending on how much infrastructure you want to own.
Airbyte Cloud. Managed, hosted by Airbyte. Standard plan is $10/mo plus $2.50 per additional credit; Plus and Pro are sales-led. Setup is a paste-the-token affair: pick Shopify as the source, pick S3 (Parquet/Iceberg), Databricks, or Azure Data Lake as the destination, set a cadence. Operational reality: Airbyte Cloud dropped API Password auth for Shopify in favor of OAuth 2.0, so any pre-2025 guide you find on this is already stale.
Airbyte Self-Hosted (OSS). The open-source version. Free software, but you run it — Kubernetes or Docker, upgrades, sync-health monitoring, the works. The real cost is engineering time, not license. Teams that want control over connector behavior or need to fork a source for a custom metafield shape pick this path.
Gotcha: Airbyte's public Shopify → Apache Iceberg landing page currently 404s (verified April 2026). The connector path officially exists — Iceberg is a supported destination — but the marketing page doesn't resolve. If that page is your only evidence the path is stable, check the status directly in the Airbyte registry before committing.
Shopify fidelity: The current Shopify source connector (v3.3.0) ships with dedicated metafield streams for products, product images, product variants, collections, customers, draft orders, locations, orders, and fulfillment orders — not just shop-level metafields, which is a real upgrade over older connector versions. Line-item discount_allocations are listed explicitly in the orders schema. Older GitHub issues (2021–2022) flagged discount_allocations returning empty and refunds not unrolling correctly against line items; the current docs suggest those are resolved, but spot-check your own data before you trust it.
Best for: Teams with an open-source preference, teams where Airbyte is already running, technical teams that want the ability to fork a connector when a business-critical field doesn't unroll the way they need.
Not for: Teams that don't want to maintain connector infrastructure or debug someone else's. If the answer to "who runs Kubernetes here" is "me, alone, also I do everything else," Airbyte self-hosted becomes the 11pm problem in a different costume.
Custom GraphQL Bulk: cheapest until it isn't
The DIY path. Shopify's GraphQL Admin API supports a Bulk Operations primitive that's actually well-designed for warehouse-style extraction — you submit a bulk query, poll or webhook-subscribe for completion, download JSONL from a signed URL, convert to Parquet, land in your destination.
How it works:
```python
import requests

# SHOP and TOKEN are placeholders for your store subdomain and an Admin API access token
URL = f"https://{SHOP}.myshopify.com/admin/api/2026-01/graphql.json"
HEADERS = {"X-Shopify-Access-Token": TOKEN, "Content-Type": "application/json"}

# Kick off a bulk query against Shopify's GraphQL Admin API
requests.post(URL, headers=HEADERS, json={"query": """
mutation { bulkOperationRunQuery(query: "{ orders { edges { node { id lineItems { ... } } } } }") { bulkOperation { id } userErrors { field message } } }
"""}).raise_for_status()

# Poll currentBulkOperation (or subscribe to the bulk_operations/finish webhook).
# When status == COMPLETED, the response gives you a signed URL to JSONL results.
# Convert JSONL → Parquet, land in S3 / DuckLake / Iceberg. Repeat on a schedule.
```
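The poll step factors into a small loop. In this sketch the HTTP call is injected as a callable so the loop is testable without network access; the COMPLETED/FAILED/CANCELED status values match Shopify's bulk-operation states, while the function and stub names are hypothetical:

```python
import time

# Hypothetical polling helper. fetch_status is any callable returning the
# currentBulkOperation dict (e.g. a requests.post wrapper), injected so the
# loop can be exercised without network access.
def wait_for_bulk(fetch_status, interval=10, timeout=3600):
    deadline = time.time() + timeout
    while time.time() < deadline:
        op = fetch_status()  # {"status": ..., "url": ..., "errorCode": ...}
        if op["status"] == "COMPLETED":
            return op["url"]  # signed JSONL download URL, valid for 7 days
        if op["status"] in ("FAILED", "CANCELED"):
            raise RuntimeError(op.get("errorCode"))
        time.sleep(interval)
    raise TimeoutError("bulk operation did not finish in time")

# Stubbed run: two RUNNING polls, then COMPLETED
states = iter([{"status": "RUNNING"}, {"status": "RUNNING"},
               {"status": "COMPLETED", "url": "https://example.test/results.jsonl"}])
url = wait_for_bulk(lambda: next(states), interval=0)
print(url)  # https://example.test/results.jsonl
```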
For real-time event capture (not scheduled snapshot extracts), Bulk is the wrong hammer — you'd use Shopify webhooks into Kafka or Redpanda and land events continuously. The Bulk path is for catch-up backfills and periodic full or incremental pulls.
2026 improvements worth knowing:
- As of API 2026-01, 5 concurrent bulk operations per shop (was 1). This is a real unlock for parallel backfills of orders, customers, products, and inventory levels.
- Bulk results are stored for 7 days at the signed download URL. If your downstream fails for 8 days, you re-run the bulk op from scratch.
- The 1,000-point cost bucket with 50 pts/sec restore applies to the mutation that starts the bulk op. The bulk execution itself is exempt from the rate limit, which is why this primitive exists at all.
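The cost-point math gets tractable once you read extensions.cost off each response — every Admin GraphQL reply reports the requested/actual cost and the state of the throttle bucket. Here's a sketch of computing how long to back off before an expensive call; the payload values are illustrative:

```python
# Shape of extensions.cost on an Admin GraphQL response (values illustrative)
resp = {"extensions": {"cost": {
    "requestedQueryCost": 12,
    "actualQueryCost": 10,
    "throttleStatus": {"maximumAvailable": 1000.0,
                       "currentlyAvailable": 990.0,
                       "restoreRate": 50.0}}}}

def backoff_seconds(needed_points: float, throttle: dict) -> float:
    # How long until the bucket restores enough points for the next call
    deficit = needed_points - throttle["currentlyAvailable"]
    return max(0.0, deficit / throttle["restoreRate"])

throttle = resp["extensions"]["cost"]["throttleStatus"]
print(backoff_seconds(1000, throttle))  # 0.2
```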
The hidden maintenance tax:
- API versioning is operational, not academic. Shopify releases a new API version every quarter with a 12-month minimum support window and 9-month overlap between consecutive versions. Currently supported: 2026-04 (through April 2027), 2026-07, 2026-10, 2027-01. Every pipeline you build is on a 12-month renewal cycle.
- Concrete renewal example, happening right now: API 2026-04 renamed the returns_* analytic fields to sales_reversals_*. The old fields are removed in 2026-07. A custom script calling the old field names breaks on the 2026-07 cutover. You find out late on a Tuesday.
- Schema drift (new fields appearing, deprecated fields vanishing) requires active monitoring, not a one-time "I wrote the extractor."
- Rate limits are GraphQL cost points, not QPS. The math is non-obvious.
- Incremental logic is yours to own — updated_at cursors, webhook state, reconciliation against missed events.
- Shopify Plus merchants get early-API access and higher quotas. Worth noting if you're a Plus shop; the Bulk semantics are the same, the headroom is different.
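The updated_at-cursor piece of that list can be sketched in a few lines. The helper names are hypothetical; the only real rule encoded here is "filter on updated_at and never move the cursor backwards":

```python
# Hypothetical cursor helpers for incremental extraction.
def next_orders_query(cursor_iso: str) -> str:
    # Shopify's search syntax accepts updated_at:> filters in the query arg
    return ('{ orders(query: "updated_at:>' + cursor_iso +
            '") { edges { node { id updatedAt } } } }')

def advance_cursor(cursor_iso: str, batch: list) -> str:
    # ISO-8601 UTC timestamps compare correctly as plain strings;
    # never move the cursor backwards, even on an empty batch
    newest = max((o["updatedAt"] for o in batch), default=cursor_iso)
    return max(cursor_iso, newest)

batch = [{"id": "1", "updatedAt": "2026-03-01T00:00:00Z"},
         {"id": "2", "updatedAt": "2026-03-02T12:00:00Z"}]
print(advance_cursor("2026-02-28T00:00:00Z", batch))  # 2026-03-02T12:00:00Z
```

Persisting that cursor, and reconciling it against missed webhook deliveries, is the part that makes this a pipeline rather than a script.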
The 11pm problem. A custom GraphQL script is cheaper than Fivetran until the week Shopify ships a breaking API change and the only person who understands the extractor is on vacation. Who on your team gets paged? If the honest answer is "me, because I built it," read the next section twice.
Best for: Teams with dedicated data engineering headcount and a specific reason to own the code — a unique transformation, a compliance requirement, or an order volume where MAR pricing becomes structurally punitive.
Not for: Stack-builder-ops with no engineering team behind them. This is the path that turns analysts into pipeline babysitters.
The screenshot-ready comparison
The table below is the version to drop into your recommendation doc. Some cells carry genuine uncertainty — noted in the footnote — because the public docs for Fivetran's Shopify schema specifics aren't pinned down at research time.
| | Definite | Fivetran | Airbyte Cloud | Airbyte Self-Hosted | Custom GraphQL Bulk | Google BigQuery Transfer |
|---|---|---|---|---|---|---|
| Requires a lake/warehouse? | No (DuckLake included) | Yes | Yes | Yes | Yes | No (lands in BigQuery) |
| Line-item granularity | orders flat + line items / discounts / refund lines as JSON (UNNEST to flatten) | Core streams¹ | Full (v3.3.0)¹ | Full (v3.3.0)¹ | Whatever you query | Full |
| Metafields | 9 dedicated tables (products, variants, customers, orders, and 5 more) | Coverage not clearly documented¹ | Yes, dedicated streams | Yes, dedicated streams | Yes, you query them | Yes |
| Multi-store → unified view | Per-brand schemas + unified view (brand, source_schema, source_platform) | Separate schemas (you UNION) | Separate schemas (you UNION) | Separate schemas (you UNION) | Your code, your call | Separate datasets |
| Historical backfill | Full (7+ years verified) | Varies by tier¹ | Full | Full | Full (via Bulk) | Full |
| Sync frequency | Daily (Free) → hourly (Standard) → near real-time (Enterprise) | 1hr–24hr (plan-dependent) | 1hr–24hr (plan-dependent) | You schedule | You schedule | Daily minimum |
| Destinations | DuckLake (built-in) | S3/Iceberg, Delta, Snowflake, BigQuery, many more | S3/Parquet, S3/Iceberg², Databricks, ADLS | S3/Parquet, S3/Iceberg², Databricks, ADLS | Whatever you write to | BigQuery only |
| Est. $/mo at 50K orders | $250–$800 (Standard + overage) | $500–$3,000 (MAR-dependent) + your warehouse | $10+ plus credits + your warehouse | Engineering time + infrastructure | Engineering time + infrastructure | Free (Google's side); BigQuery storage/compute applies |
| Setup time | Minutes | Minutes (if warehouse exists) | Minutes (if warehouse exists) | Days | Days → weeks | Minutes |
| Who maintains it | Definite | Fivetran | Airbyte | You | You | Google |
¹ Fivetran's Shopify cheatsheet wasn't publicly accessible at research time — spot-check your specific metafield needs and backfill window against Fivetran's docs before committing. ² Airbyte's Shopify → Iceberg landing page currently 404s. The destination path exists in the registry; check status.
Who owns what, in each path
(Diagram: blue = vendor owns and maintains; grey = you own and maintain. The Definite row is the architecture reframe in one picture — no separate warehouse or BI layer to assemble.)
Shopify Gotchas Nobody Puts on the Landing Page
The questions that kill pilots. None of these are on page one of the SERP.
The Basic-plan gotcha. On Shopify Basic, custom-app API access to the customers, orders, and fulfillment_orders streams silently degrades. Estuary's Shopify-native connector docs are the only vendor page that documents this explicitly — those streams won't be discovered. Shopify's own framing is softer: customer PII is "stripped" on Basic. Different wording, same broken pilot. If you're piloting on Basic, budget the upgrade to Grow (currently $105/month billed monthly, $79/month billed annually) before you build anything. This kills more Shopify pilots than any technical issue in this guide.
Every pipeline is on a 12-month renewal cycle. Shopify's API versioning calendar is quarterly. New version every three months; 12-month minimum support; 9-month overlap between versions. Currently supported are 2026-04 (through April 2027), 2026-07, 2026-10, and 2027-01. If you're on a managed connector, the vendor eats the upgrade work. If you wrote the extractor yourself, congratulations — you've signed up for a quarterly review loop forever. The 2026-04 → 2026-07 change removing the returns_* analytic fields is a worked example, not a hypothetical.
The ShopifyQL Admin GraphQL API is gone. The ShopifyQL Admin GraphQL API was sunset in API version 2024-07. If you remember pulling ShopifyQL query results over GraphQL into a warehouse, that path doesn't exist anymore. ShopifyQL is the in-admin editor and Sidekick-powered natural-language interface only. For warehouse extraction, use the standard Admin GraphQL (Orders, Web Pixels, Customers, etc.).
"Real-time Shopify CDC" has fine print. Shopify's Bulk Operations API is asynchronous — you submit a query, it processes on Shopify's side, results land at a signed URL when done. Best-case polling is seconds-to-minutes. Vendors advertising "sub-second Shopify CDC" are usually webhook-driven under the hood, and webhooks have their own delivery semantics (retries, ordering, missed deliveries you have to reconcile). Real-time is possible; the fine print is that "real-time" means "eventually-consistent within a few seconds, usually, if nothing upstream hiccups."
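One piece of the webhook fine print you can get right cheaply: Shopify signs every webhook delivery with an HMAC-SHA256 of the raw body (base64-encoded, in the X-Shopify-Hmac-Sha256 header), and you should verify it before trusting the event. A stdlib-only sketch with a made-up secret:

```python
import base64, hashlib, hmac

def verify_webhook(raw_body: bytes, header_hmac: str, secret: str) -> bool:
    # Shopify signs webhooks: base64(HMAC-SHA256(app secret, raw body)),
    # delivered in the X-Shopify-Hmac-Sha256 header
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).digest()
    return hmac.compare_digest(base64.b64encode(digest).decode(), header_hmac)

# Illustrative round-trip; the secret is a made-up example
body, secret = b'{"id": 123}', "shpss_example"
signed = base64.b64encode(
    hmac.new(secret.encode(), body, hashlib.sha256).digest()).decode()
print(verify_webhook(body, signed, secret))    # True
print(verify_webhook(body, signed, "wrong"))   # False
```

Note the constant-time compare_digest: a plain == comparison here is a known timing-attack footgun.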
Fivetran's recent delete-pricing change. Fivetran now counts deletes toward paid MAR when Capture Deletes is enabled (docs) — a shift that caught users who remembered deletes being free. Shopify pipelines generate a lot of deletes (order cancellations, refunds creating void events, draft-order cleanup, abandoned-checkout purges), and if your bill jumped recently and you run Shopify through Fivetran, this is very likely why.
Four questions that pick the right option for you
- Who gets paged when sync breaks at 11pm? Most evaluation docs skip this. Every option in this guide has a different answer (Fivetran / Airbyte / you and your team / Definite), and that answer matters more than any feature cell in the comparison table. Lead with it.
- Do you already have a warehouse? If no, Definite collapses sync-plus-storage-plus-analytics into one decision. If yes, skip to Q4.
- What's your order volume and source count? Shopify-only at low volume keeps the native reporting path alive. Multi-source at any volume sends you to the managed-connector or custom-script paths.
- What's your engineering capacity? Zero-maintenance preferred → Fivetran or Definite. Open-source preferred → Airbyte. Dedicated data engineering and a reason to own the code → custom GraphQL Bulk.
FAQ
I have 2 Shopify stores, Stripe, Klaviyo, Meta Ads, Google Ads, and NetSuite. What's the cheapest way to query all of this with SQL?
The honest answer depends on what you're willing to own. If you have an engineering team and existing warehouse infrastructure, Airbyte self-hosted plus your own catalog and query engine is the lowest license cost — but the engineering time is real. If you don't, Definite is the lowest total cost because you're paying one bill for sync, storage, and analytics instead of stitching Fivetran + Snowflake/BigQuery + a BI tool. Run it through the cost calculator with your actual source list and volumes before committing either way.
Does Fivetran sync Shopify metafields?
Core objects, yes; full coverage across every object is not cleanly documented in Fivetran's public docs as of this writing. Pull Fivetran's Shopify ERD / cheatsheet and verify against your specific metafield needs (product, variant, customer, order) before you scope downstream models against it. Airbyte's v3.3.0 connector explicitly ships dedicated metafield streams for most objects; Definite handles metafields on products, variants, customers, and orders.
Is it worth writing a custom Python script against Shopify's GraphQL Bulk API instead of paying Fivetran?
At 50K orders/month with a team of 1.5 data people: usually no. The Bulk API itself is well-designed — the problem is the maintenance tax. Every quarterly API version, every schema change, every rate-limit edge case, every reconciliation bug is yours to own. The custom-script path makes sense when you have dedicated data engineering headcount and a specific reason to own the code (unusual transformations, compliance, extreme cost sensitivity, a data volume where MAR pricing becomes structurally punitive). Without both, a managed connector is cheaper than the engineer-hours you'll sink into it.
What's the difference between Iceberg, Delta, and DuckLake for a Shopify analytics stack?
At Shopify volumes, the capability differences don't matter much — pick the format that matches your query engine. DuckDB-native shops pick DuckLake (or Iceberg). Databricks shops pick Delta. Trino shops pick Iceberg. DuckLake's v1.0 "data inlining" feature handles the small-files problem that Shopify's order-update churn otherwise creates; Iceberg and Delta workflows usually need a compaction job to handle the same pattern. None of this is a reason to switch engines if one is already working for you.
How do I model Shopify orders, line items, refunds, and discounts for contribution margin analysis?
The core joins: orders → line_items (one-to-many) → refund_line_items and discount_allocations (one-to-many on line_items). Contribution margin = line item net revenue − allocated discounts − refunds − COGS (from NetSuite or QuickBooks). If you're layering ad spend on top for blended analysis, the real ROAS across ad platforms piece walks the Shopify-plus-Meta-plus-Google attribution join. Which connector you use matters here because a connector that doesn't unroll discount_allocations forces you to rebuild half the schema yourself — hence the fidelity row in the comparison table above.
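The formula reduces to one subtraction chain per line item. A toy worked example (all numbers illustrative; COGS stands in for the NetSuite/QuickBooks join):

```python
# Toy contribution-margin math per line item; every number is illustrative.
def contribution_margin(gross: float, discounts: float,
                        refunds: float, cogs: float) -> float:
    return gross - discounts - refunds - cogs

line = {"gross": 2 * 19.0,   # 2 units at $19
        "discounts": 3.8,    # allocated from discount_allocations
        "refunds": 0.0,      # from refund_line_items
        "cogs": 2 * 7.5}     # per-unit cost from the ERP
print(round(contribution_margin(**line), 2))  # 19.2
```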
Can I sync two Shopify stores into one table with a store_id?
With Definite, each store gets its own schema (e.g., WOLF_SHOPIFY.orders, CEB_SHOPIFY.orders), and a pre-built unified analytics view joins them with brand, source_schema, and source_platform columns so cross-store queries go through one view instead of hand-rolled UNION ALL statements. With Fivetran and Airbyte, each store typically lands as its own schema and you're writing the UNION yourself in dbt or views. For the fuller multi-store analytics pattern, the multi-Shopify analytics guide covers it in depth.
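The hand-rolled version of that unified view is just per-store tables plus a tagging UNION ALL, which is roughly what you'd write in dbt on the Fivetran/Airbyte paths. A minimal sketch in stdlib sqlite3 with illustrative table and brand names:

```python
import sqlite3

# Illustrative names; the pattern is per-store tables plus a view that
# tags each row with its source brand.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE wolf_orders (id INTEGER, total REAL);
CREATE TABLE ceb_orders  (id INTEGER, total REAL);
INSERT INTO wolf_orders VALUES (1, 40.0);
INSERT INTO ceb_orders  VALUES (1, 15.0);
CREATE VIEW all_orders AS
  SELECT 'wolf' AS brand, id, total FROM wolf_orders
  UNION ALL
  SELECT 'ceb'  AS brand, id, total FROM ceb_orders;
""")
totals = con.execute(
    "SELECT brand, SUM(total) FROM all_orders GROUP BY brand ORDER BY brand"
).fetchall()
print(totals)  # [('ceb', 15.0), ('wolf', 40.0)]
```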
If you don't have a warehouse yet and the above sounds like a lot of evaluation work to reach a foregone conclusion: it's because it is. For most teams with multi-source needs and no existing stack, Definite is the honest answer — one platform, one bill, one vendor to page when sync breaks. For teams with a working warehouse and a working Fivetran contract, adding Shopify is a one-click source and you don't need to read a blog post about it. For teams with engineering capacity and an opinion about how their extraction should work, the GraphQL Bulk path is genuinely good and now gets 5 concurrent operations per shop — just budget for the quarterly API renewal cycle.
If you want to see what your current stack actually costs versus the collapsed version, paste your stack here for an instant read. Or start free with Definite and sync Shopify alongside one other source — the free tier is enough to answer the question "would this actually handle our data?" before anyone signs anything.