The Best ETL Tools for Salesforce in 2026: A Decision Framework
Definite Team

If you Google "best ETL tools for Salesforce" in 2026, most of the top results recommend tools that no longer exist.
Blendo — acquired by RudderStack, product shut down. Stitch — deprioritized under Qlik after the Talend acquisition. Half the "11 best Salesforce ETL tools" listicles from 2021 are actively misleading.
The landscape looks nothing like it did five years ago. ETL (transform before loading) gave way to ELT (load first, transform in the warehouse). "Loading data into Salesforce" is now called reverse ETL and has its own category of tools. Salesforce itself launched Data Cloud — a native data platform that didn't exist when those guides were written. Open-source tools like Airbyte and dlt weren't on the map. And the survivors are consolidating fast — Fivetran merged with dbt Labs, then acquired Census for reverse ETL and Tobiko Data for advanced transformations.
The right Salesforce ETL tool in 2026 depends entirely on who you are and what you're optimizing for. This guide is organized by that — not by alphabet.
| Tool | Type | SF Direction | Best For | OSS | Pricing |
|---|---|---|---|---|---|
| Definite | All-in-one platform | Source | Startups & SMBs wanting insights fast | — | Free tier; $250/mo Platform tier |
| Fivetran | Managed ELT | Source + reverse (Census) | Teams with existing warehouses | — | MAR-based (consumption) |
| Airbyte | Open-source / Cloud ELT | Source | DIY teams wanting flexibility | ✅ | Free (OSS) or credits-based |
| dlt | Python EL library | Source | Python devs, LLM-assisted pipelines | ✅ | Free (open source) |
| Hightouch | Reverse ETL | Destination | Teams pushing warehouse data → SF | — | Usage-based |
| Apache Airflow | Orchestration | Source | Eng teams needing custom workflows | ✅ | Free (open source) |
| Salesforce Data Cloud | Native platform | Bi-directional | Salesforce-centric orgs | — | Salesforce add-on |
| Salesforce Data Loader | Native CLI/GUI | Bi-directional | SF admins, bulk operations | ✅ | Free (GitHub) |
| Singer / Meltano | DIY scripting | Source | Non-standard sources, max control | ✅ | Free (open source) |
If You Want Answers Tomorrow, Not Next Quarter
For startups and SMBs, the real cost of Salesforce ETL isn't the tool — it's the time, complexity, and engineering burden of assembling a multi-tool pipeline. You need an ELT tool to pull data out of Salesforce, a data warehouse to store it, a BI platform to visualize it, maybe a semantic layer for governed metrics, maybe dbt for transformations — and someone to maintain all of it.
By the time you've stitched that together, you've spent weeks (or months) and thousands of dollars — before answering a single business question about your pipeline, your deals, or your revenue.
That's the problem all-in-one platforms solve.
Definite
Definite is a complete data platform — ingestion, storage, modeling, visualization, and AI in one system — that replaces the fragmented stack entirely.
- Salesforce connector — OAuth (Connected App), REST/Bulk/Bulk 2.0 APIs, standard and custom sObjects, incremental syncing every 5–15 minutes, automatic schema drift detection. Requires Enterprise Edition or higher (same as Fivetran and Airbyte — Salesforce Professional Edition doesn't expose the API).
- Built-in warehouse — powered by DuckDB's columnar engine. No Snowflake or BigQuery bill.
- Governed semantic layer — define "pipeline velocity" or "win rate" once in Cube.dev; every dashboard, alert, and AI query uses the same definition.
- AI analyst (Fi) — ask "Which deals closed fastest this quarter?" in plain English, get an answer grounded in governed metrics.
- Full SQL access — join Salesforce data with Stripe, HubSpot, Postgres, or any of 500+ sources.
Setup takes under 30 minutes. Managed syncs include monitoring and alerting — if a connector fails, you get notified, not surprised. Direct support, not a community forum.
What about Salesforce Data Cloud? Powerful for Salesforce-native workflows, but if your analytics span Stripe, your product database, Google Ads, and CRM — you need a platform that unifies all of it. More on Data Cloud below.
Best for: Startups and SMBs that need Salesforce data alongside everything else, don't have a data engineer, and want one platform instead of four.
If You Want Managed Pipelines Without Building Everything
If you already have a data warehouse (Snowflake, BigQuery, Redshift) and a BI tool, you may just need a reliable way to extract data from Salesforce and land it in that warehouse. That's where managed ELT platforms come in.
Fivetran
Fivetran is the market leader in managed ELT. Its Salesforce connector uses the Salesforce REST and Bulk APIs to extract data from standard and custom objects, with incremental syncing that captures only changes since the last pull.
- 500+ connectors, fully managed and maintained
- Syncs via REST and Bulk APIs with incremental updates every 5 minutes on paid plans (6 hours on free). Uses Bulk API for large objects, which keeps Salesforce API consumption low — important if Pardot or other integrations compete for your org's daily call limit.
- Automatic schema drift handling — Fivetran adapts when your Salesforce admin adds custom fields or objects
- Pre-built analytics-ready data models for Salesforce (opportunity history, lead conversion funnels, activity timelines)
- Merged with dbt Labs, combining ingestion and transformation under one company
- Reverse ETL built in — after acquiring Census, Fivetran can now push data back into Salesforce from your warehouse (lead scoring, enrichment, audience syncs)
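Whatever the vendor, incremental extraction boils down to the same pattern: keep a watermark on a change-tracking field such as Salesforce's SystemModstamp and request only records modified since the last successful sync. A conceptual sketch (illustrative record shapes, not any vendor's actual code):

```python
def incremental_sync(records, last_watermark):
    """Return records modified after last_watermark, plus the new watermark.

    records: iterable of dicts carrying a "SystemModstamp" ISO-8601 string,
    mirroring the change-tracking field Salesforce sets on every object.
    ISO-8601 UTC strings compare correctly as plain strings.
    """
    changed = [r for r in records if r["SystemModstamp"] > last_watermark]
    # Advance the watermark to the newest record seen, so the next run
    # only requests rows modified after this point.
    new_watermark = max((r["SystemModstamp"] for r in changed), default=last_watermark)
    return changed, new_watermark

accounts = [
    {"Id": "001A", "SystemModstamp": "2026-01-10T08:00:00Z"},
    {"Id": "001B", "SystemModstamp": "2026-01-12T09:30:00Z"},
]
changed, wm = incremental_sync(accounts, "2026-01-11T00:00:00Z")
# Only 001B changed since the watermark; the next sync starts from its timestamp.
```

This is why incremental syncs consume so few API calls relative to full refreshes: most runs return a handful of rows, not the whole table.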
The caveat: Fivetran uses Monthly Active Rows (MAR) pricing. Salesforce orgs with high-volume objects — Activities, Events, EmailMessage — can easily 3x your expected MAR count. And Fivetran is just the ingestion layer — you still need a warehouse ($200–500/month), a BI tool ($200–500/month), and potentially a semantic layer. The full stack cost adds up fast. Requires Salesforce Enterprise Edition or higher for API access.
Best for: Data teams with an existing warehouse and budget for consumption-based pricing who want reliable, zero-maintenance Salesforce extraction at scale.
Airbyte Cloud
Airbyte started as an open-source project and has grown into one of the two dominant ELT platforms. Airbyte Cloud is the managed version — you get the same connector catalog without running infrastructure.
- 350+ connectors with a strong Salesforce source connector (REST API–based, supports standard and custom objects, incremental syncing)
- Credits-based pricing (more transparent than MAR, but still usage-dependent)
- Growing community, active connector development
- Option to start on Cloud and migrate to self-hosted later if costs matter
- Open source — the entire codebase is on GitHub, so you can inspect exactly how the Salesforce connector works
The caveat: Same as Fivetran — Airbyte Cloud moves data but doesn't store or visualize it. You're still assembling a multi-tool stack. Also requires Salesforce Enterprise Edition or higher.
Best for: Teams that want managed ELT with the flexibility to self-host later, and who are comfortable building the rest of the stack.
A Note on Stitch
You'll still see Stitch recommended in older Salesforce ETL guides. Stitch was acquired by Talend in 2018, Talend was acquired by Qlik in 2023, and Stitch has been progressively deprioritized since. Its free tier was eliminated, feature investment has stalled, and many users have migrated to Fivetran or Airbyte. If you're evaluating Salesforce ETL tools today, look elsewhere.
If You Have Engineers and Want Control
Some teams want — or need — to own their Salesforce data pipelines. Maybe you have strict compliance requirements, need to transform data before it lands, or simply prefer code over configuration. Here's what the code-first landscape looks like in 2026.
dlt (data load tool)
dlt is a Python-first, lightweight ELT library that's become the fastest-growing open-source data loading tool. It has a verified Salesforce source that handles objects, SOQL queries, and incremental loading.
```python
import dlt
from dlt.sources.salesforce import salesforce_source

# Pipeline config: load Salesforce data into a local DuckDB file
pipeline = dlt.pipeline(
    pipeline_name="sf_pipeline",
    destination="duckdb",
    dataset_name="salesforce_data",
)

# Pull the enabled Salesforce resources and run the load
source = salesforce_source()
pipeline.run(source)
```
- `pip install dlt` and go — no backends, no containers, no orchestration platform required
- Works inside Jupyter notebooks, Cursor, and any AI code editor
- 8,800+ supported sources — many generated via LLM-assisted pipeline building
- 3M+ PyPI downloads, 6,000+ companies in production
- Open source (Apache 2.0)
dlt is purpose-built for the era of AI-assisted development. You can describe what Salesforce objects you need to an LLM, and it can scaffold a working dlt pipeline. That's a fundamentally different workflow than clicking through connector UIs.
Best for: Python-savvy teams who want to write Salesforce pipelines as code, especially those leveraging LLMs for development.
Airbyte OSS (Self-Hosted)
Airbyte's open-source version gives you the same Salesforce connector as Airbyte Cloud but running on your own infrastructure.
- Full control over data — nothing leaves your network
- Free forever (you pay for infrastructure, not licenses)
- Same 350+ connectors, same Salesforce source
- Requires Docker or Kubernetes and ongoing ops maintenance
- Open source (MIT / ELv2)
The trade-off: "Free" means free of license cost, not free of engineering time. You'll need someone to handle deployment, upgrades, monitoring, and scaling. For teams with DevOps capacity, it's a strong choice. For teams without, the hidden cost is real.
Best for: Engineering teams with DevOps capacity who want open-source flexibility and full data control.
Apache Airflow + Salesforce Provider
Apache Airflow is the gold standard for workflow orchestration. The apache-airflow-providers-salesforce package is actively maintained and gives you operators for:
- Querying Salesforce via SOQL
- Extracting data from Salesforce objects to your warehouse
- Triggering Salesforce API calls as part of broader data pipelines
- Scheduling and monitoring with full DAG visibility
Airflow is most useful when Salesforce extraction is one piece of a larger orchestrated pipeline — not as a standalone ETL tool. You'll pair it with a destination (warehouse) and typically dbt for transformations.
- Open source (Apache 2.0)
- Managed options available via Astronomer or cloud-provider services (MWAA, Cloud Composer)
Best for: Teams already running Airflow who need to add Salesforce as a source within existing DAGs. Not a starting point for teams without orchestration infrastructure.
Singer / Meltano
The Singer specification defines a standard for ETL scripting: taps extract data, targets load it. There are Singer taps for Salesforce (both REST and Bulk API variants). Meltano provides a modern CLI, orchestration, and managed Cloud offering on top of Singer taps and targets.
- Maximum flexibility — write or fork taps for any Salesforce object or custom logic
- Maximum responsibility — you own the code, the orchestration, and the maintenance
- Meltano adds structure and deployability to what was previously a DIY scripting ecosystem
- Open source (MIT)
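The Singer spec itself is deliberately simple: a tap writes JSON messages (SCHEMA, RECORD, STATE) to stdout, one per line, and any target can consume them. A minimal sketch of the messages a Salesforce tap would emit (field set abbreviated for illustration):

```python
import json

def emit(message):
    # Singer taps communicate over stdout, one JSON message per line
    print(json.dumps(message))

# Describe the shape of the stream before sending records
emit({
    "type": "SCHEMA",
    "stream": "opportunities",
    "schema": {"properties": {"Id": {"type": "string"}, "Amount": {"type": "number"}}},
    "key_properties": ["Id"],
})
# One RECORD message per row extracted from Salesforce
emit({"type": "RECORD", "stream": "opportunities",
      "record": {"Id": "006A", "Amount": 42000.0}})
# STATE carries the bookmark a target persists for incremental runs
emit({"type": "STATE",
      "value": {"bookmarks": {"opportunities": "2026-01-12T09:30:00Z"}}})
```

That stdout contract is what makes taps and targets composable, and also why you own everything around them: scheduling, retries, and state storage are your problem.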
Best for: Teams with highly custom Salesforce integration needs who are comfortable writing and maintaining pipeline code.
If You Need to Push Data INTO Salesforce
The 2021 guides called this "loading data into Salesforce." In 2026, the industry calls it reverse ETL — syncing enriched, modeled data from your warehouse back into operational tools like Salesforce. Think: pushing lead scores, customer health metrics, or audience segments into Salesforce fields so your sales team can act on analytics data without leaving their CRM.
Hightouch
Hightouch is the leading standalone reverse ETL platform. It connects to your data warehouse (Snowflake, BigQuery, Redshift, Postgres, Databricks) and syncs modeled data into Salesforce — and 200+ other destinations.
- Model-based syncing: Define audiences, lead scores, or enrichment logic in SQL in your warehouse, then map those results to Salesforce objects (leads, contacts, accounts, custom objects)
- Visual audience builder for non-technical users
- Real-time and scheduled sync modes
- Field-level mapping with conflict resolution
Hightouch is the right tool when your analytics warehouse is your source of truth and you need to operationalize that data inside Salesforce. If you're computing lead scores, building account health models, or segmenting customers in your warehouse, Hightouch pushes those insights where your sales team actually works.
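Conceptually, a reverse ETL sync is a warehouse query result mapped onto Salesforce fields. A sketch of that mapping step (the `Lead_Score__c` custom field and the record shapes are hypothetical, not Hightouch's internals):

```python
def to_salesforce_updates(warehouse_rows, field_map):
    """Map warehouse query results onto Salesforce update payloads.

    field_map: {warehouse_column: salesforce_field}, the field-level
    mapping you would configure in a reverse ETL tool's UI.
    """
    updates = []
    for row in warehouse_rows:
        payload = {"Id": row["salesforce_id"]}  # match on the SF record ID
        for col, sf_field in field_map.items():
            payload[sf_field] = row[col]
        updates.append(payload)
    return updates

# Example: push a warehouse-computed lead score into a hypothetical
# Lead_Score__c custom field on the Lead object.
rows = [{"salesforce_id": "00QA", "lead_score": 87}]
updates = to_salesforce_updates(rows, {"lead_score": "Lead_Score__c"})
# → [{"Id": "00QA", "Lead_Score__c": 87}]
```

The tools earn their keep in everything around this step: batching against API limits, diffing to sync only changed rows, and resolving conflicts with edits made inside Salesforce.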
Best for: Teams with a data warehouse who want to operationalize analytics data inside Salesforce without building custom integrations.
A Note on Census, Jitterbit, and dataloader.io
Census was the other major reverse ETL player — acquired by Fivetran and folded into their platform. If you're already on Fivetran, reverse ETL is now a built-in capability. No need for a separate vendor.
Jitterbit Data Loader is still free on the Salesforce AppExchange for basic CSV → Salesforce imports. It's useful for admins doing one-off bulk loads, not for production data pipelines.
dataloader.io persists as Salesforce's MuleSoft-powered web-based data loader, but Salesforce has been steering users toward MuleSoft Composer as the modern low-code alternative for simple integration flows.
Salesforce's Own Tools
Before reaching for third-party tools, know what Salesforce itself offers natively. The platform has invested heavily in reducing the need for external ETL — especially for orgs that live primarily inside the Salesforce ecosystem.
Salesforce Data Cloud
Salesforce Data Cloud is Salesforce's native data platform — formerly called Customer Data Platform (CDP). It's designed to harmonize data from Salesforce CRM, Marketing Cloud, Commerce Cloud, and external sources into a unified customer profile.
- Native connectors to ingest data from outside Salesforce (cloud storage, streaming, databases)
- Einstein AI built in for segmentation, next-best-action, and predictive analytics
- Zero-copy data sharing with Snowflake and Databricks
- Real-time data processing and activation
The caveat: Data Cloud is powerful for Salesforce-centric organizations. But it's an add-on with enterprise pricing, and it's fundamentally designed to serve the Salesforce ecosystem. If your analytics needs span well beyond CRM — product usage, payments, marketing attribution, support tickets — Data Cloud becomes one piece of a larger puzzle rather than the whole solution.
Best for: Large organizations whose business processes are centered on Salesforce and who want to enrich CRM data with external signals without leaving the Salesforce ecosystem.
Salesforce Data Loader
Salesforce Data Loader is a native client application for bulk importing and exporting data between Salesforce objects and CSV files or database connections.
- GUI and CLI modes — cross-platform via Zulu OpenJDK (no longer Windows-only)
- Handles millions of records via the Bulk API
- Drag-and-drop field mapping, logging, and scheduling
- Open source on GitHub (BSD license)
- Requires the SOAP API, which is only available with Enterprise, Unlimited, and Developer editions
Data Loader is the right tool for Salesforce admins who need to run bulk data operations — importing leads from a CSV, mass-updating fields, or exporting objects for offline analysis. It's not a pipeline tool; it's a utility.
Best for: Salesforce admins running bulk import/export operations. Not a substitute for a data pipeline.
The Real Payoff: Joining Salesforce With Everything Else
Every tool in this guide gets data out of Salesforce. The real question is what you do with it once it's out.
Salesforce's native reporting is limited to CRM data. The moment you need to answer "what's our bookings-to-cash ratio?" (Salesforce Opportunities vs. Stripe payments), or "which marketing channels drive the highest-value deals?" (HubSpot attribution vs. Salesforce pipeline), or "do high-usage customers renew at higher rates?" (product database vs. Salesforce renewals) — you need Salesforce data in a place where it can be joined with other sources.
That's what ETL into a warehouse or all-in-one platform unlocks: cross-source SQL joins.
The hardest part isn't the join syntax — it's key resolution. Salesforce Account IDs won't match your internal database's customer IDs. Your options:
- Deterministic keys (best): email address, company domain, or an external ID field you populate in Salesforce. These match reliably.
- Fuzzy matching (fragile): company name matching breaks on "Acme Inc." vs. "Acme, Inc." vs. "ACME Corporation." Avoid this as a primary strategy.
- Semantic layer mapping: Once you've established key relationships, a governed semantic layer lets you define cross-source metrics ("net revenue retention," "pipeline-to-close ratio") once and reuse them across every dashboard.
If you're evaluating Salesforce ETL tools, test this early: connect Salesforce and one other source (Stripe is a good candidate), try to join them on a shared key, and see how painful it is. That exercise tells you more about the platform than any feature checklist.
How to Choose: The Decision Framework
The Salesforce ETL landscape has consolidated dramatically. In 2021, you had 11+ point tools to evaluate. In 2026, the real question is how much infrastructure you want to own — and whether you're pulling data out of Salesforce, pushing data in, or both.
| Your Situation | Best Starting Point | Time to First Insight |
|---|---|---|
| Startup, need Salesforce + everything else analyzed, no data engineer | Definite | 30 minutes |
| Have a warehouse, want managed no-code Salesforce extraction | Fivetran or Airbyte Cloud | Days to weeks |
| Python team, want lightweight code-first pipelines | dlt | Hours to days |
| Eng team, want full open-source control | Airbyte OSS + Airflow | Days to weeks |
| Need to push warehouse data INTO Salesforce | Hightouch or Fivetran (Census) | Days |
| Salesforce-only org, want native analytics | Salesforce Data Cloud | Weeks |
| SF admin, one-off bulk loads | Salesforce Data Loader | Hours |
| Maximum DIY, non-standard sources | Singer / Meltano | Days |
What It Actually Costs
The sticker price of your ETL tool is never the full cost. Three things consistently blindside teams:
- Salesforce MAR inflation. Consumption-priced tools (Fivetran, Airbyte Cloud) charge by rows synced. Salesforce orgs generate far more rows than people expect — Activity, Event, and EmailMessage objects can easily 3x your projected bill. A 50-user org that looks like 200K records on paper often syncs 600K+ MAR when you include activity history.
- Warehouse compute creep. Snowflake and BigQuery charge per query. Once your team has dashboards they love, query volume grows faster than data volume. A dashboard that refreshes every 15 minutes across 5 filters runs thousands of queries per month — and that's just one dashboard.
- Engineering time on "free" tools. Open-source options (Airbyte OSS, dlt, Airflow) cost $0 in licenses but 10–20 hours/month in maintenance, monitoring, and upgrades. At $150/hr fully loaded, that's $18,000–36,000/year of engineer time that could ship product.
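The back-of-envelope math behind those points is worth running with your own numbers; a sketch using the figures above (all inputs illustrative):

```python
# Rough annual-cost model using the estimates from this section.
base_records = 200_000        # what the org "looks like" on paper
activity_multiplier = 3       # Activity/Event/EmailMessage inflation
monthly_mar = base_records * activity_multiplier
# monthly_mar → 600,000 MAR, triple the naive estimate

maintenance_hours_per_month = 15   # midpoint of the 10-20 hr/mo range
loaded_hourly_rate = 150
annual_eng_cost = maintenance_hours_per_month * loaded_hourly_rate * 12
# annual_eng_cost → 27,000 per year of engineer time on a "free" stack
```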
Here's the full-stack math for a mid-size startup (~50 Salesforce users, ~200K records, 5–10 data sources):
| Approach | Tool Cost (Annual) | You Also Need | Estimated Total (Annual) |
|---|---|---|---|
| Definite | ~$12,000 | Nothing — warehouse, BI, semantic layer, and AI included | ~$12,000 |
| Fivetran + warehouse + BI | ~$12,000–18,000 (Fivetran MAR) | Snowflake/BigQuery + BI tool (~$200–500/mo each) | ~$20,000–31,000 |
| Airbyte Cloud + warehouse + BI | ~$6,000–12,000 (credits) | Same as Fivetran | ~$14,000–25,000 |
| dlt + warehouse + BI | Free | Warehouse + BI + engineering time (10–20 hrs/mo) | $8,000–15,000 + eng time |
| Airbyte OSS + warehouse + BI | Free | Infrastructure (~$2,400–4,800) + warehouse + BI + DevOps time | $10,000–18,000 + eng time |
| Salesforce Data Cloud | Enterprise add-on (varies widely) | Potentially BI tool for non-SF data | $24,000–60,000+ |
These are estimates, not quotes — your numbers will vary by volume, frequency, and stack choices. Use our data stack cost calculator for a personalized breakdown.
The pattern: the less infrastructure you manage, the faster you get to insight — but the more you pay in platform fees. The more control you want, the more engineering time you invest. For most startups and SMBs, the math favors the all-in-one approach. For engineering-heavy organizations with specific requirements, the code-first tools are genuinely excellent — and better than they've ever been.
Salesforce ETL FAQ
Do Salesforce ETL connectors pull custom objects, fields, and add-on data (CPQ, Pardot)?
All major connectors in this guide — Definite, Fivetran, Airbyte, and dlt — sync standard and custom sObjects, including custom fields and add-on objects like SBQQ__Quote__c (CPQ) or Field Service Lightning, as long as they're API-accessible. Edge cases to verify: compound fields (BillingAddress) are typically flattened, multi-select picklists arrive semicolon-delimited, and formula fields sync as computed values. Pardot is the exception — it has its own API and requires a separate connector. Check your Salesforce edition before evaluating any tool: Professional Edition doesn't include API access by default, which blocks all third-party ETL.
How fresh will my Salesforce data be — can I get near real-time?
Most managed platforms (Fivetran, Airbyte Cloud, Definite) offer incremental syncs down to every 5–15 minutes — sufficient for daily standups and pipeline reviews. Initial backfills for a ~500K record org typically take 1–4 hours. True real-time streaming requires Salesforce Platform Events or Change Data Capture, which only Data Cloud and custom Airflow/Kafka setups support natively.
What happens with Salesforce API rate limits?
Well-built connectors use the Bulk API — which consumes far fewer calls than REST — and implement backoff logic near the limit. A 50-user Enterprise Edition org gets 1,000,000 API calls/day; incremental syncs on 10–20 objects rarely use more than a few thousand. The risk is real only if Pardot, external integrations, and other tools compete for the same pool. Check your current usage in Salesforce Setup → Company Information before connecting a new ETL tool.
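Salesforce reports current consumption via its REST limits endpoint (`/services/data/vXX.X/limits/`); the relevant key is `DailyApiRequests`. A sketch that turns that response into a headroom check (the sample payload mirrors the endpoint's shape; thresholds are illustrative):

```python
def api_headroom(limits_response, warn_ratio=0.8):
    """Return (fraction_of_daily_pool_used, warning_flag)."""
    daily = limits_response["DailyApiRequests"]
    used = daily["Max"] - daily["Remaining"]
    used_fraction = used / daily["Max"]
    # Flag when consumption crosses the warn threshold, before hard failure
    return used_fraction, used_fraction >= warn_ratio

# Sample payload shaped like the /limits/ endpoint response
sample = {"DailyApiRequests": {"Max": 1_000_000, "Remaining": 960_000}}
fraction, warn = api_headroom(sample)
# → 4% of the daily pool used; no warning
```

Polling this before and after enabling a new connector tells you exactly how much of the pool that tool actually consumes.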
How do I define governed metrics like "pipeline" or "win rate" on top of Salesforce data?
Use a semantic layer — Cube, Looker's LookML, MetricFlow, or the built-in layer in all-in-one platforms. Define "pipeline" as a filtered aggregation once (sum of Opportunity Amount where Stage is not "Closed Lost"), and every dashboard uses that definition. This matters because Salesforce's Amount field is notoriously inconsistent across reps — a governed metric layer applies normalization once instead of per-report.
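In code terms, a governed metric is just the aggregation defined once and reused everywhere instead of re-written per report. The "pipeline" definition above as a single function (record shape illustrative):

```python
def pipeline_value(opportunities):
    """Governed "pipeline" metric: sum of Amount for non-lost opportunities.

    Defined once; every dashboard, alert, and AI query calls this instead
    of re-implementing the Stage filter.
    """
    return sum(
        o["Amount"] for o in opportunities
        if o["StageName"] != "Closed Lost"
    )

opps = [
    {"Amount": 50_000.0, "StageName": "Negotiation"},
    {"Amount": 20_000.0, "StageName": "Closed Lost"},
    {"Amount": 30_000.0, "StageName": "Closed Won"},
]
# pipeline_value(opps) → 80,000.0; the Closed Lost row is excluded
```

A semantic layer does the same thing declaratively (and in SQL), but the principle is identical: one definition, many consumers.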
What does migration look like if I'm already running Fivetran + Snowflake + Metabase?
You don't have to switch all at once. Run your existing stack in parallel during a 2–4 week pilot — connect Salesforce to the new platform, rebuild 2–3 key dashboards, and compare output. If you've built dbt models, the SQL logic translates to any platform's transformation layer. The harder part is dashboards: no universal import format exists, so budget 1–2 days per complex dashboard. If the pilot fails, your existing stack is untouched.
How does pricing scale as our Salesforce data grows?
Flat-rate platforms (like Definite) don't charge more when you sync additional objects or run more queries. Consumption-priced tools scale with volume: Fivetran's MAR pricing grows linearly with row count, and warehouse compute (Snowflake, BigQuery) grows with query frequency. If your org is growing fast — adding reps, expanding pipeline, logging more activity — model your projected data volume at 6 and 12 months, not just today's snapshot. See the cost breakdown above for current estimates.
Can my non-technical CEO and sales managers use the dashboards independently?
It depends on the tool. Native Salesforce reports are self-service but limited to CRM data. Metabase and Looker offer drag-and-drop exploration, but cross-source dashboards usually require SQL. All-in-one platforms aim to close that gap with spreadsheet-like interfaces and governed semantic layers that prevent wrong-number problems. The real test during any pilot: have a non-technical stakeholder try to answer a question on their own. If they can't, you'll be the bottleneck regardless of what the marketing page says.
How do I run a proof-of-concept I can demo to my exec team this week?
Start with one dashboard, not a full migration. Connect Salesforce, build your most-requested report (pipeline by stage, by rep, by period), and cross-reference 3–5 numbers against native Salesforce reports to validate accuracy. Managed platforms take 30 minutes to a few hours; code-first tools (dlt, Airbyte OSS) need about a day including infra setup. Use your production org, not sandbox — sandbox data is stale and won't reflect real pipeline numbers, which undermines the demo. If numbers don't match, check for hidden Salesforce report filters by record type or owner.
What about security — can I control which Salesforce fields get synced?
All major tools authenticate via Salesforce OAuth (Connected App) — strongly preferred over username/password because it respects session policies. For PII, the first line of defense is Salesforce itself: create a dedicated integration user with field-level security that excludes sensitive fields, and the ETL tool can only sync what that user can see. Some platforms also offer field selection at setup; others sync everything and let you filter post-load. On the platform side, verify SOC 2 compliance, encryption at rest, and data residency options (US vs. EU) — especially for GDPR. For deletion propagation, incremental syncs with hard-delete detection will remove records downstream, but expect a sync-interval delay.
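The "sync everything and filter post-load" option amounts to applying an allowlist to each record before it lands anywhere durable; a sketch (field names, including `SSN__c`, are hypothetical):

```python
# Allowlist of fields permitted downstream; everything else is dropped.
ALLOWED_FIELDS = {"Id", "Name", "Industry", "AnnualRevenue"}

def scrub(record, allowed=ALLOWED_FIELDS):
    """Keep only allowlisted fields so PII never reaches the warehouse."""
    return {k: v for k, v in record.items() if k in allowed}

account = {"Id": "001A", "Name": "Acme", "SSN__c": "123-45-6789",
           "Industry": "SaaS"}
clean = scrub(account)
# → {"Id": "001A", "Name": "Acme", "Industry": "SaaS"}
```

Field-level security on the integration user remains the stronger control, since excluded fields never leave Salesforce at all.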
The Definite Advantage
If you've read this far and thought "I just want to connect Salesforce and start getting answers" — that's exactly what Definite is built for.
Unified: All your data in one place. 500+ connectors bring Salesforce alongside Stripe, HubSpot, Postgres, your product database, and every other tool your team runs — without a separate ETL tool, warehouse, or BI platform.
Simple: True self-service. The Canvas works like a spreadsheet. The governed semantic layer ensures everyone sees the same pipeline velocity, win rates, and revenue numbers. Your team doesn't need SQL expertise or data engineering skills.
AI-Powered: Ask "What's driving churn this quarter?" in plain English and get an instant answer. Fi, Definite's AI analyst, summarizes trends, finds anomalies, and automates reports — no prompt engineering required.
Open: Built on open standards — DuckDB, Iceberg/Parquet, Cube.dev. Export your data and queries anytime. No vendor lock-in.
Get started with Definite — 30 minutes from signup to Salesforce analytics. Or request a demo to see how it compares to building your own Salesforce ETL pipeline.