DuckLake: A Pretty Good Lakehouse on GCP in an Afternoon

We run DuckLake in production at Definite, so the engine side is well-trodden for us. What I wanted was a clean reference setup on GCP and a like-for-like benchmark against Snowflake: AlloyDB as the catalog, GCS for parquet, Cloud Run for compute. Stood it up over a weekend, loaded TPC-H SF=100 (867M rows), and ran the 22 canonical queries.
The stack
Three pieces:
| Layer | Service | Role |
|---|---|---|
| Catalog | AlloyDB (2 vCPU, Zonal) | Postgres DB holding DuckLake metadata |
| Storage | Cloud Storage bucket | Parquet files, via DuckDB's httpfs extension with GCS HMAC keys |
| Compute | Cloud Run | Python + FastAPI + DuckDB with ducklake, postgres, httpfs, tpch extensions |
The Cloud Run container does one thing at startup:
INSTALL ducklake; INSTALL postgres; INSTALL httpfs;
LOAD ducklake; LOAD postgres; LOAD httpfs;
CREATE SECRET gcs_hmac (TYPE gcs, KEY_ID '...', SECRET '...');
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=... port=6432 user=postgres password=...' AS lake
(DATA_PATH 'gs://your-bucket/');
USE lake;
Cloud Run reaches AlloyDB's private IP via Direct VPC egress (no Serverless VPC Connector required anymore). GCS goes over the internet through the S3-compat layer. Idle cost: near zero with min-instances=0. Active cost at 8 vCPU / 32 GiB: about $1/hour.
What trips up first-timers
If you've run DuckLake before, skip this section. If you haven't, three things will cost you about an hour each on the first pass.
1. AlloyDB Managed Connection Pooling: nested flags map
AlloyDB docs say default max_client_conn is 5,000, configurable up to 262,042. True, but the payload shape is undocumented in the examples. Every snippet in the wild (including the gcloud alpha --connection-pooling-max-client-connections=5000 flag) produces something like {"connectionPoolConfig": {"maxClientConn": "5000"}} and gets silently ignored by the v1 API.
The correct shape:
{
"connectionPoolConfig": {
"enabled": true,
"flags": {
"pool_mode": "transaction",
"max_pool_size": "50",
"max_client_connections": "5000"
}
}
}
Keys inside flags are snake_case strings. Values are strings, even numeric ones. pool_mode is lowercase. Send it any other way and the pooler stays stuck at PgBouncer's default 100 concurrent clients, while the instance happily reports connectionPoolConfig.enabled=true.
With the right shape, a single Cloud Run container opens 1,000+ concurrent clients through the pooler and backend Postgres connections hold around 100. 10x client-to-backend reduction.
2. DuckLake 1.0 data inlining
New users load a test table, do a small INSERT, open the bucket, and find it empty. DuckLake 1.0 inlines small writes into the catalog Postgres and flushes to parquet later, either when they hit a size threshold or when you call ducklake_flush_inlined_data('lake').
This is a deliberate tradeoff to avoid tiny parquet files in object storage (which are awful for read performance), but the empty bucket surprises people the first time.
3. dbgen(sf, step, children) is 0-indexed
Anyone chunking a TPC-H load will write for step in range(1, 11): and discover afterward that every table is 10% short. nation has 22 rows instead of 25, region has 4 instead of 5. CALL dbgen(sf=100, step=N, children=K) wants N in 0..K-1, not 1..K. Step 10 with children=10 is out of range and produces nothing silently. range(CHUNKS) fixes it.
TPC-H SF=100 results
Data loaded via a Cloud Run Job (8 vCPU, 32 GiB, 30 minutes). Queries run sequentially through the public Cloud Run service, timed client-side, against a fresh revision for the cold case.
| Engine | SF=100 total (s) | Note |
|---|---|---|
| Snowflake X-Small (native, warm) | ~102 | Published |
| BigQuery native, first run | ~136 | Published |
| DuckDB local parquet, 4 vCPU / 32 GB | 166 | r6id.xlarge, published |
| DuckDB local parquet, 8 vCPU / 64 GB | ~240 | Published |
| Our stack, 8 vCPU / 32 GiB, warm best | ~410 | |
| Databricks 2x-Small (Delta) | ~700 | Published |
| Our stack, warm median | 912 | |
| DuckDB local parquet, 2 vCPU / 16 GB | 571 | r6id.large, published |
| Our stack, cold first run | ~1,800 |
About 4x slower than Snowflake X-Small on native format. Beats Databricks 2x-Small on Delta. Same ballpark as local-disk DuckDB at similar core counts. Against Snowflake reading external Iceberg parquet (historically 2-3x slower than their native format), the gap closes to roughly 1.5x.
Cold runs pay the biggest tax. Remote parquet from GCS, first access pays for DuckLake metadata fetches from AlloyDB and per-file GCS opens. Warm runs reuse DuckDB's file cache and hash tables, which is where most of the speedup lives.
What moved the needle
In order of impact:
- Cloud Run RAM 16 → 32 GiB: 4-8x faster on multi-way joins (Q3, Q4, Q13). At 16 GiB those queries spilled to disk and warm runs didn't help because DuckDB had to rehydrate hash tables every time. At 32 GiB, everything fit. Biggest single-variable win.
- Managed Connection Pooling configured correctly: 100 → 1,000+ concurrent clients. Doesn't speed up TPC-H (single-query workload), but it's the difference between this stack scaling past a handful of users and not.
- File compaction (
CALL ducklake_merge_adjacent_files(...)): 21x faster on pure-scan queries like Q6, marginal on join-heavy ones. The 10-chunk load leftlineitemwith 51 parquet files. Compaction plusducklake_cleanup_old_filescut total files from 122 to 65.
Permissions: DuckLake's biggest gap
DuckLake doesn't ship with access control. By design: enforcement lives in the two systems it leans on (the catalog database and the object store).
On the catalog side, AlloyDB gives you standard Postgres 15 GRANT/REVOKE and ENABLE ROW LEVEL SECURITY. The official DuckLake role pattern:
-- reader: can ATTACH and SELECT
CREATE USER ducklake_reader WITH PASSWORD '...';
GRANT USAGE ON SCHEMA public TO ducklake_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO ducklake_reader;
-- writer: can also INSERT/UPDATE/DELETE
CREATE USER ducklake_writer WITH PASSWORD '...';
GRANT USAGE ON SCHEMA public TO ducklake_writer;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO ducklake_writer;
Postgres RLS on catalog tables like ducklake_table and ducklake_column can hide metadata based on the connecting user:
ALTER TABLE ducklake_table ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_tables ON ducklake_table
FOR SELECT USING (schema_name = current_setting('app.tenant'));
The catch: the catalog only tells DuckDB which parquet files exist. DuckDB reads those files directly from GCS with its own HMAC secret. If that secret leaks, every policy on the catalog is bypassable. Catalog RLS hides metadata. It doesn't enforce data access.
What actually contains a tenant:
- Per-tenant schema or catalog DB. Each tenant gets its own connection string and its own
ATTACHalias. Simplest model. Scales to a few dozen tenants cleanly. - Per-tenant GCS prefix plus scoped IAM. Separate HMAC keys per tenant, bucket policies that only allow reads under
gs://.../tenant=X/. Combined with the catalog split, that's a real boundary. - Proxy the queries. Clients never hold the catalog credential or the HMAC. An API in front authenticates the user, rewrites the SQL (
WHERE tenant_id = ?), and returns results. This is what we run at Definite.
Row-level filtering is table stakes for multi-tenant analytics. DuckLake deliberately leaves that to the layers underneath, so access control has to be designed in from day one.
Where it nets out
You're not going to beat Snowflake's native columnar format on cold first-run queries.
You can get within 2-4x of Snowflake on the same public benchmark using only managed GCP services, on a box that idles at $0, for 5-10% of the monthly cost.
The lakehouse pattern is simple: columnar storage on object store, fast single-node engine, metadata in boring Postgres. DuckLake is the cleanest implementation I've seen. It works on AWS, Azure, or GCP. It works on commodity hardware. DuckLake 1.0 shipped with a stable on-disk format.
If you're running a 5-figure Snowflake bill and most of your queries would fit in 32 GiB of RAM, this shape is worth a Saturday.