
The Composable CDP: Why Your Warehouse is the Source of Truth

Stop paying Segment $100k/year. A technical guide to the Composable CDP stack: Snowflake, dbt, and Hightouch (Reverse ETL).

Alex B.

The “Customer Data Platform” (CDP) industry is one of the biggest rackets in SaaS. Tools like Segment, mParticle, or Salesforce CDP charge you based on “Monthly Tracked Users” (MTU). If a user visits your site once, you pay. If you have 10 million dusty emails in your database from 2015, you pay. Enterprise bills often exceed $200,000/year just to store data you already own.

In 2025, the best engineering teams are killing the Monolithic CDP. They are moving to the Composable CDP. The logic is simple: Your Data Warehouse (Snowflake/BigQuery) is the CDP. It is cheap, scalable, and you own it. You just need a pipe to move the data out of the warehouse to your marketing tools (Klaviyo/Meta). That pipe is Reverse ETL (Hightouch).

Why Maison Code Discusses This

At Maison Code Paris, we act as the architectural conscience for our clients. We often inherit “modern” stacks that were built without a foundational understanding of scale. We see simple APIs that take 4 seconds to respond because of N+1 query problems, and “Microservices” that cost $5,000/month in idle cloud fees.

We discuss this topic because it represents a critical pivot point in engineering maturity. Implementing this correctly differentiates a fragile MVP from a resilient, enterprise-grade platform that can handle Black Friday traffic without breaking a sweat.

1. The Architecture: Unbundling Segment

The Monolithic CDP does three things:

  1. Event Collection: analytics.track()
  2. Identity Resolution: Merging user_123 with cookie_abc.
  3. Activation: Sending audiences to Facebook Ads.

The Composable CDP splits this:

  1. Collection: Rudderstack (Open Source) or Snowplow.
  2. Storage: Snowflake (Cheap storage).
  3. Transformation: dbt (SQL logic).
  4. Activation: Hightouch (The “Reverse ETL”).
graph LR
    subgraph Sources
        Store[Shopify]
        Web[Web Events]
    end

    subgraph Warehouse[Snowflake]
        Raw[Raw Tables] -->|dbt| Gold[Gold Customer Table]
    end

    subgraph Activation
        FB[Facebook Ads]
        Email[Klaviyo]
    end

    Store -->|Fivetran| Raw
    Web -->|Rudderstack| Raw
    Gold -->|Hightouch| FB
    Gold -->|Hightouch| Email

2. The Power of SQL: Identity Resolution

In Segment, you are stuck with their Identity Graph logic. In Snowflake, you write the logic yourself in SQL (dbt), so you decide exactly how identities merge.

Scenario: You want to link offline store purchases to online web browsing. Segment struggles with this if the emails don't match perfectly. In dbt, the matching rules are yours: the model below starts from an exact match on a normalized email (lowercased, trimmed), and you can layer phone-number or fuzzy matching rules on top.

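-- models/gold/dim_users.sql
-- Deterministic identity resolution: web and POS identities joined on a normalized email.
WITH web_users AS (
    SELECT DISTINCT LOWER(TRIM(email)) AS email, cookie_id
    FROM raw.web_events
    WHERE email IS NOT NULL  -- anonymous cookies cannot be joined on email
),
pos_users AS (
    -- DISTINCT avoids producing one row per transaction
    SELECT DISTINCT LOWER(TRIM(email)) AS email, phone, loyalty_card
    FROM raw.pos_transactions
)
SELECT
    COALESCE(w.email, p.email) AS master_email,
    w.cookie_id,
    p.phone,
    p.loyalty_card,
    -- Custom rule: loyalty-card holders (in-store buyers) are VIP
    CASE WHEN p.loyalty_card IS NOT NULL THEN 'VIP' ELSE 'Standard' END AS segment
FROM web_users w
FULL OUTER JOIN pos_users p ON w.email = p.email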

You now have a gold.dim_users table which is the Single Source of Truth for the entire company.
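The SQL above joins on a single key (email). When identities chain together (a cookie linked to an email, the email linked to a loyalty card), a union-find over every identifier pair is a common way to stitch them into one person. A minimal Python sketch, assuming identifiers are prefixed strings such as `cookie:abc`; `IdentityGraph` and `email_id` are hypothetical names, not part of dbt or Segment. In the warehouse you would express the same merge in SQL.

```python
from collections import defaultdict
from typing import Optional


class IdentityGraph:
    """Union-find over identifiers: linked identifiers end up in one cluster."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, node: str) -> str:
        # Register unseen nodes, then walk to the root with path halving.
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def link(self, *identifiers: Optional[str]) -> None:
        # Deterministic rule: identifiers seen on the same event or
        # transaction belong to the same person. None values are ignored.
        present = [i for i in identifiers if i]
        if not present:
            return
        root = self._find(present[0])
        for other in present[1:]:
            other_root = self._find(other)
            if other_root != root:
                self.parent[other_root] = root

    def clusters(self) -> list[set[str]]:
        # Group every known identifier under its root: one set per person.
        groups: dict[str, set[str]] = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


def email_id(email: Optional[str]) -> Optional[str]:
    # Exact matching only works after normalization (case, whitespace).
    return f"email:{email.strip().lower()}" if email else None


graph = IdentityGraph()
graph.link("cookie:abc", email_id("Alex@Gmail.com "))  # web login
graph.link(email_id("alex@gmail.com"), "loyalty:L-42")  # in-store purchase
graph.link("cookie:xyz", email_id(None))                # anonymous visitor
print(graph.clusters())
```

The web cookie, the email and the loyalty card collapse into one cluster, while the anonymous cookie stays on its own.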

3. Activation: Syncing to the Edge

Marketing tools (Klaviyo) are dumb databases. They need us to tell them who to email. Instead of building a custom Python script snowflake_to_klaviyo.py (which breaks every week), we use Hightouch. Hightouch simply queries your Gold Table and maps the fields.

Query:

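-- Lapsed VIPs: no purchase in the last 90 days
-- (assumes gold.dim_users also carries first_name, favorite_color, last_purchase_date)
SELECT master_email AS email, first_name, favorite_color
FROM gold.dim_users
WHERE segment = 'VIP' AND last_purchase_date < CURRENT_DATE - INTERVAL '90 DAYS'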

Mapping:

  • email -> Klaviyo email
  • favorite_color -> Klaviyo custom_properties.color
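Conceptually, a field mapping turns each flat warehouse row into the nested payload a destination expects. A minimal Python sketch of the mapping above; `map_row` and the payload shape are hypothetical illustrations, not Hightouch's or Klaviyo's actual API (Klaviyo's real profile schema is in its API reference).

```python
# Hypothetical mapping: warehouse column -> dotted path in the destination payload.
FIELD_MAPPING = {
    "email": "email",
    "favorite_color": "custom_properties.color",
}


def map_row(row: dict, mapping: dict = FIELD_MAPPING) -> dict:
    """Turn one warehouse row into a nested destination payload."""
    payload: dict = {}
    for source_field, target_path in mapping.items():
        value = row.get(source_field)
        if value is None:
            continue  # skip nulls rather than overwriting destination data
        *parents, leaf = target_path.split(".")
        node = payload
        for key in parents:
            node = node.setdefault(key, {})  # create nested objects as needed
        node[leaf] = value
    return payload


row = {"email": "alex@gmail.com", "first_name": "Alex", "favorite_color": "red"}
print(map_row(row))
# {'email': 'alex@gmail.com', 'custom_properties': {'color': 'red'}}
```

Unmapped columns (here `first_name`) are simply ignored, which mirrors how only mapped fields reach the destination.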

Hightouch runs this on a schedule (every 15 minutes here). It handles rate limits, retries, and API changes for you.
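Rate-limit handling is usually the first thing a homemade sync script gets wrong. A minimal sketch of retrying with exponential backoff and full jitter, assuming a hypothetical `send` callable that raises `RateLimited` when the destination answers HTTP 429; this is an illustration of the technique, not Hightouch's implementation.

```python
import random
import time
from typing import Callable


class RateLimited(Exception):
    """Raised by a (hypothetical) destination client on HTTP 429."""


def send_with_retry(
    send: Callable[[dict], None],
    payload: dict,
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(payload), retrying on RateLimited with exponential backoff
    and full jitter. Returns the number of attempts it took."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return attempt
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up and surface the error
            # Full jitter: sleep a random time in [0, base * 2^(attempt - 1)].
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")


# Usage: a destination that rate-limits the first two calls.
calls = {"count": 0}


def flaky_send(payload: dict) -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited()


attempts = send_with_retry(flaky_send, {"email": "alex@gmail.com"}, sleep=lambda s: None)
print(attempts)  # 3
```

Injecting `sleep` keeps the function testable; jitter spreads retries out so many workers do not hit the API in lockstep.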

4. Operational Analytics: Slack Alerts

CDPs are usually “Marketing only”. But the Composable CDP serves Engineering and Sales too. We can use Hightouch to send data to Slack.

Use Case: High-Value Failures. A user with LTV > $5,000 gets a Payment Failed error.

Standard Flow: The user sees the error and leaves. We lose a VIP.

Composable Flow:

  1. A dbt model, failures_last_hour, selects payment failures from the last hour for customers with LTV > $5,000.
  2. Hightouch syncs this to Slack channel #vip-support.
  3. Support Agent sees: “VIP Alex Failed Payment. Phone: 555-0199”.
  4. Agent calls Alex immediately. “Can I help you complete the order?”
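In the warehouse, step 1 is a SQL model; the same filter and the resulting Slack message can be sketched in Python to make the logic explicit. The `PaymentFailure` fields are assumptions, and the `{"text": ...}` body is the minimal payload Slack incoming webhooks accept; in the flow above, Hightouch does the actual posting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PaymentFailure:
    name: str
    phone: str
    ltv: float
    failed_at: datetime


def vip_failures_last_hour(failures: list[PaymentFailure], now: datetime,
                           ltv_threshold: float = 5000) -> list[PaymentFailure]:
    # The filter a failures_last_hour model would apply: high LTV, last 60 minutes.
    cutoff = now - timedelta(hours=1)
    return [f for f in failures if f.ltv > ltv_threshold and f.failed_at >= cutoff]


def slack_message(failure: PaymentFailure) -> dict:
    # Minimal Slack incoming-webhook body: a JSON object with a "text" field.
    return {"text": f"VIP {failure.name} Failed Payment. Phone: {failure.phone}"}


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
failures = [
    PaymentFailure("Alex", "555-0199", 7200, now - timedelta(minutes=12)),
    PaymentFailure("Sam", "555-0142", 300, now - timedelta(minutes=5)),  # low LTV
    PaymentFailure("Lee", "555-0107", 9100, now - timedelta(hours=3)),   # too old
]
for f in vip_failures_last_hour(failures, now):
    print(slack_message(f))
# {'text': 'VIP Alex Failed Payment. Phone: 555-0199'}
```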

This is Data Activation. It turns a massive database into actionable revenue.

5. Privacy and Governance (GDPR)

In a Monolithic CDP, deleting a user is a nightmare: you ask Segment to delete the data, then hope the request propagates downstream. In a Composable stack, you delete the user's rows in Snowflake (raw tables and models). On the next run, Hightouch's change detection (diffing) sees that the rows are gone and, for syncs configured to delete removed records, sends the deletion to Facebook, Google, and Klaviyo automatically. One deletion in the warehouse propagates across your entire stack.
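Deletion propagation hinges on the diff between two runs of the model. A minimal Python sketch of how a snapshot comparison classifies rows, assuming each row is keyed by email; `plan_sync` is a hypothetical name and this is not Hightouch's internal code. Keys in `removed` are what a sync configured to delete records would remove from the destination.

```python
def plan_sync(previous: dict[str, dict], current: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two snapshots of a model keyed by primary key (here, email)
    and classify each key as added, changed or removed."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(k for k in current.keys() & previous.keys() if current[k] != previous[k])
    return {"added": added, "changed": changed, "removed": removed}


previous = {
    "alex@gmail.com": {"segment": "VIP"},
    "sam@example.com": {"segment": "Standard"},
}
# Sam requested erasure, so the row was deleted in the warehouse; Lee is new.
current = {
    "alex@gmail.com": {"segment": "VIP"},
    "lee@example.com": {"segment": "VIP"},
}
print(plan_sync(previous, current))
# {'added': ['lee@example.com'], 'changed': [], 'removed': ['sam@example.com']}
```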

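6. Server-Side Tracking (Safari ITP)

Apple's Intelligent Tracking Prevention (ITP) in Safari caps cookies written by client-side JavaScript at 7 days. If a user visits on Monday and returns the following Wednesday, Segment's client-side tracking thinks they are a new user, and your attribution is broken. Server-side tracking fixes this. Because we control the collection domain (data.maisoncode.paris), our own server can set long-lived first-party HttpOnly cookies in its HTTP responses, which are not subject to the 7-day cap on script-written cookies as long as the endpoint is truly first party. Rudderstack handles this out of the box. For clients with heavy Apple traffic (Fashion/Luxury), this recovers around 20% of lost attribution.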
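What changes with server-side tracking is who writes the cookie. A minimal sketch of the Set-Cookie header a first-party collection endpoint could return, using Python's standard `http.cookies` module; the cookie name `anon_id`, the one-year Max-Age and the domain are illustrative assumptions, not RudderStack's actual configuration. Recent Safari versions also cap server-set cookies when the endpoint is CNAME-cloaked to a third party or hosted on a non-matching IP range, so the endpoint must be served by your own infrastructure.

```python
from http.cookies import SimpleCookie

ONE_YEAR_S = 365 * 24 * 60 * 60  # 31,536,000 seconds (illustrative lifetime)


def anonymous_id_cookie(anonymous_id: str, domain: str = "maisoncode.paris") -> str:
    """Build a Set-Cookie header for a server-issued anonymous ID."""
    cookie = SimpleCookie()
    cookie["anon_id"] = anonymous_id
    morsel = cookie["anon_id"]
    morsel["domain"] = domain      # shared by the site and its data. subdomain
    morsel["path"] = "/"
    morsel["max-age"] = ONE_YEAR_S
    morsel["secure"] = True        # HTTPS only
    morsel["httponly"] = True      # not readable (or rewritable) by JavaScript
    morsel["samesite"] = "Lax"
    return cookie.output(header="Set-Cookie:")


print(anonymous_id_cookie("abc123"))
```

Because the cookie arrives in an HTTP response rather than through `document.cookie`, the 7-day cap on script-written cookies does not apply to it.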

7. Identity Resolution Algorithms

How do you know user_123 is alex@gmail.com? There are two strategies:

  1. Deterministic: Exact match (Email = Email). Accuracy ~100%, match rate ~40%.
  2. Probabilistic: “Same IP + Same Device Model + Same Location”. Accuracy ~80%, match rate ~90%.

For CDPs, we prefer Deterministic: we do not want to email the wrong person. For ad targeting, however, we also use Probabilistic matching. It’s acceptable if ~20% of people see a mistargeted ad when it more than doubles your reach. In Snowflake, you can maintain both graphs side by side as separate dbt models.
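The two strategies differ only in which rule decides that two records are the same person. A minimal Python sketch, assuming hypothetical `Visit` records and made-up integer weights (a real probabilistic model would be fit on labeled data): email audiences use the deterministic rule only, while ad audiences also accept a high probabilistic score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Visit:
    email: Optional[str]
    ip: str
    device_model: str
    city: str


def deterministic_match(a: Visit, b: Visit) -> bool:
    # Exact match on a normalized email: high accuracy, low match rate.
    if not (a.email and b.email):
        return False
    return a.email.strip().lower() == b.email.strip().lower()


# Hypothetical integer weights (points out of 100) for each agreeing signal.
WEIGHTS = {"ip": 50, "device_model": 30, "city": 20}


def probabilistic_score(a: Visit, b: Visit) -> int:
    # Sum the weights of the signals on which both visits agree.
    return sum(w for field, w in WEIGHTS.items() if getattr(a, field) == getattr(b, field))


def same_person(a: Visit, b: Visit, use_case: str, threshold: int = 80) -> bool:
    if use_case == "email":
        # Emailing the wrong person is unacceptable: deterministic only.
        return deterministic_match(a, b)
    # Ad targeting tolerates some error in exchange for reach.
    return deterministic_match(a, b) or probabilistic_score(a, b) >= threshold


logged_in = Visit("Alex@Gmail.com", "203.0.113.7", "iPhone15,2", "Paris")
anonymous = Visit(None, "203.0.113.7", "iPhone15,2", "Lyon")
print(same_person(logged_in, anonymous, "email"))  # False
print(same_person(logged_in, anonymous, "ads"))    # True (score 80: IP + device agree)
```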

8. The Cost Equation

Let’s compare a client with 500k MTUs.

Segment (Business Plan):

  • Protocols: Included
  • Personas: Add-on
  • Total: ~$60,000 / year.

Composable Stack:

  • Rudderstack (Open Source): $0 license (self-hosted on AWS; the instances are billed separately).
  • Snowflake: $500 / month (Storage + Compute).
  • Hightouch: $800 / month.
  • Total: ~$15,600 / year, plus AWS hosting.

Savings: roughly 75% (about $44,000 / year) before hosting. Plus, you own the data. If you cancel Hightouch, you still have your Snowflake tables. If you cancel Segment, you lose your graph.
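The arithmetic behind the estimate, as a quick Python check using only the figures listed above (AWS hosting for self-hosted Rudderstack is left out because it depends on traffic):

```python
# Annual cost comparison in USD, using the estimates above.
segment_annual = 60_000
composable_monthly = {"snowflake": 500, "hightouch": 800}  # excludes AWS hosting

composable_annual = 12 * sum(composable_monthly.values())
savings = segment_annual - composable_annual
savings_pct = 100 * savings / segment_annual

print(composable_annual)      # 15600
print(savings)                # 44400
print(f"{savings_pct:.0f}%")  # 74%
```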

9. The “Real-Time” Myth

Marketers love to scream: “We need Real-Time Personalization!” Engineers must ask: “Do you really?” A warehouse has latency (loading data + dbt build), usually 15-30 minutes.

Scenario A: The user abandons their cart.

  • Need: Send an email within 1 hour.
  • Tool: Warehouse (Batch). Sufficient.

Scenario B: The user clicks “Red Shoes”, and the homepage hero should switch to “Red Shoes” immediately.

  • Need: < 200ms latency.
  • Tool: Edge Middleware (Vercel/Cloudflare).

The Warehouse is for Strategic Data, the “cold” path (Email, Ads, Retention, Analysis). The Edge is for Tactical Data, the “hot” path (UI Personalization). Don’t try to force Snowflake to do sub-second queries. That is not its job.
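The split between the two scenarios above reduces to one question: does the use case tolerate the warehouse's end-to-end delay? A minimal sketch of that rule of thumb, assuming a warehouse latency of about 30 minutes (load plus dbt build); `choose_path` and the thresholds are hypothetical.

```python
# Assumed pessimistic end-to-end warehouse latency (load + dbt build), in seconds.
WAREHOUSE_LATENCY_S = 30 * 60


def choose_path(max_staleness_s: float) -> str:
    """Return "warehouse" (cold path) if the use case tolerates the warehouse
    delay, otherwise "edge" (hot path)."""
    return "warehouse" if max_staleness_s >= WAREHOUSE_LATENCY_S else "edge"


use_cases = {
    "abandoned_cart_email": 60 * 60,  # Scenario A: email within 1 hour
    "homepage_hero_swap": 0.2,        # Scenario B: < 200 ms
}
for name, budget_s in use_cases.items():
    print(name, choose_path(budget_s))
# abandoned_cart_email warehouse
# homepage_hero_swap edge
```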

10. The Cost Trap of “Free” Analytics

Google Analytics 4 (GA4) is free. But its Explorations can be sampled once a query exceeds the free quota, and the BigQuery export costs money to query (on-demand BigQuery pricing is $6.25 per TiB scanned). Compared to Adobe Analytics ($100k+), it is still a steal.

The Trap: Storing everything. Engineers tend to log mouse_move, scroll_depth_10%, scroll_depth_20%. This creates “Data Swamps”: billions of rows of noise.

Rule: Only track an event if you have a business question attached to it. “If we track scroll depth, what decision will we change?” If the answer is “None”, delete the tracking code. Save the bytes.
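The rule can be enforced mechanically if the tracking plan records the question behind each event. A minimal sketch, assuming a hypothetical `TrackedEvent` record with a `business_question` field; it is not any vendor's tracking-plan API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedEvent:
    name: str
    business_question: Optional[str]  # the decision this event informs


def has_question(event: TrackedEvent) -> bool:
    # Empty or whitespace-only questions count as "no question".
    return bool(event.business_question and event.business_question.strip())


def prune_tracking_plan(events: list[TrackedEvent]) -> tuple[list[str], list[str]]:
    """Split a tracking plan into events to keep and events to delete:
    no business question, no tracking."""
    keep = [e.name for e in events if has_question(e)]
    drop = [e.name for e in events if not has_question(e)]
    return keep, drop


plan = [
    TrackedEvent("checkout_completed", "Which channels drive revenue?"),
    TrackedEvent("mouse_move", None),
    TrackedEvent("scroll_depth_10%", ""),
]
keep, drop = prune_tracking_plan(plan)
print(keep)  # ['checkout_completed']
print(drop)  # ['mouse_move', 'scroll_depth_10%']
```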

11. Conclusion

Data is gravity. The more data you put into a proprietary SaaS (Segment/Salesforce), the harder it is to leave. SQL databases have outlasted every marketing-tech trend of the last 40 years. Bet on SQL. Bet on the Warehouse. Build pipes, not silos.


Reducing Data Spend?

Are you paying for “MTUs” that don’t convert?

Build a Composable Stack. Read about Attribution SQL and Server-Side Tagging.

