Attribution Modeling with SQL: Beyond "Last Click"
GA4 Default Channel Grouping is lying to you. How to write raw SQL in BigQuery to calculate Time-Decay and Position-Based attribution.
Your Dashboard says: “Email Revenue: $50k”. Your Facebook Manager says: “FB Revenue: $80k”. Your Shopify Total is: “$100k”. The math doesn’t add up. Why? Double Attribution. Both Facebook and Email claim credit for the same sale. Most tools use Last Click attribution. The last touch gets 100% credit. This kills your Top-of-Funnel marketing (TikTok, Influencers) because they rarely drive the last click. They drive the first click.
Why Maison Code Discusses This
Marketing is no longer “Arts and Crafts”. It is “Data Science”. If you spend $1M a year on Ads without a Data Warehouse, you are flying blind. We implement Custom Attribution Models. We don’t trust Google’s black box. We trust raw SQL. We help CFOs understand where the money is actually going.
1. The Raw Data Pipeline
To fix this, we need raw clickstream data.
We export GA4 data to BigQuery.
We have a table events with user_pseudo_id, event_name, timestamp, source, medium, campaign.
Standard GA4 UI aggregates this. BigQuery gives us the raw log.
Now we can reconstruct the User Path.
2. The Models
1. Linear Attribution (Socialist)
Every touchpoint gets equal credit.
User path: TikTok -> Google Ads -> Email -> Buy ($100).
- TikTok: $33.33
- Google: $33.33
- Email: $33.33
-- BigQuery SQL for Linear Attribution
WITH paths AS (
SELECT
user_id,
transaction_id,
ARRAY_AGG(source ORDER BY timestamp) as touchpoints,
ANY_VALUE(revenue) as total_revenue
FROM `analytics.events`
GROUP BY 1,2
)
SELECT
source,
SUM(total_revenue / ARRAY_LENGTH(touchpoints)) as attributed_revenue
FROM paths, UNNEST(touchpoints) as source
GROUP BY 1
2. Time-Decay (Realist)
Touches closer to conversion are worth more.
We apply a half-life formula (e.g., 7 days).
User path: TikTok (10 days ago) -> Email (Today) -> Buy.
Email gets 80%. TikTok gets 20%.
This respects the “Closing” power of the channel.
3. Position-Based (U-Shaped)
The First Touch (Discovery) and Last Touch (Closer) are the heroes. The middle is the “Maintainer”.
- First: 40%
- Last: 40%
- Middle: 20% (Shared). This is the standard for Growth teams. It rewards the “Hunter” (TikTok) and the “Closer” (Email).
3. The Window Problem (Lookback)
How far back do you look?
- Facebook default: 7-day click, 1-day view.
- Luxury purchase cycle: 45 days.
If you sell expensive furniture, a 7-day window is useless.
With SQL, we can set the
LOOKBACK_WINDOWto 90 days. We can analyze “Time to Convert”. We see that TikTok ads take 20 days to mature, while Google Ads convert in 2 hours.
4. Identity Resolution (Cross-Device)
The hardest part.
User clicks ad on iPhone (Mobile).
User buys on Laptop (Desktop).
GA4 sees 2 users.
We use User-ID stitching.
When they log in or click an Email link (with user_id hash), we map device_id_A and device_id_B to master_user_X.
This unifies the path.
Without this, you over-count users and under-count mobile impact.
5. The Privacy Sandbox (The Death of Cookies)
Safari (ITP) deletes cookies after 7 days. Chrome is deprecating 3rd party cookies. Client-side tracking is dying. Server-Side Tracking (CAPI) is the answer. We send events from the Server (Shopify/Node.js) directly to Facebook/Google. We bypass the browser. This improves data accuracy by 20%. It is robust against Ad Blockers.
5. The Cookie Apocalypse (Detailed)
Cookies are dying. ITP (Intelligent Tracking Prevention) on Safari caps cookie life at 7 days (or 24 hours). Firefox blocks them by default. Chrome is phasing them out. This means “Return Users” look like “New Users”. Your “New User Acquisition” metrics are inflated. Your “Retention” is deflated. The Fix:
- First Party Cookies: Set
HttpOnlycookies from your own domain (api.maisoncode.paris). ITP treats these better. - Persistent IDs: Ask users to log in early. Use Email as the ID, not the Cookie.
6. Data Clean Rooms (Snowflake / Ads Data Hub)
In a privacy-first world, you can’t share User Level Data with Google. Enter Data Clean Rooms. You upload your data (hashed emails) to a neutral secure environment (Snowflake). Google uploads their data (hashed emails who saw ads). The Clean Room matches them and gives you Aggregate Results. “500 people who saw the ad bought the product.” You never see who they are. Google never sees your CRM. This is the future of Ad Measurement.
7. Incrementality Testing (Ghost Ads)
Attribution models are theoretical. Incrementality is scientific. The Question: “Would they have bought anyway?” The Test:
- Group A: Sees the Ad.
- Group B (Control): Sees a “Ghost Ad” (or a generic PSA) or nothing.
Compare conversion rates.
Lift = (Conv Rate A - Conv Rate B). If Lift is 0, your ads are useless, even if GA4 says they drove $1M. Stop paying for conversions you would have got for free.
8. The Markov Chain Model (Algorithmic)
Linear and U-Shaped are heuristic (rules we made up). Markov Chains are probabilistic. The algorithm analyzes 10,000 paths. It calculates the “Removal Effect”. “If we remove TikTok from the mix, how many conversions do we lose?” This is the “True” value of the channel. It requires Python/R, but it is the gold standard.
9. The Conversion Window Analysis
SQL allows us to answer questions GA4 can’t. “Does a user who visits 5 times buy more than a user who visits 2 times?” (Surprisingly, often No. They might be bots or indecisive). “What is the average time between First Click and Purchase?” If it is 3 minutes, your SEO is working. If it is 30 days, your Email Nurture is working. We segment this by product category. “Socks” buy fast. “Sofas” buy slow.
10. Offline Conversions (POS Integration)
If you have physical stores, your online ads drive offline sales. If you don’t track this, you under-invest in ads. Workflow:
- Capture
customer_emailorloyalty_idat the Point of Sale (POS). - Upload this list to Facebook/Google (Offline Events API).
- The Ad Platform matches the user.
- “Acquired via Instagram -> Bought in Paris Store”. This closes the loop. It often doubles the reported ROAS of your campaigns.
11. The ROAS Trap (Revenue vs Profit)
Marketing Managers optimize for ROAS (Revenue / Ad Spend).
But Revenue is not Profit.
If you sell a low-margin product with high ROAS, you lose money.
We calculate POAS (Profit on Ad Spend).
POAS = (Revenue - COGS - Shipping - Tax) / Ad Spend.
We pull Margin data from Shopify into BigQuery.
We tell the Ad Manager: “Kill the campaign with 4.0 ROAS because it has 0.8 POAS.”
“Scale the campaign with 2.0 ROAS because it has 1.5 POAS.”
Optimize for the bank account, not the vanity metric.
12. The Future: Media Mix Modeling (MMM)
As tracking dies, we go back to the 1960s. Econometrics. We correlate “Ad Spend Spikes” with “Revenue Spikes”. “Every time we spend $10k on TV, organic search goes up 5%.” We use open-source libraries like Meta Robyn (R) or Google LightweightMMM (Python) to run Bayesian regression models. This model doesn’t care about cookies. It cares about correlation. It is the ultimate source of truth in a privacy-first world.
13. Conclusion
Attribution is not Truth. It is a negotiation. There is no “Correct” model. But “Last Click” is definitely the “Wrong” model for growth. By moving to SQL, you own the logic. You can Stop guessing. Start querying.
Burning money on Ads?
We audit your attribution models to find true ROAS using BigQuery and dbt.