A/B Testing at the Edge: No Flicker, No Layout Shift
Why 'useEffect' A/B testing hurts your SEO and UX. How to implement server-side experiments using Middleware and Cookies. Flicker-free growth engineering.
Traditional A/B testing tools (Optimize, VWO) work by injecting JavaScript.
- Browser loads Page A.
- JS runs, checks the cookie.
- JS rewrites the DOM to "Page B".

This causes FOOC (Flash of Original Content). The user sees the old headline for half a second, then it snaps to the new one. It destroys your Core Web Vitals (CLS) and lowers trust. The solution? Edge Middleware.
Why Maison Code Discusses This
We refuse to compromise performance for data. Marketing wants experiments. Engineering wants speed. Edge Testing gives both. We execute the logic on Cloudflare/Vercel servers (The Edge), 5ms away from the user. The HTML arrives pre-rendered. The user never sees the switch. This is the only way to test on a luxury site.
1. The Architecture
We move the logic to the CDN (Vercel Edge / Cloudflare Workers). The decision happens before the HTML is generated.
- Request: User hits /.
- Edge: Checks the experiment-id cookie.
- Edge: If missing, rolls the dice (50/50) and sets the cookie.
- Edge: Rewrites the response (Server Side).
- Browser: Receives the HTML for its assigned variant only. Zero CLS. Zero Flicker.
2. Implementation in Next.js / Middleware
We use the middleware.ts file to intercept the request.
// middleware.ts
import { NextRequest, NextResponse } from 'next/server';

const COOKIE_NAME = 'ab-hero-test';

export function middleware(request: NextRequest) {
  let bucket = request.cookies.get(COOKIE_NAME)?.value;

  // If no bucket, assign one (50/50 split)
  if (!bucket) {
    bucket = Math.random() < 0.5 ? 'control' : 'variant';
  }

  // Rewrite the URL internally (invisible to the user)
  const url = request.nextUrl.clone();
  url.pathname = bucket === 'variant' ? '/variants/b' : '/variants/a';

  const response = NextResponse.rewrite(url);

  // Set the sticky cookie so the user keeps seeing the same variant
  response.cookies.set(COOKIE_NAME, bucket, { path: '/', maxAge: 60 * 60 * 24 * 30 });

  return response;
}

// Only intercept the homepage; don't rewrite the variant paths themselves
export const config = { matcher: '/' };
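For splits other than 50/50, or for more than two variants, the assignment can be factored into a small helper. A minimal sketch; the getBucket name and weights are illustrative, not a specific library API:

// lib/ab-testing.ts (illustrative helper, not a specific library API)
type Variant = { name: string; weight: number };

// Picks a variant in proportion to its weight, e.g. 45/45/10
export function getBucket(variants: Variant[]): string {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  let roll = Math.random() * total;
  for (const v of variants) {
    roll -= v.weight;
    if (roll <= 0) return v.name;
  }
  return variants[variants.length - 1].name;
}

// In middleware:
// bucket = getBucket([{ name: 'control', weight: 50 }, { name: 'variant', weight: 50 }]);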
3. Statistical Significance (The Math)
Don’t just run a test for 2 days. You need Statistical Power. If you have 100 visitors and 5 conversions (5%) vs 7 conversions (7%), that is noise. Use a Bayesian calculator. We typically require:
- Minimum Sample: 1,000 visitors per variant.
- Duration: 2 full business cycles (2 weeks).

The Peeking Problem: don't stop the test the moment it looks green. That is "p-hacking". Commit to the duration before you start. A quick sanity check on the numbers above is sketched below.
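Those example numbers can be checked with a two-proportion z-test. This is a rough frequentist sanity check, not a substitute for the Bayesian calculator mentioned above:

// Two-proportion z-test (rough sanity check, not a full power analysis)
function zScore(convA: number, nA: number, convB: number, nB: number): number {
  const pA = convA / nA;
  const pB = convB / nB;
  const pPooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / nA + 1 / nB));
  return (pB - pA) / se; // |z| >= 1.96 is roughly 95% confidence (two-sided)
}

// The example from above: 5/100 vs 7/100
console.log(zScore(5, 100, 7, 100).toFixed(2)); // ~0.60, i.e. pure noise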
4. The SEO Impact (Google Safety)
Google hates duplicate content.
If you expose both / and /variant-b as crawlable URLs, use canonical tags.
Point both versions to the canonical URL /.
With Edge Rewrites, this mostly solves itself: the URL stays / for the user; only the HTML differs.
GoogleBot does not persist cookies, so it is re-bucketed on each crawl and may see either variant.
Warning: don't "Cloak" (show Google one thing and real users another).
Google checks the rendered page.
Edge testing stays safe as long as GoogleBot is bucketed like any other visitor and both variants live at the same URL.
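If the variant paths are ever exposed directly, the canonical can be declared per page. A minimal sketch, assuming the Next.js App Router metadata API and a placeholder domain:

// app/variants/b/page.tsx (sketch; example.com is a placeholder)
import type { Metadata } from 'next';

export const metadata: Metadata = {
  // Both variants point back to the public URL
  alternates: { canonical: 'https://example.com/' },
};

export default function VariantB() {
  return <h1>The new hero headline</h1>;
}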
5. Feature Flags vs A/B Tests
- Feature Flag: “Turn on the new checkout for 10% of users to test for bugs.” (Safety).
- A/B Test: "Show a Red Button vs a Blue Button to test conversion." (Growth).

We use tools like LaunchDarkly or Statsig to manage both. They share the same underlying logic (Conditional Rendering), but the Intent is different. Feature Flags are for Engineering. A/B Tests are for Product. The sketch below shows the mechanical difference.
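A minimal sketch of that difference. The interfaces are illustrative, not the LaunchDarkly or Statsig API:

// Illustrative interfaces only, not a specific vendor SDK
interface Flags {
  // A flag answers: "is this feature on for this user?"
  isEnabled(flagKey: string, userId: string): boolean;
}

interface Experiments {
  // An experiment answers: "which variant?" and must log the exposure
  getVariant(experimentKey: string, userId: string): 'control' | 'variant';
  logExposure(experimentKey: string, userId: string, variant: string): void;
}

function render(flags: Flags, experiments: Experiments, userId: string) {
  // Feature flag: a safety switch, no statistics required
  const newCheckout = flags.isEnabled('new-checkout', userId);

  // A/B test: the assignment is logged so conversions can be attributed later
  const heroVariant = experiments.getVariant('hero_test', userId);
  experiments.logExposure('hero_test', userId, heroVariant);

  return { newCheckout, heroVariant };
}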
6. The Flicker Analysis (UX Cost)
If you use Client-Side A/B testing… And your flicker is 500ms… You lose 10% of users before they even see the variant. Your data is corrupted. You are testing “Control vs (Variant + Delay)”. You are not testing “Control vs Variant”. Edge testing removes the Delay variable. It is the only scientific way to test.
7. The Holdout Group
If you run 10 experiments at once… how do you know the total impact? Create a Global Holdout Group. 5% of users never see any experiment. Compare the “All Experiments” group vs the “Holdout” group after 6 months. This proves the long-term value of your CRO program.
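A sketch of how the holdout could be carved out in the same middleware. The 5% rate and cookie name are illustrative, and runExperiments stands in for the bucketing from section 2:

// Sketch: global holdout at the edge (illustrative names and rates)
import { NextRequest, NextResponse } from 'next/server';

const HOLDOUT_COOKIE = 'global-holdout';

export function middleware(request: NextRequest) {
  let holdout = request.cookies.get(HOLDOUT_COOKIE)?.value;

  // 5% of users are permanently excluded from every experiment
  if (!holdout) {
    holdout = Math.random() < 0.05 ? 'yes' : 'no';
  }

  const response =
    holdout === 'yes'
      ? NextResponse.next()       // holdout: always the default site
      : runExperiments(request);  // everyone else enters the experiment router

  response.cookies.set(HOLDOUT_COOKIE, holdout, { path: '/', maxAge: 60 * 60 * 24 * 180 });
  return response;
}

// Stand-in for the rewrite logic shown in section 2
function runExperiments(request: NextRequest): NextResponse {
  return NextResponse.next();
}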
8. The Metrics that Matter (Beyond Click Rate)
Don’t just measure “Clicks”. This is a vanity metric. Measure Revenue per Visitor (RPV). Variant A might have fewer clicks, but higher AOV (Average Order Value). If you optimize for Clicks, you might just be creating “Clickbait”. We track:
- Conversion Rate: Did they buy?
- AOV: How much did they spend?
- RPV: Revenue per visitor (conversion rate × AOV); see the quick calculation after this list.
- Retention: Did they come back?
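A toy calculation showing how the metrics relate. The figures are illustrative, not client data:

// RPV ties the metrics together: RPV = conversion rate * AOV = revenue / visitors
const visitors = 10_000;
const orders = 200;      // conversion rate = 2%
const revenue = 36_000;  // AOV = $180

const conversionRate = orders / visitors;  // 0.02
const aov = revenue / orders;              // 180
const rpv = conversionRate * aov;          // 3.6, same as revenue / visitors

console.log({ conversionRate, aov, rpv });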
9. The Segmented Test (Personalization)
A/B testing on “All Users” is blunt. Test on Segments.
- Test A: Show “Free Shipping” to returning VIPs.
- Test B: Show "10% Off" to new visitors.

Different cohorts behave differently. Use Edge Middleware to detect the segment (via Cookie or Geo) and serve the appropriate test, as in the sketch below. "One size fits all" is dead.
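A minimal sketch of cookie-based segment detection at the edge. The cookie names are illustrative; on Vercel or Cloudflare a geo header could feed the same function:

// Sketch: segment detection in middleware (cookie names are illustrative)
import { NextRequest } from 'next/server';

type Segment = 'vip' | 'new-visitor' | 'default';

export function detectSegment(request: NextRequest): Segment {
  // Returning VIPs: identified by a cookie set after a previous purchase
  if (request.cookies.get('vip-customer')?.value === 'true') return 'vip';

  // New visitors: no session cookie yet
  if (!request.cookies.get('session-id')) return 'new-visitor';

  return 'default';
}

// In middleware, each segment gets its own experiment:
// const segment = detectSegment(request);
// if (segment === 'vip') url.pathname = '/variants/free-shipping';
// if (segment === 'new-visitor') url.pathname = '/variants/ten-percent-off';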
10. A/B/n Testing (Multiple Variants)
Why test only A vs B? Test A vs B vs C vs D. The Bandit Algorithm: Instead of a fixed 50/50 split… The algorithm dynamically routes traffic to the winning variant while the test is running. If Version C is winning, send 80% of traffic to C. This maximizes revenue during the test. This is Machine Learning at the Edge.
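A bandit can be as simple as epsilon-greedy: mostly exploit the current winner, occasionally explore. A minimal sketch with illustrative numbers, not a production allocator; the counts would live in a KV store at the edge:

// Epsilon-greedy bandit (illustrative)
type Arm = { name: string; plays: number; conversions: number };

function pickVariant(arms: Arm[], epsilon = 0.2): Arm {
  // Explore: with probability epsilon, pick a random variant
  if (Math.random() < epsilon) {
    return arms[Math.floor(Math.random() * arms.length)];
  }
  // Exploit: otherwise pick the variant with the best observed conversion rate
  return arms.reduce((best, arm) => {
    const rate = arm.plays ? arm.conversions / arm.plays : 0;
    const bestRate = best.plays ? best.conversions / best.plays : 0;
    return rate > bestRate ? arm : best;
  });
}

// Usage: call on each request, then record plays/conversions per arm
const arms: Arm[] = [
  { name: 'A', plays: 500, conversions: 10 },
  { name: 'B', plays: 500, conversions: 12 },
  { name: 'C', plays: 500, conversions: 25 },
];
console.log(pickVariant(arms).name); // most of the time: 'C'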
11. The Cookie Consent Conflict (GDPR)
“Can we test if they reject cookies?” No. If a user rejects tracking, you cannot assign them a persistent ID. Strategy:
- Strict Mode: If no consent, show Control. Do not track.
- Session Mode: Use a session-only cookie (cleared on close). This is legally grey but safer.
- Anonymous Mode: Bucketing based on a random per-request value. No persistent history.

We default to Privacy. If they say no, they see the default site. Respect the user first, optimize the revenue second. A consent-aware version of the middleware is sketched below.
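A sketch of the "Strict Mode" check layered onto the middleware from section 2. The consent cookie name is illustrative and depends on your consent tool:

// Sketch: consent-aware bucketing (cookie names are illustrative)
import { NextRequest, NextResponse } from 'next/server';

export function middleware(request: NextRequest) {
  const consent = request.cookies.get('cookie-consent')?.value;

  // No consent (or explicit rejection): serve Control, set nothing, track nothing
  if (consent !== 'accepted') {
    return NextResponse.next();
  }

  // Consent granted: run the sticky bucketing from section 2
  const bucket =
    request.cookies.get('ab-hero-test')?.value ??
    (Math.random() < 0.5 ? 'control' : 'variant');

  const url = request.nextUrl.clone();
  url.pathname = bucket === 'variant' ? '/variants/b' : '/variants/a';

  const response = NextResponse.rewrite(url);
  response.cookies.set('ab-hero-test', bucket, { path: '/' });
  return response;
}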
12. The Analytics Integration (GA4 / Mixpanel)
The Edge decides the variant. But GA4 needs to know.
We inject the decision into the window object.
// Exposed by the variant page (or a tiny inline script in the head)
window.ab_test = {
  experiment_id: 'hero_test',
  variant: 'B',
};
Then, GTM (Google Tag Manager) picks it up and sends it as a “User Property”. This allows you to slice your GA4 reports by Variant. “Show me the Retention Rate of Variant B.” Without this link, your data is blind.
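One way to make that pickup explicit is to push the decision into the GTM dataLayer. A minimal sketch, with illustrative event and property names:

// Sketch: forwarding the edge decision to GTM (names are illustrative)
declare global {
  interface Window {
    ab_test?: { experiment_id: string; variant: string };
    dataLayer?: Record<string, unknown>[];
  }
}

// Runs once on the client, e.g. in a small script in the document head
const dataLayer = (window.dataLayer = window.dataLayer || []);
if (window.ab_test) {
  dataLayer.push({
    event: 'experiment_exposure',
    experiment_id: window.ab_test.experiment_id,
    variant: window.ab_test.variant,
  });
}

export {};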
13. The Pricing Test Strategy (Dangerous Revenue)
Testing pricing is dangerous. If users find out, they get angry.
- The "Discount" Test: A: standard price ($100) vs B: "Limited Time Offer: $90". This is safe.
- The "Premium" Test: A: standard price ($100) vs B: premium packaging included ($120).
Test value propositions, not just price points. If you test raw price ($100 vs $110) for the exact same item, you risk a PR nightmare.
14. The Mobile First Test (Thumb Zone)
Most A/B tests fail on Mobile because they are designed for Desktop.
The "Thumb Zone": on Mobile, the CTA must be reachable with one thumb.
- Test A: Standard sticky button.
- Test B: "Floating Action Button" (FAB) in the bottom right.
We often see +15% conversion just by moving the button 50 pixels down. Test the physical ergonomics, not just the colors.
15. Conclusion
A/B Testing is the scientific method applied to revenue. But if your “Science” breaks the user experience (Flicker), you invalidate the results. Test at the Edge. Keep the speed. Respect the math. Growth is a game of inches, not miles.
16. The Final Flicker Warning
If you take one thing from this article: do not accept Flicker. Flicker is not just "ugly". It is data corruption. It biases your test towards users on fast connections (who barely see the swap) and against mobile users. It invalidates your entire hypothesis. Move to the Edge. Or don't test at all.
Guessing what works?
We implement flicker-free Edge experimentation pipelines.