
Load Testing: Breaking Your Site on Purpose

Your site works with 1 user. Does it work with 10,000? How to use k6 to simulate massive traffic spikes and identify bottlenecks before Black Friday.

Alex B.

The “Success Disaster” Pattern

The most dangerous moment for a startup is not failure. It is Success. Imagine this scenario: You launch a new marketing campaign. Or you get featured on TechCrunch. Or an influencer with 5M followers posts about your product. Traffic spikes 100x overnight. You are ready to make millions. And then… the site loads a blank white page. 504 Gateway Timeout. 503 Service Unavailable. Your database CPU hits 100%. Your connection pool is exhausted. Users refresh the page, adding more load. The system enters a death spiral. You panic. You try to upgrade the database on AWS, but it takes 30 minutes to apply. By the time the site is back up, the traffic is gone. The users are angry. The reputation is damaged.

This is a Success Disaster. You succeeded in Marketing, but failed in Engineering. Load Testing forces this scenario to happen in a controlled environment, days or weeks before the real event. We simulate the crash. We find the bottleneck. We fix it. We repeat.

Why Maison Code Discusses This

At Maison Code, we specialize in “High Pressure” e-commerce. Our clients are not typical blogs. They are brands dropping limited-edition sneakers at 10:00 AM. They go from 0 visitors to 50,000 visitors in 3 seconds. We have learned through experience that Scalability is not magic; it is math. If you have not tested your system at 10x your expected load, you do not have a robust system; you have a prayer. We write about this because we want to save founders from the heartbreak of watching their “Big Moment” turn into a public outage.

The Tool: k6 (Grafana)

For a long time, the standard was JMeter. It was powerful but painful (XML configuration and a clunky GUI). Then Locust (Python) became popular. But today, the industry standard is k6. Why k6?

  1. JavaScript: Tests are written in JS/TS. Developers are comfortable with it.
  2. Performance: The engine is written in Go. One laptop can simulate 30,000 concurrent users.
  3. CI/CD: It runs easily in a GitHub Actions pipeline.
  4. Metrics: It integrates natively with Grafana/InfluxDB for beautiful real-time visualization.

Implementation: Writing a k6 Script

Let’s look at a robust load testing script.

import http from 'k6/http';
import { sleep, check } from 'k6';

// Configuration: The Load Profile
export const options = {
  stages: [
    { duration: '30s', target: 20 },  // Ramp up to 20 users (Warm up)
    { duration: '1m', target: 50 },   // Ramp up to 50 users (Load)
    { duration: '2m', target: 50 },   // Stay at 50 users (Sustain)
    { duration: '30s', target: 100 }, // Spike to 100 users (Stress)
    { duration: '30s', target: 0 },   // Ramp down (Cool down)
  ],
  thresholds: {
    // Failure Conditions
    http_req_failed: ['rate<0.01'],    // Error rate must be < 1%
    http_req_duration: ['p(95)<500'], // 95% of requests must be < 500ms
  },
};

export default function () {
  const BASE_URL = 'https://staging.maisoncode.paris';
  
  // 1. Visit Homepage
  const resHome = http.get(BASE_URL);
  check(resHome, { 
    'Home status 200': (r) => r.status === 200,
    'Home fast': (r) => r.timings.duration < 200, 
  });
  
  sleep(1); // User thinks...

  // 2. Search for Product (CPU Intensive)
  const resSearch = http.get(`${BASE_URL}/api/search?q=shirt`);
  check(resSearch, { 
    'Search status 200': (r) => r.status === 200 
  });

  sleep(2);
}
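
Save this as loadtest.js (the filename is arbitrary) and run it with k6 run loadtest.js. k6 prints a summary of every metric when the test ends and exits with a non-zero status code if any threshold is violated, which is what makes these scripts easy to wire into CI later. You can also stream the raw metrics to Grafana/InfluxDB with the --out flag for real-time dashboards.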

The 4 Types of Performance Tests

Understand the difference, or you will test the wrong thing.

  1. Smoke Test:

    • Goal: Ensure the system handles minimal load.
    • Load: 1-5 Virtual Users (VUs).
    • When: After every deployment. “Did we break the server entirely?”
  2. Load Test:

    • Goal: Verify the system handles expected traffic.
    • Load: The average traffic of your production site (e.g., 50 VUs).
    • When: Before minor releases.
  3. Stress Test:

    • Goal: Find the Breaking Point.
    • Load: Keep ramping up until the system crashes (see the profile sketch after this list).
    • Result: “We can handle 2,500 users. At 2,501, the DB crashes.”
    • When: Before major architectural changes.
  4. Soak Test:

    • Goal: Find Memory Leaks.
    • Load: Moderate load (80% capacity) for 24 hours.
    • Result: “Ah, the memory usage creeps up by 1% every hour. We have a leak.”
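
To make the Stress Test concrete, here is a minimal profile sketch. The step sizes and targets are placeholders, not recommendations; keep adding stages until your thresholds start to fail.

// stress-test.js - keep adding stages until the system breaks (illustrative numbers)
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 500 },   // well below expected capacity
    { duration: '2m', target: 1000 },
    { duration: '2m', target: 2000 },
    { duration: '2m', target: 3000 },  // add more steps until errors appear
    { duration: '2m', target: 0 },     // recovery: does the system come back on its own?
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'],    // the step where this starts failing is your breaking point
  },
};

export default function () {
  http.get('https://staging.maisoncode.paris');
  sleep(1);
}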

Identifying Bottlenecks

You found the breaking point. Now, ask Why? The “Why” is rarely the code itself. It is usually the infrastructure.

  1. The Database (The Usual Suspect):

    • Connection Exhaustion: “FATAL: remaining connection slots are reserved for non-replication superuser connections”.
      • Fix: Connection Pooling (PgBouncer).
    • CPU Saturation: Complex queries doing full table scans.
      • Fix: Indexing. Read Replicas. Use Redis to cache common queries.
  2. External APIs (The Bridges):

    • Your site is fast, but you call the Shipping API to calculate rates.
    • The Shipping API times out under load.
    • Your requests queue up waiting for the Shipping API.
    • Your server runs out of RAM.
    • Fix: Timeouts. Circuit Breakers (see the sketch after this list).
  3. The Event Loop (Node.js):

    • Node is single-threaded. If you do heavy CPU math (encryption, image resizing) on the main thread, you block all users.
    • Fix: Offload to Worker Threads or Serverless Functions.
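
To illustrate the “Timeouts” fix from point 2, here is a minimal Node.js sketch (Node 18+, using the built-in fetch and AbortSignal.timeout). The shipping URL, the 2-second budget, and the fallback rate are assumptions for illustration only.

// Call the external Shipping API with a hard deadline instead of queuing forever.
// Node 18+: fetch and AbortSignal.timeout are built in.
async function getShippingRate(cartId) {
  try {
    const res = await fetch(`https://shipping.example.com/rates/${cartId}`, {
      signal: AbortSignal.timeout(2000), // give up after 2 seconds
    });
    if (!res.ok) throw new Error(`Shipping API returned ${res.status}`);
    return await res.json();
  } catch (err) {
    // Degrade gracefully: a flat-rate estimate is better than a 504 page.
    return { rate: 9.99, estimated: true };
  }
}

A circuit breaker builds on the same idea: after N consecutive failures it stops calling the API entirely for a cooldown period, instead of letting every request burn its full 2-second budget.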

The Semantics of Percentiles (p95, p99)

Never judge performance by the Average. Averages lie.

  • User A: 100ms
  • User B: 100ms
  • User C: 10,000ms (10 seconds)
  • Average: ~3.4s. (Looks okay-ish).
  • p99: 10s. (Terrifying).

p95 (95th Percentile) means: “Ignore the slowest 5% outliers. What is the speed for everyone else?” p99 (99th Percentile) means: “The experience of the 1% slowest requests.”

Why p99 matters: In E-commerce, the “slowest requests” are often the users with the Largest Carts. User with 1 item -> Fast query. User with 50 items -> Slow query. The p99 user is your High Value Customer. If you ignore p99, you are optimizing for window shoppers and ignoring the whales.
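
In k6, this thinking translates directly into thresholds. A minimal sketch; the millisecond budgets are placeholders to tune against your own targets:

import http from 'k6/http';

export const options = {
  vus: 50,
  duration: '2m',
  // Show these aggregations in the end-of-test summary
  summaryTrendStats: ['avg', 'med', 'p(95)', 'p(99)', 'max'],
  thresholds: {
    http_req_duration: [
      'p(95)<500',   // the everyday experience
      'p(99)<1500',  // the big-cart, high-value-customer experience
    ],
  },
};

export default function () {
  http.get('https://staging.maisoncode.paris');
}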

Testing in Production?

DANGER. Running a Stress Test on Production is risky.

  1. You screw up Analytics: Google Analytics will show 10,000 “bot” visits, ruining your marketing data.
  2. You trigger Costs: If you use Serverless/Vercel, you will pay for the 10M requests you just spammed.
  3. Fraud Alerts: Stripe might ban you for “Card Testing” attack patterns.

The Golden Rule: Use a Data-Parity Staging Environment. Staging must be identical to Prod (same AWS instance size, same DB size). If Staging is a tiny t2.micro and Prod is a massive m5.large, your test results are meaningless.
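
One practical convention of ours (not a k6 requirement): read the target host from an environment variable and tag the requests with a custom header, so the script can never silently point at Production and so logging or WAF rules can identify and filter the test traffic. A sketch:

import http from 'k6/http';
import { sleep } from 'k6';

// Default to Staging; override with: k6 run -e BASE_URL=https://staging.maisoncode.paris loadtest.js
const BASE_URL = __ENV.BASE_URL || 'https://staging.maisoncode.paris';

export const options = { vus: 50, duration: '5m' };

export default function () {
  http.get(BASE_URL, {
    headers: { 'X-Load-Test': 'k6' }, // our own header convention, filtered by logging/WAF rules
  });
  sleep(1);
}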

Chaos Engineering (Breaking Things on Purpose)

Unplug the database cable. What happens? Does the site crash? Or does it show a cached version? We use Chaos Mesh or simple Pumba scripts to kill containers randomly during the load test. If 1 replica dies, the Load Balancer should instantly reroute. If the Redis Cache dies, the DB should take the load (or catch fire). Knowing this before 9 AM on launch day is priceless.

Browser-Based Load Testing (k6 browser)

Protocol tests (HTTP) don’t render JavaScript. They don’t catch “Hydration Errors” or “React Rendering Lag”. k6 browser spins up real Headless Chrome instances. It measures CLS (Cumulative Layout Shift) and FID (First Input Delay) under load. “The server is fast (200ms API), but the Client is slow (3s JS execution) because we sent 5MB of JSON.” Only Browser Testing reveals this.
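
A minimal sketch of a browser scenario. The module path and async calls reflect recent k6 releases (earlier versions shipped this as k6/experimental/browser, so check the docs for your version), and the CLS budget is an assumption:

import { browser } from 'k6/browser';
import { check } from 'k6';

export const options = {
  scenarios: {
    ui: {
      executor: 'shared-iterations',
      vus: 5,           // each VU drives a real Chromium instance, so keep this small
      iterations: 10,
      options: {
        browser: { type: 'chromium' },
      },
    },
  },
  thresholds: {
    // Web Vitals are collected automatically by the browser module
    browser_web_vital_cls: ['p(95)<0.1'],
  },
};

export default async function () {
  const page = await browser.newPage();
  try {
    await page.goto('https://staging.maisoncode.paris');
    const rendered = await page.locator('h1').isVisible();
    check(rendered, { 'page rendered a heading': (v) => v === true });
  } finally {
    await page.close();
  }
}

Keep the browser VU count tiny and run it alongside (not instead of) the protocol test: the protocol test generates the load, the browser test measures what a real user sees while it happens.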

Automating Load Tests in CI/CD (GitHub Actions)

Don’t run k6 from your laptop. Run it on every Pull Request. We add a load-test.yml workflow.

  1. Deploy branch to Staging URL.
  2. Run k6 run smoke-test.js (Check for 500s).
  3. If the test fails, block the merge. This prevents “Performance Regressions”: “The Junior Dev added an N+1 query loop. The test caught it because latency went from 200ms to 2000ms.” A sketch of smoke-test.js follows this list.
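
Here is what smoke-test.js might contain, assuming the deploy step exposes the Staging URL as an environment variable:

// smoke-test.js - 3 VUs for 30 seconds: "did we break the server entirely?"
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 3,
  duration: '30s',
  thresholds: {
    http_req_failed: ['rate<0.01'],     // any 5xx storm fails the run
    http_req_duration: ['p(95)<1000'],  // latency regressions fail it too
  },
};

export default function () {
  const res = http.get(__ENV.BASE_URL || 'https://staging.maisoncode.paris');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}

When a threshold fails, k6 exits with a non-zero status code, so the GitHub Actions step fails and the merge is blocked without any extra glue.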

Spike Testing vs Soak Testing

Know the difference.

Spike Testing:

  • Simulates a “Sneaker Drop” or “Super Bowl Ad”.
  • 0 -> 10,000 users in 10 seconds.
  • Goal: Test Auto-Scaling triggers (Do AWS servers spin up fast enough?).

Soak Testing:

  • Simulates “Black Friday Weekend”.
  • High load for 48 hours.
  • Goal: Test Resource Leaks (Memory, Disk Space, File Descriptors).

You need both. A spike profile sketch follows.
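
A minimal sketch of that spike, using the numbers above; what you are watching for is whether Auto-Scaling reacts before the error rate explodes:

// spike-test.js - the "Sneaker Drop": 0 -> 10,000 users in 10 seconds
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '10s', target: 10000 }, // the drop goes live
    { duration: '3m', target: 10000 },  // everyone hammers refresh
    { duration: '1m', target: 0 },      // traffic evaporates as fast as it came
  ],
  thresholds: {
    http_req_failed: ['rate<0.05'], // tolerate a brief error spike while Auto-Scaling kicks in
  },
};

export default function () {
  http.get('https://staging.maisoncode.paris');
  sleep(1);
}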

Conclusion

Performance is not a “Nice to Have”. It is a feature. Scalability is not magic. It is engineering. Load testing is the discipline of validating your engineering. Don’t wait for the blackout. Break the lightbulb yourself, while you have a spare in your pocket.

Preparing for Black Friday?

Do not wait until November. Maison Code offers “Infrastructure Stress Testing” services. We simulate 100x traffic spikes, identify your bottlenecks, and implement Auto-Scaling policies that work.


