System Design for Scale: From 10K to 1 Million Users
A practical guide to scaling your application from 10,000 to 1 million users. Learn when to optimize, what patterns actually matter, and how to avoid over-engineering.
The Scaling Journey
At 10,000 users, you start noticing slowdowns. At 100,000, you're removing bottlenecks. At 1 million, you're rethinking your entire architecture.
But here's the thing: you don't need to build for a million users on day one. In fact, doing so is usually a mistake. This guide walks through what actually changes at each scale and when to make those changes.
The First Rule: Don't Over-Engineer
Many successful applications handle millions of users with well-designed monolithic architectures. You don't need microservices, Kubernetes, or a distributed database to serve your first 100,000 users.
Over-engineering early causes:
- Slower initial development
- Higher operational complexity
- More failure modes
- Wasted engineering time
Build for your current scale plus one order of magnitude. If you have 1,000 users, build for 10,000. When you hit 7,000-8,000, plan for 100,000.
Scaling Phases
Phase 1: Single Server (0 - 10K Users)
Architecture:
```
[Users] -> [Single Server: App + Database]
```

What works:
- Vertical scaling (bigger server)
- Single database
- Simple deployment
- All code in one place
When to move on:
- Database CPU consistently above 70%
- Memory pressure causing swapping
- Deployment requires downtime
Cost: $50-200/month
Phase 2: Separate Database (10K - 50K Users)
Architecture:
```
[Users] -> [App Server] -> [Database Server]
```

This single change can double your capacity. The database and application compete for memory and CPU on a single server. Separating them lets each use resources more efficiently.
What works:
- Managed database (RDS, Cloud SQL)
- Application-level caching
- Database connection pooling
When to move on:
- Read queries dominating database load
- Single app server hitting limits
- Need for zero-downtime deployments
Cost: $200-500/month
Phase 3: Add Caching and Read Replicas (50K - 200K Users)
Architecture:
```
[Users] -> [App Server] -> [Cache (Redis)]
                        -> [Primary DB]
                        -> [Read Replica]
```

Most applications are read-heavy (often a 100:1 read-to-write ratio). Adding a cache and read replicas dramatically reduces database load.
Caching strategy:
- Cache frequently-read data (user profiles, settings)
- Cache computed results (aggregations, recommendations)
- Use cache-aside pattern: check cache, if miss, query DB and populate cache
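A minimal cache-aside sketch in Node.js, assuming a node-redis v4 client and a hypothetical `queryUserFromDb` helper; the key format and TTL are illustrative:

```javascript
// Cache-aside sketch with node-redis v4; queryUserFromDb is a hypothetical DB helper.
const { createClient } = require('redis');

const redis = createClient(); // call redis.connect() once at startup

async function getUserProfile(userId) {
  const cacheKey = `user:${userId}`;

  // 1. Check the cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. On a miss, fall back to the database
  const user = await queryUserFromDb(userId); // hypothetical query helper

  // 3. Populate the cache with a TTL so stale entries eventually expire
  await redis.set(cacheKey, JSON.stringify(user), { EX: 300 });
  return user;
}
```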
Read replicas:
- Route read queries to replicas
- Keep writes on primary
- Works great for dashboards, search, content feeds
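One way to implement the routing is to keep two connection pools and pick one per query. A sketch with node-postgres, where the `PRIMARY_DB_URL` and `REPLICA_DB_URL` environment variables are placeholders:

```javascript
// Read/write splitting sketch with node-postgres; connection strings are placeholders.
const { Pool } = require('pg');

const primary = new Pool({ connectionString: process.env.PRIMARY_DB_URL });
const replica = new Pool({ connectionString: process.env.REPLICA_DB_URL });

// Writes always hit the primary
function write(sql, params) {
  return primary.query(sql, params);
}

// Reads that can tolerate a little replication lag go to the replica
function read(sql, params) {
  return replica.query(sql, params);
}

// Example: a dashboard query served from the replica
// const { rows } = await read('SELECT count(*) FROM orders WHERE created_at > $1', [since]);
```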
When to move on:
- Single app server can't handle load
- Need geographic distribution
- Cache hit rate plateauing
Cost: $500-1,500/month
Phase 4: Horizontal Scaling (200K - 1M Users)
Architecture:
```
[Users] -> [Load Balancer] -> [App Server 1]
                           -> [App Server 2] -> [Cache Cluster]
                           -> [App Server N] -> [DB Primary + Replicas]
```

Key requirements:
- Stateless application servers: No session data on the server. Use Redis for sessions or JWTs for auth (see the sketch after this list).
- Load balancer: AWS ALB, GCP Load Balancer, or nginx
- Auto-scaling: Add/remove servers based on CPU, memory, or request latency
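A minimal sketch of the stateless-auth option using JWTs with the jsonwebtoken package; the claim shape, expiry, and `JWT_SECRET` variable are assumptions for illustration:

```javascript
// Stateless auth sketch with JWTs (jsonwebtoken); secret and claim shape are illustrative.
const jwt = require('jsonwebtoken');

const SECRET = process.env.JWT_SECRET;

// Issued at login -- no server-side session to replicate across app servers
function issueToken(user) {
  return jwt.sign({ sub: user.id }, SECRET, { expiresIn: '1h' });
}

// Any app server behind the load balancer can verify the token on its own
function authenticate(req, res, next) {
  try {
    const token = (req.headers.authorization || '').replace('Bearer ', '');
    req.user = jwt.verify(token, SECRET);
    next();
  } catch (err) {
    res.status(401).json({ error: 'invalid token' });
  }
}
```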
What changes:
- Deployments become rolling updates
- Need proper health checks (a sketch follows this list)
- Logging and monitoring become critical
- Database connections need pooling (PgBouncer, ProxySQL)
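A minimal health-check endpoint sketch (Express + node-postgres); the `/healthz` path is just a common convention, and the dependency check is illustrative:

```javascript
// Health-check sketch for a load balancer; the /healthz path is a convention, not a requirement.
const express = require('express');
const { Pool } = require('pg');

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

app.get('/healthz', async (req, res) => {
  try {
    // Confirm the critical dependency (the database) is reachable
    await pool.query('SELECT 1');
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    // A failing check lets the load balancer pull this instance out of rotation
    res.status(503).json({ status: 'unavailable' });
  }
});

app.listen(3000);
```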
When to move on:
- Database becoming the bottleneck again
- Different features need different scaling profiles
- Team size makes monolith coordination difficult
Cost: $1,500-5,000/month
Phase 5: Database Scaling (1M+ Users)
At this point, the database is usually the bottleneck. Options:
1. Vertical scaling (bigger database)
- Easiest but has limits
- Can get you to 5-10M users with good query optimization
2. Read replicas + caching
- If read-heavy, add more replicas
- Aggressive caching can reduce DB load 90%+
3. Sharding
- Split data across multiple databases
- Shard by user_id, tenant_id, or geographic region (a routing sketch follows this list)
- Adds significant complexity
4. Specialized databases
- Move search to Elasticsearch
- Move analytics to a data warehouse
- Move caching to Redis Cluster
- Move time-series data to TimescaleDB or InfluxDB
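For the sharding option, here is a minimal user_id-based routing sketch, assuming numeric IDs, a fixed shard count, and placeholder connection strings; real systems often use consistent hashing or a lookup service instead:

```javascript
// Hash-based shard routing sketch: numeric user_id -> one of N database pools.
// Shard count and connection-string env vars are illustrative placeholders.
const { Pool } = require('pg');

const SHARD_COUNT = 4;
const shards = Array.from({ length: SHARD_COUNT }, (_, i) =>
  new Pool({ connectionString: process.env[`SHARD_${i}_URL`] })
);

function shardFor(userId) {
  // Simple modulo routing; production systems often prefer consistent hashing
  // or a lookup table so resharding doesn't move every key.
  return shards[userId % SHARD_COUNT];
}

async function getUserPosts(userId) {
  const db = shardFor(userId);
  const { rows } = await db.query('SELECT * FROM posts WHERE user_id = $1', [userId]);
  return rows;
}
```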
Patterns That Matter
1. Connection Pooling
Database connections are expensive. Without pooling, you might:
- Open a new connection for every request
- Hit database connection limits
- Waste resources on connection overhead
Use PgBouncer for PostgreSQL, ProxySQL for MySQL, or built-in pooling in your ORM.
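A minimal application-side pooling sketch with node-postgres; the sizing numbers are illustrative and should be tuned against your database's connection limit:

```javascript
// Application-side pooling sketch with node-postgres; sizing numbers are illustrative.
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,                       // cap concurrent connections per app instance
  idleTimeoutMillis: 30000,      // recycle idle connections
  connectionTimeoutMillis: 2000  // fail fast instead of queueing forever
});

// Every request reuses the pool instead of opening a fresh connection
async function getUser(id) {
  const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return rows[0];
}
```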
2. N+1 Query Prevention
The N+1 problem kills performance:
```javascript
// Bad: N+1 queries -- one query for the users, then one more per user
const users = await getUsers();
for (const user of users) {
  user.posts = await getPosts(user.id); // Query per user!
}

// Good: eager loading fetches users and their posts together
const usersWithPosts = await getUsers({ include: 'posts' }); // Single query
```

3. Async Processing
Move slow operations out of the request path:
```
[Request] -> [API Server] -> [Queue] -> [Worker]
                          -> [Quick Response]
```

Use for: email sending, image processing, report generation, third-party API calls.
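A sketch of the queue-and-worker split using BullMQ (Redis-backed); the queue name, job payload, and `sendWelcomeEmail` helper are assumptions for illustration:

```javascript
// Queue-and-worker sketch with BullMQ; queue name and payload shape are illustrative.
const { Queue, Worker } = require('bullmq');

const connection = { host: 'localhost', port: 6379 };
const emailQueue = new Queue('emails', { connection });

// In the request path: enqueue the slow work and respond immediately
async function handleSignup(req, res) {
  await emailQueue.add('welcome', { userId: req.user.id });
  res.status(202).json({ status: 'queued' });
}

// In a separate worker process: perform the slow work off the request path
new Worker('emails', async (job) => {
  await sendWelcomeEmail(job.data.userId); // hypothetical mailer
}, { connection });
```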
4. CDN for Static Assets
A CDN serves static files from edge locations worldwide. This:
- Reduces server load
- Improves page load times globally
- Costs pennies per GB
Use CloudFlare, AWS CloudFront, or Fastly.
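For the CDN to actually cache anything, your origin needs to send cache-friendly headers. A sketch with Express, assuming asset filenames are content-hashed so long TTLs are safe:

```javascript
// Cache-header sketch so a CDN can cache static assets at the edge.
// Assumes filenames are content-hashed, which makes year-long TTLs safe.
const express = require('express');
const app = express();

app.use('/static', express.static('public', {
  maxAge: '365d',
  immutable: true
}));

app.listen(3000);
```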
5. Rate Limiting
Protect your API from abuse and runaway clients:
```javascript
// Basic rate limit: 100 requests per minute per user
const rateLimit = {
  windowMs: 60 * 1000,
  max: 100,
  keyGenerator: (req) => req.user.id
};
```

Monitoring: You Can't Scale What You Can't Measure
Essential metrics to track:
| Metric | Why It Matters | Target |
|---|---|---|
| Response time (p50, p95, p99) | User experience | p95 < 500ms |
| Error rate | Reliability | < 0.1% |
| Database query time | Backend health | p95 < 100ms |
| Cache hit rate | Cache effectiveness | > 90% |
| CPU/Memory utilization | Capacity planning | < 70% sustained |
Tools: DataDog, New Relic, Grafana + Prometheus, AWS CloudWatch
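A sketch of capturing request-latency histograms with prom-client, so p50/p95/p99 can be graphed in Prometheus and Grafana; the bucket boundaries and `/metrics` path are conventional choices, not requirements:

```javascript
// Request-latency histogram sketch with prom-client; bucket values are illustrative.
const express = require('express');
const client = require('prom-client');

const app = express();

const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency in seconds',
  labelNames: ['route', 'status'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5]
});

// Time every request so p50/p95/p99 can be derived in Prometheus/Grafana
app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => end({ route: req.path, status: res.statusCode }));
  next();
});

// Prometheus scrapes this endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(3000);
```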
Real-World Example: SharpDuel
When I built SharpDuel, we went from 0 to handling significant load:
Phase 1 (Launch):
- Single server, PostgreSQL
- Simple caching with Redis
- Handled first 5,000 users fine
Phase 2 (Growth):
- Separated database to RDS
- Added read replica for reporting queries
- Implemented connection pooling
- Handled 50,000+ users
Phase 3 (Scale):
- Moved to multi-server setup with load balancer
- Redis cluster for sessions and caching
- Background job processing for notifications
- CDN for all static assets
Result: Scaled to $200K MRR in 12 months without major rewrites.
Common Mistakes
1. Premature optimization
Building for 1M users when you have 1,000. Focus on product-market fit first.
2. Microservices too early
Microservices add operational overhead. Start with a well-structured monolith.
3. Ignoring database queries
One bad query can bring down your entire application. Monitor and optimize.
4. No caching strategy
Adding caching as an afterthought leads to inconsistency bugs. Design for it.
5. Scaling without monitoring
If you can't measure it, you can't improve it. Instrument everything.
When to Scale
The right time to scale is when you consistently see degradation during normal traffic, or when you can't handle traffic spikes without performance issues.
Don't wait for complete failure - plan your next phase when you're at 70-80% capacity. But also don't scale prematurely. Scaling adds complexity, and complexity has costs.
The Bottom Line
Scaling from 10K to 1M users doesn't require jumping straight into complex distributed systems. The path looks like:
- Start simple: Monolith, single database
- Separate concerns: App server + database server
- Add caching: Redis + read replicas
- Scale horizontally: Load balancer + auto-scaling
- Specialize: Sharding, specialized databases, microservices (if needed)
Each step should be driven by measured bottlenecks, not speculation. Scale in response to real problems, not hypothetical ones.
---
I've scaled systems from zero to millions of users. If you're hitting scaling challenges or planning for growth, I can help - without the $50K consulting engagement fees agencies charge. Let's talk about your architecture.