SQL vs NoSQL, Sharding, and Replication #
If you are studying systems design data storage, the most common search intent is practical: when should you choose SQL vs NoSQL, when does a database need sharding, and which replication pattern keeps the system available without breaking consistency? This guide explains the core trade-offs, shows example architecture choices, and gives you a checklist for designing scalable, reliable storage.
What you will learn #
- How to choose between relational SQL databases and NoSQL data stores.
- When sharding is useful, risky, or unnecessary.
- How replication patterns affect availability, latency, failover, and data consistency.
- Which questions to ask before committing to a storage architecture.
Quick summary #
Use SQL when relational integrity, transactions, and complex queries matter. Use NoSQL when the access pattern, scale, or data model fits document, key-value, wide-column, or graph storage better. Add replication for availability and read scale, but add sharding only when a single database can no longer meet capacity, throughput, or locality requirements.
On this page #
- Choose SQL or NoSQL
- Compare common storage models
- Plan sharding carefully
- Choose a replication pattern
- Practical examples
- Storage design checklist
- Common mistakes
- Related topics
- Next steps
Choose SQL or NoSQL #
SQL databases #
SQL databases such as PostgreSQL, MySQL, and SQL Server organize data into tables with defined schemas. They are a strong default when you need transactions, joins, constraints, reporting queries, and predictable data integrity.
Use SQL when:
- The domain has clear relationships, such as users, orders, invoices, inventory, or permissions.
- ACID transactions are required for correctness.
- Teams need mature query tooling, migrations, indexes, and reporting.
- The data model changes deliberately rather than constantly.
NoSQL databases #
NoSQL databases include document stores, key-value stores, wide-column stores, and graph databases. They are useful when the data model or access pattern does not fit relational tables cleanly, or when horizontal scale and high throughput are primary design constraints.
Use NoSQL when:
- The main access pattern is simple and high volume, such as lookup by key.
- Data is semi-structured, nested, or frequently evolving.
- The system needs distributed writes or very large scale.
- The application can tolerate eventual consistency or carefully designed consistency boundaries.
Compare common storage models #
| Storage model | Examples | Best fit | Watch out for |
|---|---|---|---|
| Relational SQL | PostgreSQL, MySQL | Transactions, joins, reporting, strong consistency | Over-sharding too early or designing around queries that need many joins at huge scale |
| Document | MongoDB, Couchbase | JSON-like documents, flexible product/catalog/profile data | Duplicated data and inconsistent updates across documents |
| Key-value | Redis, DynamoDB-style access | Caches, sessions, counters, fast lookup by key | Poor fit for ad hoc queries without planned indexes |
| Wide-column | Cassandra, HBase | High-write, large-scale time-series or event data | Operational complexity and query-model rigidity |
| Graph | Neo4j, JanusGraph | Relationship-heavy traversal such as fraud, recommendations, networks | Harder horizontal scaling and specialized query patterns |
Plan sharding carefully #
Sharding partitions data across multiple database nodes. It can increase throughput and storage capacity, but it also adds routing, rebalancing, operational, and query complexity.
Common sharding strategies #
- Range-based sharding: Groups data by ordered ranges, such as customer ID ranges or dates. It is easy to understand but can create hot shards when traffic concentrates in one range.
- Hash-based sharding: Uses a hash of the shard key to distribute data evenly. It reduces hotspots but makes range queries and resharding harder.
- Geo-based sharding: Stores data near users or regulatory regions. It improves latency and data residency but complicates cross-region queries.
- Tenant-based sharding: Groups one or more customers per shard. It works well for SaaS systems when tenant size and isolation requirements are known.
Before you shard, ask #
- Can indexes, query tuning, caching, read replicas, or vertical scaling solve the current bottleneck?
- What is the shard key, and does it match the highest-volume access patterns?
- How will the system route requests to the right shard?
- What happens when one shard grows faster than the others?
- How will backups, restores, analytics, and migrations work across shards?
Choose a replication pattern #
Replication copies data across nodes or regions. It supports availability, disaster recovery, lower read latency, and maintenance windows, but it introduces lag and failover decisions.
Leader-follower replication #
One leader accepts writes and followers replicate changes. This is common for SQL systems that need simple write ordering and scalable reads.
Best for:
- Read-heavy applications.
- Clear failover procedures.
- Teams that need strong write consistency through a single leader.
Multi-leader replication #
Multiple leaders accept writes and replicate changes to each other. This can reduce regional write latency but creates conflict-resolution complexity.
Best for:
- Multi-region systems where write availability is more important than simple consistency.
- Workloads with natural conflict avoidance, such as partitioned tenants or regions.
Leaderless replication #
Several replicas can accept writes, often with quorum reads and writes. This pattern can improve availability but requires careful tuning of consistency, latency, and conflict behavior.
Best for:
- High-availability distributed stores.
- Workloads that can tolerate eventual consistency or use application-level reconciliation.
Practical examples #
E-commerce checkout #
- Primary store: SQL database for orders, payments, inventory reservations, and transaction integrity.
- Read scale: Read replicas for product browsing and order history.
- Cache: Key-value cache for sessions and frequently viewed products.
- Risk: Do not place payment state in an eventually consistent store without clear reconciliation.
IoT telemetry platform #
- Primary store: Wide-column or time-series storage for high-volume device events.
- Sharding key: Device ID plus time bucket to avoid hot partitions.
- Replication: Multi-zone replication to survive node or zone failures.
- Risk: Range-only sharding by timestamp can overload the newest shard.
SaaS application with large tenants #
- Primary store: SQL database while tenants are small and query flexibility matters.
- Scale path: Tenant-based sharding once large tenants create capacity or isolation pressure.
- Replication: Follower replicas for reporting and backups.
- Risk: Moving tenants between shards must be designed before emergency capacity pressure.
Storage design checklist #
- Document the top read and write access patterns before choosing a database.
- Decide which operations require strong consistency and transactions.
- Identify expected data volume, write throughput, read throughput, and growth rate.
- Choose indexes and partition keys based on real queries.
- Use replication for availability and recovery, then test failover.
- Add sharding only after simpler scaling options are insufficient.
- Plan backup, restore, migration, and observability from the beginning.
- Define RPO and RTO targets for critical data stores.
Common mistakes #
- Choosing NoSQL only because it sounds more scalable, even when the workload needs relational integrity.
- Sharding before proving that indexes, caching, read replicas, or vertical scaling are insufficient.
- Selecting a shard key that creates hotspots or makes common queries cross every shard.
- Ignoring replication lag when reading from followers after a write.
- Treating backups as sufficient without testing restores and failover.
- Designing storage around average load instead of peak traffic, growth, and failure scenarios.
Related topics #
- Systems Design — Review broader architecture concepts and trade-offs.
- Data Flow — Understand how data moves through services and pipelines.
- Operational Resilience — Connect storage design to recovery and continuity planning.
- Recovery Point Objective (RPO) and Recovery Time Objective (RTO) — Define acceptable data loss and recovery time.
- Monitoring & Logging — Observe database health, lag, saturation, and errors.
Next steps #
- Write down the three most important queries and three most important writes for your application.
- Choose the simplest storage model that satisfies consistency, scale, and operational requirements.
- Use RPO and RTO to turn storage failure scenarios into measurable recovery targets.