Skip to main content

Database

My decision process for picking a database. The default is Postgres unless there's a good reason to deviate.


SQL vs NoSQL Decision

Default: Postgres. It handles relational data, JSONB for flexible fields, full-text search, and time-series (with TimescaleDB extension). Don't switch until Postgres is actually the bottleneck.


When to Shard

My rule: Add read replicas before sharding. Shard only when write throughput is the bottleneck and you have a clean partition key. Sharding adds enormous operational complexity.


Indexing Rules I Follow

  • Index foreign keys and any column used in WHERE, JOIN, or ORDER BY
  • Composite indexes: column order matters — put the most selective column first
  • Partial indexes for common filtered queries (e.g. WHERE status = 'pending')
  • Don't over-index: each index slows writes and uses storage
  • Run EXPLAIN ANALYZE before assuming an index helps

Transactions and ACID

PropertyMeaningWhy it matters
AtomicityAll or nothingNo partial writes
ConsistencyDB stays in valid stateConstraints enforced
IsolationConcurrent transactions don't interfereNo dirty reads
DurabilityCommitted = persistedSurvives crashes

Default isolation level: READ COMMITTED (Postgres default). Use SERIALIZABLE only for financial transactions where correctness is critical.


Reference