Databases
1. Databases¶
1.1. ποΈ Database Types¶
1.1.1. π’ Relational Databases (RDBMS)¶
- Store data in tables with rows and columns
- Use SQL (Structured Query Language)
- ACID compliant:
- β A tomicity - all or nothing, think about bank transaction and moving funds between two accounts
- β C onsistency - After transaction database should be in the consistent state
- β I solation - Transactions should be independent
- β D urability - Once Transaction is completed, the data is there to stay
- Examples: MySQL, PostgreSQL, Oracle, SQL Server
- Great for:
- Transactions
- Complex queries
- Integrity
1.1.2. π NoSQL Databases¶
- Alternative to relational databases for unstructured data
- No Schemas and no foreign key to link tables together
- Typically sacrifice ACID for performance and scalability
- β A tomicity - all or nothing, think about bank transaction and moving funds between two accounts
- β C onsistency - After transaction database should be in the consistent state
- β I solation - Transactions should be independent
- β D urability - Once Transaction is completed, the data is there to stay
- Examples: MongoDB, Cassandra and Redis
- Great for:
- Scalability
- Quick Iteration
- Simple Queries
NoSQL Categories
π Document: JSON-like documents (MongoDB, CouchDB)
π Key-Value: Simple key-value pairs (Redis, DynamoDB)
π Column-Family: Column-oriented storage (Cassandra, HBase)
πΈοΈ Graph: Nodes and relationships (Neo4j, Amazon Neptune)
1.1.3. π§ In-Memory Databases¶
- Store data primarily in memory for faster access
- Used for:
- Caching
- Session stores
- Real-time analytics
- Examples: Redis, Memcached, VoltDB
1.1.4. π Time-Series Databases¶
- Optimized for time-stamped or time-series data
- Used for:
- Metrics
- Monitoring
- IoT applications
- Examples: InfluxDB, TimescaleDB, Prometheus
1.2. ποΈ Database Architecture¶
1.2.1. π Schema Design¶
Tables
,columns
,relationships
(relational)Collections
,documents
,indexes
(document-based)- Data models determine application interaction patterns
1.2.2. π Keys and Indexes¶
Primary keys
: Unique identifiers for recordsForeign keys
: Create relationships between tablesIndexes
: Speed up queries but can slow down writes
Index Types
π€ B-Tree: General-purpose, balanced tree structure
π Hash: Fast lookups for exact matches
π Full-Text: Optimized for text search
π Spatial: Geographic/geometric data queries
1.2.3. π Scaling Approaches¶
1.2.3.1. πΌ Vertical Scaling (Scale Up)¶
- Add more resources (CPU, RAM) to a single server
- Simpler but has physical limitations
- No changes to application code required
1.2.3.2. π Horizontal Scaling (Scale Out)¶
- Add more database servers to distribute load
- More complex but virtually unlimited scaling
- Requires application consideration
1.2.3.2.1. Horizontal Scaling Patterns¶
-
𧩠Sharding: Partitioning data across multiple servers
- Split into smaller chunks to distribute, based on a shard key
- Sharding Strategies:
Range-based Sharding
:- Based on the range of a given key
- Efficient for range queries but can lead to hotspots
- Example: Jan-Mar transactions on shard 1, Apr-Jun on shard 2
Directory-based Sharding
:- Lookup service to direct traffic to the database
- For example, if have data with 4 seasons (Summer, Autumn, Winter and Spring), this will create a look up table that would point to one of 4 shards, based on the season
Geographical Sharding
:- Based on geographic location
- Reduces latency for location-specific queries
- Example: European users on EU servers, Asian users on APAC servers
Hash-Based Sharding
:- Uses a hash function on the shard key to determine placement
- Distributes data evenly but makes range queries difficult
- Example: user_id % number_of_shards = shard_number
Functional Sharding
:- Splits by feature or domain rather than data
- Different services handle different functions
- Example: Product catalog on one shard, user profiles on another
-
π Replication: Copies of data across multiple servers
1.3. π Database Performance Optimisation¶
- Caching Layer
- Implement distributed caching (Redis, Memcached) for frequently accessed data
- Reduces database load and improves response times
- Consider cache invalidation strategies to maintain consistency
- Intelligent Indexing
- Create targeted indexes based on query patterns within each shard
- Avoid over-indexing which impacts write performance
- Consider column-oriented storage for analytical workloads
-
Query Optimization
- Rewrite queries to leverage sharding keys
- Use query hints to direct queries to appropriate shards
- Implement connection pooling for efficient resource utilization
-
Monitoring & Analytics
- Track shard performance metrics to identify hotspots
- Implement automated rebalancing based on usage patterns
- Use slow query logs to identify optimization opportunities
1.4. ποΈ Database Design Patterns¶
1.4.1. π Normalization and Denormalization¶
- Normalization: Reduce redundancy and dependency
- Denormalization: Duplicate data for performance
- Trade-offs between consistency and speed
Design Considerations
π Performance Goals: Speed vs. flexibility requirements
π Data Volume: Current and projected growth
π Access Patterns: Read-heavy vs. write-heavy workloads
𧩠Consistency Needs: Strong vs. eventual consistency
CAP Trade-offs
π CP Databases: Consistent and partition tolerant (MongoDB, HBase)
π AP Databases: Available and partition tolerant (Cassandra, CouchDB)
π CA Databases: Consistent and available, but rare in distributed systems
βοΈ Practical Approach: Most systems balance these properties situationally
Use Case Mapping
πΌ E-commerce: Often PostgreSQL, MySQL, or MongoDB
π± Social Media: Cassandra, Neo4j, or Redis + RDBMS
π Analytics: Redshift, Snowflake, or ClickHouse
π Caching: Redis or Memcached