System design tips: how to do capacity estimation

Estimation is an important skill in system design interviews because it allows you to demonstrate your ability to reason about the complexity and feasibility of a given problem. It is worth noting that estimation is an inexact science and that it is normal for estimates to be off by some margin. It is more important to show that you have a clear understanding of the problem and can provide a thoughtful and well-reasoned estimate than to provide a perfectly accurate one.

Here are a few examples of the types of estimation questions you might encounter in a system design interview:

  1. How many users can a system support? For example, you might be asked to estimate the number of users that a social media platform can support, based on the number of servers, storage, and bandwidth available.

  2. How much data can a system store or process? For example, you might be asked to estimate the amount of data that a data warehouse can store, based on the size of the data, the number of servers, and the storage capacity of the servers.

  3. How scalable is a system? For example, you might be asked to estimate the performance and scalability of a web application, based on the number of users, the amount of data, and the server resources available.

Cheat sheet:

 Byte conversions:

  • 1 B = 8 bits
  • 1 KB = 1000 B
  • 1 MB = 1000 KB
  • 1 GB = 1000 MB
  • 1 TB = 1000 GB
  • 1 PB = 1000 TB

 Storage scale numbers:

  • 1 char = 1 byte (ASCII text)
  • Metadata (title, description, etc., excluding images) ~ 1-10 KB
  • Image ~ 1-2 MB
  • HD video (1 minute) ~ 50 MB
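As a rough illustration of how these sizes feed a storage estimate, the sketch below multiplies a daily item count by a per-item size. The workload (1 M images per day), the function name daily_storage_bytes, and the choice of 1 MB per image plus 10 KB of metadata are assumptions picked for the example, not measurements.

```python
# Decimal units, matching the byte conversions in the cheat sheet.
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB
TB = 1_000 * GB

IMAGE_BYTES = 1 * MB      # assumption: lower end of the 1-2 MB range
METADATA_BYTES = 10 * KB  # assumption: upper end of the 1-10 KB range


def daily_storage_bytes(items_per_day: int, payload_bytes: int,
                        metadata_bytes: int = METADATA_BYTES) -> int:
    """Bytes added per day: each item stores its payload plus its metadata."""
    return items_per_day * (payload_bytes + metadata_bytes)


# Hypothetical workload: 1 M images uploaded per day.
daily = daily_storage_bytes(1_000_000, IMAGE_BYTES)
print(f"per day:  {daily / TB:.2f} TB")
print(f"per year: {daily * 365 / TB:.0f} TB")
```

Under these assumptions the metadata adds only about 1% on top of the image payload, which is why back-of-the-envelope estimates often ignore it.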

Operations numbers:

  • HDD sequential read - 30 MB/s
  • SSD sequential read - 1 GB/s
  • Memory sequential read - 4 GB/s
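To get a feel for what these rates mean, the following sketch estimates how long a sequential scan of a dataset would take on each medium. The 1 TB dataset size and the helper name seconds_to_read are hypothetical; the throughput figures are the approximate ones listed above.

```python
# Approximate sequential read throughput from the cheat sheet, in MB/s.
READ_MB_PER_S = {"HDD": 30, "SSD": 1_000, "Memory": 4_000}


def seconds_to_read(size_mb: float, medium: str) -> float:
    """Time to read size_mb sequentially from the given medium."""
    return size_mb / READ_MB_PER_S[medium]


# Hypothetical dataset: 1 TB = 1000 GB = 1,000,000 MB.
DATASET_MB = 1_000_000

for medium in READ_MB_PER_S:
    seconds = seconds_to_read(DATASET_MB, medium)
    print(f"{medium}: {seconds:,.0f} s (~{seconds / 60:.0f} min)")
```

With these numbers, a full scan takes roughly nine hours on the HDD, about 17 minutes on the SSD, and a little over four minutes from memory.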

SQL databases (numbers are approximate; the purpose is to give a general idea of performance):

  • Connections: 20 K
  • Storage: 50 TB
  • Requests: 20 K/s

Cache (numbers are approximate; the purpose is to give a general idea of performance):

  • Connections: 10 K
  • Requests: 100 K/s
  • Storage: 300 GB

Web servers (numbers are approximate; the purpose is to give a general idea of performance):

  • Requests: 5 K/s

Queues (numbers are approximate; the purpose is to give a general idea of performance):

  • Requests: 3 K/s
  • Throughput (writes): 1-50 MB/s
  • Throughput (reads): 2-100 MB/s

Calculation example:

10 M photos are uploaded daily to a service.

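  • 1 M per day is about 12 per second (1,000,000 / 86,400 seconds ≈ 11.6, rounded up to 12).
  • 10 (number of millions) * 12 (the number per second for 1 M) = 120 uploads/second.
  • 120 (uploads per second) * 1 MB (size of a photo) = 120 MB per second.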

The web servers will need to handle a network bandwidth of 120 MB per second.
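The same arithmetic can be written as a small script. This is a minimal sketch: the function names per_second and upload_bandwidth_mb_per_s are made up for illustration, and the result is an average rate that says nothing about peak traffic. Note that it uses the exact 86,400 seconds per day, so it lands slightly below the rounded figure of 120.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_count: float) -> float:
    """Average events per second for a given daily count."""
    return daily_count / SECONDS_PER_DAY


def upload_bandwidth_mb_per_s(daily_uploads: float, size_mb: float) -> float:
    """Average inbound bandwidth in MB/s for uploads of a fixed size."""
    return per_second(daily_uploads) * size_mb


uploads_per_s = per_second(10_000_000)                   # 10 M photos per day
bandwidth = upload_bandwidth_mb_per_s(10_000_000, 1.0)   # 1 MB per photo

print(f"{uploads_per_s:.0f} uploads/s, {bandwidth:.0f} MB/s")
```

The exact result is about 116 uploads per second. Rounding 1 M per day up to 12 per second gives 120, which is easier to work with mentally and errs on the safe side.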

It's also important to consider the bottlenecks in your system. Just because your bandwidth can handle a certain number of requests per second doesn't mean that your entire system can. Some parts of the system that might struggle to keep up with a high workload include the database connections or throughput, hard disk reading/writing, the load balancer, and API calls to a third-party service that has rate limits. Every component of the system needs to be able to handle the expected demand.
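One way to spot a likely bottleneck is to compare the expected load against the per-instance request rates from the cheat sheet. The sketch below is only an illustration: the helper instances_needed, the 50 K requests/second load, and the simplification that every request touches every component once are all assumptions.

```python
import math

# Approximate per-instance request capacity from the cheat sheet, in requests/s.
CAPACITY_RPS = {
    "web server": 5_000,
    "sql database": 20_000,
    "cache": 100_000,
    "queue": 3_000,
}


def instances_needed(load_rps: float) -> dict[str, int]:
    """Minimum instances per component, assuming each request hits each one once."""
    return {
        name: math.ceil(load_rps / capacity)
        for name, capacity in CAPACITY_RPS.items()
    }


# Hypothetical load: 50 K requests per second.
plan = instances_needed(50_000)
for name, count in plan.items():
    print(f"{name}: {count}")

# The component that needs the most instances is the first place to look.
tightest = max(plan, key=plan.get)
print(f"tightest component: {tightest}")
```

These are minimum counts with no headroom. A real plan would typically leave spare capacity for traffic spikes and instance failures, and would also check connection limits and storage, not only request rate.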

 
