Sizing and Hardware Recommendations
This guide discusses our general recommendations regarding hardware, compute, memory, and disk configurations for open-source users. If you would like to simplify your setup, we recommend using ClickHouse Cloud as it automatically scales and adapts to your workloads while minimizing costs pertaining to infrastructure management.
The configuration of your ClickHouse cluster is highly dependent on your application’s use case and workload patterns. When planning your architecture, you must consider the following factors:
- Concurrency (requests per second)
- Throughput (rows processed per second)
- Data volume
- Data retention policy
- Hardware costs
- Maintenance costs
Disk
The type of disk you should use with ClickHouse depends on your data volume, latency, and throughput requirements.
Optimizing for performance
To maximize performance, we recommend directly attaching provisioned IOPS SSD volumes from AWS or the equivalent offering from your cloud provider, which optimizes for IO.
Optimizing for storage costs
For lower costs, you can use general purpose SSD EBS volumes.
You can also implement tiered storage using SSDs and HDDs in a hot/warm/cold architecture. Alternatively, you can use AWS S3 for storage to separate compute and storage. Please see our guide on using open-source ClickHouse with separation of compute and storage. Separation of compute and storage is available by default in ClickHouse Cloud.
CPU
Which CPU should I use?
The type of CPU you should use depends on your usage pattern. In general, however, applications that run many frequent concurrent queries, process more data, or use compute-intensive UDFs will require more CPU cores.
Low latency or customer-facing applications
For latency requirements in the tens of milliseconds, such as for customer-facing workloads, we recommend the EC2 i3 line or i4i line from AWS or the equivalent offerings from your cloud provider, which are IO-optimized.
High concurrency applications
For workloads that need to optimize for concurrency (100+ queries per second), we recommend the compute-optimized C series from AWS or the equivalent offering from your cloud provider.
Data warehousing use case
For data warehousing workloads and ad-hoc analytical queries, we recommend the R-type series from AWS or the equivalent offering from your cloud provider as they are memory optimized.
What should CPU utilization be?
There is no standard CPU utilization target for ClickHouse. Use a tool such as iostat to measure average CPU usage, and size your servers accordingly so that they can absorb unexpected traffic spikes. However, for analytical or data warehousing use cases with ad-hoc queries, you should target 10-20% CPU utilization.
How many CPU cores should I use?
The number of CPU cores you should use depends on your workload. However, we generally recommend the following memory to CPU core ratios (GB of memory per core) based on your CPU type:
- M-type (general purpose use cases): 4:1 memory to CPU core ratio
- R-type (data warehousing use cases): 8:1 memory to CPU core ratio
- C-type (compute-optimized use cases): 2:1 memory to CPU core ratio
As an example, when using M-type CPUs, we recommend provisioning 100GB of memory per 25 CPU cores. To determine the amount of memory appropriate for your application, you need to profile your memory usage. You can read this guide on debugging memory issues or use the built-in observability dashboard to monitor ClickHouse.
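The ratios above can be turned into a quick sizing helper. The following is a minimal sketch, assuming the ratios mean GB of memory per CPU core (which is how the 100GB per 25 cores example reads); the function and dictionary names are hypothetical and not part of any ClickHouse tooling.

```python
# GB of memory per CPU core, taken from the ratios listed above.
# Assumption: the ratios are expressed as GB of RAM per core.
MEMORY_GB_PER_CORE = {
    "M": 4,  # general purpose use cases
    "R": 8,  # data warehousing use cases
    "C": 2,  # compute-optimized use cases
}


def recommended_memory_gb(instance_family: str, cpu_cores: int) -> int:
    """Return the suggested memory in GB for a node with `cpu_cores` cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    try:
        ratio = MEMORY_GB_PER_CORE[instance_family.upper()]
    except KeyError:
        raise ValueError(f"unknown instance family: {instance_family!r}") from None
    return ratio * cpu_cores


# 25 cores on an M-type node gives 100 GB, matching the example above.
print(recommended_memory_gb("M", 25))
```

Treat the result as a starting point only; profiling your actual memory usage should drive the final number.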
Memory
Like your choice of CPU, your choice of memory to storage ratio and memory to CPU ratio depends on your use case. In general, however, the more memory you have, the faster your queries will run. If your use case is price-sensitive, lower amounts of memory will work, because you can enable settings (max_bytes_before_external_group_by and max_bytes_before_external_sort) that allow data to spill to disk. Note, however, that spilling may significantly affect query performance.
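As an illustration of the spill-to-disk settings, the sketch below derives thresholds from a per-query memory limit and renders them as a SETTINGS clause that can be appended to a query. The helper names are hypothetical, and the choice of spilling at half of max_memory_usage is an assumption made for this example rather than a rule stated in this guide; tune the fraction for your own workload.

```python
def spill_settings(max_memory_usage_bytes: int, spill_fraction: float = 0.5) -> dict[str, int]:
    """Build spill-to-disk settings from a per-query memory limit.

    Assumption: GROUP BY and ORDER BY start spilling to disk once they use
    `spill_fraction` of the per-query memory limit.
    """
    if max_memory_usage_bytes <= 0:
        raise ValueError("max_memory_usage_bytes must be positive")
    if not 0 < spill_fraction <= 1:
        raise ValueError("spill_fraction must be in (0, 1]")
    threshold = int(max_memory_usage_bytes * spill_fraction)
    return {
        "max_memory_usage": max_memory_usage_bytes,
        "max_bytes_before_external_group_by": threshold,
        "max_bytes_before_external_sort": threshold,
    }


def to_settings_clause(settings: dict[str, int]) -> str:
    """Render the settings as a SETTINGS clause for a SELECT query."""
    return "SETTINGS " + ", ".join(f"{name} = {value}" for name, value in settings.items())


# Example: a 10 GiB per-query limit.
print(to_settings_clause(spill_settings(10 * 1024**3)))
```

Lower thresholds reduce peak memory at the cost of more disk IO, so measure query latency before and after changing them.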
What should the memory to storage ratio be?
For low data volumes, a 1:1 memory to storage ratio is acceptable but total memory should not be below 8GB.
For use cases with long retention periods for your data or with high data volumes, we recommend a 1:100 to 1:130 memory to storage ratio. For example, provision 100GB of RAM per replica if you are storing 10TB of data.
For use cases with frequent access such as for customer-facing workloads, we recommend using more memory at a 1:30 to 1:50 memory to storage ratio.
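The memory to storage ratios above can be sketched as a small calculator. This is an illustrative helper with hypothetical names; it assumes 1TB = 1000GB and applies the 8GB minimum as a floor for every profile, which this guide only states for low data volumes.

```python
# Storage GB per 1 GB of RAM, as (densest, sparsest) bounds from the text above.
STORAGE_GB_PER_RAM_GB = {
    "low_volume": (1, 1),          # 1:1
    "long_retention": (100, 130),  # 1:100 to 1:130
    "frequent_access": (30, 50),   # 1:30 to 1:50
}
MIN_MEMORY_GB = 8  # assumption: used as a floor for all profiles


def recommended_ram_range_gb(storage_gb: float, profile: str) -> tuple[float, float]:
    """Return (low, high) RAM in GB for the given storage size and profile."""
    if storage_gb <= 0:
        raise ValueError("storage_gb must be positive")
    densest, sparsest = STORAGE_GB_PER_RAM_GB[profile]
    low = max(storage_gb / sparsest, MIN_MEMORY_GB)
    high = max(storage_gb / densest, MIN_MEMORY_GB)
    return low, high


# 10TB of data with long retention: the upper bound is 100 GB of RAM.
low, high = recommended_ram_range_gb(10_000, "long_retention")
print(f"{low:.0f} GB to {high:.0f} GB")
```

The upper end of the range corresponds to the 1:100 example in the text; the lower end reflects the 1:130 ratio.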
Replicas
We recommend having at least three replicas per shard (or two replicas with Amazon EBS). Additionally, we suggest vertically scaling all replicas prior to adding additional replicas (horizontal scaling).
ClickHouse does not automatically shard, and re-sharding your dataset will require significant compute resources. Therefore, we generally recommend using the largest server available to prevent having to re-shard your data in the future.
Consider using ClickHouse Cloud, which scales automatically and allows you to easily control the number of replicas for your use case.
Example configurations for large workloads
ClickHouse configurations are highly dependent on your specific application's requirements. Please contact sales if you would like us to help optimize your architecture for cost and performance.
To provide guidance (not recommendations), the following are example configurations of ClickHouse users in production:
Fortune 500 B2B SaaS
Storage | |
Monthly new data volume | 30TB |
Total Storage (compressed) | 540TB |
Data retention | 18 months |
Disk per node | 25TB |
CPU | |
Concurrency | 200+ concurrent queries |
# of replicas (including HA pair) | 44 |
vCPU per node | 62 |
Total vCPU | 2700 |
Memory | |
Total RAM | 11TB |
RAM per replica | 256GB |
RAM to vCPU ratio | 4:1 |
RAM to disk ratio | 1:50 |
Fortune 500 Telecom Operator for a logging use case
Storage | |
Monthly log data volume | 4860TB |
Total Storage (compressed) | 608TB |
Data retention | 30 days |
Disk per node | 13TB |
CPU | |
# of replicas (including HA pair) | 38 |
vCPU per node | 42 |
Total vCPU | 1600 |
Memory | |
Total RAM | 10TB |
RAM per replica | 256GB |
RAM to vCPU ratio | 6:1 |
RAM to disk ratio | 1:60 |
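The figures in the two example tables can be cross-checked with simple arithmetic. The sketch below recomputes the derived values from the per-node numbers; the class and field names are hypothetical, 1TB is taken as 1000GB, and the RAM to disk ratio is read here as total RAM to total compressed storage, which is an interpretation of the tables rather than something they state. The tables report rounded values, so expect small differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterExample:
    name: str
    replicas: int
    vcpu_per_node: int
    ram_per_replica_gb: int
    total_storage_tb: int  # compressed

    @property
    def total_vcpu(self) -> int:
        return self.replicas * self.vcpu_per_node

    @property
    def total_ram_tb(self) -> float:
        # Assumption: 1 TB = 1000 GB.
        return self.replicas * self.ram_per_replica_gb / 1000

    @property
    def ram_gb_per_vcpu(self) -> float:
        return self.ram_per_replica_gb / self.vcpu_per_node

    @property
    def storage_per_ram(self) -> float:
        # Assumption: ratio of total compressed storage to total RAM.
        return self.total_storage_tb / self.total_ram_tb


EXAMPLES = [
    ClusterExample("B2B SaaS", 44, 62, 256, 540),
    ClusterExample("Telecom logging", 38, 42, 256, 608),
]

for ex in EXAMPLES:
    print(
        f"{ex.name}: {ex.total_vcpu} vCPU, {ex.total_ram_tb:.1f} TB RAM, "
        f"{ex.ram_gb_per_vcpu:.1f} GB RAM per vCPU, "
        f"1:{ex.storage_per_ram:.0f} RAM to storage"
    )
```

The same check applies to retention: 30TB of new data per month over 18 months gives the 540TB total in the first table.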
Further reading
Below are published blog posts on architecture from companies using open-source ClickHouse: