Monitoring
On your CloudBlue Commerce installation, you must continuously monitor resource usage metrics and compare them with thresholds that you define based on your capacity planning strategy.
These are the standard strategies that you can use for capacity planning:
- The Lead Strategy: With this strategy, you increase the capacity of your installation in advance, before actual demand increases. This strategy is suitable when demand is high, but it carries the risk of resource underutilization when real demand does not match expected demand.
- The Lag Strategy: This strategy is the opposite of the Lead Strategy. With this strategy, you increase the capacity of your installation only after the current capacity is stretched to its limits. Although this strategy decreases the risk of resource underutilization, it may result in performance issues and customer dissatisfaction when real demand is higher than expected demand.
- The Match Strategy: This strategy is a combination of the Lead and Lag Strategies. Rather than significantly increasing the capacity of your installation in advance or increasing the capacity only when it is stretched to its limits, you make small, incremental changes to the capacity of your installation based on the latest market conditions.
- The Dynamic Strategy: With this strategy, you increase or decrease the capacity of your installation based on the analysis of the current demand and sales forecasts. This strategy allows you to keep a balance between effective resource utilization and performance, but it requires advanced analytics tools that can provide accurate forecasts.
For system components, the main resource usage metrics are CPU usage and RAM usage. As soon as these metrics reach their thresholds, you must start a new iteration of capacity planning. Here are examples of thresholds that you can use for different strategies:
Metric | Lead Strategy: Target | Lead Strategy: Warning | Match Strategy: Target | Match Strategy: Warning
---|---|---|---|---
CPU usage | 60% | 70% | 70% | 80%
RAM usage | 60% | 70% | 70% | 80%
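As a minimal sketch of how the example thresholds in the table above could be checked automatically, the following Python snippet reports which threshold level a measured value has reached. The names `THRESHOLDS` and `classify_usage` are hypothetical and are not part of CloudBlue Commerce; the snippet also assumes that the same limits apply to both CPU usage and RAM usage, as in the table.

```python
# Example thresholds from the table above, in percent.
# The dictionary layout and key names are illustrative assumptions.
THRESHOLDS = {
    "lead": {"target": 60, "warning": 70},
    "match": {"target": 70, "warning": 80},
}


def classify_usage(usage_percent: float, strategy: str) -> str:
    """Return the highest threshold level reached by a usage value.

    usage_percent: measured CPU or RAM usage, 0-100.
    strategy: "lead" or "match".
    Returns "warning", "target", or "below target".
    """
    limits = THRESHOLDS[strategy]
    if usage_percent >= limits["warning"]:
        return "warning"
    if usage_percent >= limits["target"]:
        return "target"
    return "below target"
```

For example, a CPU usage of 65% has reached the Target threshold under the Lead Strategy but is still below it under the Match Strategy.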
For the system database server, the main metrics are CPU usage, RAM usage, disk space usage, transaction throughput (transactions per second), I/O throughput (input/output operations per second), and cache hit ratio. As soon as these metrics reach their thresholds, you must start a new iteration of capacity planning.
Note: To monitor the cache hit ratio of your system database server, use the heap_blks_hit and heap_blks_read counters provided by PostgreSQL. If sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) is less than 0.99 (99%) for a database, you must add more memory to your server. To learn more about heap_blks_hit and heap_blks_read, refer to the PostgreSQL documentation.
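The cache hit ratio formula from the note above can be sketched in Python as follows. The function names are hypothetical, and the input is assumed to be a list of (heap_blks_hit, heap_blks_read) pairs, one per table, that you have already collected from PostgreSQL (for example, from the pg_statio_user_tables statistics view). How you collect the counters is outside the scope of this sketch.

```python
from typing import Iterable, Optional, Tuple


def cache_hit_ratio(rows: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Compute sum(hit) / (sum(hit) + sum(read)) over per-table counters.

    rows: (heap_blks_hit, heap_blks_read) pairs, one pair per table.
    Returns None when no blocks were accessed, so the ratio is undefined.
    """
    total_hit = 0
    total_read = 0
    for hit, read in rows:
        total_hit += hit
        total_read += read
    total = total_hit + total_read
    if total == 0:
        return None
    return total_hit / total


def needs_more_memory(rows: Iterable[Tuple[int, int]], minimum: float = 0.99) -> bool:
    """Return True when the ratio is defined and below the 99% threshold."""
    ratio = cache_hit_ratio(rows)
    return ratio is not None and ratio < minimum
```

Returning None for an idle database is a design choice of this sketch: it avoids a division by zero and prevents a database with no recorded block access from being reported as short of memory.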
In addition to monitoring the main metrics, you should monitor the following metrics:
- The number of active accounts
- The number of active users
- The number of active subscriptions
- The number of login operations per business day
- The number of page views (in Google Analytics)
- The number of orders
- The number of resource rates per service plan
- The number of service plans
- The number of delegated service plans
After monitoring your installation for a meaningful period of time, for example, one year or more, you will see correlations between various system metrics. Knowing these correlations will allow you to plan the capacity of your installation more accurately.
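One possible way to quantify such a correlation is the Pearson correlation coefficient, which ranges from -1 to 1. This guide does not prescribe a particular method; the sketch below is only an illustration. The function name `pearson` is hypothetical, and the two inputs are assumed to be equally long series of samples taken at the same points in time (for example, monthly values of active subscriptions and of average CPU usage).

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two metric series.

    Both series must have the same length (at least two samples),
    and neither series may be constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 suggests that the two metrics grow together, which can help you estimate how much capacity an expected increase in one metric is likely to require. A strong correlation does not by itself prove that one metric drives the other.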