Levitate's Alert Studio supports the following algorithms for anomalous pattern detection.
The high spike algorithm is designed to detect sudden increases in signal values, particularly when the increase occurs within a short time frame. Signals such as the number of 4xx responses, throughput, and edge hits are good fits for it. The algorithm compares the current data point with the data points from the last 60 minutes to check whether the current point has a considerably larger amplitude.
The high spike algorithm is designed to detect sudden increases in signal values over a short period of time, while the low spike algorithm is useful for identifying sudden drops in signal values.
Eligible Signals for High Spike
Signals similar to the following can be used for high spike pattern detection.
The low spike algorithm is particularly helpful in identifying sudden drops in signal values. Signals such as CPU utilization, cache hit rate, and availability are good fits for the low spike algorithm. The algorithm compares the current data point with the previous 60 minutes of data to determine whether the current point represents a significant drop.
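The high and low spike checks described above can be illustrated with a minimal sketch. The source only states that the current point is compared with the previous 60 minutes of data; the scoring rule used here (a robust z-score based on the median and the median absolute deviation), the threshold of 6, and the function names are assumptions for illustration, not Alert Studio's actual implementation.

```python
from statistics import median

def spike_score(window, current):
    """Robust z-score of `current` against the reference window (assumed rule)."""
    center = median(window)
    # Median absolute deviation: a spread estimate that outliers do not dominate.
    mad = median(abs(v - center) for v in window)
    # 1.4826 rescales MAD to be comparable to a standard deviation;
    # the tiny fallback avoids division by zero on a constant window.
    scale = 1.4826 * mad if mad > 0 else 1e-9
    return (current - center) / scale

def is_high_spike(window, current, threshold=6.0):
    """Flag a point far above the last hour of data."""
    return spike_score(window, current) > threshold

def is_low_spike(window, current, threshold=6.0):
    """Flag a point far below the last hour of data."""
    return spike_score(window, current) < -threshold

# 60 one-minute samples of a hypothetical 4xx count hovering around 100.
last_hour = [100 + (i % 5) for i in range(60)]
print(is_high_spike(last_hour, 180))  # True
print(is_high_spike(last_hour, 103))  # False
print(is_low_spike(last_hour, 20))    # True
```

Raising or lowering `threshold` plays the role of a sensitivity setting: a lower value flags smaller jumps.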
The level change algorithm is different from high/low spike algorithms in that it detects when a data pattern has changed rather than detecting a single or few large jumps or drops.
Eligible Signals for Low Spike
The level change algorithm detects the point at which data begins to exhibit a new pattern that is different from the old. The data will have different patterns before and after the level change.
To determine if an incoming point is a candidate level change, the algorithm checks if it is different (too high or too low) from the data over the last hour.
How is this different from a high/low spike?
If the data shows a single or a few large jumps or drops, this algorithm will not detect them. A single different value, or even a few of them, does not necessarily indicate that the pattern has changed or that there is a new pattern.
Eligible Signals for Level Change
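The difference from a spike can be sketched as follows. This is an illustrative approximation, not the product's algorithm: it assumes a level change is confirmed only when a run of consecutive recent points all deviate in the same direction from the preceding hour. The `confirm` length, the threshold, and the scoring rule are hypothetical choices.

```python
from statistics import median

def deviation_sign(baseline, value, threshold=6.0):
    """Return +1 or -1 if `value` is far above or below the baseline, else 0."""
    center = median(baseline)
    mad = median(abs(v - center) for v in baseline)
    scale = 1.4826 * mad if mad > 0 else 1e-9
    score = (value - center) / scale
    if score > threshold:
        return 1
    if score < -threshold:
        return -1
    return 0

def is_level_change(series, baseline_len=60, confirm=10):
    """True if the last `confirm` points all sit far from the baseline, on the same side."""
    if len(series) < baseline_len + confirm:
        return False
    baseline = series[-(baseline_len + confirm):-confirm]
    recent = series[-confirm:]
    signs = {deviation_sign(baseline, v) for v in recent}
    return signs == {1} or signs == {-1}

steady = [100 + (i % 5) for i in range(60)]
# One large jump that immediately returns to the old level: a spike, not a level change.
spike_then_recover = steady + [100, 101, 180, 102, 103, 104, 100, 101, 102, 103]
# The signal settles around a new value: a level change.
new_level = steady + [150 + (i % 5) for i in range(10)]

print(is_level_change(spike_then_recover))  # False
print(is_level_change(new_level))           # True
```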
A trend algorithm is a useful tool for detecting deviations in a signal from its expected pattern compared to its behaviour over a certain number of previous days.
- For each incoming data point, collect relevant data from the past (this is the reference period or seasonality). It is not necessary to collect all past data.
- To determine if an incoming data point is an anomaly, compare it with a reference period from the past.
The trend deviation algorithm detects when unexpected data points or patterns have occurred compared to the previous 7 days of data.
In the scenario below, the trend algorithm will detect an anomaly at 10 a.m. (circled in red) because the point is not expected when compared to the previous days (the reference period, marked with red rectangular boxes).
In Figure 2, if we observe the signal pattern carefully, the trend algorithm will not detect any anomalies at 10 a.m. (red-circled) because the point or peak is expected when compared to its previous days.
In Figure 3, if we observe the signal pattern carefully, the trend algorithm will detect anomalies at 10 a.m. (red circle). Although it is a repetitive peak, the amplitude of this peak is much higher than the peaks of the previous days.
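The three scenarios above can be reproduced with a small sketch. It assumes hourly samples, a reference period made of the value at the same hour on each of the previous 7 days, and a 30% tolerance around the reference range; these choices and the function names are hypothetical, and the check as written only makes sense for non-negative signals.

```python
def reference_points(previous_days, hour):
    """Collect the value at the same hour from each previous day (the reference period)."""
    return [day[hour] for day in previous_days]

def is_trend_deviation(previous_days, hour, current, tolerance=0.3):
    """Flag `current` if it falls outside the reference range widened by `tolerance`."""
    ref = reference_points(previous_days, hour)
    upper = max(ref) * (1 + tolerance)
    lower = min(ref) * (1 - tolerance)
    return current > upper or current < lower

def make_day(value_at_10):
    """A hypothetical day of 24 hourly samples, flat at 100 except at 10 a.m."""
    day = [100.0] * 24
    day[10] = value_at_10
    return day

flat_week = [make_day(100.0) for _ in range(7)]   # no peak at 10 a.m. on previous days
peaky_week = [make_day(300.0) for _ in range(7)]  # a repeating 10 a.m. peak

print(is_trend_deviation(flat_week, 10, 300.0))   # True: unexpected peak
print(is_trend_deviation(peaky_week, 10, 300.0))  # False: the peak is expected
print(is_trend_deviation(peaky_week, 10, 600.0))  # True: expected peak, much higher amplitude
```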
Eligible Signals for Trend (Increasing / Decreasing)
How to select the right algorithm?
Each algorithm matches a specific pattern and raises an alert when that pattern is encountered. To use the algorithms effectively, follow the process below when choosing one.
- Define normal behaviour. It is important to know what the acceptable behaviour of the signal is. One simple way of doing this is to look at the signal over the relevant span and point out the timestamps where it deviates from normal behaviour and where you would like to be alerted. Remember, an algorithm cannot detect a deviation from normal behaviour if a trained human cannot.
- Identify the anomalous pattern(s) in the signal. Different signals exhibit different anomalous behaviour. Some might show spikes, some might show a level change. For example, for a signal like CPU usage, a sharp spike that returns to baseline may be perfectly normal behaviour, but for a business metric it may not be. Knowledge of the underlying processes that generate the signal is essential to determine the correct pattern.
- Check if a PromQL expression captures the intended deviation better. PromQL is a very powerful language with many functions. For deviations that can be defined in terms of relative values, percentages, or some rollup formula on historical data, prefer writing a PromQL expression accordingly.
For example, suppose a signal is considered normal if
it stays between the minimum and maximum of its 15-minute medians over the last 2 days, with a tolerance of 20%.
The PromQL to detect a deviation from this range would be
s < min_over_time(quantile_over_time(0.5, s[15m])[2d:15m]) * 0.8 or s > max_over_time(quantile_over_time(0.5, s[15m])[2d:15m]) * 1.2
Here s is the original signal metric, and the 15-minute median is written as the 0.5 quantile.
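To make the arithmetic of this range check concrete, here is a plain Python sketch of the same idea. It assumes one-minute samples and non-overlapping 15-minute buckets, which is a simplification of how a query engine would evaluate the expression. The lower bound is the smallest bucket median reduced by 20%, and the upper bound is the largest bucket median increased by 20%. The function names and sample data are hypothetical.

```python
from statistics import median

def normal_band(samples, bucket=15, tolerance=0.2):
    """Return (lower, upper) from the min and max of per-bucket medians."""
    medians = [
        median(samples[i:i + bucket])
        for i in range(0, len(samples) - bucket + 1, bucket)
    ]
    return min(medians) * (1 - tolerance), max(medians) * (1 + tolerance)

def is_outside(samples, current):
    """True if `current` falls outside the tolerated band."""
    lower, upper = normal_band(samples)
    return current < lower or current > upper

# Two days of hypothetical one-minute samples: 100 for 12 hours, then 200 for 12 hours.
two_days = ([100.0] * 720 + [200.0] * 720) * 2

print(is_outside(two_days, 150.0))  # False: inside the band
print(is_outside(two_days, 260.0))  # True: above the largest median plus 20%
print(is_outside(two_days, 70.0))   # True: below the smallest median minus 20%
```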
- Check the algorithm. If the pattern that you want to match cannot be expressed easily as demonstrated above, check whether a built-in algorithm can satisfactorily match the pattern. Remember that each algorithm has its own limitations, and it is important to understand them when working with signals.
Signals that don't meet the requirements of any of the algorithms should be handled differently. By selecting the appropriate algorithm and adjusting the sensitivity to match your use case, you can improve the accuracy of these pattern detections.
When not to choose a pattern matching algorithm?
As a rule of thumb, a pattern matching algorithm should be chosen in situations where a human who is looking at the plot can define, with a high level of accuracy, where an alert should be generated and where it should not be generated. If, by looking at the plot, it is not possible for a human to determine the alert points, it is highly unlikely that any of the above algorithms can succeed.
Below are a few signals that are not a good fit for any of the above algorithms.
The above signal is mostly zero-valued. Applying high spikes, low spikes, or increasing trend to these types of signals will cause each and every peak to be alerted. It is better to use a static threshold instead of pattern matching functions on these types of signals.
Alert Studio supports static threshold based alerting as well.
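A small sketch shows why a mostly-zero signal behaves badly under a spike-style check. The spike rule below (a robust z-score against the last hour) is an assumed stand-in for a pattern detection algorithm, and the limit of 10 is an arbitrary example of a static threshold; neither is taken from Alert Studio.

```python
from statistics import median

def robust_spike(window, current, threshold=6.0):
    """Assumed spike rule: robust z-score of `current` against the window."""
    center = median(window)
    mad = median(abs(v - center) for v in window)
    # On a mostly-zero window the spread collapses to zero,
    # so even a tiny non-zero value produces a huge score.
    scale = 1.4826 * mad if mad > 0 else 1e-9
    return (current - center) / scale > threshold

def static_threshold(current, limit):
    """Alert only when the value exceeds a fixed limit."""
    return current > limit

# Hypothetical mostly-zero signal: 57 zeros and three small blips in the last hour.
last_hour = [0.0] * 57 + [1.0, 2.0, 1.0]
for value in (1.0, 3.0, 50.0):
    print(value, robust_spike(last_hour, value), static_threshold(value, limit=10.0))
```

With this data the spike rule flags all three values, while the static threshold flags only the value of 50.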
This signal is a discrete-time signal. At any given point in time, it can have one of three possible values (1000, 1500, 2000) or no value at all. For this type of signal, a static threshold may be a better choice.
These signals should be handled differently as they do not follow a predictable pattern, making it difficult to detect patterns.
Before choosing a pattern detection algorithm, it is important to understand the nature of the signal and the objective of the alert. This guide describes a few guidelines that can be used when choosing pattern detection algorithms in Levitate Alert Studio.