An Introduction to Using Machine Learning to Enhance Anomaly Detection
Anomaly detection has been used in data analysis and data mining disciplines for many years. Techniques such as deviation from the norm are used to analyze various data sets ranging from finances to grades. (Can anyone say, “Bell curve”?!) 🙂 Now, with the rise of machine learning (ML), anomaly detection can be more automated and more broadly applied.
This blog article is a basic introduction to using machine learning to enhance anomaly detection. In it, I will discuss what anomaly detection is and illustrate it by example. I will then discuss how machine learning can be used to enhance traditional anomaly detection. For a concrete example of how ML advances anomaly detection, I will use predictive maintenance in mechanical equipment.
What is Anomaly Detection?
Anomaly detection is the identification of rare items, events, or observations that raise suspicions because they differ significantly from the majority of the data (the baseline of normal).
Traditionally, anomaly detection has been largely a manual effort. Because the work is manual, identification of anomalies has been largely reactive. Anomalies are identified after the fact rather than before they cause problems.
Static vs. Dynamic Thresholds
Also, static (or hard-coded) thresholds have been used historically to trigger real-time alerts. Determining these thresholds is difficult and often requires extensive manual review and expert domain-level knowledge. You face the “Goldilocks” problem: the thresholds can't be too strict or too lenient. Too strict, and the system generates false alarms. Too lenient, and important events are missed.
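To make the “Goldilocks” problem concrete, here is a minimal sketch of a static threshold check. The decibel readings, the function name `static_alerts`, and all three threshold values are made up for illustration; they are not taken from any real sensor.

```python
# Hypothetical decibel readings from an audio sensor (made-up values).
readings_db = [61.0, 62.5, 60.8, 71.2, 61.9, 63.1, 84.0, 62.2]


def static_alerts(readings, threshold_db):
    """Return the indices of readings above a hard-coded threshold."""
    return [i for i, value in enumerate(readings) if value > threshold_db]


# Too strict: ordinary variation trips the alarm.
too_strict = static_alerts(readings_db, 62.0)

# Too lenient: even the loudest reading slips through.
too_lenient = static_alerts(readings_db, 90.0)

# "Just right" for this data only, and chosen by hand.
tuned = static_alerts(readings_db, 70.0)

print(too_strict, too_lenient, tuned)
```

The tuned value works only because someone looked at this particular data. If the equipment or its environment changes, the number has to be revisited by hand.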
Further, static (manually determined) thresholds are limited. Many are based on a single data source and lack broader context from data gathered over time. For example, using only real-time data from an audio sensor you can set thresholds regarding decibel level. But without historical data, you cannot look for patterns in the audio data to establish a baseline pattern.
Predictive vs. Preventative Maintenance
Predictive maintenance is maintenance performed based on monitoring the actual conditions within a machine. Typically, this is done using sensors. This contrasts with preventative maintenance. Preventative maintenance is performed based on usage amount or time since last service.
An example of preventative maintenance is changing the oil in your car every 6 months or 6,000 miles. In contrast, predictive maintenance uses sensors that monitor the quality of the oil (not merely how long it has been used). It alerts you to change the oil when the quality is too low regardless of time or mileage. For example, your car would alert you after 5,000 miles if the oil quality is poor, 1,000 miles before the preventative maintenance is due. While this is a small difference in a single occurrence, over the years it helps prolong the life of your vehicle and prevent more serious issues.
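The difference between the two approaches can be sketched as two simple rules. The 6-month and 6,000-mile intervals come from the example above. The `oil_quality` reading, its 0-to-1 scale, and the 0.40 cutoff are assumptions made purely for illustration.

```python
from dataclasses import dataclass

SERVICE_INTERVAL_MILES = 6_000  # from the example in the text
SERVICE_INTERVAL_MONTHS = 6     # from the example in the text
MIN_OIL_QUALITY = 0.40          # assumed cutoff on an assumed 0-to-1 scale


@dataclass
class OilStatus:
    miles_since_change: int
    months_since_change: int
    oil_quality: float  # hypothetical sensor reading; 1.0 means fresh oil


def preventative_due(status):
    """Schedule-based rule: looks only at usage and elapsed time."""
    return (status.miles_since_change >= SERVICE_INTERVAL_MILES
            or status.months_since_change >= SERVICE_INTERVAL_MONTHS)


def predictive_due(status):
    """Condition-based rule: looks only at the measured oil quality."""
    return status.oil_quality < MIN_OIL_QUALITY


# The car from the example: 5,000 miles in, but the oil is already poor.
car = OilStatus(miles_since_change=5_000, months_since_change=4, oil_quality=0.35)
print(preventative_due(car), predictive_due(car))
```

With these assumed numbers, the schedule says the car can wait while the sensor says it cannot.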
How Machine Learning Enhances Anomaly Detection
Machine learning takes anomaly detection to the next level by using processed data to create a baseline and identify anomalies. For example, data from an audio sensor could be recorded and processed over time. This data could be turned into a visual graph for a human to look for patterns. But feeding this same "graph" to an ML model allows the ML algorithm to detect the anomaly automatically. This processing of historical data also helps to improve the signal-to-noise ratio so that true anomalies are easier to identify and respond to.
One major tenet of using ML for anomaly detection is that the model is “trained”. This training is done by baselining the normal data collected over time. A deviation from the baseline (the norm) triggers an alert. This is advantageous because examples of historical failures or malfunctions are not needed, and producing such failures and malfunctions can be quite costly.
This baselining plus anomaly detection is the concept of deviation from the norm. Using the normal baseline, an ML model can tell if the data differs enough to raise an alarm. Properly training an ML model establishes an accurate normalized baseline. Then, if the data is similar enough to the baseline (e.g. within a standard deviation), no alarms are raised. But if the data differs enough (e.g. outside a standard deviation), an anomaly is identified. This anomaly can then be used to create and send an alert.
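Deviation from the norm can be sketched in a few lines. This is a simple statistical baseline rather than a full ML model, and the readings, the function names, and the default cutoff of three standard deviations are all assumptions. Setting `max_sigma=1.0` would correspond to the "within a standard deviation" example above.

```python
import statistics


def train_baseline(normal_readings):
    """'Train' by summarizing normal data as a mean and standard deviation."""
    mean = statistics.fmean(normal_readings)
    std = statistics.stdev(normal_readings)
    return mean, std


def is_anomaly(value, baseline, max_sigma=3.0):
    """Flag a value that lies more than max_sigma deviations from the mean."""
    mean, std = baseline
    return abs(value - mean) > max_sigma * std


# Made-up readings collected while the equipment was known to be healthy.
normal = [61.0, 62.5, 60.8, 61.9, 63.1, 62.2, 61.4, 62.0]
baseline = train_baseline(normal)

for reading in (62.7, 71.2):
    status = "ANOMALY" if is_anomaly(reading, baseline) else "ok"
    print(reading, status)
```

Note that only normal data went into the baseline. No examples of failure were needed, which is the advantage described above.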
The Example of a Timing Belt
For a concrete example, consider the sound a timing belt makes. The sound of a belt in good working condition differs from the sound in a degraded condition. But this difference may not be detectable to the human ear. In an ML model, the sound is represented as a graph. This graph can be used to detect the deviation of the sound from the norm. If the degradation goes undetected and the timing belt breaks, you are looking at a major cost and inconvenience. (And Murphy's Law demands it will be at the worst possible time and place of your daily commute. 🙂) You’ll be paying for a tow truck and the cost of repairing an engine with bent valves. (Not cheap!) But if ML detects the anomaly and alerts you, both time and money are saved!
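One common way to turn sound into a "graph" is a frequency spectrum. The sketch below is an illustration only. It uses synthetic tones instead of real belt recordings, a deliberately naive Fourier transform so that it needs no extra libraries, and an assumed alert cutoff. A production system would more likely use a library FFT and a trained model.

```python
import cmath
import math

SAMPLE_COUNT = 64      # samples per frame; small so the naive DFT stays fast
ALERT_DISTANCE = 0.05  # assumed cutoff; would be tuned on real recordings


def synth_frame(tones):
    """Build one synthetic frame from (cycles_per_frame, amplitude) pairs."""
    return [
        sum(amp * math.sin(2 * math.pi * cycles * n / SAMPLE_COUNT)
            for cycles, amp in tones)
        for n in range(SAMPLE_COUNT)
    ]


def magnitude_spectrum(frame):
    """Naive discrete Fourier transform; magnitudes for the lower half of the bins."""
    size = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                for n, x in enumerate(frame))) / size
        for k in range(size // 2)
    ]


def spectral_distance(spectrum, reference):
    """Euclidean distance between two magnitude spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spectrum, reference)))


# Baseline: a healthy belt modeled as a single tone.
healthy = magnitude_spectrum(synth_frame([(4, 1.0)]))
# A slightly quieter healthy frame, and a frame with an extra faint overtone.
also_healthy = magnitude_spectrum(synth_frame([(4, 0.98)]))
degraded = magnitude_spectrum(synth_frame([(4, 1.0), (19, 0.3)]))

for name, spectrum in (("also_healthy", also_healthy), ("degraded", degraded)):
    distance = spectral_distance(spectrum, healthy)
    print(name, "ALERT" if distance > ALERT_DISTANCE else "ok")
```

The extra overtone in the synthetic degraded frame is faint next to the main tone, yet it stands out clearly once the sound is compared against the baseline spectrum.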
What are the Benefits?
Time, Money, and Safety
Replacing a timing belt before it breaks saves time and money and avoids potentially unsafe conditions. Detecting equipment failures before they occur allows you to perform planned, targeted maintenance based on the actual condition of the equipment. This means you can be proactive rather than reactive. You can schedule downtime when it is required instead of having to perform unplanned or unnecessary maintenance. Sometimes these unexpected failures cause further damage to the equipment (e.g., bent valves) resulting in a more costly repair or loss in revenue due to unplanned downtime. Equipment failure can also create unsafe conditions for humans. Avoiding the failure results in safer conditions for people. And safety is good for both the human and the bottom line.
Prolonged Equipment Life
Plus, being able to address issues before they become failures prolongs the life of your machines. Over time, failure of one part can lead to failure in other parts.
Multi-Variable Data Correlation
But ML extends beyond a single sensor. ML also allows you to correlate data across multiple variables. This allows for broader, more insightful predictions and recommendations across multiple sensors. For example, failures in an unmonitored component may be preceded by tell-tale signs in other monitored components. By themselves, the data points from the monitored components may not indicate anything. But together they may point to a deeper issue in the unmonitored component.
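Here is a small sketch of why looking at sensors together matters. It uses the Mahalanobis distance, a standard statistical measure that accounts for how two variables normally move together. The temperature and vibration values, the sensor pairing, and the cutoff of 3 are assumptions for illustration, not data from real equipment.

```python
import math
import statistics

# Made-up healthy history: vibration normally rises with temperature.
temps = [60, 62, 64, 66, 68, 70, 72, 74]
vibs = [1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0]


def fit_baseline(xs, ys):
    """Summarize two sensors by their means, variances, and covariance."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, ys)) / (len(xs) - 1)
    return mean_x, mean_y, var_x, var_y, cov_xy


def mahalanobis(x, y, baseline):
    """Distance of (x, y) from the baseline, scaled by the joint spread."""
    mean_x, mean_y, var_x, var_y, cov_xy = baseline
    dx, dy = x - mean_x, y - mean_y
    det = var_x * var_y - cov_xy ** 2  # determinant of the 2x2 covariance
    squared = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(squared)


baseline = fit_baseline(temps, vibs)
MAX_DISTANCE = 3.0  # assumed cutoff

typical = mahalanobis(70, 1.7, baseline)  # warm and vibrating: expected
odd = mahalanobis(62, 1.9, baseline)      # cool but vibrating hard: unusual
print(typical > MAX_DISTANCE, odd > MAX_DISTANCE)
```

In the odd reading, 62 degrees and 1.9 units of vibration are each inside the range seen during healthy operation. Only the combination is unusual, and it is the kind of tell-tale sign a single-sensor threshold would miss.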
Anomaly detection has applications just about everywhere.
At home, anomaly detection could be used to monitor your appliances such as refrigerators, A/C units, water heaters, and more. You could even use it to detect when it’s time to clean the dust out of your laptop or desktop computer. (Now might be a good time to do that if you haven't done that in a while...or ever. 🙂)
Within IT operations and security, anomaly detection has a myriad of applications. Such applications range from network utilization trending to DDoS attack detection to application availability monitoring.
And on an industrial scale, anomaly detection has application in monitoring construction, manufacturing, and agricultural equipment. For example, audio sensors could be placed to capture sound from critical components such as pumps, fans, or valves. This targeted approach would allow you to reduce the impact of the most common and critical points of failure.
An Example Business Application
Consider one possible business application. A transportation company that operates a fleet of self-driving vehicles requires constant maintenance on those vehicles. If the vehicles are on a preventative maintenance schedule, receiving maintenance every 6,000 miles, there will be times when a part wears out faster and breaks between maintenance dates. The company will incur repair costs, lose revenue, see customer satisfaction decrease, and suffer reduced employee productivity. And, again, Murphy's Law demands the car break down with a customer in the middle of her trip to a crucial meeting. Without a predictive maintenance system using ML for anomaly detection, you have a broken-down car, an unhappy customer, and a damaged reputation. Using ML with anomaly detection for predictive maintenance, looming part failures could be caught before they occur. The impacted car could be taken out of service and replaced by one in good working condition to avoid unexpected disruption and negative impacts.
Using ML to enhance anomaly detection has a myriad of benefits and applications. It enables more intelligent predictive maintenance. This helps to avoid disruption, reduce downtime, save money, increase safety, and build a positive reputation.
I hope you found this article helpful in understanding the value of using ML for anomaly detection. If you want to discuss how you might be able to leverage anomaly detection and/or machine learning in your environment, reach out to us here at 5.15. We’d love to discuss how we can help you with your machine learning and automation needs!