Anomaly Detection with Isolation Forests using H2O

Anomaly detection is a common data science problem in which the goal is to identify odd or suspicious observations, events, or items in our data. Such anomalies may indicate problems in the data collection process (broken sensors, typos in collected forms, etc.) or unexpected events such as security breaches and server failures. Anomaly detection can be performed in a supervised, semi-supervised, or unsupervised manner. A supervised approach requires a label for every observation, event, or item stating whether it is anomalous or genuine, and uses these labels during training. Obtaining such labels for every observation is often unrealistic. A semi-supervised approach assumes that we only know which observations are genuine (non-anomalous) and have no information about the anomalous ones.