Every alert has one or more triggers defining when to raise it. These can be either event triggers or state triggers.
Each trigger may check one or more devices or resources, e.g. all devices in a group. Combined with the ability to set up multiple triggers per alert, this allows a very flexible setup.
An event trigger is raised when an event of a particular type satisfies the trigger condition. The condition is defined by an expression, which allows complex checks. For example, a vehicle monitoring system may generate an alert if the "impact" event received from a vehicle controller indicates that the impact strength exceeds a threshold.
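The "impact" example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's expression language: the `EventTrigger` class, the `strength` field and the threshold value are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass
class EventTrigger:
    """Hypothetical event trigger: an event type plus a condition expression."""
    event_type: str
    condition: Callable[[Mapping[str, Any]], bool]

    def matches(self, event_type: str, payload: Mapping[str, Any]) -> bool:
        # The trigger is raised only for its own event type
        # and only when the condition holds for the event data.
        return event_type == self.event_type and self.condition(payload)


IMPACT_THRESHOLD = 5.0  # assumed threshold, arbitrary units

impact_trigger = EventTrigger(
    event_type="impact",
    condition=lambda event: event["strength"] > IMPACT_THRESHOLD,
)
```

Here `impact_trigger.matches("impact", {"strength": 7.2})` is true, while a weaker impact or an event of another type leaves the trigger idle.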
Event triggers support event correlation, allowing an alert to be activated by an event of one type and deactivated by an event of another type (the correlated event).
Any event trigger can be configured to activate only if a sequence of similar events occurs within a certain time frame.
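Correlation and sequence counting can be combined in one small state machine. The sketch below is a simplified model under stated assumptions: the class name, its parameters and the sliding-window counting rule are illustrative and are not taken from the product.

```python
from collections import deque


class CorrelatedEventTrigger:
    """Hypothetical trigger activated by one event type and cleared by another."""

    def __init__(self, activate_on, deactivate_on, min_count=1, window_s=60.0):
        self.activate_on = activate_on      # event type that raises the alert
        self.deactivate_on = deactivate_on  # correlated event type that clears it
        self.min_count = min_count          # events required within the window
        self.window_s = window_s            # length of the time frame, seconds
        self.active = False
        self._recent = deque()              # timestamps of recent activating events

    def on_event(self, event_type, timestamp):
        if event_type == self.deactivate_on:
            # The correlated event deactivates the alert and resets the count.
            self.active = False
            self._recent.clear()
        elif event_type == self.activate_on:
            self._recent.append(timestamp)
            # Forget events that fell out of the time frame.
            while timestamp - self._recent[0] > self.window_s:
                self._recent.popleft()
            if len(self._recent) >= self.min_count:
                self.active = True
        return self.active
```

With `min_count=3` and `window_s=10`, three `activate_on` events at seconds 0, 4 and 8 activate the alert, and a later `deactivate_on` event clears it.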
A state trigger can be raised either in response to a certain state or in response to any change in the state of the monitored object. A state trigger periodically checks the value of a specific variable, which is also selected by a custom expression.
State triggers have a configurable hysteresis (deadband) time, so that the alert is activated only if the condition lasts longer than a specified period. For example, a state trigger may raise an alert if the temperature stays above 120 degrees for more than 3 minutes. A separate rearming hysteresis is also supported.
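The temperature example can be modelled as a polled trigger with two delays. This is a minimal sketch, assuming that the rearming hysteresis is the time the condition must stay false before the trigger deactivates; the class and parameter names are hypothetical, and the comparison uses "at least" rather than "strictly longer than" for simplicity.

```python
class DelayedStateTrigger:
    """Hypothetical polled state trigger with activation and rearming delays."""

    def __init__(self, condition, activate_after_s, rearm_after_s=0.0):
        self.condition = condition              # expression over the polled value
        self.activate_after_s = activate_after_s
        self.rearm_after_s = rearm_after_s
        self.active = False
        self._since = None  # when the pending state change was first observed

    def poll(self, value, now):
        holds = self.condition(value)
        if holds == self.active:
            # The observed state agrees with the trigger state: nothing pending.
            self._since = None
            return self.active
        if self._since is None:
            self._since = now
        delay = self.rearm_after_s if self.active else self.activate_after_s
        if now - self._since >= delay:
            # The new state has lasted for the whole delay: switch over.
            self.active = holds
            self._since = None
        return self.active


# Temperature above 120 degrees for 3 minutes; rearm after 1 minute below.
overheat = DelayedStateTrigger(lambda t: t > 120, activate_after_s=180, rearm_after_s=60)
```

A short spike above 120 degrees that ends before the 3 minutes elapse resets the pending timer, so no alert is raised.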
Conditions of state triggers may be checked against dynamically adjusted baselines, such as a monthly average or a weekend maximum. In addition, state triggers support detection of value flapping (frequent change), which is reported as a separate alert type.
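Both ideas can be illustrated with a short Python sketch. It assumes the baseline is the mean of a list of historical samples and that flapping means "more than N value changes within a time window"; the function, the class and all parameters are hypothetical and the product's actual rules may differ.

```python
from collections import deque
from statistics import mean


def exceeds_baseline(value, history, factor=1.5):
    """True if value is more than `factor` times the mean of the history."""
    return bool(history) and value > factor * mean(history)


class FlappingDetector:
    """Hypothetical detector: too many value changes within a time window."""

    def __init__(self, max_changes, window_s):
        self.max_changes = max_changes
        self.window_s = window_s
        self._last = None
        self._changes = deque()  # timestamps of recent value changes

    def update(self, value, now):
        if self._last is not None and value != self._last:
            self._changes.append(now)
        self._last = value
        # Drop changes that are older than the window.
        while self._changes and now - self._changes[0] > self.window_s:
            self._changes.popleft()
        return len(self._changes) > self.max_changes
```

In this sketch the baseline check and the flapping check are independent, which mirrors the text: flapping is reported separately from the ordinary condition.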
Once raised, an alert may remain active while the condition that caused it holds or until an event correlated with the activation event is received. The server keeps a global list of active alerts and tracks the active instances associated with every resource and device. High-priority active alerts are usually shown on the system overview dashboards.
When a certain error occurs, it often requires one specific remedy. For example, when available memory on a device runs low, its internal database must be downloaded or purged. The remedy is always the same; it is never another action, such as turning the device off or running a servo.
Because the remedy is so predictable, it can be automated. Any system action that is available in the user interface can be launched automatically in response to an alert.
If no operators are on duty or the system operates in standalone mode, the corrective action launched is "non-interactive" (also called "automatic" or "headless"). There are also "interactive" corrective actions, which require operator input in real time.
Some interactive corrective actions:
- Starting a custom operator-driven incident resolution workflow
- Running a database purge, asking the operator first: "Are you sure?"
- Rebooting a mission-critical device only after getting confirmation from the operator
Some automatic corrective actions:
- Preparing a status report about the device causing an alert and sending it by e-mail
- Executing an external application that fixes the problem
- Creating a new ticket in the Service Desk system
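
The difference between the two kinds of corrective actions can be sketched as follows. This is an illustrative model only: `CorrectiveAction`, `launch` and the `confirm` callback are hypothetical names, and skipping an interactive action when no operator is available is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CorrectiveAction:
    """Hypothetical corrective action bound to an alert."""
    name: str
    run: Callable[[], str]      # performs the action and returns a short status
    interactive: bool = False   # True if operator input is required


def launch(action: CorrectiveAction,
           confirm: Optional[Callable[[str], bool]] = None) -> str:
    """Launch an action; `confirm` stands in for the operator, if one is on duty."""
    if action.interactive:
        if confirm is None:
            # Assumption: without an operator, interactive actions are not run.
            return f"{action.name}: skipped (no operator available)"
        if not confirm(f"Run '{action.name}'?"):
            return f"{action.name}: cancelled by operator"
    return f"{action.name}: {action.run()}"


purge = CorrectiveAction("purge database", lambda: "done", interactive=True)
ticket = CorrectiveAction("create ticket", lambda: "done")
```

Calling `launch(ticket)` runs the automatic action immediately, while `launch(purge, confirm=...)` runs the purge only if the operator callback returns true.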