Statistical Process Control

Classic monitoring systems store historical metric values in a relational database. This works well if the polling period is relatively long, old events are purged regularly, and the total number of devices is not too large.

But what if we poll ten metrics from one thousand devices every minute and keep the data for ten years? We'll need to store more than 50 billion samples in our database! However, we're not really interested in per-minute statistics from five years ago. For such old data, we'd like to know yearly averages. AggreGate IoT Platform has a solution for this case.
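The sample count above is simple multiplication. A quick Python check, ignoring leap days:

```python
# Raw sample count for the scenario above (leap days ignored for simplicity).
METRICS_PER_DEVICE = 10
DEVICES = 1_000
POLLS_PER_DAY = 24 * 60  # one poll per minute
DAYS = 365 * 10          # ten years

samples = METRICS_PER_DEVICE * DEVICES * POLLS_PER_DAY * DAYS
print(f"{samples:,}")  # 52,560,000,000
```

That is roughly 52.6 billion raw samples.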

AggreGate can store long-term time series data in a Round-Robin Database (RRD). The module responsible for collecting, storing and processing time series data is called the Statistical Process Control (SPC) module, or simply statistics.

A round-robin database is designed to handle time series data such as network bandwidth, temperatures, CPU load, etc. The data is stored in a circular buffer: once the buffer is full, new values overwrite the oldest ones, so the storage footprint remains constant over time.
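To make the circular-buffer idea concrete, here is a minimal Python sketch of a single round-robin archive. It is an illustration under simplifying assumptions, not AggreGate's storage format: the class `RoundRobinArchive` and its methods are hypothetical, every slot covers a fixed `step` in seconds, and the slot for a timestamp is computed directly from that timestamp, which keeps both the archive size and the lookup cost constant.

```python
from __future__ import annotations


class RoundRobinArchive:
    """Hypothetical fixed-size round-robin archive (a simplified sketch, not AggreGate's format).

    The archive has `size` slots of `step` seconds each, so it covers step * size
    seconds. Intervals exactly step * size seconds apart map to the same slot, so once
    the archive wraps around, new data overwrites the oldest. The slot count never changes.
    """

    def __init__(self, step: int, size: int) -> None:
        self.step = step
        self.size = size
        self.values: list[float | None] = [None] * size
        self.slot_starts: list[int | None] = [None] * size  # interval start stored in each slot

    def _locate(self, timestamp: int) -> tuple[int, int]:
        # Align the timestamp to its interval and map the interval onto a slot: O(1), no search.
        start = timestamp - timestamp % self.step
        return start, (start // self.step) % self.size

    def put(self, timestamp: int, value: float) -> None:
        start, i = self._locate(timestamp)
        self.values[i] = value
        self.slot_starts[i] = start

    def get(self, timestamp: int) -> float | None:
        start, i = self._locate(timestamp)
        # If the slot now holds a different interval, the requested data has been overwritten.
        if self.slot_starts[i] != start:
            return None
        return self.values[i]


archive = RoundRobinArchive(step=60, size=3)  # three one-minute slots
for minute, value in enumerate([1.0, 2.0, 3.0, 4.0]):
    archive.put(minute * 60, value)
print(archive.get(0))        # None: minute 0 was overwritten by minute 3
print(archive.get(180))      # 4.0
print(len(archive.values))   # 3
```

RRDtool, for example, additionally records the time of the last update and stores intervals without data as unknown values; the sketch only detects overwritten slots when reading.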

An RRD has two essential benefits for storing long-term statistical data:

  • Small and constant database size
  • Extremely fast access to the historical data for any time period

Other features of the SPC module include:

  • Automatic calculation of rates (e.g., flow rate or traffic)
  • Working with gauge-type and counter-type values
  • Tracking minutely, hourly, daily, weekly, monthly and yearly averages, minimums and maximums
  • Configurable degradation of precision for older values
  • Concurrent use with "classic" RDBMS-based storage of non-aggregated historical values
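Two of these features, rate calculation for counters and consolidation into coarser intervals (which is also how precision degrades for older values), can be sketched in a few lines of Python. The sketch assumes a 32-bit counter that wraps at most once between polls; the function names are hypothetical, and the consolidation shown (average, minimum and maximum per fixed bucket) is a simplified stand-in for what the SPC module does. Gauge-type values such as temperatures would skip the rate step and go straight to consolidation.

```python
from __future__ import annotations

from statistics import mean

COUNTER_MAX = 2**32  # assumption: a 32-bit counter, e.g. an SNMP interface octet counter


def counter_to_rates(samples: list[tuple[int, int]]) -> list[tuple[int, float]]:
    """Convert (timestamp, counter) samples into (timestamp, per-second rate) pairs.

    A decrease is treated as one wraparound past COUNTER_MAX (assumption: a device
    reset would be misread the same way).
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0
        if delta < 0:
            delta += COUNTER_MAX
        rates.append((t1, delta / (t1 - t0)))
    return rates


def consolidate(points: list[tuple[int, float]], step: int) -> dict[int, tuple[float, float, float]]:
    """Group points into `step`-second buckets and keep (average, minimum, maximum) for each."""
    buckets: dict[int, list[float]] = {}
    for t, v in points:
        buckets.setdefault(t - t % step, []).append(v)
    return {start: (mean(vs), min(vs), max(vs)) for start, vs in sorted(buckets.items())}


# Octet counter polled once a minute; it wraps past 2**32 between t=60 and t=120.
samples = [(0, 4_294_960_000), (60, 4_294_966_000), (120, 10_704), (180, 16_704), (240, 22_704)]
rates = counter_to_rates(samples)
print(rates)                         # [(60, 100.0), (120, 200.0), (180, 100.0), (240, 100.0)]
print(consolidate(rates, step=120))  # {0: (100.0, 100.0, 100.0), 120: (150.0, 100.0, 200.0), 240: (100.0, 100.0, 100.0)}
```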

Granulation

The granulation module is designed to split continuous time into sections (granules) and to calculate and store various aggregate values for each granule. It is similar to RRD-based statistics, but instead of fixed-length intervals it can slice the time axis in more advanced ways:

  • Real days in a certain time zone, with daylight saving time taken into account
  • Morning, day, evening and night hours of every day
  • Real months, with leap years taken into account
  • Flexible company work shifts
  • Weekdays, weekends and holidays
  • Any more complicated time slicing
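Real days and real months are where fixed-length intervals break down: a local calendar day is not always 24 hours long, and months differ in length. The Python sketch below uses the standard `zoneinfo` module to compute such granule boundaries as UTC instants. The helper names are hypothetical, `Europe/Berlin` is only an example zone (the IANA time zone database must be available), and the code assumes local midnight exists on the day in question, which holds for zones that change clocks at 02:00 or 03:00, as Berlin does.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def _to_utc(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    # Convert aware local bounds to UTC. Subtracting two datetimes that share a tzinfo
    # ignores their UTC offsets, so durations must be measured in UTC.
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def day_granule(day: date, tz: ZoneInfo) -> tuple[datetime, datetime]:
    """[start, end) of a local calendar day, as UTC instants (assumes local midnight exists)."""
    nxt = day + timedelta(days=1)
    return _to_utc(datetime(day.year, day.month, day.day, tzinfo=tz),
                   datetime(nxt.year, nxt.month, nxt.day, tzinfo=tz))


def month_granule(year: int, month: int, tz: ZoneInfo) -> tuple[datetime, datetime]:
    """[start, end) of a local calendar month, as UTC instants."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    return _to_utc(datetime(year, month, 1, tzinfo=tz),
                   datetime(next_year, next_month, 1, tzinfo=tz))


def hours(granule: tuple[datetime, datetime]) -> float:
    start, end = granule
    return (end - start).total_seconds() / 3600


berlin = ZoneInfo("Europe/Berlin")  # example zone
print(hours(day_granule(date(2024, 3, 31), berlin)))   # 23.0: clocks spring forward
print(hours(day_granule(date(2024, 10, 27), berlin)))  # 25.0: clocks fall back
print(hours(day_granule(date(2024, 6, 1), berlin)))    # 24.0
print(hours(month_granule(2024, 2, berlin)) / 24)      # 29.0: leap-year February
```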

The greater flexibility of granulation compared to the round-robin database comes at the cost of slower data updates and retrieval and higher disk space consumption.

Granules use the regular storage facility to keep any user-defined data for every time slice. Here are some examples:

  • Average, minimum and maximum value
  • Sum of values or any other equation-based result
  • First and last values in the time slice, as well as their timestamps
  • Counts of samples with different quality (good, bad, unreliable, etc.)
  • Granule-wide marks, such as "data not available"
  • Any custom numeric, textual or binary data
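A per-granule record holding several of the values listed above might look like the following Python sketch. The class and field names, the quality labels and the rule that only "good" samples feed the numeric aggregates are assumptions for illustration, not AggreGate's data model.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Granule:
    """Hypothetical aggregate record for one time slice (illustrative, not AggreGate's schema)."""

    count: int = 0
    total: float = 0.0                       # sum of good values
    minimum: float | None = None
    maximum: float | None = None
    first: tuple[int, float] | None = None   # (timestamp, value) of the earliest good sample
    last: tuple[int, float] | None = None    # (timestamp, value) of the latest good sample
    quality: Counter = field(default_factory=Counter)  # sample count per quality label
    marks: set = field(default_factory=set)            # granule-wide flags

    def add(self, timestamp: int, value: float, quality: str = "good") -> None:
        self.quality[quality] += 1
        if quality != "good":
            return  # assumption: bad/unreliable samples are counted but not aggregated
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        if self.first is None or timestamp < self.first[0]:
            self.first = (timestamp, value)
        if self.last is None or timestamp >= self.last[0]:
            self.last = (timestamp, value)

    @property
    def average(self) -> float | None:
        return self.total / self.count if self.count else None

    def close(self) -> None:
        """Flag the granule when it ended without any good samples."""
        if self.count == 0:
            self.marks.add("data not available")


g = Granule()
for ts, value, q in [(0, 20.0, "good"), (60, 22.0, "good"), (120, 99.9, "bad"), (180, 21.0, "good")]:
    g.add(ts, value, q)
g.close()
print(g.average, g.minimum, g.maximum)  # 21.0 20.0 22.0
print(g.first, g.last)                  # (0, 20.0) (180, 21.0)
print(dict(g.quality))                  # {'good': 3, 'bad': 1}
```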