Description

By default, the Eagle alert engine uses a CEP engine (WSO2 Siddhi) as its first-class policy evaluator. A CEP engine covers many use cases because it natively supports filtering and window-based aggregation over streaming data. The Eagle alert engine can also be extended to support more advanced use cases.

Because the Eagle alert engine already takes care of these concerns, a new policy evaluator does not need to worry about how the data schema is defined, how data is partitioned, or how multi-tenant data is handled. It only needs to implement the PolicyStreamHandler interface and handle each incoming event. The policy evaluator is responsible for managing its own state, if it has any.
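As a rough illustration of "handle each incoming event and manage your own state", here is a minimal sketch. The EventHandler interface is a hypothetical, simplified stand-in and is not the actual signature of Eagle's PolicyStreamHandler. The event representation (a Map), the alert sink (a List of strings), and the CountThresholdHandler class are likewise assumptions made only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-event handler; NOT Eagle's real PolicyStreamHandler signature. */
interface EventHandler {
    void send(Map<String, Object> event);
}

/** Example evaluator that owns its state: alerts once when a key's event count reaches a threshold. */
class CountThresholdHandler implements EventHandler {
    private final String keyField;          // event field used to group events (assumed name)
    private final int threshold;            // count at which an alert is emitted
    private final List<String> alerts;      // stand-in for an alert collector
    private final Map<String, Integer> counts = new HashMap<>(); // evaluator-managed state

    CountThresholdHandler(String keyField, int threshold, List<String> alerts) {
        this.keyField = keyField;
        this.threshold = threshold;
        this.alerts = alerts;
    }

    @Override
    public void send(Map<String, Object> event) {
        String key = String.valueOf(event.get(keyField));
        int n = counts.merge(key, 1, Integer::sum);
        // Emit exactly once, at the moment the threshold is reached.
        if (n == threshold) {
            alerts.add(key + " reached " + threshold + " events");
        }
    }
}
```

The handler never looks at schema, partitioning, or tenancy; in this sketch those are assumed to be resolved before send is called.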

Possible policy evaluator extensions include the no data alert, the absence alert, the percentile alert, and even machine-learning-based policy evaluators.

No data alert

Given N entities that continuously send events, alert if M of them (M << N) send no events for a specified period, e.g. 1 hour.

The proposed solution is:

  • Maintain a time window of length WINDOW_PERIOD, e.g. 1 hour
  • Upon each incoming event, slide the window if any events in it have expired
  • Compare the distinct entities in the current window (WIRI) with the distinct entities that should be present (WISB); if some entities exist in WISB but not in WIRI, then alert
  • WISB can be provided or dynamic. For a provided WISB, the user must specify the full set of entities. For a dynamic WISB, the user does not need to specify the full set of entities; the alert engine uses the last WIRI as WISB
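The steps above can be sketched as follows for the provided-WISB case. This is only an illustration under stated assumptions: the NoDataDetector class and its method names are hypothetical, events are assumed to arrive in timestamp order, and the warm-up rule (no alerts until one full window has elapsed since the first event) is an addition of this sketch rather than part of the proposal.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a no data detector with a provided WISB; not part of Eagle's API. */
class NoDataDetector {
    private static final class Seen {
        final String entity;
        final long timestamp;
        Seen(String entity, long timestamp) {
            this.entity = entity;
            this.timestamp = timestamp;
        }
    }

    private final long windowMillis;                              // WINDOW_PERIOD
    private final Set<String> wisb;                               // entities that should be present
    private final Deque<Seen> window = new ArrayDeque<>();        // events inside the window
    private final Map<String, Integer> counts = new HashMap<>();  // entity -> events in window (keys = WIRI)
    private boolean started = false;
    private long firstTimestamp;

    NoDataDetector(long windowMillis, Set<String> wisb) {
        this.windowMillis = windowMillis;
        this.wisb = new HashSet<>(wisb);
    }

    /** Feeds one event; returns the entities in WISB that are absent from the current window. */
    Set<String> onEvent(String entity, long timestamp) {
        if (!started) {
            started = true;
            firstTimestamp = timestamp;
        }
        window.addLast(new Seen(entity, timestamp));
        counts.merge(entity, 1, Integer::sum);

        // Slide the window: drop events at or before (timestamp - windowMillis).
        while (!window.isEmpty() && window.peekFirst().timestamp <= timestamp - windowMillis) {
            Seen old = window.removeFirst();
            counts.computeIfPresent(old.entity, (k, v) -> v == 1 ? null : v - 1);
        }

        // Warm-up (assumption of this sketch): stay silent until one full window has elapsed.
        if (timestamp - firstTimestamp < windowMillis) {
            return Collections.emptySet();
        }

        // WISB minus WIRI.
        Set<String> missing = new HashSet<>(wisb);
        missing.removeAll(counts.keySet());
        return missing;
    }
}
```

State grows with the number of events held in the window. A dynamic WISB could be approximated by replacing wisb with the previous window's key set, which is not shown here.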


Note

1) The assumption is that data arrives frequently enough to advance the timestamp that drives the sliding time window. (To remove this assumption, we may need a separate background thread to advance the timestamp.)

2) This may not work when the whole topic has no data at all, because the above solution needs an incoming event to trigger window sliding.

3) Open question: how should a node that is in SA check status be handled?

 

Absence Alert

Something must happen within a specific time range each day. For example, a certain type of job must run between 1 PM and 2 PM every day.

This type of alert is similar to the No Data Alert, but it uses a fixed time window whereas the No Data Alert uses a sliding time window. The space complexity is therefore O(1) for the Absence Alert and O(window) for the No Data Alert, where window is the size of the sliding time window.

Note:

1) The assumption is that data arrives frequently enough to advance the timestamp that drives the time window.
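A minimal sketch of the fixed-window check, showing why constant space is enough: only the current day and two flags are kept. The AbsenceDetector class, its parameters, and the isExpected flag (whether an event is the one that must occur) are hypothetical names chosen for illustration. In line with the note above, the alert fires only when some event arrives after the window has closed on the same day.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical sketch of an absence detector with a fixed daily window; not part of Eagle's API. */
class AbsenceDetector {
    private final LocalTime start;   // window start, inclusive
    private final LocalTime end;     // window end, exclusive
    private LocalDate day;           // day currently being tracked
    private boolean seen;            // expected event observed inside today's window
    private boolean alerted;         // alert already raised for today

    AbsenceDetector(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Feeds one event. Returns true when this event shows that today's window
     * has closed without the expected event having been observed.
     */
    boolean onEvent(LocalDateTime eventTime, boolean isExpected) {
        LocalDate d = eventTime.toLocalDate();
        if (!d.equals(day)) {
            // First event of a new day: reset the O(1) state.
            day = d;
            seen = false;
            alerted = false;
        }
        LocalTime t = eventTime.toLocalTime();
        if (isExpected && !t.isBefore(start) && t.isBefore(end)) {
            seen = true;
        }
        if (!t.isBefore(end) && !seen && !alerted) {
            alerted = true;   // alert at most once per day
            return true;
        }
        return false;
    }
}
```

For the example above, the detector would be created with LocalTime.of(13, 0) and LocalTime.of(14, 0).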

SLA alert

Some tasks must complete within an SLA. For example, when a user submits a Spark job, an SLA can be specified for that job, e.g. 1 hour.

This type of alert does not need a custom implementation: the task's status can be fed into the Eagle alert engine periodically, and the Siddhi CEP engine can then check whether the task has completed by the specified time.

 

 
