Efficient Implementation of D2IA on Top of Flink CEP

Organization name
University of Tartu, Tartu, Estonia
Data-driven interval analytics (D2IA) is an approach that enables analytics over user-defined, data-driven windows. In this approach, the user defines conditions on the properties of events that continuously arrive on a stream. A condition can be absolute, i.e., it refers to one or more properties of a single event, or relative, i.e., it compares the event to an aggregate or to another event that has been matched before. The set of events that satisfy the conditions is grouped into an interval. In addition, the user selects an aggregate function to be computed over the events of that interval.
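To make the two kinds of conditions concrete, the following minimal Java sketch models a D2IA-style specification over a stream of numeric readings. It is an illustrative assumption, not the D2IA or FlinkCEP implementation: the class `D2iaSketch`, the threshold 10.0, the choice of "not below the average of the interval so far" as the relative condition, the average as the interval aggregate, and the rule that an interval closes at the first event violating either condition are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical model of a D2IA specification; none of these names come from D2IA or FlinkCEP.
final class D2iaSketch {

    // A stream element with a single numeric property.
    static final class Event {
        final double value;
        Event(double value) { this.value = value; }
    }

    // Absolute condition: refers only to the properties of the current event.
    static final Predicate<Event> ABSOLUTE = e -> e.value > 10.0;

    // Relative condition: compares the current event to an aggregate of the events
    // matched so far ("not below their average"). An empty interval accepts any event.
    static final BiPredicate<Event, List<Event>> RELATIVE = (e, matched) ->
        e.value >= matched.stream().mapToDouble(m -> m.value).average()
                          .orElse(Double.NEGATIVE_INFINITY);

    // Groups consecutive events satisfying both conditions into intervals and
    // returns the user-selected aggregate (here: the average) of each interval.
    static List<Double> intervalAverages(List<Event> stream) {
        List<Double> results = new ArrayList<>();
        List<Event> current = new ArrayList<>();
        for (Event e : stream) {
            if (ABSOLUTE.test(e) && RELATIVE.test(e, current)) {
                current.add(e);
            } else {
                close(current, results);
                // The rejected event may still open a new interval on its own.
                if (ABSOLUTE.test(e)) current.add(e);
            }
        }
        close(current, results);
        return results;
    }

    // Emits the aggregate of a non-empty interval and resets it.
    private static void close(List<Event> interval, List<Double> results) {
        if (!interval.isEmpty()) {
            results.add(interval.stream().mapToDouble(m -> m.value).average().getAsDouble());
            interval.clear();
        }
    }

    public static void main(String[] args) {
        double[] values = {5, 12, 14, 13, 20, 12, 8, 11};
        List<Event> stream = new ArrayList<>();
        for (double v : values) stream.add(new Event(v));
        System.out.println(intervalAverages(stream)); // prints [14.75, 12.0, 11.0]
    }
}
```

Here 12, 14, 13 and 20 form one interval; the second 12 falls below that interval's average (14.75) and therefore opens a new interval, which 8 closes by failing the absolute condition.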

FlinkCEP is a library for complex event processing on top of Apache Flink, and it is used to implement D2IA. A useful feature of FlinkCEP is the so-called iterative condition: when evaluating such a condition, FlinkCEP gives access to the events matched so far by the pattern. This makes it possible to check whether a new candidate event satisfies a condition in a D2IA specification. For relative conditions, however, the current implementation is inefficient, because the state of the matched events, e.g., the average value of the events matched so far, has to be recomputed for every new candidate. To remove this inefficiency, the state calculation has to be moved into the inner workings of FlinkCEP.
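The recomputation cost can be illustrated without Flink. In FlinkCEP, an iterative condition extends `IterativeCondition<T>` and reads earlier matches through `ctx.getEventsForPattern(name)`; the plain-Java model below replaces that context with a hypothetical `MatchContext` interface and counts how many stored events the condition visits while rebuilding the average for each candidate. It is a sketch of the access pattern, not FlinkCEP code.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a relative condition written in the iterative-condition style:
// every candidate triggers a full scan of the events matched so far.
final class NaiveRelativeCondition {

    // Hypothetical stand-in for FlinkCEP's IterativeCondition.Context.
    interface MatchContext {
        Iterable<Double> matchedSoFar();
    }

    // Total number of stored events visited across all filter calls.
    long visitedEvents = 0;

    // Accept the candidate if it is not below the average of the events matched so far.
    boolean filter(double candidate, MatchContext ctx) {
        double sum = 0;
        long count = 0;
        for (double v : ctx.matchedSoFar()) { // O(k) scan for k matched events
            sum += v;
            count++;
            visitedEvents++;
        }
        return count == 0 || candidate >= sum / count;
    }

    public static void main(String[] args) {
        NaiveRelativeCondition cond = new NaiveRelativeCondition();
        List<Double> matched = new ArrayList<>();
        MatchContext ctx = () -> matched;
        // Strictly increasing input, so every candidate is accepted and the match keeps growing.
        for (int i = 1; i <= 1000; i++) {
            double candidate = i;
            if (cond.filter(candidate, ctx)) matched.add(candidate);
        }
        // 0 + 1 + ... + 999 visits for 1000 accepted events.
        System.out.println(cond.visitedEvents); // prints 499500
    }
}
```

For a match that grows to n events the scans add up to n(n-1)/2 event visits, so the per-candidate cost grows linearly and the total cost quadratically with the match length.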

FlinkCEP maintains its internal state as instances of the nondeterministic finite automaton (NFA) that represents the pattern to be matched. This state is defined in the class NFAState as a queue of ComputationState instances. The objective is to extend the computation state with all types of aggregates needed for D2IA, and to modify the NFA so that these aggregates are updated incrementally whenever a candidate event is accepted. As a result, the aggregates can be accessed in O(1) time during condition evaluation.
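A minimal sketch of the kind of aggregate state described above, assuming a single numeric attribute and matches that only ever grow. `IncrementalAggregates` and `onAccept` are hypothetical names: FlinkCEP's `ComputationState` has no such field, and the hook that would call `onAccept` when the NFA accepts an event is exactly what the thesis would have to add.

```java
// Hypothetical per-match aggregate state that would be attached to a computation state.
final class IncrementalAggregates {
    private long count;
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Called once when a candidate event is accepted into the partial match: O(1).
    void onAccept(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    // O(1) reads at condition-evaluation time; no scan over matched events.
    long count() { return count; }
    double sum() { return sum; }
    double min() { return min; }
    double max() { return max; }
    double average() { return count == 0 ? Double.NaN : sum / count; }

    // When the nondeterministic automaton branches, each branch needs its own copy,
    // otherwise an event accepted on one branch would leak into the other.
    IncrementalAggregates copy() {
        IncrementalAggregates c = new IncrementalAggregates();
        c.count = count;
        c.sum = sum;
        c.min = min;
        c.max = max;
        return c;
    }

    public static void main(String[] args) {
        IncrementalAggregates agg = new IncrementalAggregates();
        for (double v : new double[] {12, 14, 13, 20}) {
            // Relative condition "value >= average so far", evaluated in O(1).
            if (agg.count() == 0 || v >= agg.average()) agg.onAccept(v);
        }
        IncrementalAggregates branch = agg.copy();
        branch.onAccept(30);
        System.out.println(agg.average() + " " + branch.average()); // prints 14.75 17.8
    }
}
```

The average is derived from count and sum rather than stored, so it never drifts out of sync. Min and max stay O(1) only because events are never removed from a partial match; a variant that evicts events would need a different structure.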
Thesis defence year
Ahmed Awad
English
Requirements for the applicant
Moderate knowledge of Java is required. Familiarity with Apache Flink is preferred but not required.
#stream processing, CEP, interval analytics

Application contact

Ahmed Awad