Efficient Implementation of D2IA on Top of Flink CEP

Organization
University of Tartu, Tartu, Estonia
Abstract
Data-driven interval analytics (D2IA) is an approach that enables analytics over user-defined, data-driven windows. In this approach, the user defines conditions on the properties of events that continuously arrive on a stream. A condition can be absolute, i.e., it refers to one or more properties of a single event, or relative, i.e., it compares the event to an aggregate or to another event that has been matched before. The events that satisfy the conditions are grouped into an interval. Moreover, the user selects an aggregate function to be computed over those events.
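
As a rough illustration of the idea, the following plain-Java sketch groups values that satisfy an absolute condition into intervals and computes an average per interval. It does not use the D2IA or FlinkCEP APIs. The class name IntervalSketch, the method intervalAverages, and the rule that an interval is a maximal run of consecutive matching values are all assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoublePredicate;

/** Hypothetical illustration of data-driven intervals; not the D2IA API. */
final class IntervalSketch {

    /**
     * Returns the average of each maximal run of consecutive values that
     * satisfy the given absolute condition (assumed interval semantics).
     */
    static List<Double> intervalAverages(double[] stream, DoublePredicate condition) {
        List<Double> averages = new ArrayList<>();
        double sum = 0.0;
        int count = 0;
        for (double value : stream) {
            if (condition.test(value)) {
                // The event satisfies the condition: extend the open interval.
                sum += value;
                count++;
            } else if (count > 0) {
                // The condition failed: close the open interval and emit its aggregate.
                averages.add(sum / count);
                sum = 0.0;
                count = 0;
            }
        }
        if (count > 0) {
            // Close an interval that is still open at the end of the input.
            averages.add(sum / count);
        }
        return averages;
    }
}
```

For example, with the input values 20, 31, 33, 25, 40 and the condition "value greater than 30", the sketch yields two intervals, one covering 31 and 33 and one covering 40.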


FlinkCEP is Apache Flink's library for complex event processing, and it is used to realize D2IA. A useful feature of FlinkCEP is the so-called iterative condition, which gives contextual access to the events matched so far for the specified pattern. With this, it is possible to check whether a new candidate match satisfies the condition in a D2IA specification. For relative conditions, however, the current implementation is inefficient, because the state of the matched events, e.g., the average value of the events matched so far, has to be recomputed for each new candidate match. To remove this inefficiency, the state calculation has to be shifted into the inner workings of FlinkCEP.
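
The cost of this recomputation can be sketched in plain Java. The class below is a hypothetical stand-in, not FlinkCEP code: it keeps the matched values in a list and, like a condition that iterates over all previously matched events, re-reads every stored value on each new candidate. The relative condition used here (the candidate must be at least the average of the matched values) is an assumed example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a relative condition that rescans all matches. */
final class RescanningCondition {
    private final List<Double> matched = new ArrayList<>(); // values accepted so far
    private long valuesRead = 0; // how many stored values were re-read in total

    /** Accepts the candidate if it is at least the average of the matched values. */
    boolean offer(double candidate) {
        double sum = 0.0;
        for (double value : matched) {
            // Every evaluation walks the whole match: O(n) work per candidate.
            sum += value;
            valuesRead++;
        }
        boolean accept = matched.isEmpty() || candidate >= sum / matched.size();
        if (accept) {
            matched.add(candidate);
        }
        return accept;
    }

    long valuesRead() {
        return valuesRead;
    }

    int size() {
        return matched.size();
    }
}
```

In this model, the number of values read per candidate grows with the length of the match, so the total work is quadratic in the number of accepted events.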

FlinkCEP maintains an internal state in the form of instances of the non-deterministic finite automaton (NFA) that represents the pattern to be matched. This state is defined in the class NFAState as a queue of ComputationState instances. Our objective is to extend the computation state with all types of aggregates needed for D2IA. Moreover, we need to modify the operation of the NFA so that the aggregate is updated incrementally whenever a candidate match is accepted. Thus, at condition evaluation time, we get O(1) access to the aggregates.
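
A minimal sketch of such an incrementally maintained aggregate is shown below. The class RunningAggregate and all of its methods are hypothetical and are not part of FlinkCEP; how such an object would be attached to ComputationState is exactly what the thesis would have to work out. The choice of count, sum, minimum, and maximum is likewise an assumption about which aggregates are needed.

```java
/** Hypothetical running aggregate; not part of FlinkCEP. */
final class RunningAggregate {
    private long count;
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Folds one accepted event value into the aggregate in constant time. */
    void accept(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    /** Example relative condition, answered in O(1) without rescanning events. */
    boolean atLeastAverage(double candidate) {
        return count == 0 || candidate >= sum / count;
    }

    /** Returns an independent copy, e.g. for a partial match that branches. */
    RunningAggregate copy() {
        RunningAggregate other = new RunningAggregate();
        other.count = count;
        other.sum = sum;
        other.min = min;
        other.max = max;
        return other;
    }

    long count() { return count; }
    double sum() { return sum; }
    double min() { return min; }
    double max() { return max; }

    /** Average of the accepted values, or NaN if nothing has been accepted. */
    double average() { return count == 0 ? Double.NaN : sum / count; }
}
```

The copy method reflects an assumption about the design: if several partial matches share a prefix and then diverge, each of them would presumably need its own aggregate.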
Graduation Theses defence year
2021-2022
Supervisor
Ahmed Awad
Spoken language(s)
English
Requirements for candidates
Moderate knowledge of Java is required. Familiarity with Apache Flink is preferred but not required.
Level
Masters
Keywords
#stream processing, CEP, interval analytics

Contact for applications
Name
Ahmed Awad
E-mail
ahmed.awad@ut.ee
See more
https://bigdata.cs.ut.ee/efficient-implementation-d2ia-top-flink-cep