Predictive Data Analytics Group

Research in data analytics and data science has grown significantly in recent years as a means of making sense of the vast amounts of available data. It has permeated every aspect of computer science and engineering and is heavily involved in business decision-making. For example, in the field of performance engineering, performance prediction is an instrument for controlling and improving the behaviour of a system. Analogously, data analytics plays an essential role for companies. In both examples, the streamlining of data analytics processes (DataOps) enables multi-tenant access to data and models. In addition, the pursuit of reproducible and reliable evaluations rules out “data analytics by server under desk” approaches and demands generic architectures and software stacks.

The definition of such stacks raises a multitude of questions related to software and performance engineering: (i) the choice of the low-level infrastructure layers, including the storage medium, redundancy mechanisms, and file systems; (ii) the choice of a storage system suited to the type of queries issued by analytics and machine learning tools; (iii) the choice of specific mechanisms and procedures for specific types of data, and of the right tools for those mechanisms; (iv) the choice of the right methodology for a specific problem.

However, in none of these cases is there a one-size-fits-all solution, as the concrete solution depends heavily on the type of analytics to be carried out. Therefore, dynamic aspects (scheduling) have to be considered as well as static ones (planning). To this end, fine-grained models of the entire stack and analysis