Data-Driven Machine Learning and Optimization Pipelines for Real-World Applications
- Koch, M.
- 01 September 2020
- Thesis in Leiden Repository
Machine learning is becoming an increasingly important technology for industry. It allows systems to learn from previous experience in an automated way and to make decisions based on the learned behavior. Machine learning enables the development of completely new products, such as autonomous driving, and of services that are driven purely by data. The development of such data-driven products is often a long procedure, and even applying machine learning algorithms to a specific problem is rarely straightforward. To illustrate this, this work introduces a data-driven service from the automotive industry called Automated Damage Assessment. Based on the experience gained from developing such data-driven services, this dissertation proposes a methodology for developing them accurately and quickly.

The Automated Damage Assessment service is based on sensor data, i.e. data recorded over time by vehicle on-board sensors. Combining the time series of more than one sensor yields a multivariate time series. Existing methods for multivariate time series classification problems are often complex, developed for specific problems, and not scalable. To overcome this, this work proposes suitable approaches of varying complexity. These approaches are applied to multiple publicly available data sets and to real-world data sets from the medical and industrial domains. The results show that two AutoML (Automated Machine Learning) approaches in particular, namely GAMA and ATM, as well as one of the proposed approaches (PHCP), are the most suitable for solving these particular multivariate time series problems.
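As an illustration of what a multivariate time series is, the following minimal sketch aligns two univariate sensor recordings into one multivariate series and reduces it to a fixed-length feature vector that a standard classifier could consume. It is not the method of the dissertation (it is neither PHCP nor one of the AutoML approaches); the sensor names, the values, and the helper functions `to_multivariate` and `summary_features` are hypothetical and chosen only for this example.

```python
from statistics import mean, pstdev


def to_multivariate(*channels):
    """Align equally long univariate series into one multivariate series."""
    lengths = {len(channel) for channel in channels}
    if len(lengths) != 1:
        raise ValueError("all sensor channels must have the same length")
    # One row per time step, one column per sensor.
    return [list(step) for step in zip(*channels)]


def summary_features(series):
    """Reduce a multivariate series to per-sensor mean and standard deviation."""
    features = []
    for column in zip(*series):  # iterate sensor by sensor
        features.extend([mean(column), pstdev(column)])
    return features


# Hypothetical recordings from two on-board sensors (made-up values).
speed = [10.0, 12.0, 14.0, 16.0]
accel = [0.0, 2.0, 2.0, 2.0]

series = to_multivariate(speed, accel)   # 4 time steps x 2 sensors
features = summary_features(series)      # [mean, std] for each sensor
print(series)
print(features)
```

Summary statistics of this kind discard the temporal ordering within each channel, so the sketch should be read only as a simple baseline representation, not as a recommendation.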