# PhD research on an industrial production process

Computer scientist Bas van Stein conducted research at Tata Steel and BMW into how their production processes could be streamlined and optimised using data. Along the way he developed several innovative methods. PhD defence 20 September.

Van Stein had the most room to innovate at Tata Steel, where dozens of different types of defect can arise in the steel during production and the company wanted to reduce their number.

## A photo for every millimetre

The company now checks the steel by hand or by eye only occasionally: equipment photographs every millimetre of steel produced, and the quality is assessed from these images in ‘real time’. All this information is stored in datasets, and the vast quantity of data is retained for ten years, but very little is done with it. Van Stein first had to work out what all that information meant, what it contained that could be useful to him, and what the various parameters represented. Quite a task. He also discovered that the data were incomplete. ‘This is normal’, says Van Stein. ‘Data are never perfect in practice. Malfunctions sometimes occur, and errors are made during manual input.’

## Missing data

Van Stein devised several new methods as part of his research. For instance, he developed a new algorithm for predicting the missing values in the datasets, which repairs the data column by column using a model; in this way, he gradually built up a fully repaired dataset. He also developed a new method for analysing patterns of missing values. This method quickly revealed where values were missing in a dataset and how those gaps correlated with each other, from which it was partly possible to deduce why the data were missing.
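The article does not describe which model Van Stein used, so the sketch below only illustrates the column-by-column idea under stated assumptions: ordinary least squares stands in for his model, the columns with the fewest gaps are repaired first, and each repaired column then serves as a predictor for the next. The second function shows one simple way to inspect patterns of missing values, by correlating the ‘is missing’ indicators of the columns that have gaps. Both function names are hypothetical, and NumPy is assumed.

```python
import numpy as np


def impute_column_by_column(X):
    """Fill NaNs one column at a time with a linear model (hypothetical stand-in).

    Each column with gaps is regressed on the columns that are already complete;
    once repaired, it joins the set of predictors for the next column.
    Assumes every column has at least a few observed values.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    complete = [j for j in range(X.shape[1]) if not missing[:, j].any()]
    # Repair the columns with the fewest gaps first.
    todo = sorted((j for j in range(X.shape[1]) if missing[:, j].any()),
                  key=lambda j: missing[:, j].sum())
    for j in todo:
        observed = ~missing[:, j]
        # Design matrix: an intercept plus every column repaired so far.
        A = np.column_stack([np.ones(len(X))] + [X[:, k] for k in complete])
        coef, *_ = np.linalg.lstsq(A[observed], X[observed, j], rcond=None)
        X[~observed, j] = A[~observed] @ coef
        complete.append(j)
    return X


def missingness_correlation(X):
    """Correlate the 'value is missing' indicators of columns that have gaps."""
    M = np.isnan(np.asarray(X, dtype=float))
    cols = [j for j in range(M.shape[1]) if 0 < M[:, j].sum() < len(M)]
    return cols, np.corrcoef(M[:, cols].astype(float), rowvar=False)
```

Strongly positive correlations point to columns that tend to go missing together, for instance because they depend on the same piece of equipment; that is the kind of clue from which the cause of the gaps can partly be deduced.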

## Deviations

Van Stein also devised a new outlier-detection algorithm, which finds deviating data points in a multidimensional dataset, in this case applied to the steel surface. To decide whether a data point is abnormal, the algorithm does not only check whether it deviates from the complete dataset, as most algorithms in this domain do. It also compares the deviation with datasets (steel rolls) that are comparable to the one the data point belongs to. As a result, the algorithm is better able to determine which parts of a long steel roll have unusual properties.
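The article gives only the principle, so here is a minimal sketch of it under loudly stated assumptions: each roll is a NumPy array of feature vectors, ‘comparable’ rolls are the `k` rolls whose mean feature vector is closest, and deviation is measured as the norm of per-feature z-scores. The names `contextual_outliers`, `standardised_distance`, `k` and `threshold` are hypothetical; this is not Van Stein’s published algorithm.

```python
import numpy as np


def standardised_distance(points, reference):
    """Euclidean norm of each point's per-feature z-scores w.r.t. `reference`."""
    mu = reference.mean(axis=0)
    sd = reference.std(axis=0) + 1e-12  # guard against constant features
    return np.linalg.norm((points - mu) / sd, axis=1)


def contextual_outliers(rolls, target, k=3, threshold=3.0):
    """Flag points of rolls[target] that deviate both from all data and from
    the k rolls whose mean feature vector is closest (hypothetical scheme)."""
    all_points = np.vstack(rolls)
    means = np.array([roll.mean(axis=0) for roll in rolls])
    dist = np.linalg.norm(means - means[target], axis=1)
    dist[target] = np.inf  # a roll is not its own neighbour
    neighbours = np.argsort(dist)[:k]
    reference = np.vstack([rolls[i] for i in neighbours])
    points = rolls[target]
    global_score = standardised_distance(points, all_points)
    local_score = standardised_distance(points, reference)
    return (global_score > threshold) & (local_score > threshold)
```

Requiring both scores to be high means that a point which looks extreme against the whole dataset, but is typical for rolls of its kind, is not flagged.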

## Improvement of kriging

Cluster kriging, developed by Van Stein and Hao Wang, is an improved way to apply kriging to large datasets. Kriging is a machine-learning method for modelling numerical data so that they can be used to make predictions. Applied to the production process, it statistically determines the relationship between input and output.

The advantage of kriging over similar algorithms is that it not only predicts the value at an as-yet-unobserved data point, but also indicates how certain that prediction is. Its disadvantage is that building the model is very slow on large datasets, because the computational cost grows with the cube of the number of data points. Van Stein overcame this with a clever way of splitting the data into smaller sets.
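A minimal sketch of the splitting idea, assuming NumPy, a squared-exponential kernel with fixed (not fitted) hyperparameters, k-means to split the data, and the simplest possible way of combining the models: each query is answered by the model of its nearest cluster. The published cluster kriging method may split and combine differently; the class and function names here are hypothetical.

```python
import numpy as np


def rbf(A, B, length):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)


class Kriging:
    """Kriging with fixed hyperparameters and the sample mean as trend."""

    def __init__(self, length=1.0, noise=1e-6):
        self.length, self.noise = length, noise

    def fit(self, X, y):
        self.X, self.mu = X, y.mean()
        K = rbf(X, X, self.length) + self.noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)  # O(n^3): the expensive step
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y - self.mu))
        return self

    def predict(self, Xq):
        Ks = rbf(Xq, self.X, self.length)
        mean = self.mu + Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.clip(1.0 - (v**2).sum(axis=0), 0.0, None)  # prior variance is 1
        return mean, var


def nearest(X, centres):
    """Index of the closest centre for every row of X."""
    return np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1), axis=1)


class ClusterKriging:
    """Split the data with k-means, fit one Kriging model per cluster and
    answer each query with the model of its nearest cluster centre."""

    def __init__(self, k=4, iters=20, seed=0, **kriging_args):
        self.k, self.iters, self.seed, self.args = k, iters, seed, kriging_args

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        centres = X[rng.choice(len(X), self.k, replace=False)]
        for _ in range(self.iters):  # plain k-means (Lloyd's algorithm)
            labels = nearest(X, centres)
            centres = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centres[c] for c in range(self.k)])
        labels = nearest(X, centres)
        keep = [c for c in range(self.k) if np.any(labels == c)]  # drop empty clusters
        self.models = [Kriging(**self.args).fit(X[labels == c], y[labels == c]) for c in keep]
        self.centres = centres[keep]
        return self

    def predict(self, Xq):
        labels = nearest(Xq, self.centres)
        mean, var = np.empty(len(Xq)), np.empty(len(Xq))
        for c, model in enumerate(self.models):
            mask = labels == c
            if mask.any():
                mean[mask], var[mask] = model.predict(Xq[mask])
        return mean, var
```

Because factorising an n × n matrix costs on the order of n³ operations, fitting k models of about n/k points each costs roughly n³/k², which is where the speed-up comes from. The returned variance is the ‘certainty’ mentioned above: close to zero near the training data and rising towards the prior variance of 1 far away from it.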

## New approach

An important step after that was testing the improvements Van Stein had devised, which also opened the way to further optimisation. ‘You obviously can’t do that immediately in the factory, because it would be much too risky.’ Model-driven optimisation was therefore the next step: once the model had been trained, it could be used to optimise the relationship between input and output. For this it was important to minimise the uncertainty of the predictions. Van Stein developed an approach that can calculate this uncertainty for any kind of model, not only for kriging.
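The article does not say how Van Stein’s model-independent approach works. Bootstrap ensembles are one common, model-agnostic way to attach an uncertainty to any model’s predictions, so the sketch below uses that technique as a plainly named stand-in. It then picks the candidate input setting whose predicted defect rate is lowest after adding a penalty of `kappa` standard deviations, so that settings with uncertain predictions are avoided. All function names, the polynomial example model and `kappa` are hypothetical; NumPy is assumed.

```python
import numpy as np


def bootstrap_predict(fit, X, y, Xq, n_models=20, seed=0):
    """Mean and spread of predictions from models trained on bootstrap resamples.

    `fit(X, y)` may train any kind of model; it only has to return a function
    that maps query inputs to predictions.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # resample rows with replacement
        predictor = fit(X[idx], y[idx])
        preds.append(predictor(Xq))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)


def polynomial_fit(degree):
    """Example model: a 1-D polynomial least-squares fit (hypothetical choice)."""
    def fit(X, y):
        coef = np.polyfit(X[:, 0], y, degree)
        return lambda Xq: np.polyval(coef, Xq[:, 0])
    return fit


def pick_setting(fit, X, y, candidates, kappa=2.0):
    """Pick the candidate input with the lowest predicted defect rate,
    penalising uncertain predictions by kappa standard deviations."""
    mean, std = bootstrap_predict(fit, X, y, candidates)
    best = int(np.argmin(mean + kappa * std))
    return candidates[best], mean[best], std[best]
```

The spread grows for settings far from the data the model was trained on, which is exactly where trying them out in the factory would be most risky.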

## Turning the knobs

A model lets you turn the knobs and see what happens. What is the effect of adjusting one parameter, or another? ‘At first, you could see in the model that a specific solution caused another problem somewhere else, but the ability to overcome this got better and better.’ It was not possible to predict all types of defects, but Tata Steel was still satisfied: the company now has much more insight into its data and what it can achieve with them.

At BMW there was not enough data to use the same approach, but the company can now adjust the production process manually when the steel quality deviates slightly.

## Freedom to publish?

To what extent was Van Stein free to publish his research in his dissertation? ‘It was agreed in the contract that the companies’ source data could not be published. But apart from that, there were very few restrictions. The results and visualisations in the dissertation are all real, and so too are the methods that I developed, of course.’

## Learning path for everyone

Van Stein describes the trajectory at the companies as a learning path for both parties: for himself and for the companies. For instance, they had to learn how to communicate effectively and efficiently with each other. Another factor is that corporate culture is geared towards maximising financial returns, which is not the case at the university. And what struck him most? ‘I was surprised that in one of the companies they talked about the possibility of the workers not wanting to cooperate, for example with carrying out an extra action. In that case, they wouldn’t have introduced the improvement. That was something I hadn’t expected.’

Text: Corine Hendriks