Toward a reliable cloud

17 December 2020

Clouds are of increasing importance in our daily lives. It is thus crucial that they work properly and reliably. Alex Uta, assistant professor at LIACS, received a Veni grant to investigate the reproducibility of experiments in the cloud.

Digital life in the cloud

‘Almost everything you nowadays do with a computer touches a cloud,’ Uta explains. ‘Sending an e-mail, having video meetings, using social media, transferring money through your bank account; all these operations go through a cloud or are in some way connected to the use of a cloud. A lot of these things are time sensitive. You want these operations to be reliable and consistent, taking the same amount of time every time you do them. If you for instance transfer money, you want it to happen as soon as possible, and not that it takes five seconds today and two hours tomorrow. However, the clouds do not operate in such a manner at the moment.’

Cloud design

The problem is caused in large part by the way clouds are designed. A cloud is a piece of virtual hardware. This means that it looks similar to a personal computer: it gives each user the illusion that only one person is using a piece of hardware at a given time. In the cloud, this is not the case. In reality, multiple users work side by side on the same piece of hardware. Those users compete with each other, all trying to run their processes at the same time. In doing so, they influence the performance of each other’s processes, which can make the time it takes to run a process vary from one run to the next. Uta: ‘If there is a lag in the conversation you have with one other person on Teams, that will not cause much of a problem. However, if you are teaching 200 students online and there are repeated inconsistencies, that will be a problem, as they will miss parts of what you are saying. Thus, you want all those operations to be consistent in the time they take to proceed, in order to have a reliable system. In my project, I want to come up with ways to quantify the time it takes to process certain operations, so we can offer certain guarantees.’

Testing and experimenting

The design of the cloud is one problem, but the development and testing of applications plays an equally important role. There is no standard way of running an experiment or reporting on it. ‘Developers and researchers claim something works within a specified time interval, but that is not the real certainty we need,’ Uta stresses. ‘At the moment, we do not give proper statistical guarantees that something really works within some amount of time. What we need is a clear description of the experiment and its conditions, and a proper framework for strong statistical guarantees of performance results. Only then is everyone able to run the same experiment under the same conditions, which of course brings us closer to the goal of reproducibility.’
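To make the idea of a statistical guarantee more concrete, the sketch below times an operation repeatedly and reports a confidence interval for the median running time instead of a single number. It is only an illustration and not the framework Uta is developing: the bootstrap is one common technique chosen here as an assumption, and the names `measure`, `bootstrap_median_ci` and `example_operation` are hypothetical.

```python
import random
import statistics
import time


def measure(operation, repetitions=30):
    """Run `operation` repeatedly and return each run's elapsed time in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return samples


def bootstrap_median_ci(samples, confidence=0.95, resamples=2000, seed=0):
    """Estimate a confidence interval for the median by bootstrap resampling."""
    rng = random.Random(seed)  # fixed seed so the analysis itself is repeatable
    # Resample the measurements with replacement and record each resample's median.
    medians = sorted(
        statistics.median(rng.choices(samples, k=len(samples)))
        for _ in range(resamples)
    )
    # Take the percentiles of the resampled medians as the interval bounds.
    alpha = 1.0 - confidence
    lower = medians[int((alpha / 2) * resamples)]
    upper = medians[min(resamples - 1, int((1 - alpha / 2) * resamples))]
    return lower, upper


def example_operation():
    # Hypothetical stand-in for the operation under test (e.g. a money transfer).
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    samples = measure(example_operation)
    low, high = bootstrap_median_ci(samples)
    print(f"median: {statistics.median(samples):.6f} s")
    print(f"95% bootstrap CI for the median: [{low:.6f}, {high:.6f}] s")
```

Reporting the interval together with the number of repetitions and the conditions of the run gives others something they can compare their own measurements against.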

Mock-ups of reality

The experiments Uta is referring to are the experiments programmers run to make sure that a system functions properly. These experiments are in effect small mock-ups of reality. For a banking system, for example, it is necessary to test the money transfer process. To make sure the result of the test is reliable, programmers need to run an experiment multiple times. ‘It is thus crucial that the experiment is reproducible, so that the process in the test phase does not differ from the process when thousands of users are using the system in real life,’ Uta says.

Dangerous situations

The importance of preventing such differences in processing time, and of having a reliable cloud, must not be underestimated. Not only do our e-mails, video meetings and document storage happen in the cloud, but so do (parts of) car navigation and perhaps the control of traffic lights. If these operations are not consistent in the time they take, needing one second on one occasion and ten seconds on another, people will miss their exit or, in the traffic light example, might even crash into another car. ‘In the future, this will get even more important. Self-driving cars will communicate with clouds, but if the variability in the cloud is too big, these cars could brake too late or not have an update in time, causing dangerous situations,’ Uta explains.

Change of methods

Uta expects that the biggest challenge of his project will probably not be the research itself, but convincing people to adopt the standard for reproducible experiments that he comes up with. At the moment, people do not have such standards, and it will be a challenge to convince them to use a systematic way of running performance experiments. But Uta is confident that changes are possible: ‘I hope that in four or five years, when I have finished this project, the state of the art in running experiments will have improved by a good margin.’

Text: Chris Flinterman