New paradigm for visual recognition

03 February 2017

Leiden University computer scientists Yu Liu, Yanming Guo and Michael Lew are a step closer to their ultimate goal: search engines with visual recognition. Their publication of a new algorithm for fusing multi-scale deep learning representations has been received with great enthusiasm. No other algorithm in the world is, at this moment, better able to recognize images.

The forest vs. the trees

All three researchers are in the Deep Learning group of the Leiden Institute of Advanced Computer Science (LIACS). Associate professor Michael Lew: ‘Multi-scale deep learning representations are models of visual concepts that depend on both many neural layers and on diverse scale information. You can imagine scale as different zoom levels on a camera or seeing the forest vs. the trees.’

Best paper out of 198

The computer scientists presented their publication titled On the Exploration of Convolutional Fusion Networks for Visual Recognition at the 23rd International Conference on MultiMedia Modeling and received the Best Paper Award for it. This was out of 198 submissions to the conference. Michael Lew: ‘With our algorithm, called Convolutional Fusion Networks, or CFNs, we present a new paradigm of so-called convolutional neural networks where the information from earlier layers is directly exploited.’

Size matters

The algorithm has the top accuracy worldwide in the for the area well known CIFAR-100 visual concept recognition test for comparable size networks. ‘In current deep learning research, an important factor is the size of the network. Larger networks are expected to perform better but also require significantly more computational and training resources. Because not everyone has a supercomputer at their disposal, each real world situation may have different computational and training resource requirements. CFNs have been shown to have the best accuracy for a specific network size.’

Find comparable images

Michael Lew’s overall research aim is to make visual recognition common for search engines like Google Images. He wants to enable us to search for images on the internet by just providing the search engine with one comparable image. ‘Right now, search engines work with the description of images that is submitted in words. But sometimes it is so difficult to exactly describe what you are looking for. In computer aided diagnosis, a doctor may want to find the most similar image in their database to assist in the diagnosis. Or if you want to buy a pink scarf, it would be great to provide the internet with a picture of a scarf that you like, and have it suggest similar ones that either look nicer or are more affordable.’