Deep Learning: Science Fiction or Reality?

Ready or not?


There is no need to learn frameworks such as TensorFlow to apply deep learning successfully. But how complex are actual deep learning solutions for machine vision, and which applications are they suited for?


Images 1 + 2 | Modern deep learning networks can be trained with typically 20 to 50 training images within five to ten minutes on a modern GPU platform, e.g. object classification of sushi meals. (Pictures: Adaptive Vision – Future Processing Sp. z o.o.)

From the point of view of a regular user of machine vision systems, the topic of deep learning can be confusing. This highly disruptive technology is relatively young, and different experts in the field give contradictory explanations of the subject. Some say that deep learning can behave like a human being; others say it is just sophisticated classification. There is also the question of the skills required to take advantage of it effectively. Does it require a PhD in machine learning, or is it suitable for every regular production worker? In short, each of these answers may be correct depending on the context the speaker has in mind. In this article I will explain it from the point of view of vision-based inspection systems.

The first time I encountered deep learning in reality was when a colleague of mine showed me how well his brand-new phone could translate his voice into written notes. At the time I had only seen some early voice recognition systems for cars, which had real trouble understanding a simple command such as 'home'. Then the progress was fast and remarkable. Soon after, we heard about deep learning being used successfully to recognize hand-written letters, to classify objects in images, and in programs that played chess better than any program had before. Today the most interesting commercial topics are medical diagnosis, autonomous driving, automated harvesting and industrial quality inspection.

Deep learning for industrial vision

To discuss these applications efficiently, we must not confuse the underlying theory and technology with real-world products. This is like discussing software in the 1980s or early 1990s. Some notable people thought at that time that everyone would be coding in the future. This did not come true. What we needed were software products that anyone could learn, such as office suites or CAD. It is now similar with deep learning. We cannot expect industrial engineers to learn deep learning frameworks such as TensorFlow or Caffe, or the Python programming language. These are tools for specialists, who use them to develop software tools for specific applications. One application is recognition of roads and traffic in autonomous vehicles; another is software that recognizes cancer in X-ray images of lungs. For industrial quality inspection it becomes a bit more complicated, as each application may be totally different from the next. Nevertheless, we concluded – and our findings here are consistent with those of other key software suppliers – that we can identify several classes of tools that cover a large variety of cases.

Object classification

The most basic class of tools that introduces deep learning to industrial vision systems is Object Classification. It is well known from readily available open-source networks that accept an entire image as input and produce the name of the most prominent object in that image. It can be used to classify different types of fruit or meat, and can also serve as a post-processing step in defect detection applications (to identify the type of defect). It can be fast and very reliable, but the number of real-world applications is rather limited.
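The idea of such a classifier can be illustrated with a minimal, stdlib-only sketch: a real product would use a deep network to turn each image into a feature vector, which we replace here with hand-made toy numbers (the "sushi" classes and feature values are purely illustrative assumptions, not anything from an actual tool).

```python
# Toy sketch of image classification: compare a feature vector against
# per-class centroids learned from a few training samples. In a real
# system the features would come from a (pre-trained) deep network;
# here they are invented two-number vectors.

def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train(samples_by_class):
    """samples_by_class: {class_name: [feature_vector, ...]}"""
    return {name: centroid(vecs) for name, vecs in samples_by_class.items()}

def classify(model, features):
    """Return the class whose centroid is nearest to the given features."""
    return min(model, key=lambda name: distance(model[name], features))

# Invented 'sushi meal' features, e.g. (redness, rice_ratio):
model = train({
    "nigiri": [(0.8, 0.5), (0.7, 0.6)],
    "maki":   [(0.2, 0.9), (0.3, 0.8)],
})
print(classify(model, (0.75, 0.55)))  # prints "nigiri"
```

The whole point of the deep learning tool is that the feature extraction itself is learned, so the user only supplies labeled example images.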


Image 3 | Another important class of deep learning tools are those for defect detection. (Pictures: Adaptive Vision – Future Processing Sp. z o.o.)

Instance segmentation

A significantly more interesting class of tools in industrial vision systems is Location and Instance Segmentation. The former finds object coordinates or bounding rectangles, while the latter also produces the precise outline or region. In its general goals this is very similar to the traditional Template Matching algorithms we know from major software libraries. By comparing edges or feature points extracted from one training template with what is found in an input image, these traditional tools were able to identify the locations of the objects of interest. We were happy to claim that our tools were rotation or scale invariant, and could even handle a lot of background clutter and shape incompleteness. However, we were still limited to working with reasonably rigid and well-defined shapes, such as automotive parts or electrical components in a controlled environment. What deep learning brings to this field is the ability to locate highly variable objects: ones that come in a variety of poses or that are not uniformly illuminated. Notable examples include locating fruit on trees, counting spermatozoa for medical purposes, or robotic picking of packed pieces of cloth from a container. Even for automotive parts, which could be detected effectively with traditional template matching tools, we can now extend these applications to cases with imperfect illumination or non-repeatable positioning.
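The relation between the two output types can be sketched with a toy post-processing step, assuming the network has already produced a per-pixel mask: grouping mask pixels into separate instances and deriving a bounding rectangle for each. This is only an illustration of the output format, not how any particular product works internally.

```python
# Toy sketch: turn a binary per-pixel mask (as a segmentation network
# might output) into object instances via 4-connected flood fill, and
# report a bounding rectangle per instance.

def find_instances(mask):
    """mask: list of rows of 0/1.
    Returns [(min_row, min_col, max_row, max_col), ...], one per object."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, pixels = [(r, c)], []
                seen[r][c] = True
                while stack:                      # flood fill one instance
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes

mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(find_instances(mask))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```

A Location tool would stop at the rectangles; an Instance Segmentation tool would also keep the exact pixel region of each object.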

Defect detection

Another important class of tools is defect detection. There are two very different approaches to this subject, each with its own unique advantages. We call them Feature Detection and Anomaly Detection respectively (they are also known as the supervised and semi-unsupervised training modes). The former requires preparing a set of training images with carefully marked pixels that correspond to the defects (or features) we want to detect. The task of the neural network is then to learn to reproduce those results on incoming input images. It is called supervised mode because the network is trained with the explicit outputs it is expected to produce for specific inputs. The latter tool is different in that its training focuses on good samples only, and it is expected to detect any deviation of shape, surface, color etc. It may look more attractive to many, as it does not require any definition of defects and data labeling is much easier. The drawback, however, is that the task is not strictly defined, and it is not as effective at detecting tiny or weak defects as the supervised mode.
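The anomaly-detection idea can be sketched in a few lines: a model built only from good samples, and any new sample flagged when it deviates too much from what was learned. The tiny "images", the per-pixel-mean model and the threshold value are all simplifying assumptions; real products use far more sophisticated models of what "good" looks like.

```python
# Toy sketch of anomaly detection: learn a pixel-wise mean from good
# samples only (here, 4-pixel grayscale "images" as flat lists) and
# score a new sample by its largest deviation from that mean.

def learn_good(samples):
    """Pixel-wise mean over a list of equal-length good samples."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def anomaly_score(mean, sample):
    """Largest per-pixel deviation from the good-sample mean."""
    return max(abs(a - b) for a, b in zip(mean, sample))

good = [[10, 10, 12, 11], [11, 10, 11, 10], [10, 11, 11, 11]]
mean = learn_good(good)
THRESHOLD = 5  # assumed tolerance, chosen for illustration

print(anomaly_score(mean, [10, 11, 11, 10]) > THRESHOLD)  # False: looks good
print(anomaly_score(mean, [10, 30, 11, 10]) > THRESHOLD)  # True: one pixel far off
```

This also shows the weakness mentioned above: a tiny or weak defect barely moves the score, so the supervised mode, trained on explicitly marked defect pixels, remains more sensitive.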


Image 4 | Customers are expected to use the right tool for each application and provide training images representing all different product cases. (Pictures: Adaptive Vision – Future Processing Sp. z o.o.)

20 to 50 training images

What is very important is that a very limited set of tools can cover a wide variety of industrial applications without the need to design custom neural network architectures for each particular case. There is no need to learn TensorFlow to apply deep learning successfully. Customers are just expected to use the right tool for each application and provide training images representing all the different product cases. Contrary to popular belief, it is also not required to provide hundreds or thousands of object samples. The most advanced software manufacturers provide highly sophisticated solutions that can typically be trained with 20 to 50 training images within five to ten minutes on a modern GPU platform. This is possible thanks to techniques such as the use of pre-trained networks, advanced preprocessing and artificial generation of training data from just a few initial samples. The details of these solutions are trade secrets of each supplier and required quite a lot of effort to bring to reality, but for a regular user they are just off-the-shelf solutions that can be applied instantly in a huge variety of applications.
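The artificial generation of training data mentioned above can be illustrated with a minimal sketch: a single labeled sample (a tiny 2D grid standing in for an image) is multiplied into several training variants by simple geometric transforms. The choice of flips and 90° rotations is an assumption for illustration; suppliers' actual augmentation pipelines are, as noted, trade secrets.

```python
# Toy sketch of training-data augmentation: generate all distinct
# flip/rotation variants of one small "image" (a list of rows).

def hflip(img):
    """Mirror the image left-right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """All distinct images reachable by flips and 90-degree rotations."""
    variants, current = [], img
    for _ in range(4):
        for candidate in (current, hflip(current)):
            if candidate not in variants:
                variants.append(candidate)
        current = rot90(current)
    return variants

sample = [[1, 0],
          [0, 0]]
print(len(augment(sample)))  # 4 distinct variants of this asymmetric sample
```

In this way a handful of initial samples can already yield a usefully larger and more varied training set, which, combined with a pre-trained network, is what makes 20 to 50 images sufficient.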

Conclusion

Nothing I have described here is just a research topic or an idea for future generations. These are finished products that are entering practical use at an unprecedented pace. A major challenge now appears to be our ability to change our mindsets quickly enough to keep up with that progress. This may be particularly difficult for more traditional industries, where major revolutions used to take decades, while today those who are not progressive enough may become obsolete in a matter of two or three years.
