A considerably more interesting class of tools in industrial vision systems comprises Location and Instance Segmentation. The former finds object coordinates or bounding rectangles, while the latter also produces the precise outline or region of each object. In their general goals, these tools closely resemble the traditional Template Matching algorithms known from major software libraries. By comparing edges or feature points extracted from a single training template with what is found in an input image, those traditional tools could identify the locations of the objects of interest. We were happy to claim that our tools were rotation- or scale-invariant, and that they could even handle a lot of background clutter and incomplete shapes. However, we were still limited to reasonably rigid, well-defined shapes, such as automotive parts or electrical components, in a controlled environment. What deep learning brings to this field is the ability to locate highly variable objects: ones that appear in a variety of poses or that are not uniformly illuminated. Notable examples include locating fruit on trees, counting sperm cells for medical purposes, and robotic picking of packed pieces of cloth from a container. Even for automotive parts, which could already be detected effectively with traditional template matching tools, we can now extend these applications to cases with imperfect illumination or non-repeatable positioning.
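To make the traditional approach concrete, here is a minimal sketch of template matching by normalized cross-correlation on raw intensities. This is a simpler relative of the edge-based and feature-based matching mentioned above, not the algorithm of any particular library. The function name `match_template_ncc` is hypothetical, the search is brute force, and the sketch deliberately handles neither rotation nor scale, which illustrates how rigid such a tool is without further engineering.

```python
import numpy as np

def match_template_ncc(image, template):
    """Return (row, col, score) of the best match of `template` in `image`.

    Hypothetical helper for illustration: slides the template over every
    position and scores each patch with zero-mean normalized
    cross-correlation (score in [-1, 1], 1 means a perfect match up to
    brightness offset and positive contrast gain).
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    ih, iw = image.shape

    # Zero-mean template and its norm are computed once.
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())

    best = (0, 0, -1.0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                # Flat patch (or flat template): correlation is undefined.
                continue
            score = float((p * t).sum() / denom)
            if score > best[2]:
                best = (r, c, score)
    return best
```

Because the score is normalized, a uniform change in brightness or contrast does not affect the result, but non-uniform illumination, deformation, or a change of pose does. Those are exactly the cases where the deep-learning tools described above are claimed to help.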
Another important class of tools is that for defect detection. There are two very different approaches to this subject, each with its own advantages. We call them Feature Detection and Anomaly Detection, respectively (they are also known as the supervised and semi-unsupervised training modes). The former requires preparing a set of training images in which the pixels that correspond to the defects (or features) we want to detect are carefully marked. The task of the neural network is then to learn to reproduce those markings on incoming images. It is called the supervised mode because the network is trained with the explicit outputs it is expected to produce for specific inputs. The latter tool differs in that its training focuses on good samples, and it is expected to detect any deviation in shape, surface, color and so on. It may look more attractive to many, as it requires no definition of defects and data labeling is much easier. The drawback, however, is that the task is not strictly defined, and this mode is less effective at detecting tiny or weak defects than the supervised mode.
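The idea behind training on good samples only can be sketched without any neural network. The example below is a deliberately simple statistical stand-in, not the method used by the tools described above. It learns a per-pixel mean and standard deviation from defect-free images and flags pixels that deviate by more than `k` standard deviations. The function names and the parameters `k` and `min_std` are assumptions made for illustration, and the sketch assumes the images are aligned and of equal size.

```python
import numpy as np

def fit_good_model(good_images):
    """Learn per-pixel mean and standard deviation from defect-free images.

    Hypothetical helper: assumes all images are aligned and share one shape.
    """
    stack = np.stack([np.asarray(g, dtype=float) for g in good_images])
    return stack.mean(axis=0), stack.std(axis=0)

def anomaly_mask(image, mean, std, k=4.0, min_std=1.0):
    """Return a boolean mask of pixels deviating more than k sigma.

    `min_std` is a floor that avoids division by near-zero variance in
    pixels that were almost constant across the good samples.
    """
    z = np.abs(np.asarray(image, dtype=float) - mean) / np.maximum(std, min_std)
    return z > k
```

No defect labels are needed, only a collection of good images. The trade-off noted above shows up here as well: a defect whose contrast is comparable to the normal variation between good samples falls below the threshold, and lowering `k` to catch it raises the number of false alarms.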