Event-Based Vision

Event-based vision is poised to take over from the frame-based approach used by traditional film, digital and mobile phone cameras in many machine-vision applications.

Image 1 | Prophesee has developed an image sensor containing an array of autonomously operating pixels that combine an asynchronous level-crossing detector with a separate exposure measurement circuit. Each exposure measurement by an individual pixel is triggered by a level-crossing event. (Image: Prophesee)

The mode of operation of state-of-the-art image sensors is useful for exactly one thing: photography, i.e. for taking an image of a still scene. Exposing an array of pixels for a defined amount of time to the light coming from such a scene is an adequate procedure for capturing its visual content. Such an image is a snapshot taken at one point in time and contains no dynamic information. Nonetheless, this method of acquiring visual information is also used in practically all machine vision systems for capturing and understanding dynamic scenes. This approach is seemingly supported by the way movies are made for human observers. The observation that visual motion appears smooth and continuous if viewed above a certain frame rate is, however, more related to characteristics of the human eye and brain than to the quality of the acquisition and encoding of the visual information as a series of still images. As soon as change or motion is involved, which is the case for almost all machine vision applications, the paradigm of frame-based acquisition becomes fundamentally flawed. If a camera observes a dynamic scene, no matter what frame rate is chosen, it will always be wrong. As different parts of a scene usually have different dynamic contents, a single sampling rate governing the exposure of all pixels in an imaging array will naturally fail to yield adequate acquisition of these different scene dynamics present at the same time.

Individual Pixel’s Sampling Points in Time

An 'ideal' image sensor samples parts of the scene that contain fast motion and changes at high sampling rates and slowly changing parts at slow rates, all at the same time – with the sampling rate going to zero if nothing changes. Obviously, this will not work using one common single sampling rate, the frame rate, for all pixels of a sensor. Instead, one wants to have as many sampling rates as there are pixels in the sensor – and let each pixel’s sampling rate adapt to the part of the scene it sees. Achieving this requires putting each individual pixel in control of adapting its own sampling rate to the visual input it receives. This is done by introducing into each pixel a circuit that reacts to relative changes of the amount of incident light, so defining the individual pixel’s sampling points in time. As a consequence, the entire image data sampling process is no longer governed by a fixed (artificial) timing source (the frame clock) but by the signal to be sampled itself, or more precisely by the variations of the signal over time in the amplitude domain. The output generated by such a camera is no longer a sequence of images but a time-continuous stream of individual pixel data, generated and transmitted conditionally, based on what is happening in the scene.
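
To make the idea concrete, the following Python sketch simulates such a change detector for a single pixel. It is a minimal illustration assuming a fixed relative-change threshold and a comparison in the log domain; the function name, threshold value and data layout are assumptions for this example and do not describe Prophesee's actual pixel circuit.

```python
import math

# Minimal, purely illustrative sketch of per-pixel level-crossing sampling.
# The threshold value and the data layout are assumptions for this example;
# they do not describe Prophesee's actual pixel circuit.

def level_crossing_events(samples, threshold=0.15):
    """Return (time, log_intensity) samples for one pixel, taken whenever the
    log intensity deviates from the last sampled value by more than `threshold`.
    Working in the log domain makes the detector sensitive to *relative*
    changes of the incident light, as described in the text."""
    last_ref = None
    events = []
    for t, intensity in samples:
        log_i = math.log(max(intensity, 1e-6))
        if last_ref is None:
            last_ref = log_i                  # initial reference, no event
        elif abs(log_i - last_ref) > threshold:
            last_ref = log_i                  # reset the reference to the new level
            events.append((t, log_i))         # this pixel samples *now*
    return events

# A pixel looking at a static scene produces no events, while a pixel seeing a
# steadily brightening spot produces one event per step: the sampling rate
# adapts to the local scene dynamics.
static = [(t, 100.0) for t in range(10)]
changing = [(t, 100.0 * 1.2 ** t) for t in range(10)]
print(len(level_crossing_events(static)), "events for the static pixel")      # 0
print(len(level_crossing_events(changing)), "events for the changing pixel")  # 9
```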

Image 2 | The result of the exposure measurement is asynchronously output off the sensor together with the pixel’s coordinates in the sensor array. (Image: Prophesee)

Autonomously Operating Pixels

Following this paradigm, Prophesee has developed an image sensor containing an array of autonomously operating pixels that combine an asynchronous level-crossing detector with a separate exposure measurement circuit. Each exposure measurement by an individual pixel is triggered by a level-crossing event. Inspired by biology, every pixel in these sensors optimizes its own sampling depending on the visual information it sees. In case of rapid changes, the pixel samples at a high rate. Conversely, if nothing happens, the pixel stops acquiring redundant data and goes idle until things start to happen again in its field of view. Hence each pixel independently samples its illuminance upon detection of a change of a certain magnitude in that illuminance, thus re-measuring its new light level after it has changed. The result of the exposure measurement (i.e. the new gray level) is asynchronously output off the sensor together with the pixel’s coordinates in the sensor array. As a result, image information is not acquired and transmitted frame-wise but continuously, and conditionally only from parts of the scene where there is new visual information. In other words, only information that is relevant – because unknown – is acquired, transmitted, stored and processed by machine vision algorithms. This way, both the acquisition of highly redundant and useless data by over-sampling static or slow parts of the scene, and the under-sampling of fast scene dynamics due to fixed frame rates, can be eliminated.
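
The sketch below shows what such an asynchronous readout could look like on the receiving side: each event carries a timestamp, the pixel's coordinates and the newly measured gray level, and a host only has to touch the pixels that actually reported new information. The field names and layout are assumptions for illustration, not the actual sensor output format.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Illustrative record for the readout scheme described above: each exposure
# measurement leaves the sensor together with a timestamp and the pixel's
# coordinates. Field names and the 8-bit gray value are assumptions for this
# sketch, not the actual sensor output format.

@dataclass
class ExposureEvent:
    t_us: int   # timestamp of the level-crossing that triggered the measurement
    x: int      # pixel column in the sensor array
    y: int      # pixel row in the sensor array
    gray: int   # newly measured gray level of that pixel

def apply_events(image: List[List[int]], events: Iterable[ExposureEvent]) -> List[List[int]]:
    """Keep a conventional image representation up to date from the event
    stream: only pixels that actually reported new information are touched."""
    for ev in events:
        image[ev.y][ev.x] = ev.gray
    return image

# Example: a 4x4 'image' in which only two pixels changed since the last update.
image = [[0] * 4 for _ in range(4)]
stream = [ExposureEvent(t_us=10, x=1, y=2, gray=128),
          ExposureEvent(t_us=37, x=3, y=0, gray=200)]
apply_events(image, stream)
```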

Image 3 | The results are no longer a sequence of images but a time-continuous stream of individual pixel data, generated and transmitted conditionally, based on what is happening in the scene. (Image: Prophesee)

Pixel acquisition and readout times of milliseconds to microseconds are achieved, resulting in temporal resolutions equivalent to conventional sensors running at tens to hundreds of thousands of frames per second. Now, for the first time, the strict temporal resolution vs. data rate tradeoff that limits all frame-based vision acquisition can be overcome. As the temporal resolution of the image data sampling process is no longer governed by a fixed clock driving all pixels, the data volume of the sensor output depends only on the dynamic content of the visual scene, independently of the temporal resolution available for the acquisition at the individual pixel. Visual data acquisition simultaneously becomes fast and sparse, leading to ultra-high-speed acquisition combined with reduced power consumption, transmission bandwidth and memory requirements.
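
A back-of-envelope comparison illustrates this tradeoff. The numbers below (VGA resolution, roughly 100 µs temporal resolution, 1% of pixels active, 8 bytes per event) are illustrative assumptions, not measured figures for the sensor discussed here.

```python
# Back-of-envelope data-rate comparison under illustrative assumptions
# (VGA resolution, 8-bit pixels, ~100 us temporal resolution, 1% of pixels
# active, 8 bytes per event). These numbers are assumptions for the sketch,
# not measured figures for the sensor discussed here.

width, height = 640, 480
bytes_per_pixel = 1

# Frame-based: every pixel is read out at the frame rate, whether it changed or not.
frame_rate = 10_000  # frames/s needed for ~100 us temporal resolution
frame_based_bps = width * height * bytes_per_pixel * frame_rate

# Event-based: only changing pixels produce data.
active_fraction = 0.01                 # 1% of pixels change per 100 us interval
bytes_per_event = 8
events_per_second = width * height * active_fraction * frame_rate
event_based_bps = events_per_second * bytes_per_event

print(f"frame-based: {frame_based_bps / 1e9:.2f} GB/s")   # ~3.07 GB/s
print(f"event-based: {event_based_bps / 1e9:.2f} GB/s")   # ~0.25 GB/s
```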

Image 4 | OnBoard integrates the new 3rd generation VGA sensor camera module, with MIPI CSI interface, into a powerful ARM-based quad-core reference vision system platform. (Image: Prophesee)

Event-Based Vision Sensors

The advantage of treating dynamic visual information this way does not end at the sensing stage. In order to fully unlock the potential of these event-based vision sensors, the paradigms of vision processing also need to be fundamentally rethought. First of all, the notion of a frame at the basis of vision processing has to be abandoned altogether. As the sensors encode visual dynamics into highly resolved spatio-temporal patterns of events, representing the relevant features of scene dynamics (such as moving object contours, trajectories, velocity, etc.), processing algorithms now work on continuous-time events and features instead of on discrete static images. The mathematics that describe these features in space and time are simple and elegant and yield highly efficient algorithms and computational rules that allow for real-time operation of sensory-processing systems while minimizing the demand on computing power. The materialization of this research effort led to the launch of the most advanced event-based reference system: OnBoard integrates the new 3rd generation VGA sensor camera module, with MIPI CSI interface, into a powerful ARM-based quad-core reference vision system platform. Very high dynamic range, >120dB, can be achieved without the need for multiple measurements, as in conventional HDR techniques, thanks to the time-based encoding of the illumination information and the circuitry governing each pixel. The system provides comprehensive connectivity including Ethernet, USB, HDMI and Wi-Fi, and operates under a Linux OS. The embedded system runs dedicated computer vision software. It offers a tracking algorithm to detect motion, segment data into groups of spatio-temporal events and track them over time (using two of the four available cores). The application layer comprises area monitoring, high-speed counting, vibration measurement and real-time inspection.
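
As a rough illustration of what such event-based processing can look like, the sketch below groups incoming events by spatio-temporal proximity and tracks the resulting clusters with an exponential moving average. It is a simplified stand-in for the idea of segmenting and tracking spatio-temporal event groups, not Prophesee's actual tracking algorithm; the radius, timeout and smoothing factor are arbitrary values chosen for illustration.

```python
# Simplified sketch of grouping events into spatio-temporal clusters and
# tracking them over time. This is a stand-in for the idea described in the
# text, not Prophesee's actual algorithm; radius, timeout and smoothing
# factor are arbitrary values chosen for illustration.

class Tracker:
    def __init__(self, x, y, t):
        self.x, self.y, self.t = float(x), float(y), t

    def update(self, x, y, t, alpha=0.5):
        # Exponential moving average pulls the tracker towards new events.
        self.x += alpha * (x - self.x)
        self.y += alpha * (y - self.y)
        self.t = t

def track(events, radius=10.0, timeout_us=50_000):
    """Assign each (t_us, x, y) event to the nearest live tracker, or start
    a new one; trackers that receive no events for `timeout_us` are dropped."""
    trackers = []
    for t, x, y in events:
        trackers = [trk for trk in trackers if t - trk.t < timeout_us]
        nearest = min(trackers,
                      key=lambda trk: (trk.x - x) ** 2 + (trk.y - y) ** 2,
                      default=None)
        if nearest and (nearest.x - x) ** 2 + (nearest.y - y) ** 2 < radius ** 2:
            nearest.update(x, y, t)
        else:
            trackers.append(Tracker(x, y, t))
    return trackers

# A short burst of events moving along a diagonal ends up in a single tracker.
events = [(100 * i, 50 + i, 50 + i) for i in range(20)]
print(len(track(events)), "tracker(s)")   # 1
```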
