Single Power Dot

Image 1 | Active Event Sensor (AES) is a new kind of sensor architecture that does not acquire and output frames. Instead, it provides asynchronous binary detection of events when viewing an active light source or pattern. (Image: VoxelSensors SRL/BV)

Today’s 3D perception systems can be classified into three main categories: stereoscopic vision, structured light, and Time-of-Flight (including Lidar). Each of these modalities typically requires a different type of sensor, from standard CMOS RGB or monochrome global shutter cameras for the first two, to more specific direct or indirect Time-of-Flight devices for the latter. While an active 3D perception system is a complex architecture with several key components (including, at a minimum, an advanced illumination source), the sensors are at the heart of the solution’s performance and are its main limitation today.

Legacy 3D imaging: the frame-based approach

A common characteristic shared by existing active 3D modalities is that they rely on a) an illumination source and b) at least one frame-based sensor. Frame-based digital imaging is the perfect technique for taking pictures of scenic views or recording movies. However, frame-based sensing is optimized neither for machine vision nor for 3D perception. Relying on a frame-based sensor limits system performance in several key aspects, summarized below:

  • Frame rate: The first limitation is introduced by the frame rate itself. Current 3D systems typically run at 30 to 60 fps and rarely exceed 100 fps, providing a new frame only after tens of milliseconds. In the best case, this single frame is not useful on its own. In the worst case, it is unreliable. A frame-based depth system must acquire multiple sequential frames to output a reliable understanding of a scene. This has a triple effect: slow acquisition of the environment, high data latency, and sensitivity to motion.
  • Power: Using a frame-based sensor in a depth system typically requires a lot of power. One way to minimize this draw is to reduce the sensor resolution or frame rate, but either choice reduces system quality. By nature, a global shutter (or fast rolling shutter) or ToF sensor requires hundreds of milliwatts to operate at a desirable frame rate and resolution. This power demand is the main reason why the resolution of today’s 3D sensors is typically much lower than that of standard image sensors. The illumination plays an equally important role in the power consumption of these legacy 3D systems. Whether the source is a flood illuminator or a high-density structured light pattern, at least several hundred milliwatts are required to achieve a decent signal-to-noise ratio at system level. Spreading the power over a wide field of illumination does not come for free.
  • Algorithmic complexity: A third limitation is the algorithmic complexity and heavy processing footprint required to reconstruct 3D information from accumulated depth frames. This places unwanted and unnecessary computational load on the overall system, and on the central processing unit in particular. In the case of structured light systems, reconstruction often also requires a spatial kernel to identify depth points, further reducing system resolution.
  • Interference: Lastly, an important limitation of existing frame-based 3D sensors is their vulnerability to interference from the environment or from external sources: they do not work properly under strong ambient light or when other active 3D systems are projecting their illumination in the same room.

Concentrating laser power into a single dot

To reduce the power consumption of an active 3D system, the first step is to rethink how the active illumination operates. Flood illumination and structured pattern projection spread the optical power across a wide field of illumination, making them highly inefficient. By generating a single dot and using a scanning device (e.g., a MEMS scanner) to cover the scene at high speed, one can design a system with a dramatically reduced optical power budget: up to 10x lower than that of current high-end illumination systems. This greatly reduces system power draw and eases laser eye-safety concerns. Concentrating the available illumination power into a single dot also brings the crucial benefit of a high SNR, which is key for outdoor operation.

Image 2 | Representation of serialized 3D points aggregated over tens of milliseconds (Image: VoxelSensors SRL/BV)

Active Event Sensor

Active Event Sensor (AES) is a new kind of sensor architecture that provides asynchronous binary detection of events when viewing an active light source or pattern. VoxelSensors’ AES does not acquire and output frames. Instead, each pixel is smart and generates an event only upon detection of the active light signal. Each position sample requires approximately ten photons on average, with sample rates up to 100 MHz. In other words, an active event location in the image plane is obtained as often as every 10 ns. Key patented technologies implemented in the Single Photon enable ambient light rejection even in bright conditions.
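The relationship between the sample rate and the sampling interval quoted above is simple arithmetic, and it helps to set it against the frame periods of legacy systems. The following is a minimal sketch; the function names are illustrative and not part of any VoxelSensors interface, and the 100 MHz figure is the upper bound stated in the text, not a guaranteed operating point.

```python
def sample_interval_ns(sample_rate_hz: float) -> float:
    """Time between two consecutive position samples, in nanoseconds."""
    return 1e9 / sample_rate_hz


def frame_period_ns(frames_per_second: float) -> float:
    """Time between two consecutive frames of a frame-based sensor, in nanoseconds."""
    return 1e9 / frames_per_second


# Upper bound on the AES sample rate quoted in the text.
MAX_SAMPLE_RATE_HZ = 100e6

interval_ns = sample_interval_ns(MAX_SAMPLE_RATE_HZ)
print(f"Sampling interval at 100 MHz: {interval_ns:.0f} ns")

# Frame rates mentioned earlier for legacy 3D systems.
for fps in (30, 60, 100):
    period_ns = frame_period_ns(fps)
    samples = period_ns / interval_ns
    print(f"{fps:>3} fps: frame period {period_ns / 1e6:.1f} ms, "
          f"up to {samples:,.0f} position samples in that time")
```

The comparison is an upper bound: the actual number of events depends on how often the laser dot is detected.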

3D sensing using laser beam triangulation

The novel 3D perception solution is a serialized triangulation system. Instead of searching for features that may be hard to see or even nonexistent (passive and active stereo), or inferring depth from the deformations of a complex light pattern by stereo matching (structured light), depth reconstruction is as simple as triangulating corresponding events between sensors by matching their timestamps. This simplicity greatly reduces the latency and power required during the computing step. The system works in three steps:

  • a) a Laser Beam Scanner (LBS) uses a scanning device (e.g., a bi-axial MEMS mirror) to project a laser beam dot in a continuous pattern, such as a raster scan or Lissajous pattern.
  • b) two AES sensors capture the position of the laser dot as often as every 10 ns and output the location of the active dot in their respective image planes in address-event representation (AER) format.
  • c) based on the two continuous AES position streams, a simple triangulation algorithm computes the corresponding 3D point and outputs its position in 3D world coordinates.
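To illustrate the kind of continuous pattern mentioned in step a), the sketch below generates mirror deflection angles along a Lissajous curve. All numeric values (axis frequencies, amplitudes, phase) are hypothetical placeholders chosen for readability; they are not the parameters of VoxelSensors' scanner.

```python
import math


def lissajous_angles(t_s, fx_hz=21_000.0, fy_hz=20_000.0,
                     amp_x_deg=20.0, amp_y_deg=15.0, phase_rad=math.pi / 2):
    """Deflection angles (x, y) in degrees of a bi-axial mirror at time t_s.

    Each axis oscillates sinusoidally; slightly different frequencies make
    the dot trace a Lissajous curve that gradually fills the field of view.
    All default values are hypothetical.
    """
    x_deg = amp_x_deg * math.sin(2 * math.pi * fx_hz * t_s + phase_rad)
    y_deg = amp_y_deg * math.sin(2 * math.pi * fy_hz * t_s)
    return x_deg, y_deg


def scan_trajectory(duration_s, sample_interval_s=10e-9):
    """Dot positions sampled every sample_interval_s over duration_s."""
    n_samples = int(round(duration_s / sample_interval_s))
    return [lissajous_angles(i * sample_interval_s) for i in range(n_samples)]


# 100 microseconds of scanning, sampled at the 10 ns interval from the text.
trajectory = scan_trajectory(duration_s=100e-6)
print(len(trajectory), "dot positions; first:", trajectory[0])
```

A raster scan would replace the two sinusoids with a fast sweep on one axis and a slow ramp on the other.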

The output of this laser beam triangulation system is a stream of serialized 3D points, or voxels, with a new voxel added as often as every 10 ns. The dynamic nature of the data stream unlocks new possibilities in computer vision, allowing for pipelined processing and customized perception schemes.
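Steps b) and c) can be sketched as a timestamp-matched triangulation of two event streams. This is an illustration under simplifying assumptions, not VoxelSensors' algorithm: the two sensors are assumed to form an ideal rectified stereo pair with identical focal length and principal point, points are expressed in the left sensor's frame rather than in world coordinates, and the names `Event`, `Voxel`, `triangulate_streams`, and `max_dt_ns` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    t_ns: int   # timestamp of the detection
    x: float    # column of the active pixel
    y: float    # row of the active pixel


class Voxel(NamedTuple):
    t_ns: int
    x_m: float
    y_m: float
    z_m: float


def triangulate_streams(left: Iterable[Event], right: Iterable[Event],
                        focal_px: float, baseline_m: float,
                        cx: float, cy: float,
                        max_dt_ns: int = 5) -> Iterator[Voxel]:
    """Pair events whose timestamps agree within max_dt_ns and triangulate.

    Assumes both streams are sorted by time and the sensors are rectified,
    so depth follows from the horizontal disparity alone.
    """
    left_it, right_it = iter(left), iter(right)
    l, r = next(left_it, None), next(right_it, None)
    while l is not None and r is not None:
        dt = l.t_ns - r.t_ns
        if abs(dt) <= max_dt_ns:
            disparity = l.x - r.x
            if disparity > 0:  # skip degenerate or mismatched pairs
                z = focal_px * baseline_m / disparity
                yield Voxel(l.t_ns,
                            (l.x - cx) * z / focal_px,
                            (l.y - cy) * z / focal_px,
                            z)
            l, r = next(left_it, None), next(right_it, None)
        elif dt < 0:
            l = next(left_it, None)   # left event has no partner, drop it
        else:
            r = next(right_it, None)  # right event has no partner, drop it


# Made-up events: the second pair is 10 ns apart and is therefore not matched.
left_events = [Event(0, 340.0, 240.0), Event(10, 350.0, 240.0)]
right_events = [Event(0, 320.0, 240.0), Event(20, 330.0, 240.0)]
voxels = list(triangulate_streams(left_events, right_events,
                                  focal_px=500.0, baseline_m=0.05,
                                  cx=320.0, cy=240.0))
print(voxels)
```

Because the function is a generator, each 3D point is emitted as soon as its event pair is available, which mirrors the pipelined processing described above.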

Image 3 a,b | Dense scan of a hand using VoxelSensors’ solution with 1 ms (left) and 10 ms (right) (Image: VoxelSensors SRL/BV)

Benefits and Breakthrough

The sensing and perception technology of VoxelSensors disrupts the sensor status quo with new AES sensors that enable low-power, low-latency active 3D sensing using laser beam triangulation:

  • Low Power Consumption: By concentrating the optical power in a single laser dot and designing AES sensors such that only the pixels actively detecting the illumination consume energy, the LBS-based perception system achieves up to 10x lower power consumption than existing 3D solutions in similar operating modes. This makes it well suited for use in devices where battery life is critical.
  • Low Latency: VoxelSensors’ laser beam triangulation system offers much lower latency than frame-based systems because the AES sensors capture and output only the laser dot position at a very fast rate rather than capturing a complete image at regular intervals. With a simple pipelined triangulation algorithm, a 3D point is computed from the AER data after only hundreds of nanoseconds. As a result, the latency from optical sampling to depth measurement is very low, and the LBS system can provide a more up-to-date representation of the scanned scene.
  • Immunity: Thanks to their high sensitivity and their smart operation, the AES sensors need a photon budget of only about ten photons per sample and can distinguish the active laser light from other sources such as indoor lights and sunlight. The same principles, and key patented technologies brought by the Single Photon, provide immunity to interference from other 3D systems operating concurrently.
  • Data Scalability & Versatility: Thanks to its fully serialized data stream, the system provides a new way to look at perception data. Frames are no longer needed to get information on the scene content. 3D points are created at a very high rate (as often as every 10 ns) as the laser dot scans the scene, and this incoming data stream can be used flexibly: one can aggregate data over time windows of different sizes. In other words, the acquisition rate and the way information is used are fully scalable. The natural trade-off offered by this novel system is speed versus density. In a short time window (e.g., 1 ms), a coarse scan of the scene is acquired, allowing for fast actionable decisions and updates. Over a longer window (e.g., 10 ms), the same stream yields a denser scan.
  • Spatial and Temporal Precision: This LBS perception system relies on the continuous sweep of a laser dot to ensure a high density of 3D points along the scan lines. Thanks to key patented technologies, the system takes advantage of the continuous nature of the scanning pattern to enable super-resolution data, providing depth data with 4x greater precision than can be achieved with pixel-level sampling. In addition to this spatial resolution, each depth point carries a sub-microsecond timestamp, which eliminates artifacts such as motion blur and simplifies object tracking and other motion-sensitive applications.
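The speed-versus-density trade-off described under Data Scalability & Versatility can be sketched as a grouping of the serialized point stream by time window. This is a minimal illustration with hypothetical names (`Point`, `aggregate_by_window`) and synthetic data; it assumes the stream is ordered by timestamp and carries one point every 10 ns, the upper bound given in the text.

```python
from typing import Iterable, Iterator, List, Tuple

# (timestamp in ns, x, y, z): a hypothetical layout for one serialized 3D point.
Point = Tuple[int, float, float, float]


def aggregate_by_window(points: Iterable[Point],
                        window_ns: int) -> Iterator[List[Point]]:
    """Group a time-ordered point stream into consecutive time windows."""
    batch: List[Point] = []
    current_window = None
    for point in points:
        window = point[0] // window_ns
        if current_window is None:
            current_window = window
        if window != current_window:
            yield batch          # window closed, hand the batch downstream
            batch = []
            current_window = window
        batch.append(point)
    if batch:
        yield batch              # flush the last, possibly partial, window


# Synthetic stream: 2 ms of data with one point every 10 ns.
stream = [(t_ns, 0.0, 0.0, 1.0) for t_ns in range(0, 2_000_000, 10)]

coarse = list(aggregate_by_window(stream, window_ns=1_000_000))   # 1 ms windows
dense = list(aggregate_by_window(stream, window_ns=10_000_000))   # 10 ms windows
print(len(coarse), "windows of", len(coarse[0]), "points")
print(len(dense), "window of", len(dense[0]), "points")
```

The same stream serves both uses: short windows give frequent, sparse updates, while long windows accumulate more points per update.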