Heterogeneous Acceleration Platform for AI Applications

Super FPGA

Xilinx introduced a new platform called ACAP (Adaptive Compute Acceleration Platform). inVISION magazine talked with Kirk Saban, Vice President, Product & Technical Marketing at Xilinx, about the advantages of the new platform and the first product series Versal.

A Versal ACAP (Adaptive Compute Acceleration Platform) is significantly different than a regular FPGA or SoC. Zero hardware expertise is required to boot the device. (Bild: Xilinx Ltd.)

What is an ACAP and for which applications does it work best?

Kirk Saban: An ACAP is a heterogeneous, hardware adaptable platform that is built from the ground up to be fully software programmable. An ACAP is fundamentally different from any multi-core architecture in that it provides hardware programmability but the developer does not have to understand any of the hardware detail. From a software standpoint, it includes tools, libraries, run-time stacks and everything you’d expect from a modern software-driven product. The tool chain, however, takes into account every type of developer – from hardware developer, to embedded developer, to data scientist and framework developer.

What are the differences to a classic FPGA or SoC?

Saban: A Versal ACAP is significantly different than a regular FPGA or SoC. Zero hardware expertise is required to boot the device. Developers can connect to a host via CCIX or PCIe and get memory-mapped access to all peripherals (e.g. AI engines, DDR memory controllers). The Network-on-Chip is at the heart of what makes this possible. It provides ease-of-use and makes the ACAP inherently SW programmable-available at boot and without any traditional FPGA place-and-route or bit stream. No programmable logic experience is required to get started, but designers can design their own IP or add from the Xilinx ecosystem. With regard to Xilinx’s hardware programmable SoCs (Zynq-7000 and Zynq UltraScale+ SoCs), the Zynq platform partially integrated two out of the three engine types (Scalar Engines and Adaptable Hardware Engines). Versal devices add a third engine type (Intelligent Engines), but more importantly, the ACAP architecture tightly couples them together via the Network-on-Chip (NOC) to enable each engine type to deliver 2-3x the computational efficiency of a single engine architecture, such as a SIMT GPU.

Does this mean, Xilinx will address, besides the classic hardware designers, also application engineers in the future?

Saban: Xilinx has been addressing SW developers with design abstraction tools as well as its HW programmable SoC devices (Zynq-7000 and Zynq UltraScale+) for multiple generations. However, with ACAP, SW programmability is inherently designed into the architecture itself for the entire platform including its HW adaptable engines and peripherals.

Can you tell me a little bit about the use of ACAPs for artificial intelligence applications?

Saban: Among Versal’s intelligent engines is the AI engine, a key enabler for many of Versal’s target markets. The software programmable, hardware adaptable AI Engine addresses both the compute density and the memory bandwidth needed for high throughput and low latency Machine Learning. The massive array of interconnected VLIW SIMD high-performance processors with local memory offer up to 8X compute density for vector-based algorithms vs. programmable logic and at half the power. These engines are optimized for deterministic, real-time DSP and AI/ML computation and suited for applications from cloud, to network, to edge and endpoint. AI Engine is tightly coupled with adaptable hardware for custom compute and flexible memory hierarchy to maximize performance. Alongside these adaptable hardware engines and scalar engines, the AI engine is part of a complete heterogeneous compute platform where deep learning can be infused as an element of a larger application that has other pre/post processing requirements.

Are there already benchmarks available, to compare Versal with other devices?

Saban: Multiple performance projections for Machine Learning inference throughput for Versal VC1902 can be found in the media presentation delivered at XDF San Jose 2018. Xilinx’s initial benchmark ratings show the Versal VC1902 delivering 3.5X low-latency CNN throughput against Nvidia T4 in a 75W power envelope, and 4.2X low-latency CNN throughput against a high-end Volta V100 GPU. These Versal performance numbers assume 60 percent of the VC1902 device is reserved for user functions, such as network attach or video processing. 5X wireless compute versus UltraScale+ is also cited in the presentation. From the technology announcement in March 2018, 20X AI Compute performance is cited in a comparison to Virtex UltraScale+ (VU9P) for Machine Learning inference for image recognition.

What about ACAPs in the classic machine vision application?

Saban: Xilinx will be announcing Versal devices for edge applications such as machine vision some time in the future. These will leverage Versal’s scalar engines, adaptable engines, intelligence engines, and high-throughput connectivity from highest-resolution image sensors to frame grabber cards or industrial networks. Versal will excel in enabling the next generation of smart machine vision for the most compact and capable solution on the market.

Which devices of the Versal series are already available, which will be added in the future?

Saban: Versal is comprised of multiple device series, including AI Core Series, AI RF series, AI Edge series, Prime, Premium, and HBM. Product details of AI Core and Prime have been announced. Details of other device series will be disclosed at a later date.