Welcome Xilinx Alveo U50!

Today Xilinx announced the new Alveo U50 Data Center Accelerator Card. Press release. Launch presentation. U50 Home. Product Brief. Data Sheet. User Guide.

I usually don’t blog about FPGA card announcements but this is a big deal. Finally a vendor FPGA card streamlined and focused on pure data + network compute acceleration, with massive bandwidth (PCIe gen4x8 or gen3x16, QSFP28 for 100 GbE, ~7 TB/s to 5 MB of BRAM, ~6 TB/s to 20 MB of UltraRAM, and 460 GB/s to 8 GB of HBM2 DRAM), in an optimized form factor.

(In particular, it doesn’t have conventional DRAM DIMMs inside, and I think that’s fine. Doesn’t need them, won’t miss them. The key external RAM is the 8 GB of high bandwidth DRAM, right there behind the 32 AXI-HBM controllers. If greater RAM capacity is required, the host has tens or hundreds of GB that can be streamed in/out across PCIe. And no more sprawling soft DDR4 DRAM controllers in your design.)

Now FPGA uptake as mainstream data center accelerator platforms really depends upon their performance and cost competitiveness vs. multicore CPUs and GPUs. GPUs, with GDDRx and HBM2 DRAM memory systems, have always enjoyed a big lead in peak external memory bandwidth vs. FPGAs. This advantage has limited the types of workloads for which FPGAs are faster, or at least performance competitive. But the advent of Xilinx Virtex UltraScale+ VU3xP and Intel Stratix 10 MX devices, with HBM2 DRAM in package, now give FPGAs CPU-beating, GPU-competitive memory bandwidth. The next frontier is cost. So far, HBM2-powered FPGA cards have been expensive, many times more expensive than a GPU card with comparable bandwidth. I hope U50 will move the needle on price competitiveness, a prerequisite for FPGA accelerators to reach high volume economies of scale and support a thriving solution provider ecosystem.

Under the hood

The User Guide and Data Sheet describes the FPGA as an UltraScale+ XCU50, with 872K 6-LUTs, 5952 DSPs, 1344 BRAMs, 640 UltraRAMs, and two stacks of 4 GB HBM2 DRAM. While the XCU50 is not in the UltraScale+ Product Tables, these resources exactly match that of the XCVU35P, as does this floorplan figure:

XCU50 FPGA floorplan
XCU50 FPGA floorplan

Assuming this is the same silicon as the VU35P, that’s fantastic news — this part is extremely capable. For example, here is another kilocore RISC-V GRVI Phalanx with HBM2, for VU35P:

1176 RISC-V PE GRVI Phalanx with 30 HBM DRAM channels
An 1176 RISC-V PE implementation of the GRVI Phalanx massively parallel accelerator framework in a VU35P.
10×15 -3 clusters of { 8 PE, 128 KB SRAM, 300b Hoplite NoC router }, 30 HBM DRAM channels, PCIe DMA controller.

I look forward to an exciting future of mainstream FPGA+HBM2 accelerator cards, as common as GPU accelerator cards, deployed across the industry, there and just waiting for all of our problems, ingenuity, workloads, and bitstreams. Today’s Alveo U50 launch is a big milestone in this march to the mainstream. Congratulations to Xilinx, its staff, and partners.