GRVI Phalanx: FPGA Accelerator Framework

GRVI is an FPGA-efficient RISC-V RV32I soft processor core, hand technology mapped and floorplanned for best performance/area as a processing element (PE) in a parallel processor. GRVI implements a 2 or 3 stage single issue pipeline, typically consumes 320 6-LUTS in a Xilinx UltraScale FPGA, and currently runs at 300-375 MHz in a Kintex UltraScale (-2) in a standalone configuration with most favorable placement of local BRAMs.

Phalanx is massively parallel FPGA accelerator framework, designed to reduce the effort and cost of developing and maintaining FPGA accelerators. A Phalanx is a composition of many clusters of soft processors and accelerator cores with extreme bandwidth memory and I/O interfaces on a Hoplite NOC. Across clusters, cores and accelerators communicate by message passing.

Talks and Publications

GRVI Phalanx was introduced on January 5, 2016, at the 3rd RISC-V Workshop at Redwood Shores, CA. Presentation slides and video.

Jan Gray, GRVI Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accelerator, 24th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2016), May 2016. Received the FCCM 2016 Best Short Paper Award. [PDF]

Examples

Here are two example GRVI Phalanx designs, with 400 and 1680 RISC-V cores:

A 10x5x8 = 400 processor GRVI Phalanx

An example 400 GRVI system implemented in a Xilinx Kintex UltraScale KU040. This GRVI Phalanx comprises NX=5 x NY=10 = 50 clusters, each cluster with 8 GRVI cores and a 12-ported 32 KB cluster shared memory. The clusters are interconnected on a Hoplite NOC, with the Hoplite routers configured with 290b data payloads (including 32b address and 256b data), achieving a bandwidth of about 70 Gb/s/link and a NOC bisection bandwidth of 700 Gb/s. Each cluster can send or receive 32 B per cycle into the NOC. The GRVI Phalanx architecture anticipates a variety of configurable accelerators coupled to the processors, the cluster shared RAM, or the NOC.

An example 1680 GRVI system implemented in a Xilinx Virtex UltraScale+ VU9P. This GRVI Phalanx comprises NX=7 x NY=30 = 210 clusters, each cluster with 8 GRVI cores, an 8-ported 128 KB cluster shared memory, and a 300b Hoplite NOC router.

An example 1680 GRVI system with 26 MB of SRAM, implemented in a Xilinx Virtex UltraScale+ VU9P. This GRVI Phalanx comprises NX=7 x NY=30 = 210 clusters, each cluster with 8 GRVI cores, an 8-ported 128 KB cluster shared memory, and a 300-bit Hoplite NOC router.

Other Conference Sightings

An extended abstract and talk on GRVI Phalanx was presented at the 2nd International Workshop on Overlay Architectures (OLAF-2) at FPGA 2016.

GRVI Phalanx was discussed in the short talk Software-First, Software Mostly: Fast Starting with Parallel Programming for Processor Array Overlays at the Arduino-like Fast-Start for FPGAs pre-conference workshop at FCCM 2016. Slides.

Hardware Changes: Version 0.2

Here are some of the changes made to the GRVI Phalanx design since it was first described at the 3rd RISC-V Workshop. This is now version 0.2.

GRVI

  • MUL/MULH/MULHU/MULHSU: The multiply instructions from the RISC-V “M” extension are now enabled by default and are implemented in the GRVI cluster. Each pair of processors shares one DSP-based multiplier.
  • SL*/SR*: By default, fast left and right shift instructions are also implemented in these DSP-based multipliers.
  • LR/SC: These atomic instructions from the RISC-V “A” extension are now enabled by default. Part of the implementation is in the GRVI core and part in the GRVI cluster memory arbiters. The implementation considerations were discussed on the RISC-V mailing lists here.

Phalanx

  • A Phalanx system may be configured with a 32 KB console frame buffer, 1080p (320×102 chars).
  • Hoplite multicast message routing is now enabled by default. An agent can sent a message to every cluster on a given row, given column, or to every cluster on the NOC. If desired, all IRAMs in all clusters in a Phalanx may be updated with a single burst of 1024 XY-multicast messages.

Availability

The GRVI Phalanx accelerator framework is a work in progress. It is not open source. We expect it to become available in 1H2017 for development of “software first, software mostly” FPGA accelerators targeting Amazon AWS EC2 F1.