GRVI is an FPGA-efficient RISC-V RV32I soft processor core, hand technology mapped and floorplanned for best performance/area as a processing element (PE) in a parallel processor. GRVI implements a 2- or 3-stage single-issue pipeline, typically consumes 320 6-LUTs in a Xilinx UltraScale FPGA, and currently runs at 300-375 MHz in a Kintex UltraScale (-2) in a standalone configuration with the most favorable placement of local BRAMs.
Phalanx is a massively parallel FPGA accelerator framework, designed to reduce the effort and cost of developing and maintaining FPGA accelerators. A Phalanx is a composition of many clusters of soft processors and accelerator cores with extreme bandwidth memory and I/O interfaces on a Hoplite NOC. Across clusters, cores and accelerators communicate by message passing.
Talks and Publications
Jan Gray, GRVI Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accelerator, 24th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2016), May 2016. Received the FCCM 2016 Best Short Paper Award. [PDF]
Here are two example GRVI Phalanx designs, with 400 and 1680 RISC-V cores:
Other Conference Sightings
An extended abstract and talk on GRVI Phalanx were presented at the 2nd International Workshop on Overlay Architectures (OLAF-2) at FPGA 2016.
GRVI Phalanx was discussed in the short talk Software-First, Software Mostly: Fast Starting with Parallel Programming for Processor Array Overlays at the Arduino-like Fast-Start for FPGAs pre-conference workshop at FCCM 2016. Slides.
Hardware Changes: Version 0.2
Here are some of the changes made to the GRVI Phalanx design since it was first described at the 3rd RISC-V Workshop. This is now version 0.2.
- MUL/MULH/MULHU/MULHSU: The multiply instructions from the RISC-V “M” extension are now enabled by default and are implemented in the GRVI cluster. Each pair of processors shares one DSP-based multiplier.
- SL*/SR*: By default, fast left and right shift instructions are also implemented in these DSP-based multipliers.
- LR/SC: These atomic instructions from the RISC-V “A” extension are now enabled by default. Part of the implementation is in the GRVI core and part in the GRVI cluster memory arbiters. The implementation considerations were discussed on the RISC-V mailing lists here.
- A Phalanx system may be configured with a 32 KB console frame buffer for a 1080p display (320×102 characters).
- Hoplite multicast message routing is now enabled by default. An agent can send a message to every cluster on a given row, on a given column, or to every cluster on the NOC. If desired, all IRAMs in all clusters in a Phalanx may be updated with a single burst of 1024 XY-multicast messages.
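The three multicast modes in the last item can be pictured with a small model. The following Python sketch is illustrative only and is not taken from the GRVI Phalanx sources: the function name `multicast_destinations`, the mode strings, and the (x, y) coordinate convention are all assumptions made for this example. It only enumerates which clusters a message would reach; it does not model Hoplite routing itself.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) = (column, row); an assumed convention


def multicast_destinations(cols: int, rows: int, mode: str,
                           x: int = 0, y: int = 0) -> List[Coord]:
    """List the clusters reached by one multicast message.

    Hypothetical model of a cols x rows grid of clusters:
      "row"    -> every cluster on row y
      "column" -> every cluster on column x
      "all"    -> every cluster on the NOC (XY multicast)
    """
    if not (0 <= x < cols and 0 <= y < rows):
        raise ValueError("target row/column lies outside the grid")
    if mode == "row":
        return [(cx, y) for cx in range(cols)]
    if mode == "column":
        return [(x, cy) for cy in range(rows)]
    if mode == "all":
        return [(cx, cy) for cy in range(rows) for cx in range(cols)]
    raise ValueError(f"unknown multicast mode: {mode!r}")


if __name__ == "__main__":
    # Example: a 4-column by 3-row grid of clusters.
    print(multicast_destinations(4, 3, "row", y=1))
    print(multicast_destinations(4, 3, "column", x=2))
    print(len(multicast_destinations(4, 3, "all")))
```

In this model a single "all" message reaches every cluster, which is why a burst of XY-multicast messages can update the same IRAM contents in all clusters at once, without one message per cluster.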
The GRVI Phalanx accelerator framework is a work in progress. It is not open source. We expect it to become available in 1H2017 for development of “software first, software mostly” FPGA accelerators targeting Amazon AWS EC2 F1.