News of the day: celebrating the launch of our new website!
Open Processing Unit: a software/hardware toolchain for the acceleration of general deep learning algorithms
Academic CPU research has been greatly facilitated by open-source toolchains and resources such as instruction set architectures (ISAs), compilers, and instruction- and microarchitecture-level simulators. Examples include SimpleScalar, GEM5, and, more recently, the RISC-V toolset. Yet no such open-source, complete ecosystem exists for general machine learning algorithms. OpenOPU aims to be that complete open-source ecosystem for machine learning hardware research, including: an ISA with executable specifications, a compiler with formal verification, instruction-level (functional) and microarchitecture-level (cycle-accurate) simulation, parametrized modules in RTL and Chisel, and FPGA emulation and development boards.
Our first release includes OPU (Open Processing Unit for ML) for edge inference on FPGAs. Future releases will extend the ISA and microarchitecture to both training and inference, and to cloud AI computing as well as in-network AI computing, targeting FPGA, SoC, and 3D FPGA/SoC platforms.
Currently, OPU is being used in a MURI brain computing project at UCLA, the University of Michigan, and Stanford University. (Last updated on Jul 14, 2020)
Recent Publications
TVLSI 2019
A domain-specific FPGA overlay processor, named OPU, to accelerate CNN inference.
TVLSI 2020
The first full software/hardware stack, called Uni-OPU, for efficient uniform hardware acceleration of both transposed convolutional (TCONV) and conventional convolutional (CONV) networks.
FPGA 2020 (Best Paper Candidate)
An FPGA-based overlay processor, called Light-OPU, with a corresponding compilation flow for general lightweight-CNN (LW-CNN) acceleration.
FPL 2021
A mixed-precision FPGA-based overlay processor that effectively accelerates inference of mixed-precision models.
TRETS 2022
A lightweight FPGA-based accelerator with a software-hardware co-design process to tackle irregular computation and memory access in GCN inference.
Under review
The first in-depth study of overlay processors for both vision and language transformer models, co-optimizing data layout with dataflow.
29 Feb 2020
A low-precision (8-bit) floating-point (LPFP) quantization method for FPGA-based acceleration that overcomes retraining and accuracy limitations.
13 Apr 2021
An FPGA-based overlay processor for NLP model inference at the edge. (Last updated on Apr 15, 2021)
FCCM 2021 (poster)
A heterogeneous dual-core architecture where one core is optimized for regular convolution layers and the other for depth-wise convolution layers.
TRETS 2021
Utilizes low-precision floating-point operations to efficiently accelerate CNNs; the first work to fit four 8-bit multiplications for inference into one DSP while maintaining comparable accuracy without any retraining.
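The idea of sharing one DSP multiplier among several narrow products can be illustrated with the classic operand-packing trick. The sketch below is a simplified conceptual illustration in Python, not the paper's exact LPFP scheme (real DSP packing must also respect the slice's operand widths, e.g. 27×18 on a DSP48, and handle signs); it assumes unsigned 8-bit operands, whose 16-bit partial products land in non-overlapping bit lanes of one wide multiply:

```python
def packed_four_mul(a, b, c, d):
    """Compute a*c, a*d, b*c, b*d with a single wide multiplication.

    Operands are unsigned 8-bit values (0..255). Each partial product
    is at most 255*255 = 65025 < 2**16, so spacing the operands 16 bits
    apart keeps the four products in separate, carry-free 16-bit lanes.
    """
    lhs = (a << 32) | b          # pack a and b, 32 bits apart
    rhs = (c << 16) | d          # pack c and d, 16 bits apart
    p = lhs * rhs                # one multiply -> four 16-bit product lanes
    mask = 0xFFFF
    return ((p >> 48) & mask,    # a*c
            (p >> 32) & mask,    # a*d
            (p >> 16) & mask,    # b*c
            p & mask)            # b*d
```

Algebraically, `(a*2**32 + b) * (c*2**16 + d) = ac*2**48 + ad*2**32 + bc*2**16 + bd`, and because each term stays below 2**16 the lanes never interfere, so the four products can be sliced back out exactly.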
FPL 2022
An FPGA-based GCN accelerator, named SkeletonGCN, including multiple software-hardware co-optimizations to improve training efficiency.
© 2020 OPU Lab. All rights reserved.