Renzym Machine Learning Accelerator
This repo contains a design that accelerates inference in machine learning workloads. More specifically, the accelerator offloads and accelerates convolution and max pooling operations. The top-level wrapper instantiates eleven identical REN_CONV_TOP cores, each readable and writable from the Wishbone master interface. A processor that wants to offload work writes three rows of an image, together with the set of kernels to apply to it, into the accelerator's internal memory over the Wishbone interface. It then configures the accelerator and starts it. The accelerator performs the configured operations (convolution and max pooling), writes the output to the results RAM, and asserts a done signal that is readable over the Wishbone interface. The processor can poll the done signal and then read the results back from the results RAM. Because there are multiple cores, the processor can software-pipeline the execution (for example, loading one core while others are still computing) to further reduce the total computation time.
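The offload flow above, including software pipelining across cores, can be sketched in Python. This is a behavioral sketch only: `MockCore`, its `start()`/`done()` methods, the `latency` parameter, and the placeholder per-row sum are hypothetical stand-ins for the Wishbone writes, the start bit, the done flag, and the real convolution, since this README does not document the register map.

```python
from dataclasses import dataclass, field


@dataclass
class MockCore:
    """Hypothetical behavioral stand-in for one REN_CONV_TOP core."""
    latency: int = 3                              # assumed polls until done is asserted
    image: list = field(default_factory=list)     # three image rows
    kernels: list = field(default_factory=list)   # kernel set (unused by the placeholder)
    results: list = field(default_factory=list)   # models the results RAM
    _busy: int = field(default=0, init=False)

    def start(self):
        # Placeholder computation: per-row sums stand in for conv + max pool.
        self.results = [sum(row) for row in self.image]
        self._busy = self.latency

    def done(self):
        # Each poll advances the mock by one step; done once the count reaches zero.
        if self._busy > 0:
            self._busy -= 1
        return self._busy == 0


def offload_all(jobs, cores):
    """Dispatch (image, kernels) jobs round-robin over cores, polling for done."""
    if not cores:
        raise ValueError("need at least one core")
    results = [None] * len(jobs)
    pending = {}      # core index -> index of the job it is running
    next_job = 0
    while next_job < len(jobs) or pending:
        for ci, core in enumerate(cores):
            if ci in pending:
                if not core.done():
                    continue          # still busy; move on to the next core
                # Read results back before the core is reloaded with a new job.
                results[pending.pop(ci)] = list(core.results)
            if next_job < len(jobs):
                core.image, core.kernels = jobs[next_job]   # stands in for Wishbone writes
                core.start()                                # stands in for configure + start
                pending[ci] = next_job
                next_job += 1
    return results
```

In this sketch, each core's results are copied out before it is restarted, because `start()` overwrites the mock results RAM; the loop keeps every core busy while others are being polled, which is the essence of the software pipelining described above.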
Each REN_CONV_TOP core consists of a register file for writing configuration and reading status, the REN_CONV engine, and the associated image, kernel, and results RAMs.
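Since every core exposes the same four regions to the Wishbone master, one way to picture the host's view is an address decode that splits a bus offset into a core index, a region, and a word offset. The field widths, region order, and the names `encode`/`decode` below are hypothetical and not taken from the design; the sketch also assumes a byte-addressed bus with 32-bit words.

```python
# Hypothetical field widths; the real address map is not documented in this README.
WORD_OFFSET_BITS = 8   # assumed: up to 256 32-bit words per region
REGION_BITS = 2        # four regions per core
REGIONS = ("regfile", "image_ram", "kernel_ram", "result_ram")


def encode(core, region, word):
    """Build a byte offset (relative to the accelerator's base) for one word."""
    if not 0 <= word < (1 << WORD_OFFSET_BITS):
        raise ValueError("word offset out of range")
    index = ((core << (REGION_BITS + WORD_OFFSET_BITS))
             | (REGIONS.index(region) << WORD_OFFSET_BITS)
             | word)
    return index << 2  # 32-bit words on a byte-addressed bus


def decode(byte_offset, num_cores):
    """Split a byte offset into (core index, region name, word offset)."""
    index = byte_offset >> 2
    word = index & ((1 << WORD_OFFSET_BITS) - 1)
    region = (index >> WORD_OFFSET_BITS) & ((1 << REGION_BITS) - 1)
    core = index >> (REGION_BITS + WORD_OFFSET_BITS)
    if core >= num_cores:
        raise ValueError("address does not map to a core")
    return core, REGIONS[region], word
```

Usage: `decode(encode(2, "kernel_ram", 5), num_cores=4)` yields `(2, "kernel_ram", 5)`. Placing the core index in the upper bits keeps each core's regions contiguous, so identical cores differ only by a fixed stride.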
The REN_CONV engine computes a 3xN convolution, where N is configurable. The block starts working when the start signal is asserted: the address generation unit (AGU) generates addresses into the image and kernel RAMs, and the data read from the RAMs is forwarded to the DATA PATH, which multiplies and accumulates it to compute the convolution results. Each convolution result is forwarded to a bypassable max pool block. If enabled, the max pool block outputs the maximum of every pair of values from the convolution output; if disabled, the convolution output is forwarded as-is. The output of the max pool block is written to the results RAM.
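A software reference model of this pipeline can help check results read back from the results RAM. This is a minimal sketch under several assumptions not stated in the README: the image is a 3-row strip and each kernel is 3xK (covering both readings of "3xN"), the convolution is a valid, stride-1 cross-correlation with no padding, max pooling takes non-overlapping pairs and drops an odd trailing value, and plain Python integers are used instead of the hardware's fixed bit widths. The function names are hypothetical.

```python
def conv3xn(image, kernel):
    """Valid, stride-1 cross-correlation of a 3-row image strip with a 3xK kernel."""
    if len(image) != 3 or len(kernel) != 3:
        raise ValueError("image and kernel must both have 3 rows")
    k = len(kernel[0])
    width = len(image[0])
    out = []
    for x in range(width - k + 1):
        acc = 0  # multiply-accumulate over the 3xK window starting at column x
        for r in range(3):
            for c in range(k):
                acc += image[r][x + c] * kernel[r][c]
        out.append(acc)
    return out


def maxpool_pairs(values):
    """Maximum of each non-overlapping pair; an odd trailing value is dropped."""
    return [max(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]


def ren_conv_reference(image, kernel, pool_enable):
    """Convolution followed by the bypassable max pool stage."""
    out = conv3xn(image, kernel)
    return maxpool_pairs(out) if pool_enable else out
```

For example, with the image rows `[1, 2, 3, 4, 5]`, `[1, 1, 1, 1, 1]`, `[0, 0, 0, 0, 0]` and an all-ones 3x3 kernel, the convolution output is `[9, 12, 15]`; with pooling enabled it becomes `[12]`.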
The configuration registers and RAMs are kept external to the REN_CONV engine so that this generic block can be multiplexed between different configurations and data sets. Future implementations could add multi-pass operation, reusing the outputs of one pass as the inputs to the next without involving the CPU. Keeping the configuration block external also makes it possible to provide separate configuration blocks for different interfaces (e.g. AXI, Wishbone) without affecting the rest of the design.
A small convolutional neural network accelerator on a Wishbone slave for the Raven core in the Caravel SoC.