Single...

Block design

Flow

Fig 1: Block diagram for the Neuromorphic Keyword Recognition IC to be implemented in the Skywater 130nm process.

Design

Fig 2: The keyword recognition IC will be developed in the Caravan frame (shown above), with the primarily analog computation fitting into the 10mm² user area.

The initial analog FG cell library gives some initial estimates for the size of these blocks. A programmable bandpass filter array with amplitude detection (24 filters) is roughly 0.01mm², and a VMM+WTA classifier block with 10,000 VMM matrix values and 100 WTA outputs is roughly 0.15mm². The programming infrastructure, mostly requiring several 6-bit DACs and a ramp ADC [9], would be roughly 0.005mm². We expect the charge-pump blocks and other voltage-handling blocks to be similar in size to the programming infrastructure.

The primary question is the size of the linear dynamics block (LMU) for this particular implementation, as that block sets the remaining parameters for this system. For general matrix-solution dynamics [10], the resulting implementation area for 10,000 A matrix values (x, u are vectors of 100) is roughly 1.7mm². A system using six parallel timeframes from the 24 input filter systems requires 6 x 24 x 24 = 3,456 A matrix elements. An important design question will be whether these dynamics can be simplified using ladder-filter approaches [11] as well as sparse (e.g. local) coupling of the network dynamics. These devices will require fairly low bias currents (10-500pA) for these 5ms to 500ms dynamics; such bias currents are within the range of the FG programming infrastructure and also keep the resulting power consumption within expected ranges (100nW to 1μW).
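As a rough cross-check of the estimates above, the following sketch scales the quoted reference areas linearly with the number of programmed matrix values. Linear scaling is an assumption made here for illustration, as is the helper name `scaled_area_mm2`; routing, WTA cells, and other overhead are ignored, so the results are only order-of-magnitude guides.

```python
# Back-of-the-envelope area scaling from the reference estimates in the text.
# ASSUMPTION: area scales linearly with the number of programmed matrix values.

def scaled_area_mm2(n_elements, ref_elements, ref_area_mm2):
    """Scale a reference block area to a different element count."""
    return ref_area_mm2 * n_elements / ref_elements

# Linear dynamics block: six parallel timeframes, each a 24 x 24 A matrix.
n_frames, n_filters = 6, 24
lmu_elements = n_frames * n_filters * n_filters
lmu_area = scaled_area_mm2(lmu_elements, ref_elements=10_000, ref_area_mm2=1.7)

# VMM + WTA classifier: 144 inputs x 1000 WTA outputs (sizes from the summary table).
vmm_elements = 144 * 1000
vmm_area = scaled_area_mm2(vmm_elements, ref_elements=10_000, ref_area_mm2=0.15)

print(f"LMU: {lmu_elements} A-matrix values, about {lmu_area:.2f} mm^2")
print(f"VMM: {vmm_elements} matrix values, about {vmm_area:.2f} mm^2")
```

Under this assumption the dense six-block dynamics come out below 1 mm² and the classifier at roughly 2 mm², which is why simplifying the dynamics and choosing the classifier size are the main levers for fitting the user area.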

We expect the final classifier structure to enable vocabularies initially of 100 words that potentially could expand to larger vocabularies. We will test this structure on the TI digits [12] database (10-20 words) as well as keyword spotting [13] and other databases. The WTA module will be designed for 1,000 outputs to enable a large number of null outputs, i.e. outputs for events that are not significant enough to be transmitted past the classifier. Only a recognized result would be reported. The number of null elements and the number of valid outputs can be programmed through the FG elements. We will use a software simulation [https://github.com/neuromorphs/ant-lmu-benchmark] of the classification system to explore the optimal hyperparameters (e.g. size of the A matrix) to achieve the best accuracy on these datasets given a fixed area.

Schematics of critical circuit core

Acoustic Front end

 

bandpass

Fig 3: The Capacitively Coupled Current Conveyor (C4) Bandpass filter [14] as implemented on an SoC FPAA [5,15].  Our bandpass filter array will use and optimize this circuit topology.

 

Frontend

Fig 4: Overview of the acoustic speech recognition front end [1]: a bandpass filter whose cut-off frequency is tuned by a bias current, followed by an amplitude detector and a first-order low-pass filter.

 

Vector Matrix Multiplier (VMM) and Winner takes all (WTA)

Fig 5: Representative VMM+WTA classifier circuit as implemented on an SoC FPAA [1]. Our circuit implementation will use a gate-coupled (versus source-coupled as shown above) VMM circuit because a custom IC implementation offers the opportunity to increase the potential system SNR.

Target performance summary

The estimate of the final structure: 

| Component | Size | Area | Power |
|---|---|---|---|
| Bandpass Filters + Amp Detect | 24 channels | 0.01mm² | 2-8μW |
| Linear Dynamics Computation (LMU) | 6, 24x24 blocks | 1-3mm² | 0.5-2μW |
| VMM + WTA Classifier | 144 x 1000 | 2-6mm² | 4-16μW |
| FG programming infrastructure | | 0.005mm² | ~0 during inference |
| FG Voltage handling (5.5V, 11V) | | 0.005mm² | ~0 during inference |
| Total | | 6-10mm² | 7-26μW |

The Caravan chip will interface through the standard 1.8V supply.  Some debugging points will use the higher voltage (5.5V) supply easily accessible through the analog pins. 

Based on previous SoC FPAA designs (e.g. [1,5]), we expect roughly 60dB SNR from each bandpass filter, corresponding to the SNR expected from a MEMS microphone in the particular frequency bands. The processing elements would have approximately similar SNR (54-60dB) through the linear dynamics, and greater than 60dB SNR after the VMM stage (due to coherent signals) going into the WTA block.

For expected accuracy, we note that the results in [1] indicate that a simplified version of this system (smaller VMM/WTA and no general dynamics) achieved 100% accuracy on a small audio classification task.  We, therefore, expect good performance on a larger dataset with this larger system.  We will also be exploring the expected accuracy in a high-level simulation of this classification system [https://github.com/neuromorphs/ant-lmu-benchmark] in order to make principled choices about the size of each block.

Design Goals

  1. Demonstrate ultra-low-power keyword detection over a moderate vocabulary (100 words) on a single IC, utilizing an LMU [2] to classify sequences of spectral dynamics.
  2. Develop a single microphone-to-symbol classifier as a Caravan IC utilizing an array of programmable bandpass filters [5], programmable amplitude detection [5], programmable linear dynamics [10], and a programmable VMM+WTA classifier block [1,3].      
  3. Demonstrate on-chip classification of the TI DIGIT dataset [12] and keyword spotting [13]. 
  4. Demonstrate large-scale programmable analog computation and reduction of device mismatch, compiled through analog FG standard cell elements that may include new components added to the library.
  5. Develop and demonstrate using this hardware platform for different vocabularies through either on-chip (e.g. similar to the approach in [1,3]) or off-chip training. 
  6. Provide event-driven output and storage through the on-chip memory using the RISC-V processor and interrupt handling.

Description of circuits

The keyword recognition IC is a full end-to-end classifier, from microphone input to digital output, that we expect will require 20-40μW of processing power. The architecture (Fig. 1) comprises front-end filterbanks and amplitude detection, an LMU processing unit, and a universal approximator classification block.

The IC starts with the raw microphone input being passed to a set of programmable parallel bandpass filters (16-24) with amplitude detection (Fig. 1).  These bandpass filters will typically be programmed with similar resonances and exponentially increasing corner frequencies over the main human hearing range (50Hz to 5kHz), typical of neurally-based acoustic classifiers (e.g. [1,4]) or high-end hearing aid devices.   The amplitude detection produces a set of analog signals indicating a filtered magnitude of a set of frequencies.  
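To illustrate the exponentially increasing corner frequencies described above, here is a minimal sketch that places 24 filters geometrically between 50Hz and 5kHz. The function name `corner_frequencies` and the choice to include both endpoints are assumptions for illustration; the actual programmed values would be chosen during design.

```python
# Geometric (exponential) spacing of bandpass corner frequencies.
# ASSUMPTION: both band edges (50 Hz and 5 kHz) are included as corner frequencies.

def corner_frequencies(n_filters=24, f_low=50.0, f_high=5000.0):
    """Return n_filters frequencies with a constant ratio between neighbours."""
    ratio = (f_high / f_low) ** (1.0 / (n_filters - 1))
    return [f_low * ratio**k for k in range(n_filters)]

freqs = corner_frequencies()
for k, f in enumerate(freqs):
    print(f"filter {k:2d}: {f:8.1f} Hz")
```

With these settings each filter sits a fixed ratio above its neighbour, so the spacing is uniform on a logarithmic frequency axis.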

The filtered frequency channel values are passed to a general linear dynamics block which computes an output x given the input u, where dx/dt + Ax = Bu. For different A and B matrices (configurable using floating gates), this creates a new temporal basis space capable of capturing dynamics on the time scale of τ.
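The behaviour of this block can be sketched in software with a forward-Euler integration of dx/dt + Ax = Bu. The example below is a toy two-state system, not the LMU matrices of [2]: the diagonal A and matching B are assumed values chosen so that the two states have the 5ms and 500ms time constants mentioned earlier, and the helper names are hypothetical. The on-chip block computes this in continuous time, so the discrete time step exists only in the simulation.

```python
# Forward-Euler simulation of the linear dynamics dx/dt + A x = B u.
# ASSUMPTION: toy diagonal A and B; a real design would program a coupled A matrix.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def simulate(A, B, u_of_t, x0, dt, n_steps):
    """Integrate dx/dt = B u - A x from x0 and return the final state."""
    x = list(x0)
    for k in range(n_steps):
        u = u_of_t(k * dt)
        Ax = matvec(A, x)
        Bu = matvec(B, u)
        x = [xi + dt * (bi - ai) for xi, ai, bi in zip(x, Ax, Bu)]
    return x

taus = [0.005, 0.5]                      # time constants in seconds
A = [[1 / taus[0], 0.0], [0.0, 1 / taus[1]]]
B = [[1 / taus[0]], [1 / taus[1]]]       # one input channel, unity DC gain

# Unit step input for 10 s with a 1 ms step (small compared to the fastest tau).
x_final = simulate(A, B, lambda t: [1.0], x0=[0.0, 0.0], dt=1e-3, n_steps=10_000)
print(x_final)
```

For a constant input the state settles where Ax = Bu, so both states of this toy system approach 1; the slow state simply takes about a hundred times longer to get there, which is how the block retains information about earlier inputs.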

The output from the linear dynamics block is then passed through a classifier comprising a Vector-Matrix Multiplier (VMM) and a Winner-Take-All (WTA) block. This classifier is a universal approximator, where adding the linear dynamics (LMU) expands the class of functions that it can approximate given its temporal dynamics. Without the dynamics, the classifier would only be able to find patterns based on the current magnitudes of the spectral channels. With the dynamics, each category can define a temporal pattern across those inputs. This approach is the basis of the LMU [2], which outperforms other recurrent methods (LSTMs, GRUs, etc.) for detecting temporal patterns. It can also be seen as an efficient implementation of a large collection of various synaptic kernels, which has also been shown to have advantages for temporal classification [7]. These outputs can be directly passed to the onboard Caravan RISC-V μP or asynchronously transmitted through a Verilog-compiled Address-Event Representation (AER) block [8].
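Functionally, the VMM+WTA classifier can be sketched as a matrix-vector product followed by selecting the largest output. The code below is an idealized software model, not the circuit: the weights are made-up numbers, and the convention that the first `n_valid` rows are valid keywords while the remaining rows are null outputs is an assumption for illustration.

```python
# Idealized model of a VMM followed by a winner-take-all stage.
# ASSUMPTION: rows 0..n_valid-1 are valid classes; any later row is a null output.

def vmm(W, x):
    """Vector-matrix multiply: one weighted sum per output row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def wta(y):
    """Return the index of the largest output (the winner)."""
    return max(range(len(y)), key=lambda i: y[i])

def classify(W, x, n_valid):
    """Return the winning class index, or None when a null output wins."""
    winner = wta(vmm(W, x))
    return winner if winner < n_valid else None

# Toy example: two valid classes and one null row that wins for ambiguous input.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.6, 0.6]]
print(classify(W, [1.0, 0.0], n_valid=2))  # class 0 wins
print(classify(W, [0.0, 1.0], n_valid=2))  # class 1 wins
print(classify(W, [0.5, 0.5], n_valid=2))  # null row wins, nothing reported
```

In this model a null winner produces no event, which mirrors the intent that only recognized results are passed on to the μP or the AER block.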

Floating-Gate (FG) devices used throughout this design provide large-scale programmability for this custom IC design. The programming infrastructure will parallel that used in the SoC FPAA device [5,9], with this IC utilizing the RISC-V μP that is included in the Caravan system. As this IC design does not require general FG programming (e.g. not fully on switches), the programming infrastructure simplifies from the general approach [9]. The programming structure is capable of precisely programming bias currents from pA to μA levels (~7-8 orders of magnitude) with better than 1% accuracy at all levels [9]. The analog FG standard-cell library provides initial designs for these programmable cells, showing that these programmable elements are implementable in the Skywater 130nm CMOS process and reducing the risk of this IC design.

Team

  1. Nikhil Garg, University of Sherbrooke <Nikhil.Garg@Usherbrooke.ca> 
  2. Praveen Raj, University of Nottingham Malaysia Campus <praveenraj19802@gmail.com>
  3. Professor Jennifer Hasler, Georgia Institute of Technology
  4. Terrence C Stewart, National Research Council Canada <terrence.stewart@nrc-cnrc.gc.ca>

Potential Advisors

  1. Professor Barry Muldrey, University of Mississippi <muldrey@olemiss.edu>
  2. Parker Hardy, University of Mississippi <pwhardy@go.olemiss.edu>
  3. Marwan Besrour, University of Sherbrooke <marwan.besrour@usherbrooke.ca> 

References

[1] J. Hasler and S. Shah, “SoC FPAA Hardware Implementation of a VMM+WTA Embedded Learning Classifier,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, March 2018. pp. 28-37.

[2] Voelker, Aaron R., Ivana Kajic and C. Eliasmith. “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks.” NeurIPS (2019).

[3] S. Shah and J. Hasler, “VMM + WTA Embedded Classifiers Learning Algorithm implementable on SoC FPAA devices,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, March 2018. pp. 65-76.

[4]  J. Hasler, "Large-Scale Field-Programmable Analog Arrays," Proceedings of IEEE, 2020. 

[5] S. George, S. Kim, S. Shah, et al., “A Programmable and Configurable Mixed-Mode FPAA SOC,” IEEE Transactions on VLSI, vol. 24, no. 6, 2016. pp. 2253-2261.

[6] J. Hasler, “Defining Analog Standard Cell Libraries for Mixed-Signal Computing enabled through Educational Directions,” IEEE ISCAS, 2020.

[7] Tapson, J. C., Cohen, G. K., Afshar, S., Stiefel, K. M., Buskila, Y., Wang, R. M., Hamilton, T. J., & van Schaik, A. (2013). Synthesis of neural networks for spatio-temporal spike pattern recognition and processing. Frontiers in neuroscience, 7, 153. https://doi.org/10.3389/fnins.2013.00153.

[8] S. Brink, S. Nease, Hasler, S. Ramakrishnan, R. Wunderlich, A. Basu, and B. Degnan, “A learning- enabled neuron array IC based upon transistor channel models of biological phenomena,” IEEE Transactions on Biomedical Circuits and Systems, vol. 7, no. 1, pp. 71–81, 2013.

[9] S. Kim, J. Hasler, and S. George, “Integrated Floating-Gate Programming Environment for System-Level ICs,” IEEE Transactions on VLSI, 2016.

[10]  J. Hasler, and A. Natarajan, “Continuous-time, Configurable Analog Linear System Solutions with Transconductance Amplifiers,” IEEE Circuits and Systems I, Vol. 68, no. 2, 2021. pp. 765-775. http://hasler.ece.gatech.edu/AnalogLinearSystemSolutionFeb2021.pdf

[11] J. Hasler and S. Shah, "An SoC FPAA Based Programmable, Ladder-Filter Based, Linear-Phase Analog Filter," IEEE CAS I, vol. 68, no. 2, 2021.

[12] R. Gary Leonard, and George Doddington. TIDIGITS LDC93S10. Web Download. Philadelphia: Linguistic Data Consortium, 1993.

[13] Warden, Pete. “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition.” ArXiv abs/1804.03209 (2018)

[14] D.W. Graham, Hasler, R. Chawla, and P.D. Smith, “A low-power, programmable bandpass filter section for higher-order filter applications,” IEEE Transactions on Circuits and Systems I, Vol. 54, no. 6, pp. 1165 - 1176, June 2007.

[15] J. Hasler, S. Kim, and A. Natarajan, “Enabling Energy-Efficient Physical Computing through Analog Abstraction and IP Reuse,” Journal of Low Power Electronics Applications, published December 2018. pp. 1-23.

 

Owner

Nikhil Garg

Summary

This Skywater 130nm implementation project will build a programmable end-to-end microphone-to-symbol keyword classifier. This full system classifier extends the large-scale Field Programmable Analog Array (FPAA) command-word classifier [1] to allow for larger vocabularies (100-500) by accounting for temporal dynamics in the classification. This design utilizes recent machine learning modeling based on Legendre Memory Units (LMU) that incorporates linear temporal dynamics for neurally-based classifier approaches [2]. The compiled SoC FPAA classifier required 20-30μW for a range of classification tasks [1,3-5]. We expect this new custom classifier component to require 20-40μW when the μP is turned off. The additional general linear dynamics block allows for the detection of patterns over time. This block effectively allows each output category to have a preferred temporal signal instead of being classified based on a single spectral pattern. This effort looks to compile this acoustic classifier network in a single Caravan Skywater platform, utilizing and modifying the analog Floating-Gate (FG) standard cell library to place and route the final structure. The analog FG standard cell library, based upon experiences in FPAA devices [6], is currently in fabrication through Skywater 130nm CMOS.

Category

acc

Process

sky130A

Shuttle Tags

SSCS-21