# M0N0: An Ultra Low Power Sub-threshold Microcontroller

James Myers Arm Research Arm Ltd Cambridge, UK Email: james.myers@arm.com

Abstract—In developing the M0N0 microcontroller, Arm seeks to advance the state of the art in power and performance for low-power applications by marrying ultra-low leakage with 32-bit performance and programmability, at greater active power efficiency than currently available MCUs. The system targets 10nW shutdown power and active power down to 10uW while executing user code on a Cortex-M33 processor.

Keywords—sub-threshold, microcontroller, low power, MCU, always-on sensors

## I. INTRODUCTION

DARPA's N-ZERO program [1] poses the challenge to design always-on sensors and wakeup radios with 10 nanowatt power consumption. Prototype chips are now being demonstrated that meet the extremely difficult target and provide a smart wakeup source to a microcontroller (MCU).

For system designers seeking to use N-ZERO sensors, selecting a commercial MCU presents a stark choice: use an easily programmable, high-performance 32-bit MCU whose leakage power swamps the new sensor, or use an 8-bit MCU with comparable leakage power but poor active power and performance.

The M0N0 MCU seeks to eliminate the tradeoff by delivering an Arm Cortex-M33 with DSP extensions designed using sub-threshold techniques which enable an industry-leading mix of low shutdown power (10 nanowatts) and low active power (10 microwatts) as shown in Fig. 1.



Fig. 1. M0N0 Targets vs. Commercial and Research MCUs [2-4]

Ben Conrad Arm Research Arm Inc. Austin, TX 78741 Email: ben.conrad@arm.com

In an N-ZERO application, the M0N0 MCU would act as the gatekeeper between the low-power N-ZERO sensor and the roughly 1,000x higher-powered long-range radio shown in Fig. 2. The MCU would wake the radio only when additional processing of sensor signals warrants the energy expense of radio communication.



Fig. 2. M0N0 Power Targets Comparison

We envision that M0N0 could enable additional functionality, such as allowing digital sensor fusion in addition to analog fusion. False alarm radio wake events could potentially be reduced by an order of magnitude or more. Sensor history could be preserved and the device could adapt to the environment, and a historical context could be communicated upon detection of a real alarm. Finally, the MCU could initiate periodic self-test to ensure the node is operational.
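The effect of fewer false-alarm wake events on average node power can be illustrated with a short back-of-envelope calculation. This is a minimal sketch, not a model from the M0N0 project: the function name, the 864 µJ radio energy per wake, and the wake rates are all hypothetical values chosen to make the arithmetic easy to follow. Only the 10 nW sensor and 10 nW shutdown figures come from the targets stated above.

```python
SECONDS_PER_DAY = 86_400


def average_power_nw(always_on_nw, wakes_per_day, energy_per_wake_uj):
    """Average node power in nW: always-on floor plus amortized radio wake energy."""
    wake_energy_j = wakes_per_day * energy_per_wake_uj * 1e-6
    return always_on_nw + wake_energy_j / SECONDS_PER_DAY * 1e9


# Hypothetical inputs, chosen only for illustration.
ALWAYS_ON_NW = 20.0         # 10 nW sensor target + 10 nW MCU shutdown target
ENERGY_PER_WAKE_UJ = 864.0  # assumed radio energy per wake event (not from the paper)

before = average_power_nw(ALWAYS_ON_NW, 100, ENERGY_PER_WAKE_UJ)
after = average_power_nw(ALWAYS_ON_NW, 10, ENERGY_PER_WAKE_UJ)
print(f"100 wakes/day: {before:.0f} nW, 10 wakes/day: {after:.0f} nW")
```

Under these assumed numbers the radio dominates the average power, so cutting wake events by an order of magnitude cuts the node's average power by nearly the same factor.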

The M0N0 project intends to deliver both the optimized MCU and a hardware and software environment enabling government contractors to develop new applications and have new chips fabricated for government purposes. So that application-specific feasibility studies can be conducted in parallel with the M0N0 project, we have released a data sheet describing system performance, memory capacity, and other relevant design parameters. Arm will also create demo boards running the M0N0 chip with an example keyword spotting application, as well as development boards (similar to the test board in Fig. 3) and infrastructure for new software development. Finally, Arm will develop a chip GDS-II generator tool to via-program ROMs with new software.

DISTRIBUTION STATEMENT A. Approved for public release: distribution is unlimited.



Fig. 3. Test Board for M0N0 circuit chip 2. Delivered demo and development boards will be considerably simpler due to less complex test functionality.

## II. SYSTEM DESIGN AND SPECIFICATIONS

### A. ROM for code, RAM for development

Non-volatile memories such as embedded flash offer excellent flexibility but consume significant power due to high-voltage charge pumps and complex sensing schemes. M0N0 uses ROM because it is inherently more efficient and can be designed to scale with logic down to optimally efficient sub-threshold voltages.

The drawback is that the code is fixed at the time of chip manufacture. To enable development of new code, M0N0 includes a dense SRAM (labelled "DevCode SRAM" in Fig. 4) which can hold work-in-progress software while it is tested, debugged, and verified. Once the software is complete, a new chip layout can be generated without the use of EDA tools using a chip generator licensed from Arm. No-cost licenses for government purposes are available. The floorplan of one of the chips is shown in Fig. 5.



Fig. 4. M0N0 system diagram



Fig. 5. M0N0S1 chip floorplan, showing two routing layers and circuit macros. 2x3mm dimensions in 65nm.

### B. Highly efficient integrated analog for 10nW shutdown

The 10 nanowatt shutdown power budget for the chip was challenging and required careful optimization or elimination of always-on components. Once the digital sub-system is thoroughly power-gated, most of the leakage is from analog/mixed-signal components. From the start of the project, a tight budget has been set for each block as described in Table 1.

| M0N0 Component               | Target Power (nW) | Present Power (nW) |
|------------------------------|-------------------|--------------------|
| 10x IO pads on VBAT          | 1                 | 1                  |
| Real time clock, disabled    | 1                 | <0.5               |
| Integrated voltage regulator | 3                 | 4                  |
| Battery monitor              | 1                 | <0.5               |
| Power control state machine  | 2                 | <0.5               |
| Shutdown SRAM                | 2                 | 2.5                |

Table 1: M0N0 shutdown power budget by component
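The Table 1 entries can be totalled to check them against the 10 nW target. The sketch below does that arithmetic; the function names are illustrative, and treating each "<0.5" entry as its 0.5 nW upper bound is an assumption, so the present-power total is a ceiling rather than a measured value.

```python
# Shutdown power budget from Table 1, as (target_nW, present_nW).
# Assumption: "<0.5" entries are represented by their 0.5 nW upper bound.
BUDGET_NW = {
    "10x IO pads on VBAT": (1.0, 1.0),
    "Real time clock, disabled": (1.0, 0.5),
    "Integrated voltage regulator": (3.0, 4.0),
    "Battery monitor": (1.0, 0.5),
    "Power control state machine": (2.0, 0.5),
    "Shutdown SRAM": (2.0, 2.5),
}


def totals(budget):
    """Return (total target, total present upper bound) in nW."""
    target = sum(t for t, _ in budget.values())
    present = sum(p for _, p in budget.values())
    return target, present


def over_budget(budget):
    """List the components whose present power exceeds their target."""
    return [name for name, (t, p) in budget.items() if p > t]


target_nw, present_nw = totals(BUDGET_NW)
print(f"target {target_nw} nW, present at most {present_nw} nW")
print("over target:", over_budget(BUDGET_NW))
```

With these inputs the targets sum to the 10 nW goal, and the present figures stay within it even though the voltage regulator and shutdown SRAM individually exceed their allocations.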

### C. Sub-threshold operation for 10uW active power

Operating digital sub-systems below their threshold voltage has been shown in prior Arm work [2, 5] to increase energy efficiency by up to 10x. Once circuit challenges such as SRAM design and temperature variability are overcome, the next challenge is to deliver a range of performance/power points without undue burden to software.

M0N0 will remove the need for software to specify a regulator mode or a voltage/frequency pair from a characterized lookup table, as is common for DVFS in mobile systems. The real requirement is for the hardware to deliver a guaranteed application-specific minimum performance. This can be done using our automatically scaling system clock source [5] as feedback to the voltage regulator, which has the added benefit of reducing the regulator power spent in voltage comparators.
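The idea of regulating on clock frequency rather than on a tabulated voltage can be sketched as a simple search loop. This is an illustrative model only, not the M0N0 regulator: the exponential frequency-versus-voltage curve, its constants, the voltage range, and the step size are all hypothetical, and a real loop would also lower the supply when there is headroom.

```python
import math


def clock_hz(vdd, f0_hz=10e3, v0=0.25, slope_v=0.05):
    """Hypothetical voltage-tracking clock: frequency grows exponentially with supply."""
    return f0_hz * math.exp((vdd - v0) / slope_v)


def regulate(min_hz, v_min=0.25, v_max=0.60, step_v=0.005):
    """Raise the supply one step at a time until the clock meets the requested floor.

    Software supplies only min_hz; no voltage/frequency lookup table is consulted.
    """
    steps = int(round((v_max - v_min) / step_v))
    for i in range(steps + 1):
        vdd = v_min + i * step_v
        if clock_hz(vdd) >= min_hz:
            return vdd
    raise ValueError("requested frequency is not reachable below v_max")


vdd = regulate(100e3)
print(f"supply settles at {vdd:.3f} V, clock runs at {clock_hz(vdd) / 1e3:.1f} kHz")
```

Because the comparison is made on frequency, the loop settles at the lowest supply step that meets the performance floor for whatever the clock's actual voltage dependence happens to be, which is the property the lookup-table approach has to obtain through characterization.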

Fig. 6 shows the simulated results from the second-generation sub-threshold chip, plotting energy and power against the software-requested frequency range. Note that this models an average compute-bound workload without excessive IO or AES accelerator usage.



Fig. 6. Active power and energy per cycle within M0N0 frequency range

## III. KEY RESULTS

### A. Memory circuits

As seen in Fig. 5, memory occupies a large portion of the M0N0 chip area. Four different specialized memories are used in order to maximize software capability, minimize shutdown power, and minimize active energy. These were designed and measured on the first two chips of the program, with key results shown in Table 2.

Table 2: Measured memory characteristics

| Unpublished Arm 65nm simulation data | 0.4V ROM<br>(Vmin)  | 0.4V SRAM<br>(Vmin) | 1.2V SRAM<br>(leakage) | 1.2V SRAM<br>(density) |
|--------------------------------------|---------------------|---------------------|------------------------|------------------------|
| Frequency (TT 25°C)                  | 560kHz              | 340kHz              | 4.10MHz                | 320MHz                 |
| Read energy (32b)                    | 1.58pJ              | 3.89pJ              | 22.66pJ                | 36.73pJ                |
| Write energy (32b)                   | N/A                 | 3.99pJ              | 22.79pJ                | 38.71pJ                |
| Leakage/bit (standby)                | 1.5nW               | 10pW                | 1.8pW                  | 22pW                   |
| Leakage/bit (retention)              | 0                   | 3.6pW               | 75fW                   | 18pW                   |
| Area/bit (includes periph)           | 0.69µm²             | 6.2µm²              | 6.2µm²                 | 0.64µm²                |
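The read-energy and frequency rows of Table 2 can be combined to compare the memories. The sketch below computes the dynamic read power each memory would draw and the read-energy ratio against the ROM. It assumes one 32b read is issued every cycle at the listed frequency, which is a simplification and not a workload from the paper; the function names are illustrative.

```python
# Frequency and 32b read energy copied from Table 2 (TT, 25 °C).
MEMORIES = {
    "0.4V ROM": {"freq_hz": 560e3, "read_pj": 1.58},
    "0.4V SRAM": {"freq_hz": 340e3, "read_pj": 3.89},
    "1.2V SRAM (leakage)": {"freq_hz": 4.10e6, "read_pj": 22.66},
    "1.2V SRAM (density)": {"freq_hz": 320e6, "read_pj": 36.73},
}


def read_power_uw(name):
    """Dynamic read power in µW, assuming one 32b read on every cycle."""
    m = MEMORIES[name]
    return m["read_pj"] * 1e-12 * m["freq_hz"] * 1e6


def read_energy_ratio(name, baseline="0.4V ROM"):
    """Read energy of a memory relative to the baseline memory."""
    return MEMORIES[name]["read_pj"] / MEMORIES[baseline]["read_pj"]


for name in MEMORIES:
    print(f"{name}: {read_power_uw(name):.3f} uW, "
          f"{read_energy_ratio(name):.1f}x ROM read energy")
```

Under that assumption, continuous instruction fetch from the 0.4V ROM costs a little under 1 µW, which leaves most of a 10 µW active budget for the rest of the system.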

### B. Software capability

To show the software capability of the system, a keyword spotting application was developed targeting Google's speech commands dataset [6] with a 10-word dictionary. The approach of MFCC feature extraction followed by a 3-layer fully connected neural network was adapted from [7], but running it on M0N0 required optimization to reduce SRAM consumption. Efficiency gains were unlocked by reducing performance demand and moving to all fixed-point computation. The software was written in C and compiled with the open-source compiler GCC, taking good advantage of Cortex-M33 features such as DSP and 8-bit vector instructions. In simulation it runs correctly on the M0N0 system at approximately 50uW active power; measurement is in progress.
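The fixed-point inference style described above can be sketched as follows. This is not the M0N0 application code, which was written in C. It is a minimal Python illustration of an 8-bit fully connected layer with wide accumulation, a right-shift rescale, and saturation, chained into a three-layer classifier. The function names, the per-layer shift, and the weight layout are assumptions, and MFCC extraction is omitted.

```python
def dense_q7(inputs, weights, biases, shift):
    """Fixed-point fully connected layer: int8 inputs and weights, wide accumulator."""
    out = []
    for row, bias in zip(weights, biases):
        acc = bias
        for x, w in zip(inputs, row):
            acc += x * w                      # multiply-accumulate
        acc >>= shift                         # arithmetic shift rescales the result
        out.append(max(-128, min(127, acc)))  # saturate to the int8 range
    return out


def classify(features, layers):
    """Run the layers in order (ReLU between them) and return the top class index.

    Each layer is a (weights, biases, shift) tuple; the last layer has no ReLU.
    """
    x = features
    for i, (weights, biases, shift) in enumerate(layers):
        x = dense_q7(x, weights, biases, shift)
        if i < len(layers) - 1:
            x = [max(0, v) for v in x]
    return max(range(len(x)), key=x.__getitem__)


# Tiny hypothetical network: two inputs, two hidden layers, three output classes.
identity = ([[1, 0], [0, 1]], [0, 0], 0)
output = ([[1, 0], [0, 1], [1, 1]], [0, 0, 0], 0)
print(classify([10, 20], [identity, identity, output]))
```

Keeping every operand in 8 bits is what lets a C implementation map the inner loop onto packed DSP multiply-accumulate instructions, and it also shrinks the weight and activation storage compared with floating point.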

## IV. NEXT STEPS

The third chip of the series of four is now in the lab undergoing testing. All learning will be carried forward into the final chip which tapes out in April. This will be available, along with demo and development boards, in early 2020.

## ACKNOWLEDGMENT

The authors want to acknowledge the funding and support received from the DARPA Microsystems Technology Office (Agreement No. HR0011-17-9-0025). The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

## REFERENCES

- [1] R. H. Olsson, R. B. Bogoslovov and C. Gordon, "Event driven persistent sensing: Overcoming the energy and lifetime limitations in unattended wireless sensors," 2016 IEEE SENSORS, Orlando, FL, 2016, pp. 1-3.
- [2] J. Myers et al., "A Subthreshold ARM Cortex-M0+ Subsystem in 65 nm CMOS for WSN Applications with 14 Power Domains, 10T SRAM, and Integrated Voltage Regulator," in IEEE Journal of Solid-State Circuits, vol. 51, no. 1, pp. 31-44, Jan. 2016.
- [3] T. Karnik et al., "A cm-scale self-powered intelligent and secure IoT edge mote featuring an ultra-low-power SoC in 14nm tri-gate CMOS," 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 46-48.
- [4] M. Fojtik et al., "A Millimeter-Scale Energy-Autonomous Sensor System With Stacked Battery and Solar Cells," in IEEE Journal of Solid-State Circuits, vol. 48, no. 3, pp. 801-813, March 2013.
- [5] J. Myers et al., "A 12.4pJ/cycle sub-threshold, 16pJ/cycle near-threshold ARM Cortex-M0+ MCU with autonomous SRPG/DVFS and temperature tracking clocks," 2017 Symposium on VLSI Circuits, Kyoto, 2017, pp. C332-C333.
- [6] Pete Warden, Google Speech Commands Dataset v0.02: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html
- [7] Yundong Zhang et al., "Hello Edge: Keyword Spotting on Microcontrollers," https://arxiv.org/abs/1711.07128