In the context of Natural Language Processing (NLP), the size of state-of-the-art models has grown enormously over the last few years. Language models in particular have grown exponentially in their number of parameters, leading to very high complexity.
Recently, OpenAI produced the largest language model ever created: 175 billion parameters! This model, called Generative Pre-trained Transformer 3 (GPT-3), can output human-like text that is difficult to distinguish from text written by a human.
To put the size of this model in context, the graph below shows the growth of several models over the last two years.
This observation holds in other fields of Artificial Intelligence as well, such as Computer Vision, as the following graph shows:
Producing these models consumes an enormous amount of computational power and a significant amount of training time. A consensus can therefore be established: cutting-edge models are huge in size, consume time and require power. This is our Elephant!
While models and processing units keep getting bigger and bigger, some researchers became interested in the other side of the river: in tiny smart devices that can be embedded anywhere, in vehicles and office machines, in medical equipment and production facilities, in power tools and toys; they can be simply ubiquitous.
The backbone of these little objects is the microcontroller: a small board integrating the processor, the memory and the storage in a single block, and built to perform very specific tasks. The picture below shows the Arduino Nano 33 BLE Sense, a tiny platform equipped with a Cortex-M4 processor and designed for smart devices. It integrates several sensors that capture, for instance, humidity, temperature, sound, gesture and light intensity. The board measures 45×18 mm, weighs 8 g, has 1 MB of memory and operates at 3.3 V.
It is clear at this point that these microcontrollers are highly constrained, with very limited computing power, memory and storage. This is our Fridge!
How do we put the Elephant inside the Fridge?
On the one hand, we have huge models that perform sophisticated tasks such as text generation with high accuracy; on the other hand, we have tiny microcontrollers with very restricted capabilities but highly promising applications.
The challenge is to combine the high accuracy of these models with the tiny size of these microcontrollers to produce high-quality smart devices. In other words: how do we put the Elephant inside the Fridge?!
This is where Tiny Machine Learning (TinyML) comes in!
TinyML is a new way of creating smart devices. Instead of running machine learning models on cloud services or on large local processors, we run them on very low-power platforms where computation, data collection and sensors are embedded in a single system, with no need for external resources.
For instance, we can imagine a smart device that can be activated or deactivated using Wake-Word Detection without sending any request to the cloud.
So what is the point of an embedded voice user interface that needs no external computation to run its tasks? To answer that, consider Google Assistant, Apple’s Siri or Amazon Alexa. These digital assistants listen for the user’s voice, waiting for a Wake Word (e.g. “Hey Google”, “Alexa”) to launch the dialogue. They therefore potentially send a constant stream of audio from the user’s device to a data center, where it is processed in search of this Wake Word. This approach raises several issues:
- It requires vast amounts of bandwidth;
- It drains the device’s battery with intensive computation;
- It has privacy implications;
- It depends on network communication, so any disruption degrades the assistant’s performance.
Looking at these drawbacks, one notices that we only need the part of the audio signal that follows the Wake Word, not the part that precedes it. We can therefore embed a tiny model trained to detect the trigger word, and start the online streaming only once the word is detected. This way, we protect user privacy, save bandwidth and reduce battery usage.
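The gating logic described here can be sketched as follows. Note that `classify` stands in for the tiny on-device wake-word model, and the string "frames" are a toy substitute for real audio buffers; all names in this sketch are hypothetical, not part of any real API:

```python
def wake_word_gate(frames, classify, threshold=0.8):
    """Forward audio only after the wake word has been detected on-device."""
    awake = False
    for frame in frames:
        if not awake:
            # run the tiny local model; nothing leaves the device yet
            awake = classify(frame) >= threshold
        else:
            yield frame  # only post-wake audio is streamed to the cloud

# toy example: frames are words, and the "model" fires on "hey"
frames = ["noise", "noise", "hey", "turn", "on", "the", "light"]
streamed = list(wake_word_gate(frames, lambda f: 1.0 if f == "hey" else 0.0))
```

Everything before (and including) the trigger word stays on the device; only the command that follows it is streamed out.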
The video below demonstrates Wake Word Detection. The model performs a speech classification task and can recognize the words “Yes” and “No”: a green LED (Light-Emitting Diode) signals detection of “Yes”, and a red one signals “No”.
Our Elephant is now inside the Fridge! And as the demo shows, it is doing well!
So how does TinyML work in practice?
The previous demonstration shows a speech recognition model that detects the words “Yes” and “No”. Despite its tiny vocabulary, such a model is still oversized for the capabilities of the microcontroller. To enable TinyML, a general framework can therefore be followed:
- Train a model (using, for instance, Deep Learning and TensorFlow);
- Fine-tune and evaluate the model;
- Convert and optimize the model to run on-device (using TensorFlow Lite, for instance);
- Deploy the model to a microcontroller and run inference at the edge.
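A key part of the conversion step is typically post-training quantization: storing weights as 8-bit integers instead of 32-bit floats, which divides the model size by four. As a rough illustration of the idea (the helper functions below are our own sketch, not the TensorFlow Lite API), the affine quantization scheme can be written in plain Python:

```python
def quantize(values, num_bits=8):
    """Affine quantization: map floats onto the integers [0, 2^num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0          # guard against a constant tensor
    zero_point = round(-lo / scale)          # integer that represents 0.0
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

# toy "weights" of a layer, quantized to 8 bits
weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
```

Each weight then occupies a single byte, at the cost of a small rounding error bounded by the scale, which is usually an acceptable trade-off for on-device inference.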
These steps involve several technologies and techniques that deserve a closer look. In a future post, we will implement the previous demonstration from scratch and present the pipeline in further detail.
References and Credits
 Brown T. B. et al. (2020), “Language Models are Few-Shot Learners”, https://arxiv.org/pdf/2005.14165.pdf
 HarvardX’s Tiny Machine Learning (TinyML) MOOC
 Bianco S. et al. (2018), “Benchmark Analysis of Representative Deep Neural Network Architectures”, https://arxiv.org/pdf/1810.00736.pdf
 Warden P., Situnayake D. (2019), “TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers”, O’Reilly Media