Voice-Activated Gatekeepers
Posted 10/12/2016 by Weston Xu, Marketing Manager at MEMSensing Microsystems Co. Ltd
For many consumers, smart speakers are the gateway device to smart home management. Using these elegant devices, users can control their music, quickly order staple items online, and get answers to burning questions through the power of the Internet. Major players have already entered the market, among them Amazon, Google, Logitech, and Jingdong, with more expected to follow.
Today, most smart speakers focus on a combination of three main tasks: acting as a smart home center, serving as a home shopping gateway, and acquiring big data. To enable these tasks, the core function of any smart speaker is good speech recognition, which requires not only a microphone array but also integrated hardware and software algorithms to receive and process the speech. To ensure that voice commands are never missed, an array of microphones (typically seven or eight) is placed around the device. For example, the Amazon Echo uses a 7-microphone array.
Most solutions on the market use one of two architectures: a PDM MEMS microphone array with a microprocessor, or an analog MEMS microphone array with audio ADCs and a microprocessor. In the analog approach, each pair of microphone signals is passed through one of four dual-channel audio analog-to-digital converters (ADCs). Each ADC converts the audio signals into I2S/PCM format and feeds them to the application microprocessor, which has only a limited number of I2S/PCM interfaces. While this system is adequate, it is cumbersome, as four ADCs are required to do the conversion for the full array.
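The chip count above follows from simple arithmetic: each dual-channel ADC serves two microphones, so a seven- or eight-microphone array needs four of them. A minimal sketch of that calculation, where the function name and its parameters are illustrative assumptions rather than anything from a vendor tool:

```python
import math


def external_adcs_needed(num_mics: int, channels_per_adc: int = 2) -> int:
    """Return how many external ADC chips an analog microphone array needs.

    Hypothetical helper for illustration: assumes every ADC chip offers the
    same number of channels and each microphone occupies exactly one channel.
    """
    if num_mics < 0:
        raise ValueError("num_mics must not be negative")
    if channels_per_adc < 1:
        raise ValueError("channels_per_adc must be at least 1")
    # A partially used ADC still counts as a whole chip, hence the ceiling.
    return math.ceil(num_mics / channels_per_adc)


if __name__ == "__main__":
    for mics in (7, 8):
        print(mics, "microphones ->", external_adcs_needed(mics), "dual-channel ADCs")
```

With seven microphones one ADC channel is left unused, but the array still needs the fourth chip.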
MEMSensing solves this problem by integrating an audio ADC directly into the microphone. This digital microphone internally converts the analog signal into a digital I2S stream, eliminating the need for external ADC chips. An array of these microphones can be connected directly to a microprocessor or, if necessary, aggregated with a Lattice iCE40 series FPGA, with a significantly smaller bill of materials (BOM) and lower cost than a design that uses multiple external ADCs.
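To make the aggregation step concrete, the sketch below models in software what an aggregator could do: take one digital sample stream per microphone and interleave them into frames with one slot per microphone, so the processor reads a single combined stream. This is an assumption-laden illustration only. The function name and the slot-per-microphone frame layout are hypothetical, and a real FPGA design would implement this in hardware logic rather than Python.

```python
from typing import List, Sequence


def aggregate_frames(mic_streams: Sequence[Sequence[int]]) -> List[List[int]]:
    """Interleave per-microphone sample streams into multi-slot frames.

    Hypothetical model: mic_streams[i][t] is the sample from microphone i at
    sample time t. The result holds one frame per sample time, with slot i
    carrying microphone i's sample.
    """
    if not mic_streams:
        return []
    length = len(mic_streams[0])
    # All microphones are assumed to share one sample clock, so every stream
    # must contain the same number of samples.
    if any(len(stream) != length for stream in mic_streams):
        raise ValueError("all microphone streams must have the same length")
    return [list(frame) for frame in zip(*mic_streams)]


if __name__ == "__main__":
    # Three microphones, two sample times each.
    streams = [[1, 2], [3, 4], [5, 6]]
    for frame in aggregate_frames(streams):
        print(frame)
```

Combining the streams this way is one reason an aggregator can help when the processor has fewer audio interfaces than the array has microphones.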
MEMSensing's approach also reduces power consumption, as only the digital microphone and its attached application processor need to stay active to detect sound. This is a valuable feature for the mobile market, where voice recognition is already popular (Siri, Cortana, Google) but typically requires a button press, hand gesture, movement, or some other indication before the device starts listening. Lowering the power consumption allows for more robust voice recognition that can stay always on in battery-dependent mobile devices.
In the future, voice recognition will become more important as artificial intelligence (AI) systems develop and consumers begin to talk to their devices in more natural language. This will make both smart speakers and smartphones even more powerful and capable devices as they become the gateway to the digital world.