The 4 Key Elements of Designing Truly Hands-Free Computing, and Why Design Matters for the Future of Augmented Reality
The term hands-free computing has been bandied around for years as the Holy Grail of wearable computing. But making a truly functioning hands-free computer is no walk in the park.
We’ve all used speech recognition before. In a pristine environment with little noise, speech works fine if a user issues the expected commands (and enunciates every syllable). In louder environments, however, you get a familiar response like this:
“Sorry, I didn’t understand that. Can you please try again?”
But let’s back up…what is truly hands-free computing, anyway?
In the world of mobile computing, truly hands-free computing means the ability to operate a full computer without physically touching any buttons, swiping screens with your fingers, or even waving frantically in front of the computer’s camera (in the case of gesture-input devices).
Thus, truly hands-free computing requires at least one mode of interaction or modality, typically speech technology or eye-tracking. Eye-tracking can work, but it’s too limited to drive a full computing environment.
Speech technology is the real answer, but it’s only part of the solution for truly hands-free computing.
Why Hands-Free Computing is Needed in Industrial Environments
For many of our enterprise and industrial customers, truly hands-free computing has become the new business requirement; it is now a given.
Workers wear gloves for grip and safety, carry heavy tools, and perform actions with their arms and hands, all while needing the assistance of data and communication. Connected industrial workers should no longer be expected to use their hands or fingers to control a device.
Imagine a worker halfway up a wind-turbine tower, or part-way down a tunnel for inspection or QA.
The use of hands to maintain balance on a ladder or platform is more critical than accessing a computer display for information. Then there’s the person wearing thick protective gloves: touchscreens typically cannot register their touch, and on-screen buttons have to be quite large to be hit with accuracy.
Finally, the worker must hold tools in their hands while still needing to access information on the spot. Who wants to set the tools down to interact with a screen before picking the tools up again?
Here are the 4 Key Elements to Designing Truly Hands-Free Computing for Industry
There are four challenges to overcome before speech recognition can be useful in all environments:
Good Microphones
If we are going to rely on speech as our primary interface, we’d better have really good microphones, and a useful number of them around our headset. Our RealWear ruggedized wearable computers currently use an array of four microphones, placed at strategic locations around the user’s head. This allows both the user’s voice and, importantly, the ambient noise to be captured.
One of the challenges we’ve overcome is attaining an unsurpassed level of manufacturing consistency around our microphones: every microphone in every device is assembled and tested to behave identically to every other, from device to device. Because speech is so important for our device to function, we have gone above and beyond to ensure this manufacturing consistency, and in doing so have developed significant IP for the factories to support this task.
Noise Reduction / Voice Amplification
Now that we are guaranteed a consistently good level of audio data acquisition, we can feed it into the latest generation of noise-reduction and voice-amplification algorithms to weed out unwanted ambient sounds. We employ a number of different algorithms, constantly switching between them based on the noise types and environments. Some algorithms are beam-forming; some focus on removing noise; some use deep-learning AI; and others specialize in extracting human voice signals. All these algorithms are subtly different, but used at the right time, under the right conditions, they produce the cleanest, most audible signal possible.
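The general idea of switching denoising strategies based on the ambient noise profile can be sketched roughly as follows. To be clear, this is an illustrative toy in Python, not RealWear's actual pipeline: the thresholds, strategy names, and the naive gating step are all made-up assumptions standing in for the proprietary algorithms described above.

```python
import math

def rms(frame):
    """Root-mean-square level of one audio frame (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def pick_strategy(noise_rms):
    """Choose a denoising strategy from a crude noise estimate.
    Thresholds are illustrative, not tuned production values."""
    if noise_rms < 0.01:
        return "passthrough"      # quiet room: leave the signal alone
    if noise_rms < 0.1:
        return "spectral_gate"    # moderate noise: gate out the floor
    return "voice_extraction"     # loud environment: isolate the voice

def denoise(frame, noise_rms):
    """Apply the selected strategy to one frame of samples."""
    strategy = pick_strategy(noise_rms)
    if strategy == "passthrough":
        return frame
    if strategy == "spectral_gate":
        # naive gate: zero samples at or below the estimated noise floor
        return [s if abs(s) > noise_rms else 0.0 for s in frame]
    # "voice_extraction" placeholder: attenuate toward the voice band
    return [s * 0.8 for s in frame]
```

A real system would make this decision per frequency band on streaming audio, but the shape is the same: estimate the environment, then route the signal through the algorithm best suited to it.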
Speech Recognition without Requiring an Internet Connection
Thanks to high quality microphones and well-engineered noise reduction, we are now able to feed clean audio signals into our speech recognizer, confident that these signals contain nothing but the user’s commands. We make use of one of the most sophisticated speech recognition engines available today, which works without Internet connectivity. Our speech system works fully offline, in more than 40 languages.
By design, RealWear’s system is extremely responsive, understanding what the user says within 200 milliseconds of a command being issued. That gives the user essentially instantaneous feedback.
And thanks to the noise reduction, our fast-response speech recognition performs even in the harshest industrial environments, at noise levels approaching 100 dB.
Software – How do you use speech recognition to drive an application?
Even with microphones, noise reduction and speech recognition in play, there is still a very important piece we needed to address: How do you use speech recognition to drive an app?
“Surely you are not going to give us an SDK and ask us to completely rewrite all of our applications for hands-free?”
The answer is a big “NO”. We’ve done all of this work and embedded the speech and audio technology into the Android operating system. All you need to do is write your Android application as you normally would for touch (go ahead, put buttons, checkboxes and all manner of control widgets on the screen). Once your app is running on our HMT-1 headset, it will automatically become speech enabled – now you no longer need to press the button; you just say the name of the button to activate it. All for free – no additional programming required.
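As a concrete illustration of “write your app as you normally would,” a standard Android layout like the hypothetical fragment below contains nothing speech-specific; per the description above, the embedded speech layer activates each control when the user says its visible label. The IDs and labels here are invented for illustration only.

```xml
<!-- Hypothetical layout: no speech-specific code or attributes.
     On the HMT-1, saying "Start Inspection" activates the button
     and "Flag Defect" toggles the checkbox, per the labels. -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <Button
        android:id="@+id/start_inspection"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Start Inspection" />

    <CheckBox
        android:id="@+id/flag_defect"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Flag Defect" />
</LinearLayout>
```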
(Plus, we don’t have any of those wake-up words you have to repeat; there’s no, ‘Hey Siri’, or ‘OK, Google’, or ‘Alexa’. Just say what’s on the HMT-1 screen, whenever you want.)
Wrapping Up
To summarize – be wary when someone talks about their truly hands-free operating system. Ask yourself – is it truly hands-free? Can I drive the entire OS with voice, or is it just a single app? Can it work in high-noise environments? Finally, do I need to rewrite my app with an SDK to handle the speech interface? You’ll be surprised at how few offerings really meet these requirements.