The Rise of the Audiological Machines

Author: Aaron C. Jones, Au.D.

What is intelligence? Its definition is elusive, but it certainly involves processing, reasoning, and learning. One definition was provided by Gottfredson and a group of 52 academic experts:

“Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings—‘catching on’, ‘making sense’ of things, or ‘figuring out’ what to do.” (1994, p. A18)

Clearly, intelligence is something we associate with the brain, but increasingly people use the term artificial intelligence (AI). Data show that the popularity of the Internet search term “artificial intelligence” has more than doubled in the last 10 years (Google Trends, 2019). We see frequent references to AI in popular culture, and it is a technological basis for thousands of entrepreneurial ventures (The AI 100: Artificial Intelligence Startups That You Better Know, 2019). With this increasing societal and occupational interest, AI is bound to make inroads into the hearing care industry, which means that audiologists need to be aware of it.
Artificial Intelligence
What is AI? Simply put, AI is aptitude, demonstrated by a computer, for a task normally accomplished by a brain. It uses mathematical models, which are systems of equations that produce desired outputs for specific inputs, to mimic brain function through processing information, reasoning based on that information, and learning from it. Models are developed and trained using input data that typically have patterns and are labeled. In other words, AI involves using systems of equations trained with real-world data to automatically produce, in a brain-like way, desired outputs for new inputs.

The term “artificial intelligence” was coined in 1955 (McCarthy et al., 2006). In their proposal, McCarthy and his team conjectured that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” The idea itself, however, dates back to the automatons of Greek mythology.

Over the years, AI has been depicted as enabling the automation of intellectual and physical human tasks. Depictions have ranged from utopian to dystopian. Utopian ones like the movie Robot & Frank, where a man gains both a friend and an accomplice in a robot, nurtured the idea that AI may ultimately assist humans in our daily activities and professions (Schreier et al., 2012). At the other end of the spectrum, dystopian depictions like the film interpretation of Isaac Asimov’s novel I, Robot have fueled fears that AI may someday replace humans in our professions (Proyas, 2004). Although often depicted in the context of robots, AI does not require them. Robots themselves are not AI, but they can be enabled by it; even without AI, robots can perform defined tasks based on sensor data, and merely using sensor data to trigger a computational decision is not AI.

Recently a computer scientist and former Chief Scientific Officer of Baidu, which is one of the largest internet and AI companies by revenue in the world, said that tasks a person can do with no more than one second of thought may be automated with AI now or in the near future (Ng, 2017). This suggests that some audiological tasks today may be ripe for automation with AI.

AI and automation have, in fact, already affected audiology. For example, screening audiometry has been automated, without using AI, as demonstrated by the Welch Allyn AudioScope®. Some manufacturers use AI to improve hearing instrument performance or to automate audiological tasks like fitting fine tuning. As audiologists we are increasingly faced with AI terminology but, even though it has become a part of our lexicon, that terminology is often misunderstood and misused. Furthermore, our own lack of AI awareness and fear—fueled by dystopian depictions in media—have made us susceptible to marketing hype.
Building Blocks
Fear and uncertainty are natural human responses when faced with a complicated topic like AI. Fortunately, AI is composed of building blocks that may be more easily understood in isolation, especially when they have familiar real-world examples. Today, the AI building blocks of computer vision, natural language processing, and machine learning frequently appear individually or together in everyday and healthcare applications.

Computer Vision
Biological vision is the process of detecting light with the eyes, transmitting neural representations of light to the brain, processing those signals, and perceiving them. We typically refer to the resulting perception as ‘vision’. With vision, humans can describe the content of a scene or image and recognize similar ones.

Computer vision is a building block of AI and is the analog of biological vision. With computer vision, mathematical models are trained on the features and patterns contained in labeled digital images so that they can recognize similar features and patterns in new images. An early example is optical character recognition (OCR), where a computer recognizes a handwritten letter of the alphabet and converts it to an ASCII (American Standard Code for Information Interchange) character for use in word processing and other computer applications. Other familiar uses of computer vision include object, fingerprint, retina, and facial recognition.
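As a toy illustration of this training idea, a new image can be recognized by comparing it with labeled training images. The 5x5 patterns, labels, and pixel-difference measure below are invented for illustration only; real computer vision systems use far richer features and models.

```python
# Labeled training "images": 5x5 binary patterns a human has labeled.
LABELED_IMAGES = {
    "L": ["X....", "X....", "X....", "X....", "XXXXX"],
    "T": ["XXXXX", "..X..", "..X..", "..X..", "..X.."],
}

def distance(img_a, img_b):
    """Count pixel positions where the two images differ."""
    return sum(
        a != b for row_a, row_b in zip(img_a, img_b) for a, b in zip(row_a, row_b)
    )

def recognize(image):
    """Return the label of the closest training image."""
    return min(LABELED_IMAGES, key=lambda label: distance(image, LABELED_IMAGES[label]))

# A slightly noisy "L" is still recognized, because it differs from the
# trained "L" pattern in fewer pixels than it differs from the "T".
noisy_l = ["X....", "X....", "X..X.", "X....", "XXXX."]
print(recognize(noisy_l))  # prints "L"
```

The same nearest-match principle, scaled up enormously, underlies recognition of objects, fingerprints, retinas, and faces.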

Natural Language Processing
Biological language processing involves detecting, transmitting, processing, and perceiving spoken or written communication. In the case of speech, the ear functions for detection whereas in the case of written communication, the eyes do the job. In both cases, we are processing morphology, syntax, semantics, and pragmatics.

Natural language processing (NLP) is another building block of AI. It is the analog of biological language processing. In the case of NLP, mathematical models are trained to recognize specific acoustic or textual representations of language. Familiar applications of NLP include transcription and translation. It can be combined with computer vision, specifically OCR, to transcribe text, if necessary translate it, and finally generate a text or speech output.

Perhaps of more audiological interest, however, NLP may be used to transcribe speech. It commonly does this with automatic speech recognition (ASR). Functional application of NLP with ASR depends on one or more language models and an acoustic model. A language model, sometimes called a statistical language model, estimates the probabilities of specific strings of words occurring in a language. It predicts the next word in a phrase based on the word or words before it.
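As a toy illustration, a bigram language model estimates the probability of the next word from counts of adjacent word pairs in a training corpus. The three-sentence corpus below is invented; real language models are trained on vastly larger data.

```python
from collections import Counter, defaultdict

# Invented three-sentence training corpus.
corpus = [
    "turn the volume up",
    "turn the volume down",
    "turn the program on",
]

# Count how often each word follows each preceding word (bigrams).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev_word, next_word in zip(words, words[1:]):
        following[prev_word][next_word] += 1

def predict_next(word):
    """Return the most probable next word and its estimated probability."""
    counts = following[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

# "volume" follows "the" in 2 of the 3 training sentences.
print(predict_next("the"))
```

An ASR system uses exactly this kind of estimate, combined with an acoustic model, to choose between acoustically similar candidates such as “volume up” and “volume cup.”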

Because a language includes millions of possible strings of words, a model represents only a subset of it. A language model often simulates, with high accuracy and precision, just one topic of conversation. Speaking rate, age, gender, accent, dialect, slang, and other language variables challenge accurate and precise ASR. Robust language models require training sets with thousands of audio samples; if translation is also required, then high-fidelity models are needed for both the source language and the target language.

In addition to the language model, a robust acoustic model is necessary for robust ASR. Distance, noise, and reverberation are important variables to consider with ASR, just as they are with biological language processing. ASR that works across a broad range of acoustic environments was undoubtedly trained with data acquired using different microphone distances, different background noise, and different room acoustics.
Machine Learning
Biological learning is the process of acquiring knowledge. It comes through exposure to new data and it begins before birth; prenatal learning has been demonstrated, and language learning specifically has been found to begin as early as the last 10 weeks of pregnancy (Moon et al., 2013). We learn throughout life with exposure to new information and experiences.

Machine learning, which is another building block of AI, is the analog of biological learning. It is the ability of a model to evolve with new data. Supervised machine learning is the most common type: a human labels the data in a training set, and models are trained to associate those inputs with the desired outputs. An example is user preference learning. Unsupervised machine learning is the other type; in this case, models recognize natural patterns in unlabeled data. Deep learning is a specific class of machine learning whereby features of input data are extracted in layers, much like the process of feature extraction in the ascending auditory system.
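The distinction can be sketched with two minimal examples. The feature values below (imagined as average sound levels in dB) and the labels are invented for illustration; in the supervised case a human supplies the labels, while in the unsupervised case the model finds the groups on its own.

```python
# Supervised learning: a human has labeled each training sample.
labeled = [(45, "quiet"), (50, "quiet"), (72, "loud"), (78, "loud")]

def train_centroids(samples):
    """Average the feature value for each human-provided label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(value, centroids):
    """Assign the label whose centroid is nearest to the new input."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

centroids = train_centroids(labeled)
print(classify(48, centroids))  # nearest the "quiet" centroid

# Unsupervised learning: no labels; find two natural groups (k-means, k=2).
def two_means(values, iterations=10):
    a, b = min(values), max(values)
    for _ in range(iterations):
        group_a = [v for v in values if abs(v - a) <= abs(v - b)]
        group_b = [v for v in values if abs(v - a) > abs(v - b)]
        a, b = sum(group_a) / len(group_a), sum(group_b) / len(group_b)
    return sorted(group_a), sorted(group_b)

print(two_means([44, 51, 47, 74, 79, 70]))  # two groups emerge without labels
```

Deep learning replaces these hand-chosen features and distances with many layers of learned feature extractors, but the supervised/unsupervised distinction carries over unchanged.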

Machine learning, computer vision and NLP are commonly used building blocks of AI. It all sounds futuristic, but individually and together they have applications that are increasingly pervasive in our lives. We encounter them in our homes, on our mobile devices, and even in the audiology profession.
Everyday Applications
In cities around the United States, it is becoming common to see self-driving cars. Technologies from autonomous vehicles are being offered in mass-production cars; parking assist, braking assist, and lane control are just a few. We are even seeing overflow of these technologies into the transport trucking and boating industries. An industry has quickly emerged for AI-enabled autonomous vehicles that includes the entire stack from sensor development to services.

Related but more relevant applications for audiology include transcribers and translators, virtual personal assistants (VPAs), and chatbots. These AI applications have been in existence for years and are already making a splash in our industry.

The ability of AI to produce a text or speech output for a given text or speech input is compelling in audiology. These text-to-speech, speech-to-text, text-to-text, and speech-to-speech applications are already here with automatic captioning and subtitles for telephone communications and broadcast media. As digital processing and storage technologies have advanced, mobile apps for AI-enabled transcription and translation have proliferated.

Transcription and translation apps like Google Translate and SayHi Translate from Nuance Communications may be used on our existing mobile devices, with or without a wireless network. Their accuracy and precision are, as previously discussed, dependent on language and acoustic models. Google may perform more robustly for consumer topics while SayHi, which was trained using doctor dictations, may perform better when the discussion includes healthcare terminology. AI-enabled tools like these allow us to better overcome communication barriers, using our existing mobile devices.

There are even dedicated translation devices that serve a similar purpose for specific use-cases. The ili device from Logbar, which uses ASR to produce a Japanese or Mandarin speech output from English speech, is one example. It was trained for a travel application and, therefore, is most accurate with topics like shopping, dining and navigating.

While ASR may be used as a basis for transcription and translation, it may also be used to enable a virtual personal assistant (VPA). The most commonly used VPA is Siri, which Apple claims is used monthly on over 375 million devices in 21 languages across 36 countries (Cook, 2017). VPAs like Apple's Siri and Amazon's Alexa use ASR to convert voice commands and questions to text before producing a corresponding output. VPAs can add a meeting to your calendar, find a recipe, play a song, and more.

ASR also serves as a basis for so-called chatbots, with which you can interact via the Internet or phone. Many companies already use chatbots to triage incoming customer service calls. Simply type or state your question and get an answer from a chatbot that leverages a language model to determine its most appropriate response. If you phone customer service these days, you may never actually speak with a real person; instead, you might speak with a chatbot.

Multiple building blocks of AI can be used together in an individual application. In one implementation, computer vision and NLP together detect states of emotion or confusion more robustly than either building block can do on its own (Amer, et al., 2014). Similarly, computer vision may be used to detect visemes, which are fundamental facial cues that map to one or more phonemes, in order to improve the accuracy of AI for speech-in-noise beyond the performance limitations of ASR (Potamianos, et al., 2012).
Healthcare Applications
Although AI is most common in everyday applications, it is finding its way into the healthcare industry. Today, AI has found applications in disease prediction, diagnostics, and management. Prognos is using AI to predict disease from big data. Ginger is using it to assess mental health. Sensely is using AI to direct insurance plan members to resources and to remotely monitor chronic illnesses like congestive heart failure and chronic obstructive pulmonary disease. Arterys is using computer vision to analyze medical images and drive diagnoses. These are but a few examples; the list goes on and on.

AI is being applied in the audiology profession, too. Application of computer vision is in a particularly early stage, but at least one product is under development that leverages it for automated, otoscopic diagnosis of common middle ear disorders. More mature in its audiological application, NLP has been used with both cloud and mobile apps. Microphones on connected hearing instruments provide a means by which a user can remotely access a VPA, transcribe, and translate. It is important to note that, like non-audiological applications, hearing instrument applications of ASR have their limitations—distance, noise, reverberation, dialect, accent, jargon, speech rate, and more—due to the challenges of training language and acoustic models, as previously described.

Looking closer at the AI building block of machine learning, two notable applications have surfaced in the hearing aid industry. The first application is hearing instrument fitting fine tuning, based on user preferences and behaviors. The second application is acoustic classification, to inform automatic changes of hearing instrument sound performance.

User Preference and Behavior Learning
In the course of a hearing instrument fitting, fine tuning is traditionally performed in-clinic, based on classical methods of validation: aided speech testing, questionnaires and inventories like the International Outcome Inventory for Hearing Aids (IOI-HA), and face-to-face discussion. Modern methods of hearing aid validation, leveraging teleaudiology and ecological momentary assessment, are gaining traction with some manufacturers (Timmer, et al., 2018).

Another approach to fitting fine tuning is to use machine learning in hearing instruments or their mobile apps for user preference and behavior learning. The idea is that a hearing instrument fitting may be allowed to evolve, without involving an audiologist, based on user preferences for volume and sound performance in different listening environments. User preference and behavior learning has been implemented by multiple hearing instrument manufacturers. This puts a modest amount of control in the hands of hearing aid wearers, which may be a double-edged sword. Ideally, the use of machine learning in this way improves user satisfaction. However, in reality, it could sometimes lead to under-amplification for users with strong preferences for listening comfort.
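One hypothetical sketch of how such preference learning could work: the fitting gradually drifts toward the volume offset a wearer repeatedly chooses in a given environment. The class name, environment labels, values, and learning rate below are invented for illustration and do not represent any manufacturer's implementation.

```python
class VolumePreferenceLearner:
    """Learns a wearer's preferred volume offset per listening environment."""

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.preferred = {}  # environment -> learned volume offset (dB)

    def observe(self, environment, user_offset_db):
        """Blend each manual adjustment into the learned preference."""
        current = self.preferred.get(environment, 0.0)
        self.preferred[environment] = (
            current + self.learning_rate * (user_offset_db - current)
        )

    def suggest(self, environment):
        """Return the learned offset to apply automatically next time."""
        return self.preferred.get(environment, 0.0)

learner = VolumePreferenceLearner()
for _ in range(10):  # the wearer repeatedly turns noisy scenes down ~4 dB
    learner.observe("noise", -4.0)
print(round(learner.suggest("noise"), 1))  # prints -3.6, approaching -4
```

The sketch also illustrates the double-edged sword noted above: a wearer who habitually turns everything down would steadily train the fitting toward under-amplification.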

Acoustic Classification
Modern hearing instruments automatically switch programs, based on changes in listening environments that are acoustically classified by the hearing instruments. This automaticity sometimes obviates the need for manual user adjustments, but how do these acoustic classifiers work?

“Automatic classifiers sample the current acoustic environment and generate probabilities for each of the listening destinations in the automatic program. The hearing instrument will switch to the listening program for which the highest probability is generated. It will switch again when the acoustic environment changes enough such that another listening environment generates a higher probability.” (Hayes, 2019)
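The switching logic described in the passage above can be sketched as follows. The environment names and probabilities are invented for illustration, and the stability margin, which keeps the instrument from toggling rapidly between near-tied environments, is an assumption of this sketch rather than any product's documented behavior.

```python
def classify_environment(probabilities, current_program, margin=0.1):
    """Switch to the most probable listening program, but only when its
    probability exceeds the current program's by a stability margin."""
    best = max(probabilities, key=probabilities.get)
    if best != current_program and (
        probabilities[best] - probabilities.get(current_program, 0.0) > margin
    ):
        return best
    return current_program

# Probabilities a classifier might generate from one acoustic sample.
sample = {"quiet": 0.15, "speech_in_noise": 0.70, "music": 0.15}
print(classify_environment(sample, current_program="quiet"))  # switches
```

When the environment changes enough that another program's probability clearly dominates, the instrument switches; near-ties leave the current program in place.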

Some manufacturers use machine learning to develop their acoustic classifiers in order to better distinguish between listening environments. Using a training set of many audio clips from different listening environments, acoustic classifiers learn to differentiate between environments so similar that they can fool even listeners with normal hearing thresholds. Accurate acoustic classification is the basis for automatic sound performance that hearing aid users may prefer (Rakita and Jones, 2015; Cox et al., 2016).
Audiological Machines
What might the future hold for AI in the audiology profession? Clear applications of computer vision, NLP, and machine learning are surfacing. Together these and other AI building blocks support automation of some audiological tasks. Pure-tone and speech audiometry, and perhaps assessment of central auditory processing, are strong candidates for near-term automation. Furthermore, with applicability to primary care and otolaryngology, we may see routine use of computer vision to diagnose middle and outer ear disorders. In the more distant future, computer vision may be used for viseme recognition to supplement and improve speech recognition in noisy environments, although privacy concerns, digital memory, and battery life remain obstacles.

NLP is pervasive. Companies like Apple, Google, Nuance Communications, and Baidu continue to mature language models and acoustic models, thereby commoditizing transcription, translation, VPAs, and chatbots. We may leverage these models to caption and subtitle in challenging listening environments, where people struggle most. In addition, we may see implementations of NLP within hearing instruments rather than on mobile phones, assuming that latency and battery life barriers can be overcome. NLP innovations seem likely to focus on speech-in-noise improvements and further integration with mobile phones.

AI will continue to inform acoustic classification. As acoustic models mature, we may expect to see hearing instruments automatically identify even more listening environments and adjust sound performance accordingly. Also, with machine learning, our understanding of user preferences and behaviors should improve over time. With this improvement, AI-mediated fitting fine tuning is likely to become more efficient and effective, thereby decreasing the need for hearing aid follow-up appointments.

AI is enabling automation of audiological tasks, but it is not something to fear. AI is unlikely to replace audiologists. Some tools may even help audiologists thrive amid the rise of the machines. Counseling is one crucial aspect of audiology that seems beyond the near-term reach of automation. While VPAs may leverage language and acoustic models to function in simple use cases, and emotion detection may mature to reliably recognize extremes, the empathetic top-of-license counseling provided by audiologists ensures job security. Complex decision-making, based on subtle cues among a highly variable spectrum of patients, will keep audiologists in clinical practice for years to come.    
Aaron Jones, Au.D. is Senior Director, Product Management & Practice Development at Unitron. He can be contacted at
The AI 100: Artificial Intelligence Startups That You Better Know. (2019, February 06). Retrieved from

Amer, M. R., Siddiquie, B., Richey, C., & Divakaran, A. (2014, 4-9 May). Emotion detection in speech using deep networks. Paper presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14), Florence, Italy.

Cook, T. (2017). Keynote. Apple Worldwide Developers Conference. Available at: (accessed 3 August 2019).

Cox, R. M., Johnson, J. A., & Xu, J. (2016). Impact of Hearing Aid Technology on Outcomes in Daily Life I: The Patients’ Perspective. Ear and Hearing, 37(4), e224–e237.

Google Trends. (2019, July 1). Retrieved from

Gottfredson, L. S. (1994, December 13). Mainstream Science on Intelligence. The Wall Street Journal, p. A18.

Hayes, D. (2019). What’s the big deal with hearing instrument classifiers? (Unitron publication 1904-093-02). Kitchener, ON: Unitron.

McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (2006). A proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AI Magazine, 27(4), 12.

Moon, C., Lagercrantz, H., & Kuhl, P. K. (2013). Language experienced in utero affects vowel perception after birth: a two-country study. Acta Paediatrica, 102(2), 156-160.

Ng, A. (2017, January 25). Personal interview during lecture at Stanford University Graduate School of Business.

Potamianos, G., Neti, C., Luettin, J., & Matthews, I. (2012). Audiovisual automatic speech recognition. In G. Bailly, P. Perrier, & E. Vatikiotis-Bateson (Eds.), Audiovisual Speech Processing (pp. 193-247). Cambridge: Cambridge University Press.

Proyas, A. Twentieth Century Fox Film Corporation. (2004). I, Robot.

Rakita, L., & Jones, C. (2015). Performance and Preference of an Automatic Hearing Aid System in Real-World Listening Environments. Hearing Review, 22(12), 28.

Schreier, J., Ford, C., Niederhoffer, G., Bisbee, S., Bisbee, J. K., Acord, L., Rifkin, D., ... Sony Pictures Home Entertainment. (2012). Robot & Frank.

Timmer, B., Hickson, L., & Launer, S. (2018). Do Hearing Aids Address Real-World Hearing Difficulties for Adults With Mild Hearing Impairment? Results From a Pilot Study Using Ecological Momentary Assessment. Trends in Hearing, 22, 1-15.