Acoustic model An acoustic model is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word.
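The sketch below illustrates what such a statistical representation can look like. It fits one diagonal Gaussian per phone to labelled feature vectors, then picks the most likely phone for a new frame. This is a simplified assumption, not how any particular toolkit builds its models. Production acoustic models typically use HMMs with Gaussian mixtures or neural networks. The frame-level phone labels assumed here would normally come from aligning the transcriptions to the audio. The function names are hypothetical.

```python
import math
from collections import defaultdict

def train_gaussians(frames):
    """frames: list of (phone_label, feature_vector); returns {phone: (means, variances)}."""
    grouped = defaultdict(list)
    for phone, vec in frames:
        grouped[phone].append(vec)
    model = {}
    for phone, vecs in grouped.items():
        n, dim = len(vecs), len(vecs[0])
        means = [sum(v[d] for v in vecs) / n for d in range(dim)]
        # A variance floor avoids division by zero when all frames of a phone are identical.
        variances = [max(sum((v[d] - means[d]) ** 2 for v in vecs) / n, 1e-3) for d in range(dim)]
        model[phone] = (means, variances)
    return model

def log_likelihood(vec, means, variances):
    """Log density of `vec` under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vec, means, variances)
    )

def classify(model, vec):
    """Return the phone whose Gaussian gives `vec` the highest likelihood."""
    return max(model, key=lambda phone: log_likelihood(vec, *model[phone]))
```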
AOLbyPhone AOLByPhone was an AOL interactive voice service that began in 2000.
Apptek Applications Technology (AppTek) is a U.S. software company specializing in human language technology, headquartered in McLean, Virginia.
Articulatory speech recognition Articulatory speech recognition means the recovery of speech (in the form of phonemes, syllables or words) from acoustic signals with the help of articulatory modeling or an extra input of articula...
AT&T FSM Library The AT&T FSM Library is a collection of Unix software tools for creating and manipulating finite state machines, specifically weighted finite-state acceptors and transducers.
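The following minimal sketch shows the idea of a weighted acceptor in the tropical semiring, where path weights add and the best path takes the minimum. The class and its methods are hypothetical. They do not reflect the AT&T FSM Library's own command-line tools or file formats. Epsilon arcs are omitted for brevity.

```python
import math

class WeightedAcceptor:
    """Hypothetical minimal weighted finite-state acceptor (tropical semiring, no epsilons)."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = finals  # {state: final_weight}
        self.arcs = {}        # {(state, label): [(next_state, weight), ...]}

    def add_arc(self, src, label, dst, weight):
        self.arcs.setdefault((src, label), []).append((dst, weight))

    def weight(self, labels):
        """Best (minimum) path weight for accepting `labels`; math.inf if rejected."""
        current = {self.start: 0.0}
        for label in labels:
            nxt = {}
            for state, w in current.items():
                for dst, arc_w in self.arcs.get((state, label), []):
                    total = w + arc_w  # weights along a path are added
                    if total < nxt.get(dst, math.inf):
                        nxt[dst] = total  # keep only the cheapest way to reach dst
            current = nxt
        return min(
            (w + self.finals[s] for s, w in current.items() if s in self.finals),
            default=math.inf,
        )
```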
Audio mining Audio mining is a technique by which the content of an audio signal can be automatically analysed and searched.
Audio-visual speech recognition Audio-visual speech recognition (AVSR) is a technique that uses image processing capabilities in lip reading to aid speech recognition systems in recognizing nondeterministic phones or giving pre...
Automated Lip Reading Automated Lip Reading (ALR) is a software technology developed by speech recognition expert Frank Hubner.
Buckeye Corpus The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof.
CMU Sphinx CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University.
Computerized Speech Lab The Computerized Speech Lab is a speech and signal processing computer workstation used for research and clinical speech therapy.
Direct voice input Direct voice input is a style of human–machine interaction (HMI) in which the user speaks commands to issue instructions to the machine.
Fluency Voice Technology Fluency Voice Technology was a company that developed and sold packaged speech recognition solutions for use in call centers.
Haskins Laboratories Haskins Laboratories is an independent, international, 501(c) non-profit corporation.
HTK (software) HTK (Hidden Markov Model Toolkit) is a software toolkit for handling HMMs. It is mainly intended for speech recognition, but has been used in many other pattern recognition applications that emplo...
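A core computation in HMM-based recognition is scoring an observation sequence against a model. The sketch below implements the standard forward algorithm for a discrete-output HMM. It is a generic illustration, not HTK's code or API. HTK works with continuous feature vectors and log-domain arithmetic. This sketch instead uses discrete symbols and plain probabilities, which underflow on long sequences.

```python
def forward(initial, transition, emission, observations):
    """Return P(observations | HMM) via the forward algorithm.

    initial[s]        -- probability of starting in state s
    transition[p][s]  -- probability of moving from state p to state s
    emission[s][o]    -- probability that state s emits symbol o
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states)) * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)
```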
HTK Ltd HTK Limited is a software-as-a-service company that provides mobile phone messaging and IVR services.
IBM Shoebox The IBM Shoebox was a 1961 IBM computer that could perform mathematical functions and speech recognition.
Interactions Corporation Interactions Corporation is a privately held technology company that builds and delivers hosted Virtual Assistant applications that enable businesses to deliver automated natural language comm...
Janus Recognition Toolkit (JRTk) Janus Recognition Toolkit (JRTk), sometimes referred to as Janus, is a general purpose speech recognition toolkit developed and maintained by the Interactive Systems Laboratories at Carnegie Mel...
Jott Jott is a voice-to-text transcription service which allows its users to call a toll-free telephone number and speak for up to 30 seconds.
JSGF JSGF stands for Java Speech Grammar Format or the JSpeech Grammar Format (in a W3C Note).
Keyword spotting Keyword spotting is a subfield of speech recognition that deals with the identification of keywords in utterances.
Kinect Kinect (codenamed in development as Project Natal) is a motion sensing input device by Microsoft for the Xbox 360 video game console and Windows PCs. Based around a webcam-style add-on...
LENA Foundation LENA Research Foundation is a developer of advanced technology to accelerate language development in children aged 0–5 and for research into and treatment of language delays and disorders.
Lexical Markup Framework ISO 24613:2008, Language resource management - Lexical markup framework (LMF), is the ISO/TC37 standard of the International Organization for Standardization (ISO) for natural language processing (NL...
Logogen model The logogen model of 1969 is a model of speech recognition that uses units called "logogens" to explain how humans comprehend spoken or written words.
LumenVox LumenVox is a privately held speech recognition software company, based in San Diego, California.
MacSpeech Dictate MacSpeech Dictate was a speech recognition program developed for Mac OS X by MacSpeech.
MacSpeech Scribe MacSpeech Scribe is speech recognition software for Mac OS X designed specifically for transcription of recorded voice dictation.
Microsoft Voice Command Microsoft Voice Command is an application which can control Windows Mobile devices by voice.
Modular Audio Recognition Framework Modular Audio Recognition Framework (MARF) is an open-source research platform and a collection of voice, sound, speech, text and natural language processing (NLP) algorithms written in Ja...
Monowave Corp. Monowave is a small research company located in Seattle, Washington, primarily involved in research on machine speech recognition.
Motor theory of speech perception The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the so...
N-gram In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
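Here is a minimal sketch of n-gram extraction over a token list. The function name is illustrative. With `n=3` it yields trigrams.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of `tokens` as a list of tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Each start index i yields tokens[i], ..., tokens[i + n - 1].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `ngrams("to be or not to be".split(), 3)` returns `[('to', 'be', 'or'), ('be', 'or', 'not'), ('or', 'not', 'to'), ('not', 'to', 'be')]`. Sequences shorter than `n` produce an empty list.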
Natural language processing Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
NER model The NER model is a method for determining the accuracy of live subtitles in television broadcasts and events that are produced using speech recognition.
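The NER model scores accuracy as (N − E − R) / N × 100. Here N is the number of words in the subtitles, E is the edition-error score and R is the recognition-error score. In the commonly cited version of the model, each error is weighted by severity: 0.25 for minor, 0.5 for standard and 1 for serious errors. The sketch below treats those weights as an assumption. The function and constant names are hypothetical.

```python
# Severity weights as commonly cited for the NER model (assumption; verify before relying on them).
SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}

def error_score(counts):
    """Weighted error score from per-severity counts, e.g. {"minor": 2, "serious": 1}."""
    return sum(SEVERITY_WEIGHTS[level] * n for level, n in counts.items())

def ner_accuracy(n_words, edition_errors, recognition_errors):
    """NER accuracy in percent: (N - E - R) / N * 100, with E and R as weighted scores."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return (n_words - edition_errors - recognition_errors) / n_words * 100
```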
Nokia 5230 The Nokia 5230 Nuron is a low-cost smartphone from Nokia, manufactured in South Korea and the USA and widely used in America, Brazil, Germany and other countries.
Nokia 5250 The Nokia 5250 is a budget Nokia resistive touchscreen smartphone running on the Symbian^1 operating system.
NooJ NooJ is a development environment used to construct large-coverage, formalized descriptions of natural languages and to apply them to large corpora in real time.
Nortel Speech Server The Nortel Speech Server (formerly known as Periphonics Speech Processing Platform) in telecommunications is a speech processing system that was developed by Nortel and is now sold by Avaya.
Pattern playback The Pattern Playback is an early talking device that was built by Dr. Franklin S. Cooper and his colleagues, including John M. Borst and Caryl Haskins, at Haskins Laboratories in the late 1940s ...
Plum Voice The Plum Group, Inc. is a company that provides interactive voice response platforms, systems and hosting services to developers and companies to automate call center and business processes over...
Proteus Conversational Interface Proteus Conversational Engine is a conversational interface system developed by Artificial Ingenuity, a research and development company in Arizona, USA. Example software and software component...
Quack.com Quack.com was an early voice portal company.
QuickFuse QuickFuse is a web-based telephony application editor and rapid application development platform.
Real time factor The real time factor (RTF) is a common metric for measuring the speed of an automatic speech recognition system.
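RTF is the ratio of processing time to input audio duration, so a value below 1.0 means the recognizer runs faster than real time. In this minimal sketch, `recognize` stands in for any recognizer callable; it is hypothetical, as are the function names.

```python
import time

def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds

def measure_rtf(recognize, audio, audio_seconds):
    """Time one call to a (hypothetical) recognizer and return its RTF."""
    start = time.perf_counter()
    recognize(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```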
Real-time transcription Realtime transcription is the general term for transcription by court reporters using Computer Aided Transcription ("CAT") technology to deliver computer text screens within a few seconds of the...
ROSIDS ROSIDS (Rapid Open Source Intelligence Deployment System) timeshifts the video, processes speech-to-text through the SAIL LABS Technology automatic speech recognition and then hands the...
RWTH FSA Toolkit The RWTH FSA Toolkit is a highly efficient C++ library that handles finite state machines; in particular it deals with weighted and unweighted automata and transducers.
Sautrela Sautrela is a highly modular and pluggable open source framework focused on speech recognition and developed by the Software Technologies Working Group at the University of the Basque Country, Spain.
Sensory, Inc. Sensory, Inc. is a Santa Clara-based company that develops and makes speech technologies on both hardware and software platforms for consumer products, offering IC and software-only solutions f...
Silent speech interface A silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds.
SILVIA Symbolically Isolated Linguistically Variable Intelligence Algorithms, better known as SILVIA, is a core platform technology developed by Cognitive Code.
Speaker diarisation Speaker diarisation (or diarization) is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity.
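Diarisation output is usually expressed as time-stamped segments, each labelled with a speaker. This sketch assumes an earlier step has already assigned a speaker label to each fixed-length frame. That step is not shown here; it is typically done by clustering or classification. The sketch covers only the final partitioning into homogeneous segments. The function name is hypothetical.

```python
def labels_to_segments(frame_labels, frame_seconds):
    """Merge runs of identical per-frame speaker labels into (speaker, start, end) segments."""
    segments = []
    for i, speaker in enumerate(frame_labels):
        start = i * frame_seconds
        end = start + frame_seconds
        if segments and segments[-1][0] == speaker:
            # Same speaker as the previous frame: extend the current segment.
            segments[-1] = (speaker, segments[-1][1], end)
        else:
            # Speaker change: open a new segment.
            segments.append((speaker, start, end))
    return segments
```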
Speech acquisition Speech acquisition or early language acquisition focuses on the development of spoken language by a child.
Speech analytics Speech analytics is the process of analyzing recorded calls to gather information, bring structure to customer interactions and expose information buried in customer contact center interaction...
Speech Application Language Tags Speech Application Language Tags (SALT) is an XML-based markup language that is used in HTML and XHTML pages to add voice recognition capabilities to web-based applications.
Speech corpus A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions.
Speech processing Speech processing is the study of speech signals and the processing methods of these signals.
Stenomask A stenomask is a hand-held microphone built into a padded, sound-proof enclosure that fits over the speaker's mouth or nose and mouth.
Subvocal recognition Subvocal recognition (SVR) is the process of detecting subvocalization and converting the results to a digital output, either audible or text-based.
Telephonetics Telephonetics VIP is a software company that develops speech recognition and voice automation solutions.
Telesoft Technologies Telesoft Technologies is a privately held UK-based limited company specializing in telephony equipment for monitoring and media applications for fixed, wireless and IP networks.
Text simplification Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the gramma...
Trigram Trigrams are a special case of the N-gram, where N is 3.
Vocapia Research Vocapia Research, formerly Vecsys Research, is a high-tech research and development (R&D) company developing technologies for multilingual, unconstrained speech-to-text transcription syste...
Voice activity detection Voice activity detection, also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is de...
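One of the simplest approaches thresholds the short-term energy of each frame; it is shown here only as an illustration. The sketch assumes samples normalized to [-1, 1]. The frame length and threshold are assumptions, and the function name is hypothetical. Practical detectors usually add smoothing, adaptive noise estimates or trained classifiers.

```python
import math

def energy_vad(samples, frame_length, threshold_db):
    """Return one bool per full frame: True if its RMS level (dBFS) exceeds threshold_db.

    A trailing partial frame is ignored.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    decisions = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        rms = math.sqrt(sum(x * x for x in frame) / frame_length)
        level_db = 20 * math.log10(rms) if rms > 0 else -math.inf
        decisions.append(level_db > threshold_db)
    return decisions
```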
Voice command device A voice command device (VCD) is a device controlled by means of the human voice.
Voice Finger Voice Finger is a software tool for Windows Vista, Windows 7 and Windows 8 that enables users to control the mouse cursor and keyboard through speech recognition.
Voice Navigator The Voice Navigator was the first voice recognition device for command and control of a graphical user interface (Patent no. 5377303).
Voice recognition Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to text.
Voice Tag Voice tags are used in automated speech recognition in a voice command device, allowing the user to "speak" commands.
VoiceBox Technologies VoiceBox Technologies is a company focused on conversational speech recognition, search and information management.
VoxForge VoxForge is a free speech corpus and acoustic model repository for open source speech recognition engines.
VoxSigma VoxSigma is a speech recognition software suite developed by Vocapia Research for Unix-like x86 and x86-64 platforms.
Windows Speech Recognition Windows Speech Recognition is a speech recognition application included in Windows Vista, Windows 7 and Windows 8.
Word error rate Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system.
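WER is computed from the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn the recognizer's output into the reference transcript. That total is divided by the number of reference words N, giving WER = (S + D + I) / N. A minimal sketch using word-level Levenshtein distance follows; the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, computed as word-level edit distance over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (previous row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0.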