Speech recognition
AOLbyPhone
AOLByPhone was an AOL interactive voice service that began in 2000.
Apptek
Applications Technology (AppTek) is a U.S. software company specializing in human language technology, headquartered in McLean, Virginia.
Articulatory speech recognition
Articulatory speech recognition means the recovery of speech (in forms of phonemes, syllables or words) from acoustic signals with the help of articulatory modeling or an extra input of articula...
AT&T FSM Library
The AT&T FSM Library is a collection of Unix software tools for creating and manipulating finite state machines, specifically weighted finite-state acceptors and transducers.
Audio mining
Audio mining is a technique by which the content of an audio signal can be automatically analysed and searched.
Audio-visual speech recognition
Audio-visual speech recognition (AVSR) is a technique that uses image processing capabilities in lip reading to aid speech recognition systems in recognizing non-deterministic phones or giving pre...
Automated Lip Reading
Automated Lip Reading (ALR) is a software technology developed by speech recognition expert Frank Hubner.
Buckeye Corpus
The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof.
Cache language model
A cache language model is a type of statistical language model that contains a cache component and that assigns relatively high probabilities to words or word sequences that occur elsewhere in a...
CMU Sphinx
CMU Sphinx, or Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University.
Computerized Speech Lab
The Computerized Speech Lab is a speech and signal processing computer workstation used for research and clinical speech therapy.
Direct Voice Input
Direct Voice Input is a style of human-machine interaction (HMI) in which the user issues instructions to the machine by voice command.
Direct voice input
Direct voice input is a style of human–machine interaction (HMI) in which the user issues instructions to the machine by voice command.
Fluency Voice Technology
Fluency Voice Technology was a company that developed and sold packaged speech recognition solutions for use in call centers.
Haskins Laboratories
Haskins Laboratories is an independent, international, multidisciplinary community of researchers conducting basic research on spoken and written language.
HTK (software)
HTK (Hidden Markov Model Toolkit) is a software toolkit for handling hidden Markov models (HMMs). It is mainly intended for speech recognition, but has been used in many other pattern recognition applications that emplo...
HTK Ltd
HTK Limited is a software-as-a-service company that provides mobile phone messaging and IVR services.
IBM Shoebox
The IBM Shoebox was a 1961 IBM computer that could perform mathematical functions and speech recognition.
Interactions Corporation
Interactions Corporation is a privately held technology company that builds and delivers Virtual Assistant applications that enable businesses to deliver automated natural language communication...
Janus Recognition Toolkit (JRTk)
Janus Recognition Toolkit (JRTk), sometimes referred to as Janus, is a general purpose speech recognition toolkit developed and maintained by the Interactive Systems Laboratories at Carnegie Mel...
JSGF
JSGF stands for Java Speech Grammar Format or the JSpeech Grammar Format (in a W3C Note).
Julius (software)
Julius is an open source speech recognition engine.
Keyword spotting
Keyword spotting is a subfield of speech recognition that deals with the identification of keywords in utterances.
Kinect
Kinect is a motion sensing input device by Microsoft for the Xbox 360 video game console and Windows PCs. Based around a webcam-style add-on peripheral for the Xbox 360 console, it enables users...
LENA Foundation
LENA Foundation is a developer of advanced technology for the early screening, research, and treatment of language delays and disorders in young children.
Lexical Markup Framework
ISO 24613:2008, Language resource management - Lexical markup framework, is the International Organization for Standardization (ISO) ISO/TC37 standard for natural language processing and machi...
Logogen model
The logogen model of 1969 is a model of speech recognition that uses units called logogens to explain how humans comprehend spoken or written words.
LumenVox
LumenVox is a privately held speech recognition software company based in San Diego, California.
MacSpeech Dictate
MacSpeech Dictate was a speech recognition program developed for Mac OS X by MacSpeech.
MacSpeech Scribe
MacSpeech Scribe is speech recognition software for Mac OS X designed specifically for transcription of recorded voice dictation.
Microsoft Voice Command
Microsoft Voice Command is an application which can control Windows Mobile devices by voice.
Modular Audio Recognition Framework
Modular Audio Recognition Framework (MARF) is an open-source research platform and a collection of voice, sound, speech, text and natural language processing (NLP) algorithms written in Java and...
Monowave Corp.
Monowave is a small research company located in Seattle, Washington, primarily involved in research on machine speech recognition.
Motor theory of speech perception
The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the so...
N-gram
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
Natural language processing
Natural language processing is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human languages.
Nokia 5230
The Nokia 5230 Nuron is a low-cost smartphone from Nokia that is manufactured in South Korea and widely used in markets such as India, Brazil, and Indonesia.
Nokia 5250
The Nokia 5250 is a budget Nokia resistive touchscreen smartphone running on the Symbian^1 operating system.
Nokia 5800 XpressMusic
Nokia 5800 XpressMusic is a smartphone and portable entertainment device by Nokia.
Nokia C5-03
The Nokia C5-03 is a budget resistive touchscreen smartphone with WLAN from the Cseries that was released in December 2010.
Nokia E75
The Nokia E75 is a smartphone from the Eseries range with a side-sliding QWERTY keyboard and a front keypad.
Non-native speech database
A non-native speech database is a speech database of non-native pronunciations of English.
NooJ
NooJ is a development environment used to construct large-coverage, formalized descriptions of natural languages and to apply them to large corpora in real time.
Nortel Speech Server
The Nortel Speech Server (formerly known as Periphonics Speech Processing Platform) in telecommunications is a speech processing system that was developed by Nortel and is now sold by Avaya.
Pattern playback
The Pattern playback is an early talking device that was built by Dr. Franklin S. Cooper and his colleagues, including John M. Borst and Caryl Haskins, at Haskins Laboratories in the late 1940s ...
Phonetic search technology
Phonetic Search Technology (PST) is a method of speech recognition.
Plum Voice
The Plum Group, Inc. (DBA Plum Voice) is a company that provides interactive voice response platforms, systems and hosting services to developers and companies to automate call center and busine...
Proteus Conversational Interface
Proteus Conversational Engine is a conversational interface system developed by Artificial Ingenuity, a research and development company in Arizona, USA. Example software and software component...
Real time factor
The real time factor (RTF) is a common metric for measuring the speed of an automatic speech recognition system.
Real-time transcription
Realtime transcription is the general term for transcription by court reporters using Computer Aided Transcription ("CAT") technology to deliver computer text screens within a few seconds of the...
Realtime transcription
Realtime transcription is the general term for transcription by court reporters using Computer Aided Transcription ("CAT") technology to deliver computer text screens within a few seconds of the...
ROSIDS
ROSIDS (Rapid Open Source Intelligence Deployment System) time-shifts video, processes speech-to-text through the SAIL LABS Technology automatic speech recognition, and then hands the...
RWTH FSA Toolkit
The RWTH FSA Toolkit is a highly efficient C++ library that handles finite state machines; in particular it deals with weighted finite-state acceptors and transducers.
Sautrela
Sautrela is a highly modular and pluggable open source framework focused on Speech Recognition and developed by the Software Technologies Working Group at the University of the Basque Country, Spain.
Silent speech interface
A silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds.
Speaker diarisation
Speaker diarisation (or diarization) is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity.
Spectral modeling synthesis
Spectral modeling synthesis (SMS) is an acoustic modeling approach for speech and other signals.
Speech analytics
Speech analytics is a term used to describe automatic methods of analyzing speech to extract useful information about the speech content or the speakers.
Speech Application Language Tags
Speech Application Language Tags (SALT) is an XML-based markup language that is used in HTML and XHTML pages to add voice recognition capabilities to web-based applications.
Speech corpus
A speech corpus is a database of speech audio files and text transcriptions.
Speech processing
Speech processing is the study of speech signals and the processing methods of these signals.
Speech recognition
In computer science, speech recognition is the translation of spoken words into text.
Speech recognition in Linux
There are currently several speech recognition software packages for GNU/Linux; some are open-source and others are proprietary.
Speech repetition
Speech repetition is the repeating by one individual of the spoken vocalizations made by another.
Speech verification
Speech verification uses speech recognition to verify the correctness of the pronounced speech.
SpeechCycle
SpeechCycle is a company located in New York City that develops technology which enables Rich Phone Applications (RPA).
SpeechMagic
SpeechMagic is an industrial grade platform for capturing information in a digital format.
SpeechWeb
A SpeechWeb is a collection of hyperlinked speech applications, accessed remotely by speech browsers running on end-user devices.
SpeechWorks
SpeechWorks was a company founded in the late 1990s in Boston that developed and supported speech-related computer software.
Spoken dialog system
A spoken dialog system is a dialog system delivered through voice.
Subvocal recognition
Subvocal recognition (SVR) is the process of taking subvocalization and converting the detected results to a digital text-based output.
Telephonetics
Telephonetics VIP is a software company that develops speech recognition and voice automation solutions.
Telesoft Technologies
Telesoft Technologies (founded 1989, TsT) is a privately held UK-based limited company specializing in telephony equipment for monitoring and media applications for fixed, wireless and IP networks.
Text simplification
Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the gramma...
Time-inhomogeneous hidden Bernoulli model
The time-inhomogeneous hidden Bernoulli model (TI-HBM) is an alternative to the hidden Markov model (HMM) for automatic speech recognition.
TIMIT
TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects.
Transcription (software)
Transcription software is software which assists in the conversion of human speech into a text transcript.
Trigram
Trigrams are a special case of the N-gram, where N is 3.
Vocapia Research
Vocapia Research, formerly Vecsys Research, is a high tech research and development company (R&D), developing technologies for multilingual, unconstrained speech-to-text transcription syste...
Voice activity detection
Voice activity detection, also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is de...
Voice command device
A voice command device is a device controlled by means of the human voice.
Voice Finger
Voice Finger is a software tool for Windows Vista and Windows 7 that enables users to control the mouse cursor and keyboard through speech recognition.
Voice Navigator
The Voice Navigator was the first voice recognition device for command and control of a graphical user interface.
Voice recognition
Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to text.
Voice Tag
Voice tags are used in automated speech recognition in a voice command device, allowing the user to "speak" commands.
VoiceBox Technologies
VoiceBox is a company focused on conversational speech recognition, search and information management.
VoxForge
VoxForge is a free speech corpus and acoustic model repository for open source speech recognition engines.
VoxSigma
VoxSigma is a speech recognition software suite developed by Vocapia Research for Unix-like x86 and x86-64 platforms.
Windows Speech Recognition
Windows Speech Recognition is a speech recognition application included in Windows Vista and, more recently, Windows 7.
Word error rate
Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system.
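As a rough illustration of the metric, WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of words in the reference. The sketch below assumes whitespace tokenization and case-sensitive matching; the function name `word_error_rate` is hypothetical and not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumptions (illustrative only): words are split on whitespace and
    compared case-sensitively, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("the") is missing from the hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.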