Corpus linguistics
British National Corpus
The British National Corpus is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources.
Cambridge English Corpus
The Cambridge English Corpus is a multi-billion-word corpus of the English language.
Cline (linguistics)
In linguistics, a cline is a scale of continuous gradation.
Collocation
In corpus linguistics, a collocation is a sequence of words or terms that co-occur more often than would be expected by chance.
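One way to make "more often than would be expected by chance" concrete is pointwise mutual information (PMI), which compares the observed probability of a word pair with the probability expected if the two words were independent. The sketch below is only an illustration under that assumption: the entry does not name an association measure, and the function name `pmi_scores` and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return a dict mapping each adjacent word pair to its PMI (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams          # observed probability of the pair
        p_w1 = unigrams[w1] / n_unigrams    # probability of each word alone
        p_w2 = unigrams[w2] / n_unigrams
        # PMI > 0: the pair occurs more often than independence would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy data, invented for illustration only.
tokens = "strong tea and strong coffee but powerful computers and strong tea".split()
scores = pmi_scores(tokens)
print(scores[("strong", "tea")])
```

A positive score marks a pair as a collocation candidate. On small samples PMI overrates rare pairs, so real studies usually add a minimum frequency threshold.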
Concordancer
A concordancer is a computer program that automatically constructs a concordance.
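A concordance is commonly displayed in keyword-in-context (KWIC) form, with each occurrence of a search word shown alongside a few words to its left and right. The following is a minimal sketch of that idea, not the behaviour of any particular concordancer; the function name `kwic` and the `width` parameter are assumptions made for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Toy data, invented for illustration only.
sample = "the cat sat on the mat".split()
for left, match, right in kwic(sample, "the", width=2):
    print(f"{left:>10} [{match}] {right}")
```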
Corpora (journal)
Corpora is a twice-yearly peer-reviewed linguistic academic journal that publishes scholarly articles and book reviews on corpus linguistics, with a focus on corpus construction and corpus t...
Corpus linguistics
Corpus linguistics is the study of language as expressed in samples or "real world" text.
Corpus Linguistics and Linguistic Theory (journal)
Corpus Linguistics and Linguistic Theory is a peer-reviewed linguistic academic journal that publishes scholarly articles, squibs, and book reviews on corpus linguistics, with a focus on cor...
Corpus-assisted discourse studies
Corpus-assisted discourse studies, or CADS, is related historically and methodologically to the discipline of corpus linguistics.
David G. Hays
David Glenn Hays (November 17, 1928 – July 26, 1995) was a linguist, computer scientist and social scientist best known for his early work in machine translation and computational linguistics.
Enron Corpus
The Enron Corpus is a large database of over 600,000 emails generated by 158 employees of the Enron Corporation and acquired by the Federal Energy Regulatory Commission during its investigation ...
Extended Affix Grammar
In computer science, Extended Affix Grammars are a formal grammar formalism for describing the context-free and context-sensitive syntax of language, both natural language and programming lang...
FrameNet
FrameNet is a project housed at the International Computer Science Institute in Berkeley, California which produces an electronic resource based on semantic frames.
German Reference Corpus
The German Reference Corpus (original: Deutsches Referenzkorpus; short: DeReKo) is an electronic archive of text corpora of contemporary written German.
Global Language Monitor
The Global Language Monitor (GLM) is an Austin, Texas-based company that collectively documents, analyzes and tracks trends in language usage worldwide, with a particular emphasis upon the...
Hapax legomenon
A hapax legomenon is a word which occurs only once within a context, either in the written record of an entire language, in the works of an author, or just in a single text.
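For a single tokenized text, finding the hapax legomena amounts to counting word frequencies and keeping the words whose count is exactly one. This is a minimal sketch under that assumption; the function name `hapax_legomena` is hypothetical, and tokenization and case handling are left to the caller.

```python
from collections import Counter


def hapax_legomena(tokens):
    """Return the words that occur exactly once, in alphabetical order."""
    counts = Counter(tokens)
    return sorted(word for word, count in counts.items() if count == 1)


# Toy data, invented for illustration only.
print(hapax_legomena("to be or not to be".split()))
```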
International Journal of Corpus Linguistics
The International Journal of Corpus Linguistics is a quarterly peer-reviewed linguistic academic journal that publishes scholarly articles and book reviews on corpus linguistics, with a focu...
Keyness
Keyness is a term used in linguistics to describe the quality a word or phrase has of being "key" in its context.
Keyword (linguistics)
In corpus linguistics, a key word is a word which occurs in a text more often than we would expect it to occur by chance alone.
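Whether a word occurs "more often than we would expect" is usually judged by comparing its frequency in the text under study with its frequency in a reference corpus. The sketch below uses a log-likelihood statistic in its two-term form as one common choice; the entry itself does not prescribe a measure, so treat the statistic, the function name `log_likelihood`, and the example figures as assumptions.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness of a word: study text versus reference corpus."""
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Invented figures: 10 hits in a 100-word text, none in a 100-word reference.
print(log_likelihood(10, 0, 100, 100))
```

A higher value indicates a larger departure from chance. The statistic does not show the direction of the difference, so the relative frequencies still need to be compared to tell overuse from underuse.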
Language and Computers
Language and Computers: Studies in Practical Linguistics is a book series on corpus linguistics and related areas.
Linguistic Data Consortium
The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories.
MAREC
The MAtrixware REsearch Collection (MAREC) is a standardised patent data corpus available for research purposes.
N-gram
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
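Extracting n-grams from a sequence is a sliding window of length n moved one item at a time. This is a minimal sketch of that definition; the function name `ngrams` is chosen here for illustration and is not tied to any library.

```python
def ngrams(items, n):
    """Return all contiguous length-n subsequences of items as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Toy data, invented for illustration only.
print(ngrams("a b c d".split(), 2))
```

The items may be words, characters, or any other units, and a sequence shorter than n yields an empty list.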
National Corpus of Polish
The National Corpus of Polish is the biggest and the most important corpus of the Polish language.
Part-of-speech tagging
In corpus linguistics, part-of-speech tagging, also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text as corresponding to a pa...
PropBank
PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank".
Semantic prosody
Semantic prosody, also discourse prosody, describes the way in which certain seemingly neutral words can be perceived with positive or negative associations through frequent occurrences wi...
Speech corpus
A speech corpus is a database of speech audio files and text transcriptions.
Stefan Th. Gries
Stefan Th. Gries (born 1970) is a full professor in the Department of Linguistics at the University of California, Santa Barbara (UCSB).
Survey of English Usage
The Survey of English Usage was the first research centre in Europe to carry out research with corpora.
Tatoeba
Tatoeba.org is a free online database of example sentences geared towards foreign language learners.
Text corpus
In linguistics, a corpus or text corpus is a large and structured set of texts.
Treebank
A treebank or parsed corpus is a text corpus in which each sentence has been parsed, i.e. annotated with syntactic structure.
VerbNet
The VerbNet project maps PropBank verb types to their corresponding Levin classes.
WordSmith
WordSmith Tools is a collection of corpus linguistics tools for looking for patterns in a language.
Yarowsky algorithm
In computational linguistics the Yarowsky algorithm is an unsupervised learning algorithm for word sense disambiguation that uses the "one sense per collocation" and the "one sense per discourse...
Ōno's lexical law
Ōno's lexical law, or simply Ōno's law, is a statistical law for the rate at which word classes appear in the lexicon of classical Japanese literary works.