GAIN A Comparison of Different Tokenization Methods for the Georgian Language at DuckDuckGo

aclanthology.org
ACL Anthology
https://aclanthology.org › 2024.icnlsp-1.22
A Comparison of Different Tokenization Methods for the Georgian Language
%0 Conference Proceedings %T A Comparison of Different Tokenization Methods for the Georgian Language %A Mikaberidze, Beso %A Saghinadze, Temo %A Mikaberidze, Guram %A Kalandadze, Raphael %A Pkhakadze, Konstantine %A van Genabith, Josef %A Ostermann, Simon %A van der Plas, Lonneke %A Müller, Philipp %Y Abbas, Mourad %Y Freihat, Abed Alhakim ...
Videos for GAIN A Comparison of Different Tokenization Methods for the Georgian Language
9:27 | ICNLSP 2024: A Comparison of Different Tokenization Methods for the Georgian Language | 20 views | YouTube, 2mo
2:30 | What Is Tokenization (And Why You Need It) | 19K views | YouTube, 2yr
6:17 | How Does Tokenization Work - Introduction to Tokenization | 61K views | YouTube, 7yr
9:28 | Lexical Analyzer - Tokenization | 97K views | YouTube, 2yr
9:41 | Natural Language Processing|Tokenization | 191K views | YouTube, 6yr
20:22 | Natural Language Processing: Tokenization (Basic) | 8.7K views | YouTube, 4yr
7:10 | 02 | Words: Types, Tokens, & Tokenization | TTIC 31190 (NLP) - Fall 2020 | 7.9K views | YouTube, 4yr
8:20 | Unigram Tokenization | 11K views | YouTube, 3yr
8:09 | Tokenization 101 - Token Issuance Process | 10K views | YouTube, 4yr
8:15 | NLP Demystified 2: Text Tokenization | 16K views | YouTube, 2yr
aclanthology.org
ACL Anthology
https://aclanthology.org › 2024.icnlsp-1.22.pdf
PDF A Comparison of Different Tokenization Methods for the Georgian Language
making informed tokenization choices in future language model developments for Georgian. 1 Introduction. Tokenization is a fundamental process in most natural language processing (NLP) tasks that involves breaking down a text into smaller units called tokens. It is one of the first processes conducted in most approaches and is particularly ...
trails-dfki.github.io
trails-dfki.github.io
https://trails-dfki.github.io › publication › mikaberidze-etal-2024-comparison
A Comparison of Different Tokenization Methods for the Georgian Language
While the impact of tokenization on language modeling is well-researched in richly resourced languages, fewer studies on this topic exist for challenging low-resource languages. In this work, we present the first systematic evaluation of tokenization methods for Georgian, a low-resource language with high morphological complexity. We compare standard subword tokenizers, such as WordPiece, Byte ...
semanticscholar.org
Semantic Scholar
https://www.semanticscholar.org › paper › A-Comparison-of-Different-Tokenization-Methods-for-Mikaberidze-Saghinadze › 908676b78895326f115e2edfc049fbd8c19c980e
A Comparison of Different Tokenization Methods for the Georgian Language
This work presents the first systematic evaluation of tokenization methods for Georgian, a low-resource language with high morphological complexity, and evaluates the performance of all tokenizers on masked language modeling and on four downstream tasks: part-of-speech tagging, named entity recognition, toxicity detection, and sentiment analysis. While the impact of tokenization on language ...
dfki.de
DFKI
https://www.dfki.de › en › web › research › projects-and-publications › publication › 15399
A Comparison of Different Tokenization Methods for the Georgian Language
A Comparison of Different Tokenization Methods for the Georgian Language Beso Mikaberidze; Temo Saghinadze; Guram Mikaberidze; Raphael Kalandadze; Konstantine Pkhakadze; Josef van Genabith; Simon Ostermann; Lonneke van der Plas; Philipp Müller. In: Proceedings of the 7th International Conference on Natural Language and Speech Processing. ...
youtube.com
YouTube
https://www.youtube.com › watch?v=dOFDivYupGQ
ICNLSP 2024: A Comparison of Different Tokenization Methods for the ...
Nov 12, 2024. A Comparison of Different Tokenization Methods for the Georgian Language. By: Beso Mikaberidze, Temo Saghinadze, Guram Mikaberidze, Raphael Kalandadze, Konstant...
Created by: ICNLSP Conference. 9 min
cordis.europa.eu
CORDIS
https://cordis.europa.eu › project › id › 101078950 › results
Georgian Artificial Intelligence Networking and Twinning Initiative
Mar 10, 2023. GAIN. Grant agreement ID: 101078950 DOI 10.3030 ... A Comparison of Different Tokenization Methods for the Georgian Language. Author(s): B. Mikaberidze, T. Saghinadze, G ... Müller. Published in: Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP-2024), 2024, ISBN 979-8 ...
trails-dfki.github.io
trails-dfki.github.io
https://trails-dfki.github.io › author › beso-mikaberidze
Beso Mikaberidze | TRAILS
Project site for the DFKI TRAILS project.
medium.com
Medium
https://medium.com › @anicomanesh › token-efficiency-and-compression-techniques-in-large-language-models-navigating-context-length-05a61283412b
Token Efficiency and Compression Techniques in Large Language ... - Medium
Oct 7, 2024. This paper discusses the T5 model, explaining how tokenization and transfer learning affect language models, and it introduces methods for improving token efficiency. Paper link 3.
tnt.studio
tnt.studio
https://tnt.studio › the-essential-guide-to-tokenization-for-language-models
The Essential Guide to Tokenization for Large Language Models
Feb 22, 2024. Why Tokenization Matters. Think of tokenization as the process of translating text into the language that large language models (LLMs) understand: numbers. Just like we need special tools to translate between different languages, tokenization is the bridge between human language and AI.