lib.rs
Nov 27, 2024
More details about how to use the Normalizers are available on the Hugging Face blog.
The PreTokenizer: in charge of creating the initial word splits in the text. The most common way of splitting text is simply on whitespace.
The Model: in charge of doing the actual tokenization. An example of a Model would be BPE or WordPiece.
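To make the division of labour between the PreTokenizer and the Model concrete, here is a minimal standard-library sketch. It does not use the crate's own API: the function names `pre_tokenize`, `wordpiece` and `tokenize`, the `##` continuation prefix, and the `[UNK]` fallback are illustrative assumptions, and the model step is a simplified greedy longest-match lookup in the spirit of WordPiece rather than a faithful implementation.

```rust
use std::collections::HashSet;

/// Pre-tokenization step (hypothetical helper): split the raw text
/// into initial words on whitespace.
fn pre_tokenize(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

/// Toy model step (hypothetical helper): greedy longest-match-first
/// sub-word lookup. Pieces after the first carry a "##" prefix; a word
/// that cannot be fully segmented becomes a single "[UNK]" token.
fn wordpiece(word: &str, vocab: &HashSet<&str>) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;
        // Shrink the candidate from the right until it is in the vocabulary.
        while end > start {
            let body: String = chars[start..end].iter().collect();
            let candidate = if start == 0 { body } else { format!("##{}", body) };
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            None => return vec!["[UNK]".to_string()],
        }
    }
    pieces
}

/// Chain the two steps: whitespace pre-tokenization, then the toy model.
fn tokenize(text: &str, vocab: &HashSet<&str>) -> Vec<String> {
    pre_tokenize(text)
        .into_iter()
        .flat_map(|word| wordpiece(word, vocab))
        .collect()
}
```

With an assumed vocabulary of `token`, `##izer`, `##s` and `split`, calling `tokenize("split tokenizers", &vocab)` first yields the words `split` and `tokenizers`, and the model step then breaks the second word into `token`, `##izer`, `##s`. Keeping the two steps as separate functions mirrors the pipeline described above, where each stage can be swapped independently.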