Taking a look at (some) tokenizers
Posted: | Categories: article | Tags: English
Recently I have been writing a tokenizer from scratch in Rust. In the process, I wanted to really understand how some commonly used tokenizers are implemented. Fun, I know.

Moses Tokenizer

Webpage: http://www2.statmt.org/. The implementation I’ll talk about is tokenizer.perl.

If you already know Moses, you know. Silent nod. For everyone else, Moses is an NLP framework focused on statistical machine translation; its tokenizer, like many of its support scripts, is written in Perl. It is super easy to install anywhere, and everyone loves it for that.