
Taking a look at (some) tokenizers

Recently I have been writing a tokenizer from scratch in Rust. In the process, I wanted to really understand how some commonly used tokenizers are implemented. Fun, I know. Moses Tokenizer: the webpage is http://www2.statmt.org/ and the implementation that I’ll talk about is tokenizer.perl. If you already know Moses, you know. Silent nod. For everyone else, Moses is a statistical machine translation toolkit whose preprocessing scripts, tokenizer.perl included, are written in Perl. It is super easy to install anywhere and everyone loves it for that. Read more...

An unhelpful summary of AMTA 2022

Last September I virtually attended the 15th biennial conference of the Association for Machine Translation in the Americas, a.k.a. AMTA 2022. There I presented an adaptation of my master’s thesis, written in collaboration with my tutor (shameless self-promotion here). Even though I am not currently enrolled in any course, he helped me with several revisions and provided the budget for attendance, so I wanted to thank him before I move on. Read more...