Me, lifting a cup of jujube tea.

Hi there, I’m Martín and this is my personal web thingy. I sometimes write about things that have caught my eye, most of which have to do with linguistics, computer science, both or neither.

You can contact me through any of the social links.

♦ ♦ ♦


Taking a look at (some) tokenizers

Recently I have been working on writing a tokenizer from scratch in Rust. In the process, I wanted to really understand the implementation of some commonly used tokenizers. Fun, I know. Moses Tokenizer: the webpage, and the implementation that I’ll talk about, tokenizer.perl. If you already know Moses, you know. Silent nod. For everyone else, Moses is an NLP framework written in Perl, focused on statistical machine translation. It is super easy to install anywhere and everyone loves it because of that. Read more...

An unhelpful summary of AMTA 2022

Last September I virtually attended the 15th biennial conference of the Association for Machine Translation in the Americas, a.k.a. AMTA 2022. There I presented an adaptation of my master’s thesis in collaboration with my tutor (shameless self-promotion here). Even though I am not enrolled in any course at the moment, he helped me with several revisions and provided the budget for attendance, so I wanted to thank him before I move on. Read more...

Things I have read (I)

This is the first of a series of posts where I’ll write some impressions of the latest books that I have read. It started as a personal log, but since I tend to read a ton of book reviews and they are my main way of discovering new stuff, I reckon that someone who reads it may also get something out of it. The Grownup: I’m currently in Seoul for a while, and my choices for new stuff to read are limited to second-hand bookstores that have an English section, so not much reading in Spanish will be done until I am back home. Read more...