Open Source translator (instead of Google Translate)

There are a bunch of Open Source projects that try to provide automatic text translation between several languages.

One of the interesting ones is LibreTranslate (GitHub), which relies on Argos Translate (GitHub).

It supports several languages, and if your language is not supported yet, you can request it. I just asked for Ladino support.

As I understand it, in order to have another language added, someone needs to train a model, but I have not yet figured out what the input of that training is. It seems the OPUS project collects texts that have some open license. Some of these texts are in Moses format: two plain-text files, one per language, in which each line of one file is the translation of the same-numbered line in the other file.
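To make the "two aligned files" idea concrete, here is a minimal Python sketch of how such a pair of files could be read into sentence pairs. The function name `read_moses_pairs` and the file names in the usage example are my own placeholders, not something defined by OPUS or Argos Translate, and I am assuming the files are UTF-8 encoded with one sentence per line.

```python
from itertools import zip_longest


def read_moses_pairs(source_path, target_path):
    """Yield (source, target) sentence pairs from two line-aligned files.

    Line N of source_path is assumed to be the translation of line N
    of target_path. Raises ValueError if the files differ in length.
    """
    with open(source_path, encoding="utf-8") as src, \
         open(target_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which lets us
        # detect a length mismatch instead of silently dropping lines.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    f"files are not aligned: one of them ends before line {line_no}"
                )
            yield s.rstrip("\n"), t.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical file names; replace them with the two files of a real corpus.
    for source, target in read_moses_pairs("corpus.en", "corpus.lad"):
        print(f"{source}\t{target}")
```

The length check matters because the format carries no other alignment information: if one file has an extra or missing line, every pair after that point would be wrong.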

You can find datasets for a lot of languages, even for Ladino, but the Ladino dataset is very small.

Anyway, now I have a basic understanding of what kind of corpus I need to find or generate to be used as input.

Published on 2022-03-27 by Gabor Szabo