Sicilian Translator

To show the progress that we are making in the development of a machine translator for the Sicilian language, we have put an experimental translator online.

It does not produce good translations yet.

It is an experiment. Its purpose is to test the methodologies that we are using to develop a neural machine translator.

The experiment shows that we will succeed in our goal of creating a good translator for the Sicilian language once we have assembled enough parallel text. That is a very time-consuming task, so please be patient.

The machine "learns" through a process of trial and error. First, it predicts a translation. Then it compares its prediction to the correct translation and adjusts the model parameters in the direction that most reduces the errors.

In other words, it needs to make a lot of mistakes before it begins translating properly.
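The predict, compare and adjust loop can be sketched in a few lines of Python. This is only a toy illustration, not the translator's training code: a single parameter w stands in for the millions of parameters in a neural network, and the function name train, the learning_rate value and the number pairs are all made up for the example.

```python
def train(pairs, learning_rate=0.01, steps=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # the model starts out knowing nothing
    for _ in range(steps):
        grad = 0.0
        for x, target in pairs:
            prediction = w * x           # 1. predict
            error = prediction - target  # 2. compare with the correct answer
            grad += 2 * error * x        # slope of the squared error w.r.t. w
        # 3. adjust the parameter in the direction that reduces the error
        w -= learning_rate * grad / len(pairs)
    return w


# Hypothetical training data in which the correct answer is always 2 * x.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(pairs)
print(round(w, 4))
```

Every pass through the loop begins with a wrong prediction and ends with a slightly better parameter. With only a handful of examples, the model would see too few mistakes to learn from.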

Currently, our dataset is not large enough for the machine to make all the mistakes that it needs to learn from. It has only 57,000 words of parallel training data, and we need at least 100,000 (probably far more than that) to obtain a good translator.

We are assembling the dataset from issues of Arba Sicula and from Arthur Dieli's translations of Pitrè's Fables and Sicilian proverbs. They have been very helpful to us and we thank them for their support and encouragement.

We look forward to putting a good quality translator online soon.

In the meantime, it correctly translates some of the phrases that appear frequently in the dataset.

And we hope you will be amused when it does not translate correctly.

The unexpectedly bizarre translations that frequently appear are normal at this stage. The translator will keep producing gibberish until we assemble more parallel text.

For example, Koehn and Knowles (2017) used varying amounts of parallel text to train several English-to-Spanish models. Below is a table from their paper:

Translation Quality Improves with More Parallel Text

The fractions in the left column are fractions of the 386 million words provided by the ACL 2013 workshop. At low amounts of parallel text, the model produces fluent sentences that are completely unrelated to the source sentence. But as the amount of parallel text increases, the translation becomes perfect.

And a recent paper by Sennrich and Zhang (2019) suggests that the method of subword splitting will enable us to create a good translator with a few hundred thousand words (i.e., far fewer than the millions that Koehn and Knowles needed).
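To give a feel for what subword splitting does, here is a minimal sketch of the byte-pair-encoding idea: start from single letters and repeatedly fuse the most frequent adjacent pair. It is an assumption for illustration only, not necessarily the exact procedure of Sennrich and Zhang or of our translator. The function names and the tiny word list are hypothetical.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Fuse every occurrence of `pair` in a tuple of symbols."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merges from a list of words."""
    # Each word starts out as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def split_word(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


# Hypothetical toy vocabulary; a real system learns merges from the whole corpus.
words = ["parramu", "parrati", "parrari", "parru"]
merges = learn_merges(words, num_merges=4)
print(split_word("parrannu", merges))
```

Because a word that never appeared in the training text can still be broken into pieces that did appear, the model has something to work with even when the dataset is small.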

So please do not be discouraged by the strange translations that our models currently produce. They are normal at this stage. The translation quality will improve as we assemble more parallel text.

how to use the translator

Just type the sentence that you want to translate into the input box, select the appropriate direction (i.e. either "Sicilian-English" or "English-Sicilian") and press the "translate" button.

For best results when translating from Sicilian to English, use the standard Sicilian forms. For example, use dici (not rici), use bedda (not bella), etc. And do not use apostrophes in the place of the elided i. For example, use mparamu (not 'mparamu), use nzignamunni (not 'nzignamunni), etc.

You do not need any special keyboard. A standard American or Italian keyboard should work fine because, with the exception of è and sì, you do not need to use accents at all.

If you're using an American keyboard, you can type the word è as e'. And you can type the word sì as si'.

Or if you're using an Italian keyboard, just type as you normally would. The translator will automatically perform the appropriate conversions to any accented letter that you type.
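As a rough illustration of the kind of conversion involved, the sketch below turns the apostrophe spellings e' and si' into è and sì when they stand alone as words. It is a guess at the general idea, not the translator's actual preprocessing code. The mapping, the function name normalize and the sample sentences are all assumptions.

```python
import re

# Hypothetical mapping from the apostrophe spellings to the accented words.
APOSTROPHE_FORMS = {"e'": "è", "si'": "sì"}

# Match e' or si' only as a whole word: no letter or apostrophe may come
# directly before it, and no letter may come directly after it.
PATTERN = re.compile(r"(?<![\w'])(?:e|si)'(?!\w)")


def normalize(sentence):
    """Replace stand-alone e' and si' with è and sì; leave the rest as typed."""
    return PATTERN.sub(lambda m: APOSTROPHE_FORMS[m.group(0)], sentence)


print(normalize("iddu e' bonu"))
print(normalize("tu si' bedda"))
```

This sketch handles lowercase input only. Handling capital letters and other accented letters would follow the same pattern.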

frequently asked questions

Why doesn't it translate properly?

The current translator was trained on a dataset with only 57,000 words. For good results, we need a dataset with well over 100,000 words, and developing that larger dataset takes time.

This is an experimental product designed to test the methodologies that we will use once we have a larger dataset. It is not ready for serious translation yet.

Will it ever translate properly?

Yes! Once we assemble a large enough dataset, the translation quality will be very good.

Do I need a special keyboard to type Sicilian letters?

No. You can type Sicilian words without using any accents at all. So if you have a standard American keyboard, go ahead and use it. The only two words that require an accent are è and sì, which you can type as e' and si' respectively. (In other words, when typing those two words, just add an apostrophe to the end.)

Or if you have an Italian keyboard, go ahead and use it. The translator will automatically perform the appropriate conversions to any accented letter that you type.

How did you create this translator?

With neural machine translation, a form of artificial intelligence which "learns" how to translate by examining thousands of sentences that humans translated. For the details and source code, please see the machine translation page.

Copyright © 2018-2019 Eryk Wdowiak