The Next Steps

In the past few years, we've annotated a vocabulary list with proverbs, poetry and prose. We've trained a translator from 14,494 pairs of translated sentences. And now we have a small online dictionary for the 21st century.

But it's a small dictionary, not a big one.

I'd like to develop a better one, but I don't know how. I'm an economist, not a linguist. So I just did what an economist would do: I assembled a dataset. I hope it fills the need for an online dictionary until we can create a better one.

Get involved!

Help us develop a better dictionary. Write to me at: economics@doviak.net and let's find places where you can make a difference.

Most readers of this page will not be linguists either. Some might not even know Sicilian. But if they want to help develop a Sicilian dictionary, then I want their help. And there's plenty that they can do to help.

For example, someone who only knows English could help assemble English-language text for back-translation. As a general matter, we train a neural machine translation model with parallel text (i.e. pairs of translated sentences), but supplementing that parallel text with back-translations of monolingual text would also improve translation quality. A back-translation pairs a real English sentence with a machine-generated Sicilian translation of it. (In this case, it would improve the quality of translation into English, because the English side of each pair is genuine.)

Similarly, someone who only knows Sicilian could assemble Sicilian-language text for back-translation. And someone who knows both languages could help assemble parallel text. In general, assembling parallel text and/or monolingual text creates more examples for our translator to learn from.
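
To make back-translation concrete, here is a minimal sketch of how monolingual English text could be turned into extra training pairs for a Sicilian-to-English model. It assumes a trained English-to-Sicilian model is available as a plain function. `toy_en_to_scn` is a hypothetical stand-in that only tags its input, and the other function names are illustrative, not the project's actual code.

```python
from typing import Callable, List, Tuple

# A training example: (Sicilian sentence, English sentence).
Pair = Tuple[str, str]


def back_translate(
    english_text: List[str], en_to_scn: Callable[[str], str]
) -> List[Pair]:
    """Pair each real English sentence with a machine translation into Sicilian.

    The English side is genuine; only the Sicilian side is synthetic.
    """
    return [(en_to_scn(sentence), sentence) for sentence in english_text]


def build_scn_to_en_training_set(
    parallel: List[Pair],
    english_text: List[str],
    en_to_scn: Callable[[str], str],
) -> List[Pair]:
    """Supplement human-translated pairs with back-translated pairs."""
    return parallel + back_translate(english_text, en_to_scn)


def toy_en_to_scn(sentence: str) -> str:
    """Hypothetical stand-in for a trained English-to-Sicilian model."""
    return "[scn] " + sentence


parallel = [("Grazzi.", "Thank you.")]      # one human-translated pair
english_text = ["Where is the station?"]    # monolingual English text
training = build_scn_to_en_training_set(parallel, english_text, toy_en_to_scn)
print(training)
# [('Grazzi.', 'Thank you.'), ('[scn] Where is the station?', 'Where is the station?')]
```

The same idea runs in the other direction: monolingual Sicilian text, back-translated into English by a Sicilian-to-English model, supplies extra pairs for translation into Sicilian.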

We can also improve the user experience by linking the translator to the vocabulary. For example, if you use your mouse to highlight a word in Google Translate or Yandex Translate, their interface will show you a definition of the word. Our translator should provide the same feature.

To implement it, we need people who know Sicilian to add entries to the vocabulary list and supplement those entries with definitions and examples. And let those examples be poetry, proverbs and prose.
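
One way to picture the link between translator and vocabulary is a lookup from a highlighted word to a vocabulary entry. The sketch below is only an illustration under stated assumptions: the `Entry` structure, the `lookup` function and the one-entry vocabulary are hypothetical, and a real database would replace the in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """A vocabulary entry: headword, definition and usage examples."""
    headword: str
    definition: str
    examples: List[str] = field(default_factory=list)


# Punctuation to trim from a highlighted selection. Apostrophes are kept
# because Sicilian spelling uses them in elided forms.
TRIM = " \t\n.,;:!?\"()«»"


def normalize(selection: str) -> str:
    """Trim surrounding whitespace and punctuation, then lowercase."""
    return selection.strip(TRIM).lower()


def lookup(selection: str, vocabulary: Dict[str, Entry]) -> Optional[Entry]:
    """Return the entry for a highlighted word, or None if it is not listed."""
    return vocabulary.get(normalize(selection))


# Hypothetical one-entry vocabulary; contributors would add the examples.
vocabulary = {"casa": Entry("casa", "house, home")}

entry = lookup("Casa,", vocabulary)
print(entry.definition if entry else "not found")  # house, home
print(lookup("cantari", vocabulary))               # None
```

An exact-match lookup like this only finds headwords. Highlighting a conjugated verb or a plural noun would also require mapping the inflected form back to its headword, which is one more reason the vocabulary needs careful contributors.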

Of course, they would also need an interface where they can add entries and an interface where they can assemble sentences. And those interfaces would have to be connected to a database. So we also need help from people with website and database skills.

Importantly, the website needs a redesign. The website should use standard libraries, develop new libraries where necessary and separate concerns. In particular, creating a proper set of Perl modules would help separate concerns. For example, verb conjugation and webpage formatting are different concerns and should be handled separately.
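
The separation of concerns described above can be illustrated with two independent pieces: one that only knows Sicilian morphology and one that only knows HTML. The website itself is written in Perl, so this Python sketch shows the idea only, not the site's code. The function names are hypothetical, and the ending table assumes regular first-conjugation (-ari) verbs in the present tense, ignoring irregular verbs and spelling changes.

```python
import html
from typing import Dict

# ---- Concern 1: morphology. Knows Sicilian verbs, knows nothing about HTML. ----

PERSONS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]

# Assumed present-tense endings of regular first-conjugation (-ari) verbs.
# Irregular verbs, spelling changes and other conjugations are out of scope.
ARI_PRESENT = ["u", "i", "a", "amu", "ati", "anu"]


def conjugate_present(infinitive: str) -> Dict[str, str]:
    """Return person -> present-tense form for a regular -ari verb."""
    if not infinitive.endswith("ari"):
        raise ValueError("this sketch only handles regular -ari verbs")
    stem = infinitive[: -len("ari")]
    return {person: stem + ending for person, ending in zip(PERSONS, ARI_PRESENT)}


# ---- Concern 2: presentation. Knows HTML, knows nothing about Sicilian. ----

def format_table(rows: Dict[str, str]) -> str:
    """Render label/value pairs as an HTML table, escaping all text."""
    cells = "".join(
        f"<tr><td>{html.escape(label)}</td><td>{html.escape(value)}</td></tr>"
        for label, value in rows.items()
    )
    return f"<table>{cells}</table>"


print(format_table(conjugate_present("cantari")))
```

Because the conjugation function returns plain data, it can be tested without a web server, and the same table formatter can display noun plurals or anything else without knowing Sicilian grammar.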

Finally, we should add spell-checking tools to the vocabulary list and translator. And we should develop our word embedding models into a better explanation of how neural machine translation works.
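
A spell-checking tool could start by suggesting vocabulary words that resemble an unknown word. This sketch swaps in Python's standard `difflib.get_close_matches`, which ranks candidates by a sequence-similarity ratio rather than by edit distance. The `suggest` function and the four-word vocabulary are hypothetical illustrations, not part of the existing project.

```python
import difflib
from typing import List, Sequence


def suggest(word: str, vocabulary: Sequence[str], n: int = 3, cutoff: float = 0.6) -> List[str]:
    """Suggest up to n vocabulary words resembling a possibly misspelled word.

    A word already in the vocabulary is returned on its own.
    """
    candidate = word.lower()
    if candidate in vocabulary:
        return [candidate]
    # get_close_matches ranks by SequenceMatcher similarity, best match first,
    # and drops candidates scoring below `cutoff`.
    return difflib.get_close_matches(candidate, vocabulary, n=n, cutoff=cutoff)


vocabulary = ["casa", "cantari", "parrari", "jornu"]
print(suggest("cantare", vocabulary))  # ['cantari']
print(suggest("Casa", vocabulary))     # ['casa']
```

Raising `cutoff` yields fewer but closer suggestions. A full spell checker would also need to recognize inflected forms, so it would depend on the same vocabulary and conjugation work described above.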

I don't know how to do all of these things. I certainly have some ideas, but you may have another idea or a better idea or maybe you just want to help. If so, I would love to work with you.

Send me an email at: economics@doviak.net and let's find ways for you to get involved.

Copyright © 2018-2020 Eryk Wdowiak