Decoding, Reranking, Evaluation, and Alignment for Machine Translation
DREAMT consists of code and data for students studying the basic algorithms behind statistical machine translation, the technology that drives popular web services like Google Translate, Bing Translator, and SDL FreeTranslation.
The best way to understand how these algorithms work is to implement them yourself and see how they behave on real data. DREAMT provides you with data, simple baseline algorithms, and code to check the accuracy of your solution. You can run the baselines straight out of the box and try simple experiments within minutes. Then, to test your knowledge of textbook algorithms, you can implement them and see how they improve on our baselines. The textbook algorithms require at most a few dozen lines of code, and you’ll understand them much better once you derive and implement them yourself.
But the sky’s the limit! Machine translation is hardly a solved problem. Can you do better than the textbook algorithms? Test your NLP hacking skills and see how accurately you can solve these four challenges: decoding, reranking, evaluation, and alignment.
If you’re an instructor, or you’re simply curious, you can read more about why we created DREAMT.