Summary
- Working in a group of two students, we analyzed a baseline machine translation model for German-to-English translation. We both studied the code and evaluated the model's performance using the BLEU score and other metrics (a small BLEU sketch follows this list).
- To improve performance, we then implemented the lexical attention model described by Nguyen and Chiang (2017).
- Furthermore, we analyzed a basic implementation of the Transformer architecture and implemented the multi-head attention mechanism following Vaswani et al. (2017); a sketch of that mechanism also follows this list. All of this work was done in PyTorch.
- We also spent a good amount of time analyzing and optimizing the training data.
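
To illustrate the BLEU evaluation mentioned above: this summary does not state which tooling we used, so the snippet below is only a minimal sketch that assumes the sacrebleu package and made-up example sentences.

```python
# Minimal sketch of corpus-level BLEU evaluation.
# The tooling is assumed (sacrebleu); the sentences are invented examples.
import sacrebleu

hypotheses = [
    "the cat sat on the mat",
    "there is a book on the table",
]
references = [
    "the cat sat on the mat",
    "a book lies on the table",
]

# corpus_bleu expects a list of hypotheses and a list of reference lists
# (the extra nesting allows multiple reference sets per sentence).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```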
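
The following is a minimal PyTorch sketch of multi-head attention in the spirit of Vaswani et al. (2017). It is not our project code; the module structure, dimensions, and variable names are illustrative assumptions.

```python
# Sketch of multi-head attention (scaled dot-product attention over several heads).
# Illustrative only; not the implementation from the project.
import math
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads      # per-head dimension
        self.num_heads = num_heads
        # Learned projections for queries, keys, values and the output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)

        def split_heads(x, proj):
            # (batch, seq, d_model) -> (batch, heads, seq, d_k)
            return proj(x).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)

        q = split_heads(query, self.w_q)
        k = split_heads(key, self.w_k)
        v = split_heads(value, self.w_v)

        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        context = torch.matmul(attn, v)

        # Concatenate heads and project back to d_model.
        context = context.transpose(1, 2).contiguous().view(batch_size, -1, self.num_heads * self.d_k)
        return self.w_o(context)


# Example usage (self-attention): out = MultiHeadAttention(512, 8)(x, x, x)
# for x of shape (batch, seq, 512).
```
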
Main learnings
- I learned to turn machine-learning papers into working implementations. This requires understanding a paper in detail and translating its formulas and diagrams into code.
- In addition, I learned to explore and extend an existing codebase.
- Furthermore, I gained a fairly deep understanding of Transformers and multi-head attention. Additionally, I learned about related topics such as beam search (a small sketch follows this list).
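
As an illustration of beam search, here is a minimal, framework-free sketch. The scoring function stands in for a real model's next-token log-probabilities; the vocabulary, beam size, and toy distribution are assumptions made for the example.

```python
# Minimal sketch of beam search decoding over a toy next-token distribution.
import math


def beam_search(step_log_probs, beam_size=3, max_len=5, eos=0):
    """step_log_probs(prefix) -> dict mapping next token to its log-probability."""
    beams = [([], 0.0)]              # (token sequence, accumulated log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # Keep only the beam_size highest-scoring partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])


# Toy distribution: prefers token 1, then ends the sequence with the EOS token 0.
def toy_model(prefix):
    if len(prefix) >= 3:
        return {0: math.log(0.9), 1: math.log(0.1)}
    return {1: math.log(0.6), 2: math.log(0.3), 0: math.log(0.1)}


print(beam_search(toy_model))
```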