Aligning contextual vector spaces between different neural systems

Organisation name
TartuNLP
Summary
The idea is to check whether contextual token embeddings in the encoders of independently trained machine translation models (e.g. Google Translate and Neurotõlge) have a similar topology, i.e. whether the vectors can be converted from one system to the other. Motivation: if they can, then one could (1) "read" the input text with one MT system's encoder, (2) convert that encoder's vectors into the vector space of another system, and (3) generate the output translation with the other system's decoder. If the vectors cannot be converted, a good exploratory master's thesis will come out of it anyway :-) More broadly, the same exploration can be done for other encoders (BERT, GPT-2, etc.).
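Below is a minimal sketch of what such a conversion experiment could look like, assuming a simple linear (orthogonal Procrustes) mapping as a baseline alignment method; this is one standard choice for embedding-space alignment, not a method prescribed by the topic. The tensors vecs_a and vecs_b, the sizes, and the helper fit_procrustes are hypothetical placeholders; in the real experiment the vectors would be token-level encoder states extracted from the two MT systems on the same sentences.

# A minimal sketch, assuming token-aligned contextual vectors from two systems.
import torch

torch.manual_seed(0)
n_tokens, dim = 10_000, 512  # assumed sizes; real values depend on the models

# Placeholder data: in practice these would be contextual token vectors
# extracted from system A's and system B's encoders over a shared corpus,
# aligned token by token.
vecs_a = torch.randn(n_tokens, dim)
vecs_b = torch.randn(n_tokens, dim)

def fit_procrustes(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Closed-form orthogonal W minimizing ||x @ W - y||_F (via SVD)."""
    u, _, vh = torch.linalg.svd(x.T @ y)
    return u @ vh

w = fit_procrustes(vecs_a, vecs_b)

# Sanity check: how close do mapped A-vectors land to their B counterparts?
mapped = vecs_a @ w
cos = torch.nn.functional.cosine_similarity(mapped, vecs_b, dim=-1)
print(f"mean cosine similarity after mapping: {cos.mean():.3f}")

If the two systems use different hidden dimensions, an orthogonal map no longer applies directly; an unconstrained linear map fitted with torch.linalg.lstsq would be a natural fallback, and a small nonlinear network a natural next step.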
Thesis defence year
2022-2023
Supervisor
Mark Fishel
Language(s) of communication
Estonian, English
Requirements for the candidate
The skills you need for this are basic Linux, Python, some PyTorch, and knowledge of transformer/encoder-decoder machine translation (or you can learn this as part of the thesis).
Level
Master's
Keywords
#transformer #transformers #embeddings #alignment

Application contact

Name
Mark Fišel
Phone
E-mail
fishel@ut.ee