Multihead Attention Enhanced Memory Augmented Neural Network for Multimodal Trajectory Prediction

Name
Farhan Syakir
Abstract
Autonomous driving has attracted growing interest over the last two decades. One of the problems in autonomous driving that researchers are actively trying to solve is agent trajectory prediction: predicting the future trajectories of the agents surrounding an autonomous vehicle, such as other cars, cyclists, pedestrians, and any other road users. Deep learning has shown promising results in tackling this problem. Various deep learning approaches have been applied to it, among them Memory Augmented Neural Networks (MANN) and multi-head attention layers. Memory augmented neural networks have been proposed in the literature for multimodal trajectory prediction (in a model called Memory Augmented Networks for Multiple Trajectory Prediction, or MANTRA), but that model does not use multi-head attention layers. Multi-head attention layers, meanwhile, have also been investigated in the literature, but in different contexts within this research topic.
In this work we propose two models, both of which add multi-head attention layers to the memory augmented neural network model. We name the models Multihead Attention Enhanced MANN 1 (MAEMANN-1) and MAEMANN-2. Like MANTRA, MAEMANN uses an AutoEncoder, a Memory Controller, and an iterative refinement module (IRM). The AutoEncoder and Memory Controller are responsible for the memory, while the IRM combines the output from the memory with input from the surrounding agents in the environment. MAEMANN-1 uses a multi-head self-attention layer in the memory network to improve future trajectory prediction by attending to multiple neighboring memories, while MAEMANN-2 uses multi-head attention in the IRM to improve the perception of surrounding agents. Our experimental results show that both MAEMANN models outperform the MANTRA model when tested on the KITTI dataset, where we predict 4 seconds of future trajectory given 2 seconds of past trajectory. In multimodal prediction with 5 modes, MAEMANN-1 improves the Final Displacement Error (FDE) and Average Displacement Error (ADE) at t = 4 seconds by 10.58% and 9.24%, respectively. For MAEMANN-2, the corresponding improvements in FDE and ADE are 14.39% and 13.47%.
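For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ADE and FDE are commonly computed for a multimodal predictor. It is not taken from the thesis: the function names are hypothetical, the best-of-K selection (lowest error among the predicted modes) is an assumed evaluation protocol, and the relative-improvement formula is an assumption about how the percentages above could be derived.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    Both arguments are equal-length lists of (x, y) positions.
    ADE is the mean Euclidean distance over all time steps;
    FDE is the Euclidean distance at the last time step.
    """
    if len(pred) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def best_of_k(preds, truth):
    """Assumed multimodal protocol: lowest ADE and lowest FDE among K modes,
    each minimum taken independently over the K predicted trajectories."""
    errors = [displacement_errors(p, truth) for p in preds]
    return min(e[0] for e in errors), min(e[1] for e in errors)


def relative_improvement(baseline, ours):
    """Assumed formula: percentage reduction of an error relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    # Toy data, not from the thesis: a 3-step ground truth and K = 2 modes.
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    modes = [
        [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
        [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    ]
    ade, fde = best_of_k(modes, truth)
    print(f"best-of-2 ADE = {ade:.2f}, FDE = {fde:.2f}")
```

Under this protocol a model is credited if any one of its K predicted modes lies close to the ground truth, which is why the number of modes is stated alongside the reported errors.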
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Naveed Muhammad, Yar Muhammad
Defence year
2021
 