Radial Softmax: A Novel Activation Function for Neural Networks to Reduce Overconfidence in Out-Of-Distribution Data

Name
Rain Vagel
Abstract
Neural networks are widely used and give state-of-the-art results in fields such as machine translation, image classification and speech recognition. These networks operate under the assumption that the data they predict on originates from the same distribution as the training data. If this is not the case, the model will often output incorrect results with very high confidence. In this work we explain why the commonly used softmax function is unable to mitigate these problems and propose a new function, called radial softmax, that can reduce out-of-distribution (OOD) overconfidence. We show that radial softmax mitigates OOD overconfidence in almost all cases. Based on our literature review, this is the first time an improvement to softmax has been proposed for this issue. We also show that no changes to the training cycle or intermediate activation functions are needed, so models can be made more resistant to OOD data without modifying the larger architecture or training regime. With models that are known to be resistant to OOD data, we can be more confident in their output and use them in applications where mistakes are unacceptable, such as healthcare, the defence industry or autonomous driving.
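To make the overconfidence problem concrete, the following minimal sketch (illustrative only, not code from the thesis; the radial softmax itself is defined in the full text) shows how the standard softmax saturates. Softmax depends only on differences between logits and approaches certainty exponentially, so the large-magnitude logits that OOD inputs often produce yield near-total confidence in one class.

import numpy as np

def softmax(z):
    # Shift by the maximum logit for numerical stability;
    # the result is unchanged because softmax is shift-invariant.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Moderate logits give moderate confidence: ~[0.63, 0.23, 0.14]
print(softmax(np.array([2.0, 1.0, 0.5])))

# Scaling the same logits by 10, as can happen on OOD inputs,
# drives the top-class probability to ~0.99995.
print(softmax(np.array([20.0, 10.0, 5.0])))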
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Ardi Tampuu, Meelis Kull, Raul Vicente
Defence year
2020