Automated Payment Classification in Retail Banking

Artem Mateush
Retail banks analyse their customer data to achieve business goals and improve their service. Modern machine learning techniques can augment classic data analysis in this field. The ability to classify customers' payments enables retail banks to better understand their customers' expenditure patterns and to customise their offers accordingly.
Payment classification is a difficult problem because the set of businesses is large and evolving, and because each business may offer multiple types of products, e.g. a business may sell both food and electronics. The two major approaches to payment classification are rule-based classification and machine learning-based classification. The latter is a form of supervised learning and, as such, requires a labelled transaction set; in our case, transactions classified by the customers themselves (a form of crowdsourcing). The rule-based approach is not scalable, as it requires rules to be maintained for every business and type of transaction. The crowdsourcing approach leads to inconsistencies and is difficult to bootstrap, since it requires a large number of customers to manually label their transactions over an extended period of time.
Here we present a case study at a financial institution in which a hybrid approach is employed. A set of rules is used to bootstrap a financial planner that allows customers to view their transactions classified into 66 categories, to add labels to unclassified transactions, and to re-label transactions. The crowdsourced labels, together with the initial rule set, are then used to train a machine learning model.
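The hybrid idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword rules, the category names, the choice to let a customer's label take precedence over a rule label, and the toy token-count classifier, which merely stands in for the machine learning model that the abstract does not specify.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules; the category names are illustrative and are
# not the bank's 66 categories.
RULES = {"supermarket": "food", "pharmacy": "health", "fuel": "transport"}


def rule_label(description):
    """Return the category of the first matching keyword rule, or None."""
    for token in description.lower().split():
        if token in RULES:
            return RULES[token]
    return None


def build_training_set(transactions, customer_labels):
    """Combine both label sources (assumption: a customer label wins over a rule)."""
    training = []
    for tx_id, description in transactions.items():
        label = customer_labels.get(tx_id) or rule_label(description)
        if label is not None:
            training.append((description, label))
    return training


def train(training):
    """Count how often each token co-occurs with each category."""
    counts = defaultdict(Counter)
    for description, label in training:
        for token in description.lower().split():
            counts[token][label] += 1
    return counts


def predict(counts, description):
    """Sum the per-token category counts; return the top category or None."""
    votes = Counter()
    for token in description.lower().split():
        votes.update(counts.get(token, {}))
    return votes.most_common(1)[0][0] if votes else None


# Toy transactions, not the bank's data.
transactions = {
    1: "city supermarket tartu",
    2: "green pharmacy",
    3: "bistro tartu lunch",
    4: "bistro riverside",
}
customer_labels = {3: "restaurants"}  # transaction 3 matches no rule

model = train(build_training_set(transactions, customer_labels))
```

In this toy setup, transaction 4 matches no rule, yet `predict(model, "bistro riverside")` can still assign it a category learned from the customer-labelled transaction 3. This is the mechanism by which a trained model can extend the coverage of a rule set.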
We evaluated our model on a real anonymised dataset, provided by the bank, which consists of wire transfers and card payments. In particular, for the wire transfer dataset, the hybrid approach increased the coverage of the rule-based system from 76.4% to 87.4% while replicating the crowdsourced labels with a mean AUC of 0.92, despite inconsistencies between crowdsourced labels.
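The two reported metrics can be sketched as follows. The abstract does not state how the mean AUC was averaged, so this sketch assumes an unweighted mean of one-vs-rest AUCs over categories; the function names and the toy data are hypothetical.

```python
def coverage(labels):
    """Fraction of transactions that received a category (None = unclassified)."""
    return sum(label is not None for label in labels) / len(labels)


def auc(scores, positives):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(scores_by_category, true_labels):
    """Unweighted mean of one-vs-rest AUCs (an assumed averaging scheme)."""
    aucs = [
        auc(scores, [label == category for label in true_labels])
        for category, scores in scores_by_category.items()
    ]
    return sum(aucs) / len(aucs)


# Toy data, not the bank's.
rule_only = ["food", None, "health", None]
hybrid = ["food", "restaurants", "health", None]
true_labels = ["food", "restaurants", "health", "food"]
scores_by_category = {
    "food": [0.9, 0.2, 0.1, 0.15],
    "restaurants": [0.05, 0.7, 0.1, 0.3],
    "health": [0.05, 0.1, 0.8, 0.1],
}
```

Here `coverage(rule_only)` and `coverage(hybrid)` compare the share of classified transactions before and after adding the model, and `mean_auc(scores_by_category, true_labels)` measures how well the model's scores rank the crowdsourced label above the other categories.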
This improvement shows the viability of the proposed hybrid model, and the positive evaluation result allows us to proceed with integrating the hybrid model into the bank's systems.
Graduation Thesis type
Master - Software Engineering
Rajesh Sharma