Institute of Computer Science - Graduation Theses Registry


Accuracy Affecting Factors for Optical Handwritten Character Recognition

Name

Joonas Lõmps

Abstract

Optical character recognition (OCR) is a technique that converts images of typed, handwritten or printed text into machine-encoded text, enabling automatic processing of paper records such as passports, invoices, medical forms and receipts. Pattern recognition, artificial intelligence and computer vision are all research fields that enable OCR. Applying OCR to handwritten text could greatly benefit many emerging information systems by ensuring a smooth transition from paper to digital form. OCR has evolved into a multi-step process: segmentation, pre-processing, feature extraction, classification, post-processing and application-specific optimization. This thesis proposes techniques to improve the overall accuracy of OCR systems by showing the effects of pre-processing, feature extraction and morphological processing. It also compares the accuracies of several well-known and commonly used classifiers in the field. Using the proposed techniques, an accuracy of over 98% was achieved. In addition, a dataset of handwritten Japanese Hiragana characters with considerable variability was collected as part of this thesis.

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Amnir Hadachi

Defence year

2018


