Accuracy Affecting Factors for Optical Handwritten Character Recognition

Joonas Lõmps
Optical character recognition (OCR) is a technique that converts images of typed, handwritten, or printed text into machine-encoded text, enabling automatic processing of paper records such as passports, invoices, medical forms, and receipts. Pattern recognition, artificial intelligence, and computer vision are the research fields that enable OCR. Applying OCR to handwritten text could greatly benefit many emerging information systems by ensuring a smooth transition from paper to digital formats. OCR has evolved into a multi-step process: segmentation, pre-processing, feature extraction, classification, post-processing, and application-specific optimization. This thesis proposes techniques to improve the overall accuracy of OCR systems by showing the effects of pre-processing, feature extraction, and morphological processing. It also compares the accuracies of several well-known and commonly used classifiers in the field. Using the proposed techniques, an accuracy of over 98% was achieved. In addition, a dataset of handwritten Japanese Hiragana characters with considerable variability was collected as part of this thesis.
Graduation Thesis language
Graduation Thesis type
Master - Computer Science
Amnir Hadachi
Defence year