Development of a robust camera-based text recognition model for the visual impaired.

Author: Nwokoma, Francisca Onyinyechi
Repository dates: 2025-10-31, 2025-10-31
Date issued: 2022-12
Citation: Nwokoma, F. O. (2022). Development of a robust camera-based text recognition model for the visual impaired. (Unpublished Doctoral Thesis), Federal University of Technology, Owerri.
URI: https://repository.futo.edu.ng/handle/20.500.14562/2250
Description: Doctoral thesis on "robust camera-based text recognition model". It contains diagrams, pictures and tables.

Abstract: The quest to bridge the digital divide in a world of fast-growing Information and Communication Technology should not be restricted to some domains but should extend to everyone. To date, screen readers for the visually impaired still perform below expectation, and their applications are domain dependent. Research has shown that Visually Impaired Persons (VIPs) tend to be deprived of certain job opportunities because of their visual impairment, so unemployment rates among the visually impaired are alarmingly high irrespective of their intellectual ability. Therefore, to improve the text recognition capabilities of OCR and to bring the visually impaired community into employment settings, a robust camera-based text recognition model is proposed that enables a blind person to access documents and scene images for effective work collaboration. The system was designed to start automatically when the user's machine is turned on. To realize this concept, a deep learning approach was adopted. The CRAFT (Character-Region Awareness for Text Detection) architecture, which is suitable for detecting curved text, was deployed for text detection. For character recognition, a CRNN (Convolutional Recurrent Neural Network) was deployed, which combines a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network) and the CTC (Connectionist Temporal Classification) loss.
The recognition model was trained on the Synth90k synthetic text dataset provided by the Visual Geometry Group (VGG) and achieved a recognition accuracy of 98%. The system was implemented using Python natural language processing libraries. Finally, the recognized text is communicated to the VIP in audio format.

Language: en
License: Attribution-NonCommercial-ShareAlike 4.0 International
Keywords: Artificial intelligence; machine learning; computer vision; speech processing; visual impaired; department of computer science
Type: Doctoral Thesis
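
Illustrative note: the abstract names CTC as the component that turns per-frame CRNN outputs into a character string. The sketch below shows greedy CTC decoding in plain Python to make that step concrete. It is not taken from the thesis: the alphabet, the choice of index 0 as the blank symbol, and the helper names `ctc_greedy_decode` and `one_hot` are all assumptions made for this example.

```python
from itertools import groupby

# Assumption: index 0 is the CTC blank; "-" is only a placeholder for it.
ALPHABET = "-abcdefghijklmnopqrstuvwxyz0123456789"
BLANK = 0


def ctc_greedy_decode(frame_scores, alphabet=ALPHABET, blank=BLANK):
    """Decode per-frame class scores into text with greedy CTC decoding.

    frame_scores: one list of class scores per time step, as a
    recognition network might emit (hypothetical input format).
    """
    # 1. Pick the best-scoring class index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse consecutive repeats of the same index.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(alphabet[index] for index in collapsed if index != blank)


def one_hot(index, size):
    """Build a toy score vector with a single winning class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy frame sequence for the path "hh-el-lo", where "-" stands for blank.
frames = [one_hot(ALPHABET.index(ch), len(ALPHABET)) for ch in "hh-el-lo"]
decoded = ctc_greedy_decode(frames)
```

The blank between the two "l" frames is what lets the decoder keep a doubled letter; without it, the repeated "l" would be collapsed into one. A trained system may use beam search or a lexicon instead of this greedy rule.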