Data Testing in Digital Transmission: Tamil Literature

மின்னணுப் பரிமாற்றத்தில் தரவுப் பரிசோதனை: தமிழ் மொழிஇலக்கியம்

Keywords: Optical Character Recognition, Handwritten Character Recognition, Tamil Language, Deep Learning, Techniques


Digital handwriting recognition is a developing area of Optical Character Recognition (OCR). A digital writing pad replaces writing by hand on paper, but the shape and style of the letters vary when writing digitally. Variations in the writer's pen pressure and pen position on the pad cause problems when OCR converts the handwriting into a text file.
These variations in letter shape produce errors in OCR-to-text conversion. The problem arises in languages such as Tamil, Chinese, Arabic, and Telugu, whose scripts are made up of bends, curves, and rings. Tamil is especially prone to word errors in OCR-to-text conversion because its letters are built from curves and angles that must be transcribed exactly. To recognize Tamil text written on a digital writing pad, this paper proposes the ResNet (Residual Neural Network) Two-Stage Bottleneck Architecture (RTSBA). RTSBA reduces the complexity of the Tamil alphabet recognition problem by using two distinct neural network stages. The first stage works with fewer inputs and variables, and the final stage minimizes time and computational complexity. The proposed algorithm is compared with a two-channel, two-stream transformer, long short-term memory (LSTM), Inception-v3, recurrent neural networks (RNNs), convolutional neural networks (CNNs), and other conventional algorithms. On the digital writing pad handwritten dataset and the HP Labs dataset, RTSBA achieves accuracy rates of 98.7% and 97.1%, respectively.
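
The abstract does not give the layer details of RTSBA, so the following is only a minimal sketch of the general residual bottleneck idea that ResNet-style networks use, not the authors' architecture. It assumes a single feature vector in place of an image feature map, and the names `bottleneck_block`, `w_reduce`, `w_mid`, and `w_expand`, as well as the sizes 64 and 16, are hypothetical choices made for illustration.

```python
import random

def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def linear(weights, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def init_weights(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def bottleneck_block(x, w_reduce, w_mid, w_expand):
    """Reduce -> transform -> expand, then add the skip connection."""
    h = relu(linear(w_reduce, x))   # channels -> reduced (role of a 1x1 convolution)
    h = relu(linear(w_mid, h))      # reduced -> reduced (stands in for the 3x3 convolution)
    h = linear(w_expand, h)         # reduced -> channels
    return relu([a + b for a, b in zip(x, h)])  # residual (skip) connection

def param_count(*matrices):
    """Total number of weights across the given matrices."""
    return sum(len(m) * len(m[0]) for m in matrices)

rng = random.Random(0)
channels, reduced = 64, 16          # hypothetical sizes, not taken from the paper
w_reduce = init_weights(reduced, channels, rng)
w_mid = init_weights(reduced, reduced, rng)
w_expand = init_weights(channels, reduced, rng)

x = [rng.uniform(0.0, 1.0) for _ in range(channels)]
y = bottleneck_block(x, w_reduce, w_mid, w_expand)

bottleneck_params = param_count(w_reduce, w_mid, w_expand)
plain_params = 2 * channels * channels  # two full-width layers, for comparison
print(len(y), bottleneck_params, plain_params)
```

In this toy setting the bottleneck path uses 2,304 weights where two full-width layers would use 8,192, which illustrates how narrowing the intermediate representation can cut the number of variables and the computation per block. Whether RTSBA uses these proportions is not stated in the abstract.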
