Handwriten mathematical expression recognition using graph grammars
Çelik, Mehmet (2010) Handwriten mathematical expression recognition using graph grammars. [Thesis]
Official URL: http://192.168.1.20/record=b1302950 (Table of Contents)
This thesis presents a graph grammar approach for the recognition of handwritten mathematical expressions. Pen based interfaces provide a natural human computer interaction; interfaces for entering mathematical expressions are no exception to that. The problem is challenging, as it includes the sub-problems of character recognition (OCR) and 2-dimensional structure understanding. Thus, on top of the problems of the standard OCR systems, such as high variation in character shapes, the two dimensional nature of a mathematical expression brings further ambiguity. We use graph grammars for structural understanding of the expressions in order to represent as much information as possible in the parse process. Representing input expression as a graph protects the geometrical relations among the symbols of the input, while alternatives include methods for linearization of the input which may introduce critical errors into the parse process. Also graph grammars have the advantage of flexibility over procedurally coded parse systems. Another important aspect of our system is the fact that all alternative parses are evaluated and the one with maximum likelihood is selected as the intended expression. The likelihoods are estimated according to OCR confidence scores and structural relationships statistics. The segmentation step precedes the parse process, and segments and groups strokes collected from the Tablet input, according to timestamps and distance in space respectively. Then, the segmented symbols are recognized by the OCR engine which uses offline (image) features to allow for flexibility in time dimension, such as adding extra strokes and symbols anytime during the equation. The extracted features are used in an ANN and SVM combination engine returning top-3 character alternatives and confidence values. The parse process expands the graph by generating new tokens with repeated application of grammar rules. At the end, one or more tokens contain the full expression, along with a confidence value based on the 2-dimensional layout of the symbols in the expression and the associated statistics of geometrical relations between symbols. These and the OCR confidence scores are used in disambiguating alternative parses. Our approach is more powerful compared to graph re-writing systems in that all alternative parses are evaluated, rather than selecting the most likely rule application at a particular step, in an irreversible fashion. This also eliminates the need for specifying rule precedences, making system development or use of alternate grammars easier. The only limitation of our system is that segmentation errors are irreversible. That is, the parse process does not handle alternate segmentations, in order to keep the complexity of the parse process down. We alleviate this problem by providing feedback to the user as the segmentation proceeds, in real time. Our user interface gives error correction tools to the user to correct OCR errors and it can generate LATEX code, and MathML codes and graphical rendering of the input handwritten mathematical expression. An extensive collection of mathematical expression and isolated symbols are collected from 15 users for 57 different expressions from a 70-character alphabet. There are, in total, 1710 mathematical expressions and 10500 isolated characters. All samples are in the natural writing styles of the users.
Repository Staff Only: item control page