Exploring Ink Analysis
The Tablet PC SDK makes it easy to incorporate digital ink and handwriting analysis into applications, and now the InkAnalysis API (available in Windows Vista™ and, through a redistributable, downlevel on the Microsoft® Windows® XP operating system) takes it one step further. The InkAnalysis API exposes some of the lower-level functions that make handwriting recognition possible. It also exposes functionality that can improve recognition results and that supports shape recognition, alternative recognition results, and spatial analysis. In this article, I will take a deeper look at what goes on behind the scenes and how to take advantage of the tablet team’s hard work.
When I first started to work on Tablet PC solutions, the depth and the richness of the Tablet PC API immediately impressed me. Thanks to the well-designed Tablet PC SDK, I could add handwriting recognition and support for digital ink without concerning myself with the gritty details and hard work of handwriting recognition.
As far as most tablet application developers are concerned, handwriting recognition is as easy as calling a ToString method.
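The following code exemplifies the basic ink-to-text scenario that most tablet developers are familiar with.
// inkOverlay is an assumed name for an InkOverlay that has already collected strokes.
string recognitionResults =
    inkOverlay.Ink.Strokes.ToString();
MessageBox.Show("Recognition Results:" +
    recognitionResults);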
As you can see in the above code, the Strokes collection’s ToString method does all the work here, easily converting the ink strokes to text.
The ability to interpret structure as well as content from written text opens up a number of possibilities.
But is handwriting recognition really that easy? Of course it isn’t. A lot of research went into building the handwriting recognition engine to make it this easy for tablet developers.
What goes on behind the scenes? More to the point, is there any functionality exposed at these lower levels of the API? How can you leverage the process to add advanced functionality to applications?
Developers are never happy for long. They have a curious nature and push the envelope to keep expanding what’s possible. So it is with handwriting recognition. As you develop more Tablet PC applications, you will quickly find that the ToString method of handwriting recognition leaves you wanting more.
What if you wanted to see more than just the top result? What if you wanted to recognize shapes as well? What if, in addition to converting the text of the document, you wanted to copy the structure of the document?
Ink analysis makes all this and more possible.
But first, I will briefly review the basic mechanics of digital ink and what goes on when your code calls the ToString method.
The Basics of Recognition
Most developers know that the Tablet PC can convert handwriting to text, but fewer know about the shape recognition features of the Tablet PC platform. The subject of ink analysis covers handwriting, shape, and document recognition. First, I will examine the path ink takes when you call the ToString method.
The ink data type is a vector-based graphics format that consists of a series of strokes. Each stroke is made up of a series of discrete points captured from the pen, which the platform can render as a smooth Bezier curve.
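To make the stroke-and-point structure concrete, here is a minimal sketch that walks the strokes of an Ink object and reports how many points each one holds. It assumes a project that references the Tablet PC SDK assembly (Microsoft.Ink.dll); the class and method names StrokeSketch and DescribeStrokes are made up for illustration.

```csharp
using System;
using System.Drawing;
using Microsoft.Ink;   // Tablet PC SDK assembly (Microsoft.Ink.dll)

public static class StrokeSketch
{
    // Writes the ID and point count of every stroke in the given Ink object.
    public static void DescribeStrokes(Ink ink)
    {
        foreach (Stroke stroke in ink.Strokes)
        {
            // GetPoints returns the stroke's points in ink space coordinates.
            Point[] points = stroke.GetPoints();
            Console.WriteLine("Stroke {0}: {1} points", stroke.Id, points.Length);

            if (points.Length > 0)
            {
                Console.WriteLine("  starts at ({0}, {1})", points[0].X, points[0].Y);
            }
        }
    }
}
```

You might call it as StrokeSketch.DescribeStrokes(inkOverlay.Ink), where inkOverlay is whatever collector your form already uses.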
The handwriting recognition engine analyzes these strokes, produces a list of possible alternatives by comparing them against a system-wide dictionary, and ranks them according to what the engine believes is the most likely result. The result with the highest confidence rank wins and becomes the value returned from the recognizer.
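The ranking step can be pictured with a small, self-contained model. This is only a sketch of the idea, not the engine's actual implementation: the Candidate type, the RankingSketch class, and the numeric confidence scores are all assumptions made for illustration and are not part of the Tablet PC SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration; not part of the Tablet PC SDK.
public sealed class Candidate
{
    public string Text { get; private set; }
    public double Confidence { get; private set; }   // assumed scale: 0.0 (unlikely) to 1.0 (likely)

    public Candidate(string text, double confidence)
    {
        Text = text;
        Confidence = confidence;
    }
}

public static class RankingSketch
{
    // Orders candidates from most to least confident.
    public static List<Candidate> Rank(IEnumerable<Candidate> candidates)
    {
        return candidates.OrderByDescending(c => c.Confidence).ToList();
    }

    // Returns the text of the highest-ranked candidate, or an empty string if there are none.
    public static string TopResult(IEnumerable<Candidate> candidates)
    {
        Candidate best = Rank(candidates).FirstOrDefault();
        return best == null ? string.Empty : best.Text;
    }

    public static void Main()
    {
        var candidates = new List<Candidate>
        {
            new Candidate("he11o", 0.12),
            new Candidate("hello", 0.91),
            new Candidate("hallo", 0.47),
        };

        Console.WriteLine(TopResult(candidates));   // prints "hello"
    }
}
```

The lower-ranked candidates are not wrong so much as less likely, which is why having access to the full list is useful when the top result misses.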
To do this, the handwriting recognition engine has to determine several key factors, the same factors that the human eye takes into account when reading text.
First, the engine performs a rudimentary spatial analysis on the ink, determining the baseline on which the text is written and the boundaries of paragraphs, lines, and even individual words. The analysis in this scenario is quite different from the work done in the InkAnalysis API, which I will dig deeper into later in this article.
Figure 1 shows the baseline drawn out over a line of text. The spatial analysis engine then passes these individual words to the recognition engine, which evaluates them and returns a text value and a confidence level. The recognition engine assigns a confidence level to every text result it associates with a series of strokes. It returns the result with the highest level of confidence as the value from the ToString method in the previous example.
Figure 1: The recognition engine first determines the baseline on which the text is written.
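The confidence levels described above can be inspected with the SDK's RecognizerContext class rather than ToString. The following is a hedged sketch, assuming a reference to Microsoft.Ink.dll and an installed recognizer that reports confidence (not every recognizer does); the class and method names AlternatesSketch and PrintAlternates are made up for illustration.

```csharp
using System;
using Microsoft.Ink;   // Tablet PC SDK assembly (Microsoft.Ink.dll)

public static class AlternatesSketch
{
    // Writes the top result and every alternate the default recognizer
    // offers for the given strokes, together with its confidence level.
    public static void PrintAlternates(Strokes strokes)
    {
        if (strokes == null || strokes.Count == 0)
        {
            return;   // nothing to recognize
        }

        using (RecognizerContext context = new RecognizerContext())
        {
            context.Strokes = strokes;
            context.EndInkInput();   // signal that no more ink will be added

            RecognitionStatus status;
            RecognitionResult result = context.Recognize(out status);
            if (result == null || status != RecognitionStatus.NoError)
            {
                return;   // recognition did not produce a usable result
            }

            Console.WriteLine("Top result: " + result.TopString);

            foreach (RecognitionAlternate alternate in result.GetAlternatesFromSelection())
            {
                // Confidence is one of Strong, Intermediate, or Poor.
                Console.WriteLine("{0} ({1})", alternate.ToString(), alternate.Confidence);
            }
        }
    }
}
```

Passing the same Strokes collection used in the ToString example should show the top string first, followed by the alternates the engine considered.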
By: Frank La Vigne
Frank La Vigne is a Microsoft® Tablet PC MVP and Lead Architect/Designer for Applied Information Sciences (AIS) in Northern Virginia.
Frank started in software development when he was twelve, writing BASIC programs for the Commodore 64.
He began his professional career writing Visual Basic 3 applications for Wall Street firms in 1993. He then moved on to be the first webmaster for a major book retailer. Frank then went on to develop a large multinational online banking project in Germany.
In 2004, Frank became heavily focused on Tablet PC application development. Since then, he has been working on various tablet-based solutions.
You can read more on his blog at http://www.FranksWorld.com
The Ink Analysis API goes beyond text recognition and interprets the structure of a document, adding more power and value to ink-enabled applications.
Tablet PC, Mobile PC, or UMPC?
Throughout this article I refer to the Tablet PC platform, which includes mobile PCs and the sleek Ultra-Mobile PC (UMPC).
All the techniques and code mentioned in this article will run on all these devices.