Article source: CoDe (2005 - Nov/Dec)




Building Speech-Enabled Applications with ASP.NET (Cont.)

Grammar Authoring

Speech is an interactive process of prompts and commands. Grammars are the sets of structured rules that identify the words, phrases, and valid selections that can be collected in response to an application prompt. A grammar specifies both the exact words and the order in which users can say them. A grammar can consist of a single word, a list of acceptable words, or complex phrases. Structurally, a grammar is an XML document whose rules contain the plain-text words and phrases that the recognizer tries to match against the user’s response. Within MSS, grammars conform to the W3C Speech Recognition Grammar Specification (SRGS). An example of a simple grammar file that allows for the selection of a sandwich is shown in Listing 4.

Grammars form the guidelines that applications use to recognize the possible commands a user might issue. Unless a word or phrase is defined in the grammar, the application cannot recognize it and returns an error. You can think of a grammar as the vocabulary of what the user can say and what the application can understand. It works like a lookup table in a database that offers the user a fixed list of options rather than accepting free-form text input.

A very simple application can limit spoken commands to single words like “open” or “print.” In this case, the grammar is little more than a list of words. However, most applications require a richer set of commands and sentences, because users expect to speak to them in normal, natural language. This raises the bar for the application and requires additional thought during design. For example, an application must accept “I would like to buy a roast beef sandwich” as well as “Gimme a ham sandwich.”

A well-defined grammar therefore does more than list the options. It must also define the surrounding phrases, such as the preamble to a sentence. For example, the grammar for the requests above must recognize “I would like to” in addition to the option “roast beef.” In this sense, a grammar is a sentence or sequence of phrases broken down into its smallest component parts.
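To make that decomposition concrete, here is a toy model in Python. It is not the MSS recognizer or the SRGS format: the grammar is reduced to an optional preamble, a required sandwich choice, and an optional trailing word. The preamble list and the `match_sandwich` helper are hypothetical; a real grammar expresses the same structure with SRGS `item`, `one-of`, and `repeat="0-1"` elements.

```python
from typing import Optional

# Hypothetical phrases taken from the examples in the text.
PREAMBLES = ["i would like to buy a", "gimme a"]
OPTIONS = ["ham", "roast beef", "italian"]
SUFFIX = "sandwich"  # optional trailing word


def match_sandwich(utterance: str) -> Optional[str]:
    """Return the sandwich option, or None if no complete path matches."""
    text = " ".join(utterance.lower().split())
    # The preamble is optional, so the empty preamble is tried last.
    for preamble in PREAMBLES + [""]:
        if preamble and not text.startswith(preamble + " "):
            continue
        rest = text[len(preamble):].strip()
        # The trailing word is optional as well.
        if rest.endswith(" " + SUFFIX):
            rest = rest[: -len(SUFFIX)].strip()
        if rest in OPTIONS:
            return rest
    return None


print(match_sandwich("I would like to buy a roast beef sandwich"))  # roast beef
print(match_sandwich("Gimme a ham sandwich"))  # ham
print(match_sandwich("I want a pizza"))  # None
```

Anything outside the defined phrases returns `None`, which mirrors the recognition error described earlier for out-of-grammar speech.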

Another job of the grammar is to map multiple similar phrases to a single semantic meaning. Consider all the ways a user can ask for help: “help,” “huh,” or “what are my choices.” In all three cases, the user is asking for help. The grammar defines all three phrases and maps them to a single semantic value. The benefit is that a developer only has to write code to handle that one “help” value.
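The payoff of semantic mapping shows up in the application code. In this minimal Python sketch, a hypothetical lookup table plays the role of the grammar and maps several spoken forms to one value, so the handler needs only one branch. The table and the `interpret` and `respond` functions are illustrative assumptions, not part of the SASDK.

```python
from typing import Optional

# Hypothetical table: several spoken forms share one semantic value,
# much as an SRGS <one-of> list with a <tag> per item would.
SEMANTIC_VALUES = {
    "help": "help",
    "huh": "help",
    "what are my choices": "help",
}


def interpret(utterance: str) -> Optional[str]:
    """Return the semantic value, or None for out-of-grammar input."""
    return SEMANTIC_VALUES.get(" ".join(utterance.lower().split()))


def respond(utterance: str) -> str:
    # The application branches on the semantic value, not the raw words,
    # so a single "help" branch covers every synonym.
    if interpret(utterance) == "help":
        return "You can order ham, roast beef, or italian."
    return "Sorry, I did not understand."


print(respond("Huh"))  # You can order ham, roast beef, or italian.
```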

Implementing Grammar

Within a Visual Studio speech application, grammar files have a .grxml extension and are added directly to the project, as shown in Figure 12. Once a grammar file is added, you use the Grammar Editing Tool, shown in Figure 13, to add and update its individual elements. The tool provides a graphical, left-to-right view of the phrases and rules stored in a particular grammar file. Essentially, it visualizes the underlying SRGS format as a word graph rather than as hierarchical XML.


Figure 12: Within a Visual Studio speech application, grammar files have a .grxml extension and are added directly to the project.


Figure 13: The Grammar Editing Tool is used to define and manage the elements of a grammar.

For developers, the Grammar Editor presents a flowchart of the valid grammar paths: a phrase is valid if it traces a complete path through the flowchart. You build recognition rules by dragging the toolbox elements listed in Table 2 onto the design canvas, where each shape represents an underlying SRGS element.

During development, the Grammar Editor can show both the path of an utterance and the returned SML document, as shown in Figure 14. For example, when the string “I would like to buy a ham sandwich” is entered into the Recognition string text box at the top, the path the recognizer took through the grammar is highlighted. At the bottom of the screen, the build output window displays a copy of the SML document returned by the recognizer. This feature is an important way to validate that the grammar accepts the intended phrases and that the returned SML document is correct.
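The flowchart idea can be modeled with a small Python sketch. A hand-written word graph stands in for the grammar, and a depth-first search returns the nodes on a complete path, roughly what the editor highlights. The graph, its node names, and `find_path` are hypothetical; the real recognizer works on audio and scores competing alternatives rather than matching exact strings.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical word graph for the sandwich grammar. Each node lists its
# outgoing (word, next_node) arcs; a phrase is valid only if its words
# trace a complete path from "start" to an accepting node.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "start": [("i", "p1"), ("ham", "choice"), ("roast", "rb"), ("italian", "choice")],
    "p1": [("would", "p2")],
    "p2": [("like", "p3")],
    "p3": [("to", "p4")],
    "p4": [("buy", "p5")],
    "p5": [("a", "option")],
    "option": [("ham", "choice"), ("roast", "rb"), ("italian", "choice")],
    "rb": [("beef", "choice")],
    "choice": [("sandwich", "end")],
}
ACCEPTING = {"choice", "end"}  # "sandwich" is optional


def find_path(words: List[str], node: str = "start") -> Optional[List[str]]:
    """Depth-first search; returns the nodes visited, or None if no path."""
    if not words:
        return [node] if node in ACCEPTING else None
    for word, next_node in GRAPH.get(node, []):
        if word == words[0]:
            rest = find_path(words[1:], next_node)
            if rest is not None:
                return [node] + rest
    return None


print(find_path("i would like to buy a ham sandwich".split()))
# ['start', 'p1', 'p2', 'p3', 'p4', 'p5', 'option', 'choice', 'end']
print(find_path("i would like".split()))  # None
```

The second call fails because it stops at a non-accepting node, which is the same reason an incomplete phrase is rejected by the recognizer.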


Figure 14: During development, the Grammar Editor can show both the path of an utterance and the returned SML within the Visual Studio environment.

Structurally, the editor presents the list of rules that identify the words or phrases an application user can say. A rule defines a pattern of speech input that the application recognizes. At run time, the speech engine attempts to find a complete path through the rule that matches the voice input. If it finds one, recognition succeeds and the results are returned to the application as a Semantic Markup Language (SML) document. SML is an XML-based format that combines the utterance, the semantic items defined by the grammar, and a confidence value, as in the following example.

<SML confidence="1.000" text="ham"
   utteranceConfidence="1.000">ham</SML>

The confidence value is a score returned by the recognition engine that indicates how confident it is that it recognized the audio correctly. Confidence values often drive the confirmation logic within an application. For example, you may want to ask the user to confirm an answer when the confidence value falls below a threshold such as 0.8.
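As a hedged sketch of that confirmation decision, the Python below reads an SML string in the format shown above with the standard `xml.etree.ElementTree` module and compares its top-level `confidence` attribute with 0.8. The helper names are hypothetical, and in a real SASDK application this logic would run inside the speech application rather than in a standalone script.

```python
import xml.etree.ElementTree as ET

# Example threshold from the text; the right value depends on the application.
CONFIRM_THRESHOLD = 0.8


# Hypothetical helpers; not an SASDK API.
def needs_confirmation(sml: str, threshold: float = CONFIRM_THRESHOLD) -> bool:
    """True if the top-level SML confidence is below the threshold."""
    root = ET.fromstring(sml)
    # A missing attribute is treated as zero confidence, forcing a confirmation.
    return float(root.get("confidence", "0")) < threshold


def semantic_value(sml: str) -> str:
    """Text content of the root SML element, e.g. "ham"."""
    return (ET.fromstring(sml).text or "").strip()


sml = '<SML confidence="0.650" text="ham" utteranceConfidence="0.650">ham</SML>'
if needs_confirmation(sml):
    print(f"Did you say {semantic_value(sml)}?")  # Did you say ham?
```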

The SASDK also lets you use other sources of grammar within an application. The clear benefit is that you don’t have to author every grammar rule manually. You can add these external grammars through the included Grammar Library or through a process called data-driven grammar.

The Grammar Library is a reusable collection of SRGS rules designed to cover a variety of basic types. For example, it includes grammars for recognizing numbers and for mapping holidays to their actual calendar dates. Data-driven grammar is a feature provided by three Application Speech controls. The ListSelector and DataTableNavigator controls let you bind SQL Server data to the control and automatically make all of that data accessible by voice, so you don’t have to copy the data stored in a database into a grammar file. The third control, AlphaDigit, isn’t data-bound. Instead, it automatically generates a grammar for recognizing a masked sequence. For example, the mask “DDA” would recognize any string in the format digit, digit, character.
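The mask idea itself can be illustrated without the SASDK. This Python sketch turns a mask into a regular expression and checks typed strings against it. The symbol meanings (D for a digit, A for an alphabetic character) are assumptions based on the “DDA” example, and the helper names are hypothetical; the real AlphaDigit control generates a speech grammar and handles spoken input, which this sketch does not.

```python
import re

# Hypothetical helpers; the real AlphaDigit control builds a speech grammar.
# Assumed meaning of the mask symbols: "D" = one digit, "A" = one letter.
SYMBOLS = {"D": "[0-9]", "A": "[A-Za-z]"}


def mask_to_pattern(mask: str):
    """Translate a mask such as "DDA" into a compiled regular expression."""
    try:
        return re.compile("".join(SYMBOLS[symbol] for symbol in mask))
    except KeyError as err:
        raise ValueError(f"unsupported mask symbol: {err.args[0]!r}") from None


def matches_mask(value: str, mask: str) -> bool:
    # fullmatch: the whole value must fit the mask, not just a prefix.
    return mask_to_pattern(mask).fullmatch(value) is not None


print(matches_mask("42B", "DDA"))  # True
print(matches_mask("4BB", "DDA"))  # False
```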



Table 2: The elements of the Grammar Editor toolbox.
Element | Description
Phrase | The phrase element represents a single grammatical entry.
List | The list element defines a set of alternative phrases, any one of which can be said.
Group | The group element binds a series of phrases together in a sequence.
Rule Reference | The rule reference element references an external, encapsulated rule.
Script Tag | The script tag element contains semantic script, such as $._value = "ham" in Listing 4, that assigns values to the recognition result.
Wild Card | The wild card element allows any part of a response to be ignored.
Skip | The skip element creates an optional group that can be used to insert or format semantic tags at key points in the grammar.
Halt | The halt element immediately stops recognition when it is encountered.


Listing 4: SRGS within a grammar file
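<?xml version="1.0"?>
<!-- root must name a rule defined in this file; mode="voice" marks a speech (not touch-tone) grammar -->
<grammar xml:lang="en-US" tag-format="semantics-ms/1.0" version="1.0" root="sandwich" mode="voice" xmlns="http://www.w3.org/2001/06/grammar" xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
   <!-- One public rule: the caller must say exactly one of the listed sandwiches -->
   <rule id="sandwich" scope="public">
      <one-of>
         <!-- Each outer item pairs the spoken words with the semantic value returned in SML -->
         <item>
            <item>ham</item>
            <tag>$._value = "ham"</tag>
         </item>
         <item>
            <item>roast beef</item>
            <tag>$._value = "roast beef"</tag>
         </item>
         <item>
            <item>italian</item>
            <tag>$._value = "italian"</tag>
         </item>
      </one-of>
   </rule>
</grammar>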
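Because a .grxml file is plain XML, it can be sanity-checked outside Visual Studio. The hedged Python sketch below uses `xml.etree.ElementTree` to confirm that the `root` attribute names a rule that actually exists (a mismatch such as `root="Root"` when the only rule is `sandwich` leaves the grammar without a valid entry point) and then lists each phrase with its semantic tag. The `list_choices` name is hypothetical, and the check covers only this simple one-of structure, not full SRGS validation.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://www.w3.org/2001/06/grammar"}  # SRGS namespace

# Condensed copy of the sandwich grammar (sapi namespace omitted).
GRAMMAR = """<?xml version="1.0"?>
<grammar xml:lang="en-US" tag-format="semantics-ms/1.0" version="1.0"
         root="sandwich" mode="voice" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="sandwich" scope="public">
    <one-of>
      <item><item>ham</item><tag>$._value = "ham"</tag></item>
      <item><item>roast beef</item><tag>$._value = "roast beef"</tag></item>
      <item><item>italian</item><tag>$._value = "italian"</tag></item>
    </one-of>
  </rule>
</grammar>"""


# list_choices is a hypothetical helper, not part of the SASDK.
def list_choices(xml_text: str) -> dict:
    """Check that the root rule exists, then map each phrase to its tag."""
    grammar = ET.fromstring(xml_text)
    rule_ids = {rule.get("id") for rule in grammar.findall("g:rule", NS)}
    if grammar.get("root") not in rule_ids:
        raise ValueError(f"root {grammar.get('root')!r} is not a defined rule")
    choices = {}
    for item in grammar.findall("g:rule/g:one-of/g:item", NS):
        phrase = item.find("g:item", NS).text
        choices[phrase] = item.find("g:tag", NS).text
    return choices


for phrase, tag in list_choices(GRAMMAR).items():
    print(f"{phrase}: {tag}")
```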

