EMMA 2.0 Lets Applications Decide What to Tell You - and How
From: Speech Technology - Summer 2016 - page 37
By: Deborah Dahl

Information can be graphical or spoken, depending on context

The Extensible Multimodal Annotation language, known as EMMA, was designed by the World Wide Web Consortium (W3C) as a way to represent user inputs, particularly the kinds of rich, complex inputs possible with spoken natural language. EMMA 1.0 became a W3C standard in 2009 and has since been used to link processors like speech recognizers and natural language understanding systems with platforms such as VoiceXML and Web browsers.

The latest version, EMMA 2.0, supports adapting content not only to various screen sizes but to entirely different presentation formats - including speech, graphics, combined speech and graphics, and even robot actions. This version can account for the user's general preferences as well as the current context. If a visual presentation is appropriate for the user and context, a graphical display is generated; if a spoken presentation is appropriate, the output is spoken; the presentation can also combine graphics and speech. (A rough sketch of what such annotation might look like appears after the links below.)

How do users gain from this type of adaptation? Devices with small screens, like smart watches, or with no screen at all, like the Amazon Echo, clearly need or benefit from spoken output. Spoken output is also suited to eyes-busy tasks like exercising or driving. Applications designed for use in public or noisy environments, on the other hand, will profit from graphical output. Another benefit is accessibility: if the users' inputs and system outputs are treated as generic meanings by the application, the core user-system interaction logic doesn't have to change much to accommodate the different types of presentations that users with disabilities might prefer.

Read the entire article at: http://www.speechtechmag.com/Archives/ArchiveIssue.aspx?IssueID=6152

Link: EMMA: Extensible MultiModal Annotation markup language Version 2.0
https://www.w3.org/TR/emma20
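
For orientation only, the sketch below suggests how an EMMA document can annotate both a user input and a system response with the medium and mode of presentation. The emma:emma root, the EMMA namespace, and the emma:medium, emma:mode, emma:confidence, and emma:tokens annotations are carried over from EMMA 1.0; the emma:output element and the way it is used for the system response here are assumptions about the EMMA 2.0 output representation, not markup quoted from the specification, so consult the spec linked above for the authoritative element and attribute names.

    <!-- Illustrative sketch only; the output markup below is assumed,
         not quoted from the EMMA 2.0 specification. -->
    <emma:emma version="2.0" xmlns:emma="http://www.w3.org/2003/04/emma">
      <!-- A user input, annotated as in EMMA 1.0: spoken, acoustic medium. -->
      <emma:interpretation id="int1"
          emma:medium="acoustic" emma:mode="voice"
          emma:confidence="0.85"
          emma:tokens="what is the weather in boston">
        <query topic="weather" city="Boston"/>
      </emma:interpretation>
      <!-- A system response, annotated with the presentation chosen for the
           current context (spoken here; it could instead be visual, or both).
           The emma:output element and its attributes are hypothetical. -->
      <emma:output id="out1" emma:medium="acoustic" emma:mode="voice">
        <answer text="It is 72 degrees and sunny in Boston."/>
      </emma:output>
    </emma:emma>

Because both the input and the output are represented as annotated meanings rather than as rendered text, audio, or graphics, the same interaction logic can drive a screen, a speech synthesizer, or both, which is the accessibility point made above.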