SALT Profile for SVG
SALT Forum, May, 2003
Authors:
Kuansan Wang (Microsoft)
Antoine Quint (Fuchsia Design)
On behalf of: the SALT Forum
Status of this document:
This document has been reviewed and approved by members of the SALT Forum
Technical Working Group and other interested parties. The document defines a
profile for SALT applications in conjunction with W3C SVG 1.1 and its mobile
profiles. At least one implementation is known to validate the feasibility of
the proposed profile at the time of publication.
The IPRs covered by the material in this document follow the same RF
policy as SALT Specification 1.0 (see the SALT Forum Adopters Agreement and the
SALT Forum W3C IP statement).
Introduction
Speech Application Language Tags, or SALT, is a specification that provides
speech capabilities to any XML application. The SALT 1.0 contribution to W3C
[1] details how SALT can enable users of various W3C recommendations,
such as HTML, XHTML, and SMIL, to interact with Web pages through naturally
spoken commands. This document supplements the original contribution with notes
on how SALT can be incorporated into another important W3C recommendation: SVG,
which has seen considerable support and many implementations worldwide.
Brief Overview of SVG
In January 2003, W3C formally issued the recommendation for SVG 1.1
[2] and SVG Mobile Profile [3]. Based on proven
vector graphics technology, SVG 1.1 is an XML-based specification to render
advanced graphics and images. It enables Web developers to deliver a visually
rich user interaction experience in their Web applications.
Because it is XML-based, SVG fully enjoys the extensibility promised by XML
while leveraging other existing W3C specifications that have been widely
adopted by the Web developer community. For example, developers can use CSS to
fine tune the look and feel, and SMIL to synthesize sophisticated timing
controls for their visual presentations. SVG is also equipped with a fully
featured DOM, which makes the SVG based user interface fully programmable in
the mainstream object oriented fashion. DOM programmers will find familiar and
straightforward the SVG extensions to DOM modules, such as Core, CSS, Style
Sheets, and Events.
Despite the name, SVG actually has native support for not only graphics but
also images and text. The notion of "text is text" in SVG is quite important as
not only does it increase efficiency and flexibility for text handling, but it
also makes SVG, when combined with the hyperlink specification XLink, a natural
candidate to render W3C's upcoming XHTML 2.0 specification. The native text
support also puts SVG in a great position for accessibility. For example, the
text contents in SVG can be read out programmatically and channeled through a
text-to-speech synthesizer for eyes-free applications, such as for users who
are operating motor vehicles or who are visually challenged.
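As a rough illustration of reading SVG text programmatically, the sketch below walks a DOM subtree and gathers the character data found inside text elements. The function name collectSpokenText is hypothetical and not part of SVG or SALT; the sketch relies only on DOM Level 2 Core properties (nodeType, nodeName, nodeValue, childNodes), and what is done with the resulting strings (for example, handing them to a speech synthesizer) is left to the host.

```javascript
// Hypothetical helper (not part of SVG or SALT): gathers the character data
// inside SVG <text> elements so it can be passed to a text-to-speech engine.
// Assumes nodes expose the DOM Level 2 Core properties used below.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

function collectSpokenText(node, insideText, parts) {
  parts = parts || [];
  if (node.nodeType === TEXT_NODE) {
    // Collapse runs of white space and trim the ends before keeping the text.
    var value = node.nodeValue.replace(/\s+/g, " ").replace(/^ | $/g, "");
    if (insideText && value.length > 0) {
      parts.push(value);
    }
    return parts;
  }
  // Descendants of <text> (such as <tspan>) also count as being inside text.
  var name = node.localName || node.nodeName;
  var nowInside = Boolean(insideText) ||
      (node.nodeType === ELEMENT_NODE && name === "text");
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    // NodeList offers item(i); plain arrays are indexed directly.
    var child = children.item ? children.item(i) : children[i];
    collectSpokenText(child, nowInside, parts);
  }
  return parts;
}
```

Calling collectSpokenText on the root svg element returns an array of strings in document order; text outside text elements, such as the content of desc, is skipped in this sketch.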
Recognizing the growing popularity of accessing the Web through mobile devices,
which are often severely constrained in computational power, W3C
has also recommended the SVG Mobile Profile for vendors to implement a common
subset of SVG. Wide adoption of the Mobile Profile will promote greater
application portability. According to the W3C SVG web site, quite a few companies
and products have pledged support for the SVG Mobile Profile. Roughly speaking,
the SVG Mobile Profile defines two subsets of SVG: SVG Basic, which targets
devices that can afford scripting support, and SVG Tiny, which targets even
thinner devices that, among other things, do not allow scripting.
W3C is continuing to evolve SVG. The upcoming versions of SVG will see
more capabilities for typesetting text contents as well as rendering arbitrary
XML documents with clearly defined rendering grammars. These new features will
not only bring SVG on a par with XHTML in terms of hypertext display, but also
make SVG suitable as a rendering engine for other high-level, XML-based UI
languages (e.g., XForms). As a result, for resource-sensitive devices, an
on-device SVG implementation coupled with server-side XML translation seems to
be a practical and economical approach to accommodate diverse applications
authored in these XML-based UI languages.
Brief Overview of SALT
As SVG excels at visual rendering, SALT excels at providing rich and
lightweight support for speech interactions. At the core of SALT are two
objects: a "listen" object that provides all the functionality for speech
input, and a "prompt" object that provides speech output. The speech input object can record
or convert the user's utterance into the corresponding text transcription,
whereas the speech output object can play pre-recorded waveforms or synthesize
speech from text.
A key distinctive design of SALT 1.0 is the lack of self hosting capability.
Instead, SALT objects are intended to be embedded into other XML applications
and follow the programming and execution model of their hosting environment.
This principle yields two important outcomes. First, by leveraging
functionality already in existence in the hosting environment (e.g., data
model, interaction model, etc.), SALT core features become very compact and
lightweight. Second, it enables SALT to provide a very flexible and
platform-independent object model so that SALT can be used to speech-enable
applications written in either a procedural or a declarative programming style.
When viewed from a scripting environment, for example, all SALT objects have
all the characteristics of a scriptable entity, with properties, methods and
events to be manipulated for desired effects. On the other hand, SALT also
defines a set of Prolog-like behaviors that are better suited to a declarative
programming style. The choice of SALT programming style rests first and
foremost with application developers, and is not dictated by platform vendors.
Much like SVG's leveraging of existing W3C specifications, SALT makes extensive
references to widely adopted standards such as CSS and SMIL. The details can be
found in Sections 2.8.2 and 2.8.3 of the SALT 1.0 specification, respectively.
Like SVG, SALT recognizes the importance of diverse access to the Web, and
modularizes itself around four main areas of speech applications. Using the
terms in the SALT 1.0 specification, these are: (1) smart clients without
scripting, (2) smart clients with scripting, (3) rich clients, and (4)
telephony systems. The smart client category mainly covers mobile phones and
PDAs, whereas the rich client category covers desktops, laptops and Tablet PCs.
More details can be found in Section 4.2 of the SALT 1.0 specification.
SVG Profile for SALT
It is recommended that, when SALT is embedded into SVG with CSS and/or SMIL in
use, all the SALT object behaviors under CSS and/or SMIL be identical to those
already defined in Section 2.8 of the SALT 1.0 specification.
It is further recommended that a full implementation of SVG 1.1 be treated as a
rich client of SALT, and that SVG Basic and SVG Tiny be treated as smart clients
with and without scripting, respectively, with all the modules defined in the
SALT 1.0 specification made available in accordance with the SALT conformance
guidelines.
Appendix A
Unlike HTML, which predates XML, SVG has been XML from the outset. As a result,
SVG does not carry many legacy burdens, such as the DOM Level 0 objects in HTML (1).
Consequently, discussions in the SALT specification with respect to HTML (Sec.
2.8.2.2.1.1.4), especially those surrounding DOM Level 0 objects (e.g., 'window'
and 'document'), do not apply to SVG. SALT sample code based on HTML
may need adaptation to comply with the programming styles of SVG 1.1 and XML in
general (see Appendix B for an example). It is worth noting, however, that the
upcoming SVG 1.2 (second working draft released by W3C on April 29, 2003)
introduces several objects, such as SVGWindow and SVGDocument, that bear close
resemblance to DOM Level 0 objects. These new features will bring a more
consistent programming style between SVG and HTML.
SALT demonstrates how HTML can be used for telephony applications. However, SVG
is designed for a graphically rich environment, and is therefore unlikely to be
applied to a voice-only telephony world. Accordingly, the SALT SVG
profile does not consider telephony applications at this point.
SVG employs SMIL for media control; that is, SVG applications abide by the
specification of the media queue in the SMIL Content Control Module. As per
Sec. 2.8.3 of the SALT specification, the PromptQueue object can be left
unimplemented when SMIL is used for content control. The section above implies
such an arrangement for SVG applications as well.
Appendix B
One feature of SALT is the ability to speech-enable existing applications
without requiring a massive code rewrite. This appendix presents sample code
that SALT-enables the SVG Chart & Graph demo by Adobe. This is an SVG
application in which the end user can enter data and visualize it in one of
three forms: point graph, pie chart, or bar chart. In Adobe's demo, the
user interaction is conducted exclusively through GUI objects. (Note that the
original demo was created before the SVG Recommendation was published by W3C,
and therefore the sample program contains some function and property names that
are not consistent with the final SVG 1.1 Recommendation. In the following, we
have corrected these inconsistencies based on the official W3C document.)
All SVG applications have <svg> as the XML root element. In this
example, the high level document structure can be sketched as follows:
<svg onload="Initialize(evt)" ...>
  <!-- application utilities -->
  <!-- CSS related declarations -->
  <script><![CDATA[
    // onload event handler
    function Initialize(evt) {
      // was: evt.getTarget().getOwnerDocument();
      var doc = evt.target.ownerDocument;
      // remainder of the onload event handler
    }
    // other script functions
  ]]></script>
  <!-- body of SVG markup -->
  <!-- core SVG content -->
</svg>
Unlike the HTML samples in the SALT 1.0 specification, which typically listen
for the document load event on the HTML body element,
this example program listens for the document load event on the root SVG
element. As in other XML applications, all the event handlers follow the DOM
Level 2 Event specification and take as an argument an object of DOM Event type
(or equivalent, through inheritance or polymorphism). The event target can be
obtained via the target property of the event object. This is in
contrast to HTML where the event object is obtained through the DOM Level 0
object window.event. As mentioned in Appendix A, however, the upcoming
SVG 1.2 will allow SVG developers to read the event object through the evt
property of the SVGWindow object.
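To make the contrast between the two event models concrete, the following sketch resolves an event target in a host-neutral way. The helper name resolveEventTarget and its hostWindow parameter are hypothetical; the sketch assumes only that a DOM Level 2 host passes the event object to the handler, that a pre-Recommendation binding may offer getTarget() as in the original demo, and that a DOM Level 0 style host exposes the event as a property of its window object.

```javascript
// Hypothetical helper: returns the target of an event, or null if none is found.
// - DOM Level 2 Events (SVG 1.1): the event is passed in; read evt.target.
// - Pre-Recommendation bindings, as in the original demo: evt.getTarget().
// - DOM Level 0 style hosts (HTML): the event hangs off the window object,
//   and some browsers name the target srcElement.
function resolveEventTarget(evt, hostWindow) {
  var e = evt || (hostWindow && hostWindow.event) || null;
  if (!e) {
    return null;
  }
  if (e.target) {
    return e.target;
  }
  if (typeof e.getTarget === "function") {
    return e.getTarget();
  }
  return e.srcElement || null;
}
```

An SVG 1.1 handler would simply call resolveEventTarget(evt); the second argument matters only when the same script is shared with an HTML page.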
The core of the graphical rendering appears in the SVG body section where SVG
elements such as <g>, <path>, etc., are used to
draw various shapes. The user interactions are managed through the
corresponding event handlers implemented in the script section. In other words,
the self hosting structure of SVG resembles that of HTML and SMIL. As a result,
the procedure to SALT-enable this SVG document is very similar to SALT-enabling
any HTML document. For example, we can introduce a SALT listen object that
allows the user to enter data by speech by adding the following three lines of
code at the end of the main body section:
<!-- body of SVG markup -->
<!-- core SVG content -->
<listen xmlns="http://..." onreco="handleReco(evt)">
  <grammar src="recogrammar.grxml"/>
</listen>
Since SVG is an XML application, it allows "foreign" XML elements (2)
fully decorated with their namespaces, such as the SALT listen element in this
case. The listen element loads a recognition grammar, and has a handleReco
handler registered for the recognition event. The recognition grammar includes
simple semantic interpretation and produces its result in XML. For example, a
sentence like "add five to United States" will generate the following
recognition result in XML (here called SML):
<sml confidence="0.7" text="add five to United States">
  <cmd text="add">Add</cmd>
  <number text="five">5</number>
  <label text="United States">US</label>
</sml>
The event handler parses the outcome using XPath and invokes the proper
actions:
function handleReco(evt) {
  // obtain the SML from the listen element that raised the event
  var obj = evt.target.recoresult;
  var cmd = obj.selectSingleNode("//cmd");
  if (cmd.text == "Clear")
    clearChart();
  else if (cmd.text == "Delete")
    clearOuties();
  else if (cmd.text == "Add") {
    var num = obj.selectSingleNode("//number");
    var itm = obj.selectSingleNode("//label");
    addChartValue(num, itm, false);
  }
  ...
}
Note that the functions clearChart, clearOuties, and addChartValue
are existing routines for GUI interaction. The SALT speech event handler can
invoke these routines, and thereby reuse the existing interaction logic, by
acting as a speech "event translator" as shown in the example above.
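The "event translator" idea can also be sketched with the translation step separated from the XML lookup, which makes missing nodes easier to guard against. This is only a sketch under stated assumptions: translateCommand, textOf and handleRecoSafely are hypothetical names, the actions object stands in for the demo's existing routines, and the text values (rather than the nodes) are passed to addChartValue, since the source does not state which form that routine expects.

```javascript
// Hypothetical sketch: map a recognized command onto existing GUI routines.
// `actions` stands in for the demo's clearChart, clearOuties and addChartValue.
// Returns true when a routine was invoked, false when the result was unusable.
function translateCommand(reco, actions) {
  switch (reco.cmd) {
    case "Clear":
      actions.clearChart();
      return true;
    case "Delete":
      actions.clearOuties();
      return true;
    case "Add":
      if (reco.number === undefined || reco.label === undefined) {
        return false; // incomplete utterance: nothing to add
      }
      // Assumption: text values are passed here instead of DOM nodes.
      actions.addChartValue(reco.number, reco.label, false);
      return true;
    default:
      return false; // unknown or missing command
  }
}

// Reads the text of the first node matching an XPath, or undefined if absent.
// Assumes the result offers selectSingleNode and nodes expose .text,
// as in the handler above.
function textOf(sml, xpath) {
  var node = sml.selectSingleNode(xpath);
  return node ? node.text : undefined;
}

function handleRecoSafely(sml, actions) {
  return translateCommand({
    cmd: textOf(sml, "//cmd"),
    number: textOf(sml, "//number"),
    label: textOf(sml, "//label")
  }, actions);
}
```

Keeping translateCommand free of DOM access means the mapping from spoken commands to GUI routines can be exercised without a speech platform.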
[1] http://www.saltforum.org/saltforum/downloads/SALT1.0.pdf
[2] http://www.w3.org/TR/SVG11/
[3] http://www.w3.org/TR/SVGMobile/
(1) DOM Level 0 objects, pioneered by AOL Netscape and followed by major
browser vendors, were referred to in the W3C DOM Level 1 but have not been
officially recommended by W3C.
(2) Although SVG only considers rendering arbitrary XML documents in its
current version, the upcoming version will further allow foreign XML document
authors to extend SVG and include customized behavior models so as to create a
richer user interaction.