
SALT Profile for SVG

SALT Forum, May, 2003

Authors:
Kuansan Wang (Microsoft)
Antoine Quint (Fuchsia Design)
On behalf of: the SALT Forum

Status of this document:
This document has been reviewed and approved by members of the SALT Forum Technical Working Group and other interested parties. The document defines a profile for SALT applications in conjunction with W3C SVG 1.1 and its mobile profiles. At least one implementation is known to validate the feasibility of the proposed profile at the time of publication.

The IPRs covered by the material in this document follow the same RF policy as the SALT 1.0 Specification (see the SALT Forum Adopters Agreement and the SALT Forum W3C IP statement).

Introduction

Speech Application Language Tags, or SALT, is a specification that provides speech capabilities to any XML application. The SALT 1.0 contribution to W3C [1] details how SALT can enable users of various W3C recommendations, such as HTML, XHTML, and SMIL, to interact with a Web page through naturally spoken commands. This document supplements the original contribution with notes on how SALT can be incorporated into another important W3C recommendation: SVG, which has seen considerable support and many implementations worldwide.

Brief Overview of SVG

In January 2003, W3C formally issued the recommendations for SVG 1.1 [2] and the SVG Mobile Profile [3]. Based on proven vector graphics technology, SVG 1.1 is an XML-based specification for rendering advanced graphics and images. It enables Web developers to deliver a visually rich user interaction experience in their Web applications.

Because it is XML-based, SVG fully enjoys the extensibility promised by XML while leveraging other existing W3C specifications that have been widely adopted by the Web developer community. For example, developers can use CSS to fine-tune the look and feel, and SMIL to synthesize sophisticated timing controls for their visual presentations. SVG is also equipped with a fully featured DOM, which makes an SVG-based user interface fully programmable in the mainstream object-oriented fashion. DOM programmers will find the SVG extensions to DOM modules, such as Core, CSS, Style Sheets, and Events, familiar and straightforward.

Despite the name, SVG has native support not only for graphics but also for images and text. The notion of "text is text" in SVG is quite important: it increases efficiency and flexibility in text handling, and it also makes SVG, when combined with the hyperlink specification XLink, a natural candidate to render W3C's upcoming XHTML 2.0 specification. The native text support also puts SVG in a strong position for accessibility. For example, the text content in SVG can be read out programmatically and channeled through a text-to-speech synthesizer for eyes-free applications, such as for users who are operating motor vehicles or who are visually challenged.
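
As an illustration of reading text content programmatically, the following sketch walks a DOM subtree and gathers its text nodes into one string that could then be handed to a speech synthesizer. The function name collectText is hypothetical and not part of SVG or SALT. The sketch assumes only the DOM Core properties nodeType, nodeValue, and childNodes, and leaves open how the string reaches the synthesizer.

```javascript
// Hypothetical helper: gathers the character data found under a node
// (for example an SVG <text> element) into a single string.
var TEXT_NODE = 3; // DOM Core value of Node.TEXT_NODE

function collectText(node) {
   if (node.nodeType === TEXT_NODE)
      return node.nodeValue;
   var parts = [];
   var children = node.childNodes || [];
   for (var i = 0; i < children.length; i++) {
      var piece = collectText(children[i]);
      if (piece) parts.push(piece);   // skip empty results (e.g. comments)
   }
   // Join with a space so that adjacent spans do not run together.
   return parts.join(" ");
}
```

Joining the pieces with a space is a design choice made for speech output; a renderer that needs the exact character data would concatenate without a separator.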

Recognizing the growing popularity of accessing the Web through mobile devices, which are often very constrained in computational power, W3C has also recommended the SVG Mobile Profile so that vendors can implement a common subset of SVG. Wide adoption of the Mobile Profile will promote greater application portability. According to the W3C SVG web site, quite a few companies and products have pledged support for the SVG Mobile Profile. Roughly speaking, the SVG Mobile Profile defines two subsets of SVG: SVG Basic targets devices that can afford scripting support, and SVG Tiny targets even thinner devices that, among other things, do not allow scripting.

W3C is continuing to evolve SVG. Upcoming versions of SVG will add more capabilities for typesetting text content as well as for rendering arbitrary XML documents with clearly defined rendering grammars. Not only will these new features bring SVG on a par with XHTML in terms of hypertext display, but they will also make SVG suitable as a rendering engine for other high-level, XML-based UI languages (e.g., XForms). As a result, for resource-sensitive devices, an on-device SVG implementation coupled with server-side XML translation seems to be a practical and economical approach to accommodating diverse applications authored in these XML-based UI languages.

Brief Overview of SALT

Just as SVG excels at visual rendering, SALT excels at providing rich and lightweight support for speech interactions. At the core of SALT are two objects: a "listen" object that provides all the functionality for speech input, and a "prompt" object that provides the functionality for speech output. The speech input object can record the user's utterance or convert it into the corresponding text transcription, whereas the speech output object can play pre-recorded waveforms or synthesize speech from text.

A distinctive design decision in SALT 1.0 is that it has no self-hosting capability. Instead, SALT objects are intended to be embedded into other XML applications and to follow the programming and execution model of their hosting environment. This principle yields two important outcomes. First, by leveraging functionality that already exists in the hosting environment (e.g., data model, interaction model, etc.), the SALT core features become very compact and lightweight. Second, it enables SALT to provide a very flexible and platform-independent object model, so that SALT can be used to speech-enable applications written in either a procedural or a declarative-leaning programming style. When viewed from a scripting environment, for example, all SALT objects have all the characteristics of a scriptable entity, with properties, methods, and events to be manipulated for desired effects. On the other hand, SALT also defines a set of Prolog-like behaviors that are better suited to a declarative programming style. The choice of SALT programming style rests first and foremost with application developers, and is not dictated by platform vendors.

Much like SVG's leveraging of existing W3C specifications, SALT makes extensive references to widely adopted standards such as CSS and SMIL. The details can be found in Sections 2.8.2 and 2.8.3 of the SALT 1.0 specification, respectively.

Like SVG, SALT recognizes the importance of diverse access to the Web and modularizes itself around four main areas of speech applications. Using the terms in the SALT 1.0 specification, these are: (1) smart clients without scripting, (2) smart clients with scripting, (3) rich clients, and (4) telephony systems. The smart client categories cover mainly mobile phones and PDAs, whereas the rich client category covers desktops, laptops, and Tablet PCs. More details can be found in Section 4.2 of the SALT 1.0 specification.

SVG Profile for SALT

It is recommended that, when SALT is embedded into SVG with CSS and/or SMIL in use, all SALT object behaviors under CSS and/or SMIL be identical to those already defined in Section 2.8 of the SALT 1.0 specification.

It is further recommended that a full implementation of SVG 1.1 be treated as a rich client of SALT, and that SVG Basic and SVG Tiny be treated as smart clients with and without scripting, respectively, with all the modules defined in the SALT 1.0 specification made available in accordance with the SALT conformance guidelines.
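
The mapping recommended above can be summarized as a small lookup table. This is only a sketch: the table name SALT_CLIENT_BY_SVG_PROFILE and the function saltClientCategory are hypothetical names introduced for illustration, and the strings are informal labels for the client categories, not identifiers defined by either specification.

```javascript
// Hypothetical lookup table restating the recommendation in this section:
// each SVG profile is paired with the SALT client category it is treated as.
var SALT_CLIENT_BY_SVG_PROFILE = {
   "SVG 1.1":   "rich client",
   "SVG Basic": "smart client with scripting",
   "SVG Tiny":  "smart client without scripting"
};

function saltClientCategory(svgProfile) {
   var known = Object.prototype.hasOwnProperty.call(
      SALT_CLIENT_BY_SVG_PROFILE, svgProfile);
   // Profiles not covered by this document yield null.
   return known ? SALT_CLIENT_BY_SVG_PROFILE[svgProfile] : null;
}
```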

Appendix A. Comparison to HTML Profile for SALT

Unlike HTML, which predates XML, SVG has been XML from the outset. As a result, SVG does not carry many legacy burdens, such as the DOM Level 0 objects in HTML (1). Consequently, discussions in the SALT specification with respect to HTML (Sec. 2.8.2.2.1.1.4), especially those surrounding DOM Level 0 objects (e.g., 'window', 'document', etc.), do not apply to SVG. SALT sample code based on HTML may need adaptation to comply with the programming styles of SVG 1.1 and XML in general (see Appendix B for an example). It is worth noting, however, that the upcoming SVG 1.2 (second working draft released by W3C on April 29, 2003) has introduced several objects, such as SVGWindow and SVGDocument, that bear close resemblance to DOM Level 0 objects. These new features will bring a more consistent programming style between SVG and HTML.

SALT demonstrates how HTML can be used for telephony applications. However, SVG is designed for a graphically rich environment and is therefore unlikely to be applied to a voice-only telephony world. It is appropriate that the SALT SVG profile does not consider telephony applications at this point.

SVG employs SMIL for media control; that is, SVG applications abide by the specification of the media queue in the SMIL Content Control Module. As per Sec. 2.8.3 of the SALT specification, the PromptQueue object can be left unimplemented when SMIL is used for content control. The "SVG Profile for SALT" section above implies the same arrangement for SVG applications.

Appendix B. Code Example: SALT-enabling an existing SVG application

One feature of SALT is that it can speech-enable existing applications without requiring a massive code rewrite. This appendix presents sample code that SALT-enables the SVG Chart & Graph demo by Adobe. This is an SVG application in which the end user can enter data and visualize it in one of three forms: point graph, pie chart, or bar chart. In Adobe's demo, the user interaction is conducted exclusively through GUI objects. (Note that the original demo was created before the SVG Recommendation was published by W3C, and therefore the sample program contains some function and property names that are not consistent with the final SVG 1.1 Recommendation. In the following, we have corrected these inconsistencies based on the official W3C document.)

All SVG applications have <svg> as the XML root element. In this example, the high level document structure can be sketched as follows:

   <svg onload="Initialize(evt)" ...>
      <!-- application utilities -->
      <!-- CSS related declarations -->
      <script><![CDATA[
         function Initialize(evt) {
            // was: evt.getTarget().getOwnerDocument();
            var doc = evt.target.ownerDocument;
            // implementations of onload event handler
         }
         // other script functions
      ]]></script>
      <!-- body of SVG markup -->
         <!-- core SVG content -->
      <!-- body of SVG markup -->
   </svg>

Unlike the HTML samples in the SALT 1.0 specification, where listening to the document load event is often carried out at the HTML body element, this example program listens to the document load event at the root SVG element. As in other XML applications, all the event handlers follow the DOM Level 2 Event specification and take as an argument an object of DOM Event type (or equivalent, through inheritance or polymorphism). The event target can be obtained via the target property of the event object. This is in contrast to HTML, where the event object is obtained through the DOM Level 0 object window.event. As mentioned in Appendix A, however, the upcoming SVG 1.2 will allow SVG developers to read the event object through the evt property of the SVGWindow object.
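
The DOM Level 2 style described above can be sketched as a small helper that derives the document from the event passed to a handler. The function name documentOf is hypothetical. The sketch relies only on the target property of the DOM Level 2 Event interface and the ownerDocument property of DOM Core nodes.

```javascript
// Hypothetical helper in the DOM Level 2 style used by SVG: the event
// object arrives as the handler's argument instead of via window.event.
function documentOf(evt) {
   var target = evt.target;   // DOM Level 2 Events: Event.target
   // A Document node reports a null ownerDocument, so in that case
   // the target is itself the document.
   return target.ownerDocument || target;
}
```

A load handler such as Initialize(evt) could call documentOf(evt) instead of reading evt.target.ownerDocument directly; the fallback only matters if the handler is ever attached to the document node itself.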

The core of the graphical rendering appears in the SVG body section, where SVG elements such as <g>, <path>, etc., are used to draw various shapes. The user interactions are managed through the corresponding event handlers implemented in the script section. In other words, the self-hosting structure of SVG resembles that of HTML and SMIL. As a result, the procedure to SALT-enable this SVG document is very similar to SALT-enabling any HTML document. For example, we can introduce a SALT listen object that allows the user to enter data by speech by adding the following three lines of code at the end of the main body section:

   <!-- body of SVG markup -->
      <!-- core SVG content -->
      <listen xmlns="http://..." onreco="handleReco(evt)">
         <grammar src="recogrammar.grxml"/>
      </listen>
   <!-- body of SVG markup -->

Since SVG is an XML application, it allows "foreign" XML elements (2) fully decorated with their namespaces, such as the SALT listen element in this case. The listen element loads a recognition grammar and has the handleReco handler registered for the recognition event. The recognition grammar includes simple semantic interpretation and produces its result in XML. For example, a sentence like "add five to United States" will generate the following result in the recognition XML (here called SML):

   <sml confidence="0.7" text="add five to United States">
      <cmd text="add">Add</cmd>
      <number text="five">5</number>
      <label text="United States">US</label>
   </sml>

The event handler parses the outcome using XPath and invokes the proper actions:

   function handleReco(evt) {
      var obj = evt.target.recoresult; // obtain the SML
      var cmd = obj.selectSingleNode("//cmd");
      if (cmd.text == "Clear")
         clearChart();
      else if (cmd.text == "Delete")
         clearOuties();
      else if (cmd.text == "Add") {
         var num = obj.selectSingleNode("//number");
         var itm = obj.selectSingleNode("//label");
         addChartValue(num.text, itm.text, false); // pass the node text, as with cmd.text
      }
      ...
   }

Note that the functions clearChart, clearOuties, and addChartValue are existing routines for GUI interaction. The SALT speech event handler can invoke these routines, and thereby reuse the existing interaction logic, by acting as a speech "event translator", as shown in the example above.
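
The "event translator" idea can also be expressed as a table that maps each recognized command to an existing GUI routine. The following is a minimal sketch under stated assumptions: translateReco is a hypothetical name, the sml argument is a simplified stand-in for the recognition result in which the cmd, number, and label values have already been extracted as strings, and the gui argument is a hypothetical object holding the existing routines so that the sketch can run outside the demo.

```javascript
// Hypothetical table-driven variant of the speech "event translator".
// `sml` is a simplified stand-in for the recognition result, reduced to
// plain strings; `gui` holds the existing GUI routines of the demo.
function translateReco(sml, gui) {
   var handlers = {
      "Clear":  function () { gui.clearChart(); },
      "Delete": function () { gui.clearOuties(); },
      "Add":    function () { gui.addChartValue(sml.number, sml.label, false); }
   };
   if (!Object.prototype.hasOwnProperty.call(handlers, sml.cmd))
      return false;            // unrecognized command: nothing is invoked
   handlers[sml.cmd]();
   return true;                // an existing GUI routine was invoked
}
```

Adding a new spoken command then amounts to adding one entry to the table, while the GUI routines themselves remain untouched.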

Appendix C: References

[1] http://www.saltforum.org/saltforum/downloads/SALT1.0.pdf
[2] http://www.w3.org/TR/SVG11/
[3] http://www.w3.org/TR/SVGMobile/


Notes

(1) DOM Level 0 objects, pioneered by AOL Netscape and followed by major browser vendors, were referred to in the W3C DOM Level 1 but have not been officially recommended by W3C.
(2) Although SVG only considers rendering arbitrary XML documents in its current version, the upcoming version will further allow foreign XML document authors to extend SVG and include customized behavior models so as to create a richer user interaction.