WEB SURFING WITHOUT A MONITOR
by T. V. Raman
Scientific American, March 1997, page 73

When I hook up to the Internet to check out the news on CNN, to peruse a colleague's latest paper, or to see how Adobe's stock price is doing, I leave the display on my laptop turned off. The batteries last much longer that way. Besides, because I cannot see, a monitor is of no use to me.

That a blind person can navigate the Internet just as efficiently and effectively as any sighted person attests to the profound potential of digital documents to improve human communication. Printed documents are fixed snapshots of changing ideas; they limit the means of communication to the paper on which they are stored. But in electronic form, documents can become raw material for computers that can extract, catalogue and rearrange the ideas in them. Used properly, technology can separate the message from the medium so that we can access information wherever, whenever and in whatever form we want.

In my case - and in the case of someone who, while using a telephone or a tiny handheld computer, becomes functionally blind - it is much easier to hear material spoken out loud than to try to view it on a screen. But it is no good to have the computer simply recite the page from top to bottom, as conventional screen-reading programs do. Imagine trying to read a book using only a one-line, 40-character display through which text marches continuously.

Ideally, an aural interface would preserve the best features of its visual counterpart. Consider this special section of Scientific American. By reading the table of contents, skimming the introductory article and flipping through the section, you can quickly obtain a high-level overview and decide which parts you want to read in detail. This works because the information has structure: the contents are arranged in a list, titles are set in large type and so on. But the passive, linear nature of listening typically prohibits such multiple views: it is impossible to survey the whole first and then "zoom in" on portions of interest.

Computers can remove this roadblock, but they need help. The author scribbling on his virtual page must tag each block of text with a code indicating its function (title, footnote, summary, and so on) rather than its mere appearance (24-point boldface, six-point roman, indented italics, and so on). Programs can then interpret the structural tags to present documents as the reader, rather than the creator, wishes. Your software may render titles in large type, whereas mine could read them à la James Earl Jones. Moreover, listeners can browse through a structured text selectively in the same way you skim this printed magazine.

Fortunately, Hypertext Markup Language (HTML) - the encoding system used to prepare text for the World Wide Web - was designed to capture the structure of documents. If it were used as originally intended, the same electronic source could be rendered in fine detail on a printer, at lower resolution on a screen, in spoken language for functionally blind users, and in myriad other ways to suit individual preferences. Unfortunately, HTML is steadily evolving under commercial pressure to enable the design of purely visual Web pages that are completely unusable unless one can see the color graphics and can rapidly download large images.
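The difference between these two uses of HTML can be made concrete with a small fragment (an illustrative sketch only; the tags are standard HTML, but the sample text is invented for the example). The first version records what each block of text is; the second records only how it should look:

    <!-- Structural markup: says what each block of text is -->
    <h1>Web Surfing without a Monitor</h1>
    <p>Printed documents are fixed snapshots of changing ideas.</p>

    <!-- Appearance-only markup: says only how the text should look -->
    <font size="7"><b>Web Surfing without a Monitor</b></font>
    <br>
    Printed documents are fixed snapshots of changing ideas.

A speech program reading the first fragment knows that the opening line is a heading, so it can announce it in a distinctive voice, include it in a spoken table of contents or jump to it on request; reading the second, it finds only typesetting instructions and can do no better than recite the words in order.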
This current rush to design Web pages that can be viewed properly only with the most popular programs and standard displays, and that moreover lack important structural information, threatens to undermine the usefulness of the documents archived on the Internet. For users with special needs, the only efficient way to obtain certain information is to get it on-line. In the end, unprocessable digital documents are not only useless to the blind but also represent a missed opportunity for everyone. Archiving texts in a structurally rich form ensures that this vast repository of knowledge can be reused, searched, and displayed in ways that best suit individuals' needs and abilities, using software not yet invented or even imagined.

EMACSPEAK, a speech interface for desktop computers developed and distributed freely by the author, supports a Web browser that translates the visual structure and style of a document into intuitive audio cues, such as distinctive tones and voices. A listener being read a long report by Emacspeak can request an overview of subtitles, then interrupt the summary to listen to the full text of any section.

T. V. RAMAN is a researcher in the Advanced Technology Group at Adobe Systems in San Jose, CA.