Handheld Audio May Be the Next Big Thing
In the May issue of EContent, we talked about text-based services and applications for handheld devices ("It's a Small World After All: Content for Wireless
Without a doubt, audio-enabled mobile and Wireless Information Devices (WIDs) will be part of our near-term future. In fact, a number of consumer and B2B services are already on the market, and many more are poised to leap into the audio space as soon as the hardware devices, standards, and wireless networks catch up.
AUDIO FLAVORS
The audio/voice arena is complex; it doesn't fit neatly into pigeonholed categories. As an input mechanism, there are the voice-driven automated menu systems we're all already familiar with ("For customer service, press or say two"). There's passive, pre-recorded audio content (voice, music, a radio broadcast, any sound) played over a handheld device, either downloaded for mobile playback or streamed wirelessly. There are pre-recorded consumer dial-in information services. And there are B2B enterprise applications that let users dial in to listen and to update proprietary databases by voice.
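The "press or say two" pattern means a single menu has to accept two kinds of input. The sketch below is a minimal illustration, not any vendor's implementation: it assumes the telephony platform has already reported either a touchtone digit or a speech recognizer's text, and the menu entries and keywords are hypothetical.

```python
from typing import Optional

# Hypothetical two-item menu. Each option can be reached by a touchtone digit
# or by any of several spoken keywords returned by a speech recognizer.
MENU = {
    "1": ("account balance", {"one", "balance"}),
    "2": ("customer service", {"two", "customer service", "agent"}),
}


def resolve_choice(event_type: str, value: str) -> Optional[str]:
    """Map a keypress ("dtmf") or a recognized utterance ("speech") to a menu option.

    Returns None when the input matches nothing, so the caller can reprompt.
    """
    if event_type == "dtmf":
        entry = MENU.get(value)
        return entry[0] if entry else None
    if event_type == "speech":
        utterance = value.strip().lower()
        for option, keywords in MENU.values():
            if utterance in keywords:
                return option
        return None
    raise ValueError(f"unknown event type: {event_type!r}")


# "Press or say two": both paths reach the same option.
print(resolve_choice("dtmf", "2"))      # customer service
print(resolve_choice("speech", "Two"))  # customer service
```

Returning None rather than guessing lets the dialog fall back to a reprompt, which is how most touchtone menus handle unrecognized input.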
Some commercial services you may have already heard of: Tellme and Indicast for current news, stock quotes, and horoscopes; customizable Web-based "audio content portal" live365.com; or wakeup/reminder ring-you-up service iPing. In the B2B space, dial-in solutions for ecommerce, business productivity, and staff management are becoming prevalent. While audio downloads and dial-in voice information are plentiful, true wireless applications are still in nascent form, on the drawing board or in beta testing.
VOICE WEB: VOICE-ENABLING WEB CONTENT
Web-based services are transformed into phone-accessible voice applications by "voice browsers" that use speech recognition, pre-recorded audio, and speech synthesis (i.e., machine-generated voice). The real benefit of voice interaction is that it overcomes the limitations of the tiny keypads and diminutive screens on most wireless devices. A news story that takes five minutes to read on a WAP phone can be heard in half the time using streaming audio. Voice interaction also makes spoken dialogs with Web services possible, giving users the choice of pressing a key or speaking a command. And, as aggregator and distributor Audible Inc. is fond of pointing out, it provides an accessible alternative "when the eyes are busy but the mind is free."
SHORT AND SWEET
What types of content are appropriate for mobile and wireless audio delivery? Dynamic, changing information and brief snippets: email headers, custom news clips, stock quotes, sports scores, and movie listings, for instance. Longer offerings can include speeches, vintage radio shows, entertainment (jokes and comedy shows), and short stories or poetry. But, warns Jonathan Korzen, senior manager of media relations for Audible, "Don't ask your customers to download large files. Paying attention to compression and file size are the most important things... and a good compression algorithm is key." Audible addresses this by offering its customers four formats of varying quality (AM radio quality, for instance, takes only 2MB of space for one hour of listening).
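Korzen's point about file size is easy to quantify. This quick sketch converts Audible's figure of 2MB per hour into an average bitrate and compares it with an hour at 128 kbps, a common MP3 bitrate. It assumes 1 MB = 1,000,000 bytes; counting a megabyte as 2^20 bytes would give about 4.66 kbps instead of 4.44. The function names are illustrative, not Audible's.

```python
def average_bitrate_kbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate in kilobits per second (1 kbit = 1,000 bits)."""
    return size_bytes * 8 / duration_s / 1_000


def size_megabytes(bitrate_kbps: float, duration_s: float) -> float:
    """File size in megabytes (1 MB = 1,000,000 bytes) for a given bitrate and duration."""
    return bitrate_kbps * 1_000 * duration_s / 8 / 1_000_000


ONE_HOUR = 3_600  # seconds

# "AM radio quality": about 2 MB per hour of listening.
print(round(average_bitrate_kbps(2_000_000, ONE_HOUR), 2))  # 4.44
# The same hour at 128 kbps, a common MP3 bitrate.
print(round(size_megabytes(128, ONE_HOUR), 1))  # 57.6
```

At roughly 4.4 kbps, the AM-quality hour is nearly 30 times smaller than a 128 kbps MP3 of the same length, which is the trade-off Korzen is describing.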
Of course, Korzen admits, it's not only about what's short, but about whatever people want to hear and are willing to pay for. "It's marketability: if we think people want it, we offer it to them."
GOT CONTENT? TWO RECIPES FOR ENTERPRISE AUDIO
Initially, as with text conversion from HTML to, say, WML for wireless, I expected to find transcoding software being used to repurpose content from text to audio. In fact, that's not the case at all. Because of voice quality issues (synthesized voice is not yet considered ready for prime time with consumers), most Web-to-voice conversion is occurring in customized enterprise solutions for internal use. Two companies that are developing applications which generate machine voice on the fly are VocalPoint and Informio [See VocalPoint's Profile in the May 2001 issue of EContent, p. 56].
VocalPoint applies style sheets; Informio transcodes HTML to VoiceXML (an XML-based markup language that defines voice segments and supports the creation of menu prompts, enabling Internet access over smartphones and wireless PDAs).
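Informio's transcoder is proprietary, but the general idea of turning HTML into VoiceXML can be sketched with Python's standard library. This minimal sketch assumes that only headings and paragraphs should be read aloud (an illustrative choice) and wraps each one in a `<prompt>` inside a VoiceXML 1.0 `<form>`/`<block>`; a production transcoder would also have to generate menus, grammars, and navigation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SPOKEN_TAGS = {"h1", "h2", "h3", "p"}  # assumption: only these elements are read aloud


class SpokenTextExtractor(HTMLParser):
    """Collects the text of headings and paragraphs; scripts, navigation, etc. are dropped."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []     # finished text chunks, in page order
        self._buffer = None  # text of the element currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in SPOKEN_TAGS:
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SPOKEN_TAGS and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.chunks.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def html_to_vxml(html: str) -> str:
    """Return a minimal VoiceXML 1.0 document that speaks each extracted chunk in order."""
    parser = SpokenTextExtractor()
    parser.feed(html)
    parser.close()
    vxml = ET.Element("vxml", version="1.0")
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    for chunk in parser.chunks:
        ET.SubElement(block, "prompt").text = chunk
    return ET.tostring(vxml, encoding="unicode")


page = "<h1>Markets</h1><p>Stocks <b>rose</b> today.</p><script>track()</script>"
print(html_to_vxml(page))
```

Inline markup such as `<b>` is flattened into the surrounding sentence, while the script's text never reaches a prompt.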
VocalPoint: Style Sheets
VocalPoint describes its service as a "voice Web browser" targeting the B2B space for proprietary applications by enterprise customers. It focuses on healthcare, employee self-service (HR), utilities, the financial and insurance markets, and sales force automation (SFA).
Garry Chinn, VocalPoint CTO, clarifies, "We work with customers to figure out which content to voice-enable. Static HTML is not difficult to voice-enable, but in ecommerce sites, there are dynamic pages with database information. If you need to enable dynamic information, that's our specialty."
Working from the HTML of existing Web sites, VocalPoint extends style sheets to voice applications. To vocalize My Yahoo!, Chinn explains, you need to add about 30 lines of style sheet code. (Five to 30 lines is the standard amount, depending on the complexity of the page.) "We can embed at least part of the data with pre-recorded static info; this is put in the style sheet as 'don't read this content, play the recording instead.' Dynamic information will still need to be synthesized, though."
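VocalPoint's actual style-sheet syntax isn't described here, so the sketch below uses a hypothetical rule table keyed by element id to show the decision Chinn describes: skip some content, play a recording for static content, and synthesize dynamic content. The rule names, file name, and the default of synthesizing any element without a rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    element_id: str  # id of the page element this text came from
    text: str


# Hypothetical voice rules, not VocalPoint's actual syntax.
VOICE_RULES: Dict[str, dict] = {
    "site-nav": {"action": "skip"},                                # menus and links are not read
    "greeting": {"action": "play", "recording": "greeting.wav"},   # static: use a recording
    "stock-quotes": {"action": "synthesize"},                      # dynamic: must be synthesized
}


def render_for_voice(segments: List[Segment],
                     rules: Dict[str, dict] = VOICE_RULES) -> List[Tuple[str, str]]:
    """Turn page segments into an ordered audio plan of ("play", file) or ("tts", text) steps."""
    plan = []
    for seg in segments:
        # Assumption: elements without a rule are synthesized.
        rule = rules.get(seg.element_id, {"action": "synthesize"})
        if rule["action"] == "skip":
            continue
        if rule["action"] == "play":
            plan.append(("play", rule["recording"]))
        else:
            plan.append(("tts", seg.text))
    return plan


page = [
    Segment("site-nav", "Home | News | Finance"),
    Segment("greeting", "Welcome back to your portal"),
    Segment("stock-quotes", "XYZ is up two points"),
]
print(render_for_voice(page))
```

Keeping the rules separate from the page mirrors the style-sheet approach: the HTML stays as it is, and a few lines of rules decide how each part sounds.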
One particularly interesting application is a dial-in, vocalized "employee self-service" solution. By dialing a toll-free number from any old telephone, employees can learn about their personalized HR benefits, find out how many sick days they have, get new membership cards, etc.
"Most companies have already built out their Web applications," Chinn explains, "so we leverage that into voice delivery. For instance, we're currently testing a mobility solution for a large utility-energy company. We've voice-enabled their sales force so they can call in for up-to-date customer and product information."
Informio: VoiceXML
Informio, "a wireless Web infrastructure services company," builds custom applications that give mobile professionals access to critical business data over a cell phone or wireless device. MUPpies (that's Mobile Urban Professionals) can use voice to interact with their own password-protected data: to update databases, input new sales figures, delete old sales leads, and so on. Based on VoiceXML, such applications are geared to enterprises with large workforces who use sales force automation, customer relationship management (CRM), and other internal databases; think pharmaceuticals, financial firms, insurance, healthcare.
Informio's proprietary Unified Media Browser, a voice version of an Internet browser, supports VoiceXML as the interface for all content and applications. (VoiceXML 1.0 was adopted as the basis for development of a W3C dialog markup language in May 2000.) Informio partners with Nuance for speech recognition technology, and recently switched to SpeechWorks' Speechify for text-to-speech (TTS).
VocalPoint extends style sheets to voice applications.
Informio builds custom voice apps based on VoiceXML.
An innovative and soon-to-be-everywhere service, mobile audio email, allows users to dial in and use their voice or a touchtone command to listen to email messages as streaming audio, MP3, or .wav files. Coming soon: the ability to hear attachments (though for long documents, there'll also be an option to forward them to a printer or fax machine).
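Before an email can be heard, something has to turn it into speakable text. This sketch covers only that step, using Python's standard `email` package to build one prompt per message; reading just the sender and subject, and the prompt wording itself, are illustrative assumptions. The synthesis and the MP3 or .wav encoding are not shown.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import List


def spoken_summaries(raw_messages: List[str]) -> List[str]:
    """Build one short, speakable prompt per raw RFC 822-style message."""
    prompts = []
    for number, raw in enumerate(raw_messages, start=1):
        msg = message_from_string(raw)
        name, address = parseaddr(msg.get("From", ""))
        sender = name or address or "an unknown sender"  # prefer the display name
        subject = msg.get("Subject") or "no subject"
        prompts.append(f"Message {number}, from {sender}: {subject}.")
    return prompts


inbox = [
    "From: Jane Doe <jane@example.com>\nSubject: Q3 sales figures\n\nNumbers attached.",
    "From: alerts@example.com\n\nServer restarted.",
]
for prompt in spoken_summaries(inbox):
    print(prompt)
```

Preferring the display name over the raw address keeps the prompt short and natural when it is spoken aloud.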
Mark Lowenstein, Informio's chief industry analyst, postulates, "Voice is complementary to what's happening in the data world; in some ways, voice addresses the shortcomings of text-based devices. Screen scraping is not too successful... we're trying to avoid some of those interface issues with audio. The nirvana we're focused on is the two working hand in hand. But certain things have to fall in place first: the devices and the networks have to be brought along."
LISTEN WHILE YOU WORK, WORK OUT, GO TO WORK
The more common solution for "making audio," particularly for the consumer market, is radio-quality recordings of a human voice in a sound studio. Audible Inc. is one of the leaders in this genre, offering pay-per-download audiobooks, lectures, public radio programs, newspapers, comedy skits, and much more. Audio titles are then played back on a PC or on any of a variety of mobile, Audible-ready devices (hands-free accessories such as headphones or a car kit can be used while you're commuting, exercising, or traveling).
Audible's content partners include more than 160 audiobook publishers, broadcasters, magazine and newspaper publishers, business information providers, and educational and cultural institutions. Says David Simpson, Audible director of business development, "We receive both analog and digital content (from our content partners) and produce it as human voice; we have studios in our offices with voice talent for recordings. The process is editing, encoding, compression, and encryption."
Although most of Audible's usage falls into the mobile, dock-and-go category, that will change with wireless. Future Pocket PCs will be able to get automatic updates for industry-specific content. Simpson explains the technical details: "To deliver wireless, we create a quarter-size HTML Web page that links to a Web server to deliver the audio file on a Pocket PC with an Audible player. There would be an encrypter/decrypter in the Pocket PC, delivered in some mutually agreeable Codec (COder-DECoder), transcoded over a TCP/IP connection into the RAM of the receiver device, played back with a player device."
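Simpson describes the audio file moving over a TCP/IP connection into the receiving device's RAM. Because a stream socket doesn't preserve message boundaries, the receiver needs some way to know when the whole file has arrived. This sketch assumes a simple 4-byte length prefix for that purpose; the framing is an illustrative assumption, not Audible's protocol, and decryption and decoding are left out.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumption: 4-byte big-endian length prefix


def send_audio(sock: socket.socket, payload: bytes) -> None:
    """Send one already-encoded (and, in practice, encrypted) audio file."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer bytes than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the file arrived")
        buf.extend(chunk)
    return bytes(buf)


def receive_audio(sock: socket.socket) -> bytes:
    """Receive one length-prefixed file entirely into memory."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)


# Local demonstration with a connected socket pair standing in for the network.
sender, receiver = socket.socketpair()
clip = bytes(range(256)) * 16  # 4 KB of stand-in "audio" bytes
send_audio(sender, clip)
assert receive_audio(receiver) == clip
sender.close()
receiver.close()
```

Holding the whole file in memory matches the "into the RAM of the receiver device" description; a streaming player would instead hand each chunk to the decoder as it arrives.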
Is Audible considering text-to-voice software? "Today, the only way to go is professionally rendered human speech. We think that's what people want to listen to. We're not doing anything with text-to-speech, and nothing is on the drawing board until it gets better."
THE FUTURE: PERSONALIZED AND MULTIMODAL
What can we expect on the wireless horizon? Look for increased personalization, such as customized audio portals and Internet radio (with targeted ads, of course), and mobile personal productivity tools: voice-enabled email, calendars, to-do lists, and so on.
And the ultimate wireless vision (at least until The Next Big Thing) is "multimodal" applications. This is where voice and keypad (text) input is combined with audio and visual output, so you could, say, read a WAP message and then tap to initiate a phone call. And it would be context-sensitive, too: at your desktop you might read email as text, but while driving you'd choose a voice- and audio-centric experience.
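The context-sensitive behavior described above can be reduced to a policy lookup. This is a minimal sketch under stated assumptions: the three contexts and the output channels assigned to each are hypothetical, and a real device would need some way to detect its context, which is not shown.

```python
from enum import Enum
from typing import List, Tuple


class Context(Enum):
    DESKTOP = "desktop"
    DRIVING = "driving"
    HANDSET = "handset"  # phone or PDA in hand, eyes and hands available


# Hypothetical policy: which output channels to use in each context.
OUTPUT_POLICY = {
    Context.DESKTOP: ["visual"],
    Context.DRIVING: ["audio"],            # eyes busy: speak only
    Context.HANDSET: ["visual", "audio"],  # multimodal: show and offer playback
}


def present(text: str, context: Context) -> List[Tuple[str, str]]:
    """Return one rendering instruction per output channel allowed in this context."""
    return [
        ("display" if channel == "visual" else "speak", text)
        for channel in OUTPUT_POLICY[context]
    ]


print(present("New mail from Jane Doe", Context.DRIVING))
```

Keeping the policy in a table rather than in branching code makes it easy to add contexts as devices and networks evolve.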
So should we all rush out and get second mortgages to invest in wireless audio? Well, no, not just yet. There are still hurdles to overcome. Standards for hardware devices, user interfaces, and programming languages need to be worked out so developers can commit to applications without risk of writing appliance-specific interfaces for a device that might not be on the market tomorrow. Currently, a fragmentation of standards for wireless multimedia delivery makes that risky.
Also, upgrading to 3G (third generation) wireless networks to support high-speed, high-capacity audio and video is proving more arduous than imagined. Combined voice and data are what's needed for the wireless environment, but in the U.S. at least, the path is littered with competing standards, regulatory issues, and the question of profitability. It will certainly happen; the question is how soon. Many operators, handset manufacturers, and equipment manufacturers will have trouble surviving until the boom times start, by which time the industry will probably be dominated by fewer, consolidated players ("3G Mobile: a Booming Industry, One Day, Maybe." Commentwire by Datamonitor, http://www.commentwire.com/commwire-story.asp?commentwire_ID=1171, April 27, 2001).
The Wall Street Journal's Jessica Perry agrees. "The jury's out on the willingness to pay for any of this stuff. Some of these companies are going to have difficulty staying viable, but there's a place for all media in convergence... it still has to be figured out."
Sidebar: Who's On First?
Numerous device manufacturers are rushing to be first to market with enhanced smartphones and wireless PDAs that handle both voice and data in the same "experience." Multimodal browsing in next-generation handsets will support voice and keypad input combined with audio and visual output. Here's a sample list (illustrative, not exhaustive); the market is growing so fast there will probably be twice as many devices by the time you read this.
WIDs
Types of audio-enabled Wireless Information Devices
Smartphones
(Cell phones and PDAs combined)
Kyocera VisualPhone
Motorola iDEN Handset
Sendo Z100 Multimedia Phone
PDAs with built-in audio capability
Compaq iPAQ Pocket PC
Casio Cassiopeia Pocket PC
HP Jornada Pocket PC
PDAs that are audio-enabled using add-ons
Handspring Visor, using add-on wireless Springboard Module
Palm Handheld with MultiMediaCard
Digital audio players
(specifically for music and audio applications)
Diamond Rio MP3 players
Iomega HipZip
Philips Rush
Phone and audio player combined
Samsung Uproar MP3 Phone
Needless to say, it takes more than just the latest hardware to make mobile and wireless audio work. You need service carriers like AT&T, Verizon, and Sprint PCS; software and application providers such as VocalPoint, Informio, and Tellme; content aggregators and distributors such as Audible.com; and content providers like the Wall Street Journal.
Sidebar: Content Provider Case Study: The Wall Street Journal
The Wall Street Journal was one of the first newspapers to make daily audio digests of its content available to subscribers via cell phones and mobile audio devices, or as downloads to a computer or MP3 player. Now there are more than a dozen offerings, from one-minute Hourly Business Reports to five- to seven-minute reads of "Heard on the Street," a popular column from its Money and Investing section. Jessica Perry, vice president of business development & consumer electronic publishing, remarks that "after we saw audio could be successful, we began producing content just for the Internet." Some examples?
"Heard on the Net," produced a few times a week; "Technology Headlines," updated a couple of times a day; plus a three-minute "Career Journal" taken from the WSJ Online Career Web site.
In fact, this is an expanding area for the Journal. Neil Budde, vice president, editor, and publisher of WSJ Online, comments, "Audio is going to be a component of mobile access for a while. It's easier than text to use in many places, in cars, for example." WSJ recently signed an agreement to be the primary business news provider for General Motors' OnStar "telematics" system (audio information services bundled into automobiles that combine GPS satellite tracking and wireless communications for roadside assistance and remote diagnostics), a deal that gives the Journal exposure to a true mass audience.
How They Do It
Rather than tagging existing data in XML for conversion to other formats, as WSJ does with its text-based content, the Journal has its audio produced and recorded by the WSJ Radio Network and then delivers it to third parties, who convert it for delivery. WSJ partners with a variety of distributors and service providers such as Tellme, Audible.com, MediaBay.com, Voquette, and Informio.
Predicts Neil Budde, "I see the blending of text and audio services in customized applications as something with a lot of potential. For example, a customer who registers with the Journal for alert services might have the headline sent as text, with a clickable phone number embedded in it; you click on the phone number to hear the full story. Time-sensitive business news such as stock prices might be in higher demand for that kind of service than other stories. Another model may be to set it up so you receive three headlines related to your interest profile per day, then click to hear full stories."
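Budde's second model (a few headlines per day matched to an interest profile, each with a number to call for the full story) can be sketched as a simple filter. The story data, topic tags, and phone numbers below are hypothetical (555-01xx numbers are reserved for fictional use), and the stories are assumed to arrive already ranked. The links use the standard `tel:` URI scheme so that a phone can dial them with one tap.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Story:
    headline: str
    audio_number: str  # hypothetical dial-in number that plays the full story
    topics: Set[str] = field(default_factory=set)


def build_alert(stories: List[Story], interests: Set[str], limit: int = 3) -> List[str]:
    """Up to `limit` headlines matching the reader's profile, each with a tap-to-call tel: link.

    Assumes `stories` is already ordered by priority (for example, newest first).
    """
    matching = [s for s in stories if s.topics & interests]
    return [f"{s.headline} (listen: tel:{s.audio_number})" for s in matching[:limit]]


stories = [
    Story("Chip makers rally", "+1-201-555-0100", {"technology", "markets"}),
    Story("Fed holds rates", "+1-201-555-0101", {"economy"}),
    Story("Software merger announced", "+1-201-555-0102", {"technology"}),
]
for line in build_alert(stories, {"technology"}):
    print(line)
```

The `limit` parameter defaults to three to match the "three headlines per day" model; time-sensitive categories such as stock news could be handled by ranking them first before the cut.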
Writing for the Ear
Paul Bell, executive director of broadcast services, recounts how it all got started: "Audio production began way back in 1980 with The Wall Street Journal Radio Network. One- to two-minute reports were produced from a studio in Manhattan and syndicated to what has since grown to 165 radio stations across the country. When Real Media and Windows Player came out, with their ability to take down simple audio files, we began to convert Radio Network news into that format.
"What began on the Web as a text business quickly grew to an audio business two or so years ago. Web-based audio was first designed to leverage the manpower, talent, and expertise already in the radio newsroom. We realized that we had the raw material to fashion audio reports and so, based on what a client needed, we developed a catalog of offerings. We specifically write for the ear, not for the eye. But the delivery method is unimportant to us; that's what our content aggregators and distributors worry about."
When asked if the Journal plans to use real-time speech synthesis software (generating machine voice on the fly) for wireless, Bell exclaims, "No, we hate that like the plague! The text-to-voice programs are all clunky; that technology is not yet prime time. People might 'put up with it' if they have to, but that's not the best use, especially when customers are paying for content! Besides, it would be content written for the eye, not for the ear."
Bell gives this advice for others who would vocalize their offerings: "Be extraordinarily careful about what your customer is willing to listen to. You have to understand what they're willing to sit still for-that might be a seven-minute report or a 30-second update. I suspect the fundamental issue is psychographic, that is, the amount of perceived patience an individual has for a specific task."
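Bell's advice can be made concrete by treating each offering's length as data and matching it against a listener's time budget. This minimal sketch uses the durations given earlier in this sidebar (one-minute Hourly Business Reports, the three-minute "Career Journal," five- to seven-minute "Heard on the Street"); the budget parameter is a hypothetical stand-in for the "perceived patience" Bell describes, which in practice would have to be estimated somehow.

```python
from typing import Dict, List, Tuple

# Lengths in minutes as (shortest, longest), taken from the offerings described in this sidebar.
OFFERINGS: Dict[str, Tuple[float, float]] = {
    "Hourly Business Report": (1, 1),
    "Career Journal": (3, 3),
    "Heard on the Street": (5, 7),
}


def fits_patience(budget_minutes: float,
                  offerings: Dict[str, Tuple[float, float]] = OFFERINGS) -> List[str]:
    """Offerings whose longest edition fits within a listener's time budget, shortest first."""
    suitable = sorted(
        (longest, name)
        for name, (_, longest) in offerings.items()
        if longest <= budget_minutes
    )
    return [name for _, name in suitable]


print(fits_patience(0.5))  # []: none of these is as short as a 30-second update
print(fits_patience(4))    # ['Hourly Business Report', 'Career Journal']
```

Comparing against the longest edition, rather than the average, errs on the side of never asking listeners to sit still longer than they are willing to.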
Jan Zastrow (hyperclick@hawaii.rr.com) is president of HyperClick Online Services.
Comments? Email letters to the editor to ecletters@onlineinc.com.