3-D telepresence

The advent of virtual conferences in online metaworlds

Two weeks ago I wrote about an invitation to join Sun's Oct. 10 debut in Second Life. On the same day, coincidentally or not, IBM invited me to its alumni virtual block party on the 12th. I'd been itching to try my hand at virtual cinematography, so I donned my avatar, went to the party, and used the in-world movie camera to document the event.

The screencast I made pokes some fun at what I found there: PowerPoint slideshows and breakout sessions. But it also raises a serious issue: What are the real benefits of simulated social spaces?

Given the political, economic, and climatological instability created by our petroleum-powered transportation systems, that's not just an academic question. And I do think 3-D telepresence will play a key role. But the devil's in the details.

Consider videoconferencing. Even though it's broadly available now, I don't know anybody who finds it an acceptable substitute for face-to-face interaction. We rely heavily on the telephone because the voice channel creates a high-fidelity emotional connection, but we rarely escalate to video. Instead we jump into a car or onto a plane in order to be physically present.

Why isn't videoconferencing more compelling? After all, when we say we want to look the other person in the eye, what we really want to do is read the microexpressions of the face. As Malcolm Gladwell points out in Blink, people adept at reading faces can literally read minds. And at a sufficient frame rate the visual channel can transmit those microexpressions.

Talking-heads videoconferencing doesn't, however, convey body language. One of the valid arguments for 3-D social spaces is that, in principle, they can. In Second Life, for example, you can instruct your avatar to yawn or stretch or chuckle or dance. But you perform these gestures by selecting them from a menu. They're not connected to anything you do naturally.

Neither is your speech. Voice chat isn't natively supported in Second Life, so the meeting I attended was augmented by a conference call -- but that was mostly for listening to presenters. Conversation took the form of text chat, which renders as avatars typing on invisible keyboards.

Although there are options for recruiting out-of-band VoIP channels, emotional realism will require closer integration. Many years ago at MIT's Architecture Machine Group (now the Media Lab), my former Byte colleague Howard Eglowstein was part of a team that made videoconferencing work credibly at 2,400 baud by focusing on animation of moving lips and eyes at the expense of other facial features. The principle still applies. Environments such as Second Life will need to be aware of the voice channel, animate lips and eyes accordingly, and assign high priority to the rendering of those animations.
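To make that principle concrete, here is a minimal sketch in Python. It is not how Second Life or the Architecture Machine Group system works. Every name in it (`mouth_openness`, `AnimationScheduler`, the `PRIORITY` table, the per-frame budget) is a hypothetical stand-in, and it assumes the voice channel arrives as frames of 16-bit PCM samples. The idea is only that loudness in the voice channel drives the lips, and that when the renderer can afford just a few updates per frame, lips and eyes go first.

```python
import heapq
import math
from dataclasses import dataclass, field

# Hypothetical priority table: lower numbers are rendered first.
PRIORITY = {"lips": 0, "eyes": 0, "gesture": 1, "clothing": 2}


def mouth_openness(samples, full_scale=32768.0):
    """Map one frame of 16-bit PCM samples to a mouth openness in [0, 1].

    Uses the frame's RMS loudness as a crude proxy for how far the lips open.
    """
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(1.0, rms / full_scale)


@dataclass(order=True)
class AnimationUpdate:
    priority: int
    sequence: int  # tie-breaker: earlier submissions win within a priority
    feature: str = field(compare=False)
    value: float = field(compare=False)


class AnimationScheduler:
    """Renders at most `budget_per_frame` updates, highest priority first."""

    def __init__(self, budget_per_frame):
        self.budget = budget_per_frame
        self._queue = []
        self._sequence = 0

    def submit(self, feature, value):
        # Unknown features fall behind everything listed in PRIORITY.
        update = AnimationUpdate(
            PRIORITY.get(feature, 3), self._sequence, feature, value
        )
        self._sequence += 1
        heapq.heappush(self._queue, update)

    def render_frame(self):
        rendered = []
        while self._queue and len(rendered) < self.budget:
            rendered.append(heapq.heappop(self._queue))
        self._queue.clear()  # whatever did not fit is dropped, not deferred
        return [(u.feature, u.value) for u in rendered]


scheduler = AnimationScheduler(budget_per_frame=2)
scheduler.submit("clothing", 0.5)
scheduler.submit("gesture", 1.0)
scheduler.submit("lips", mouth_openness([16384, -16384]))
scheduler.submit("eyes", 0.2)
frame = scheduler.render_frame()
```

With a budget of two updates, the lips and eyes are rendered and the gesture and clothing updates are dropped for that frame. That is the same trade the 2,400-baud system made: spend scarce capacity on the features that carry the emotional signal.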

Often, of course, we care as much about the document the group is attending to as we do about the group's facial expressions or body language. Videoconferencing has always had the notion of a special camera trained on that shared document. In Second Life you can paint Web pages onto objects, but they're hard to read that way. I'd rather see those pages in a browser that's synced to the presenter's browser.

Full 3-D telepresence is still a long way off. But a few strategic enhancements could make systems such as Second Life effective now for certain business and social purposes.
