Immersive Acoustics
- Multi-channel Signal Processing for Next-generation Teleconferencing & Communication
Two important advances have taken place
in telecommunications in recent years. One is the ubiquity of packet networks;
the other is the exponential growth in data transfer rates afforded by fiber
optic broadband networks. While the widely popular Internet has taken advantage
of these advances, the full potential of broadband packet networks is yet
to be realized. These two advances offer communication
engineers tremendous opportunities to transform traditional telephony
into tele-collaboration networks that support multi-dimensional information
sharing, making full use of human capabilities in binaural hearing and
binocular vision to maximize joint productivity. Broadband packet networks
provide an information transport mechanism that is free of the rigid
notion of a traditional voice circuit, and they are ready to support dynamically
allocated multi-channel tele-collaboration applications.
The challenge in remote multi-dimensional information sharing is
to build an advanced teleconferencing environment that reconstructs
the far-end acoustic and visual scene at the near end, so that conferencing
participants can maintain a sense of interaction, keeping track
of who is speaking and what has been said and done, as if all the collaborators
were in the same room. The recent success in developing a stereophonic echo
cancellation algorithm for hands-free teleconferencing indicates that
spatialization of sound (and, for a complete acoustic environment, immersive
acoustics) is essential to a much enhanced conferencing experience
and productivity. Building on this success, our current research
aims to generalize the previous result to multi-channel, multi-party
communications, beyond the elementary point-to-point scenario, with the
additional networking challenge of ensuring a high
level of quality of service.
This project is organized to address the technical issues around
multi-channel signal processing and communication for tele-collaboration,
with emphasis on multi-source (e.g., multiple talkers) and multi-channel
(e.g., multiple microphone inputs) information processing for echo control,
source tracking, ambient interference suppression, and spatial sound
reconstruction. It also extends current advances in point-to-point
stereophonic teleconferencing to a multi-party scenario involving more
than two participating conferencing sites. The technical innovations and
merit of this research comprise four major components:
- System and signal gain plan analysis related to immersive acoustics;
- Generalization of multi-channel audio and acoustic signal processing
  based on the multi-input-multi-output (MIMO) system formulation;
- Incorporation of source separation and tracking algorithms and
  introduction of “sound objects” in multi-channel echo cancellation and
  acoustic experience reconstruction for conferencing; and
- Multi-party communication protocol design for integration with
  object-based acoustics, to enhance the visual effects and to ensure
  quality of service for tele-collaboration.
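For orientation, a MIMO formulation of this kind is commonly written as the convolutive mixing model below. The notation is assumed here for illustration and is not taken from the project: M sound sources x_m, N microphone signals y_n, a room impulse response h_{nm} of length L from source m to microphone n, and additive ambient noise v_n.

```latex
y_n(k) = \sum_{m=1}^{M} \sum_{l=0}^{L-1} h_{nm}(l)\, x_m(k-l) + v_n(k),
\qquad n = 1, \dots, N
```

Under this reading, single-channel echo cancellation corresponds to M = N = 1 and the stereophonic case to M = 2.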
Our research will lead to next-generation
teleconferencing and tele-collaboration systems, with broad impact
on many fronts. The new set of technologies has the potential to shift
the paradigm of telecommunications from traditional telephony to a new
mode of communication involving multi-dimensional information sharing, one that
requires high-quality sound, acoustics and visual effects to work with
the natural human capability in binaural hearing and stereo vision. Figure
1 depicts such a paradigm change, from traditional telephony to a multi-channel,
multimodal collaborative conferencing scenario. In education, multi-channel
information sharing is not only beneficial but imperative for distance learning
to be effective. Multi-channel signal processing that enables multi-phonic
acoustic echo cancellation and sound spatialization for hands-free teleconferencing
will greatly boost collaboration productivity among conferencing
participants. With the recent growth in the use of teleconferencing, which
is at times considered the next killer application for broadband packet networks,
the new set of technologies will help realize or further drive the broadband
revolution by providing truly beneficial applications to the user. In
light of recent security concerns, this new set of technologies will also
provide a sensible alternative to travel without compromising
productivity.
Fig. 1. A multi-channel network for multimedia, multi-modal collaboration and interaction
A number of technical challenges need to be tackled
to realize this vision of multi-dimensional, multi-modal information
sharing. These include:
- A stereophonic acoustic echo control and cancellation algorithm that
  achieves reasonable reduction in acoustic echo (15-20 dB echo return
  loss) to support stereo teleconferencing with spatialized audio output.
  (A real-time demonstration is available.)
- Generalization of stereophonic teleconferencing to multi-channel
  teleconferencing for 3-D effects.
- Synthetic stereophonic and multi-phonic reconstruction of room acoustics
  to support multi-party communications.
- Real-time multi-camera image and video capturing, reconstruction and
  synthesis.
- Multi-channel source localization and talker tracking. (A single-source
  localization and tracking system demonstration is available.)
- Multi-channel source separation. (A demonstration of a 2-channel case
  under a benign condition is available.)
- Multi-sensor, multi-channel acoustic field modeling and reconstruction.
- Wideband speech (50-7000 Hz) and audio (audible spectrum) coding.
- Integration with packet networks.
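To make the echo-reduction figure in the first item concrete, here is a minimal sketch of how echo reduction in dB can be measured on a simulated signal. It uses a plain single-channel NLMS adaptive filter, which is a swapped-in textbook technique and not the project's stereophonic algorithm; the stereophonic case is harder and is not addressed here. The echo path, filter length, step size and noise level are all hypothetical values chosen for illustration, and the metric computed is the ratio of microphone power to residual power (often called ERLE).

```python
import math
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-6):
    """Return the residual signal after single-channel NLMS echo cancellation."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what remains after subtracting the estimate
        norm = sum(h * h for h in history) + eps
        # Normalized LMS update: step scaled by the input energy.
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def erle_db(mic, residual):
    """Ratio of microphone power to residual power, in dB."""
    p_mic = sum(d * d for d in mic)
    p_res = sum(e * e for e in residual)
    return 10.0 * math.log10(p_mic / p_res)


# Synthetic data: white-noise far-end signal through a made-up 4-tap echo path.
rng = random.Random(0)
far_end = [rng.gauss(0.0, 1.0) for _ in range(4000)]
echo_path = [0.6, -0.3, 0.15, 0.05]  # hypothetical room response
mic = [
    sum(echo_path[j] * far_end[n - j] for j in range(len(echo_path)) if n - j >= 0)
    + rng.gauss(0.0, 0.01)  # small ambient noise (assumed level)
    for n in range(len(far_end))
]

residual = nlms_echo_canceller(far_end, mic)
# Measure only the last 1000 samples, after the filter has converged.
steady_erle = erle_db(mic[-1000:], residual[-1000:])
print(f"steady-state echo reduction: {steady_erle:.1f} dB")
```

The simulated number is far more favourable than a real room would give: the echo path here is only four taps long, the far-end signal is white, and there is no near-end talker, so it should not be compared directly with the 15-20 dB quoted above.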