Multimodal Signal Processing: Theory and Applications for Human-computer Interaction

Sensor Review

ISSN: 0260-2288

Article publication date: 29 March 2011


Citation

(2011), "Multimodal Signal Processing: Theory and Applications for Human-computer Interaction", Sensor Review, Vol. 31 No. 2. https://doi.org/10.1108/sr.2011.08731bae.001

Publisher

Emerald Group Publishing Limited

Copyright © 2011, Emerald Group Publishing Limited



Article Type: Book review

From: Sensor Review, Volume 31, Issue 2

Edited by Jean-Philippe Thiran, Ferran Marqués, Hervé Bourlard
Academic Press
Boston, MA
November 2009
$130.00
352 pp.
ISBN: 978-0-12-374825-6
Web link: www.elsevier.com/wps/find/bookdescription.cws_home/717211/description#description

Multimodal Signal Processing (MSP) presents an overview of an emerging field concerned with exploiting multiple modalities of communication in human-human interaction (HHI) and human-computer interaction (HCI). The modalities include spoken, written and graphical forms, intonation, gesture, facial expression and body language, often used simultaneously. MSP aims to support and extend these modalities computationally and to create more natural human-computer interfaces, utilising different channels (e.g. audio, video) and different media (e.g. speech, text, sound, and graphics). MSP presents approaches derived from signal processing, machine learning, and social/human behaviour analysis and modelling, for analysing, processing and fusing modalities.

MSP is likely to be of interest to postgraduate students, researchers, practitioners, and application developers. It comprises three parts. Part I “Signal processing, modality and related mathematical tools” presents the basic elements of MSP. Chapter 1 gives an Introduction. Chapter 2 details the main concepts of Statistical Machine Learning, focusing on support vector machines for binary classification and hidden Markov models for speech recognition. Chapter 3 examines Speech Processing – speech and speaker recognition and text-to-speech synthesis. Chapter 4 considers aspects of Natural Language Processing – natural language understanding (NLU), natural language generation (NLG), and dialogue processing, which combines NLU and NLG. Chapter 5 reviews Image and Video Processing tools for face analysis, hand-gesture analysis, head orientation and body gesture analysis. Chapter 6 examines handwriting and sketching dynamics.

Part II “Multimodal signal processing and modeling” presents developments in MSP for HCI. Chapter 7 reviews basic concepts and advantages of Multimodal Analysis. Chapter 8 presents Multimodal Information Fusion, with particular regard to the level at which fusion occurs (sensor, feature, score, rank, decision), and compares adaptive and non-adaptive fusion. Chapter 9, Modality Integration Methods, presents two fusion applications – high-level audio-visual speech recognition and low-level multimodal speaker localisation (finding the speaker in a multi-person video). Chapter 10 illustrates multimodal integration through a Recognition Framework with two applications – enhanced speech recognition and biometric authentication. Chapter 11 discusses management of very large collections of Multimodal Data, Metadata and Annotations via databases.

Part III “Multimodal HCI and HHI” illustrates applications of MSP systems for HCI and HHI analysis. Chapter 12, Multimodal Input, explains and reviews a new class of user interfaces combining modes such as speech, pen, and touch. Chapter 13 presents aspects of Multimodal HCI Output including speech synthesis, animation and coarticulation. Chapter 14 considers Interactive Representations of Multimodal Databases for searching/browsing large image collections. Chapter 15 discusses aspects of Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour.

Informed by recent research and with a clear structure, MSP balances theory and applications, and the breadth and depth of the field, whilst avoiding duplication. It has comprehensive references plus useful links to a variety of tools and projects. The writing quality is high, the style readable, and there are few glitches or typos. A sequential reader might occasionally prefer greater conciseness, but MSP will equally suit a reader who dips in – Chapter 12 would be a good starting point. An overall summary or epilogue would be useful.

With rapid expansion in this field, MSP serves as a good introduction and springboard to the many forthcoming developments in multimodal HCI processing as well as in other fields.

A.G. Deakin
Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, UK
