Multimodal Interface Channels
Multimodal interfaces enable interaction through different channels, such as touch, speech, and gesture. A speech interface is one such modality: it takes audio input from the user and generates responses according to that input.
Communication through speech is a natural way of interacting, and speech interfaces are used in many applications. One prominent area of application is gaming, where a speech interface makes game play more natural and makes the user feel more involved in the game.
People with special needs cannot always communicate with a computer in the traditional way, so many techniques and aids have been developed to support interactive communication between computers and people with impairments; speech recognition and eye-gaze detection are two such techniques. Games are a form of entertainment for computer users, but these communities are often unable to play computer games using traditional input devices such as the mouse, keyboard, or game pad, and they need other communication channels. By using different modalities, games can be made playable by people with special needs.
Logical games such as Chess, Sudoku, and some puzzle games exercise the player's intellect. A speech interface cannot completely replace traditional input control, because games often require complex combinations of input devices such as the mouse, keyboard, or game pad. For strategic, turn-based games like Chess and Checkers, however, the interaction can be almost completely replaced by a speech interface, because these games require only simple input commands from the players.
Moreover, a speech interface is also interesting for ordinary players. A computer-based chess game is, basically, software running on a computer, which makes the user feel that he is playing against an artificial program. A natural means of communication makes the user feel that he is playing against a human opponent rather than a computer, and a game is better played when the interaction with the opponent is more natural. Voice communication between the player and the computer makes the game play natural. Playing games on a laptop is also somewhat awkward when additional input devices are not available, because it is difficult to move the pointer quickly using a touch pad; speech-enabled games are easier to play on laptops.
AIMS AND DELIMITATIONS
We are attempting to build a speech interface for a chess game. Taking the available time into consideration, we want to build basic speech communication between the user and the game, along with partially natural interaction. We have categorized the levels of communication as simple, modest, and complex (natural). At the simple level, the user has limited command over the dialogue and must use a predefined dialogue system to play the game. The modest level offers basic communication plus partially natural communication. Natural communication gives the user full command over the dialogue, so the user can interact with the game in a natural way. Due to time limitations, we have chosen to develop the modest level of communication for the chess game.
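As a sketch of the difference between these levels, the matcher below accepts exactly one predefined phrasing per action at the simple level, but several natural variants of the same command at the modest level. The command table, the variant list, and the function names are our own illustrative assumptions, not part of any particular toolkit.

```python
import re

# Simple level: exactly one predefined phrasing per action (hypothetical table).
SIMPLE_COMMANDS = {
    "move knight to f3": ("knight", "f3"),
    "move pawn to e4": ("pawn", "e4"),
}

# Modest level: a few natural variants of the same command are accepted.
MODEST_RE = re.compile(
    r"^(?:please\s+)?(?:move\s+)?"
    r"(?P<piece>pawn|knight|bishop|rook|queen|king)\s+"
    r"(?:to\s+|takes\s+)?(?P<square>[a-h][1-8])$"
)

def recognize(command, level="modest"):
    """Map a recognized utterance to a (piece, square) action, or None."""
    cmd = command.strip().lower()
    if level == "simple":
        return SIMPLE_COMMANDS.get(cmd)
    m = MODEST_RE.match(cmd)
    return (m.group("piece"), m.group("square")) if m else None

print(recognize("move knight to f3", level="simple"))  # ('knight', 'f3')
print(recognize("please knight f3"))                   # ('knight', 'f3')
print(recognize("castle kingside"))                    # None: outside grammar
```

Commands like castling fall outside this small grammar and would need their own pattern; the fully natural level would go further and accept free-form phrasing.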
The main aim of our project is to build a speech interface to a chess game. The main requirement is to make the chess game playable using speech commands; this requirement is met when the game is actually playable at the modest level of communication.
The first step in our project is to obtain the source code for the chess game, which is freely available on the Internet. After analyzing the source code and fully understanding its logic, we will start developing the speech interface for the game. First we will deal with the user-side communication dialogues; then we will develop the computer's dialogues with the user, possibly through an animated face speaking to the user. Once the minimal requirements are met, and if time allows, we will try to make the dialogues a little more natural.
One member of the group, Mr. Abdus Sattar, is well versed in software languages and programming, and the other member, Mr. Bharat, is good at logical decisions, so we have divided the work according to our strengths. Mr. Abdus Sattar will take care of analyzing the source code and all tasks related to the programming language. Mr. Bharat will handle the work with the CSLU Toolkit, which is used to generate the speech communication dialogues. We will work together on the MS Speech API, throughout the rest of the project, and in all situations where critical decisions have to be made.
Since the first step of the project is to gain a complete understanding of the chess game's source code, this may take at least two days. Preparing the communication dialogues takes one day, but the dialogues may change throughout the process depending on the choices we make. Building the interface with the CSLU Toolkit, embedding the dialogues at the appropriate points, and connecting the game to the interface may take 7 to 10 days. Since we can only access the CSLU Toolkit in the labs, we will schedule daily lab time, probably in the afternoons (1 PM to 6 PM).
Since speech is a natural medium of communication, general expectations of a speech-controlled game may differ from the results of the project: a natural mode of communication may not be achievable within the short time available. Developing the speech interface also carries risks in binding the game and the speech commands together. Human speech contains similar-sounding words, and pronunciation differs from one person to another; this may force us to alter choices we made earlier. We are therefore trying to design the dialogues so that they are distinct and easily understood by the player.
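One way to keep the vocabulary distinct, sketched below under our own assumptions (the code-word tables are an illustrative choice, not part of any toolkit), is to replace the acoustically confusable file letters such as "b", "d", and "e" with NATO-style code words and to spell the ranks as number words:

```python
# Hypothetical spoken aliases: single letters like "b", "d", and "e" sound
# alike, so each board file gets a distinct code word, and each rank a word.
FILE_WORDS = {"alpha": "a", "bravo": "b", "charlie": "c", "delta": "d",
              "echo": "e", "foxtrot": "f", "golf": "g", "hotel": "h"}
RANK_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4",
              "five": "5", "six": "6", "seven": "7", "eight": "8"}

def spoken_square(phrase: str):
    """Translate e.g. 'bravo four' into the board square 'b4', or None."""
    parts = phrase.lower().split()
    if len(parts) != 2:
        return None
    file_word, rank_word = parts
    if file_word not in FILE_WORDS or rank_word not in RANK_WORDS:
        return None
    return FILE_WORDS[file_word] + RANK_WORDS[rank_word]

print(spoken_square("bravo four"))  # 'b4'
print(spoken_square("b four"))      # None: ambiguous single letter rejected
```

Rejecting bare letters forces the player into the distinct vocabulary, which should reduce the recognition errors the paragraph above anticipates.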
In our pre-study for the project, which included searching for relevant articles, we found that even though speech interfaces are used in many application areas, the gaming field has not fully utilized them. We found only a few articles related to speech interfaces for games; below we present the details of the articles relevant to our project.
In the article "Designing Speech Interfaces", the authors Nicole Yankelovich and Jennifer Lai discuss issues related to the design of speech interfaces. They note that adding a speech interface to a computer application has many potential advantages; for example, entering commands or data is very easy through speech. Among the problems they mention are that speech input is error prone and that speech output is often difficult to understand. The article also describes types of speech synthesizers: "parameterized" synthesizers are small and fast but not natural, while "concatenative" synthesizers are more natural but more resource intensive. The authors argue that successful speech interfaces must not violate conversational conventions. Because of noisy backgrounds and other factors, speech input is error prone; when errors occur repeatedly, they can be handled by switching to a different modality.
In the article "A Fundamental Study of Novel Speech Interface for Computer Games", the authors Hiroaki Nanjo, Hiroki Mikami, Suguru Kunimatsu, Hiroshi Kawano, and Takanobu Nishiura describe a novel speech interface for computer game systems. They address the problem of excited speech produced by users while playing a game: automatic speech recognition (ASR) works well with polite speech but has trouble with excited speech. They distinguish shouts from screams: a shout contains linguistic information, whereas a scream has none. The article suggests that it is preferable to distinguish shouts from naturally uttered speech, since this allows a more natural way of interaction. The authors performed tests using shouted speech as a form of interaction, and mention including laughs, coughs, and whistles as forms of interaction in future work.
In the article "Applying Speech Interface to Mahjong Game", the authors Jie Zhang, Ji Zhao, Shuanhu Bai, and Zhiyong Huang present the results of applying a speech interface to the popular Chinese game Mahjong, describing a multimodal interface for the game. The article discusses problems of speech interfaces in games, such as attention grabbing and the transient nature of speech: once the voice output from the computer ends, it is gone whether or not the player listened to it, so the authors made the dialogues very short. They used a synthesis engine to develop the speech interface, and their experiments showed that the speech interface enhanced the interaction between users and the computer.
In the article "An Implementation of Multi-Modal Game Interface Based on PDAs", the authors Kue-Bum Lee, Jung-Hyun Kim, and Kwang-Seok Hong describe a PDA-based multimodal network game interface using speech, gesture, and touch. They note that multimodal games have created a new opportunity for mobile operators and other service providers to extend the popularity of their games. They mention two approaches to addressing accessibility for people with special needs: first, games can be developed to be compatible with assistive technologies, so that a speech interface, for instance, can serve non-action games which do not require fast reflexes and reactions; second, special games can be developed specifically for people with disabilities. The authors propose a PDA-based multimodal game interface using double-touching, coupling embedded speech and gesture recognizers, and implement a multimodal Omok game over a TCP/IP-based PDA network. The interface differs between desktop computers and PDAs; they used a speech synthesizer for speech output and a wireless glove to recognize gestures. Their test results showed improved performance.
In the article "Universal Speech Interface", the authors Ronald Rosenfeld, Dan Olsen, and Alex Rudnicky discuss the advantages of and problems in building speech interfaces. They mention three fundamental advantages of speech interfaces:
- Speech is an ambient medium rather than an attentional one.
- Speech is descriptive rather than referential.
- Speech requires modest physical resources.
The authors feel that the main benefit of a speech interface is not naturalness of communication but ubiquity: because the required devices, such as headphones and a microphone, are small, they believe a speech interface can be embedded into applications very easily, including situations where text, pointing, and other input devices are not accessible. They describe different approaches to creating usable speech systems, such as natural language, dialogue trees, and commands, and argue that context-based natural interaction is more feasible than a general interaction mechanism.