Search results
Sunhee Kim, Yumi Hwang, Daejin Shin, Chang-Yeal Yang, Seung-Yeun Lee, Jin Kim, Byunggoo Kong, Jio Chung, Namhyun Cho, Ji-Hwan Kim and Minhwa Chung
Abstract
Purpose
This paper describes the development process of a mobile Voice User Interface (VUI) for Korean users with dysarthria with currently available speech recognition technology by conducting systematic user needs analysis and applying usability testing feedback to prototype system designs.
Design/methodology/approach
Four usability surveys are conducted during the development of the prototype system. Based on two surveys on user needs and on user experiences with existing VUI systems at the prototype design stage, the target platforms and target applications are determined. Furthermore, a set of basic words is selected by the prospective users, which enables the system to be not only custom designed for dysarthric speakers but also individualized for each user. Reflecting users' requests concerning general VUI usage and their UI design preferences, gathered through evaluation of the initial prototype, we develop the final prototype: an individualized voice keyboard for mobile devices based on an isolated word recognition engine with word prediction.
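The word-prediction step of such an individualized voice keyboard can be illustrated with a toy sketch: completions of a recognized prefix are drawn from the user's personal word set and ranked by that user's own usage counts. All names, the lexicon and the ranking rule here are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of word prediction over a personalized lexicon.
# Each user keeps a personal word list with usage counts; the partial
# hypothesis from the recognizer is completed from that list.

def predict_words(prefix, personal_lexicon, top_n=3):
    """Return up to top_n completions of `prefix`, most-used first."""
    candidates = [(word, count) for word, count in personal_lexicon.items()
                  if word.startswith(prefix)]
    # Sort by descending usage count, then alphabetically for ties.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in candidates[:top_n]]

# A per-user lexicon built from the basic word set the user selected,
# with counts accumulated from past use (values are made up).
lexicon = {"hello": 12, "help": 30, "hospital": 7, "home": 25}
print(predict_words("h", lexicon))   # most frequent completions first
```

Individualization here is simply that each user carries their own lexicon and counts, so the same prefix yields different suggestions per user.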
Findings
The results of this paper show that target user participation in system development is effective for improving the system's usability and user satisfaction, as the design incorporates ideas and feedback obtained from different prospective users at each development stage.
Originality/value
We have developed an automatic speech recognition-based mobile VUI system not only custom designed for dysarthric speakers but also individualized for each user, focussing on the usability aspect through four usability surveys. This voice keyboard system has the potential to be an assistive and alternative input method for people with speech impairment, including mild to moderate dysarthria, and people with physical disabilities.
Michael Schuricht, Zachary Davis, Michael Hu, Shreyas Prasad, Peter M. Melliar‐Smith and Louise E. Moser
Abstract
Purpose
Mobile handheld devices, such as cellular phones and personal digital assistants, are inherently small and lack an intuitive and natural user interface. Speech recognition and synthesis technology can be used in mobile handheld devices to improve the user experience. The purpose of this paper is to describe a prototype system that supports multiple speech‐enabled applications in a mobile handheld device.
Design/methodology/approach
The main component of the system, the Program Manager, coordinates and controls the speech‐enabled applications. Human speech requests to, and responses from, these applications are processed in the mobile handheld device, to achieve the goal of human‐like interactions between the human and the device. In addition to speech, the system also supports graphics and text, i.e., multimodal input and output, for greater usability, flexibility, adaptivity, accuracy, and robustness. The paper presents a qualitative and quantitative evaluation of the prototype system. The Program Manager is currently designed to handle the specific speech‐enabled applications that we developed.
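The coordinating role of the Program Manager can be sketched as a simple dispatcher that keeps a registry of speech-enabled applications and routes a recognized utterance to the first application whose keywords match. The class, method names and keyword-matching rule are illustrative assumptions, not the authors' actual design.

```python
# Hypothetical sketch of a Program Manager routing speech requests
# to registered speech-enabled applications by keyword overlap.

class ProgramManager:
    def __init__(self):
        self.apps = {}  # application name -> (keyword set, handler)

    def register(self, name, keywords, handler):
        self.apps[name] = (set(keywords), handler)

    def dispatch(self, utterance):
        words = set(utterance.lower().split())
        for name, (keywords, handler) in self.apps.items():
            if words & keywords:          # any keyword present?
                return handler(utterance)
        return "Sorry, no application can handle that request."

pm = ProgramManager()
pm.register("calendar", {"meeting", "schedule"}, lambda u: "calendar: " + u)
pm.register("dialer", {"call", "phone"}, lambda u: "dialer: " + u)
print(pm.dispatch("Schedule a meeting at noon"))  # routed to the calendar app
```

A real system would replace the keyword match with the recognizer's grammar or intent model, and the handlers with actual applications; the sketch only shows the coordination pattern.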
Findings
The paper determines that many human interactions involve not single applications but multiple applications working together in possibly unanticipated ways.
Research limitations/implications
Future work includes generalization of the Program Manager so that it supports arbitrary applications and the addition of new applications dynamically. Future work also includes deployment of the Program Manager and the applications on cellular phones running the Android Platform or the Openmoko Framework.
Originality/value
This paper presents a first step towards a future human interface for mobile handheld devices and for speech‐enabled applications operating on those devices.
Holley R. Lange, George Philip, Bradley C. Watson, John Kountz, Samuel T. Waters and George Doddington
Abstract
A real potential exists for library use of voice technologies: as aids to the disabled or illiterate library user, as front‐ends for general library help systems, in online systems for commands or control words, and in many of the hands‐busy‐eyes‐busy activities that are common in libraries. Initially, these applications would be small, limited processes that would not require the more fluent human‐machine communication that we might hope for in the future. Voice technologies will depend on and benefit from new computer systems, advances in artificial intelligence and expert systems to facilitate their use and enable them to better circumvent present input and output problems. These voice systems will gradually assume more importance, improving access to information and complementing existing systems, but they will not likely revolutionize or dominate human‐machine communications or library services in the near future.
Abstract
Papers and articles on automatic speech recognition appear in many different journals. Research on the nature of speech is prominent in the Journal of the Acoustical Society of America, and for research on algorithms for speech recognition the IEEE Proceedings on Acoustics, Speech and Signal Processing can be recommended.
Soo‐Young Suk and Hyun‐Yeol Chung
Abstract
Purpose
The purpose of this paper is to describe a speech and character combined recognition engine (SCCRE) developed for working on personal digital assistants (PDAs) or on mobile devices. Also, the architecture of a distributed recognition system for providing a more convenient user interface is discussed.
Design/methodology/approach
In SCCRE, feature extraction for speech and for characters is carried out separately, but recognition is performed in a single engine. The client recognition engine employs a continuous hidden Markov model (CHMM) structure with a variable-parameter topology, in order to minimize the number of model parameters and to reduce recognition time. The model also adopts the proposed successive state and mixture splitting (SSMS) method for generating context-independent models. SSMS optimizes the number of mixtures through splitting in the mixture domain and the number of states through splitting in the time domain.
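The splitting mechanics behind SSMS can be illustrated with a much-simplified sketch: grow a one-dimensional Gaussian mixture by repeatedly splitting its heaviest component into two half-weight components with perturbed means. The actual SSMS criterion is likelihood-based and also splits CHMM states in the time domain; this toy version, with made-up parameters, only shows the mixture-domain idea.

```python
# Simplified illustration of mixture splitting: repeatedly split the
# component with the largest weight, nudging the means apart by a
# fraction of the standard deviation. Not the paper's SSMS criterion.

import math

def split_mixture(components, target_size, eps=0.2):
    """components: list of (weight, mean, var) 1-D Gaussians."""
    comps = list(components)
    while len(comps) < target_size:
        # Pick the heaviest component to split.
        i = max(range(len(comps)), key=lambda k: comps[k][0])
        w, mu, var = comps.pop(i)
        sd = math.sqrt(var)
        # Replace it with two half-weight copies, means nudged apart.
        comps.append((w / 2, mu - eps * sd, var))
        comps.append((w / 2, mu + eps * sd, var))
    return comps

mix = [(0.7, 0.0, 1.0), (0.3, 5.0, 1.0)]
grown = split_mixture(mix, 4)
print(len(grown), sum(w for w, _, _ in grown))  # weights still sum to 1
```

In SSMS the stopping point is chosen per model rather than as a fixed target size, which is what lets the topology vary and keeps the total parameter count down.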
Findings
The recognition results show that, when applied to speech recognition for mobile devices, the developed engine can reduce the total number of Gaussians by up to 40 per cent compared with fixed-parameter models at the same recognition performance. SSMS reduces the memory required for models to 65 per cent and that required for processing to 82 per cent. Moreover, recognition time decreases by 17 per cent with the SSMS model while maintaining the recognition rate.
Originality/value
The proposed system will be very useful for many on‐line multimodal interfaces such as PDAs and mobile applications.
H.A. Dimuthu Maduranga Arachchi and G. Dinesh Samarasinghe
Abstract
Purpose
This study aims to examine the influence of the derived attributes of embedded artificial intelligence-mobile smart speech recognition (AI-MSSR) technology, namely perceived usefulness, perceived ease of use (PEOU) and perceived enjoyment (PE) on consumer purchase intention (PI) through the chain relationships of attitudes to AI and consumer smart experience, with the moderating effect of consumer innovativeness and Generation (Gen) X and Gen Y in fashion retail.
Design/methodology/approach
The study employed a quantitative survey strategy, drawing a sample of 836 respondents from Sri Lanka and India representing Gen X and Gen Y. The data analysis was carried out using smart partial least squares structural equation modelling (PLS-SEM).
Findings
The findings show a positive relationship between the perceived attributes of MSSR and consumer PI via attitudes towards AI (AAI) and smart consumer experiences. In addition, consumer innovativeness and Generations X and Y have a moderating impact on the aforementioned relationship. The theoretical and managerial implications of the study are discussed with a note on the research limitations and further research directions.
Practical implications
To multiply the effects of embedded AI-MSSR and consumer PI in fashion retail marketing, managers can develop strategies that strengthen the links between awareness, knowledge of the derived attributes of embedded AI-MSSR and PI by encouraging innovative consumers, especially Gen Y consumers, to engage with embedded AI-MSSR.
Originality/value
This study advances the literature on embedded AI-MSSR and consumer PI in fashion retail marketing by providing an integrated view of the technology acceptance model (TAM), the diffusion of innovation (DOI) theory and the generational cohort perspective in predicting PI.
Abstract
Speech recognition machines currently on the market are all built upon the same research foundation. The most important milestones on the road to present‐day systems are reviewed in this article based largely on an interview with Dr Roger Moore of the Royal Signals and Radar Establishment.
B.J. Garner, C.L. Forrester and D. Lukose
Abstract
The concept of a knowledge interface for library users is developed as an extension of intelligent knowledge‐base system (IKBS) concepts. Contemporary directions in intelligent decision support, particularly in the role of search intermediaries, are then examined to identify the significance of intelligent intermediaries as a solution to unstructured decision support requirements of library users. A DISCOURSE SCRIPT is given to illustrate one form of intelligent intermediary.
Li Xiao, Hye-jin Kim and Min Ding
Abstract
Purpose
The advancement of multimedia technology has spurred the use of multimedia in business practice. The adoption of audio and visual data will accelerate as marketing scholars become more aware of the value of audio and visual data and of the technologies required to reveal insights into marketing problems. This chapter aims to introduce marketing scholars to this field of research.
Design/methodology/approach
This chapter reviews the current technology in audio and visual data analysis and discusses rewarding research opportunities in marketing using these data.
Findings
Compared with traditional data such as survey and scanner data, audio and visual data provide richer information and are easier to collect. Given this superiority, together with data availability, feasibility of storage and increasing computational power, we believe that these data will contribute to better marketing practices with the help of marketing scholars in the near future.
Practical implications
The adoption of audio and visual data in marketing practice will help practitioners gain better insights into marketing problems and thus make better decisions.
Originality/value
This chapter makes a first attempt in the marketing literature to review the current technology in audio and visual data analysis and proposes promising applications of such technology. We hope it will inspire scholars to utilize audio and visual data in marketing research.
Rajasekhar B, Kamaraju M and Sumalatha V
Abstract
Purpose
Nowadays, speech emotion recognition (SER) has emerged as a major research topic in several fields, including human–computer interaction and speech processing. Generally, it focuses on using machine learning models to predict the exact emotional state from speech. Advanced SER applications have proven successful in affective computing and human–computer interaction, and they are becoming a core component of next-generation computer systems, because a natural human–machine interface could provide automatic services that require a good appreciation of the user's emotional state.
Design/methodology/approach
This paper implements a new SER model that incorporates both gender and emotion recognition. Certain features are extracted from the speech signal and used for emotion classification. For this, the paper uses a deep belief network (DBN) model.
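The shape of such a pipeline, extract features from a speech frame, then classify into a joint gender-plus-emotion label, can be sketched minimally with toy frame features and a nearest-centroid classifier standing in for the DBN (training a real DBN is beyond an abstract-level example). The features, labels and centroid values are illustrative placeholders, not the paper's.

```python
# Toy gender + emotion classification pipeline: frame-level features
# followed by a nearest-centroid classifier (a stand-in for the DBN).

import math

def features(frame):
    """Toy frame-level features: energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
    return (energy, zcr)

def nearest_centroid(feat, centroids):
    """centroids: {(gender, emotion): feature tuple}; return closest label."""
    return min(centroids, key=lambda lab: math.dist(feat, centroids[lab]))

# Made-up class centroids for joint gender/emotion labels.
centroids = {
    ("female", "happy"): (0.8, 0.30),
    ("female", "sad"):   (0.2, 0.10),
    ("male", "angry"):   (0.9, 0.05),
}
frame = [0.9, -0.8, 0.85, -0.9, 0.8, -0.85]
print(nearest_centroid(features(frame), centroids))
```

In the paper's actual model, the DBN replaces the centroid rule and its weights are optimized with the MUPW algorithm; the sketch only fixes the input/output shape of the task.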
Findings
Through the performance analysis, it is observed that the developed method attains a high accuracy rate (in the best case) compared with other methods: it is 1.02% superior to the whale optimization algorithm (WOA), 0.32% better than firefly (FF), 23.45% superior to particle swarm optimization (PSO) and 23.41% superior to the genetic algorithm (GA). In the worst case, the mean update of particle swarm and whale optimization (MUPW) is 15.63%, 15.98%, 16.06% and 16.03% superior in accuracy to WOA, FF, PSO and GA, respectively. In the mean case, the performance of MUPW remains high: it is 16.67%, 10.38%, 22.30% and 22.47% better than the existing WOA, FF, PSO and GA methods, respectively.
Originality/value
This paper presents a new SER model that performs both gender and emotion recognition. A DBN is used for classification, and this is the first work to use the MUPW algorithm to find the optimal weights of the DBN model.