Search results
1 – 10 of 529
Abstract
Purpose
The purpose of this paper is to call researchers’ attention to cross-cultural research using online consumer reviews and multilingual textual analysis.
Design/methodology/approach
The authors discuss a selective literature review and highlight four studies that show cross-cultural differences in online reviews of ethnic restaurants.
Findings
Applying multilingual textual analysis could open new avenues for verifying and expanding future cross-cultural research in tourism and hospitality.
Originality/value
The paper introduces examples of multilingual textual analysis used for cross-cultural studies.
Elaine Menard and Margaret Smithglass
Abstract
Purpose
The purpose of this paper is to present the results of the first phase of a research project that aims to develop a bilingual interface for the retrieval of digital images. The main objective of this extensive exploration was to identify the characteristics and functionalities of existing search interfaces and similar tools available for image retrieval.
Design/methodology/approach
An examination of 159 resources that offer image retrieval was carried out. First, general search functionalities offered by content-based image retrieval systems and text-based systems are described. Second, image retrieval in a multilingual context is explored. Finally, the search functionalities provided by four types of organisations (libraries, museums, image search engines and stock photography databases) are investigated.
Findings
The analysis of functionalities offered by online image resources revealed a very high degree of consistency within the types of resources examined. The resources found to be the most navigable and interesting to use were those built with standardised vocabularies combined with a clear, compact and efficient user interface. The analysis also highlights that many search engines are equipped with multiple language support features. A translation device, however, is implemented in only a few search engines.
Originality/value
The examination of best practices for image retrieval and the analysis of the real users' expectations, which will be obtained in the next phase of the research project, constitute the foundation upon which the search interface model that the authors propose to develop is based. It also provides valuable suggestions and guidelines for search engine researchers, designers and developers.
Thea Williamson and Aris Clemons
Abstract
Purpose
Little research has been done exploring the nature of multilingual students who are not categorized as English language learners (ELLs) in English language arts (ELA) classes. This study about a group of multilingual girls in an ELA class led by a monolingual white teacher aims to show how, when a teacher makes space for translanguaging practices in ELA, multilingual students disrupt norms of English only.
Design/methodology/approach
The authors use reconstructive discourse analysis to understand translanguaging across a variety of linguistic productions for a group of four focal students. Data sources include fieldnotes from 29 classroom observations, writing samples and process documents and 8.5 h of recorded classroom discourse.
Findings
Students used multilingualism across a variety of discourse modes, frequently in spoken language and rarely in written work. Translanguaging was most present in small-group peer talk structures, where students built relationships, generated ideas for writing and managed their writing agendas, including feelings about writing. In addition, Spanish served as "elevated vocabulary" in writing. Across discourse modes, translanguaging served to develop academic proficiency in writing.
Originality/value
The authors proposed a more expansive approach to data analysis in English-mostly cases – i.e. environments shaped by multilingual students in monolingual school contexts – to argue for anti-deficit approaches to literacy development for multilingual students. Analyzing classroom talk alongside literacy allows for a more nuanced understanding of translanguaging practices in academic writing. They also show how even monolingual teachers can disrupt monolingual hegemony in ELA classrooms with high populations of multilingual students.
Abstract
Purpose
This paper seeks to examine image retrieval within two different contexts: a monolingual context where the language of the query is the same as the indexing language and a multilingual context where the language of the query is different from the indexing language. The study also aims to compare two different approaches for the indexing of ordinary images representing common objects: traditional image indexing with the use of a controlled vocabulary and free image indexing using uncontrolled vocabulary.
Design/methodology/approach
This research uses three data collection methods. An analysis of the indexing terms was employed in order to examine the multiplicity of term types assigned to images. A simulation of the retrieval process involving a set of 30 images was performed with 60 participants. The quantification of the retrieval performance of each indexing approach was based on the usability measures, that is, effectiveness, efficiency and satisfaction of the user. Finally, a questionnaire was used to gather information on searcher satisfaction during and after the retrieval process.
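The usability-based quantification described above can be illustrated with a small sketch. The data structure, function names and scoring scheme below are illustrative assumptions rather than the authors' actual instrument: effectiveness is taken as the share of successful retrievals, efficiency as mean task time and query count, and satisfaction as a mean Likert rating.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    """One participant's attempt to retrieve one target image."""
    found: bool          # effectiveness: did the searcher retrieve the image?
    seconds: float       # temporal efficiency: time spent on the task
    queries: int         # human efficiency: number of query reformulations
    satisfaction: int    # post-task Likert rating, 1 (worst) to 5 (best)

def usability_summary(trials):
    """Aggregate effectiveness, efficiency and satisfaction over all trials."""
    return {
        "effectiveness": mean(1.0 if t.found else 0.0 for t in trials),
        "mean_seconds": mean(t.seconds for t in trials),
        "mean_queries": mean(t.queries for t in trials),
        "mean_satisfaction": mean(t.satisfaction for t in trials),
    }

trials = [
    Trial(found=True, seconds=42.0, queries=2, satisfaction=4),
    Trial(found=False, seconds=90.0, queries=5, satisfaction=2),
    Trial(found=True, seconds=30.0, queries=1, satisfaction=5),
]
summary = usability_summary(trials)
```

Each indexing form (combined or non-combined) would get its own set of trials, and the summaries would then be compared across forms.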
Findings
The results of this research are twofold. The analysis of indexing terms associated with all 3,950 images provides a comprehensive description of the characteristics of the four non-combined indexing forms used for the study. The retrieval simulation results also offer information about the relative performance of the six indexing forms (combined and non-combined) in terms of their effectiveness, efficiency (temporal and human) and the image searcher's satisfaction.
Originality/value
The findings of the study suggest that, in the near future, information systems could benefit from allowing an increased coexistence of controlled and uncontrolled vocabularies, resulting from collaborative image tagging, for example, and from giving users the possibility to participate dynamically in the image-indexing process, in a more user-centred way.
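Such a coexistence of controlled and user-contributed vocabularies can be pictured as an index that records the provenance of each term. The structure below is a hypothetical illustration, not the interface model the authors propose to develop:

```python
from collections import defaultdict

def build_hybrid_index(controlled, tags):
    """Map each term to image ids, remembering whether it came from the
    controlled vocabulary, from user tagging, or from both."""
    index = defaultdict(lambda: {"images": set(), "sources": set()})
    for image_id, terms in controlled.items():
        for term in terms:
            index[term.lower()]["images"].add(image_id)
            index[term.lower()]["sources"].add("controlled")
    for image_id, terms in tags.items():
        for term in terms:
            index[term.lower()]["images"].add(image_id)
            index[term.lower()]["sources"].add("user")
    return dict(index)

# Invented sample data: curated terms vs. collaborative tags for two images.
controlled = {"img1": ["Dog", "Mammal"], "img2": ["Cat"]}
tags = {"img1": ["puppy", "dog"], "img2": ["kitty"]}
index = build_hybrid_index(controlled, tags)
```

Keeping the source of each term lets a retrieval interface weight curated terms and user tags differently, or let searchers filter by provenance.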
Hanan Alghamdi and Ali Selamat
Abstract
Purpose
With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.
Design/methodology/approach
This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.
Findings
Based on this review, it was found that methods of feature selection and feature extraction can be used alongside topic modeling, content analysis and text clustering.
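A minimal sketch of the feature selection and feature extraction steps named above, assuming a simple bag-of-words representation with a document-frequency cutoff; the reviewed systems use more elaborate techniques, and the sample documents are invented:

```python
from collections import Counter

def select_features(docs, min_df=2):
    """Feature selection: keep terms appearing in at least min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return sorted(t for t, n in df.items() if n >= min_df)

def extract_features(doc, vocabulary):
    """Feature extraction: term-count vector over the selected vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts[t] for t in vocabulary]

docs = [
    "attack plan on the forum",
    "forum post about the plan",
    "recipe for lentil soup",
]
vocab = select_features(docs, min_df=2)
vectors = [extract_features(d, vocab) for d in docs]
```

The resulting vectors are the usual input to the clustering and topic-modeling stages the review describes.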
Originality/value
At the end of the review, the authors present the current state of the art and certain open issues associated with Arabic dark web content analysis.
Mohamed Hammami, Youssef Chahir and Liming Chen
Abstract
Along with the ever-growing Web comes the proliferation of objectionable content, such as sex, violence and racism, and we need efficient tools for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine-learning-based pornographic website classification and filtering system. As the Internet becomes increasingly visual and multimedia-rich, as exemplified by pornographic websites, we focus our attention here on the use of skin-color-related visual content-based analysis, along with textual and structural content-based analysis, for improving pornographic website filtering. While most commercial filtering products on the marketplace are based mainly on textual content-based analysis, such as detection of indicative keywords or checking against manually collected blacklists, the originality of our work resides in the addition of structural and visual content-based analysis to the classical textual content-based analysis, along with several major data-mining techniques for learning and classifying. Tested on a testbed of 400 websites, including 200 adult sites and 200 non-pornographic ones, WebGuard, our web filtering engine, scored a 96.1% classification accuracy rate when only textual and structural content-based analysis was used, and a 97.4% classification accuracy rate when skin-color-related visual content-based analysis was added. Further experiments on a blacklist of 12,311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82% classification accuracy rate when using only textual and structural content-based analysis, and a 95.62% classification accuracy rate when visual content-based analysis was added. The basic framework of WebGuard can apply to other website categorization problems which combine, as most websites do today, textual and visual content.
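Skin-color visual analysis of the kind described is often approximated with a per-pixel RGB rule. The thresholds below follow a widely cited heuristic (Peer et al.) and are an illustration only, not WebGuard's actual model; in the full system the skin-pixel ratio would be one feature fed to the data-mining classifier alongside textual and structural features:

```python
def is_skin_rgb(r, g, b):
    """Heuristic RGB skin-pixel test using commonly published thresholds."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15
            and r > g and r > b)

def skin_ratio(pixels):
    """Fraction of pixels classified as skin; a visual feature for a page."""
    if not pixels:
        return 0.0
    return sum(is_skin_rgb(*p) for p in pixels) / len(pixels)

# Invented sample: two skin-tone pixels, one dark pixel, one blue pixel.
pixels = [(220, 170, 140), (230, 180, 150), (10, 10, 10), (0, 120, 255)]
ratio = skin_ratio(pixels)
```

A page whose images yield a high skin ratio would raise the classifier's pornography score; the rule is deliberately cheap so it can run over many images per site.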
Matthias Görtz, Thomas Mandl, Katrin Werner and Christa Womser-Hacker
Abstract
Purpose
Global cooperation between and within organisations has become essential for successful businesses. For information management within such an international and necessarily multilingual environment, new challenges arise from the diversity of the stakeholders and participants as well as from the heterogeneity of approaches and traditions of information handling.
Design/methodology/approach
Key technologies such as search technologies need to be adapted to support content in multiple languages and efficient access to it. Information processes need to be analysed while bearing in mind that problems may arise from cross-cultural misunderstandings. This diversity requires appropriate treatment and appropriate methods in information systems in order to improve international information flows.
Findings
This chapter identifies some of these challenges and shows how they can be approached from an information science perspective. User-oriented research at the University of Hildesheim in the areas of information retrieval, information seeking and human–computer interaction is presented.
Originality/value
Global enterprises and organisations may use this chapter to identify challenges and solutions for adapting their information technology to an international scale. Researchers who work on multilingual information access and intercultural aspects of information systems receive an overview of some current research.
Abstract
The use of an information retrieval (IR) system would be easier if natural language processing (NLP) were applied. There are essentially two different ways to use NLP techniques: as a user interface coupled with a factual database, or as an integrated part of a system which deals with a textual database. In this paper, two approaches are presented: that of MGS, a commercialized system in use in France Télécom, and that of Telmi, a France Télécom research system. Telmi is an information retrieval system designed for use with medium-sized databases of short texts. The characteristics of the system include fine-grained NLP, an open-domain and large-scale knowledge base, automated indexing based on conceptual representation of texts, and reusability of the NLP tools. The knowledge base is (semi-)automatically extracted from a monolingual machine-readable dictionary (MRD). Telmi is integrated into a production-scale prototype which implements a Minitel Information Service (IS) for the use of the general public. France Télécom Minitel and its problems are described, along with the solutions Telmi offers. The paper then goes on to describe how France Télécom intends, in a continuation of the present project, to reuse the Telmi tools in a multilingual system, particularly in (semi-)automatic data acquisition from multilingual MRDs.
Victor Diogho Heuer de Carvalho and Ana Paula Cabral Seixas Costa
Abstract
Purpose
This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is supporting analyses, so security authorities can make appropriate decisions about their actions.
Design/methodology/approach
The corpora were obtained through web scraping from a newspaper's website and tweets from a Brazilian metropolitan region. Natural language processing was applied considering: text cleaning, lemmatization, summarization, part-of-speech and dependencies parsing, named entities recognition, and topic modeling.
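A pipeline of this shape can be sketched in a few lines. In practice, lemmatization, dependency parsing and named-entity recognition would come from an NLP library (the abstract does not name one), so the sketch below mirrors only the cleaning and frequency steps, with an invented stopword list and sample headlines:

```python
import re
from collections import Counter

STOPWORDS = {"a", "o", "de", "em", "na", "no", "e"}  # tiny illustrative list

def clean(text):
    """Text cleaning: lowercase, strip URLs, keep only letters and spaces."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.sub(r"[^a-záéíóúâêôãõç ]+", " ", text)

def tokens(text):
    """Tokenize cleaned text and drop stopwords."""
    return [t for t in clean(text).split() if t not in STOPWORDS]

def top_terms(corpus, n=3):
    """Frequency-based stand-in for a corpus's most common topic terms."""
    counts = Counter()
    for doc in corpus:
        counts.update(tokens(doc))
    return [term for term, _ in counts.most_common(n)]

corpus = [
    "Assalto na região metropolitana http://noticia.example/1",
    "Polícia investiga assalto em banco",
    "Polícia prende suspeito de assalto",
]
terms = top_terms(corpus)
```

The same cleaned token stream would feed the downstream steps the authors list: lemmatization, part-of-speech and dependency parsing, named-entity recognition and topic modeling.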
Findings
Several results were obtained based on the methodology used, among them: an example of a summarization produced by an automated process; dependency parsing; the most common topics in each corpus; and the forty named entities and most common slogans extracted, notably those linked to public security.
Research limitations/implications
From a research perspective, some critical tasks related to the applied methodology were identified: the treatment of noise arising when news is obtained from its source websites; the treatment of textual elements common in social network posts, such as abbreviations, emojis/emoticons and even writing errors; the treatment of subjectivity, to eliminate noise from irony and sarcasm; and the search for authentic news on issues within the target domain. All these tasks aim to improve the process so that interested authorities can perform accurate analyses.
Practical implications
The corpora dedicated to the public security domain enable several analyses, such as mining public opinion on security actions in a given location; understanding criminals' behaviors reported in the news or on social networks and drawing a timeline of their attitudes; detecting movements that may cause damage to public property and people's welfare through texts from social networks; and extracting the history and repercussions of police actions by crossing news with records on social networks; among many other possibilities.
Originality/value
The work on the corpora reported in this text represents one of the first initiatives to create textual bases in Portuguese dedicated to Brazil's specific public security domain.
Djamila Mohdeb, Meriem Laifa, Fayssal Zerargui and Omar Benzaoui
Abstract
Purpose
The present study was designed to investigate eight research questions related to the analysis and detection of dialectal Arabic hate speech targeting African refugees and illegal migrants in the Algerian YouTube space.
Design/methodology/approach
The transfer learning approach, which currently represents the state of the art in natural language processing tasks, was exploited to classify and detect hate speech in Algerian dialectal Arabic. In addition, a descriptive analysis was conducted to answer the analytical research questions, which aim at measuring and evaluating the presence of anti-refugee/migrant discourse on the YouTube social platform.
Findings
Data analysis revealed that there has been a gradual modest increase in the number of anti-refugee/migrant hateful comments on YouTube since 2014, a sharp rise in 2017 and a sharp decline in later years until 2021. Furthermore, our findings stemming from classifying hate content using multilingual and monolingual pre-trained language transformers demonstrate a good performance of the AraBERT monolingual transformer in comparison with the monodialectal transformer DziriBERT and the cross-lingual transformers mBERT and XLM-R.
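The model comparison reported above ultimately reduces to a per-model classification metric. Running AraBERT, DziriBERT, mBERT or XLM-R requires their own libraries and weights, so the sketch below only shows the evaluation side, macro-F1 over binary hate/not-hate labels, with made-up gold labels and predictions:

```python
def f1(gold, pred, positive):
    """F1 score for one class, from parallel lists of gold and predicted labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred, labels=("hate", "normal")):
    """Unweighted mean of per-class F1, the usual metric for imbalanced data."""
    return sum(f1(gold, pred, c) for c in labels) / len(labels)

gold = ["hate", "hate", "normal", "normal", "hate"]
preds = {  # hypothetical outputs of two competing models
    "model_a": ["hate", "hate", "normal", "hate", "hate"],
    "model_b": ["normal", "hate", "normal", "normal", "hate"],
}
ranking = sorted(preds, key=lambda m: macro_f1(gold, preds[m]), reverse=True)
```

Ranking models this way, rather than by raw accuracy, keeps a rare hate class from being drowned out by the majority class.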
Originality/value
Automatic hate speech detection in languages other than English is quite a challenging task that the literature has tried to address by various approaches of machine learning. Although the recent approach of cross-lingual transfer learning offers a promising solution, tackling this problem in the context of the Arabic language, particularly dialectal Arabic makes it even more challenging. Our results cast a new light on the actual ability of the transfer learning approach to deal with low-resource languages that widely differ from high-resource languages as well as other Latin-based, low-resource languages.