Search results

1 – 10 of 968
Open Access
Article
Publication date: 27 September 2022

Fahad Ali Hakami

Abstract

Purpose

This study aims to identify and measure the lexical gap between the old and young generations in the Jizani dialect and determine the causes of that gap.

Design/methodology/approach

A 20-item questionnaire was distributed randomly among 104 participants. Next, 12 participants were selected and interviewed. SPSS software was used to analyse the quantitative data from the questionnaire. The data elicited from the interviews was qualitatively analysed, considering age and gender factors.

Findings

The major findings revealed that a lexical gap exists between old and young speakers of the Jizani dialect. The gap between young females and the older generation was greater than that between young and old males, and some old words are likely to disappear in the coming decades. Among the causes were social media, a time-consuming and word-borrowing medium for young people, and young females' tendency to use prestige vocabulary.

Originality/value

This study asked whether the vocabularies of old and young speakers differ, whether any difference is significant and what the reasons for the lexical gap are. The answers will help other researchers and dialectologists record old words before they die out and try to bridge that lexical gap.

Details

Saudi Journal of Language Studies, vol. 2 no. 4
Type: Research Article
ISSN: 2634-243X


Article
Publication date: 2 February 2015

Jiunn-Liang Guo, Hei-Chia Wang and Ming-Way Lai

Abstract

Purpose

The purpose of this paper is to develop a novel feature selection approach for automatic text classification of large digital documents – the e-books of an online library system. The main idea is to automatically identify discourse features in order to improve the feature selection process, rather than focusing on the size of the corpus.

Design/methodology/approach

The proposed framework automatically identifies discourse segments within e-books, captures the discourse subtopics that are cohesively expressed in those segments, and treats these subtopics as informative and prominent features. The selected feature set is then used to train and perform the e-book classification task based on the support vector machine (SVM) technique.

Findings

The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features leads to better performance, in comparison with two conventional feature selection techniques: TFIDF and mutual information. It also demonstrates that discourse features play important roles among textual features, especially for large documents such as e-books.
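For reference, the TFIDF weighting used as one of the conventional baselines can be sketched in a few lines; this is a generic illustration of the technique, not the authors' implementation, and the toy corpus is invented for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Weight each term by term frequency times inverse document frequency."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

docs = [["discourse", "segment", "feature"],
        ["feature", "selection", "corpus"],
        ["discourse", "subtopic", "feature"]]
weights = tfidf(docs)
# "feature" occurs in every document, so its IDF (and weight) is zero:
# TFIDF discards corpus-wide terms as uninformative.
```

Terms that appear in every document receive zero weight, so selecting features by TFIDF keeps only terms that discriminate between documents.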

Research limitations/implications

Automatically extracted subtopic features cannot be entered directly into the feature selection (FS) process; a threshold must be controlled.

Practical implications

The proposed technique demonstrates the promise of applying discourse analysis to the classification of large digital documents – e-books – as against conventional techniques.

Originality/value

A new FS technique is proposed that can inspect the narrative structure of large documents, which is new to the text classification domain. The other contribution is that, by providing more evidence through evaluation of the results, it encourages the consideration of discourse information in future text analysis. The proposed system can be integrated into other library management systems.

Details

Program, vol. 49 no. 1
Type: Research Article
ISSN: 0033-0337


Open Access
Article
Publication date: 18 November 2021

Shin'ichiro Ishikawa

Abstract

Purpose

Using a newly compiled corpus module consisting of utterances from Asian learners during L2 English interviews, this study examined how Asian EFL learners' L1s (Chinese, Indonesian, Japanese, Korean, Taiwanese and Thai), their L2 proficiency levels (A2, B1 low, B1 upper and B2+) and speech task types (picture descriptions, roleplays and QA-based conversations) affected four aspects of vocabulary usage (number of tokens, standardized type/token ratio, mean word length and mean sentence length).

Design/methodology/approach

The four aspects correspond, respectively, to speech fluency, lexical richness, lexical complexity and structural complexity.

Findings

Subsequent corpus-based quantitative data analyses revealed that (1) learner/native speaker differences existed during the conversation and roleplay tasks in terms of the number of tokens, type/token ratio and sentence length; (2) an L1 group effect existed in all three task types in terms of the number of tokens and sentence length; (3) an L2 proficiency effect existed in all three task types in terms of the number of tokens, type/token ratio and sentence length; and (4) the usage of high-frequency vocabulary was influenced more strongly by task type and fell into four types: Type A vocabulary for grammar control, Type B vocabulary for speech maintenance, Type C vocabulary for negotiation and persuasion and Type D vocabulary for novice learners.
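The four usage measures are straightforward to compute from tokenized data. The sketch below uses a plain type/token ratio (the standardized version averages the ratio over fixed-size samples, which this sketch omits), and the toy input is invented for the example.

```python
def vocab_profile(sentences):
    """Compute the four usage measures for a list of tokenized sentences."""
    tokens = [t for s in sentences for t in s]
    return {
        "tokens": len(tokens),                                    # speech fluency
        "type_token_ratio": len(set(tokens)) / len(tokens),       # lexical richness
        "mean_word_length": sum(map(len, tokens)) / len(tokens),  # lexical complexity
        "mean_sentence_length": len(tokens) / len(sentences),     # structural complexity
    }

profile = vocab_profile([["the", "cat", "sat"],
                         ["the", "dog", "ran", "fast"]])
```

With 7 tokens, 6 distinct types and 2 sentences, the profile gives a type/token ratio of 6/7 and a mean sentence length of 3.5 words.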

Originality/value

These findings provide clues for better understanding L2 English vocabulary usage among Asian learners during speech.

Details

PSU Research Review, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2399-1747


Open Access
Article
Publication date: 12 December 2022

Anwar A. H. Al-Athwary

Abstract

Purpose

Investigating technical terms of vehicle spare parts used in the mechanics' jargon in Saudi Arabic (SA) and Yemeni Arabic (YA) has received scant attention. The current study, therefore, is an attempt to shed some light on the topic. The aim is to identify the strategies used for creating equivalents in vehicle spare parts vocabulary and to pinpoint the most salient variations between the two dialects in this jargon.

Design/methodology/approach

More than 250 terms of vehicle spare parts were collected and analyzed qualitatively. Each list contains nearly 125 items. They were gathered from two main resources: semi-structured interviews with vehicle mechanics, and written lists from spare parts dealers in both countries.

Findings

Three main strategies are found at work: lexical borrowing (from English and French), metaphor and loan translation. Direct borrowing is the most influential strategy where loanwords represent nearly one-third of the data, the majority of which is from English. Metaphorical extensions and literal translations also have an important role to play in the process of spare part naming. While the two dialects share common practices in terms of literal translation, they are characterized by many differences with regard to lexical borrowing and metaphors.

Originality/value

The study approaches an under-researched topic that is related to the mechanics' jargon in Arabic and leaves the door open for further research. The findings of this study may be used as guidelines for Arabic academies and those who are concerned with translating and studying technical terms in the field of mechanical engineering.

Details

Saudi Journal of Language Studies, vol. 3 no. 1
Type: Research Article
ISSN: 2634-243X


Open Access
Article
Publication date: 25 May 2023

Hafizah Hamdan

Abstract

Purpose

This paper aims to investigate how Bruneian secondary school students employ code-switching in peer interactions. The functions of students' code-switching were analysed using Reyes' (2004) and Appel and Muysken's (2005) typologies.

Design/methodology/approach

The data collected are based on audio-recorded group discussions designed to elicit students’ code-switched utterances.

Findings

The results indicate that the students used 11 functions of code-switching: referential, discourse marker, clarification, expressive, quotation imitation, turn accommodation, insistence, emphasis, question shift, situation shift and poetic.

Research limitations/implications

As the study focuses only on a single secondary school, its results cannot be taken to represent secondary school students across Brunei.

Originality/value

This paper hopes to provide insight into how students' code-switching can be seen in a positive light. Moreover, understanding how students use code-switching in the classroom is essential for successful knowledge transfer and for cultivating competent bilinguals, which is what the country's education system aims for.

Details

Southeast Asia: A Multidisciplinary Journal, vol. 23 no. 1
Type: Research Article
ISSN: 1819-5091


Article
Publication date: 1 April 2001

Kerstin Jorna and Sylvie Davies

Abstract

In the 21st century, multilingual tools are gaining importance as increasingly diverse user groups from different cultural and linguistic backgrounds seek access to equally diverse pieces of information. The authors of this paper believe that most current forms of multilingual information access are inadequate for this role, and that a new form of multilingual thesaurus is required. The core of this paper introduces their pilot thesaurus InfoDEFT as a possible model for new online thesauri, which are semantically structured, encyclopedic and multilingual. The authors conclude that while the manual construction of such thesauri is labour intensive and hence costly, pilot thesauri can be used as training sets for artificial learning programmes, thus increasing their volume considerably at relatively little extra cost.

Details

Journal of Documentation, vol. 57 no. 2
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 18 May 2023

Rongen Yan, Depeng Dang, Hu Gao, Yan Wu and Wenhui Yu

Abstract

Purpose

Question answering (QA) answers questions asked by people in natural language. In QA, owing to users' subjectivity, the same query can be expressed in different ways, which increases the difficulty of text retrieval. Therefore, the purpose of this paper is to explore a new query rewriting method for QA that integrates multiple related questions (RQs) to form an optimal question. It is also important to generate a new dataset pairing each original query (OQ) with multiple RQs.

Design/methodology/approach

This study collects a new dataset, SQuAD_extend, by crawling the QA community and uses a word graph to model the collected OQs. Next, beam search finds the best path through the graph to obtain the best question. To represent the features of the question more deeply, the pretrained BERT model is used to model sentences.
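The path-search step can be illustrated with a minimal beam search over a toy word graph; the graph, edge scores and vocabulary here are invented for the sketch and do not come from the paper.

```python
def beam_search(graph, start, end, beam_width=2):
    """Keep the beam_width highest-scoring partial paths at each step.
    graph maps a word to a list of (next_word, edge_score) pairs."""
    beams = [([start], 0.0)]
    while True:
        candidates, done = [], True
        for path, score in beams:
            if path[-1] == end:          # finished path: carry it forward
                candidates.append((path, score))
                continue
            done = False
            for nxt, s in graph.get(path[-1], []):
                candidates.append((path + [nxt], score + s))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if done:
            return beams[0]

# Toy word graph merging two hypothetical related questions.
graph = {
    "<s>": [("how", 1.0), ("what", 0.5)],
    "how": [("install", 0.9)],
    "what": [("install", 0.4)],
    "install": [("python", 1.0)],
    "python": [("</s>", 1.0)],
}
best_path, best_score = beam_search(graph, "<s>", "</s>")
```

The search keeps only the top-scoring partial paths at each step, so it finds a high-scoring word sequence through the merged graph without enumerating every path.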

Findings

The experimental results show three outstanding findings. (1) The quality of the answers is better after adding the RQs of the OQs. (2) Using a word graph to model the question and choose the optimal path is conducive to finding the best question. (3) BERT can deeply characterize the semantics of the exact question.

Originality/value

The proposed method uses a word graph to construct multiple questions and selects the optimal path for rewriting the question; the quality of the resulting answers is better than the baseline. In practice, the research results can help guide users to clarify their query intentions and finally reach the best answer.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288


Open Access
Article
Publication date: 9 December 2019

Zhengfa Yang, Qian Liu, Baowen Sun and Xin Zhao

Abstract

Purpose

This paper aims to orient those who have only just begun their research into Community Question Answering (CQA) expert recommendation, and to help those already concerned with this issue extend our understanding through future research.

Design/methodology/approach

In this paper, keywords such as “CQA”, “Social Question Answering”, “expert recommendation”, “question routing” and “expert finding” are used to search major digital libraries. The final sample includes a list of 83 relevant articles authored in academia as well as industry that have been published from January 1, 2008 to March 1, 2019.

Findings

This study proposes a comprehensive framework to categorize extant studies into three broad areas of CQA expert recommendation research: understanding profile modeling, recommendation approaches and recommendation system impacts.

Originality/value

This paper focuses on discussing and sorting out the key research issues within these three research streams. Finally, it identifies conflicting and contradictory results and gaps in the existing research, and puts forward urgent research topics.

Details

International Journal of Crowd Science, vol. 3 no. 3
Type: Research Article
ISSN: 2398-7294


Article
Publication date: 22 October 2019

Ming Li, Lisheng Chen and Yingcheng Xu

Abstract

Purpose

A large number of questions are posted on community question answering (CQA) websites every day. Providing a set of core questions will ease the question overload problem. These core questions should cover the main content of the original question set, have low internal redundancy and follow a distribution consistent with the original set. The paper aims to discuss these issues.

Design/methodology/approach

In the paper, a method named QueExt for extracting core questions is proposed. First, questions are modeled using a biterm topic model. Then, these questions are clustered based on particle swarm optimization (PSO). With the clustering results, the number of core questions to be extracted from each cluster can be determined. Afterwards, a multi-objective PSO algorithm is proposed to extract the core questions. Both PSO algorithms are integrated with operators from genetic algorithms to avoid local optima.
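The PSO optimizer at the core of the method can be sketched in its generic form. This minimizes a stand-in objective (the sphere function) rather than the paper's clustering criterion, and all parameter values are illustrative, not taken from the paper.

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0)):
    """Minimal PSO: each particle tracks its personal best position and
    is pulled toward both that and the swarm's global best."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

random.seed(0)  # reproducible run
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
```

The paper's variants add genetic-algorithm operators and a multi-objective fitness on top of this basic update rule.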

Findings

Extensive experiments on real data collected from the famous CQA website Zhihu have been conducted, and the experimental results demonstrate its superior performance over other benchmark methods.

Research limitations/implications

The proposed method provides new insight into, and enriches research on, information overload in CQA. It performs better than other methods in extracting core short text documents, and thus provides a better way to extract core data. PSO is applied in a novel way to selecting core questions, expanding research on the application of the PSO model. The study also contributes to research on PSO-based clustering: with the integration of K-means++, the number of clusters, a key parameter, is optimized.

Originality/value

A novel core question extraction method for CQA is proposed, which provides an efficient way to alleviate question overload. The PSO model is extended and newly applied to selecting core questions, and it is integrated with the K-means++ method to optimize the number of clusters, the key parameter in PSO-based text clustering. This provides a new way to cluster texts.

Details

Data Technologies and Applications, vol. 53 no. 4
Type: Research Article
ISSN: 2514-9288


Book part
Publication date: 26 January 2011

Kenneth Farrall

Abstract

The Nationwide Suspicious Activity Reporting Initiative (NSI) is the focal point of the Information Sharing Environment (ISE), a radical reformulation of policies governing government intelligence activities within US borders. In the wake of the September 11th attacks, long-standing informational norms for the production, use, and circulation of domestic intelligence records containing personal information are being replaced with far less restrictive norms, altering a status quo that had been in effect since the mid-1970s. Although the NSI represents an unprecedented expansion of human resources dedicated to the collection and production of domestic intelligence, it is not well known in the privacy advocacy community. This chapter considers these and other terms in the context of relevant US law and policy, including the Privacy Act of 1974, the E-Government Act of 2002, Executive Order 12333, and 28 CFR Part 23. In addition to describing the federal (ISE-SAR) standard, the chapter examines the critical role of guidance in the logic of suspicious activity report (SAR) production, and the problematic role finished ISE-SARs seem to play in the matrix of federal and state-level watch lists. The program, if not properly regulated, could pose a considerable threat to personal privacy and to the life chances and self-determination of all US persons. The chapter considers this threat in terms of Nissenbaum's (2010) "contextual integrity," a theory of context-relative informational norms.
