Search results

1 – 10 of 138
Open Access
Article
Publication date: 17 July 2020

Imad Zeroual and Abdelhak Lakhouaja

Recently, data-driven approaches have increasingly demanded multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel…


Abstract

Recently, data-driven approaches have increasingly demanded multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel corpora has become the focus of many Natural Language Processing (NLP) scientific groups. Unlike monolingual corpora, the number of available multilingual parallel corpora is limited. In this paper, MulTed, a corpus of subtitles extracted from TEDx talks, is introduced. It is multilingual, Part of Speech (PoS) tagged, and bilingually sentence-aligned with English as a pivot language. The corpus is designed for NLP applications in which sentence alignment, PoS tagging, and corpus size are influential, such as statistical machine translation, language recognition, and bilingual dictionary generation. Currently, the corpus contains subtitles covering 1100 talks available in over 100 languages. The subtitles are classified by topic, such as Business, Education, and Sport. For the PoS tagging, TreeTagger, a language-independent PoS tagger, is used; then, to make the PoS tagging maximally useful, a mapping to a universal common tagset is performed. Finally, we believe that making the MulTed corpus publicly available can be a significant contribution to the NLP and corpus linguistics literature, especially for under-resourced languages.
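As a hedged illustration of the tagset-mapping step mentioned above, the sketch below maps language-specific TreeTagger output to a universal tagset via a lookup table; the entries shown are a small, hypothetical English subset, not the mapping actually used for MulTed.

```python
# Minimal sketch: map language-specific TreeTagger tags to a universal tagset.
# The mapping below is a small, hypothetical English subset for illustration only.

TREETAGGER_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
    "IN": "ADP",
    "DT": "DET",
    "SENT": ".",
}

def map_to_universal(tagged_tokens):
    """Convert (token, TreeTagger tag) pairs to (token, universal tag) pairs."""
    return [(tok, TREETAGGER_TO_UNIVERSAL.get(tag, "X")) for tok, tag in tagged_tokens]

print(map_to_universal([("talks", "NNS"), ("inspire", "VB"), ("widely", "RB")]))
# [('talks', 'NOUN'), ('inspire', 'VERB'), ('widely', 'ADV')]
```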

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2210-8327

Keywords

Open Access
Article
Publication date: 19 December 2023

Qinxu Ding, Ding Ding, Yue Wang, Chong Guan and Bosheng Ding

The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive…


Abstract

Purpose

The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.

Design/methodology/approach

Drawing from an extensive corpus of 198 records published between 1996 and 2023, retrieved from the relevant academic database and encompassing journal articles, books, book chapters, conference papers and selected working papers, this study delves into the multifaceted world of LLM research. The authors employed the BERTopic algorithm, a recent advancement in topic modeling, to conduct a comprehensive analysis of the data after it had been meticulously cleaned and preprocessed. BERTopic leverages the power of transformer-based language models, such as bidirectional encoder representations from transformers (BERT), to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover valuable insights that might otherwise have remained obscure.
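For readers unfamiliar with the method, the following is a minimal sketch of how the BERTopic library is typically applied to a cleaned corpus; the placeholder data and parameters are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of BERTopic topic modeling on a cleaned document collection
# (pip install bertopic scikit-learn). The corpus here is a public placeholder,
# not the 198-record dataset analyzed in the paper.
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes")).data

topic_model = BERTopic(language="english", min_topic_size=10)  # illustrative parameters
topics, probs = topic_model.fit_transform(docs)                # embed, reduce, cluster, label

print(topic_model.get_topic_info())   # one row per discovered topic cluster
print(topic_model.get_topic(0))       # top terms of the largest topic
```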

Findings

The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.

Practical implications

This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.

Originality/value

This study stands out as the first to examine the evolution of LLMs across such a long time frame and such diverse disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLMs' evolution.

Details

Journal of Electronic Business & Digital Economics, vol. 3 no. 1
Type: Research Article
ISSN: 2754-4214

Keywords

Open Access
Article
Publication date: 13 August 2020

Mariam AlKandari and Imtiaz Ahmad

Solar power forecasting will have a significant impact on the future of large-scale renewable energy plants. Predicting photovoltaic power generation depends heavily on climate…


Abstract

Solar power forecasting will have a significant impact on the future of large-scale renewable energy plants. Predicting photovoltaic power generation depends heavily on climate conditions, which fluctuate over time. In this research, we propose a hybrid model that combines machine-learning methods with the Theta statistical method for more accurate prediction of future solar power generation from renewable energy plants. The machine-learning models include long short-term memory (LSTM), gated recurrent unit (GRU), AutoEncoder LSTM (Auto-LSTM) and a newly proposed Auto-GRU. To enhance the accuracy of the proposed Machine learning and Statistical Hybrid Model (MLSHM), we employ two diversity techniques, i.e. structural diversity and data diversity. To combine the predictions of the ensemble members in the proposed MLSHM, we exploit four combining methods: a simple averaging approach, weighted averaging using a linear approach and a non-linear approach, and combination through variance using the inverse approach. The proposed MLSHM scheme was validated on two real time-series datasets, namely Shagaya in Kuwait and Cocoa in the USA. The experiments show that the proposed MLSHM, using all the combination methods, achieved higher accuracy than the predictions of the traditional individual models. The results demonstrate that a hybrid model combining machine-learning methods with a statistical method outperformed a hybrid model that combines only machine-learning models without a statistical method.
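As a hedged sketch of the combining step, the snippet below expresses simple averaging and an inverse-error weighted averaging of member forecasts in NumPy; the member predictions, validation errors and weighting scheme are illustrative assumptions rather than the exact MLSHM formulation.

```python
# Minimal sketch of two ensemble-combination methods for member forecasts.
import numpy as np

# preds: array of shape (n_members, n_timesteps), one row per model
# (e.g. LSTM, GRU, Auto-LSTM, Auto-GRU). Values here are placeholders.
preds = np.array([
    [10.2, 11.0, 9.8],
    [10.5, 10.7, 9.9],
    [ 9.9, 11.2, 10.1],
])
val_errors = np.array([0.8, 0.5, 1.1])   # each member's validation error (placeholder)

# 1) Simple averaging: every member contributes equally.
simple_avg = preds.mean(axis=0)

# 2) Weighted averaging: weight each member by the inverse of its validation error
#    (one common inverse-error scheme, normalized to sum to one).
weights = (1.0 / val_errors) / (1.0 / val_errors).sum()
weighted_avg = weights @ preds

print(simple_avg, weighted_avg)
```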

Details

Applied Computing and Informatics, vol. 20 no. 3/4
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 17 July 2020

Mukesh Kumar and Palak Rehan

Social media networks like Twitter, Facebook and WhatsApp are the most commonly used media for sharing news and opinions and for staying in touch with peers. Messages on Twitter are…


Abstract

Social media networks like Twitter, Facebook and WhatsApp are the most commonly used media for sharing news and opinions and for staying in touch with peers. Messages on Twitter are limited to 140 characters. This has led users to create their own novel syntax in tweets to express more in fewer words. Free writing style, the use of URLs, markup syntax, inappropriate punctuation, ungrammatical structures, abbreviations, etc. make it harder to mine useful information from them. For each tweet, we can get an explicit time stamp, the name of the user, the social network the user belongs to, or even the GPS coordinates if the tweet is created with a GPS-enabled mobile device. With these features, Twitter is, by nature, a good resource for detecting and analyzing real-time events happening around the world. By using the speed and coverage of Twitter, we can detect events, i.e. sequences of important keywords being talked about, in a timely manner, which can be used in different applications like natural calamity relief support, earthquake relief support, product launches, suspicious activity detection, etc. The keyword detection process from Twitter can be seen as a two-step process: detection of keywords in raw text form (words as posted by the users) and keyword normalization (reforming the users' unstructured words into complete, meaningful English words). In this paper, a keyword detection technique based upon graphs, spanning trees and the PageRank algorithm is proposed. A text normalization technique based upon a hybrid approach using Levenshtein distance, the Double Metaphone algorithm and dictionary mapping is proposed to work upon the unstructured keywords produced by the proposed keyword detector. The proposed normalization technique is validated using the standard lexnorm 1.2 dataset. The proposed system is used to detect keywords from Twitter text posted in real time. The detected and normalized keywords are further validated against search engine results at a later time for detection of events.
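To make the normalization step concrete, the sketch below combines a slang dictionary with Levenshtein distance against a small word list; the dictionary entries and vocabulary are hypothetical, and the phonetic (Double Metaphone) stage of the proposed hybrid approach is omitted for brevity.

```python
# Minimal sketch of tweet-token normalization: dictionary mapping first,
# then nearest in-vocabulary word by Levenshtein (edit) distance.

SLANG_MAP = {"gr8": "great", "u": "you", "thx": "thanks"}          # hypothetical entries
VOCAB = {"earthquake", "relief", "support", "great", "launch"}     # hypothetical word list

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def normalize(token):
    token = token.lower()
    if token in SLANG_MAP:                     # exact dictionary mapping
        return SLANG_MAP[token]
    if token in VOCAB:                         # already a valid word
        return token
    return min(VOCAB, key=lambda w: levenshtein(token, w))  # closest vocabulary word

print([normalize(t) for t in ["gr8", "erthquake", "relif"]])
# ['great', 'earthquake', 'relief']
```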

Details

Applied Computing and Informatics, vol. 17 no. 2
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 19 July 2022

Shreyesh Doppalapudi, Tingyan Wang and Robin Qiu

Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and technical to most people, which is one of the most challenging…


Abstract

Purpose

Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and technical to most people, which is one of the most challenging obstacles in health information dissemination to consumers by healthcare providers. The authors aim to investigate how to leverage machine learning techniques to transform clinical notes of interest into understandable expressions.

Design/methodology/approach

The authors propose a natural language processing pipeline that is capable of extracting relevant information from long unstructured clinical notes and simplifying lexicons by replacing medical jargon and technical terms. In particular, the authors develop an unsupervised keyword-matching method to extract relevant information from clinical notes. To automatically evaluate the completeness of the extracted information, the authors perform a multi-label classification task on the relevant texts. To simplify the lexicon in the relevant text, the authors identify complex words using a sequence labeler and leverage transformer models to generate candidate words for substitution. The authors validate the proposed pipeline using 58,167 discharge summaries from critical care services.
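As a hedged illustration of the substitution step, a transformer fill-mask model can propose candidate replacements for a word flagged as complex by the sequence labeler; the model choice and example sentence below are assumptions for demonstration, not the authors' exact setup.

```python
# Minimal sketch: generate candidate substitutes for a complex term with a
# masked language model (pip install transformers).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")   # illustrative model choice

sentence = "The patient shows signs of [MASK] after surgery."  # 'hemorrhage' flagged as complex
candidates = fill_mask(sentence, top_k=5)

for c in candidates:
    print(c["token_str"], round(c["score"], 3))   # ranked candidate words for substitution
```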

Findings

The results show that the proposed pipeline can identify relevant information with high completeness and simplify complex expressions in clinical notes so that the converted notes have a high level of readability but a low degree of meaning change.

Social implications

The proposed pipeline can help healthcare consumers well understand their medical information and therefore strengthen communications between healthcare providers and consumers for better care.

Originality/value

An innovative pipeline approach is developed to address the health literacy problem confronted by healthcare providers and consumers in the ongoing digital transformation process in the healthcare industry.

Open Access
Article
Publication date: 6 March 2017

Zhuoxuan Jiang, Chunyan Miao and Xiaoming Li

Recent years have witnessed the rapid development of massive open online courses (MOOCs). With more and more courses being produced by instructors and joined by…


Abstract

Purpose

Recent years have witnessed the rapid development of massive open online courses (MOOCs). With more and more courses being produced by instructors and joined by learners all over the world, an unprecedented mass of educational resources has been aggregated. These resources include videos, subtitles, lecture notes, quizzes, etc. on the teaching side, and forum content, wikis, logs of learning behavior, logs of homework, etc. on the learning side. However, the data are both unstructured and diverse. To facilitate knowledge management and mining on MOOCs, extracting keywords from these resources is important. This paper aims to adapt state-of-the-art techniques to MOOC settings and evaluate their effectiveness on real data. In terms of practice, this paper also tries to answer, for the first time, the questions of to what extent MOOC resources can support keyword extraction models and how much human effort is required to make the models work well.

Design/methodology/approach

Based on which side generates the data, i.e. instructors or learners, the data are classified into teaching resources and learning resources, respectively. The approach used on teaching resources is based on machine-learning models with labels, while the approach used on learning resources is based on a graph model without labels.
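For the unlabeled, graph-based side, the sketch below shows a generic TextRank-style keyword ranking over forum text using co-occurrence edges and PageRank, which the approach described above broadly resembles; the posts and filtering heuristics are hypothetical.

```python
# Minimal sketch of unsupervised, graph-based keyword ranking (TextRank-style)
# over forum text, using co-occurrence edges and PageRank (pip install networkx).
import itertools
import networkx as nx

posts = [
    "gradient descent converges slowly on this assignment",
    "the assignment on gradient descent was confusing",
]  # hypothetical forum posts

graph = nx.Graph()
for post in posts:
    words = [w for w in post.lower().split() if len(w) > 3]   # crude token filtering
    for w1, w2 in itertools.combinations(set(words), 2):      # co-occurrence within a post
        weight = graph.get_edge_data(w1, w2, {"weight": 0})["weight"] + 1
        graph.add_edge(w1, w2, weight=weight)

scores = nx.pagerank(graph, weight="weight")                  # rank words by centrality
print(sorted(scores, key=scores.get, reverse=True)[:5])       # top candidate keywords
```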

Findings

From the teaching resources, the methods used by the authors can accurately extract keywords with only 10 per cent labeled data. The authors find a characteristic of the data: resources of various forms, e.g. subtitles and PPTs, should be considered separately because they support the models differently. From the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they can reflect the topics that are actively discussed in forums, from which instructors can get feedback. The authors implement two applications with the extracted keywords: generating a concept map and generating a learning path. The visual demos show that they have the potential to improve learning efficiency when integrated into a real MOOC platform.

Research limitations/implications

Conducting keyword extraction on MOOC resources is quite difficult because teaching resources are hard to obtain due to copyright. Also, getting labeled data is tough because expertise in the corresponding domain is usually required.

Practical implications

The experimental results support that MOOC resources are good enough for building keyword extraction models, and that an acceptable balance between human effort and model accuracy can be achieved.

Originality/value

This paper presents a pioneer study on keyword extraction on MOOC resources and obtains some new findings.

Details

International Journal of Crowd Science, vol. 1 no. 1
Type: Research Article
ISSN: 2398-7294

Keywords

Open Access
Article
Publication date: 26 July 2021

Yixin Zhang, Lizhen Cui, Wei He, Xudong Lu and Shipeng Wang

The behavioral decision-making of the digital-self is one of the important research topics in the network of crowd intelligence. The factors and mechanisms that affect…

Abstract

Purpose

The behavioral decision-making of the digital-self is one of the important research topics in the network of crowd intelligence. The factors and mechanisms that affect decision-making have attracted the attention of many researchers. Among the factors that influence decision-making, the mind of the digital-self plays an important role. Exploring the influence mechanism of the digital-self's mind on decision-making helps to understand the behaviors of the crowd intelligence network and improve transaction efficiency in the network of CrowdIntell.

Design/methodology/approach

In this paper, the authors use a behavioral pattern perception layer, a multi-aspect perception layer and a memory network enhancement layer to adaptively explore the mind of a digital-self and generate its mental representation from three aspects: external behavior, multi-aspect factors of the mind and memory units. The authors use these mental representations to assist behavioral decision-making.

Findings

The evaluation on real-world open data sets shows that the proposed method can model the mind and verify the influence of the mind on behavioral decisions, and that its performance is better than universal baseline methods for modeling user interest.

Originality/value

In general, the authors use the behaviors of the digital-self to mine and explore its mind, which is then used to assist the digital-self in making decisions and to promote transactions in the network of CrowdIntell. This work is one of the early attempts to use neural networks to model the mental representation of the digital-self.

Details

International Journal of Crowd Science, vol. 5 no. 2
Type: Research Article
ISSN: 2398-7294

Keywords

Open Access
Article
Publication date: 9 January 2024

Waleed Obaidallah Alsubhi

Effective translation has become essential for seamless cross-cultural communication in an era of global interconnectedness. Translation management systems (TMS) have redefined…

Abstract

Purpose

Effective translation has become essential for seamless cross-cultural communication in an era of global interconnectedness. Translation management systems (TMS) have redefined the translation landscape, revolutionizing project management and execution. This study examines the attitudes of translation agencies and professional translators towards integrating and utilizing TMS, with a specific focus on Saudi Arabia.

Design/methodology/approach

The study's design was based on a thorough mixed-methods strategy that purposefully combined quantitative and qualitative procedures to produce a rich set of findings. Through a survey involving 35 participants (both project managers and professional translators) and a series of interviews, this research explores the adoption of TMS, its perceived benefits, influencing factors and future considerations. This integrated approach sought to investigate the nuanced perceptions of Saudi translation companies and expert translators regarding TMS. By combining the broad scope of quantitative data with the depth of qualitative insights, the mixed-methods approach sought to overcome the limitations of each method, ultimately resulting in a holistic understanding of the multifaceted factors shaping attitudes within Saudi Arabia's unique translation landscape.

Findings

Based on questionnaires and interviews, the study shows that 80% of participants were familiar with TMS, and 57% had adopted it in their work. Benefits included enhanced project efficiency, collaboration and quality assurance. Factors influencing adoption encompassed cost, compatibility and resistance to change. The study further delved into participants' demographic profiles and years of experience, with a notable concentration in the 6–10 years range. TMS adoption was linked to improved translation processes, and participants expressed interest in AI integration and mobile compatibility. Deployment models favored cloud-based solutions, and compliance with industry standards was deemed vital. The findings underscore the evolving nature of TMS adoption in Saudi Arabia, with diverse attitudes shaped by cultural influences, technological compatibility and awareness.

Originality/value

This research provides a holistic and profound perspective on the integration of TMS, fostering a more comprehensive understanding of the opportunities, obstacles and potential pathways to success. As the translation landscape continues to evolve, the findings from this study will serve as a valuable compass guiding practitioners and researchers towards effectively harnessing the power of technology for enhanced translation outcomes.

Details

Saudi Journal of Language Studies, vol. 4 no. 1
Type: Research Article
ISSN: 2634-243X

Keywords

Open Access
Article
Publication date: 10 May 2021

Zakaryia Almahasees and Mutahar Qassem

The spread of Covid-19 has led to the closure of educational institutions worldwide, forcing academic institutions to find online platforms. The purpose of this paper is to…


Abstract

Purpose

The spread of Covid-19 has led to the closure of educational institutions worldwide, forcing academic institutions to find online platforms. The purpose of this paper is to accelerate the development of online learning (OL) environments within those institutions. The Covid-19 pandemic has revealed the extent of academic institutions' readiness to deal with such a crisis.

Design/methodology/approach

In this vein, the study aimed to identify the perceptions of translation instructors teaching translation courses online during Covid-19, using a questionnaire to explore the strategies and challenges of teaching and of assessing students' performance. The analysis revealed the instructors' reliance on Zoom and Microsoft Teams for offering virtual classes and on WhatsApp for communicating with students outside class.

Findings

The findings revealed the relative effectiveness of online education, although its efficacy is lower than that of face-to-face learning according to the respondents' views. It was also found that students faced difficulties in OL, which lie in adapting to the online environment, a lack of interaction and motivation, and deficient data connections. Even though online education could work as an aid during Covid-19, it could not replace face-to-face instruction. Based on the findings, the study recommends blended learning: combining online education with face-to-face instruction, i.e. face-to-face plus synchronous and asynchronous delivery, would result in a rigorous OL environment.

Originality/value

The research is genuine and there is no conflict of interest.

Details

PSU Research Review, vol. 6 no. 3
Type: Research Article
ISSN: 2399-1747

Keywords

Open Access
Article
Publication date: 7 April 2015

Yujun Cao, Xin Li, Zhixiong Zhang and Jianzhong Shang

This paper aims to clarify the prediction and compensation method for aeroplane assembly. It proposes modeling the assembly process. The paper aims to solve the precision…


Abstract

Purpose

This paper aims to clarify the prediction and compensation method for aeroplane assembly. It proposes modeling the assembly process and aims to solve the precision assembly of aeroplanes, which includes predicting the assembly variation and compensating for assembly errors.

Design/methodology/approach

The paper opted for an exploratory study using state space theory and small displacement torsor theory. An assembly variation propagation model is established, and the experimental data are obtained from a real small-aeroplane assembly process.
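As a generic, hedged illustration of the state-space idea, assembly variation is often propagated station by station as x_{k+1} = A_k x_k + B_k u_k + w_k; the matrices and deviations in the sketch below are hypothetical placeholders, not values identified from the authors' experiments.

```python
# Minimal sketch of state-space propagation of assembly variation across stations:
# x_{k+1} = A @ x_k + B @ u_k + w_k  (all matrices below are hypothetical placeholders).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # how existing variation carries to the next station
B = np.eye(2)                            # how locally introduced deviation enters the state
x = np.zeros(2)                          # accumulated variation (e.g. position, orientation)

for station, u in enumerate([np.array([0.02, 0.00]), np.array([0.01, 0.005])]):
    w = rng.normal(scale=1e-3, size=2)   # station noise
    x = A @ x + B @ u + w                # propagate variation through the station
    print(f"after station {station + 1}: {x}")
```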

Findings

The paper provides a prediction and compensation method for aeroplane assembly accuracy.

Originality/value

This paper fulfils an identified need to study how the assembly variation propagates in the assembly process.

Details

Assembly Automation, vol. 35 no. 2
Type: Research Article
ISSN: 0144-5154

Keywords
