Search results

1 – 10 of 94
Article
Publication date: 25 October 2022

Victor Diogho Heuer de Carvalho and Ana Paula Cabral Seixas Costa

Abstract

Purpose

This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is supporting analyses, so security authorities can make appropriate decisions about their actions.

Design/methodology/approach

The corpora were obtained through web scraping from a newspaper's website and tweets from a Brazilian metropolitan region. Natural language processing was applied considering: text cleaning, lemmatization, summarization, part-of-speech and dependencies parsing, named entities recognition, and topic modeling.
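The text-cleaning step named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `clean_tweet`, the regular expressions and the sample tweet are all hypothetical, and the sketch assumes that cleaning means dropping URLs and user mentions, keeping hashtag words, collapsing whitespace and lowercasing.

```python
import re

# Hypothetical cleaning rules; the article does not list the exact rules it used.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")

def clean_tweet(text):
    """Remove URLs and mentions, keep hashtag words, normalize spaces and case."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    return SPACE_RE.sub(" ", text).strip().lower()

# Invented sample tweet, for illustration only.
print(clean_tweet("Assalto no #centro agora! @usuario https://example.com/x"))
```

Lemmatization, parsing and named entity recognition would normally follow this step in an NLP library; they are left out here because the abstract does not say which tools were used.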

Findings

Several results were obtained with this methodology, including an example of automated summarization; dependency parsing output; the most common topics in each corpus; and forty named entities and the most common slogans, with emphasis on those linked to public security.

Research limitations/implications

Several critical tasks were identified for future research on the applied methodology: handling noise introduced when news is collected from source websites, as well as textual elements common in social network posts, such as abbreviations, emojis/emoticons and writing errors; handling subjectivity, to eliminate noise from irony and sarcasm; and finding authentic news on issues within the target domain. All these tasks aim to improve the process so that interested authorities can perform accurate analyses.

Practical implications

The corpora dedicated to the public security domain enable several analyses, such as mining public opinion on security actions in a given location; understanding criminals' behaviors reported in the news or on social networks and drawing a timeline of their attitudes; detecting, through texts from social networks, movements that may damage public property and people's welfare; and extracting the history and repercussions of police actions by cross-referencing news with records on social networks, among many other possibilities.

Originality/value

The work on the corpora reported in this text represents one of the first initiatives to create textual bases in Portuguese dedicated to Brazil's specific public security domain.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 17 July 2020

Abstract

It has come to the attention of the publisher that the article Zeroual, I. and Lakhouana, A. (2020), “MulTed: a multilingual aligned and tagged parallel corpus”, published in Applied Computing and Informatics, https://doi.org/10.1016/ACI-j.aci.2018.12.003 was published twice due to a production error while onboarding the journal. The original article can be seen here: https://doi.org/10.1016/j.aci.2018.12.003

Details

Applied Computing and Informatics, vol. no.
Type: Research Article
ISSN: 2210-8327

Article
Publication date: 1 April 2024

Xiaoxian Yang, Zhifeng Wang, Qi Wang, Ke Wei, Kaiqi Zhang and Jiangang Shi

Abstract

Purpose

This study aims to adopt a systematic review approach to examine the existing literature on law and LLMs. It involves analyzing and synthesizing relevant research papers, reports and scholarly articles that discuss the use of LLMs in the legal domain. The review encompasses various aspects, including an analysis of LLMs, legal natural language processing (NLP), model tuning techniques, data processing strategies and frameworks for addressing the challenges associated with legal question-and-answer (Q&A) systems. Additionally, the study explores potential applications and services that can benefit from the integration of LLMs in the field of intelligent justice.

Design/methodology/approach

This paper surveys the state-of-the-art research on law LLMs and their application in the field of intelligent justice. The study aims to identify the challenges associated with developing Q&A systems based on LLMs and explores potential directions for future research and development. The ultimate goal is to contribute to the advancement of intelligent justice by effectively leveraging LLMs.

Findings

To effectively apply a law LLM, systematic research on LLM, legal NLP and model adjustment technology is required.

Originality/value

This study contributes to the field of intelligent justice by providing a comprehensive review of the current state of research on law LLMs.

Details

International Journal of Web Information Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 28 November 2023

Mohamad Javad Baghiat Esfahani and Saeed Ketabi

Abstract

Purpose

This study attempts to evaluate the effect of the corpus-based inductive teaching approach with multiple academic corpora (PICA, CAEC and Oxford Corpus of Academic English) and conventional deductive teaching approach (i.e., multiple-choice items, filling the gap, matching and underlining) on learning academic collocations by Iranian advanced EFL learners (students learning English as a foreign language).

Design/methodology/approach

This is a quasi-experimental, quantitative and qualitative study.

Findings

The results showed that the experimental group significantly outperformed the control group. The experimental group also shared their perceptions of the advantages and disadvantages of the corpus-assisted language teaching approach.

Originality/value

Despite growing progress in language pedagogy, methodologies and language curriculum design, there are still many teachers who experience poor performance in their students' vocabulary, whether in comprehension or production. In Iran, for example, even though mandatory English education begins at the age of 13, which is junior and senior high school, students still have serious problems in language production and comprehension when they reach university levels.

Details

Journal of Applied Research in Higher Education, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-7003

Article
Publication date: 12 September 2022

Adah-Kole Onjewu, Razieh Sadraei and Vahid Jafari-Sadeghi

Abstract

Purpose

In spite of wide civic and academic interest in obesity, there are no bibliometric records of this issue in the marketing corpus. Thus, this inquiry is conceived to address this shortcoming with a bibliometric analysis of Scopus indexed articles published on the subject.

Design/methodology/approach

The analysis followed a five-step science mapping approach of study design, data collection, data analysis, data visualisation and data interpretation. R programming software was used to review 88 peer reviewed journals published between 1987 and 2021.

Findings

A sizable stream of literature exploring obesity has accrued in the marketing area as authors have drawn parallels between the influence of persuasive communication and advertising on human wellbeing and child health. The United States of America is found to be by far the country with the highest number of publications on obesity, followed by Australia and the United Kingdom. The topic dendrogram indicates two strands of obesity discourse: (1) social and policy intervention opportunities and (2) the effects on social groups in the population.

Research limitations/implications

This review will shape future enquiries investigating obesity. Beyond the focus on children, males and females, an emerging focus on cola, ethics, food waste, milk, policy-making and students is highlighted.

Originality/value

This is the first bibliometric review of obesity in the marketing literature. This is especially timely for weighing up the utility of research aimed at understanding and reporting the trends, influences and role of stakeholders in addressing obesity.

Details

EuroMed Journal of Business, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1450-2194

Article
Publication date: 25 July 2023

Aida Khakimova, Oleg Zolotarev and Sanjay Kaushal

Abstract

Purpose

Effective communication is crucial in the medical field where different stakeholders use various terminologies to describe and classify healthcare concepts such as ICD, SNOMED CT, UMLS and MeSH, but the problem of polysemy can make natural language processing difficult. This study explores the contextual meanings of the term “pattern” in the biomedical literature, compares them to existing definitions, annotates a corpus for use in machine learning and proposes new definitions of terms such as “Syndrome, feature” and “pattern recognition.”

Design/methodology/approach

The Entrez API was used to retrieve articles from PubMed; a search query for the ambiguous term “pattern” in titles or abstracts yielded a corpus of 398 articles. The Python NLTK library was used to extract the terms and their contexts, and an expert check was carried out. To understand the various meanings of the term, its contextual environment was analyzed by extracting the words surrounding it. The expert determined the appropriate context size for analysis, to gain a more nuanced understanding of the different meanings of the term “pattern.”
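The context extraction described above can be sketched as a keyword-in-context window. This is an illustrative sketch only: the study used NLTK, whereas this version uses a plain regular-expression tokenizer so that it has no dependencies. The function name `context_windows`, the window size and the sample sentences are assumptions, not taken from the article.

```python
import re

def context_windows(text, term="pattern", window=3):
    """Return (left_words, keyword, right_words) for each occurrence of `term`."""
    # Simplified tokenizer (an assumption); the study itself relied on NLTK.
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits

# Invented sample text, for illustration only.
sample = ("The rash showed a reticular pattern on the skin. "
          "Pattern recognition receptors were activated.")
for left, kw, right in context_windows(sample, window=2):
    print(" ".join(left), f"[{kw}]", " ".join(right))
```

The `window` parameter plays the role of the context size that the expert chose in the study. Note that this simple version lets a window cross sentence boundaries, which a real analysis may want to prevent.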

Findings

The study found that the categories of meanings of the term “pattern” are broader in biomedical publications than in common definitions, and new categories have been emerging from the term's use in the biomedical field. The study highlights the importance of annotated corpora in advancing natural language processing techniques and provides valuable insights into the nuances of biomedical language.

Originality/value

The study's findings demonstrate the importance of exploring contextual meanings and proposing new definitions of terms in the biomedical field to improve natural language processing techniques.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 31 October 2023

Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang

Abstract

Purpose

The number of construction dispute cases has grown rapidly in recent years. Effective exploration and management of construction contract risk can directly improve performance across the project life cycle. Missing clauses may cause a contract to deviate from standard contracts, and if a contract modified by the owner omits key clauses, the resulting disputes may lead to contractors paying substantial compensation. The identification of missing clauses in construction project contracts has so far relied heavily on manual review, which is inefficient and highly restricted by personnel experience. Existing intelligent tools only support contract query and storage, so raising the level of intelligence in contract clause management is urgent. Therefore, this paper aims to propose an intelligent method to detect missing clauses in construction project contracts based on natural language processing (NLP) and deep learning technology.

Design/methodology/approach

A complete classification scheme of contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured numerical vectors. Following the initial categorization, a multi-label classification of long-text construction contract clauses is designed to preliminarily identify whether clause labels are missing. After this multi-label detection of missing clauses, the authors implement a clause similarity algorithm that integrates the MatchPyramid model, which borrows ideas from image recognition, with BERT to identify substantial content missing from the contract clauses.

Findings

A total of 1,322 construction project contracts were tested. Results showed that the accuracy of multi-label classification reached 93%, the accuracy of similarity matching reached 83%, and the recall and mean F1 of both exceeded 0.7. To some extent, the experimental results verify the feasibility of intelligently detecting contract risk through the NLP-based method.
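For readers unfamiliar with how recall and F1 are computed for multi-label output, the following is a minimal sketch. It assumes micro-averaging over label sets, which the abstract does not specify. The function name `micro_prf` and the clause labels are hypothetical and do not come from the paper's data.

```python
def micro_prf(true_labels, pred_labels):
    """Micro-averaged precision, recall and F1 over per-clause label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not present
        fn += len(truth - pred)   # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical clause labels, for illustration only.
truth = [{"payment", "liability"}, {"termination"}, {"payment"}]
pred = [{"payment"}, {"termination", "liability"}, {"payment"}]
print(micro_prf(truth, pred))
```

With three true positives, one false positive and one false negative, precision, recall and F1 all come out to 0.75 in this toy case.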

Originality/value

NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, the most commonly used approaches to risk detection in construction contract clauses are predominantly rule-based and struggle with intricate and lengthy engineering contracts. This paper introduces an NLP technique based on deep learning that reduces manual intervention and can autonomously identify and tag types of contractual deficiencies, in line with the growing complexity anticipated in future construction contracts. Moreover, this method can recognize long contract clause texts. Finally, the approach is versatile: users simply need to adjust parameters such as segmentation for the language concerned to detect omissions in contract clauses in other languages.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Open Access
Article
Publication date: 1 February 2024

Christian Schwägerl, Peter Stücheli-Herlach, Philipp Dreesen and Julia Krasselt

Abstract

Purpose

This study operationalizes risks in stakeholder dialog (SD). It conceptualizes SD as co-produced organizational discourse and examines the capacities of organizers' and stakeholders' practices to create a shared understanding of an organization’s risks to their mutual benefit. The meetings and online forum of a German public service media (PSM) organization were used as a case study.

Design/methodology/approach

The authors applied corpus-driven linguistic discourse analysis (topic modeling) to analyze citizens' (n = 2,452) forum posts (n = 14,744). Conversation analysis was used to examine video-recorded online meetings.

Findings

Organizers suspended actors' reciprocity in meetings. In the forums, topics emerged autonomously. Citizens' articulation of their identities was more diverse than the categories the organizer provided, and organizers did not respond to the autonomous emergence of contextualizations of citizens' perceptions of PSM performance in relation to their identities. The results suggest that risks arise from interactionally achieved occasions that prevent reasoned agreement and from actors' practices, which constituted autonomous discursive formations of topics and identities in the forums.

Originality/value

This study disentangles actors' practices, mutuality orientation and risk enactment during SD. It advances the methodological knowledge of strategic communication research on SD, utilizing social constructivist research methods to examine the contingencies of organization-stakeholder interaction in SD.

Article
Publication date: 2 August 2022

Zhongbao Liu and Wenjuan Zhao

Abstract

Purpose

Research on structure function recognition mainly concentrates on identifying a specific part of academic literature and its applicability from a multidisciplinary perspective. A specific part of academic literature, such as a sentence, a paragraph or chapter content, is also called a level of academic literature in this paper. There are few comparative studies on the relationship between models, disciplines and levels in the process of structure function recognition. In view of this, comparative research on structure function recognition based on deep learning has been conducted in this paper.

Design/methodology/approach

An experimental corpus, including the academic literature of traditional Chinese medicine, library and information science, computer science, environmental science and phytology, was constructed. Deep learning models such as convolutional neural networks (CNN), long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) were used. Comparative experiments on structure function recognition were conducted with these deep learning models from a multilevel perspective.

Findings

The experimental results showed that (1) the BERT model performed best, with F1 values of 78.02, 89.41 and 94.88% at the levels of sentence, paragraph and chapter content, respectively. (2) The deep learning models performed better on the academic literature of traditional Chinese medicine than on other disciplines in most cases; e.g. the F1 values of CNN, LSTM and BERT reached 71.14, 69.96 and 78.02%, respectively, at the level of sentence. (3) The deep learning models performed better at the level of chapter content than at other levels, with maximum F1 values of 91.92, 74.90 and 94.88% for CNN, LSTM and BERT, respectively. Furthermore, the confusion matrix of the recognition results on the academic literature was introduced to find the reasons for misrecognition.

Originality/value

This paper may inspire other research on structure function recognition, and provide a valuable reference for the analysis of influencing factors.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 16 April 2024

Himani Sharma, Varsha Jain, Emmanuel Mogaji and Anantha S. Babbilid

Abstract

Purpose

Proponents of micro-credentials envision them as vehicles for upskilling or re-skilling individuals. The study examines how integrating micro-credentials in the higher education ecosystem enhances employability. It aims to offer insights from the perspective of stakeholders who may benefit from these credentials at an institutional or individual level.

Design/methodology/approach

Online in-depth interviews are conducted with 65 participants from India, Nigeria, the United Arab Emirates and the United Kingdom to explore how micro-credentials can be a valuable addition to the higher education ecosystem. A multi-stakeholder approach is adopted to collect data.

Findings

The analysis highlights two possible methods of integrating micro-credentials into the higher education ecosystem. First, micro-credentials-driven courses can be offered using a blended approach that provides a flexible learning path. Second, there is also the possibility of wide-scale integration of micro-credentials as an outcome of standalone online programs. However, the effectiveness of such programs is driven by enablers like student profiles, standardization and the dynamics of the labor market. Finally, the study stipulates that micro-credentials can enhance employability.

Originality/value

The study's findings suggest that, for successful integration of micro-credentials, an operational understanding of micro-credentials, their enablers and strategic deliberation are critical in higher education. Institutions must identify the determinants, address technological limitations and select a suitable delivery mode to accelerate integration. However, micro-credentials can augment employability, considering the increasing emphasis on lifelong learning. An overview of the findings is presented through a comprehensive framework.

Details

International Journal of Educational Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0951-354X