Search results

1 – 9 of 9
Article
Publication date: 25 October 2022

Victor Diogho Heuer de Carvalho and Ana Paula Cabral Seixas Costa

This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is…

Abstract

Purpose

This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is supporting analyses, so security authorities can make appropriate decisions about their actions.

Design/methodology/approach

The corpora were obtained through web scraping from a newspaper's website and tweets from a Brazilian metropolitan region. Natural language processing was applied considering: text cleaning, lemmatization, summarization, part-of-speech and dependencies parsing, named entities recognition, and topic modeling.

Findings

Several results were obtained based on the methodology used, highlighting some: an example of a summarization using an automated process; dependency parsing; the most common topics in each corpus; the forty named entities and the most common slogans were extracted, highlighting those linked to public security.

Research limitations/implications

Some critical tasks were identified for the research perspective, related to the applied methodology: the treatment of noise from obtaining news on their source websites, passing through textual elements quite present in social network posts such as abbreviations, emojis/emoticons, and even writing errors; the treatment of subjectivity, to eliminate noise from irony and sarcasm; the search for authentic news of issues within the target domain. All these tasks aim to improve the process to enable interested authorities to perform accurate analyses.

Practical implications

The corpora dedicated to the public security domain enable several analyses, such as mining public opinion on security actions in a given location; understanding criminals' behaviors reported in the news or even on social networks and drawing their attitudes timeline; detecting movements that may cause damage to public property and people welfare through texts from social networks; extracting the history and repercussions of police actions, crossing news with records on social networks; among many other possibilities.

Originality/value

The work on behalf of the corpora reported in this text represents one of the first initiatives to create textual bases in Portuguese, dedicated to Brazil's specific public security domain.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Keywords

Article
Publication date: 17 July 2020

It has come to the attention of the publisher that the article Zeroual, I. and Lakhouana, A. (2020), “MulTed: a multilingual aligned and tagged parallel corpus”, published in…

Abstract

It has come to the attention of the publisher that the article Zeroual, I. and Lakhouana, A. (2020), “MulTed: a multilingual aligned and tagged parallel corpus”, published in Applied Computing and Informatics, https://doi.org/10.1016/ACI-j.aci.2018.12.003 was published twice due to a production error while onboarding the journal. The original article can be seen here: https://doi.org/10.1016/j.aci.2018.12.003

Details

Applied Computing and Informatics, vol. no.
Type: Research Article
ISSN: 2210-8327

Article
Publication date: 1 April 2024

Xiaoxian Yang, Zhifeng Wang, Qi Wang, Ke Wei, Kaiqi Zhang and Jiangang Shi

This study aims to adopt a systematic review approach to examine the existing literature on law and LLMs.It involves analyzing and synthesizing relevant research papers, reports…

Abstract

Purpose

This study aims to adopt a systematic review approach to examine the existing literature on law and LLMs.It involves analyzing and synthesizing relevant research papers, reports and scholarly articles that discuss the use of LLMs in the legal domain. The review encompasses various aspects, including an analysis of LLMs, legal natural language processing (NLP), model tuning techniques, data processing strategies and frameworks for addressing the challenges associated with legal question-and-answer (Q&A) systems. Additionally, the study explores potential applications and services that can benefit from the integration of LLMs in the field of intelligent justice.

Design/methodology/approach

This paper surveys the state-of-the-art research on law LLMs and their application in the field of intelligent justice. The study aims to identify the challenges associated with developing Q&A systems based on LLMs and explores potential directions for future research and development. The ultimate goal is to contribute to the advancement of intelligent justice by effectively leveraging LLMs.

Findings

To effectively apply a law LLM, systematic research on LLM, legal NLP and model adjustment technology is required.

Originality/value

This study contributes to the field of intelligent justice by providing a comprehensive review of the current state of research on law LLMs.

Details

International Journal of Web Information Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 25 July 2023

Aida Khakimova, Oleg Zolotarev and Sanjay Kaushal

Effective communication is crucial in the medical field where different stakeholders use various terminologies to describe and classify healthcare concepts such as ICD, SNOMED CT…

Abstract

Purpose

Effective communication is crucial in the medical field where different stakeholders use various terminologies to describe and classify healthcare concepts such as ICD, SNOMED CT, UMLS and MeSH, but the problem of polysemy can make natural language processing difficult. This study explores the contextual meanings of the term “pattern” in the biomedical literature, compares them to existing definitions, annotates a corpus for use in machine learning and proposes new definitions of terms such as “Syndrome, feature” and “pattern recognition.”

Design/methodology/approach

Entrez API was used to retrieve articles form PubMed for the study which assembled a corpus of 398 articles using a search query for the ambiguous term “pattern” in the titles or abstracts. The python NLTK library was used to extract the terms and their contexts, and an expert check was carried out. To understand the various meanings of the term, the contextual environment was analyzed by extracting the surrounding words of the term. The expert determined the appropriate size of the context for analysis to gain a more nuanced understanding of the different meanings of the term pattern.

Findings

The study found that the categories of meanings of the term “pattern” are broader in biomedical publications than in common definitions, and new categories have been emerging from the term's use in the biomedical field. The study highlights the importance of annotated corpora in advancing natural language processing techniques and provides valuable insights into the nuances of biomedical language.

Originality/value

The study's findings demonstrate the importance of exploring contextual meanings and proposing new definitions of terms in the biomedical field to improve natural language processing techniques.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Keywords

Open Access
Article
Publication date: 29 June 2022

Ibtissam Touahri

This paper purposed a multi-facet sentiment analysis system.

Abstract

Purpose

This paper purposed a multi-facet sentiment analysis system.

Design/methodology/approach

Hence, This paper uses multidomain resources to build a sentiment analysis system. The manual lexicon based features that are extracted from the resources are fed into a machine learning classifier to compare their performance afterward. The manual lexicon is replaced with a custom BOW to deal with its time consuming construction. To help the system run faster and make the model interpretable, this will be performed by employing different existing and custom approaches such as term occurrence, information gain, principal component analysis, semantic clustering, and POS tagging filters.

Findings

The proposed system featured by lexicon extraction automation and characteristics size optimization proved its efficiency when applied to multidomain and benchmark datasets by reaching 93.59% accuracy which makes it competitive to the state-of-the-art systems.

Originality/value

The construction of a custom BOW. Optimizing features based on existing and custom feature selection and clustering approaches.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Article
Publication date: 27 February 2023

Dilawar Ali, Kenzo Milleville, Steven Verstockt, Nico Van de Weghe, Sally Chambers and Julie M. Birkholz

Historical newspaper collections provide a wealth of information about the past. Although the digitization of these collections significantly improves their accessibility, a large…

Abstract

Purpose

Historical newspaper collections provide a wealth of information about the past. Although the digitization of these collections significantly improves their accessibility, a large portion of digitized historical newspaper collections, such as those of KBR, the Royal Library of Belgium, are not yet searchable at article-level. However, recent developments in AI-based research methods, such as document layout analysis, have the potential for further enriching the metadata to improve the searchability of these historical newspaper collections. This paper aims to discuss the aforementioned issue.

Design/methodology/approach

In this paper, the authors explore how existing computer vision and machine learning approaches can be used to improve access to digitized historical newspapers. To do this, the authors propose a workflow, using computer vision and machine learning approaches to (1) provide article-level access to digitized historical newspaper collections using document layout analysis, (2) extract specific types of articles (e.g. feuilletons – literary supplements from Le Peuple from 1938), (3) conduct image similarity analysis using (un)supervised classification methods and (4) perform named entity recognition (NER) to link the extracted information to open data.

Findings

The results show that the proposed workflow improves the accessibility and searchability of digitized historical newspapers, and also contributes to the building of corpora for digital humanities research. The AI-based methods enable automatic extraction of feuilletons, clustering of similar images and dynamic linking of related articles.

Originality/value

The proposed workflow enables automatic extraction of articles, including detection of a specific type of article, such as a feuilleton or literary supplement. This is particularly valuable for humanities researchers as it improves the searchability of these collections and enables corpora to be built around specific themes. Article-level access to, and improved searchability of, KBR's digitized newspapers are demonstrated through the online tool (https://tw06v072.ugent.be/kbr/).

Article
Publication date: 5 October 2022

Mahvia Gull, Muhammad Aqeel, Aniqa Kanwal, Kamran Khan and Tanvir Akhtar

Despite the fact that shame is recognized as a significant factor in clinical encounters, it is under-recognized, under-researched and under-theorized in health prevention…

Abstract

Purpose

Despite the fact that shame is recognized as a significant factor in clinical encounters, it is under-recognized, under-researched and under-theorized in health prevention, assessment and cross-cultural contexts. Thus, this study aims to investigate the psychometric properties of the most widely used scale, the “Other as Shamer Scale” (OAS), to assess the risk and proclivities of external shame in adults. As in health care, there is a barrier between what is known through research in one culture and what is acceptable in practice in another culture.

Design/methodology/approach

The Urdu version was prepared using the standard back-translation method, and the study was conducted from June 2021 to January 2022. The translation and adaptation were completed in four steps: forward translation, adaptation and translation, back translation, committee approach and cross-language validation. The sample, selected through the purposive sampling method, is comprised of 200 adults (men = 100 and women = 100), with an age range of 18–60 years (M = 28, SD = 5.5), spanning all stages of life. The Cronbach's alpha reliability and factorial validity of the OAS were assessed through confirmatory factor analysis and Pearson correlation analyses. Internal consistency and test–retest reliability (at a two-week interval) were used to evaluate the reliability. Statistical analyses were performed using Statistical Package for Social Sciences (version 22) software.

Findings

Preliminary analysis revealed that the overall instrument had good internal consistency (Urdu OAS a = 0.91; English OAS a = 0.92) as well as test–retest correlation coefficients for 15 days (r = 0.88). The factor loading of all items ranged from 0.69 to 0.9, which explained the significant level and indicated the model's overall goodness of fit.

Originality/value

Findings suggest that this scale has significant psychometric properties and the potential to be used as a valid, reliable and cost-effective clinical and research instrument. This study contributes to scientific knowledge and helps to develop and test indigenous cross-cultural instruments that can be used to examine external shame in Pakistani people.

Details

International Journal of Human Rights in Healthcare, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2056-4902

Keywords

Article
Publication date: 16 February 2024

Khameel B. Mustapha, Eng Hwa Yap and Yousif Abdalla Abakr

Following the recent rise in generative artificial intelligence (GenAI) tools, fundamental questions about their wider impacts have started to reverberate around various…

Abstract

Purpose

Following the recent rise in generative artificial intelligence (GenAI) tools, fundamental questions about their wider impacts have started to reverberate around various disciplines. This study aims to track the unfolding landscape of general issues surrounding GenAI tools and to elucidate the specific opportunities and limitations of these tools as part of the technology-assisted enhancement of mechanical engineering education and professional practices.

Design/methodology/approach

As part of the investigation, the authors conduct and present a brief scientometric analysis of recently published studies to unravel the emerging trend on the subject matter. Furthermore, experimentation was done with selected GenAI tools (Bard, ChatGPT, DALL.E and 3DGPT) for mechanical engineering-related tasks.

Findings

The study identified several pedagogical and professional opportunities and guidelines for deploying GenAI tools in mechanical engineering. Besides, the study highlights some pitfalls of GenAI tools for analytical reasoning tasks (e.g., subtle errors in computation involving unit conversions) and sketching/image generation tasks (e.g., poor demonstration of symmetry).

Originality/value

To the best of the authors’ knowledge, this study presents the first thorough assessment of the potential of GenAI from the lens of the mechanical engineering field. Combining scientometric analysis, experimentation and pedagogical insights, the study provides a unique focus on the implications of GenAI tools for material selection/discovery in product design, manufacturing troubleshooting, technical documentation and product positioning, among others.

Details

Interactive Technology and Smart Education, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1741-5659

Keywords

Article
Publication date: 30 June 2023

Ruan Wang, Jun Deng, Xinhui Guan and Yuming He

With the development of data mining technology, diverse and broader domain knowledge can be extracted automatically. However, the research on applying knowledge mapping and data…

159

Abstract

Purpose

With the development of data mining technology, diverse and broader domain knowledge can be extracted automatically. However, the research on applying knowledge mapping and data visualization techniques to genealogical data is limited. This paper aims to fill this research gap by providing a systematic framework and process guidance for practitioners seeking to uncover hidden knowledge from genealogy.

Design/methodology/approach

Based on a literature review of genealogy's current knowledge reasoning research, the authors constructed an integrated framework for knowledge inference and visualization application using a knowledge graph. Additionally, the authors applied this framework in a case study using “Manchu Clan Genealogy” as the data source.

Findings

The case study shows that the proposed framework can effectively decompose and reconstruct genealogy. It demonstrates the reasoning, discovery, and web visualization application process of implicit information in genealogy. It enhances the effective utilization of Manchu genealogy resources by highlighting the intricate relationships among people, places, and time entities.

Originality/value

This study proposed a framework for genealogy knowledge reasoning and visual analysis utilizing a knowledge graph, including five dimensions: the target layer, the resource layer, the data layer, the inference layer, and the application layer. It helps to gather the scattered genealogy information and establish a data network with semantic correlations while establishing reasoning rules to enable inference discovery and visualization of hidden relationships.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Keywords

1 – 9 of 9