Search results

1–10 of over 24,000
Article
Publication date: 8 February 2013

Fattane Zarrinkalam and Mohsen Kahani

The purpose of this paper is to propose a novel citation recommendation system that takes a text as input and recommends publications that it should cite. Its goal is to help…

Abstract

Purpose

The purpose of this paper is to propose a novel citation recommendation system that takes a text as input and recommends publications that it should cite. Its goal is to help researchers find related work. Further, this paper explores the effect of using relational features in addition to textual features on the quality of recommended citations.

Design/methodology/approach

To build the system, a new relational similarity measure is first proposed for calculating the relatedness of two publications. Then, a recommendation algorithm is presented that uses both relational and textual features to compute the semantic distance between each publication in a bibliographic dataset and the input text.
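As an illustration of this kind of approach, the sketch below scores candidate publications by blending textual similarity to the input text with relational similarity to the textually closest publications. It is a minimal reading of the abstract, not the authors' exact formulation; the TF-IDF features, the seed-anchoring step and all names are assumptions.

```python
# Minimal sketch: blend textual and relational similarity for citation
# recommendation. rel_sim is a precomputed pairwise relational similarity
# matrix (e.g. from shared references and citers); all details are assumed.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend(query_text, pub_texts, rel_sim, alpha=0.5, seed_k=5, top_n=10):
    vec = TfidfVectorizer(stop_words="english")
    doc_mat = vec.fit_transform(pub_texts)          # textual features
    text_sim = cosine_similarity(vec.transform([query_text]), doc_mat).ravel()

    # Use the textually closest publications as relational anchors,
    # since the input text itself has no citation links yet.
    seeds = np.argsort(text_sim)[-seed_k:]
    rel_score = rel_sim[:, seeds].mean(axis=1)

    combined = alpha * text_sim + (1 - alpha) * rel_score
    return np.argsort(combined)[::-1][:top_n]       # recommended indices
```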

Findings

The evaluation of the proposed system shows that combining relational features with textual features leads to better recommendations than relying on textual features alone. It also demonstrates that citation context plays an important role among the textual features. In addition, different relational features are found to contribute differently to the proposed similarity measure.

Originality/value

A new citation recommendation system is proposed that uses a novel semantic distance measure based on textual similarities and a new relational similarity concept. Another contribution of this paper is that it sheds more light on the importance of citation context in citation recommendation by providing further evidence through analysis of the results. In addition, a genetic algorithm is developed for assigning weights to the relational features in the similarity measure.
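A toy version of such a weight-tuning genetic algorithm is sketched below. The fitness function (e.g. recommendation recall on a validation set), population size and operators are all assumptions, not the authors' design.

```python
# Toy genetic algorithm for tuning relational-feature weights (illustrative).
import numpy as np

rng = np.random.default_rng(42)

def evolve_weights(fitness_fn, n_features=4, pop_size=30, generations=50):
    pop = rng.random((pop_size, n_features))
    for _ in range(generations):
        fitness = np.array([fitness_fn(w / (w.sum() + 1e-12)) for w in pop])
        parents = pop[np.argsort(fitness)[pop_size // 2:]]   # keep better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(n_features) < 0.5               # uniform crossover
            child = np.where(mask, a, b) + rng.normal(0, 0.1, n_features)
            children.append(np.clip(child, 0, None))          # weights >= 0
        pop = np.vstack([parents, *children])
    best = max(pop, key=lambda w: fitness_fn(w / (w.sum() + 1e-12)))
    return best / (best.sum() + 1e-12)                        # normalised weights
```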

Article
Publication date: 1 March 1980

Karen Markey, Pauline Atherton and Claudia Newton

A series of studies was performed in late 1978 and early 1979 by the ERIC Clearinghouse on Information Resources Special Project staff and consultants to discover how ERIC online…

Abstract

A series of studies was performed in late 1978 and early 1979 by the ERIC Clearinghouse on Information Resources Special Project staff and consultants to discover how ERIC online searchers were using free and controlled vocabulary in their search statements. Over 40 experienced ERIC online searchers were interviewed and more than half of these volunteered to show us their activity online. Over 650 search traces from 18 different ERIC online searchers using three different retrieval systems were analyzed. Findings show that controlled vocabulary terms are used in 68% of ERIC searches. Controlled vocabulary searching is higher if only terms which resulted in offline printed output are considered. The use of free text was more carefully scrutinized to see if we could discover ways to improve the ERIC database and its online access, since free text searchers have few online search aids besides the basic index with postings information. The vocabulary in 165 free text search statements was analyzed to determine whether ERIC descriptors and identifiers were incapable of representing these concepts. Information for every free text statement was sought on the following questions: (1) Is it a descriptor? (2) Is it cross-referenced to an accepted term? (3) Is it a variant of a descriptor? (4) Is it expressible using descriptors? (5) Is it expressible using identifiers? (6) Is it only expressible using free text? Five tables summarize the output results of our comparative analysis of free text and controlled vocabulary searching. Six categories of search concepts were discovered for which searchers could only input free text terms or phrases. Online searching tests comparing free and controlled vocabulary were performed using six search topics. The free text search formulations had higher recall and controlled vocabulary formulations had higher precision. Search objective (high recall or high precision), we concluded, should dictate which formulation to use. All our findings helped point to the need for new searching aids for online ERIC searchers. Four displays exhibit what these might be; for example, an online rotated descriptor or identifier display, and some linkage between free text and the Thesaurus. Responsible agents for effecting changes in searching ERIC online are identified and suggestions for improvements are addressed to these three agents.

Details

Online Review, vol. 4 no. 3
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 3 November 2020

Jagroop Kaur and Jaswinder Singh

Normalization is an important step in all the natural language processing applications that are handling social media text. The text from social media poses a different kind of…

Abstract

Purpose

Normalization is an important step in all natural language processing applications that handle social media text. Text from social media poses a different kind of problem, one not present in regular text. Recently, a considerable amount of work has been done in this direction, but mostly for the English language. People who do not speak English code-mix text with their native language and post it on social media using the Roman script. This kind of text further aggravates the normalization problem. This paper aims to discuss the concept of normalization with respect to code-mixed social media text, and a model is proposed to normalize such text.

Design/methodology/approach

The system is divided into two phases: candidate generation and most-probable-sentence selection. The candidate generation task is treated as a machine translation task, with Roman text as the source language and Gurmukhi text as the target language. A character-based translation system is proposed to generate candidate tokens. Once candidates are generated, the second phase uses the beam search method to select the most probable sentence based on a hidden Markov model.
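A compact sketch of that second phase as described: beam search over per-token candidate lists, scored by candidate (emission) probabilities plus a bigram transition model in HMM fashion. The data layout and function names are assumptions, not the authors' code.

```python
# Beam search over candidate tokens with an HMM-style bigram score (sketch).
import heapq

def select_sentence(candidates, bigram_logprob, beam_width=5):
    """candidates: one list per input token of (gurmukhi_token, emission_logprob)
    pairs from the character-based translation phase."""
    beams = [(0.0, ["<s>"])]                       # (log-probability, sequence)
    for cands in candidates:
        scored = []
        for logp, seq in beams:
            for tok, e_logp in cands:
                score = logp + e_logp + bigram_logprob(seq[-1], tok)
                scored.append((score, seq + [tok]))
        beams = heapq.nlargest(beam_width, scored, key=lambda x: x[0])
    return max(beams, key=lambda x: x[0])[1][1:]   # best sequence, minus <s>
```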

Findings

Character error rate (CER) and bilingual evaluation understudy (BLEU) score are reported. The proposed system is compared with the Akhar software and the RB_R2G system, which are also capable of transliterating Roman text to Gurmukhi, and it outperforms the Akhar software. The CER and BLEU scores are 0.268121 and 0.6807939, respectively, for ill-formed text.
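For reference, CER is conventionally the Levenshtein (edit) distance between system output and reference, divided by the reference length; a quick sketch, assuming that standard definition is the one used here:

```python
# Character error rate = edit distance / reference length (standard definition).
def cer(hyp: str, ref: str) -> float:
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                # deletions
    for j in range(n + 1):
        d[0][j] = j                                # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sub)
    return d[m][n] / max(n, 1)
```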

Research limitations/implications

It was observed that the system produces dialectal variations of a word, or words with minor errors such as missing diacritics. A spell checker could improve the output by correcting these minor errors. Extensive experimentation is needed to optimize the language identifier, which would further improve the output. The language model also warrants further exploration. Inclusion of wider context, particularly from social media text, is an important area that deserves further investigation.

Practical implications

The practical implications of this study are: (1) development of a parallel dataset containing Roman and Gurmukhi text; (2) development of a dataset annotated with language tags; (3) development of the normalization system, which is the first of its kind to propose a translation-based solution for normalizing noisy social media text from Roman to Gurmukhi script and which can be extended to any pair of scripts; and (4) use of the proposed system for better analysis of social media text. Theoretically, this study aids understanding of text normalization in the social media context and opens the door for further research in multilingual social media text normalization.

Originality/value

Existing research focuses on normalizing monolingual text. This study contributes towards the development of a normalization system for multilingual text.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 30 July 2020

V. Srilakshmi, K. Anuradha and C. Shoba Bindu

This paper aims to model a technique that categorizes texts from huge document collections. The progression of internet technologies has increased document accessibility, and…

Abstract

Purpose

This paper aims to model a technique that categorizes texts from huge document collections. The progression of internet technologies has increased document accessibility, and the documents available online have become countless. Text documents comprise research articles, journal papers, newspapers, technical reports and blogs. These large documents are useful and valuable for real-time applications and are used in several retrieval methods. Text classification plays a vital role in information retrieval technologies and is an active field for processing massive document collections. The aim of text classification is to categorize large documents into different categories on the basis of their contents. Numerous text-related tasks, such as user profiling, sentiment analysis and spam identification, are treated as supervised learning problems and addressed with text classifiers.

Design/methodology/approach

First, the input documents are pre-processed using stop-word removal and stemming so that the input is effective and suitable for feature extraction. In the feature extraction process, features are extracted using the vector space model (VSM), and feature selection is then performed to select the most relevant features for text categorization. Once the features are selected, text categorization proceeds using a deep belief network (DBN). The DBN is trained using the proposed grasshopper crow optimization algorithm (GCOA), which integrates the grasshopper optimization algorithm (GOA) and the crow search algorithm (CSA). Moreover, a hybrid weight bounding model is devised using the proposed GCOA and range degree. Thus, the proposed GCOA + DBN is used for classifying the text documents.
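The pre-processing, VSM and feature-selection steps map onto a standard pipeline; a sketch follows, with a plain multilayer perceptron standing in for the DBN, since the GCOA-trained DBN is the paper's own contribution and is not reproduced here. Feature counts and layer sizes are illustrative.

```python
# Pipeline sketch: stop-word removal + TF-IDF (VSM) -> feature selection ->
# neural classifier. An MLP stands in for the GCOA-trained DBN.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

text_clf = Pipeline([
    ("vsm", TfidfVectorizer(stop_words="english")),   # stemming would be extra
    ("select", SelectKBest(chi2, k=2000)),            # keep relevant features
    ("clf", MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=50)),
])
# text_clf.fit(train_texts, train_labels)
# predictions = text_clf.predict(test_texts)
```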

Findings

The performance of the proposed technique is evaluated using accuracy, precision and recall, and is compared with existing techniques such as naive Bayes, k-nearest neighbors, support vector machine, deep convolutional neural network (DCNN) and Stochastic Gradient-CAViaR + DCNN. The proposed GCOA + DBN shows improved performance, with values of 0.959, 0.959 and 0.96 for precision, recall and accuracy, respectively.

Originality/value

This paper proposes a technique that categorizes texts from massive documents. The findings show that the proposed GCOA-based DBN effectively classifies text documents.

Details

International Journal of Web Information Systems, vol. 16 no. 3
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 31 October 2023

Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang

The number of construction dispute cases has maintained a high growth trend in recent years. The effective exploration and management of construction contract risk can directly…

Abstract

Purpose

The number of construction dispute cases has maintained a high growth trend in recent years. Effective exploration and management of construction contract risk can directly promote the overall performance of the project life cycle. Missing clauses may result in a failure to match standard contracts; if a contract modified by the owner omits key clauses, potential disputes may force contractors to pay substantial compensation. Identification of missing clauses in construction project contracts has therefore relied heavily on manual review, which is inefficient and highly dependent on personnel experience. Existing intelligent tools support only contract query and storage, so it is urgent to raise the level of intelligence in contract clause management. This paper therefore aims to propose an intelligent method for detecting missing clauses in construction project contracts based on Natural Language Processing (NLP) and deep learning technology.

Design/methodology/approach

A complete classification scheme for contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured digital vector form. Following the initial categorization, a multi-label classification of long construction contract clause texts is designed to preliminarily identify whether clause labels are missing. After the multi-label missing-clause detection, the authors implement a clause similarity algorithm that integrates an image detection approach, the MatchPyramid model, with BERT to identify missing substantive content in the contract clauses.
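To make the similarity step concrete, here is a sketch of clause comparison with BERT embeddings (mean pooling plus cosine similarity). The authors' MatchPyramid-with-BERT hybrid is more elaborate; this shows only the BERT side, and the Chinese checkpoint is an assumption.

```python
# Clause similarity via mean-pooled BERT embeddings and cosine (sketch only;
# the paper combines MatchPyramid with BERT, which is not reproduced here).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

def clause_embedding(text):
    inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden)
    return hidden.mean(dim=1).squeeze(0)            # mean-pooled vector

def clause_similarity(a, b):
    ea, eb = clause_embedding(a), clause_embedding(b)
    return torch.nn.functional.cosine_similarity(ea, eb, dim=0).item()
```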

Findings

In tests on 1,322 construction project contracts, the accuracy of multi-label classification reached 93%, the accuracy of similarity matching reached 83%, and the recall rate and mean F1 of both exceeded 0.7. The experimental results verify, to some extent, the feasibility of intelligently detecting contract risk through the NLP-based method.

Originality/value

NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, most existing approaches to risk detection in construction contract clauses are rule-based and encounter challenges when handling intricate and lengthy engineering contracts. This paper introduces an NLP technique based on deep learning that reduces manual intervention and can autonomously identify and tag types of contractual deficiency, aligning with the evolving complexity anticipated in future construction contracts. Moreover, this method achieves recognition of extended contract clause texts. Ultimately, the approach is versatile: users simply need to adjust parameters such as segmentation based on the language category to detect omissions in contract clauses of diverse languages.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988


Article
Publication date: 25 April 2024

Abdul-Manan Sadick, Argaw Gurmu and Chathuri Gunarathna

Developing a reliable cost estimate at the early stage of construction projects is challenging due to inadequate project information. Most of the information during this stage is…

Abstract

Purpose

Developing a reliable cost estimate at the early stage of construction projects is challenging due to inadequate project information. Most of the information during this stage is qualitative, posing additional challenges to achieving accurate cost estimates. Additionally, there is a lack of tools that use qualitative project information and forecast the budgets required for project completion. This research, therefore, aims to develop a model for setting project budgets (excluding land) during the pre-conceptual stage of residential buildings, where project information is mainly qualitative.

Design/methodology/approach

Due to the qualitative nature of project information at the pre-conception stage, a natural language processing model, DistilBERT (Distilled Bidirectional Encoder Representations from Transformers), was trained to predict the cost range of residential buildings at the pre-conception stage. The training and evaluation data included 63,899 building permit activity records (2021–2022) from the Victorian State Building Authority, Australia. The input data comprised the project description of each record, which included project location and basic material types (floor, frame, roofing, and external wall).
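A fine-tuning sketch of the described setup, assuming the usual Hugging Face transformers workflow: DistilBERT with a three-way classification head over the permit descriptions. Dataset wiring, the column name and hyperparameters are assumptions.

```python
# DistilBERT fine-tuned as a three-way cost-class classifier (sketch).
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)         # three cost classes

def encode(batch):                                   # "description" is assumed
    return tok(batch["description"], truncation=True,
               padding="max_length", max_length=128)

args = TrainingArguments(output_dir="cost-model", num_train_epochs=3,
                         per_device_train_batch_size=32)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds.map(encode, batched=True))
# trainer.train()
```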

Findings

This research designed a novel tool for predicting the project budget based on preliminary project information. The model achieved 79% accuracy in classifying residential buildings into three cost classes ($100,000-$300,000, $300,000-$500,000 and $500,000-$1,200,000), with F1-scores of 0.85, 0.73 and 0.74, respectively. Additionally, the results show that the model learnt the contextual relationship between qualitative data such as project location and cost.

Research limitations/implications

The current model was developed using data from Victoria state in Australia; hence, it would not return relevant outcomes for other contexts. However, future studies can adopt the methods to develop similar models for their context.

Originality/value

This research is the first to leverage a deep learning model, DistilBERT, for cost estimation at the pre-conception stage using basic project information like location and material types. Therefore, the model would contribute to overcoming data limitations for cost estimation at the pre-conception stage. Residential building stakeholders, like clients, designers, and estimators, can use the model to forecast the project budget at the pre-conception stage to facilitate decision-making.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988


Article
Publication date: 30 January 2023

Zhongbao Liu and Wenjuan Zhao

In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and downstream task requirements were not explored thoroughly…

Abstract

Purpose

In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and downstream task requirements were not explored thoroughly. It is not practical to directly migrate achievements obtained in English sentiment analysis to the analysis of Chinese because of the huge difference between the two languages.

Design/methodology/approach

In view of the particularity of Chinese text and the requirements of sentiment analysis, a Chinese sentiment analysis model integrating multi-granularity semantic features is proposed in this paper. This model introduces radical and part-of-speech features on top of the character and word features, applying bidirectional long short-term memory, an attention mechanism and a recurrent convolutional neural network.
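A compact PyTorch sketch of one branch of such an architecture: an embedding layer feeding a bidirectional LSTM with attention pooling. The full model fuses several granularities (character, word, radical, part of speech), which would mean concatenating several such branches; all sizes here are illustrative.

```python
# One branch of a multi-granularity model: BiLSTM + attention pooling (sketch).
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):                   # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))       # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time
        context = (weights * h).sum(dim=1)          # weighted context vector
        return self.out(context)                    # class logits
```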

Findings

Comparative experiments showed that the F1 values of this model reach 88.28 and 84.80 per cent on the man-made dataset and the NLPECC dataset, respectively. Meanwhile, an ablation experiment verified the effectiveness of the attention mechanism and of the part-of-speech, radical, character and word factors in Chinese sentiment analysis. The performance of the proposed model exceeds that of existing models to some extent.

Originality/value

The academic contributions of this paper are as follows: first, in view of the particularity of Chinese texts and the requirements of sentiment analysis, this paper focuses on solving the deficiencies of Chinese sentiment analysis in the big data context. Second, it borrows ideas from multiple interdisciplinary frontier theories and methods, such as information science, linguistics and artificial intelligence, which makes it innovative and comprehensive. Finally, it deeply integrates multi-granularity semantic features such as character, word, radical and part of speech, further complementing the theoretical framework and method system of Chinese sentiment analysis.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 6 September 2022

Hanane Sebbaq and Nour-eddine El Faddouli

The purpose of this study is twofold. First, to overcome the limitation of annotated data and to identify the cognitive level of learning objectives efficiently, this study adopts…

Abstract

Purpose

The purpose of this study is twofold. First, to overcome the limitation of annotated data and to identify the cognitive level of learning objectives efficiently, this study adopts transfer learning, using word2vec and bidirectional gated recurrent units (GRU), which fully take the context into account and improve the model's classification. This study adds a layer based on an attention mechanism (AM), which captures the context vector and gives keywords higher weight for text classification. Second, this study explains the model's results with local interpretable model-agnostic explanations (LIME).
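The LIME step can be sketched with the real lime package; predict_proba is assumed to wrap the trained BiGRU-attention classifier and return per-class probabilities for raw texts, and the class names follow Bloom's six levels.

```python
# Explaining a single learning-objective classification with LIME (sketch).
from lime.lime_text import LimeTextExplainer

levels = ["remember", "understand", "apply", "analyze", "evaluate", "create"]
explainer = LimeTextExplainer(class_names=levels)

exp = explainer.explain_instance(
    "Students will be able to design a relational database schema.",
    predict_proba,          # callable: list[str] -> (n, 6) probability array
    num_features=8)
print(exp.as_list())        # top words with their contribution weights
```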

Design/methodology/approach

Bloom's taxonomy levels of cognition are commonly used as a reference standard for labelling e-learning content. Many action verbs in Bloom's taxonomy, however, overlap at different levels of the hierarchy, causing uncertainty about the cognitive level expected. Some studies have looked into the cognitive classification of e-learning content, but none has looked into learning objectives, and most of these studies adopt classical machine learning algorithms. The main constraint of this study is the availability of annotated learning-objective data sets: this study managed to build a data set of 2,400 learning objectives, but this size remains limited.

Findings

The experiments show that the proposed model achieves the best scores for accuracy (90.62%), F1-score and loss. The proposed model succeeds in classifying learning objectives that contain ambiguous verbs from Bloom's taxonomy action verbs, while the same model without the attention layer fails. The LIME explainer aids in visualizing the most essential features of the text, which helps justify the final classification.

Originality/value

The main objective of this study is to propose a model that outperforms baseline models for learning objectives classification based on the six cognitive levels of Bloom's taxonomy. To that end, the bidirectional GRU (BiGRU)-attention model is built by combining the BiGRU algorithm with the AM, and the architecture is fed with word2vec embeddings. To prove the effectiveness of the proposed model, it is compared with four classical machine learning algorithms that are widely used for cognitive text classification (naive Bayes, logistic regression, support vector machine and k-nearest neighbors) and with a plain GRU. The main constraint of this study is the absence of annotated data: there is no annotated learning-objective data set based on the cognitive levels of Bloom's taxonomy, so the authors had to build the data set themselves.

Article
Publication date: 31 January 2022

Yejun Wu, Xiaxian Wang, Peilin Yu and YongKai Huang

The purpose of this research is to achieve automatic and accurate book purchase forecasts for university libraries and to improve the efficiency of manual book purchasing.

Abstract

Purpose

The purpose of this research is to achieve automatic and accurate book purchase forecasts for university libraries and to improve the efficiency of manual book purchasing.

Design/methodology/approach

The authors present a Book Purchase Forecast model with A Lite BERT (ALBERT-BPF) to achieve their goals. First, the authors process all the book data to unify the format of books' features, such as ISBN, title, authors and brief introduction. Second, they exploit the book order data to label all books supplied by booksellers as "purchased" or "non-purchased"; the labelled data are used for model training. Last, the authors regard the book purchase task as a text classification problem and present ALBERT-BPF, which applies ALBERT to extract text features of books and a BPF classification layer to forecast purchased books.
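A sketch of the classification core under the usual transformers API: ALBERT encodes a book's concatenated features, with a binary purchased/non-purchased head. The authors' BPF layer is their own design; the stock sequence-classification head stands in, and the sample book fields are invented.

```python
# ALBERT scoring a book for purchase (sketch; the BPF layer is replaced by
# the stock classification head).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModelForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=2)                 # purchased / non-purchased

isbn, title = "9780134685991", "Effective Java"     # invented example record
authors, intro = "Joshua Bloch", "Best practices for the Java platform."
book_text = " [SEP] ".join([isbn, title, authors, intro])

inputs = tok(book_text, return_tensors="pt", truncation=True, max_length=256)
logits = model(**inputs).logits                     # index 1 ~ "purchased"
```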

Findings

The application of deep learning to the book purchase task is effective. The data exploited are historical book purchase data from the authors' university library. Experiments on these data show that ALBERT-BPF can identify the books that need to be purchased with an accuracy of over 82%, the highest accuracy reached being 88.06%. This indicates that the deep learning model is sufficient to assist the traditional manual book purchase process.

Originality/value

This research applies ALBERT, which is built on the Transformer architecture now central to natural language processing (NLP), to the library book purchase task.

Details

Aslib Journal of Information Management, vol. 74 no. 4
Type: Research Article
ISSN: 2050-3806


Article
Publication date: 26 July 2021

Zekun Yang and Zhijie Lin

Tags help promote customer engagement on video-sharing platforms. Video tag recommender systems are artificial intelligence-enabled frameworks that strive to recommend precise…


Abstract

Purpose

Tags help promote customer engagement on video-sharing platforms. Video tag recommender systems are artificial intelligence-enabled frameworks that strive to recommend precise tags for videos. Extant video tag recommender systems are uninterpretable, which leads to distrust of the recommendation outcome, hesitation in tag adoption and difficulty in the system debugging process. This study aims to construct an interpretable and novel video tag recommender system to assist video-sharing platform users in tagging their newly uploaded videos.

Design/methodology/approach

The proposed interpretable video tag recommender system is a multimedia deep learning framework composed of convolutional neural networks (CNNs), which receives texts and images as inputs. The interpretability of the proposed system is realized through layer-wise relevance propagation.
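Layer-wise relevance propagation redistributes a prediction's relevance backwards, layer by layer, in proportion to each input's contribution. A numpy sketch of the epsilon rule for a single linear layer, the basic building block (not the authors' full multimedia stack):

```python
# LRP epsilon rule for one linear layer: relevance at the output is
# redistributed to inputs in proportion to their contributions z_ij = x_i*w_ij.
import numpy as np

def lrp_linear(x, W, b, relevance_out, eps=1e-6):
    """x: (d_in,), W: (d_in, d_out), relevance_out: (d_out,)."""
    z = x @ W + b                          # forward pre-activations
    z = z + eps * np.sign(z)               # stabiliser against division by ~0
    s = relevance_out / z                  # relevance per unit of output
    return x * (W @ s)                     # relevance redistributed to inputs

x = np.array([0.5, -1.0, 2.0])
W = np.random.default_rng(0).normal(size=(3, 2))
R = lrp_linear(x, W, b=np.zeros(2), relevance_out=np.array([1.0, 0.0]))
print(R, R.sum())   # input relevances; the sum approximately conserves relevance
```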

Findings

The case study and user study demonstrate that the proposed interpretable multimedia CNN model could effectively explain its recommended tag to users by highlighting keywords and key patches that contribute the most to the recommended tag. Moreover, the proposed model achieves an improved recommendation performance by outperforming state-of-the-art models.

Practical implications

The interpretability of the proposed recommender system makes its decision process more transparent, builds users' trust in recommender systems and prompts users to adopt the recommended tags. By labeling videos with human-understandable and accurate tags, videos gain greater exposure to their target audiences, which enhances information technology (IT) adoption, customer engagement, value co-creation and precision marketing on the video-sharing platform.

Originality/value

To the best of the authors' knowledge, the proposed model is not only the first explainable video tag recommender system but also the first explainable multimedia tag recommender system.

Details

Internet Research, vol. 32 no. 2
Type: Research Article
ISSN: 1066-2243

