Search results

1 – 10 of over 2000
Article
Publication date: 31 October 2023

Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang

The number of construction dispute cases has maintained a high growth trend in recent years. The effective exploration and management of construction contract risk can directly…

Abstract

Purpose

The number of construction dispute cases has grown rapidly in recent years. Effective exploration and management of construction contract risk can directly improve performance across the project life cycle. Missing clauses may cause a contract to fail to match standard contracts, and if a contract modified by the owner omits key clauses, the resulting disputes may force contractors to pay substantial compensation. To date, the identification of missing clauses in construction project contracts has relied heavily on manual review, which is inefficient and highly dependent on personnel experience, while existing intelligent tools support only contract query and storage. It is urgent to raise the level of intelligence in contract clause management. Therefore, this paper aims to propose an intelligent method to detect missing clauses in construction project contracts based on natural language processing (NLP) and deep learning technology.

Design/methodology/approach

A complete classification scheme for contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured digital vector form. Following this initial categorization, a multi-label classification of long-text construction contract clauses is designed to preliminarily identify whether clause labels are missing. After the multi-label missing-clause detection, the authors implement a clause similarity algorithm that integrates MatchPyramid, a model that borrows ideas from image matching, with BERT to identify missing substantive content within the contract clauses.
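The two-stage idea described above, first flagging missing clause labels, then checking clause content against a standard, can be sketched in simplified form. The sketch below substitutes a bag-of-words cosine similarity for the paper's BERT + MatchPyramid model, and the clause texts and threshold are hypothetical:

```python
import math
from collections import Counter

def vectorize(text):
    """Convert a clause to a bag-of-words vector (toy stand-in for BERT embeddings)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def detect_missing_clauses(contract_labels, standard_labels):
    """Stage 1: flag clause labels present in the standard but absent from the contract."""
    return sorted(standard_labels - contract_labels)

def clause_content_missing(contract_clause, standard_clause, threshold=0.5):
    """Stage 2: flag clauses whose content diverges too far from the standard text."""
    return cosine_similarity(vectorize(contract_clause),
                             vectorize(standard_clause)) < threshold

missing = detect_missing_clauses({"payment", "scope"},
                                 {"payment", "scope", "dispute resolution"})
# "dispute resolution" is flagged as a missing clause label
```

In the paper, the second stage operates on learned embeddings rather than word counts, but the pipeline shape, label detection followed by content matching, is the same.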

Findings

1,322 construction project contracts were tested. Results showed that the accuracy of multi-label classification reached 93%, the accuracy of similarity matching reached 83%, and the recall and mean F1 of both exceeded 0.7. The experimental results verify, to some extent, the feasibility of intelligently detecting contract risk through the NLP-based method.

Originality/value

NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, most existing approaches to risk detection in construction contract clauses are rule-based and struggle with intricate, lengthy engineering contracts. This paper introduces a deep-learning-based NLP technique that reduces manual intervention and can autonomously identify and tag types of contractual deficiencies, aligning with the evolving complexity anticipated in future construction contracts. Moreover, the method handles extended contract clause texts. Finally, the approach is versatile: users simply need to adjust parameters such as segmentation for the language in question to detect omissions in contract clauses written in diverse languages.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 21 October 2019

Priyadarshini R., Latha Tamilselvan and Rajendran N.

The purpose of this paper is to propose a fourfold semantic similarity that results in more accuracy compared to the existing literature. The change detection in the URL and the…

Abstract

Purpose

The purpose of this paper is to propose a fourfold semantic similarity that achieves higher accuracy than the existing literature. Change detection in the URL and recommendation of the source documents are facilitated by means of a framework in which the fourfold semantic similarity is applied. The latest trends in technology emerge with the continuous growth of resources on the collaborative web, and this interactive and collaborative web poses big challenges for recent technologies like cloud and big data.

Design/methodology/approach

The enormous and growing body of resources must be accessed more efficiently, which requires clustering and classification techniques. To this end, the resources on the web are described in a more meaningful manner.

Findings

They can be described in the form of metadata constituted by the Resource Description Framework (RDF). A fourfold similarity is proposed, compared with the threefold similarity proposed in the existing literature. The fourfold similarity includes semantic annotation based on named entity recognition in the user interface, domain-based concept matching with an improvised ontology-based score classification, a sequence-based word sensing algorithm and RDF-based updating of triples. These similarity measures are aggregated across the components of the system: the semantic user interface, semantic clustering, sequence-based classification and a semantic recommendation system with RDF updating in change detection.

Research limitations/implications

The existing work suggests that linking resources semantically increases the retrieving and searching ability. Previous literature shows that keywords can be used to retrieve linked information from the article to determine the similarity between the documents using semantic analysis.

Practical implications

Traditional systems also suffer from scalability and efficiency issues. The proposed study designs a model that pulls and prioritizes knowledge-based content from the Hadoop distributed framework. The study also proposes a Hadoop-based pruning system and recommendation system.

Social implications

The pruning system gives an alert about the dynamic changes in the article (virtual document). The changes in the document are automatically updated in the RDF document. This helps in semantic matching and retrieval of the most relevant source with the virtual document.

Originality/value

The recommendation and detection of changes in blogs are performed semantically using n-triples and automated data structures. The user-focussed, choice-based crawling proposed in this system also assists collaborative filtering, which in turn recommends user-focussed source documents. The entire clustering and retrieval system is deployed on multi-node Hadoop in the Amazon AWS environment, and graphs are plotted and analyzed.

Details

International Journal of Intelligent Unmanned Systems, vol. 7 no. 4
Type: Research Article
ISSN: 2049-6427

Article
Publication date: 14 June 2022

Gitaek Lee, Seonghyeon Moon and Seokho Chi

Contractors must check the provisions that may cause disputes in the specifications to manage project risks when bidding for a construction project. However, since the…

Abstract

Purpose

Contractors must check the provisions of the specifications that may cause disputes in order to manage project risks when bidding for a construction project. However, since a specification is mainly written with reference to many national standards, determining which standard each section of the specification derives from, and whether its content is appropriate for the local site, is a labor-intensive task. To develop an automatic reference-section identification model that helps complete the specification review within the short bidding period, the authors proposed a framework that integrates rules and machine learning algorithms.

Design/methodology/approach

The study begins by collecting 7,795 sections from construction specifications and national standards from different countries. The collected sections were then searched for similar section pairs using syntactic rules derived from construction domain knowledge. Finally, to improve the reliability and extensibility of the section pairing, the authors built a deep structured semantic model that increases the cosine similarity between documents dealing with the same topic by learning human-labeled similarity information.

Findings

The integrated model developed in this study showed 0.812, 0.898, and 0.923 levels of performance in NDCG@1, NDCG@5, and NDCG@10, respectively, confirming that the model can adequately select document candidates that require comparative analysis of clauses for practitioners.
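NDCG@k, the metric reported above, rewards placing the most relevant reference sections near the top of the ranking. A minimal implementation, with hypothetical graded relevance scores, looks like this:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the ideal (descending-relevance) ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg else 0.0

# Hypothetical graded relevance of retrieved sections, in ranked order.
ranking = [3, 2, 0, 1]
print(ndcg_at_k(ranking, 1))  # 1.0: the top-ranked section is the most relevant one
```

A score of 1.0 means the ranking matches the ideal ordering at that cutoff, which is why the reported 0.812 at NDCG@1 indicates the best reference section is usually ranked first.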

Originality/value

The results contribute to more efficient and objective identification of potential disputes within the specifications by automatically providing practitioners with the reference section most relevant to the analysis target section.

Details

Engineering, Construction and Architectural Management, vol. 30 no. 9
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 29 November 2023

Hui Shi, Drew Hwang, Dazhi Chong and Gongjun Yan

Today’s in-demand skills may not be needed tomorrow. As companies are adopting a new group of technologies, they are in huge need of information technology (IT) professionals who…

Abstract

Purpose

Today’s in-demand skills may not be needed tomorrow. As companies adopt a new group of technologies, they are in great need of information technology (IT) professionals who can fill various IT positions with a mixture of technical and problem-solving skills. This study aims to adopt a semantic analysis approach to explore how US Information Systems (IS) programs meet the challenges of emerging IT topics.

Design/methodology/approach

This study considers the application of a hybrid semantic analysis approach to the analysis of IS higher-education programs in the USA. It proposes a semantic analysis framework and a semantic analysis algorithm to analyze and evaluate the context of the IS programs. More specifically, the study uses digital transformation as a case study to examine the readiness of US IS programs to meet the challenges of digital transformation. First, this study developed a knowledge pool of 15 principles and 98 keywords from an extensive literature review on digital transformation. Second, it collected 4,093 IS courses from 315 IS programs in the USA and 493,216 scientific publication records from the Web of Science Core Collection.

Findings

Using the knowledge pool and the two collected data sets, the semantic analysis algorithm was implemented to compute a semantic similarity score (DxScore) between an IS course's context and digital transformation. To establish the credibility of the results, the state ranking based on the similarity scores was compared with the state employment ranking. The results can be used by IS educators when updating IS curricula; for IT professionals in industry, they offer insights into the training of current and future employees.

Originality/value

This study explores the status of the IS programs in the USA by proposing a semantic analysis framework, using digital transformation as a case study to illustrate the application of the proposed semantic analysis framework, and developing a knowledge pool, a corpus and a course information collection.

Details

Information Discovery and Delivery, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2398-6247

Open Access
Article
Publication date: 11 October 2023

Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati

The authors constructed an automatic essay scoring (AES) model in a discussion forum where the result was compared with scores given by human evaluators. This research proposes…

Abstract

Purpose

The authors constructed an automatic essay scoring (AES) model for a discussion forum and compared its results with scores given by human evaluators. This research proposes essay scoring based on two parameters, semantic and keyword similarity, using pre-trained SentenceTransformers models that produce the best-performing vector embeddings. The two models are combined to increase accuracy.

Design/methodology/approach

The development of the model in the study is divided into seven stages: (1) data collection, (2) pre-processing data, (3) selected pre-trained SentenceTransformers model, (4) semantic similarity (sentence pair), (5) keyword similarity, (6) calculate final score and (7) evaluating model.
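Stages 4 to 6 above reduce to combining two similarity parameters into one score. The sketch below is a minimal version in which keyword similarity is a Jaccard overlap with the rubric keywords; the weights, keywords and semantic score are hypothetical, and the actual semantic similarity in the paper comes from SentenceTransformers embeddings:

```python
def keyword_similarity(answer_keywords, rubric_keywords):
    """Stage 5: Jaccard overlap between keywords extracted from the answer
    and the rubric keywords."""
    answer, rubric = set(answer_keywords), set(rubric_keywords)
    return len(answer & rubric) / len(answer | rubric) if answer | rubric else 0.0

def final_score(semantic_sim, keyword_sim, w_semantic=0.7, w_keyword=0.3):
    """Stage 6: weighted combination of the two similarity parameters."""
    return w_semantic * semantic_sim + w_keyword * keyword_sim

# Hypothetical essay: semantic similarity 0.82 to the reference answer,
# two of three rubric keywords covered.
score = final_score(0.82, keyword_similarity(["osmosis", "membrane"],
                                             ["osmosis", "diffusion", "membrane"]))  # ≈ 0.774
```

The weighting scheme is an assumption for illustration; the paper tunes the combination empirically against human evaluators' scores.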

Findings

The multilingual paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models obtained the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023). Both multilingual models were adopted in this study. The keyword similarity parameter is obtained by comparing the keywords extracted from responses with the rubric keywords. Based on the experimental results, the proposed combination of the two parameters increases the evaluation results by 0.2.

Originality/value

This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters, where forum discussion scoring is still manual. The authors created a model that automatically scores discussion forum posts, which are essays, based on the lecturer's answers as well as rubrics.

Details

Asian Association of Open Universities Journal, vol. 18 no. 3
Type: Research Article
ISSN: 1858-3431

Article
Publication date: 7 August 2017

Xiaolan Cui, Shuqin Cai and Yuchu Qin

The purpose of this paper is to propose a similarity-based approach to accurately retrieve reference solutions for the intelligent handling of online complaints.

Abstract

Purpose

The purpose of this paper is to propose a similarity-based approach to accurately retrieve reference solutions for the intelligent handling of online complaints.

Design/methodology/approach

This approach uses a case-based reasoning framework. It first formalizes existing online complaints and their solutions, new online complaints, and complaint products, problems and content as source cases, target cases and the distinctive features of each case, respectively. It then explains the process of using existing word-level, sense-level and text-level measures to assess the similarities between complaint products, problems and contents. Based on these similarities, a measure with high accuracy in assessing the overall similarity between cases is designed. The effectiveness of the approach is evaluated through numerical and empirical experiments.
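The retrieval step in such a case-based reasoning framework can be sketched as follows. Note that the paper's final measure is deliberately not a plain linear combination (its Originality section reports it outperforms all linear combinations); the sketch below shows the baseline linear-combination retrieval it is compared against, with hypothetical weights and per-feature similarities:

```python
def combined_similarity(sims, weights):
    """Baseline: linear combination of product, problem and content similarities."""
    return sum(w * s for w, s in zip(weights, sims))

def retrieve_best_case(target_sims, source_cases, weights=(0.3, 0.3, 0.4)):
    """Return the source case whose overall similarity to the target complaint
    is highest, so its recorded solution can be reused."""
    return max(source_cases,
               key=lambda case: combined_similarity(target_sims[case], weights))

# Hypothetical per-feature similarities (product, problem, content) to the target.
sims = {"case_a": (0.9, 0.4, 0.5), "case_b": (0.6, 0.8, 0.7)}
best = retrieve_best_case(sims, ["case_a", "case_b"])  # "case_b"
```

Here "case_b" wins despite a weaker product match because its problem and content similarities dominate the weighted sum.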

Findings

The evaluation results show that a measure that simultaneously considers similarity features at the word, sense and text levels obtains higher accuracy than measures that consider only one level; the designed measure is also more accurate than all of its linear combinations.

Practical implications

The approach offers a feasible way to reduce manual intervention in online complaint handling. Complaint products, problems and content should be synthetically considered when handling an online complaint. The designed procedure of the measure with high accuracy can be applied in other applications that consider multiple similarity features or linguistic levels.

Originality/value

A method for linearly combining the similarities at all linguistic levels to accurately assess the overall similarities between online complaint cases is presented. This method is experimentally verified to be helpful to improve the accuracy of online complaint case retrieval. This is the first study that considers the accuracy of the similarity measures for online complaint case retrieval.

Details

Kybernetes, vol. 46 no. 7
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 12 May 2021

Maryam Yaghtin, Hajar Sotudeh, Alireza Nikseresht and Mahdieh Mirzabeigi

Co-citation frequency, defined as the number of documents co-citing two articles, is considered as a quantitative, and thus, an efficient proxy of subject relatedness or prestige…

Abstract

Purpose

Co-citation frequency, defined as the number of documents co-citing two articles, is considered a quantitative, and thus efficient, proxy of the subject relatedness or prestige of the co-cited articles. Despite its quantitative nature, it has been found effective in retrieving and evaluating documents, signifying its linkage with the contents of the related documents. To better understand the dynamism of the citation network, the present study aims to investigate the various content features giving rise to the measure.

Design/methodology/approach

The present study examined the interaction of different co-citation features in explaining the co-citation frequency. The features include the co-cited works' similarities in their full-texts, Medical Subject Headings (MeSH) terms, co-citation proximity, opinions and co-citances. A test collection is built using the CITREC dataset. The data were analyzed using natural language processing (NLP) and opinion mining techniques. A linear model was developed to regress the objective and subjective content-based co-citation measures against the natural log of the co-citation frequency.
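Regressing content-based measures against the natural log of the co-citation frequency, as described above, can be illustrated with a toy ordinary-least-squares fit. This sketch uses a single hypothetical predictor (full-text similarity) and fabricated-for-illustration data points, whereas the paper's model includes several objective and subjective features:

```python
import math

def fit_simple_ols(x, y):
    """Closed-form least-squares slope and intercept for one predictor
    (a toy stand-in for the paper's multi-feature linear model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return slope, mean_y - slope * mean_x

# Hypothetical data: full-text similarity vs. ln(co-citation frequency).
similarity = [0.1, 0.3, 0.5, 0.7]
log_cocitation = [math.log(2), math.log(4), math.log(8), math.log(16)]
slope, intercept = fit_simple_ols(similarity, log_cocitation)
```

Fitting against the log of the frequency, rather than the raw count, is what lets a linear model capture the heavily skewed co-citation distribution.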

Findings

The dimensions of co-citation similarity, either subjective or objective, play significant roles in predicting co-citation frequency. The model can predict about half of the co-citation variance. The interaction of co-opinionatedness and non-co-opinionatedness is the strongest factor in the model.

Originality/value

It is the first study in revealing that both the objective and subjective similarities could significantly predict the co-citation frequency. The findings re-confirm the citation analysis assumption claiming the connection between the cognitive layers of cited documents and citation measures in general and the co-citation frequency in particular.

Peer review

The peer review history for this article is available at https://publons.com/publon/10.1108/OIR-04-2020-0126.

Article
Publication date: 14 October 2020

Haihua Chen, Yunhan Yang, Wei Lu and Jiangping Chen

Citation contexts have been found useful in many scenarios. However, existing context-based recommendations ignored the importance of diversity in reducing the redundant issues…

Abstract

Purpose

Citation contexts have been found useful in many scenarios. However, existing context-based recommendations ignore the importance of diversity in reducing redundancy and thus cannot cover the broad range of user interests. To address this gap, the paper proposes a novel task: recommending a set of diverse citation contexts extracted from a list of citing articles. This will assist users in understanding how other scholars have cited an article and in deciding which articles to cite in their own writing.

Design/methodology/approach

This research combines three semantic distance algorithms and three diversification re-ranking algorithms for diversified recommendation based on the CiteSeerX data set, and then evaluates the generated citation context lists through a user case study on 30 articles.
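Diversification re-ranking of this kind can be illustrated with maximal marginal relevance (MMR), a standard greedy re-ranker; the paper evaluates other diversifiers such as Integer Linear Programming, and the contexts, relevance scores and topic-overlap similarity below are hypothetical:

```python
def mmr_rerank(candidates, relevance, similarity, lam=0.7, k=3):
    """Greedy MMR: at each step pick the candidate that trades off relevance
    against similarity to the contexts already selected."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def mmr_score(c):
            max_sim = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * max_sim
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

# Hypothetical citation contexts with relevance scores and topic-overlap similarity.
topics = {"c1": {"nlp"}, "c2": {"nlp"}, "c3": {"vision"}}
relevance = {"c1": 0.9, "c2": 0.85, "c3": 0.6}
sim = lambda a, b: len(topics[a] & topics[b]) / len(topics[a] | topics[b])
ranked = mmr_rerank(["c1", "c2", "c3"], relevance, sim, lam=0.5, k=2)  # ["c1", "c3"]
```

With equal weighting, the less relevant but topically distinct "c3" displaces "c2", which is exactly the redundancy-reducing behavior a plain citation-count ranking lacks.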

Findings

Results show that a diversification strategy combining word2vec and Integer Linear Programming leads to a better reading experience for participants than other diversification strategies, such as the CiteSeerX list sorted by citation counts.

Practical implications

This diversifying recommendation task is valuable for developing better systems in information retrieval, automatic academic recommendations and summarization.

Originality/value

The originality of the research lies in the proposal of a novel task: recommending a diversified context list describing how other scholars have cited an article, thereby making citing decisions easier. A mixed approach is explored to find the most effective diversifying strategy. In addition, rather than traditional information retrieval evaluation, a user evaluation framework is introduced to reflect user information needs more objectively.

Article
Publication date: 29 August 2008

Marco Kalz, Jan van Bruggen, Bas Giesbers, Wim Waterink, Jannes Eshuis and Rob Koper

The purpose of this paper is twofold: first the paper aims to sketch the theoretical basis for the use of electronic portfolios for prior learning assessment; second it endeavours…

Abstract

Purpose

The purpose of this paper is twofold: first, it aims to sketch the theoretical basis for the use of electronic portfolios in prior learning assessment; second, it endeavours to introduce latent semantic analysis (LSA) as a powerful method for computing semantic similarity between texts and as the basis for a new observation link for prior learning assessment.

Design/methodology/approach

A short literature review about e-assessment was conducted, with the result that none of the reviewed work included new and innovative methods for the assessment of learners' open responses and narratives. On a theoretical basis, the connection between e-portfolio research and research about prior learning assessment is explained from the existing literature. After that, LSA is introduced and several examples from similar educational applications are provided. A model for prior learning assessment on the basis of LSA is presented, and a case study at the Open University of The Netherlands is presented with preliminary results discussed.

Findings

A first inspection of the results shows that the similarity measurement produced by the system can differentiate between learners who submitted different material, and between the learning activities and chapters.

Originality/value

The paper is original because it combines research from natural language processing with very practical educational problems in higher education and technology‐enhanced learning. For faculty members the presented model and technology can help them in the assessment phase in an APL procedure. In addition, the presented model offers a dynamic method for reasoning about prior knowledge in adaptive e‐learning systems.

Details

Campus-Wide Information Systems, vol. 25 no. 4
Type: Research Article
ISSN: 1065-0741

Article
Publication date: 21 June 2023

Debasis Majhi and Bhaskar Mukherjee

The purpose of this study is to identify the research fronts by analysing highly cited core papers adjusted with the age of a paper in library and information science (LIS) where…

Abstract

Purpose

The purpose of this study is to identify research fronts, by analysing highly cited core papers adjusted for the age of a paper, in library and information science (LIS), where natural language processing (NLP) is being applied significantly.

Design/methodology/approach

By mining international databases, 3,087 core papers that received at least 5% of the total citations were identified. By calculating the mean age of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for all 20 fronts to understand how much relative attention a front receives among peers over time. One theme article was finally identified from each of these 20 fronts.
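The CPT value described above can be computed on hypothetical numbers as follows. The abstract does not give the exact formula, so the citations-per-paper-per-year form below is one plausible reading, and the figures are invented for illustration:

```python
def cpt_value(total_citations, n_papers, mean_age_years):
    """Assumed form of the CPT (citation/publication/time) metric:
    citations per paper per year of the front's core papers."""
    return total_citations / n_papers / mean_age_years

# Hypothetical front: 120 core papers, 1,500 total citations, mean age 8 years.
cpt = cpt_value(1500, 120, 8)  # 1.5625 citations per paper per year
```

Normalizing by both paper count and mean age is what lets a young, fast-growing front outrank an older front with more raw citations.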

Findings

Bidirectional encoder representations from transformers (BERT), with a CPT value of 1.608, followed by sentiment analysis, with a CPT of 1.292, received the highest attention in NLP research. Columbia University, New York (institution), the Journal of the American Medical Informatics Association (journal), the USA followed by the People's Republic of China (country) and Xu, H. of the University of Texas (author) lead these fronts. The findings indicate that NLP applications boost the performance of digital libraries and automated library systems in the digital environment.

Practical implications

Any of the research fronts identified in the findings of this paper may be used as a base by researchers who intend to perform extensive research on NLP.

Originality/value

To the best of the authors' knowledge, the methodology adopted in this paper is the first of its kind in which a meta-analysis approach is used to understand the research fronts of a subfield, such as NLP, within a broad domain such as LIS.

Details

Digital Library Perspectives, vol. 39 no. 3
Type: Research Article
ISSN: 2059-5816
