Search results
1–10 of over 2,000
Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang
Abstract
Purpose
The number of construction dispute cases has grown rapidly in recent years. Effective exploration and management of construction contract risk can directly improve overall performance across the project life cycle. Missing clauses may cause a contract to deviate from standard templates, and if a contract modified by the owner omits key clauses, the resulting disputes may leave contractors paying substantial compensation. To date, the identification of missing clauses in construction project contracts has relied heavily on manual review, which is inefficient and highly dependent on personnel experience, while existing intelligent tools support only contract querying and storage. Raising the level of intelligence in contract clause management is therefore urgent. This paper aims to propose an intelligent method to detect missing clauses in construction project contracts based on natural language processing (NLP) and deep learning technology.
Design/methodology/approach
A complete classification scheme for contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured digital vector form. Following this initial categorization, a multi-label classifier for long construction contract clause texts is designed to preliminarily identify whether clause labels are missing. After this multi-label missing-clause detection, the authors implement a clause similarity algorithm that integrates MatchPyramid, a matching model inspired by image recognition, with BERT to identify missing substantive content within the contract clauses.
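The authors' pipeline pairs BERT embeddings with a MatchPyramid matching layer; as a minimal stand-in sketch (the toy vectors and the 0.8 threshold are hypothetical, not the paper's), the final screening step can be illustrated as a cosine-similarity check between a contract clause embedding and its standard-contract counterpart:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def flags_missing_substance(clause_vec, standard_vec, threshold=0.8):
    # Flag a clause whose embedding diverges too far from the standard clause.
    return cosine(clause_vec, standard_vec) < threshold

# Toy vectors standing in for BERT sentence embeddings (hypothetical values).
standard  = [0.9, 0.1, 0.3]
complete  = [0.88, 0.15, 0.28]   # clause with substantive content intact
truncated = [0.1, 0.9, 0.0]      # clause with key content removed

print(flags_missing_substance(complete, standard))   # intact clause passes
print(flags_missing_substance(truncated, standard))  # gutted clause is flagged
```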
Findings
1,322 construction project contracts were tested. Results showed that the accuracy of the multi-label classification reached 93%, the accuracy of similarity matching reached 83%, and the recall and mean F1 of both exceeded 0.7. The experimental results verify, to some extent, the feasibility of intelligently detecting contract risk with the NLP-based method.
Originality/value
NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, most existing approaches to risk detection in construction contract clauses are rule-based and struggle with intricate, lengthy engineering contracts. This paper introduces a deep-learning-based NLP technique that reduces manual intervention and can autonomously identify and tag types of contractual deficiencies, aligning with the evolving complexity anticipated in future construction contracts. Moreover, the method handles extended contract clause texts. Finally, the approach is versatile: users simply need to adjust parameters such as segmentation for different language categories to detect omissions in contract clauses of diverse languages.
Priyadarshini R., Latha Tamilselvan and Rajendran N.
Abstract
Purpose
The purpose of this paper is to propose a fourfold semantic similarity that yields higher accuracy than the existing literature. A framework in which the fourfold semantic similarity is applied facilitates change detection in the URL and recommendation of the source documents. The latest trends in technology emerge with the continuous growth of resources on the collaborative web, and this interactive and collaborative web poses big challenges for recent technologies like cloud and big data.
Design/methodology/approach
The enormously growing resources on the web must be accessed more efficiently, which requires clustering and classification techniques so that web resources can be described in a more meaningful manner.
Findings
Resources can be described as metadata constituted by the Resource Description Framework (RDF). A fourfold similarity is proposed, compared with the threefold similarity proposed in the existing literature. The fourfold similarity includes semantic annotation based on named entity recognition in the user interface; domain-based concept matching with improvised score-based classification based on ontology; a sequence-based word sensing algorithm; and RDF-based updating of triples. All these similarity measures are aggregated across components comprising a semantic user interface, semantic clustering, sequence-based classification and a semantic recommendation system with RDF updating for change detection.
Research limitations/implications
The existing work suggests that linking resources semantically increases the retrieving and searching ability. Previous literature shows that keywords can be used to retrieve linked information from the article to determine the similarity between the documents using semantic analysis.
Practical implications
Traditional systems also suffer from scalability and efficiency issues. The proposed study designs a model that pulls and prioritizes knowledge-based content from the Hadoop distributed framework. The study also proposes a Hadoop-based pruning system and recommendation system.
Social implications
The pruning system gives an alert about the dynamic changes in the article (virtual document). The changes in the document are automatically updated in the RDF document. This helps in semantic matching and retrieval of the most relevant source with the virtual document.
Originality/value
The recommendation and detection of changes in the blogs are performed semantically using n-triples and automated data structures. The user-focussed and choice-based crawling proposed in this system also assists collaborative filtering, which in turn recommends user-focussed source documents. The entire clustering and retrieval system is deployed on multi-node Hadoop in the Amazon AWS environment, and graphs are plotted and analyzed.
Gitaek Lee, Seonghyeon Moon and Seokho Chi
Abstract
Purpose
Contractors must check the provisions that may cause disputes in the specifications to manage project risks when bidding for a construction project. However, since a specification is mainly written with reference to many national standards, determining which standard each section of the specification derives from and whether its content suits the local site is a labor-intensive task. To develop an automatic reference section identification model that helps complete the specification review within the short bidding period, the authors proposed a framework that integrates rules and machine learning algorithms.
Design/methodology/approach
The study begins by collecting 7,795 sections from construction specifications and national standards from different countries. The collected sections were then searched for similar section pairs using syntactic rules derived from construction domain knowledge. Finally, to improve the reliability and expandability of the section pairing, the authors built a deep structured semantic model that increases the cosine similarity between documents dealing with the same topic by learning human-labeled similarity information.
Findings
The integrated model developed in this study showed 0.812, 0.898, and 0.923 levels of performance in NDCG@1, NDCG@5, and NDCG@10, respectively, confirming that the model can adequately select document candidates that require comparative analysis of clauses for practitioners.
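NDCG@k, the metric reported above, rewards placing highly relevant documents near the top of the ranking by discounting each result's gain by its rank. A compact reference implementation (the graded relevance labels below are illustrative, not the study's data):

```python
import math

def dcg_at_k(rels, k):
    # Discounted cumulative gain: gain at rank i is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Relevance of the top retrieved reference sections, in ranked order (toy labels).
rels = [3, 2, 3, 0, 1]
print(round(ndcg_at_k(rels, 5), 3))
```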
Originality/value
The results contribute to more efficient and objective identification of potential disputes within the specifications by automatically providing practitioners with the reference section most relevant to the analysis target section.
Hui Shi, Drew Hwang, Dazhi Chong and Gongjun Yan
Abstract
Purpose
Today’s in-demand skills may not be needed tomorrow. As companies adopt a new group of technologies, they are in huge need of information technology (IT) professionals who can fill various IT positions with a mixture of technical and problem-solving skills. This study aims to adopt a semantic analysis approach to explore how US Information Systems (IS) programs meet the challenges of emerging IT topics.
Design/methodology/approach
This study considers the application of a hybrid semantic analysis approach to the analysis of IS higher education programs in the USA. It proposes a semantic analysis framework and a semantic analysis algorithm to analyze and evaluate the context of the IS programs. To be more specific, the study uses digital transformation as a case study to examine the readiness of the IS programs in the USA to meet the challenges of digital transformation. First, this study developed a knowledge pool of 15 principles and 98 keywords from an extensive literature review on digital transformation. Second, this study collects 4,093 IS courses from 315 IS programs in the USA and 493,216 scientific publication records from the Web of Science Core Collection.
Findings
Using the knowledge pool and the two collected data sets, the semantic analysis algorithm was implemented to compute a semantic similarity score (DxScore) between an IS course's context and digital transformation. To demonstrate the credibility of the research results, the state ranking by similarity score was compared with the state employment ranking. The results can be used by IS educators when updating IS curricula, and can offer IT professionals in industry insights into the training of their current and future employees.
Originality/value
This study explores the status of the IS programs in the USA by proposing a semantic analysis framework, using digital transformation as a case study to illustrate the application of the proposed semantic analysis framework, and developing a knowledge pool, a corpus and a course information collection.
Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati
Abstract
Purpose
The authors constructed an automatic essay scoring (AES) model for a discussion forum, and its results were compared with scores given by human evaluators. This research proposes essay scoring based on two parameters, semantic and keyword similarity, using the pre-trained SentenceTransformers model that yields the best vector embeddings. Combining these parameters optimizes the model and increases accuracy.
Design/methodology/approach
The development of the model in the study is divided into seven stages: (1) data collection, (2) pre-processing data, (3) selected pre-trained SentenceTransformers model, (4) semantic similarity (sentence pair), (5) keyword similarity, (6) calculate final score and (7) evaluating model.
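Stages (4)–(6) above reduce to combining two numbers per essay. A minimal sketch of that scoring arithmetic follows; the Jaccard keyword measure and the 0.7/0.3 weights are illustrative assumptions, not the paper's exact choices:

```python
def keyword_similarity(answer_keywords, rubric_keywords):
    # Jaccard overlap between keywords extracted from the essay and the rubric.
    a, r = set(answer_keywords), set(rubric_keywords)
    return len(a & r) / len(a | r) if a | r else 0.0

def final_score(semantic_sim, keyword_sim, w_semantic=0.7, w_keyword=0.3):
    # Weighted combination of the two similarity parameters.
    return w_semantic * semantic_sim + w_keyword * keyword_sim

# semantic_sim would come from a SentenceTransformers sentence-pair comparison.
k_sim = keyword_similarity(["cell", "mitosis", "dna"], ["mitosis", "dna", "nucleus"])
print(round(final_score(0.8, k_sim), 2))
```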
Findings
The multilingual paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models obtained the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023), and both were adopted in this study. The two parameters are combined by comparing the keyword extraction responses with the rubric keywords. Based on the experimental results, the proposed combination increases the evaluation score by 0.2.
Originality/value
This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters, where forum discussions are still rated manually. The authors created a model that automatically scores the discussion forums, which are essays, based on the lecturer's answers and rubrics.
Xiaolan Cui, Shuqin Cai and Yuchu Qin
Abstract
Purpose
The purpose of this paper is to propose a similarity-based approach to accurately retrieve reference solutions for the intelligent handling of online complaints.
Design/methodology/approach
This approach uses a case-based reasoning framework and firstly formalizes existing online complaints and their solutions, new online complaints, and complaint products, problems and content as source cases, target cases and distinctive features of each case, respectively. Then the process of using existing word-level, sense-level and text-level measures to assess the similarities between complaint products, problems and contents is explained. Based on these similarities, a measure with high accuracy in assessing the overall similarity between cases is designed. The effectiveness of the approach is evaluated by numerical and empirical experiments.
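The retrieval step can be sketched as scoring each source case against the target on its three distinctive features and returning the best match. The token-overlap feature similarity, the weights and the toy cases below are illustrative stand-ins for the paper's word-, sense- and text-level measures:

```python
def feature_sim(a, b):
    # Toy token-overlap similarity (stand-in for word/sense/text-level measures).
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def case_similarity(target, source, weights=(0.4, 0.3, 0.3)):
    # Combine product, problem and content similarities; weights are illustrative.
    feats = ("product", "problem", "content")
    return sum(w * feature_sim(target[f], source[f]) for w, f in zip(weights, feats))

new_case = {"product": "wireless router", "problem": "connection drops",
            "content": "router loses wifi connection every hour"}
cases = [
    {"product": "wireless router", "problem": "connection drops",
     "content": "wifi drops after firmware update", "solution": "roll back firmware"},
    {"product": "laptop battery", "problem": "fast drain",
     "content": "battery drains in one hour", "solution": "replace battery"},
]
# Retrieve the source case whose solution can be reused for the new complaint.
best = max(cases, key=lambda c: case_similarity(new_case, c))
print(best["solution"])
```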
Findings
The evaluation results show that a measure that simultaneously considers similarity features at the word, sense and text levels obtains higher accuracy than measures that consider features at only one level, and that the designed measure is more accurate than all of its linear combinations.
Practical implications
The approach offers a feasible way to reduce manual intervention in online complaint handling. Complaint products, problems and content should be synthetically considered when handling an online complaint. The designed procedure of the measure with high accuracy can be applied in other applications that consider multiple similarity features or linguistic levels.
Originality/value
A method for linearly combining the similarities at all linguistic levels to accurately assess the overall similarities between online complaint cases is presented. This method is experimentally verified to be helpful to improve the accuracy of online complaint case retrieval. This is the first study that considers the accuracy of the similarity measures for online complaint case retrieval.
Maryam Yaghtin, Hajar Sotudeh, Alireza Nikseresht and Mahdieh Mirzabeigi
Abstract
Purpose
Co-citation frequency, defined as the number of documents co-citing two articles, is considered a quantitative, and thus efficient, proxy of the subject relatedness or prestige of the co-cited articles. Despite its quantitative nature, it has proved effective in retrieving and evaluating documents, signifying its linkage with the related documents' contents. To better understand the dynamism of the citation network, the present study aims to investigate the various content features giving rise to the measure.
Design/methodology/approach
The present study examined the interaction of different co-citation features in explaining the co-citation frequency. The features include the co-cited works' similarities in their full-texts, Medical Subject Headings (MeSH) terms, co-citation proximity, opinions and co-citances. A test collection is built using the CITREC dataset. The data were analyzed using natural language processing (NLP) and opinion mining techniques. A linear model was developed to regress the objective and subjective content-based co-citation measures against the natural log of the co-citation frequency.
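The regression step is ordinary least squares of content-based measures against the natural log of the co-citation frequency. A single-feature sketch with toy data that lies exactly on a line (the numbers are illustrative only, not the study's data):

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y = a + b * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Toy pairs: (content similarity, ln(co-citation frequency)).
sims     = [0.1, 0.2, 0.3, 0.4]
log_freq = [0.7, 0.9, 1.1, 1.3]   # lies exactly on y = 0.5 + 2x
a, b = fit_linear(sims, log_freq)
print(round(a, 3), round(b, 3))
```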
Findings
The dimensions of co-citation similarity, either subjective or objective, play significant roles in predicting co-citation frequency. The model can predict about half of the co-citation variance. The interaction of co-opinionatedness and non-co-opinionatedness is the strongest factor in the model.
Originality/value
It is the first study in revealing that both the objective and subjective similarities could significantly predict the co-citation frequency. The findings re-confirm the citation analysis assumption claiming the connection between the cognitive layers of cited documents and citation measures in general and the co-citation frequency in particular.
Peer review
The peer review history for this article is available at https://publons.com/publon/10.1108/OIR-04-2020-0126.
Haihua Chen, Yunhan Yang, Wei Lu and Jiangping Chen
Abstract
Purpose
Citation contexts have been found useful in many scenarios. However, existing context-based recommendations ignore the importance of diversity in reducing redundancy and thus cannot cover the broad range of user interests. To address this gap, the paper proposes a novel task: recommending a set of diverse citation contexts extracted from a list of citing articles. This will assist users in understanding how other scholars have cited an article and in deciding which articles to cite in their own writing.
Design/methodology/approach
This research combines three semantic distance algorithms and three diversification re-ranking algorithms for the diversifying recommendation based on the CiteSeerX data set and then evaluates the generated citation context lists by applying a user case study on 30 articles.
Findings
Results show that a diversification strategy combining word2vec and Integer Linear Programming leads to a better reading experience for participants than other diversification strategies, such as the citation-count-sorted list that CiteSeerX uses.
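The paper's diversifier is ILP-based; a common greedy alternative that captures the same relevance-versus-redundancy trade-off is Maximal Marginal Relevance (MMR), shown here as a stand-in sketch with toy citation contexts (not the paper's method or data):

```python
def jaccard(a, b):
    # Token-overlap similarity between two citation contexts.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def mmr_rerank(candidates, relevance, k=2, lam=0.5):
    # Greedily pick the context maximizing lam*relevance - (1-lam)*redundancy,
    # where redundancy is the max similarity to anything already selected.
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            redundancy = max((jaccard(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

contexts = ["a b c", "a b d", "x y z"]                 # toy citation contexts
relevance = {"a b c": 0.9, "a b d": 0.85, "x y z": 0.5}
print(mmr_rerank(contexts, relevance))  # a diverse pair, not two near-duplicates
```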
Practical implications
This diversifying recommendation task is valuable for developing better systems in information retrieval, automatic academic recommendations and summarization.
Originality/value
The originality of the research lies in the proposal of a novel task: recommending a diversified context list that describes how other scholars cited an article, thereby making citing decisions easier. A novel mixed approach is explored to generate the most efficient diversifying strategy. Besides, rather than traditional information retrieval evaluation, a user evaluation framework is introduced to reflect user information needs more objectively.
Marco Kalz, Jan van Bruggen, Bas Giesbers, Wim Waterink, Jannes Eshuis and Rob Koper
Abstract
Purpose
The purpose of this paper is twofold: first the paper aims to sketch the theoretical basis for the use of electronic portfolios for prior learning assessment; second it endeavours to introduce latent semantic analysis (LSA) as a powerful method for the computation of semantic similarity between texts and a basis for a new observation link for prior learning assessment.
Design/methodology/approach
A short literature review on e‐assessment was conducted, with the result that none of the reviewed work included new and innovative methods for assessing learners' open responses and narratives. On a theoretical basis, the connection between e‐portfolio research and research on prior learning assessment is explained with reference to existing literature. After that, LSA is introduced and several examples from similar educational applications are provided. A model for prior learning assessment on the basis of LSA is presented. A case study at the Open University of The Netherlands is presented and preliminary results are discussed.
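LSA, as used here, is the truncated singular value decomposition of the term-document matrix; in brief (the standard formulation, not specific to this paper's implementation):

```latex
% truncated SVD of the term-document matrix X (k latent dimensions)
X \approx U_k \Sigma_k V_k^{\top}
% document d_i is row i of V_k \Sigma_k; semantic similarity between two
% texts is the cosine of their reduced vectors
\operatorname{sim}(d_i, d_j) =
  \frac{(V_k \Sigma_k)_i \cdot (V_k \Sigma_k)_j}
       {\lVert (V_k \Sigma_k)_i \rVert \, \lVert (V_k \Sigma_k)_j \rVert}
```

Keeping only the k largest singular values collapses near-synonymous terms into shared latent dimensions, which is what lets the similarity score bridge differently worded portfolio texts.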
Findings
A first inspection of the results shows that the similarity measurement that is produced by the system can differentiate between learners who sent in different material and between the learning activities and chapters.
Originality/value
The paper is original because it combines research from natural language processing with very practical educational problems in higher education and technology‐enhanced learning. For faculty members the presented model and technology can help them in the assessment phase in an APL procedure. In addition, the presented model offers a dynamic method for reasoning about prior knowledge in adaptive e‐learning systems.
Debasis Majhi and Bhaskar Mukherjee
Abstract
Purpose
The purpose of this study is to identify the research fronts by analysing highly cited core papers adjusted with the age of a paper in library and information science (LIS) where natural language processing (NLP) is being applied significantly.
Design/methodology/approach
By mining international databases, 3,087 core papers that each received at least 5% of the total citations were identified. By calculating the mean age of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for all 20 fronts to understand how strongly a front has attracted peer attention over time. One theme article was then identified from each of these 20 fronts.
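The abstract does not spell out the CPT formula. One plausible reading, citations per core paper normalized by the front's mean paper age, can be sketched as follows; this interpretation is an assumption, and the figures are illustrative, not the study's data:

```python
def cpt(total_citations, num_core_papers, mean_age_years):
    # Citations per publication per unit time -- an assumed reading of the
    # abstract's "CPT (citation/publication/time)" value, not a formula the
    # paper states explicitly.
    return total_citations / num_core_papers / mean_age_years

# A hypothetical front: 100 core papers, 800 citations, mean paper age 5 years.
print(cpt(800, 100, 5.0))
```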
Findings
Bidirectional encoder representations from transformers, with a CPT value of 1.608, followed by sentiment analysis with a CPT of 1.292, received the highest attention in NLP research. Columbia University, New York leads among universities; the Journal of the American Medical Informatics Association among journals; the USA, followed by the People's Republic of China, among countries; and H. Xu of the University of Texas among authors. The study also finds that NLP applications boost the performance of digital libraries and automated library systems in the digital environment.
Practical implications
Any research fronts that are identified in the findings of this paper may be used as a base for researchers who intended to perform extensive research on NLP.
Originality/value
To the best of the authors’ knowledge, the methodology adopted in this paper is the first of its kind, using a meta-analysis approach to understand the research fronts of a subfield such as NLP within a broad domain such as LIS.