Search results

1 – 10 of 514
Article
Publication date: 16 July 2021

Young Man Ko, Min Sun Song and Seung Jun Lee

This study aims to develop metadata of conceptual elements based on the text structure of research articles on Korean studies, to propose a search algorithm that reflects the…

Abstract

Purpose

This study aims to develop metadata of conceptual elements based on the text structure of research articles on Korean studies, to propose a search algorithm that reflects the combination of semantically relevant data in accordance with the search intention of a research paper, and to examine whether the algorithm produces a difference in intention-based search results.

Design/methodology/approach

This study constructed a metadata database of 5,007 research articles on Korean studies arranged by conceptual elements of text structure and developed the F1(w)-score, which weights conceptual elements based on the F1-score and the number of data points from each element. This study evaluated the algorithm by comparing search results of the F1(w)-score algorithm with those of the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm and simple keyword search.
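The abstract does not give the exact form of the F1(w)-score, so the following is only a minimal sketch of the general idea of element-weighted search: each record is scored by summing the weights of the metadata elements in which the query terms appear. The element names and weight values are illustrative assumptions, not the paper's.

```python
# Hypothetical sketch of element-weighted metadata search.
# Element names and weights are illustrative assumptions, not the paper's F1(w) values.
ELEMENT_WEIGHTS = {
    "research_question": 0.9,
    "method": 0.7,
    "findings": 0.8,
    "background": 0.3,
}

def score_record(record: dict, query_terms: set) -> float:
    """Sum the weights of the metadata elements whose text contains a query term."""
    score = 0.0
    for element, weight in ELEMENT_WEIGHTS.items():
        text = record.get(element, "").lower()
        if any(term in text for term in query_terms):
            score += weight
    return score

records = [
    {"research_question": "origins of hangul orthography", "method": "corpus analysis"},
    {"background": "hangul mentioned only in passing", "findings": "unrelated result"},
]
query = {"hangul"}
ranked = sorted(records, key=lambda r: score_record(r, query), reverse=True)
print(ranked)
```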

Findings

The authors find that the higher the F1(w)-score, the closer the search results are to the semantic relevance of the search intention. Furthermore, F1(w)-score-generated search results were more closely related to the search intention than those of TF-IDF and simple keyword search.

Research limitations/implications

Even though the F1(w)-score was developed in this study to evaluate the search results of a metadata database structured by conceptual elements of the text structure of Korean studies, the algorithm can be used as a tool for searching other databases, provided that the weighting is tuned accordingly.

Practical implications

A metadata database based on text structure and a search method based on weights of metadata elements – F1(w)-score – can be useful for interdisciplinary studies, especially for semantic search in regional studies.

Originality/value

This paper presents a methodology for supporting IR using F1(w)-score—a novel model for weighting metadata elements based on text structure. The F1(w)-score-based search results show the combination of semantically relevant data, which are otherwise difficult to search for using similarity of search words.

Details

The Electronic Library , vol. 39 no. 5
Type: Research Article
ISSN: 0264-0473

Keywords

Article
Publication date: 28 February 2023

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals…

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
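As a rough illustration of one pipeline variant named above (TF-IDF term vectors fed to k-means clustering and a linear SVM classifier), a minimal sketch in Python might look like the following; the proposal texts and category labels are placeholders, and the FastText and BERT embeddings also used in the study are omitted.

```python
# Minimal sketch of one pipeline variant named in the abstract:
# TF-IDF term vectors fed to k-means clustering and a linear SVM classifier.
# Proposal texts and category labels below are placeholders, not the agency's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

proposals = [
    "akilli ulasim sistemleri icin pilot proje onerisi",
    "kentsel donusum alaninda sosyal etki calismasi",
    "genclere yonelik girisimcilik egitimi programi",
]
labels = ["transport", "urban", "education"]  # illustrative predefined categories

X = TfidfVectorizer().fit_transform(proposals)

# Unsupervised grouping: discover natural clusters in the corpus.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised grouping: predict a predefined category for each proposal.
clf = LinearSVC().fit(X, labels)
print(clusters, clf.predict(X))
```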

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals; traditional frequency-based word embedding methods are used for feature extraction to represent project proposals. Unlike previous research, this study employs two high-performing neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of the authors' knowledge, no research has been conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 8 April 2021

Mariem Bounabi, Karim Elmoutaouakil and Khalid Satori

This paper aims to present a new term weighting approach for text classification as a text mining task. The original method, neutrosophic term frequency – inverse document frequency…

Abstract

Purpose

This paper aims to present a new term weighting approach for text classification as a text mining task. The original method, neutrosophic term frequency – inverse document frequency (NTF-IDF), is an extended version of the popular fuzzy TF-IDF (FTF-IDF) and uses neutrosophic reasoning to analyze and generate weights for terms in natural languages. The paper also proposes a comparative study between the popular FTF-IDF and NTF-IDF and their impacts on different machine learning (ML) classifiers for document categorization goals.

Design/methodology/approach

After preprocessing the textual data, the original neutrosophic TF-IDF applies a neutrosophic inference system (NIS) to produce weights for the terms representing a document. Using the local frequency (TF), the global frequency (IDF) and the text length (N) as NIS inputs, this study generates two neutrosophic weights for a given term. The first measure provides information on the relevance degree of a word, and the second represents its ambiguity degree. Next, the Zhang combination function is applied to combine the neutrosophic weight outputs and produce the final term weight, which is inserted into the document's representative vector. To analyze the impact of NTF-IDF on the classification phase, this study uses a set of ML algorithms.
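The neutrosophic inference system and the Zhang combination function are specific to the paper and are not reproduced here; the sketch below only computes the three NIS inputs the abstract names (local TF, global IDF and text length), with a clearly hypothetical placeholder standing in for the inference step.

```python
# Sketch of the three NIS inputs named in the abstract (TF, IDF, text length).
# The neutrosophic inference and the Zhang combination are NOT reproduced here;
# `neutrosophic_weight` is a hypothetical placeholder, not the paper's actual rules.
import math
from collections import Counter

def nis_inputs(doc_tokens: list, corpus: list) -> dict:
    """Return (term frequency, inverse document frequency, text length) per term."""
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    length = len(doc_tokens)
    inputs = {}
    for term, freq in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log(n_docs / df)
        inputs[term] = (freq / length, idf, length)
    return inputs

def neutrosophic_weight(tf: float, idf: float, length: int) -> float:
    # Placeholder stand-in: the real method infers a relevance degree and an
    # ambiguity degree with a neutrosophic inference system and combines them.
    return tf * idf

corpus = [["text", "mining", "task"], ["fuzzy", "text", "weights"]]
print({t: neutrosophic_weight(*v) for t, v in nis_inputs(corpus[0], corpus).items()})
```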

Findings

By exploiting the characteristics of neutrosophic logic (NL), the authors were able to study the ambiguity of terms and their degree of relevance for representing a document. The choice of NL has proven effective in defining significant text vectorization weights, especially for text classification tasks. The experimentation part demonstrates that the new method positively impacts the categorization. Moreover, the adopted system's recognition rate is higher than 91%, an accuracy score not attained using the FTF-IDF. Also, using benchmarked data sets from different text mining fields and several ML classifiers (e.g. SVM and feed-forward networks), applying the proposed NTF-IDF term scores improves the accuracy by 10%.

Originality/value

The novelty of this paper lies in two aspects. First, a new term weighting method, which uses the term frequencies as components to define the relevance and the ambiguity of a term; second, the application of NL to infer weights, which is considered an original model in this paper and which also aims to correct the shortcomings of the FTF-IDF, which relies on fuzzy logic. The introduced technique was combined with different ML models to improve the accuracy and relevance of the obtained feature vectors that feed the classification mechanism.

Details

International Journal of Web Information Systems, vol. 17 no. 3
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 29 April 2021

Hossein Toosi, Mohammad Amin Ghaaderi and Zahra Shokrani

The purpose of this study is to compare the trend of academic project management research in Iran and the world in five-year periods with a text mining approach and TF–IDF method.

Abstract

Purpose

The purpose of this study is to compare the trend of academic project management research in Iran and the world in five-year periods with a text mining approach and TF–IDF method.

Design/methodology/approach

The research population consists of 1205 theses presented between 2000 and 2019 in Iranian universities. The central library website of the mentioned universities was used for data collection, and the text mining approach with the TF–IDF method was used for data analysis.
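As an illustration of the kind of analysis described, a minimal sketch of extracting the top TF-IDF terms per five-year period might look like the following; the period labels and thesis texts are made-up placeholders, not data from the study.

```python
# Illustrative sketch: top TF-IDF terms per five-year period of thesis texts.
# Period labels and texts are placeholders, not the study's corpus of 1205 theses.
from sklearn.feature_extraction.text import TfidfVectorizer

periods = {
    "2000-2004": ["concrete structure seismic design",
                  "risk management in construction projects"],
    "2015-2019": ["bim based optimization of epc projects",
                  "fuzzy logic models for contractor selection"],
}
for period, texts in periods.items():
    vec = TfidfVectorizer()
    X = vec.fit_transform(texts)
    scores = X.sum(axis=0).A1  # aggregate each term's score over the period
    top = sorted(zip(vec.get_feature_names_out(), scores), key=lambda t: -t[1])[:5]
    print(period, top)
```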

Findings

The remarkable results of this study include the following: Concrete structures are the most frequent among structural systems; Risk Management is the most frequent among PMBOK Knowledge Areas; the Design-build (DB) system is the most frequent among Project Delivery Systems; Engineering, procurement and construction (EPC) is the most frequent among DB Project Delivery Systems; Financial Management is the most frequent among specialized construction knowledge areas; Soft Skills is the most frequent among Global Trends; Contracting Companies is the most frequent among Project Parties; Construction Projects is the most frequent among Project Areas; Power Plant and Refinery is the most frequent among Project Subjects; Optimization is the most frequent among Problem-Solving Approaches; Fuzzy Logic is the most frequent among Novel Algorithms; and Motivation is the most frequent among Soft Skills.

Originality/value

The innovative aspect of this research is that, for the first time, text mining has been used to analyze academic research on project and construction management, and, also for the first time, academic research on the construction industry in Iran has been compared with global research.

Details

Engineering, Construction and Architectural Management, vol. 29 no. 3
Type: Research Article
ISSN: 0969-9988

Keywords

Article
Publication date: 12 June 2017

San-Yih Hwang, Chih-Ping Wei, Chien-Hsiang Lee and Yu-Siang Chen

The information needs of the users of literature database systems often come from the task at hand, which is short term and can be represented as a small number of articles…

Abstract

Purpose

The information needs of the users of literature database systems often come from the task at hand, which is short term and can be represented as a small number of articles. Previous works on recommending articles to satisfy users’ short-term interests have utilized article content, usage logs, and more recently, coauthorship networks. The usefulness of coauthorship has been demonstrated by some research works, which, however, tend to adopt a simple coauthorship network that records only the strength of coauthorships. The purpose of this paper is to enhance the effectiveness of coauthorship-based recommendation by incorporating scholars’ collaboration topics into the coauthorship network.

Design/methodology/approach

The authors propose a latent Dirichlet allocation (LDA)-coauthorship-network-based method that uses LDA to integrate topic information into the links of the coauthorship network, and they develop a task-focused technique for recommending literature articles.
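The exact way topic information is integrated into the network is the paper's contribution and is not reproduced here; the sketch below only illustrates the underlying ingredients, fitting LDA to paper abstracts and deriving a topic profile for a coauthorship link from the pair's joint papers. All data are toy placeholders.

```python
# Hedged sketch: fit LDA on paper abstracts and attach a topic profile to a
# coauthorship edge based on the pair's joint papers (not the paper's exact method).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "recommender systems and collaborative filtering for articles",
    "topic models for text mining in information systems",
    "evaluation of information retrieval effectiveness",
]
X = CountVectorizer().fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)  # per-paper topic distributions

# Toy coauthorship data: indices of papers written jointly by a pair of authors.
joint_papers = {("author_a", "author_b"): [0, 1]}
edge_topic_profile = {
    pair: theta[idx].mean(axis=0) for pair, idx in joint_papers.items()
}
print(edge_topic_profile)
```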

Findings

The experimental results using information systems journal articles show that the proposed method is more effective than the previous coauthorship network-based method over all scenarios examined. The authors further develop a hybrid method that combines the results of content-based and LDA-coauthorship-network-based recommendations. The resulting hybrid method achieves greater or comparable recommendation effectiveness under all scenarios when compared to the content-based method.

Originality/value

This paper makes two contributions. The authors first show that topic models are indeed useful and can be incorporated into the construction of a coauthorship network to improve literature recommendation. The authors subsequently demonstrate that coauthorship-network-based and content-based recommendations are complementary in their hit article rank distributions, and then devise a hybrid recommendation method to further improve the effectiveness of literature recommendation.

Details

Online Information Review, vol. 41 no. 3
Type: Research Article
ISSN: 1468-4527

Keywords

Abstract

Details

Big Data Analytics for the Prediction of Tourist Preferences Worldwide
Type: Book
ISBN: 978-1-83549-339-7

Article
Publication date: 27 April 2023

Neha Singh, Rohit Biswas and Mamoni Banerjee

The purpose of this article is to develop relationships between many major issues relevant to the agriculture supply chain.

Abstract

Purpose

The purpose of this article is to develop relationships between many major issues relevant to the agriculture supply chain.

Design/methodology/approach

With the purpose of gaining an all-encompassing understanding of the agriculture supply chain, this work uses 233 filtered research articles and three bibliometric analysis tools, namely VOSviewer, term frequency-inverse document frequency (TF-IDF) and Pearson correlation. The collected research publications were also catalogued using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
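As a small illustration of two of the tools named above, the sketch below computes TF-IDF scores over a set of abstracts and the Pearson correlation between the score profiles of two keywords; the abstracts and keywords are placeholders, not the study's data.

```python
# Illustrative sketch: Pearson correlation between the TF-IDF profiles of two
# keywords across a set of abstracts (texts and keywords are placeholders).
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "blockchain traceability in the agriculture supply chain",
    "iot sensors for cold chain logistics and food waste",
    "blockchain and iot for food traceability",
]
vec = TfidfVectorizer()
X = vec.fit_transform(abstracts).toarray()
vocab = {w: i for i, w in enumerate(vec.get_feature_names_out())}

# Correlate how strongly two keywords co-vary across the document set.
r, p = pearsonr(X[:, vocab["blockchain"]], X[:, vocab["iot"]])
print(r, p)
```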

Findings

Using these analytic techniques, a total of 12 keywords were obtained. The study found that agri-products are in dire need of digitisation via the Internet of Things (IoT) and blockchain, owing to the use of economic variables and the comprehensive management of total food waste throughout transportation, anchoring quality as the predominant variable.

Research limitations/implications

The study was limited to articles indexed in Scopus and Web of Science (WoS) in order to assess the viability of the linked idea and problem.

Originality/value

This study aims to generate vital knowledge in the field of horticulture-focused agriculture supply chain based on previous justification and relationship formation.

Details

Journal of Agribusiness in Developing and Emerging Economies, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2044-0839

Keywords

Article
Publication date: 16 November 2015

Hsien-Tsung Chang, Shu-Wei Liu and Nilamadhab Mishra

The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Based on the proposed methods and algorithms, the authors…

Abstract

Purpose

The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Based on the proposed methods and algorithms, the authors extract the important sentences that are contained in topic stories and list those sentences according to timestamp order to ensure ease of understanding and to visualize multiple news stories on a single screen.

Design/methodology/approach

This paper encompasses an investigational approach that implements a new Dynamic Centroid Summarization algorithm in addition to a Term Frequency (TF)-Density algorithm to empirically compute three target parameters, i.e., recall, precision, and F-measure.
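The three target parameters named above have their standard definitions; a minimal sketch, assuming the retrieved items and the relevant items are given as sets:

```python
# Standard definitions of the three evaluation measures named above,
# computed from the set of retrieved items and the set of relevant items.
def precision_recall_f1(retrieved: set, relevant: set):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

print(precision_recall_f1({"s1", "s2", "s3"}, {"s2", "s3", "s4"}))  # ~ (0.67, 0.67, 0.67)
```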

Findings

The proposed TF-Density algorithm is implemented and compared with the well-known algorithms Term Frequency-Inverse Word Frequency (TF-IWF) and Term Frequency-Inverse Document Frequency (TF-IDF). Three test data sets are configured from Chinese news web sites for use during the investigation, and two important findings are obtained that help the authors provide more precision and efficiency when recognizing the important words in the text. First, the authors evaluate three topic tracking algorithms, i.e., TF-Density, TF-IDF, and TF-IWF, with the said target parameters and find that the recall, precision, and F-measure of the proposed TF-Density algorithm are better than those of the TF-IWF and TF-IDF algorithms. For the second finding, the authors implement a blind test approach to obtain the results of topic summarizations and find that the proposed Dynamic Centroid Summarization process can select topic sentences more accurately than the LexRank process.

Research limitations/implications

The results show that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news. The analysis and implications are limited to Chinese news content from Chinese news web sites such as Apple Library, UDN, and well-known portals like Yahoo and Google.

Originality/value

The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms. It focusses on improving the means of summarizing a set of news stories to appear for browsing on a single screen and carries implications for innovative word measurements in practice.

Details

Aslib Journal of Information Management, vol. 67 no. 6
Type: Research Article
ISSN: 2050-3806

Keywords

Article
Publication date: 1 October 2004

Stephen Robertson

The term‐weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a…

Abstract

The term‐weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon's Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
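For reference, the classical definitions the article discusses are, in the usual notation, with a collection of N documents, n_t of which contain term t, and tf_{t,d} the frequency of t in document d:

```latex
% Classical IDF weight for term t, and the TF*IDF score of term t in document d.
\[
  \mathrm{idf}(t) = \log \frac{N}{n_t},
  \qquad
  w(t, d) = \mathrm{tf}_{t,d} \cdot \mathrm{idf}(t)
\]
```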

Details

Journal of Documentation, vol. 60 no. 5
Type: Research Article
ISSN: 0022-0418

Keywords

Article
Publication date: 22 June 2020

Lei Li, Yaxuan Dai and Yudong Sun

Employing big data analysis tools, this study examines the significance of supply chain integration affecting online financial consumption, analyzes the online financial…

Abstract

Purpose

Employing big data analysis tools, this study examines the significance of supply chain integration affecting online financial consumption, analyzes the online financial consumption demand of mobile phone consumers, promotes the optimization of supply chain services with consumers as the focus and proposes full integration of a mobile phone supply chain in terms of product, logistics and marketing, in order to improve the supply and demand relationship between consumers and suppliers; the overall objective is to promote further development of online financial consumption.

Design/methodology/approach

In this study, TF-IDF (term frequency–inverse document frequency) and cosine similarity text analysis are used for analyzing online demand for mobile phone products, studying the influence of supply chain services on consumption demand and identifying strategies for promoting overall optimization of the supply chain to meet online financial consumption demands of consumers; the study analyzes online reviews on mobile phone topics from the JingDong (JD) platform and Weibo platform.
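As a small illustration of the two analysis steps named above, a minimal sketch of TF-IDF vectorization followed by pairwise cosine similarity between review texts (the review texts are placeholders, not data from JD or Weibo):

```python
# Minimal sketch: TF-IDF vectors and cosine similarity between review texts,
# the two analysis steps named in the abstract (review texts are placeholders).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "battery life is great but delivery was slow",
    "fast shipping, camera quality could be better",
    "slow logistics and poor after-sales service",
]
X = TfidfVectorizer().fit_transform(reviews)
print(cosine_similarity(X))  # pairwise similarity matrix between reviews
```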

Findings

Research results show that online demand for mobile phone products is greatly influenced by supply chain links such as product design, logistics transportation and marketing promotion. The consumption demand for different mobile phone products has different emphases, but the differences are not significant. The overall improvement of the supply chain should focus on product research and development, logistics layout optimization and marketing promotion, in order to meet and guide the online financial demand of consumers and improve the effectiveness of supply chain management.

Research limitations/implications

This study only considered data from China's largest online mobile phone sales platform and Weibo text data owing to the data sensitivity involved.

Originality/value

There are few supply chain optimization studies based on online financial consumption reviews from customers. Therefore, this study integrates online consumption trends into a supply chain analysis framework to explore strategies for promoting supply chain optimization according to customer demands, improving the benign interaction of participants in the supply chain and promoting the development of online financial consumption.

Details

Industrial Management & Data Systems, vol. 121 no. 4
Type: Research Article
ISSN: 0263-5577

Keywords

1 – 10 of 514