Search results

1 – 10 of 73

Article

Publication date: 28 February 2023

A comparative analysis of text representation, classification and clustering methods over real project proposals

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

Downloads

259

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
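The represent-then-cluster idea described above can be sketched in a few lines. This is a toy illustration only, not the authors' pipeline: it uses a hand-rolled TF-IDF representation and plain k-means with caller-chosen starting centroids, whereas the study also uses FastText and BERT embeddings and several other algorithms. The four English proposal titles and the function names `tfidf_vectors` and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in tokenised for t in set(doc))
    # Smoothed inverse document frequency (one common variant, chosen here as an assumption).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenised:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=20):
    """Plain k-means; initial centroids are the vectors at init_indices (deterministic)."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(range(len(centroids)),
                            key=lambda c: squared_distance(v, centroids[c]))
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical proposal titles (not data from the study).
proposals = [
    "solar energy storage pilot",
    "wind and solar energy grid",
    "youth employment training program",
    "vocational training for youth employment",
]
vocab, vectors = tfidf_vectors(proposals)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)  # the two energy titles share one label, the two employment titles the other
```

In a supervised setting, the same vectors could instead be passed to a classifier together with predefined category labels.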

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing studies on the automatic grouping of proposals rely on traditional frequency-based methods for feature extraction to represent project proposals. Unlike previous research, this study employs two high-performing neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, no research has been conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3

Type: Research Article

ISSN: 1756-378X

Article

Publication date: 13 October 2021

A review of cluster analysis techniques and their uses in library and information science research: k-means and k-medoids clustering

Brady Lund and Jinxuan Ma

Downloads

796

Abstract

Purpose

This literature review explores the definitions and characteristics of cluster analysis, a machine-learning technique that is frequently implemented to identify groupings in big datasets, and its applicability to library and information science (LIS) research. This overview is intended for researchers who are interested in expanding their data analysis repertory to include cluster analysis, rather than for existing experts in this area.

Design/methodology/approach

A review of LIS articles included in the Library and Information Source (EBSCO) database that employ cluster analysis is performed. An overview of cluster analysis in general (how it works from a statistical standpoint, and how it can be performed by researchers), the most popular cluster analysis techniques and the uses of cluster analysis in LIS is presented.

Findings

The number of LIS studies that employ a cluster analytic approach has grown from about 5 per year in the early 2000s to an average of 35 studies per year in the mid- and late-2010s. The journal Scientometrics has the most articles published within LIS that use cluster analysis (102 studies). Scientometrics is the most common subject area to employ a cluster analytic approach (152 studies). The findings of this review indicate that cluster analysis could make LIS research more accessible by providing an innovative and insightful process of knowledge discovery.

Originality/value

This review is the first to present cluster analysis as an accessible data analysis approach, specifically from an LIS perspective.

Details

Performance Measurement and Metrics, vol. 22 no. 3

Type: Research Article

ISSN: 1467-8047

Article

Publication date: 18 May 2012

Technology forecasting using matrix map and patent clustering

Sunghae Jun, Sang Sung Park and Dong Sik Jang

Downloads

2994

Abstract

Purpose

The purpose of this paper is to propose an objective method for technology forecasting (TF). For the construction of the proposed model, the paper aims to consider new approaches to patent mapping and clustering. In addition, the paper aims to introduce a matrix map and K‐medoids clustering based on support vector clustering (KM‐SVC) for vacant TF.

Design/methodology/approach

TF is an important research and development (R&D) policy issue for both companies and government. Vacant TF is one of the key technological planning methods for improving the competitive power of firms and governments. In general, a forecasting process is facilitated subjectively based on the researcher's knowledge, resulting in unstable TF performance. In this paper, the authors forecast the vacant technology areas in a given technology field by analyzing patent documents and employing the proposed matrix map and KM‐SVC to forecast vacant technology areas in the management of technology (MOT).

Findings

The paper examines the vacant technology areas for MOT patent documents from the USA, Europe, and China by comparing these countries in terms of technology trends in MOT and identifying the vacant technology areas by country. The matrix map provides broad vacant technology areas, whereas KM‐SVC provides more specific vacant technology areas. Thus, the paper identifies the vacant technology areas of a given technology field by using the results for both the matrix map and KM‐SVC.

Practical implications

The authors use patent documents as objective data to develop a model for vacant TF. The paper attempts to objectively forecast the vacant technology areas in a given technology field. To verify the performance of the matrix map and KM‐SVC, the authors conduct an experiment using patent documents related to MOT (the given technology field in this paper). The results suggest that the proposed forecasting model can be applied to diverse technology fields, including R&D management, technology marketing, and intellectual property management.

Originality/value

Most TF models are based on qualitative and subjective methods such as Delphi. That is, there are few objective models. In this regard, this paper proposes a quantitative and objective TF model that employs patent documents as objective data and a matrix map and KM‐SVC as quantitative methods.

Details

Industrial Management & Data Systems, vol. 112 no. 5

Type: Research Article

ISSN: 0263-5577

Article

Publication date: 21 August 2023

Does e-government development moderate the impact of female labor participation on national cybersecurity maturity? An empirical investigation

Manimay Dev and Debashis Saha

Downloads

146

Abstract

Purpose

This paper aims to investigate the relationship between female participation in the labor force and the cybersecurity maturity of nations, and the enabling role of e-government development in moderating that relationship.

Design/methodology/approach

The authors conducted fixed-effects regression using archival data for 149 countries taken from secondary sources. Furthermore, the authors grouped the sample countries into four levels of cybersecurity maturity (unprepared, reactive, anticipatory and innovative) using clustering techniques, and studied the influence of their variables of interest for the individual groups.
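As a rough illustration of what a fixed-effects regression does, the sketch below computes the "within" estimator for a single regressor: each country's own mean is subtracted from its observations, so time-invariant country differences drop out before the slope is estimated. The panel values, the variable meanings and the function name `within_estimator` are hypothetical, and the moderation (interaction) term studied in the paper is not modeled here.

```python
from collections import defaultdict


def within_estimator(panel):
    """Slope of y on x after removing each entity's mean (one-regressor fixed effects)."""
    groups = defaultdict(list)
    for entity, x, y in panel:
        groups[entity].append((x, y))
    num = den = 0.0
    for rows in groups.values():
        x_mean = sum(x for x, _ in rows) / len(rows)
        y_mean = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            # Demeaned cross-product and demeaned sum of squares.
            num += (x - x_mean) * (y - y_mean)
            den += (x - x_mean) ** 2
    return num / den


# Hypothetical panel rows: (country, labor participation index, maturity score).
# Country B starts from a much higher baseline, but both follow the same slope of 2.
panel = [
    ("A", 1.0, 12.0), ("A", 2.0, 14.0), ("A", 3.0, 16.0),
    ("B", 1.0, 52.0), ("B", 2.0, 54.0), ("B", 3.0, 56.0),
]
beta = within_estimator(panel)
print(beta)  # 2.0
```

A pooled regression on the same toy data would also be influenced by the gap between the two baselines; demeaning by country removes that gap.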

Findings

Results show that female participation in the labor force positively influences national cybersecurity maturity, and that e-government development positively moderates this relationship, thereby enabling the empowerment of women.

Practical implications

Encouraging broader participation of women in the labor force and prioritizing investments in e-government development are essential steps that organizations and governments may take to enhance a country’s cybersecurity maturity level.

Originality/value

This study empirically demonstrates the impact of the nuanced interplay between female participation in the labor force and the e-government development of a nation on its cybersecurity maturity.

Details

Information & Computer Security, vol. 32 no. 1

Type: Research Article

ISSN: 2056-4961

Article

Publication date: 16 February 2022

Quality of hire: expanding the multi-level fit employee selection using machine learning

Sateesh Shet and Binesh Nair

Downloads

838

Abstract

Purpose

Organizational psychologists and human resource management (HRM) practitioners often have to select the “right fit” candidate by manually scouting data from various sources including job portals and social media. Given the constant pressure to lower the recruitment costs and the time taken to extend an offer to the right talent, the HR function has to inevitably adopt data analytics and machine learning for employee selection. This paper aims to propose the “Quality of Hire” concept for employee selection using the person-environment (P-E) fit theory and machine learning.

Design/methodology/approach

The authors demonstrate the aforementioned concept using a clustering algorithm, namely, partitioning around medoids (PAM). Based on a curated data set published by IBM, the authors examine the dimensions of different P-E fits and determine how these dimensions can lead to selection of the “right fit” candidate by evaluating the outcome of PAM.
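For readers unfamiliar with PAM, a compact sketch of its two phases follows: a greedy BUILD step that picks initial medoids, and a SWAP step that keeps exchanging a medoid with a non-medoid while the total distance to the nearest medoid decreases. This is an unoptimized teaching version on hypothetical one-dimensional scores, not the authors' implementation or data; the names `pam` and `total_cost` are illustrative.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Partitioning around medoids on a precomputed distance matrix."""
    n = len(dist)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(dist, medoids + [i]))
        medoids.append(best)
    # SWAP: apply the best cost-reducing (medoid, non-medoid) exchange until none is left.
    while True:
        best_cost, best_trial = total_cost(dist, medoids), None
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best_cost:
                    best_cost, best_trial = cost, trial
        if best_trial is None:
            break
        medoids = best_trial
    # Label each point with the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical one-dimensional "fit scores" forming two obvious groups.
scores = [1, 2, 3, 10, 11, 12]
dist = [[abs(a - b) for b in scores] for a in scores]
medoids, labels = pam(dist, k=2)
print(sorted(scores[m] for m in medoids))  # [2, 11]
print(labels)                              # [0, 0, 0, 1, 1, 1]
```

Because medoids are actual observations, each cluster can be summarized by a real record, which is one reason the method suits profile-style data.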

Findings

The authors propose a multi-level fit model rooted in the P-E theory, which can improve the quality of hire for an organization.

Research limitations/implications

Theoretically, the authors contribute to the domain of quality of hire using a multi-level fit approach based on the P-E theory. Methodologically, the authors contribute to expanding the HR analytics landscape by implementing the PAM algorithm in employee selection.

Originality/value

The proposed work is expected to present a useful case on the application of machine learning for practitioners in organizational psychology, HRM and data science.

Details

International Journal of Organizational Analysis, vol. 31 no. 6

Type: Research Article

ISSN: 1934-8835

Open Access

Article

Publication date: 22 August 2022

The structure and information spread capability of the network formed by integrated fitness apps

Euodia Vermeulen and Sara Grobbelaar

Downloads

704

Abstract

Purpose

In this article we aim to understand how the network formed by fitness tracking devices and associated apps as a subset of the broader health-related Internet of things is capable of spreading information.

Design/methodology/approach

The authors used a combination of a content analysis, network analysis, community detection and simulation. A sample of 922 health-related apps (including manufacturers' apps and developers) was collected through snowball sampling after an initial content analysis from a Google search for fitness tracking devices.

Findings

The network of fitness apps is disassortative, with high-degree nodes connecting to low-degree nodes, follows a power-law degree distribution and presents low community structure. Information spreads faster through the network than through an artificial small-world network, and fastest when nodes with high degree centrality are the seeds.
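Disassortativity can be made concrete with the degree assortativity coefficient, which is the Pearson correlation between the degrees found at the two ends of each edge: negative values mean hubs tend to link to low-degree nodes. The sketch below computes it for a hypothetical star-shaped toy network (one hub app linked to four small apps), not for the 922-app sample; the function name is illustrative.

```python
def degree_assortativity(edges):
    """Pearson correlation of end-point degrees over an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share one mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # Note: var is zero for regular graphs, where the coefficient is undefined.
    return cov / var


# Hypothetical toy network: one hub connected to four low-degree apps.
edges = [("hub", "app1"), ("hub", "app2"), ("hub", "app3"), ("hub", "app4")]
r = degree_assortativity(edges)
print(r)  # -1.0, the most disassortative value possible
```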

Practical implications

This capability to spread information holds implications for both intended and unintended data sharing.

Originality/value

The analysis confirms and supports evidence of widespread mobility of data between fitness and health apps that was initially reported in earlier work and in addition provides evidence for the dynamic diffusion capability of the network based on its structure. The structure of the network enables the duality of the purpose of data sharing.

Details

Information Technology & People, vol. 35 no. 8

Type: Research Article

ISSN: 0959-3845

Article

Publication date: 28 January 2020

Text mining analysis roadmap (TMAR) for service research

Mohamed Zaki and Janet R. McColl-Kennedy

Downloads

1128

Abstract

Purpose

The purpose of this paper is to offer a step-by-step text mining analysis roadmap (TMAR) for service researchers. The paper provides guidance on how to choose between alternative tools, using illustrative examples from a range of business contexts.

Design/methodology/approach

The authors provide a six-stage TMAR on how to use text mining methods in practice. At each stage, the authors provide a guiding question, articulate the aim, identify a range of methods and demonstrate how machine learning and linguistic techniques can be used in practice with illustrative examples drawn from business, from an array of data types, services and contexts.

Findings

At each of the six stages, this paper demonstrates useful insights that result from the text mining techniques to provide an in-depth understanding of the phenomenon and actionable insights for research and practice.

Originality/value

There is little research to guide scholars and practitioners on how to gain insights from the extensive “big data” that arises from the different data sources. In a first, this paper addresses this important gap highlighting the advantages of using text mining to gain useful insights for theory testing and practice in different service contexts.

Details

Journal of Services Marketing, vol. 34 no. 1

Type: Research Article

ISSN: 0887-6045

Article

Publication date: 18 January 2021

Critical analysis: bat algorithm-based investigation and application on several domains

Shahla U. Umar and Tarik A. Rashid

Downloads

121

Abstract

Purpose

The purpose of this study is to provide the reader with a full study of the bat algorithm, including its limitations, the fields in which the algorithm has been applied, versatile optimization problems in different domains and all the studies that assess its performance against other meta-heuristic algorithms.

Design/methodology/approach

The bat algorithm (BA) is examined in depth in terms of its background, characteristics and limitations. The study also presents the algorithms that have been hybridized with BA (K-Medoids, back-propagation neural network, harmony search algorithm, differential evolution strategies, enhanced particle swarm optimization and Cuckoo search algorithm) and their theoretical results, as well as the modifications that have been made to the algorithm (modified bat algorithm, enhanced bat algorithm, bat algorithm with mutation (BAM), uninhabited combat aerial vehicle-BAM and non-linear optimization). It also provides a summary review that focuses on improved and new bat algorithms (directed artificial bat algorithm, complex-valued bat algorithm, principal component analysis-BA, multiple strategies coupling bat algorithm and directional bat algorithm).

Findings

The study sheds light on the advantages and disadvantages of this algorithm by reviewing the research studies that have dealt with it, as well as the fields and applications it has addressed, in the hope that this will help scientists understand and develop it.

Originality/value

To the best of the research community's knowledge, no comprehensive survey covering all aspects of this algorithm has been conducted.

Details

World Journal of Engineering, vol. 18 no. 4

Type: Research Article

ISSN: 1708-5284

Article

Publication date: 8 February 2013

Interlinking educational resources and the web of data: A survey of challenges and approaches

Stefan Dietze, Salvador Sanchez‐Alonso, Hannes Ebner, Hong Qing Yu, Daniela Giordano, Ivana Marenzi and Bernardo Pereira Nunes

Downloads

1461

Abstract

Purpose

Research in the area of technology‐enhanced learning (TEL) throughout the last decade has largely focused on sharing and reusing educational resources and data. This effort has led to a fragmented landscape of competing metadata schemas, or interface mechanisms. More recently, semantic technologies were taken into account to improve interoperability. The linked data approach has emerged as the de facto standard for sharing data on the web. To this end, it is obvious that the application of linked data principles offers a large potential to solve interoperability issues in the field of TEL. This paper aims to address this issue.

Design/methodology/approach

In this paper, approaches are surveyed that are aimed towards a vision of linked education, i.e. education which exploits educational web data. It particularly considers the exploitation of the wealth of already existing TEL data on the web by allowing its exposure as linked data and by taking into account automated enrichment and interlinking techniques to provide rich and well‐interlinked data for the educational domain.

Findings

So far, web‐scale integration of educational resources has not been facilitated, mainly due to the lack of take‐up of shared principles, datasets and schemas. However, linked data principles are increasingly recognized by the TEL community. The paper provides a structured assessment and classification of existing challenges and approaches, serving as a potential guideline for researchers and practitioners in the field.

Originality/value

Being one of the first comprehensive surveys on the topic of linked data for education, the paper has the potential to become a widely recognized reference publication in the area.

Details

Program, vol. 47 no. 1

Type: Research Article

ISSN: 0033-0337

Open Access

Article

Publication date: 10 February 2020

Financial performance of Hungarian and Romanian retail food small businesses

Veronika Fenyves, Kinga Emese Zsido, Ioan Bircea and Tibor Tarnoczi

Downloads

3986

Abstract

Purpose

Changes in food retailing (globalization, concentration) have negative impacts on smaller, “traditional” food retail businesses. Their market share is decreasing year by year. The purpose of this study is to examine and compare the financial performances of these businesses under the given circumstances and current economic environment in a Hungarian and a Romanian county.

Design/methodology/approach

The study is based on two complete databases, including all companies engaged in retail food activity (according to the NACE code) in the counties of Hajdú-Bihar (Hungary) and Cluj (Romania). The databases analyzed contain the financial statements for five consecutive years for 212 and 690 businesses. The databases were examined using the most typical financial indicators, together with multivariate and univariate analysis of variance and k-medoids cluster analysis.

Findings

The results of the analysis show that the retail food companies in the two counties differ both in number and in financial performance. Companies in Hajdú-Bihar county perform better in terms of financial ratios than those in Cluj county. The groups created by k-medoids cluster analysis are relatively well distinguished in the case of Hajdú-Bihar county, while the picture is much more mixed in the case of Cluj county. However, it is also important to note that the companies analyzed should generally perform better to survive.

Research limitations/implications

Among the limitations of the study, it is important to note that the findings are relevant only to the two counties examined. Another limiting factor is that quite a few companies had to be excluded from the analysis due to missing data or outliers.

Practical implications

The study presents corporate decision-makers with the current performance of the companies in the sector examined in the two counties. The results highlight the business areas of concern for management. The findings show that these companies need to improve their performance to strengthen their market position. We believe that it is not enough to complain about the expansion of the supermarket chains; the companies should take appropriate actions to improve their situation. Based on the results of the study, it can be concluded that there is a need to improve the financial efficiency of retail food companies in both counties to survive in the long run. This improvement is essential because retailers can play an important role in smaller settlements and narrower residential environments.

Originality/value

Comparative analysis of retail food companies in similar counties in these two neighboring countries has not been conducted using complex financial analysis. The study revealed the common and/or individual characteristics of these companies.

Details

British Food Journal, vol. 122 no. 11

Type: Research Article

ISSN: 0007-070X
