Search results

1 – 10 of 173
Article
Publication date: 29 December 2023

Thanh-Nghi Do and Minh-Thu Tran-Nguyen

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD…

Abstract

Purpose

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD and FL-lSVM. These algorithms are designed to address the challenge of large-scale ImageNet classification.

Design/methodology/approach

The authors’ FL-lSGD and FL-lSVM train in a parallel and incremental manner to build an ensemble of local classifiers on Raspberry Pis without requiring data exchange. The algorithms sequentially load small data blocks of the local training subset stored on the Raspberry Pi to train the local classifiers. Each data block is split into k partitions using the k-means algorithm, and models are trained in parallel on each partition to enable local data classification.
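The block-wise scheme above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a hand-rolled k-means for partitioning and a hinge-loss SGD update as a stand-in for the local linear classifiers.

```python
import numpy as np

def kmeans_partition(X, k, iters=20):
    """Split one data block into k partitions with plain k-means (Lloyd's algorithm)."""
    centers = X[[0, -1]].astype(float) if k == 2 else X[:k].astype(float)  # simple init
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def train_local_sgd(X, y, epochs=50, lr=0.1, seed=0):
    """Linear classifier trained with SGD on the hinge loss (an SVM-style local model)."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:  # margin violated -> update
                w, b = w + lr * y[i] * X[i], b + lr * y[i]
    return w, b

# One local data block on one device: k-means splits it, one model per partition.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.r_[np.full(40, -1.0), np.full(40, 1.0)]
labels, centers = kmeans_partition(X, k=2)
models = [train_local_sgd(X[labels == j], y[labels == j]) for j in range(2)]

def predict(x):
    j = int(np.argmin(((centers - x) ** 2).sum(-1)))  # route to nearest partition
    w, b = models[j]
    return 1 if x @ w + b >= 0 else -1
```

In a federated setting, each device would build such an ensemble on its own subset and only exchange model parameters, never data.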

Findings

Empirical test results on the ImageNet data set show that the authors’ FL-lSGD and FL-lSVM algorithms with four Raspberry Pis (Quad-core Cortex-A72, ARM v8, 64-bit SoC @ 1.5 GHz, 4 GB RAM) are faster than the state-of-the-art LIBLINEAR algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32 GB RAM).

Originality/value

Efficiently addressing the challenge of large-scale ImageNet classification, the authors’ novel federated learning algorithms of local classifiers have been tailored to work on the Raspberry Pi. These algorithms can handle 1,281,167 images and 1,000 classes effectively.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 22 July 2022

Thanh-Nghi Do

This paper aims to propose the new incremental and parallel training algorithm of proximal support vector machines (Inc-Par-PSVM) tailored on the edge device (i.e. the Jetson…

Abstract

Purpose

This paper aims to propose a new incremental and parallel training algorithm for proximal support vector machines (Inc-Par-PSVM), tailored to the edge device (i.e. the Jetson Nano), to handle the challenging large-scale ImageNet problem.

Design/methodology/approach

The Inc-Par-PSVM trains, in an incremental and parallel manner, an ensemble of binary PSVM classifiers used for the One-Versus-All multiclass strategy on the Jetson Nano. Each binary PSVM model is the average of bagged binary PSVM models built on undersampled training data blocks.

Findings

The empirical test results on the ImageNet data set show that the Inc-Par-PSVM algorithm with the Jetson Nano (Quad-core ARM A57 @ 1.43 GHz, 128-core NVIDIA Maxwell architecture-based graphics processing unit, 4 GB RAM) is faster and more accurate than the state-of-the-art linear SVM algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32 GB RAM).

Originality/value

The new incremental and parallel PSVM algorithm tailored on the Jetson Nano is able to efficiently handle the large-scale ImageNet challenge with 1.2 million images and 1,000 classes.

Details

International Journal of Web Information Systems, vol. 18 no. 2/3
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 19 October 2023

Huaxiang Song

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition…

Abstract

Purpose

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot for its great potential to boost performance. However, RSI have unique imaging conditions and cluttered scenes with complicated backgrounds. These larger differences from natural images have made previous feature fusion methods yield only insignificant performance improvements.

Design/methodology/approach

This work proposed a two-convolutional neural network (CNN) fusion method named main and branch CNN fusion network (MBC-Net) as an improved solution for classifying RSI. In detail, the MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, which is specially designed to learn the dependence of different features. Meanwhile, MBC-Net also uses some unique ideas to tackle the problems coming from the two-CNN fusion and the inherent nature of RSI.

Findings

Extensive experiments on three RSI sets prove that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA) values. MBC-Net not only achieves a 0.7% higher OA value on the most confusing NWPU set but also has 62% fewer parameters than the leading approach in the literature.

Originality/value

MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. Visualizations of gradient-weighted class activation mapping (Grad-CAM) reveal that MBC-Net can learn long-range dependencies of features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that the feature representation of MBC-Net is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient at fusing features from two CNNs.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 30 August 2013

Vanessa El‐Khoury, Martin Jergler, Getnet Abebe Bayou, David Coquil and Harald Kosch

A fine‐grained video content indexing, retrieval, and adaptation requires accurate metadata describing the video structure and semantics to the lowest granularity, i.e. to the…

Abstract

Purpose

Fine-grained video content indexing, retrieval and adaptation require accurate metadata describing the video structure and semantics down to the lowest granularity, i.e. the object level. The authors address these requirements by proposing the semantic video content annotation tool (SVCAT) for structural and high-level semantic video annotation. SVCAT is a semi-automatic, MPEG-7 standard-compliant annotation tool, which produces metadata according to a new object-based video content model introduced in this work. Videos are temporally segmented into shots, and shot-level concepts are detected automatically using ImageNet as background knowledge. These concepts are used as a guide to easily locate and select objects of interest, which are then tracked automatically to generate object-level metadata. The integration of shot-based concept detection with object localization and tracking drastically alleviates the task of an annotator. The paper aims to discuss these issues.

Design/methodology/approach

A systematic keyframes classification into ImageNet categories is used as the basis for automatic concept detection in temporal units. This is then followed by an object tracking algorithm to get exact spatial information about objects.

Findings

Experimental results showed that SVCAT is able to provide accurate object level video metadata.

Originality/value

The new contribution in this paper introduces an approach of using ImageNet to get shot level annotations automatically. This approach assists video annotators significantly by minimizing the effort required to locate salient objects in the video.

Details

International Journal of Pervasive Computing and Communications, vol. 9 no. 3
Type: Research Article
ISSN: 1742-7371

Keywords

Article
Publication date: 12 March 2019

Jingye Qu and Jiangping Chen

This paper aims to introduce the construction methods, image organization, collection use and access of benchmark image collections to the digital library (DL) community. It aims…

Abstract

Purpose

This paper aims to introduce the construction methods, image organization, collection use and access of benchmark image collections to the digital library (DL) community. It aims to connect two distinct communities: the DL community and image processing researchers so that future image collections could be better constructed, organized and managed for both human and computer use.

Design/methodology/approach

Image collections are first identified through an extensive literature review of published journal articles and a web search. Then, a coding scheme focusing on image collections’ creation, organization, access and use is developed. Next, three major benchmark image collections are analysed based on the proposed coding scheme. Finally, the characteristics of benchmark image collections are summarized and compared to DLs.

Findings

Although most of the image collections in DLs are carefully curated and organized using various metadata schemas based on an image’s external features to facilitate human use, the benchmark image collections created for advancing image processing algorithms are annotated on image content down to the pixel level, which makes each collection a fine-grained, organized database appropriate for developing automatic techniques for classification, summarization, visualization and content-based retrieval.

Research limitations/implications

This paper overviews image collections by their application fields. The three most representative natural image collections in general areas are analysed in detail based on a homemade coding scheme, which could be further extended. Also, domain-specific image collections, such as medical image collections or collections for scientific purposes, are not covered.

Practical implications

This paper helps DLs with image collections to understand how benchmark image collections used by current image processing research are created, organized and managed. It informs multiple parties pertinent to image collections to collaborate on building, sustaining, enriching and providing access to image collections.

Originality/value

This paper is the first attempt to review and summarize benchmark image collections for DL managers and developers. The collection creation process and image organization used in these benchmark image collections open a new perspective to digital librarians for their future DL collection development.

Details

The Electronic Library, vol. 37 no. 3
Type: Research Article
ISSN: 0264-0473

Keywords

Article
Publication date: 14 December 2023

Huaxiang Song, Chai Wei and Zhou Yong

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of…

Abstract

Purpose

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger-volume models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources. This calls for lightweight algorithms with exceptional generalization capabilities.

Design/methodology/approach

This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
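For context, the logit-based KD loss that such modifications start from can be written as a temperature-scaled KL divergence between teacher and student predictions. The sketch below is the classic Hinton-style formulation, not the authors' modified loss; raising the temperature T is the standard lever that keeps non-target ("dark knowledge") probabilities from being drowned out by the target class.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style logit distillation: KL(teacher || student) at temperature T.
    A higher T softens both distributions, so the non-target probabilities
    contribute more to the loss instead of being suppressed by the target class."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

teacher = np.array([[5.0, 2.0, 1.0]])
student = np.array([[5.0, 1.0, 2.0]])   # same target logit, swapped non-targets
```

A student that matches the teacher exactly incurs zero loss; the swapped non-target logits above incur a positive loss even though the argmax (target) prediction is unchanged, which is precisely the inter-class similarity signal the abstract describes preserving.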

Findings

This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value

This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

Details

International Journal of Web Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 26 January 2022

K. Venkataravana Nayak, J.S. Arunalatha, G.U. Vasanthakumar and K.R. Venugopal

The analysis of multimedia content is being applied in various real-time computer vision applications. In multimedia content, digital images constitute a significant part. The…

Abstract

Purpose

The analysis of multimedia content is being applied in various real-time computer vision applications. In multimedia content, digital images constitute a significant part. The representation of digital images interpreted by humans is subjective in nature and complex. Hence, searching for relevant images from the archives is difficult. Thus, electronic image analysis strategies have become effective tools in the process of image interpretation.

Design/methodology/approach

The traditional approach is text-based, i.e. searching for images using textual annotations. Manually annotating images is time-consuming, and the dependency on textual annotations is difficult to reduce when the archive contains a large number of samples. Therefore, content-based image retrieval (CBIR) is adopted, in which the high-level visual content of images is represented as feature vectors containing numerical values. It is a commonly used approach to understand the content of query images when retrieving relevant images. Still, the performance is less than optimal due to the semantic gap between the image content representation and the human visual understanding perspective, caused by photometric and geometric variations of the image content and occlusions in search environments.
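The core CBIR step described above, matching the feature vector of a query against the archive, can be sketched as cosine-similarity ranking. This is a generic illustration, not the authors' framework, which extracts its features with a convolutional network and optimizes them with Adam.

```python
import numpy as np

def retrieve(query_feat, db_feats, top_k=3):
    """Rank archive images by cosine similarity of their feature vectors to the query."""
    q = query_feat / np.linalg.norm(query_feat)
    D = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    scores = D @ q                       # cosine similarity per archive image
    return np.argsort(-scores)[:top_k]   # indices of the most relevant images

# Toy archive of 4-dimensional "features"; image 2 points in the same direction
# as the query, so it should rank first.
db = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.6, 0.8, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
query = np.array([0.6, 0.8, 0.0, 0.0])
ranking = retrieve(query, db, top_k=2)
```

In a real system the rows of `db` would be CNN embeddings of archive images, typically precomputed and stored in an index.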

Findings

The authors propose an image retrieval framework that generates a semantic response through feature extraction with a convolutional network and optimization of the extracted features using the adaptive moment estimation (Adam) algorithm, thereby enhancing retrieval performance.

Originality/value

The proposed framework is tested on the Corel-1k and ImageNet datasets, achieving accuracies of 98% and 96%, respectively, compared to the state-of-the-art approaches.

Details

International Journal of Intelligent Unmanned Systems, vol. 11 no. 1
Type: Research Article
ISSN: 2049-6427

Keywords

Article
Publication date: 12 July 2023

Hadi Mahamivanan, Navid Ghassemi, Mohammad Tayarani Darbandy, Afshin Shoeibi, Sadiq Hussain, Farnad Nasirzadeh, Roohallah Alizadehsani, Darius Nahavandi, Abbas Khosravi and Saeid Nahavandi

This paper aims to propose a new deep learning technique to detect the type of material to improve automated construction quality monitoring.

Abstract

Purpose

This paper aims to propose a new deep learning technique to detect the type of material to improve automated construction quality monitoring.

Design/methodology/approach

A new data augmentation approach is proposed that improves the model's robustness against different illumination conditions and overfitting. This study uses data augmentation at test time and adds outlier samples to the training set to prevent overfitting during network training. For data augmentation at test time, five segments are extracted from each sample image and fed to the network, and the average of the network outputs over these segments is used as the final prediction. The proposed approach is then evaluated with multiple deep networks used as material classifiers. The fully connected layers are removed from the end of the networks, and only the convolutional layers are retained.
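The five-segment test-time averaging step might look like the sketch below. The crop geometry (four corners plus centre) and the dummy model are illustrative assumptions; the abstract does not specify how the segments are chosen.

```python
import numpy as np

def five_crops(img, ch, cw):
    """Extract five segments: four corner crops plus the centre crop."""
    H, W = img.shape[:2]
    tops = [0, 0, H - ch, H - ch, (H - ch) // 2]
    lefts = [0, W - cw, 0, W - cw, (W - cw) // 2]
    return [img[t:t + ch, l:l + cw] for t, l in zip(tops, lefts)]

def tta_predict(model, img, ch, cw):
    """Feed each segment to the network and average the outputs as the final prediction."""
    outputs = np.stack([model(crop) for crop in five_crops(img, ch, cw)])
    return outputs.mean(axis=0)

# Dummy stand-in "network": returns two pseudo-class probabilities from pixel statistics.
def toy_model(crop):
    m = crop.mean()
    return np.array([m, 1.0 - m])

img = np.linspace(0, 1, 64 * 64).reshape(64, 64)
probs = tta_predict(toy_model, img, ch=48, cw=48)
```

Averaging over crops smooths out predictions that depend on where in the frame the material happens to appear, which is the robustness effect the study reports.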

Findings

The proposed method is evaluated on recognizing 11 types of building materials using 1,231 images taken from several construction sites. Each image has a resolution of 4,000 × 3,000 pixels. The images are captured under different illumination conditions and camera positions, which leads to trained networks that are more robust against various environmental conditions. Using the VGG16 model, an accuracy of 97.35% is achieved, outperforming existing approaches.

Practical implications

It is believed that the proposed method presents a new and robust tool for detecting and classifying different material types. Automated detection of materials will help monitor quality and verify whether the right type of material has been used in the project based on contract specifications. In addition, the proposed model can be used as a guideline for performing quality control (QC) in construction projects based on the project quality plan. It can also serve as an input for automated progress monitoring, because material type detection provides a critical input for object detection.

Originality/value

Several studies have been conducted on quality management, but some issues still need to be addressed. Most previous studies examined a very limited number of material types. In addition, although some studies have reported high accuracy in detecting material types (Bunrit et al., 2020), their accuracy drops dramatically when they are used to detect materials with similar texture and color. In this research, the authors propose a new method to address these shortcomings.

Details

Construction Innovation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1471-4175

Keywords

Article
Publication date: 25 February 2022

Jun Xiang, Ruru Pan and Weidong Gao

The paper aims to propose a novel method based on deep sparse convolutional neural network (CNN) for clothing recognition. A CNN based on inception module is applied to bridge…

Abstract

Purpose

The paper aims to propose a novel method based on a deep sparse convolutional neural network (CNN) for clothing recognition. A CNN based on the inception module is applied to bridge pixel-level features and high-level category labels. To improve the robustness and accuracy of the network, six transformation methods are used to preprocess images. To avoid representational bottlenecks, small-sized convolution kernels are adopted in the network. The method first pretrains the network on ImageNet and then fine-tunes the model on the clothing data set.

Design/methodology/approach

The paper opts for an exploratory study using the control variable comparison method. To verify the rationality of the network structure, lateral contrast experiments against common network structures such as VGG, GoogLeNet and AlexNet, and longitudinal contrast tests among variants of the proposed structure, are performed on the created clothing image data sets. The comparison indicators include accuracy, average recall, average precision and F1 score.

Findings

Compared with common methods, the experimental results show that the proposed network performs better on clothing recognition. It can also be found that a larger input size effectively improves accuracy. Analysis of the model's outputs suggests that the model learns certain "rules" of how humans recognize clothing.

Originality/value

Clothing analysis and recognition is a meaningful issue due to its potential value in many areas, including fashion design, e-commerce and retrieval systems. Meanwhile, it is challenging because of the diversity of clothing appearance and backgrounds. Thus, this paper proposes a network based on a deep sparse CNN to realize clothing recognition.

Details

International Journal of Clothing Science and Technology, vol. 34 no. 1
Type: Research Article
ISSN: 0955-6222

Keywords

Article
Publication date: 23 July 2019

Heng Ding, Wei Lu and Tingting Jiang

Photographs are a kind of cultural heritage and very useful for cultural and historical studies. However, traditional or manual research methods are costly and cannot be applied…

Abstract

Purpose

Photographs are a kind of cultural heritage and very useful for cultural and historical studies. However, traditional or manual research methods are costly and cannot be applied on a large scale. This paper aims to present an exploratory study for understanding the cultural concerns of libraries based on the automatic analysis of large-scale image collections.

Design/methodology/approach

In this work, an image dataset including 85,023 images preserved and shared by 28 libraries is collected from the Flickr Commons project. Then, a method is proposed for representing culture as a distribution of visual semantic concepts using a state-of-the-art deep learning technique, and for measuring the cultural concerns of image collections using two metrics. Case studies on this dataset demonstrate the great potential of the method for understanding large-scale image collections from the perspective of cultural concerns.

Findings

The proposed method has the ability to discover important cultural units from large-scale image collections. The proposed two metrics are able to quantify the cultural concerns of libraries from different perspectives.

Originality/value

To the best of the authors’ knowledge, this is the first automatic analysis of images for the purpose of understanding cultural concerns of libraries. The significance of this study mainly consists in the proposed method of understanding the cultural concerns of libraries based on the automatic analysis of the visual semantic concepts in image collections. Moreover, this paper has examined the cultural concerns (e.g. important cultural units, cultural focus, trends and volatility of cultural concerns) of 28 libraries.

Details

The Electronic Library, vol. 37 no. 3
Type: Research Article
ISSN: 0264-0473

Keywords
