Large‐scale grid computing for content‐based image retrieval

Chris Town (University of Cambridge Computer Laboratory and Imense Ltd, Cambridge, UK)
Karl Harrison (School of Physics and Astronomy, University of Birmingham, Birmingham, UK)

Aslib Proceedings

ISSN: 0001-253X

Article publication date: 8 July 2010

Abstract

Purpose

Content‐based image retrieval (CBIR) technologies offer many advantages over purely text‐based image search. However, one drawback of CBIR is the increased computational cost of tasks such as image processing, feature extraction, image classification, and object detection and recognition. Consequently, CBIR systems have suffered from a lack of scalability, which has greatly hampered their adoption for real‐world public and commercial image search. At the same time, paradigms for large‐scale heterogeneous distributed computing, such as grid computing, cloud computing, and utility‐based computing, are gaining traction as a way of providing more scalable and efficient solutions to large‐scale computing tasks.

Design/methodology/approach

This paper presents an approach in which a large distributed processing grid has been used to apply a range of CBIR methods to a substantial number of images. Distributing the required computation across thousands of grid nodes has achieved very high throughput with relatively low overhead.
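The distribution idea can be illustrated with a minimal sketch. It is not the authors' grid submission code: the function names (`make_batches`, `analyse_image`, `run_job`, `run_all`) are hypothetical, the per-image analysis is a placeholder, and a local thread pool stands in for grid nodes. The point is only that each image is analysed independently, so the work splits cleanly into batches that can run anywhere.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list of work items into consecutive batches (one batch per job)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def analyse_image(image_id):
    """Hypothetical placeholder for per-image CBIR analysis."""
    return {"image_id": image_id, "indexed": True}


def run_job(batch):
    """One job: analyse every image in the batch; no job depends on another."""
    return [analyse_image(image_id) for image_id in batch]


def run_all(image_ids, batch_size, workers):
    """Submit one job per batch; local threads stand in for grid nodes here."""
    batches = make_batches(image_ids, batch_size)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns job results in submission order.
        for job_result in pool.map(run_job, batches):
            results.extend(job_result)
    return results
```

Because the jobs share no state, adding workers increases the number of images processed per unit time without changing how long any single image takes.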

Findings

This approach has made it possible to analyse and index about 25 million high‐resolution images so far, using just two servers for storage and job submission. The CBIR system was developed by Imense Ltd and is based on automated analysis and recognition of image content using a semantic ontology. It features a range of image‐processing and analysis modules, including image segmentation, region classification, scene analysis, object detection, and face recognition methods.

Originality/value

In the case of content‐based image analysis, the primary performance criterion is the overall throughput achieved by the system, that is, the number of images that can be processed over a given time frame, irrespective of the time taken to process any given image. As such, grid processing has great potential for massively parallel content‐based image retrieval and other tasks with similar performance requirements.
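The distinction between throughput and per-image latency can be made concrete with a back-of-envelope sketch. The node count, per-image time, and `efficiency` factor below are illustrative assumptions, not figures reported in the paper; only the total of about 25 million images comes from the abstract.

```python
def throughput_per_hour(nodes, seconds_per_image, efficiency=1.0):
    """Images processed per hour when `nodes` workers run independently.

    `efficiency` (0..1) is an assumed factor for scheduling and transfer overhead.
    """
    return nodes * efficiency * 3600.0 / seconds_per_image


def hours_to_process(total_images, nodes, seconds_per_image, efficiency=1.0):
    """Wall-clock hours needed to process `total_images` at that throughput."""
    return total_images / throughput_per_hour(nodes, seconds_per_image, efficiency)


# Illustrative only: 1,000 nodes at 60 s per image process 60,000 images per hour,
# even though each individual image still takes a full minute.
rate = throughput_per_hour(nodes=1000, seconds_per_image=60)
hours = hours_to_process(25_000_000, nodes=1000, seconds_per_image=60)
```

Under these assumed numbers, per-image latency is unchanged by adding nodes, while total processing time falls in proportion to the node count.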

Citation

Town, C. and Harrison, K. (2010), "Large‐scale grid computing for content‐based image retrieval", Aslib Proceedings, Vol. 62 No. 4/5, pp. 438-446. https://doi.org/10.1108/00012531011074681

Publisher

Emerald Group Publishing Limited

Copyright © 2010, Emerald Group Publishing Limited
