Search results
1 – 10 of 382Image segmentation is one of the most essential tasks in image processing applications. It is a valuable tool in many oriented applications such as health-care systems, pattern…
Abstract
Purpose
Image segmentation is one of the most essential tasks in image processing applications. It is a valuable tool in many oriented applications such as health-care systems, pattern recognition, traffic control, surveillance systems, etc. However, an accurate segmentation is a critical task since finding a correct model that fits a different type of image processing application is a persistent problem. This paper develops a novel segmentation model that aims to be a unified model using any kind of image processing application. The proposed precise and parallel segmentation model (PPSM) combines the three benchmark distribution thresholding techniques to estimate an optimum threshold value that leads to optimum extraction of the segmented region: Gaussian, lognormal and gamma distributions. Moreover, a parallel boosting algorithm is proposed to improve the performance of the developed segmentation algorithm and minimize its computational cost. To evaluate the effectiveness of the proposed PPSM, different benchmark data sets for image segmentation are used such as Planet Hunters 2 (PH2), the International Skin Imaging Collaboration (ISIC), Microsoft Research in Cambridge (MSRC), the Berkley Segmentation Benchmark Data set (BSDS) and Common Objects in COntext (COCO). The obtained results indicate the efficacy of the proposed model in achieving high accuracy with significant processing time reduction compared to other segmentation models and using different types and fields of benchmarking data sets.
Design/methodology/approach
The proposed PPSM combines the three benchmark distribution thresholding techniques to estimate an optimum threshold value that leads to optimum extraction of the segmented region: Gaussian, lognormal and gamma distributions.
Findings
On the basis of the achieved results, it can be observed that the proposed PPSM–minimum cross-entropy thresholding (PPSM–MCET)-based segmentation model is a robust, accurate and highly consistent method with high-performance ability.
Originality/value
A novel hybrid segmentation model is constructed exploiting a combination of Gaussian, gamma and lognormal distributions using MCET. Moreover, and to provide an accurate and high-performance thresholding with minimum computational cost, the proposed PPSM uses a parallel processing method to minimize the computational effort in MCET computing. The proposed model might be used as a valuable tool in many oriented applications such as health-care systems, pattern recognition, traffic control, surveillance systems, etc.
Details
Keywords
Kittisak Chotikkakamthorn, Panrasee Ritthipravat, Worapan Kusakunniran, Pimchanok Tuakta and Paitoon Benjapornlert
Mouth segmentation is one of the challenging tasks of development in lip reading applications due to illumination, low chromatic contrast and complex mouth appearance. Recently…
Abstract
Purpose
Mouth segmentation is one of the challenging tasks of development in lip reading applications due to illumination, low chromatic contrast and complex mouth appearance. Recently, deep learning methods effectively solved mouth segmentation problems with state-of-the-art performances. This study presents a modified Mobile DeepLabV3 based technique with a comprehensive evaluation based on mouth datasets.
Design/methodology/approach
This paper presents a novel approach to mouth segmentation by Mobile DeepLabV3 technique with integrating decode and auxiliary heads. Extensive data augmentation, online hard example mining (OHEM) and transfer learning have been applied. CelebAMask-HQ and the mouth dataset from 15 healthy subjects in the department of rehabilitation medicine, Ramathibodi hospital, are used in validation for mouth segmentation performance.
Findings
Extensive data augmentation, OHEM and transfer learning had been performed in this study. This technique achieved better performance on CelebAMask-HQ than existing segmentation techniques with a mean Jaccard similarity coefficient (JSC), mean classification accuracy and mean Dice similarity coefficient (DSC) of 0.8640, 93.34% and 0.9267, respectively. This technique also achieved better performance on the mouth dataset with a mean JSC, mean classification accuracy and mean DSC of 0.8834, 94.87% and 0.9367, respectively. The proposed technique achieved inference time usage per image of 48.12 ms.
Originality/value
The modified Mobile DeepLabV3 technique was developed with extensive data augmentation, OHEM and transfer learning. This technique gained better mouth segmentation performance than existing techniques. This makes it suitable for implementation in further lip-reading applications.
Details
Keywords
Ankang Ji, Xiaolong Xue, Limao Zhang, Xiaowei Luo and Qingpeng Man
Crack detection of pavement is a critical task in the periodic survey. Efficient, effective and consistent tracking of the road conditions by identifying and locating crack…
Abstract
Purpose
Crack detection of pavement is a critical task in the periodic survey. Efficient, effective and consistent tracking of the road conditions by identifying and locating crack contributes to establishing an appropriate road maintenance and repair strategy from the promptly informed managers but still remaining a significant challenge. This research seeks to propose practical solutions for targeting the automatic crack detection from images with efficient productivity and cost-effectiveness, thereby improving the pavement performance.
Design/methodology/approach
This research applies a novel deep learning method named TransUnet for crack detection, which is structured based on Transformer, combined with convolutional neural networks as encoder by leveraging a global self-attention mechanism to better extract features for enhancing automatic identification. Afterward, the detected cracks are used to quantify morphological features from five indicators, such as length, mean width, maximum width, area and ratio. Those analyses can provide valuable information for engineers to assess the pavement condition with efficient productivity.
Findings
In the training process, the TransUnet is fed by a crack dataset generated by the data augmentation with a resolution of 224 × 224 pixels. Subsequently, a test set containing 80 new images is used for crack detection task based on the best selected TransUnet with a learning rate of 0.01 and a batch size of 1, achieving an accuracy of 0.8927, a precision of 0.8813, a recall of 0.8904, an F1-measure and dice of 0.8813, and a Mean Intersection over Union of 0.8082, respectively. Comparisons with several state-of-the-art methods indicate that the developed approach in this research outperforms with greater efficiency and higher reliability.
Originality/value
The developed approach combines TransUnet with an integrated quantification algorithm for crack detection and quantification, performing excellently in terms of comparisons and evaluation metrics, which can provide solutions with potentially serving as the basis for an automated, cost-effective pavement condition assessment scheme.
Details
Keywords
Worapan Kusakunniran, Sarattha Karnjanapreechakorn, Pitipol Choopong, Thanongchai Siriapisith, Nattaporn Tesavibul, Nopasak Phasukkijwatana, Supalert Prakhunhungsit and Sutasinee Boonsopon
This paper aims to propose a solution for detecting and grading diabetic retinopathy (DR) in retinal images using a convolutional neural network (CNN)-based approach. It could…
Abstract
Purpose
This paper aims to propose a solution for detecting and grading diabetic retinopathy (DR) in retinal images using a convolutional neural network (CNN)-based approach. It could classify input retinal images into a normal class or an abnormal class, which would be further split into four stages of abnormalities automatically.
Design/methodology/approach
The proposed solution is developed based on a newly proposed CNN architecture, namely, DeepRoot. It consists of one main branch, which is connected by two side branches. The main branch is responsible for the primary feature extractor of both high-level and low-level features of retinal images. Then, the side branches further extract more complex and detailed features from the features outputted from the main branch. They are designed to capture details of small traces of DR in retinal images, using modified zoom-in/zoom-out and attention layers.
Findings
The proposed method is trained, validated and tested on the Kaggle dataset. The regularization of the trained model is evaluated using unseen data samples, which were self-collected from a real scenario from a hospital. It achieves a promising performance with a sensitivity of 98.18% under the two classes scenario.
Originality/value
The new CNN-based architecture (i.e. DeepRoot) is introduced with the concept of a multi-branch network. It could assist in solving a problem of an unbalanced dataset, especially when there are common characteristics across different classes (i.e. four stages of DR). Different classes could be outputted at different depths of the network.
Details
Keywords
Dilawar Ali, Kenzo Milleville, Steven Verstockt, Nico Van de Weghe, Sally Chambers and Julie M. Birkholz
Historical newspaper collections provide a wealth of information about the past. Although the digitization of these collections significantly improves their accessibility, a large…
Abstract
Purpose
Historical newspaper collections provide a wealth of information about the past. Although the digitization of these collections significantly improves their accessibility, a large portion of digitized historical newspaper collections, such as those of KBR, the Royal Library of Belgium, are not yet searchable at article-level. However, recent developments in AI-based research methods, such as document layout analysis, have the potential for further enriching the metadata to improve the searchability of these historical newspaper collections. This paper aims to discuss the aforementioned issue.
Design/methodology/approach
In this paper, the authors explore how existing computer vision and machine learning approaches can be used to improve access to digitized historical newspapers. To do this, the authors propose a workflow, using computer vision and machine learning approaches to (1) provide article-level access to digitized historical newspaper collections using document layout analysis, (2) extract specific types of articles (e.g. feuilletons – literary supplements from Le Peuple from 1938), (3) conduct image similarity analysis using (un)supervised classification methods and (4) perform named entity recognition (NER) to link the extracted information to open data.
Findings
The results show that the proposed workflow improves the accessibility and searchability of digitized historical newspapers, and also contributes to the building of corpora for digital humanities research. The AI-based methods enable automatic extraction of feuilletons, clustering of similar images and dynamic linking of related articles.
Originality/value
The proposed workflow enables automatic extraction of articles, including detection of a specific type of article, such as a feuilleton or literary supplement. This is particularly valuable for humanities researchers as it improves the searchability of these collections and enables corpora to be built around specific themes. Article-level access to, and improved searchability of, KBR's digitized newspapers are demonstrated through the online tool (https://tw06v072.ugent.be/kbr/).
Details
Keywords
R.S. Vignesh and M. Monica Subashini
An abundance of techniques has been presented so forth for waste classification but, they deliver inefficient results with low accuracy. Their achievement on various repositories…
Abstract
Purpose
An abundance of techniques has been presented so forth for waste classification but, they deliver inefficient results with low accuracy. Their achievement on various repositories is different and also, there is insufficiency of high-scale databases for training. The purpose of the study is to provide high security.
Design/methodology/approach
In this research, optimization-assisted federated learning (FL) is introduced for thermoplastic waste segregation and classification. The deep learning (DL) network trained by Archimedes Henry gas solubility optimization (AHGSO) is used for the classification of plastic and resin types. The deep quantum neural networks (DQNN) is used for first-level classification and the deep max-out network (DMN) is employed for second-level classification. This developed AHGSO is obtained by blending the features of Archimedes optimization algorithm (AOA) and Henry gas solubility optimization (HGSO). The entities included in this approach are nodes and servers. Local training is carried out depending on local data and updations to the server are performed. Then, the model is aggregated at the server. Thereafter, each node downloads the global model and the update training is executed depending on the downloaded global and the local model till it achieves the satisfied condition. Finally, local update and aggregation at the server is altered based on the average method. The Data tag suite (DATS_2022) dataset is used for multilevel thermoplastic waste segregation and classification.
Findings
By using the DQNN in first-level classification the designed optimization-assisted FL has gained an accuracy of 0.930, mean average precision (MAP) of 0.933, false positive rate (FPR) of 0.213, loss function of 0.211, mean square error (MSE) of 0.328 and root mean square error (RMSE) of 0.572. In the second level classification, by using DMN the accuracy, MAP, FPR, loss function, MSE and RMSE are 0.932, 0.935, 0.093, 0.068, 0.303 and 0.551.
Originality/value
The multilevel thermoplastic waste segregation and classification using the proposed model is accurate and improves the effectiveness of the classification.
Details
Keywords
Diana Irinel Baila, Filippo Sanfilippo, Tom Savu, Filip Górski, Ionut Cristian Radu, Catalin Zaharia, Constantina Anca Parau, Martin Zelenay and Pacurar Razvan
The development of new advanced materials, such as photopolymerizable resins for use in stereolithography (SLA) and Ti6Al4V manufacture via selective laser melting (SLM…
Abstract
Purpose
The development of new advanced materials, such as photopolymerizable resins for use in stereolithography (SLA) and Ti6Al4V manufacture via selective laser melting (SLM) processes, have gained significant attention in recent years. Their accuracy, multi-material capability and application in novel fields, such as implantology, biomedical, aviation and energy industries, underscore the growing importance of these materials. The purpose of this study is oriented toward the application of new advanced materials in stent manufacturing realized by 3D printing technologies.
Design/methodology/approach
The methodology for designing personalized medical devices, implies computed tomography (CT) or magnetic resonance (MR) techniques. By realizing segmentation, reverse engineering and deriving a 3D model of a blood vessel, a subsequent stent design is achieved. The tessellation process and 3D printing methods can then be used to produce these parts. In this context, the SLA technology, in close correlation with the new types of developed resins, has brought significant evolution, as demonstrated through the analyses that are realized in the research presented in this study. This study undertakes a comprehensive approach, establishing experimentally the characteristics of two new types of photopolymerizable resins (both undoped and doped with micro-ceramic powders), remarking their great accuracy for 3D modeling in die-casting techniques, especially in the production process of customized stents.
Findings
A series of analyses were conducted, including scanning electron microscopy, energy-dispersive X-ray spectroscopy, mapping and roughness tests. Additionally, the structural integrity and molecular bonding of these resins were assessed by Fourier-transform infrared spectroscopy–attenuated total reflectance analysis. The research also explored the possibilities of using metallic alloys for producing the stents, comparing the direct manufacturing methods of stents’ struts by SLM technology using Ti6Al4V with stent models made from photopolymerizable resins using SLA. Furthermore, computer-aided engineering (CAE) simulations for two different stent struts were carried out, providing insights into the potential of using these materials and methods for realizing the production of stents.
Originality/value
This study covers advancements in materials and additive manufacturing methods but also approaches the use of CAE analysis, introducing in this way novel elements to the domain of customized stent manufacturing. The emerging applications of these resins, along with metallic alloys and 3D printing technologies, have brought significant contributions to the biomedical domain, as emphasized in this study. This study concludes by highlighting the current challenges and future research directions in the use of photopolymerizable resins and biocompatible metallic alloys, while also emphasizing the integration of artificial intelligence in the design process of customized stents by taking into consideration the 3D printing technologies that are used for producing these stents.
Details
Keywords
Shilong Zhang, Changyong Liu, Kailun Feng, Chunlai Xia, Yuyin Wang and Qinghe Wang
The swivel construction method is a specially designed process used to build bridges that cross rivers, valleys, railroads and other obstacles. To carry out this construction…
Abstract
Purpose
The swivel construction method is a specially designed process used to build bridges that cross rivers, valleys, railroads and other obstacles. To carry out this construction method safely, real-time monitoring of the bridge rotation process is required to ensure a smooth swivel operation without collisions. However, the traditional means of monitoring using Electronic Total Station tools cannot realize real-time monitoring, and monitoring using motion sensors or GPS is cumbersome to use.
Design/methodology/approach
This study proposes a monitoring method based on a series of computer vision (CV) technologies, which can monitor the rotation angle, velocity and inclination angle of the swivel construction in real-time. First, three proposed CV algorithms was developed in a laboratory environment. The experimental tests were carried out on a bridge scale model to select the outperformed algorithms for rotation, velocity and inclination monitor, respectively, as the final monitoring method in proposed method. Then, the selected method was implemented to monitor an actual bridge during its swivel construction to verify the applicability.
Findings
In the laboratory study, the monitoring data measured with the selected monitoring algorithms was compared with those measured by an Electronic Total Station and the errors in terms of rotation angle, velocity and inclination angle, were 0.040%, 0.040%, and −0.454%, respectively, thus validating the accuracy of the proposed method. In the pilot actual application, the method was shown to be feasible in a real construction application.
Originality/value
In a well-controlled laboratory the optimal algorithms for bridge swivel construction are identified and in an actual project the proposed method is verified. The proposed CV method is complementary to the use of Electronic Total Station tools, motion sensors, and GPS for safety monitoring of swivel construction of bridges. It also contributes to being a possible approach without data-driven model training. Its principal advantages are that it both provides real-time monitoring and is easy to deploy in real construction applications.
Details
Keywords
An increasing number of images are generated daily, and images are gradually becoming a search target. Content-based image retrieval (CBIR) is helpful for users to express their…
Abstract
Purpose
An increasing number of images are generated daily, and images are gradually becoming a search target. Content-based image retrieval (CBIR) is helpful for users to express their requirements using an image query. Nevertheless, determining whether the retrieval system can provide convenient operation and relevant retrieval results is challenging. A CBIR system based on deep learning features was proposed in this study to effectively search and navigate images in digital articles.
Design/methodology/approach
Convolutional neural networks (CNNs) were used as the feature extractors in the author's experiments. Using pretrained parameters, the training time and retrieval time were reduced. Different CNN features were extracted from the constructed image databases consisting of images taken from the National Palace Museum Journals Archive and were compared in the CBIR system.
Findings
DenseNet201 achieved the best performance, with a top-10 mAP of 89% and a query time of 0.14 s.
Practical implications
The CBIR homepage displayed image categories showing the content of the database and provided the default query images. After retrieval, the result showed the metadata of the retrieved images and links back to the original pages.
Originality/value
With the interface and retrieval demonstration, a novel image-based reading mode can be established via the CBIR and links to the original images and contextual descriptions.
Details
Keywords
Yaolin Zhou, Zhaoyang Zhang, Xiaoyu Wang, Quanzheng Sheng and Rongying Zhao
The digitalization of archival management has rapidly developed with the maturation of digital technology. With data's exponential growth, archival resources have transitioned…
Abstract
Purpose
The digitalization of archival management has rapidly developed with the maturation of digital technology. With data's exponential growth, archival resources have transitioned from single modalities, such as text, images, audio and video, to integrated multimodal forms. This paper identifies key trends, gaps and areas of focus in the field. Furthermore, it proposes a theoretical organizational framework based on deep learning to address the challenges of managing archives in the era of big data.
Design/methodology/approach
Via a comprehensive systematic literature review, the authors investigate the field of multimodal archive resource organization and the application of deep learning techniques in archive organization. A systematic search and filtering process is conducted to identify relevant articles, which are then summarized, discussed and analyzed to provide a comprehensive understanding of existing literature.
Findings
The authors' findings reveal that most research on multimodal archive resources predominantly focuses on aspects related to storage, management and retrieval. Furthermore, the utilization of deep learning techniques in image archive retrieval is increasing, highlighting their potential for enhancing image archive organization practices; however, practical research and implementation remain scarce. The review also underscores gaps in the literature, emphasizing the need for more practical case studies and the application of theoretical concepts in real-world scenarios. In response to these insights, the authors' study proposes an innovative deep learning-based organizational framework. This proposed framework is designed to navigate the complexities inherent in managing multimodal archive resources, representing a significant stride toward more efficient and effective archival practices.
Originality/value
This study comprehensively reviews the existing literature on multimodal archive resources organization. Additionally, a theoretical organizational framework based on deep learning is proposed, offering a novel perspective and solution for further advancements in the field. These insights contribute theoretically and practically, providing valuable knowledge for researchers, practitioners and archivists involved in organizing multimodal archive resources.
Details