Search results

1 – 10 of over 262,000
Article
Publication date: 9 October 2019

Rokas Jurevičius and Virginijus Marcinkevičius


Abstract

Purpose

The purpose of this paper is to present a new data set of aerial imagery from a robotics simulator (AIR). The AIR data set aims to provide a starting point for localization system development and to become a standard benchmark for comparing the accuracy of map-based localization algorithms, visual odometry and SLAM for high-altitude flights.

Design/methodology/approach

The presented data set contains over 100,000 aerial images captured in the Gazebo robotics simulator using orthophoto maps as the ground plane. Flights with three different trajectories were performed over maps of urban and forest environments at different altitudes, totaling over 33 kilometers of flight distance.
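The 33 km figure is just the summed length of the recorded trajectories. As an illustration (not code from the paper), the flight distance of one trajectory can be recovered from its pose log by summing consecutive Euclidean distances; the trajectory below is hypothetical:

```python
import math

def path_length(poses):
    """Total distance flown, given a list of (x, y, z) positions in meters."""
    return sum(math.dist(a, b) for a, b in zip(poses, poses[1:]))

# Hypothetical trajectory: a 100 m climb followed by two 1 km legs.
trajectory = [(0, 0, 0), (0, 0, 100), (1000, 0, 100), (1000, 1000, 100)]
print(path_length(trajectory))  # 2100.0
```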

Findings

A review of previous research studies shows that the presented data set is the largest currently available public data set of downward-facing camera imagery.

Originality/value

This paper addresses the lack of publicly available data sets for high-altitude (100‒3,000 meters) UAV flights; current state-of-the-art research on map-based localization systems for UAVs depends on real-life test flights and custom-simulated data sets for evaluating algorithm accuracy. The presented data set fills this gap and aims to help researchers improve and benchmark new algorithms for high-altitude flights.

Details

International Journal of Intelligent Unmanned Systems, vol. 8 no. 3
Type: Research Article
ISSN: 2049-6427


Article
Publication date: 11 May 2015

Sajad Saeedi, Carl Thibault, Michael Trentini and Howard Li


Abstract

Purpose

The purpose of this paper is to present a localization and mapping data set acquired by a fixed-wing unmanned aerial vehicle (UAV). The data set was collected for educational and research purposes: to save time in dealing with hardware and to enable comparison of results against a benchmark data set. The data were collected in standard Robot Operating System (ROS) format. The environment, fixed-wing platform and sensor configuration are explained in detail. GPS coordinates of the fixed-wing aircraft are also available as ground truth. The data set is available for download (www.ece.unb.ca/COBRA/open_source.htm).

Design/methodology/approach

The data were collected in standard ROS format. The environment, fixed-wing platform and sensor configuration are explained in detail.

Findings

The data set can be used for target localization and mapping. The data were collected to assist algorithm development and help researchers compare their results. Robotic data sets are especially important when they relate to unmanned systems such as fixed-wing aircraft.

Originality/value

The Robotics Data Set Repository (RADISH) by A. Howard and N. Roy hosts 41 well-known data sets with different sensors; however, there is no fixed-wing data set in RADISH. This work presents two data sets collected by a fixed-wing aircraft using ROS standards. The data sets can be used for target localization and SLAM.

Details

International Journal of Intelligent Unmanned Systems, vol. 3 no. 2/3
Type: Research Article
ISSN: 2049-6427


Article
Publication date: 3 October 2016

Pedro Carreira and Carlos Gomes da Silva


Abstract

Purpose

The purpose of this paper is to propose a methodology to estimate the number of records that were omitted from a data set, and to assess its effectiveness.

Design/methodology/approach

The procedure to estimate the number of records that were omitted from a data set is based on Benford’s law. Empirical experiments are performed to illustrate the application of the procedure. In detail, two simulated Benford-conforming data sets are distorted and the procedure is then used to recover the original patterns of the data sets.
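The procedure itself is not reproduced in the abstract, but the core idea can be sketched. Assuming, hypothetically, that the full data set obeyed Benford's law before records were omitted, each leading digit's observed count divided by its Benford-expected frequency estimates the original size; the maximum over digits is least affected by whichever digits the omitted records carried. This is a generic illustration, not necessarily the authors' exact procedure:

```python
import math
from collections import Counter

# Benford's expected frequency of each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    return int(str(abs(x)).lstrip("0.")[0])

def estimate_original_size(values):
    counts = Counter(leading_digit(v) for v in values)
    # Each digit's count / expected frequency estimates the original size;
    # take the maximum so depleted digits do not drag the estimate down.
    return max(counts.get(d, 0) / BENFORD[d] for d in range(1, 10))

# Synthetic Benford-conforming set of ~1,000 records...
full = [d * 10 for d in range(1, 10) for _ in range(round(1000 * BENFORD[d]))]
# ...from which every record with leading digit 5 has been omitted.
truncated = [v for v in full if leading_digit(v) != 5]
est = estimate_original_size(truncated)
print(round(est) - len(truncated))  # estimated number of omitted records
```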

Findings

The effectiveness of the procedure seems to increase with the degree of conformity of the original data set with Benford’s law.

Practical implications

This work can be useful in auditing and economic crime detection, namely in identifying tax evasion.

Originality/value

This work is the first to propose Benford’s law as a tool to detect data evasion.

Details

Journal of Financial Crime, vol. 23 no. 4
Type: Research Article
ISSN: 1359-0790


Article
Publication date: 26 February 2024

Victoria Delaney and Victor R. Lee


Abstract

Purpose

With increased focus on data literacy and data science education in K-12, little is known about what makes a data set preferable for use by classroom teachers. Given that educational designers often privilege authenticity, the purpose of this study is to examine how teachers use features of data sets to determine their suitability for authentic data science learning experiences with their students.

Design/methodology/approach

Interviews with 12 practicing high school mathematics and statistics teachers were conducted and video-recorded. Teachers were given two different data sets about the same context and asked to explain which one would be better suited for an authentic data science experience. Following knowledge analysis methods, the teachers’ responses were coded and iteratively reviewed to find themes that appeared across multiple teachers related to their aesthetic judgments.

Findings

Three aspects of authenticity of data sets for this task were identified: thinking of authentic data sets as being “messy,” as requiring more work for the student or analyst to pore over than other data sets, and as involving computation.

Originality/value

Analysis of teachers’ aesthetics of data sets is a new direction for work on data literacy and data science education. The findings invite the field to think critically about how to help teachers develop new aesthetics and to provide data sets in curriculum materials that are suited for classroom use.

Details

Information and Learning Sciences, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2398-5348


Book part
Publication date: 27 August 2016

James K. Galbraith, Jaehee Choi, Béatrice Halbach, Aleksandra Malinowska and Wenjie Zhang


Abstract

We present a comparison of coverage and values for five inequality data sets that have worldwide or major international coverage and independent measurements that are intended to present consistent coefficients that can be compared directly across countries and time. The comparison data sets are those published by the Luxembourg Income Studies (LIS), the OECD, the European Union’s Statistics on Incomes and Living Conditions (EU-SILC), and the World Bank’s World Development Indicators (WDI). The baseline comparison is with our own Estimated Household Income Inequality (EHII) data set of the University of Texas Inequality Project. The comparison shows the historical depth and range of EHII and its broad compatibility with LIS, OECD, and EU-SILC, as well as problems with using the WDI for any cross-country comparative purpose. The comparison excludes the large World Incomes Inequality Database (WIID) of UNU-WIDER and the Standardized World Income Inequality Database (SWIID) of Frederick Solt; the former is a bibliographic collection and the latter is based on imputations drawn, in part, from EHII and the other sources used here.

Details

Income Inequality Around the World
Type: Book
ISBN: 978-1-78560-943-5


Book part
Publication date: 7 May 2019

Stuti Saxena


Abstract

Open Government Data (OGD), a philosophy and set of policies, is increasingly gaining momentum today. Believed to promote transparency, accountability and value creation by making government data available to all (OECD, 2018), OGD constitutes yet another field in which the interlocking relation between technological advances and politics can be studied. Using the national OGD portal of the Kingdom of Saudi Arabia (http://www.data.gov.sa/en) as a case study, this chapter evaluates the portal to underline the significance of maintaining the quality of the data sets published online. The usability framework (Machova, Hub, & Lnenicka, 2018) constitutes the framework for evaluating the OGD portal. The findings suggest that there are many drivers for re-using the data sets published via the portal. At the same time, however, there are barriers to re-use on account of the non-publication of updated data sets. Implicitly, the quality of the data sets should be improved. Greater involvement of government agencies is required to contribute data sets. User involvement should also be promoted, by encouraging users to contribute to the data sets and to offer recommendations for improving the data sets published via the portal.

Details

Politics and Technology in the Post-Truth Era
Type: Book
ISBN: 978-1-78756-984-3


Article
Publication date: 9 November 2023

Gustavo Candela, Nele Gabriëls, Sally Chambers, Milena Dobreva, Sarah Ames, Meghan Ferriter, Neil Fitzgerald, Victor Harbo, Katrine Hofmann, Olga Holownia, Alba Irollo, Mahendra Mahey, Eileen Manchester, Thuy-An Pham, Abigail Potter and Ellen Van Keer


Abstract

Purpose

The purpose of this study is to offer a checklist that can be used for both creating and evaluating digital collections suitable for computational use, which are sometimes also referred to as data sets as part of the collections as data movement.

Design/methodology/approach

The checklist was built by synthesising and analysing relevant research literature, articles and studies, together with the issues and needs identified in an observational study. The checklist was then tested and applied, both as a tool for assessing a selection of digital collections made available by galleries, libraries, archives and museums (GLAM) institutions as a proof of concept, and as a supporting tool for creating collections as data.

Findings

Over the past few years, there has been growing interest in making digital collections published by GLAM organisations available for computational use. Based on previous work, the authors defined a methodology to build a checklist for the publication of collections as data. The authors’ evaluation identified several example applications that can encourage other institutions to publish their digital collections for computational use.

Originality/value

While some work exists on making digital collections available for computational use, with particular attention to data quality, planning and experimentation, to the best of the authors’ knowledge none of it provides an easy-to-follow and robust checklist for publishing collection data sets in GLAM institutions. This checklist is intended to encourage small and medium-sized institutions to adopt collections as data principles in daily workflows, following best practices and guidelines.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342


Open Access
Article
Publication date: 28 February 2023

Luca Rampini and Fulvio Re Cecconi



Abstract

Purpose

This study aims to introduce a new methodology for generating synthetic images for facility management purposes. The method starts by leveraging existing open-source 3D BIM models and using them inside a graphics engine to produce photorealistic representations of indoor spaces enriched with facility-related objects. The virtual environment creates numerous images by varying lighting conditions, camera poses or materials. Moreover, the created images are labeled and ready to be used for model training.

Design/methodology/approach

This paper focuses on the challenges characterizing object detection models to enrich digital twins with facility management-related information. The automatic detection of small objects, such as sockets, power plugs, etc., requires big, labeled data sets that are costly and time-consuming to create. This study proposes a solution based on existing 3D BIM models to produce quick and automatically labeled synthetic images.
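The abstract describes varying lighting, camera pose and material per render, with labels coming for free from the BIM model. A toy sketch of that randomization-and-labeling loop (all parameter names are hypothetical; the actual rendering lives in the graphics engine and is not shown):

```python
import random

# Hypothetical parameter ranges; the real pipeline renders the 3D BIM model
# inside a graphics engine under each sampled configuration.
LIGHTING = ["daylight", "overcast", "artificial"]
MATERIALS = ["plaster", "concrete", "wood"]
OBJECTS = ["socket", "power_plug", "light_switch"]  # small FM-related objects

def sample_scene(rng):
    """One synthetic render specification with a free ground-truth label:
    because object placement comes from the BIM model, the class (and, in a
    real pipeline, the bounding box) is known without manual annotation."""
    return {
        "lighting": rng.choice(LIGHTING),
        "material": rng.choice(MATERIALS),
        "camera_xyz": [rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(1.2, 1.8)],
        "label": rng.choice(OBJECTS),
    }

rng = random.Random(42)          # seeded for reproducibility
batch = [sample_scene(rng) for _ in range(3)]  # three labeled render specs
```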

Findings

The paper presents a conceptual model for creating synthetic images to increase the performance in training object detection models for facility management. The results show that virtually generated images, rather than an alternative to real images, are a powerful tool for integrating existing data sets. In other words, while a base of real images is still needed, introducing synthetic images helps augment the model’s performance and robustness in covering different types of objects.

Originality/value

This study introduces the first pipeline for creating synthetic images for facility management. Moreover, the paper validates this pipeline through a case study comparing the performance of object detection models trained on real images alone against models trained on a combination of real and synthetic images.

Details

Construction Innovation, vol. 24 no. 1
Type: Research Article
ISSN: 1471-4175


Article
Publication date: 12 June 2020

Sandeepkumar Hegde and Monica R. Mundada


Abstract

Purpose

According to the World Health Organization, by 2025 chronic diseases are expected to account for 73% of all deaths and 60% of the global burden of disease. These diseases persist for a long duration, are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered the three major chronic diseases whose risk increases among adults as they get older. Overall, 10% of the world's population is affected by CKD, and this figure is likely to double by 2030. The paper aims to propose a novel feature selection approach, in combination with a machine-learning algorithm, that can predict chronic disease early with utmost accuracy. Hence, a novel adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with a hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease.

Design/methodology/approach

A novel APDFS feature selection algorithm is proposed that explicitly handles features associated with the class label through relevance and redundancy analysis. The algorithm applies statistical divergence-based information theory to identify relationships between distant features of the chronic disease data set. The data sets used in the experiments were obtained from several medical labs and hospitals in India. The HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared to existing work in most cases.
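The abstract does not spell out APDFS itself, but relevance-and-redundancy analysis can be illustrated generically with a greedy mutual-information scheme in the spirit of mRMR. This sketch is not the authors' algorithm; it only shows how a relevant-but-redundant feature gets penalized:

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def select_features(features, label, k):
    """Greedily pick k features: maximize relevance to the label minus
    average redundancy with the features already selected."""
    selected, remaining = [], dict(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_info(remaining[name], label)
            redundancy = (
                sum(mutual_info(remaining[name], features[s]) for s in selected)
                / len(selected)
            ) if selected else 0.0
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

label = [0, 0, 1, 1]
features = {
    "a": [0, 0, 1, 1],  # perfectly relevant
    "b": [0, 1, 0, 1],  # irrelevant to the label
    "c": [1, 1, 0, 0],  # relevant but fully redundant with "a"
}
print(select_features(features, label, k=2))  # ['a', 'b'] -- "c" dropped as redundant
```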

Findings

The performance of the proposed framework is validated using metrics such as recall, precision, F1 measure and ROC. The predictive performance of the proposed framework is analyzed on data sets belonging to various chronic diseases such as CKD, diabetes and heart disease. The diagnostic ability of the proposed approach is demonstrated by comparing its results with existing algorithms. The experimental figures illustrate that the proposed framework performs exceptionally well in early prediction of CKD, with an accuracy of 91.6.

Originality/value

The capability of machine learning algorithms depends on feature selection (FS) algorithms to identify the relevant traits in a data set, which impact the predictive result. FS is the process of choosing relevant features from a data set by removing redundant and irrelevant ones. Although many approaches have already been proposed toward this objective, they are computationally complex because they follow a one-step scheme in selecting features. In this paper, a novel APDFS feature selection algorithm is proposed that explicitly handles features associated with the class label through relevance and redundancy analysis. The proposed algorithm handles feature selection in two separate indices; hence, its computational complexity is reduced to O(nk+1). The algorithm applies statistical divergence-based information theory to identify relationships between distant features of the chronic disease data set. The data sets used in the experiments were obtained from several medical labs and hospitals of Karkala taluk, India. The HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared to existing work in most cases.

Details

International Journal of Pervasive Computing and Communications, vol. 17 no. 1
Type: Research Article
ISSN: 1742-7371


Article
Publication date: 13 December 2019

Yang Li and Xuhua Hu


Abstract

Purpose

The purpose of this paper is to solve the problem of information privacy and security for social network users. Mobile internet and social networks are ever more deeply integrated into people's daily lives; especially under the combined momentum of the Internet of Things and diversified personalized services, more and more private information of social users is exposed to the network environment, actively or unintentionally. In addition, the large amount of social network data not only brings benefits to network application providers but also motivates malicious attackers. Therefore, research on the privacy protection of user information in the social network environment has great theoretical and practical significance.

Design/methodology/approach

In this study, based on social network analysis and combined with the attribute reduction idea of rough set theory, a generalized reduction concept based on multi-level rough sets was proposed from the perspectives of the positive region, information entropy and knowledge granularity of rough set theory. Furthermore, the hierarchical compatible granularity space of the original information system was traversed and the corresponding attribute values were coarsened. The selected test data sets were evaluated and the experimental results were analyzed.
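The coarsening of attribute values serves k-anonymity-style publishing (mentioned under Findings below as "the anonymity requirement of data publishing"). A toy sketch of the idea, not the authors' multi-level rough-set algorithm: generalize quasi-identifier values until every combination occurs at least k times.

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

def coarsen_age(age, width=10):
    """Generalize an exact age to a decade range, e.g. 23 -> '20-29'."""
    lo = age // width * width
    return f"{lo}-{lo + width - 1}"

records = [
    {"age": 23, "zip": "12345", "diagnosis": "flu"},
    {"age": 27, "zip": "12349", "diagnosis": "cold"},
    {"age": 41, "zip": "54321", "diagnosis": "flu"},
    {"age": 45, "zip": "54329", "diagnosis": "asthma"},
]
quasi = ["age", "zip"]
print(is_k_anonymous(records, quasi, k=2))  # False: exact ages and zips are unique

# One coarsening step: ages to decades, zip codes to a 3-digit prefix.
for r in records:
    r["age"] = coarsen_age(r["age"])
    r["zip"] = r["zip"][:3] + "**"
print(is_k_anonymous(records, quasi, k=2))  # True: each group now holds 2 records
```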

Findings

The results showed that the algorithm can guarantee the anonymity requirement of data publishing and improve the effect of classification modeling on anonymous data in social network environment.

Research limitations/implications

In testing and verifying the privacy protection algorithm and scheme, their efficiency needs to be evaluated at a larger data scale. However, the data used in this study are not sufficient for that purpose. In follow-up research, more data will be used for testing and verification.

Practical implications

In the context of social networks, the hierarchical structure of data is introduced into rough set theory as domain knowledge, by analogy with the human granulation cognitive mechanism, and rough set modeling of complex hierarchical data is studied for the hierarchical data of decision tables. The theoretical research results are applied to hierarchical decision rule mining and k-anonymous privacy-preserving data mining, which enriches the connotation of rough set theory and has important theoretical and practical significance for further promoting its application. In addition, combining the theory of secure multi-party computation with attribute reduction in rough set theory, a privacy-preserving feature selection algorithm for multi-source decision tables is proposed, which solves the privacy protection problem of feature selection in a distributed environment. It provides a set of effective rough set feature selection methods for privacy-preserving classification mining in distributed environments, with practical value for advancing privacy-preserving data mining.

Originality/value

In this study, the proposed algorithm and scheme can effectively protect the privacy of social network data, ensure the availability of social network graph structure and realize the need of both protection and sharing of user attributes and relational data.

Details

Library Hi Tech, vol. 40 no. 1
Type: Research Article
ISSN: 0737-8831

