Search results
Rokas Jurevičius and Virginijus Marcinkevičius
Abstract
Purpose
The purpose of this paper is to present a new data set of aerial imagery from a robotics simulator (AIR). The AIR data set aims to provide a starting point for localization system development and to become a standard benchmark for comparing the accuracy of map-based localization algorithms, visual odometry and SLAM for high-altitude flights.
Design/methodology/approach
The presented data set contains over 100,000 aerial images captured in the Gazebo robotics simulator using orthophoto maps as a ground plane. Flights with three different trajectories are performed over urban and forest maps at different altitudes, totaling over 33 kilometers of flight distance.
Findings
A review of previous research studies shows that the presented data set is the largest currently available public data set of downward-facing camera imagery.
Originality/value
This paper addresses the lack of publicly available data sets for high-altitude (100‒3,000 meters) UAV flights; current state-of-the-art research on map-based localization systems for UAVs depends on real-life test flights and custom-simulated data sets for accuracy evaluation of the algorithms. The presented data set fills this gap and aims to help researchers improve and benchmark new algorithms for high-altitude flights.
Sajad Saeedi, Carl Thibault, Michael Trentini and Howard Li
Abstract
Purpose
The purpose of this paper is to present a localization and mapping data set acquired by a fixed-wing unmanned aerial vehicle (UAV). The data set was collected for educational and research purposes: to save time in dealing with hardware and to compare the results with a benchmark data set. The data were collected in standard Robot Operating System (ROS) format. The environment, fixed-wing, and sensor configuration are explained in detail. GPS coordinates of the fixed-wing are also available as ground truth. The data set is available for download (www.ece.unb.ca/COBRA/open_source.htm).
Design/methodology/approach
The data were collected in standard ROS format. The environment, fixed-wing, and sensor configuration are explained in detail.
Findings
The data set can be used for target localization and mapping. The data were collected to assist algorithm development and to help researchers compare their results. Robotic data sets are especially important when they relate to unmanned systems such as fixed-wing aircraft.
Originality/value
The Robotics Data Set Repository (RADISH) by A. Howard and N. Roy hosts 41 well-known data sets with different sensors; however, there is no fixed-wing data set in RADISH. This work presents two data sets collected by a fixed-wing aircraft using ROS standards. The data sets can be used for target localization and SLAM.
Pedro Carreira and Carlos Gomes da Silva
Abstract
Purpose
The purpose of this paper is to propose a methodology to estimate the number of records that were omitted from a data set, and to assess its effectiveness.
Design/methodology/approach
The procedure for estimating the number of records omitted from a data set is based on Benford's law. Empirical experiments illustrate the application of the procedure: two simulated Benford-conforming data sets are distorted, and the procedure is then used to recover their original patterns.
Findings
The effectiveness of the procedure seems to increase with the degree of conformity of the original data set with Benford’s law.
Practical implications
This work can be useful in auditing and economic crime detection, namely in identifying tax evasion.
Originality/value
This work is the first to propose Benford's law as a tool to detect omitted records.
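The abstract does not reproduce the authors' estimation procedure, but the Benford's-law machinery it relies on is standard. A minimal sketch of computing expected and observed leading-digit frequencies (function names are illustrative, not the paper's):

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected frequency of leading digit d (1-9) under Benford's law."""
    return math.log10(1 + 1 / d)

def leading_digit(value):
    """First significant digit of a nonzero number."""
    s = str(abs(value)).lstrip("0.")
    return int(s[0])

def first_digit_profile(values):
    """Observed leading-digit frequencies of a data set."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}
```

In a Benford-conforming data set the digit 1 leads roughly 30.1% of the time; a shortfall in some digit's observed frequency relative to `benford_expected` is the kind of distortion from which an omitted-record count could be estimated.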
Victoria Delaney and Victor R. Lee
Abstract
Purpose
With increased focus on data literacy and data science education in K-12, little is known about what makes a data set preferable for use by classroom teachers. Given that educational designers often privilege authenticity, the purpose of this study is to examine how teachers use features of data sets to determine their suitability for authentic data science learning experiences with their students.
Design/methodology/approach
Interviews with 12 practicing high school mathematics and statistics teachers were conducted and video-recorded. Teachers were given two different data sets about the same context and asked to explain which one would be better suited for an authentic data science experience. Following knowledge analysis methods, the teachers’ responses were coded and iteratively reviewed to find themes that appeared across multiple teachers related to their aesthetic judgments.
Findings
Three aspects of authenticity were identified for data sets in this task: authentic data sets are thought of as "messy," as requiring more work from the student or analyst to pore through than other data sets, and as involving computation.
Originality/value
Analysis of teachers’ aesthetics of data sets is a new direction for work on data literacy and data science education. The findings invite the field to think critically about how to help teachers develop new aesthetics and to provide data sets in curriculum materials that are suited for classroom use.
James K. Galbraith, Jaehee Choi, Béatrice Halbach, Aleksandra Malinowska and Wenjie Zhang
Abstract
We present a comparison of coverage and values for five inequality data sets that have worldwide or major international coverage and independent measurements that are intended to present consistent coefficients that can be compared directly across countries and time. The comparison data sets are those published by the Luxembourg Income Studies (LIS), the OECD, the European Union’s Statistics on Incomes and Living Conditions (EU-SILC), and the World Bank’s World Development Indicators (WDI). The baseline comparison is with our own Estimated Household Income Inequality (EHII) data set of the University of Texas Inequality Project. The comparison shows the historical depth and range of EHII and its broad compatibility with LIS, OECD, and EU-SILC, as well as problems with using the WDI for any cross-country comparative purpose. The comparison excludes the large World Incomes Inequality Database (WIID) of UNU-WIDER and the Standardized World Income Inequality Database (SWIID) of Frederick Solt; the former is a bibliographic collection and the latter is based on imputations drawn, in part, from EHII and the other sources used here.
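The coefficients these data sets report are Gini-type inequality measures. As an illustration only (not how any of the named data sets are actually constructed), a Gini coefficient can be computed from a list of incomes via the rank-weighted formula:

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes: 0 = perfect equality,
    approaching 1 as all income concentrates in one unit."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Each income weighted by its rank among the sorted values
    rank_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * rank_sum) / (n * total) - (n + 1) / n
```

For example, four equal incomes give a Gini of 0, while one unit holding everything among four gives 0.75; cross-country comparability then depends on consistent definitions of income and household units, which is exactly the issue the comparison in this paper examines.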
Abstract
Open Government Data (OGD), a philosophy and set of policies, is increasingly gaining momentum today. Believed to promote transparency, accountability and value creation by making government data available to all (OECD, 2018), OGD constitutes yet another field in which the interlocking relation between technological advances and politics can be studied. Using the national OGD portal of the Kingdom of Saudi Arabia (http://www.data.gov.sa/en) as a case study, this study evaluates the portal to underline the significance of maintaining the quality of the data sets published online. The usability framework of Machova, Hub and Lnenicka (2018) is used to evaluate the OGD portal. The findings suggest that there are many drivers to re-use the data sets published via the portal; at the same time, there are barriers to re-use because updated data sets are not published. The quality of the data sets should therefore be improved. More involvement of government agencies is required in contributing to the data sets, and user involvement should be promoted by encouraging users to contribute to the data sets and offer recommendations for improving those published via the portal.
Gustavo Candela, Nele Gabriëls, Sally Chambers, Milena Dobreva, Sarah Ames, Meghan Ferriter, Neil Fitzgerald, Victor Harbo, Katrine Hofmann, Olga Holownia, Alba Irollo, Mahendra Mahey, Eileen Manchester, Thuy-An Pham, Abigail Potter and Ellen Van Keer
Abstract
Purpose
The purpose of this study is to offer a checklist that can be used both for creating and for evaluating digital collections (sometimes referred to as data sets within the collections as data movement) that are suitable for computational use.
Design/methodology/approach
The checklist was built by synthesising and analysing relevant research literature, articles and studies, together with the issues and needs identified in an observational study. The checklist was tested and applied both as a tool for assessing a selection of digital collections made available by galleries, libraries, archives and museums (GLAM) institutions, as a proof of concept, and as a supporting tool for creating collections as data.
Findings
Over the past few years, there has been growing interest in making digital collections published by GLAM organisations available for computational use. Based on previous work, the authors defined a methodology to build a checklist for the publication of collections as data. The authors' evaluation showed several examples of applications that can be useful to encourage other institutions to publish their digital collections for computational use.
Originality/value
While some work exists on making digital collections suitable for computational use, giving particular attention to data quality, planning and experimentation, to the best of the authors' knowledge none of the work to date provides an easy-to-follow and robust checklist for publishing collection data sets in GLAM institutions. This checklist intends to encourage small and medium-sized institutions to adopt collections as data principles in their daily workflows, following best practices and guidelines.
Luca Rampini and Fulvio Re Cecconi
Abstract
Purpose
This study aims to introduce a new methodology for generating synthetic images for facility management purposes. The method starts from existing open-source 3D BIM models and uses them inside a graphics engine to produce photorealistic representations of indoor spaces enriched with facility-related objects. The virtual environment creates several images by changing lighting conditions, camera poses or materials. Moreover, the created images are automatically labeled and ready for model training.
Design/methodology/approach
This paper focuses on the challenges characterizing object detection models that enrich digital twins with facility management-related information. Automatic detection of small objects, such as sockets and power plugs, requires big labeled data sets that are costly and time-consuming to create. This study proposes a solution based on existing 3D BIM models to produce quickly and automatically labeled synthetic images.
Findings
The paper presents a conceptual model for creating synthetic images to improve the training of object detection models for facility management. The results show that virtually generated images are not an alternative to real images but a powerful tool for augmenting existing data sets. In other words, while a base of real images is still needed, introducing synthetic images improves the model's performance and robustness in covering different types of objects.
Originality/value
This study introduced the first pipeline for creating synthetic images for facility management. Moreover, this paper validates the pipeline through a case study in which the performance of object detection models trained on real data alone is compared with that of models trained on a combination of real and synthetic images.
Sandeepkumar Hegde and Monica R. Mundada
Abstract
Purpose
According to the World Health Organization, by 2025 chronic diseases are expected to account for 73% of all deaths and 60% of the global burden of disease. These diseases persist for a long time, are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered the three major chronic diseases whose risk increases among adults as they get older; CKD is considered the major disease among them. Around 10% of the world's population is affected by CKD, and this figure is likely to double by 2030. The paper aims to propose a novel feature selection approach, combined with a machine-learning algorithm, that can predict chronic disease early and with high accuracy. Hence, a novel adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with a hyper-parameterized logistic regression model (HLRM) for early prediction of chronic disease.
Design/methodology/approach
The proposed APDFS algorithm explicitly handles the features associated with the class label through relevance and redundancy analysis, applying statistical divergence-based information theory to identify relationships between distant features of the chronic disease data set. The data sets used in the experiments were obtained from several medical labs and hospitals in India, and the HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across several chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared with existing work in most cases.
Findings
The performance of the proposed framework is validated using metrics such as recall, precision, F1 measure and ROC. The predictive performance is analyzed on data sets for several chronic diseases, such as CKD, diabetes and heart disease, and the diagnostic ability of the approach is demonstrated by comparing its results with those of existing algorithms. The experiments illustrate that the proposed framework performs especially well in early prediction of CKD, with an accuracy of 91.6%.
Originality/value
The capability of machine-learning algorithms depends on feature selection (FS) algorithms to identify the relevant traits in a data set, which affect the predictive result. FS is the process of choosing the relevant features of a data set by removing redundant and irrelevant ones. Although many approaches have been proposed toward this objective, they are computationally complex because they follow a one-step scheme for selecting features. In this paper, a novel APDFS algorithm is proposed that explicitly handles the features associated with the class label through relevance and redundancy analysis. The proposed algorithm handles feature selection in two separate indices; hence, its computational complexity is reduced to O(nk+1). The algorithm applies statistical divergence-based information theory to identify relationships between distant features of the chronic disease data set. The data sets used in the experiments were obtained from several medical labs and hospitals of Karkala taluk, India. The HLRM is used as the machine-learning classifier, and the predictive ability of the framework is compared with various algorithms and across several chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared with existing work in most cases.
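The abstract does not specify APDFS itself, but the general two-stage relevance-and-redundancy filtering it describes can be sketched with a mutual-information criterion over discrete features. This is a generic illustration under assumed thresholds, not the authors' algorithm:

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Mutual information (bits) between two discrete value sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def select_features(features, label, rel_thresh=0.01, red_thresh=0.5):
    """Two-stage filter: keep label-relevant features, then drop redundant ones.

    features: dict mapping feature name -> list of discrete values.
    Thresholds are illustrative assumptions.
    """
    # Stage 1 (relevance): keep features informative about the class label
    relevance = {name: mutual_info(col, label) for name, col in features.items()}
    relevant = [name for name, mi in relevance.items() if mi >= rel_thresh]
    # Stage 2 (redundancy): greedily keep features, most relevant first,
    # unless they share too much information with one already kept
    kept = []
    for name in sorted(relevant, key=lambda f: relevance[f], reverse=True):
        if all(mutual_info(features[name], features[k]) < red_thresh for k in kept):
            kept.append(name)
    return kept
```

A feature that duplicates an already-selected one is dropped in the second stage, while a feature carrying no information about the label never passes the first; handling the two criteria in separate passes mirrors, at a high level, the two-index scheme the abstract attributes to APDFS.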
Abstract
Purpose
The purpose of this paper is to address the information privacy and security of social users. The mobile internet and social networks are ever more deeply integrated into daily life, and, driven by the rapid development of the Internet of Things and increasingly diversified personalized services, more and more private information of social users is exposed to the network environment, actively or unintentionally. In addition, the large amount of social network data not only brings benefits to network application providers but also motivates malicious attackers. Therefore, in the social network environment, research on the privacy protection of user information has great theoretical and practical significance.
Design/methodology/approach
In this study, based on social network analysis combined with the attribute reduction idea of rough set theory, a generalized reduction concept based on multi-level rough sets was proposed from the perspectives of positive region, information entropy and knowledge granularity. The hierarchical compatible granularity space of the original information system was then traversed and the corresponding attribute values were coarsened. The algorithm was run on selected test data sets, and the experimental results were analyzed.
Findings
The results showed that the algorithm can guarantee the anonymity requirement of data publishing and improve the effect of classification modeling on anonymous data in a social network environment.
Research limitations/implications
Testing and verifying a privacy protection algorithm and scheme requires measuring their efficiency at larger data scales, and the data used in this study are not sufficient for this. In follow-up research, more data will be used for testing and verification.
Practical implications
In the context of social networks, the hierarchical structure of data is introduced into rough set theory as domain knowledge, by reference to the human granulation cognitive mechanism, and rough set modeling of complex hierarchical data is studied for the hierarchical data of decision tables. The theoretical results are applied to hierarchical decision rule mining and k-anonymous privacy-preserving data mining, which enriches rough set theory and has important theoretical and practical significance for further promoting its application. In addition, combining secure multi-party computation with attribute reduction in rough set theory, a privacy-preserving feature selection algorithm for multi-source decision tables is proposed, which solves the privacy protection problem of feature selection in a distributed environment. It provides an effective rough set feature selection method for privacy-preserving classification mining in distributed environments, with practical application value for advancing privacy-preserving data mining.
Originality/value
In this study, the proposed algorithm and scheme can effectively protect the privacy of social network data, ensure the availability of the social network graph structure and meet the need for both protection and sharing of user attribute and relational data.
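The anonymity requirement the abstract refers to is the standard k-anonymity property: every combination of quasi-identifier values must occur at least k times in the published table. A minimal check of that definition (an illustration only, not the paper's multi-level rough-set algorithm):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times.

    records: list of dicts; quasi_identifiers: attribute names to group by.
    """
    groups = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return all(count >= k for count in groups.values())
```

Coarsening attribute values, as the abstract describes (for example, replacing an exact age with an age band), merges quasi-identifier groups and so raises the minimum group size, which is how attribute-value generalization drives a table toward k-anonymity.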