Search results

1 – 10 of 12
Article
Publication date: 24 August 2021

K. Sujatha and V. Udayarani

Abstract

Purpose

The purpose of this paper is to improve privacy in healthcare datasets that hold sensitive information. Preventing privacy disclosure and providing relevant information to legitimate users are competing goals. The rapid evolution of big data has brought considerable convenience to everyday life, and data propagation and information sharing are its two main facets. Despite several research works on these aspects, the incremental nature of the data substantially increases the likelihood of privacy leakage. Hence, safeguarding data privacy in a complex environment has become a major challenge.

Design/methodology/approach

This study proposes a method called deep restricted additive homomorphic ElGamal privacy preservation (DR-AHEPP), which preserves data privacy even when the data are incremental. Two algorithms are designed: an entropy-based differential privacy quasi-identification algorithm, which obtains a privacy-preserved minimum falsified quasi-identifier set, and the DR-AHEPP algorithm, which produces computationally efficient privacy-preserved data.
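
The abstract does not describe the DR-AHEPP construction itself, so the following is only a minimal sketch of the generic additive (exponential) ElGamal idea that the method's name refers to, not the authors' algorithm. The toy modulus `P`, the generator `G` and all function names are assumptions chosen for illustration, and the parameters are far too small for real use.

```python
import random

# Toy parameters (assumed for illustration only; insecure at this size).
P = 467  # small prime modulus
G = 2    # generator of the multiplicative group modulo P


def keygen(rng):
    """Return a (private, public) key pair."""
    x = rng.randrange(2, P - 1)
    return x, pow(G, x, P)


def encrypt(m, h, rng):
    """Encrypt a small integer m as (g^r, g^m * h^r) mod P."""
    r = rng.randrange(2, P - 1)
    return pow(G, r, P), (pow(G, m, P) * pow(h, r, P)) % P


def add(c1, c2):
    """Multiply ciphertexts component-wise; the plaintexts add in the exponent."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P


def decrypt(c, x, max_m=200):
    """Recover g^m, then find m by brute force over a small range."""
    shared = pow(c[0], x, P)
    g_m = (c[1] * pow(shared, -1, P)) % P
    for m in range(max_m + 1):
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("plaintext outside the searched range")


rng = random.Random(0)
private_key, public_key = keygen(rng)
total = add(encrypt(12, public_key, rng), encrypt(30, public_key, rng))
print(decrypt(total, private_key))
```

The sum is computed without decrypting either input, which is the property an additive homomorphic scheme offers for privacy-preserving aggregation. Decryption needs a brute-force search, so this form only suits small plaintext values.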

Findings

Results on the Diabetes 130-US hospitals dataset show that the proposed DR-AHEPP method preserves privacy on incremental data more effectively than existing methods. The method is compared with state-of-the-art works in terms of information loss, false positive rate, execution time and accuracy.

Originality/value

On the Diabetes 130-US hospitals dataset, the proposed method achieves high accuracy with low information loss and a low false positive rate. It increases accuracy by 4% and reduces the false positive rate and information loss by 25% and 35%, respectively, compared with state-of-the-art works.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 13 December 2019

Yang Li and Xuhua Hu

Abstract

Purpose

The purpose of this paper is to solve the problem of information privacy and security for social network users. Mobile internet and social networks are increasingly integrated into people’s daily lives. Driven by the rapid development of the Internet of Things and diversified personalized services, more and more private information of social users is exposed to the network environment, actively or unintentionally. In addition, the large amount of social network data not only brings benefits to network application providers but also motivates malicious attackers. Therefore, research on the privacy protection of user information in the social network environment has great theoretical and practical significance.

Design/methodology/approach

Building on social network analysis and the attribute reduction idea of rough set theory, this study proposes a generalized reduction concept based on multi-level rough sets, viewed from the perspectives of positive region, information entropy and knowledge granularity. The hierarchical compatible granularity space of the original information system is then traversed and the corresponding attribute values are coarsened. The algorithm was run on selected test data sets, and the experimental results were analyzed.
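
Coarsening attribute values to meet an anonymity requirement can be illustrated with a generic k-anonymity check. The sketch below is not the authors' rough-set algorithm. The records, the attribute names and the ten-year age bands are hypothetical, chosen only to show how generalizing a quasi-identifier makes groups of records indistinguishable.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a coarser band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical records, not taken from the paper.
records = [
    {"age": 23, "zip": "10001"},
    {"age": 27, "zip": "10001"},
    {"age": 31, "zip": "10002"},
    {"age": 38, "zip": "10002"},
]
coarse = [{"age": generalize_age(r["age"]), "zip": r["zip"]} for r in records]

print(is_k_anonymous(records, ["age", "zip"], 2))
print(is_k_anonymous(coarse, ["age", "zip"], 2))
```

The exact ages make every record unique, while the banded ages form two groups of two records each, so only the coarsened table satisfies 2-anonymity.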

Findings

The results showed that the algorithm can guarantee the anonymity requirement of data publishing and improve the effect of classification modeling on anonymous data in social network environment.

Research limitations/implications

The efficiency of the privacy protection algorithm and scheme needs to be tested and verified on a larger data scale; the data used in this study are insufficient for that purpose. Future research will use more data for testing and verification.

Practical implications

In the context of social networks, the hierarchical structure of data is introduced into rough set theory as domain knowledge, following the human granulation cognitive mechanism, and rough set modeling is studied for the hierarchical data of decision tables. The theoretical results are applied to hierarchical decision rule mining and k-anonymous privacy protection data mining, which enriches rough set theory and has theoretical and practical significance for promoting its application. In addition, by combining the theory of secure multi-party computation with attribute reduction in rough set theory, a privacy protection feature selection algorithm for multi-source decision tables is proposed, which solves the privacy protection problem of feature selection in a distributed environment. It provides an effective rough set feature selection method for privacy protection classification mining in a distributed environment, which has practical value for the development of privacy protection data mining.

Originality/value

The proposed algorithm and scheme can effectively protect the privacy of social network data, preserve the availability of the social network graph structure and meet the need for both protection and sharing of user attributes and relational data.

Details

Library Hi Tech, vol. 40 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 18 October 2023

Langdon Holmes, Scott Crossley, Harshvardhan Sikka and Wesley Morris

Abstract

Purpose

This study aims to report on an automatic deidentification system for labeling and obfuscating personally identifiable information (PII) in student-generated text.

Design/methodology/approach

The authors evaluate the performance of their deidentification system on two data sets of student-generated text. Each data set was human-annotated for PII. The authors evaluate using two approaches: per-token PII classification accuracy and a simulated reidentification attack design. In the reidentification attack, two reviewers attempted to recover student identities from the data after PII was obfuscated by the authors’ system. In both cases, results are reported in terms of recall and precision.
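
Per-token recall and precision can be computed as in the following minimal sketch. The label lists are hypothetical (1 marks a PII token, 0 a non-PII token) and do not come from the study's data sets. The function name is an assumption.

```python
def precision_recall(gold, predicted):
    """Compute per-token precision and recall for binary PII labels."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if not g and p)
    false_neg = sum(1 for g, p in pairs if g and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical annotations: human labels versus system output.
gold = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

For deidentification, recall is usually the more critical figure, because a missed PII token (a false negative) leaves an identifier in the released text.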

Findings

The authors’ deidentification system recalled 84% of student name tokens in their first data set (96% of full names). On the second data set, it achieved a recall of 74% for student name tokens (91% of full names) and 75% for all direct identifiers. After the second data set was obfuscated by the authors’ system, two reviewers attempted to recover the identities of students from the obfuscated data. They performed below chance, indicating that the obfuscated data presents a low identity disclosure risk.

Research limitations/implications

The two data sets used in this study are not representative of all forms of student-generated text, so further work is needed to evaluate performance on more data.

Practical implications

This paper presents an open-source and automatic deidentification system appropriate for student-generated text with technical explanations and evaluations of performance.

Originality/value

Previous work on text deidentification has shown success in the medical domain. This paper builds on these approaches and applies them to text in the educational domain.

Details

Information and Learning Sciences, vol. 124 no. 9/10
Type: Research Article
ISSN: 2398-5348

Article
Publication date: 19 May 2020

Praveen Kumar Gopagoni and Mohan Rao S K

Abstract

Purpose

Association rule mining generates patterns and correlations from a database, but it requires long scanning times, and the computational cost of generating the rules is high. Candidate rules generated by traditional association rule mining also pose a major challenge in time and space, and the process is lengthy. To address these issues and to produce privacy rules, the paper proposes grid-based privacy association rule mining.

Design/methodology/approach

The research aims to design and develop a distributed elephant herding optimization (EHO) for grid-based privacy association rule mining from a database. Rule generation proceeds in two steps. In the first step, rules are generated using the apriori algorithm, an effective association rule mining algorithm. Association rules are conventionally extracted on the basis of support and confidence; in the proposed model, these are replaced with holo-entropy and probability-based confidence. In the second step, the generated rules are passed to grid-based privacy rule mining, which produces privacy-dependent rules based on a novel optimization algorithm and a grid-based fitness. The novel optimization algorithm is developed by integrating the distributed concept into the EHO algorithm.
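
For reference, the conventional support and confidence measures that the proposed model replaces can be sketched as follows. The abstract does not define holo-entropy or probability-based confidence, so they are not reproduced here. The transactions and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

Apriori keeps only itemsets whose support meets a minimum threshold and then derives rules whose confidence meets a second threshold; each pass over the data to count support is the source of the scanning cost mentioned in this entry.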

Findings

The method was evaluated on the retail, chess, T10I4D100K and T40I10D100K databases from the Frequent Itemset Mining Dataset Repository to demonstrate the effectiveness of distributed grid-based privacy association rule mining. The proposed method outperformed existing methods by offering a higher degree of privacy and utility. Moreover, the distributed nature of the association rule mining facilitates parallel processing and generates the privacy rules without much computational burden. The rate of hiding capacity, the rate of information preservation and the rate of false rules generated by the proposed method are 0.4468, 0.4488 and 0.0654, respectively, which is better than the existing rule mining methods.

Originality/value

Data mining is performed in a distributed manner through grids that subdivide the input data. The rules are framed using apriori-based association mining, a modification of the standard apriori algorithm in which holo-entropy and probability-based confidence replace support and confidence. Because the mined rules do not by themselves assure privacy, grid-based privacy rules are employed, using adaptive elephant herding optimization (AEHO) to generate them. The AEHO inherits the adaptive nature in the standard EHO, which renders the global optimal solution.

Details

Data Technologies and Applications, vol. 54 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 30 July 2021

Tanvi Garg, Navid Kagalwalla, Shubha Puthran, Prathamesh Churi and Ambika Pawar

Abstract

Purpose

This paper aims to design a secure and seamless system for quick sharing of health-care data that improves the privacy of sensitive health-care data, the efficiency of health-care infrastructure and the effectiveness of treatment given to patients, and that encourages researchers to develop new health-care technologies. These objectives are achieved through the proposed system, a “privacy-aware data tagging system using role-based access control for health-care data.”

Design/methodology/approach

Health-care data must be stored and shared in such a manner that the privacy of the patient is maintained. The proposed method uses data tags to classify health-care data into color codes that signify the sensitivity of the data. It uses the ARX tool to anonymize raw health-care data and role-based access control to ensure that only authenticated persons can access the data.
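
The combination of sensitivity tags and role-based access control can be sketched as below. The abstract does not list the actual color codes, roles or policy, so the `Tag` levels, the `ROLE_CLEARANCE` table and the `can_access` helper are all hypothetical and show only the general shape of such a check.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Hypothetical color codes ordered by increasing sensitivity."""
    GREEN = 1
    YELLOW = 2
    RED = 3


# Hypothetical mapping from role to the most sensitive tag it may read.
ROLE_CLEARANCE = {
    "researcher": Tag.GREEN,
    "nurse": Tag.YELLOW,
    "doctor": Tag.RED,
}


def can_access(role, tag):
    """Allow access only to known roles whose clearance covers the tag."""
    clearance = ROLE_CLEARANCE.get(role)
    return clearance is not None and tag <= clearance


print(can_access("doctor", Tag.RED))
print(can_access("researcher", Tag.YELLOW))
print(can_access("visitor", Tag.GREEN))
```

Unknown roles are denied by default, so a missing policy entry fails closed rather than open.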

Findings

The system integrates the tagging and anonymizing of health-care data coupled with robust access control policies into one architecture. The paper discusses the proposed architecture, describes the algorithm used to tag health-care data, analyzes the metrics of the anonymized data against various attacks and devises a mathematical model for role-based access control.

Originality/value

The paper integrates three disparate topics (data tagging, anonymization and role-based access policies) into one seamless architecture. Codifying health-care data into different tags based on International Classification of Diseases 10th Revision (ICD-10) codes and applying varying levels of anonymization to each data tag, along with role-based access policies, is unique to the system and also ensures the usability of the data for research.

Details

World Journal of Engineering, vol. 20 no. 1
Type: Research Article
ISSN: 1708-5284

Book part
Publication date: 7 May 2019

Francesco Ciclosi, Paolo Ceravolo, Ernesto Damiani and Donato De Ieso

Abstract

This chapter analyzes the compliance of some categories of Open Data in Politics with EU General Data Protection Regulation (GDPR) requirements. It clarifies the legal basis of this framework, with specific attention to processing procedures that conform to the legitimate interests pursued by the data controller. Measures such as open data licenses or anonymization techniques can result in partial application of the GDPR, but there is no generic guarantee; consequently, an appropriate process of risk analysis and management is required.

Details

Politics and Technology in the Post-Truth Era
Type: Book
ISBN: 978-1-78756-984-3

Article
Publication date: 22 April 2022

Sreedhar Jyothi and Geetanjali Nelloru

Abstract

Purpose

The electrocardiogram (ECG) is used to study patients with ventricular arrhythmias and atrial fibrillation, which are early markers of stroke and sudden cardiac death, as well as benign subjects. To identify cardiac anomalies, the ECG records the heart's electrical activity and presents it as waveforms. Patients with these disorders must be identified as soon as possible. Manual inspection of ECG signals can be difficult, time-consuming and subject to inter-observer variability.

Design/methodology/approach

Various forms of arrhythmia are difficult to distinguish in complicated non-linear ECG data, so computer-aided decision support systems (CAD) may be beneficial. A CAD, which uses machine learning algorithms to identify small changes in cardiac rhythms, can classify arrhythmias in a rapid, accurate, repeatable and objective manner. Cardiac infractions can be classified and detected using this method. The authors' primary objective is to classify arrhythmias more accurately in less computational time. The paper's contribution is the use of signal and axis characteristics and their association n-grams as features. An experimental investigation was conducted using a benchmark dataset as input to multi-label multi-fold cross-validation.

Findings

The dataset was used as input for cross-validation on contemporary models, and the resulting cross-validation metrics were compared with the performance metrics of other contemporary models. The suggested model shows high sensitivity and specificity, with few false alarms.

Originality/value

The results of cross validation are significant. In terms of specificity, sensitivity, and decision accuracy, the proposed model outperforms other contemporary models.

Details

International Journal of Intelligent Unmanned Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2049-6427

Article
Publication date: 25 November 2013

Khanh Tran Dang, Nhan Trong Phan and Nam Chan Ngo

Abstract

Purpose

The paper aims to resolve three major issues in location-based applications (LBA) known as heterogeneity, user privacy, and context-awareness by proposing an elastic and open design platform named OpenLS privacy-aware middleware (OPM) for LBA.

Design/methodology/approach

The paper analyzes relevant approaches from both academia and the mobile industry community and stresses the importance of heterogeneity, user privacy, and context-awareness for the development of LBA.

Findings

The paper proposes the design of the OPM. The OPM consists of two main components, the application middleware and the location middleware, which cooperate to achieve the above goals. The paper also presents an implementation of the OPM and experiments on it. Two privacy-preserving techniques at two different levels are integrated into the OPM: the Memorizing algorithm at the application level and Bob-tree at the database level. Finally, the paper discusses other problems and improvements that might be needed for the OPM.

Research limitations/implications

Each issue has sub-problems that further affect the OPM, and each requires more in-depth investigation to arrive at better, more detailed solutions. More comprehensive experiments should therefore be conducted to confirm the OPM's scalability and effectiveness.

Practical implications

By providing the OPM with suitable application programming interfaces that conform to the OpenLS standard, the paper aims to promote and speed up the development of LBA.

Originality/value

The paper offers location-based service (LBS) providers a basis for developing their applications and proposes the OPM as a unified solution for dealing with heterogeneity, user privacy, and context-awareness in the world of LBS.

Details

International Journal of Pervasive Computing and Communications, vol. 9 no. 4
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 19 July 2022

Faraja Ndumbaro

Abstract

Purpose

Users' search logs provide implicit feedback on how searchers interact with online information retrieval (IR) systems. The purpose of this paper is to analyze search query reformulation (SQR) patterns of University of Dar es Salaam remote OPAC users.

Design/methodology/approach

Qualitative and quantitative analyses of transaction logs were employed to ascertain the characteristics of search queries and the patterns in which remote OPAC users reformulate their search queries. The study covered a period of six months, from January to June 2019.

Findings

A total of 30,474 search hits were submitted by remote OPAC users during the period under study. Individuals from academic and research institutions, computing consortia, and telecommunication companies are the main users of the system. Most of the searches originated from North America and Europe, with few searches coming from China and India. Besides improving search results, SQRs are linked with the existence of multiple information demands as manifested by the use of heterogeneous headwords within individual search episodes.

Research limitations/implications

The data collected covered only six months. In addition, it was not possible to analyze users' search query formulation within specific contexts such as task-based information searching.

Practical implications

A query recommendation system should be integrated into the OPAC functionalities to improve users' search experiences. Alternatively, there should be a migration to a new system that offers more advanced search features and functionalities.

Originality/value

The study contributes new insights to SQR studies, particularly on how non-institutionally affiliated users translate their information needs into search queries during information searching processes.

Peer review

The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-09-2020-0389

Details

Online Information Review, vol. 47 no. 1
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 3 June 2014

Rob Heyman, Ralf De Wolf and Jo Pierson

Abstract

Purpose

The purpose of this paper is to define two types of privacy, which are distinct but often reduced to each other. It also investigates which form of privacy is most prominent in privacy settings of online social networks (OSN). Privacy between users is different from privacy between a user and a third party. OSN, and to a lesser extent researchers, often reduce the former to the latter, which results in misleading users and public debate about privacy.

Design/methodology/approach

The authors define two types of privacy that account for the difference between interpersonal and third-party disclosure. The first definition draws on symbolic interactionist accounts of privacy, wherein users are performing dramaturgically for an intended audience. Third-party privacy is based on the data that represent the user in data mining and knowledge discovery processes, which ultimately manipulate users into audience commodities. This typology was applied to the privacy settings of Facebook, LinkedIn and Twitter. The results are presented as a flowchart.

Findings

The research indicates that users are granted more options for controlling their interpersonal information flow towards other users than towards third parties or service providers.

Research limitations/implications

This distinction needs to be tested empirically by comparing users’ privacy expectations in both situations. On more theoretical grounds, this typology could also be linked to Habermas’ system and life-world.

Originality/value

A typology has been provided to compare the relative autonomy users receive for settings that drive revenue and settings that are independent of revenue.

Details

info, vol. 16 no. 4
Type: Research Article
ISSN: 1463-6697
