Solving Truthfulness-Privacy Trade-Off in Mixed Data Outsourcing by Using Data Balancing and Attribute Correlation-Aware Differential Privacy
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10858716/ |
Summary: | In the modern era, data of diverse types (medical, financial, etc.) are outsourced from data owner environments to the public domains for data mining and knowledge discovery purposes. However, data often encompass sensitive information about individuals, and outsourcing the data without sufficient protection may endanger privacy. Anonymization methods are mostly used in data outsourcing to protect privacy; however, it is very hard to apply anonymity to datasets of poor quality while maintaining an equilibrium between privacy, utility, and truthfulness (i.e., ensuring the values in anonymized data are consistent with the real data). To address these technical problems, we propose and implement a data balancing and attribute correlation-aware differential privacy (DP) method for mixed data outsourcing while accomplishing the three crucial objectives of privacy, truthfulness, and utility. Our method first identifies quality-related issues in the data and solves them in an automated manner by adding the fewest possible good-quality synthetic records. We propose a data partitioning method that exploits correlations between attributes to create blocks of data to lessen the amount of noise added by the DP model. To preserve higher truthfulness while guaranteeing privacy, categorical attributes are considered as one unit, and an exponential mechanism is applied to them. The numerical attributes are transformed using the Laplace mechanism with a relatively higher $\epsilon$. The joint application of these mechanisms to data blocks enables effective resolution of the truthfulness-privacy trade-off, and data usability is extremely high. Extensive experiments are performed on three benchmark datasets to demonstrate the effectiveness of our method in real scenarios. The experiment results and analysis indicate significantly better performance on four different evaluation metrics compared to the recent state-of-the-art (SOTA) DP-based methods. Furthermore, our method has better efficiency than its counterparts. |
---|---|
ISSN: | 2169-3536 |
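
The summary names two standard DP building blocks: an exponential mechanism for the categorical attributes treated as one unit, and a Laplace mechanism for the numerical attributes. The sketch below illustrates only the textbook form of those two mechanisms. It is not the article's implementation: the function names, the 0/1 utility function, the sensitivity values, and the sample record are all assumptions made for illustration, and the data balancing and correlation-based partitioning steps are not shown.

```python
import math
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Return value plus Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    best = max(scores)
    # Subtracting the best score keeps exp() numerically stable and
    # does not change the sampling distribution.
    weights = [
        math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical record: two categorical attributes handled as one unit,
# plus one numerical attribute.
rng = random.Random(0)
true_unit = ("female", "nurse")
candidate_units = [
    ("female", "nurse"),
    ("female", "teacher"),
    ("male", "nurse"),
    ("male", "teacher"),
]


def match_utility(unit):
    # Assumed utility: 1 for the true combination, 0 otherwise.
    return 1.0 if unit == true_unit else 0.0


noisy_unit = exponential_mechanism(
    candidate_units, match_utility, sensitivity=1.0, epsilon=1.0, rng=rng
)
# Assumed sensitivity for age: the width of an 18..90 domain.
noisy_age = laplace_mechanism(37.0, sensitivity=72.0, epsilon=2.0, rng=rng)
print(noisy_unit, round(noisy_age, 2))
```

In this sketch a larger `epsilon` shrinks the Laplace scale and concentrates the exponential mechanism on the highest-utility candidate, so released values stay closer to the original ones at the cost of a weaker privacy guarantee.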