Privacy Security in Big Data and Privacy-Preserving Data Mining (PPDM)

阿新 • Published: 2021-10-08

Introduction

Big data has become such a prominent concept in recent years that it is encountered regularly in everyday life. In this introduction, I will first define big data and outline the background of its privacy security problems, then describe and evaluate those problems, and finally introduce and evaluate the goal of Privacy-Preserving Data Mining (PPDM).

Today, most people benefit from big data in one way or another. Big data techniques allow consumers to be offered targeted advertising and services, which can improve the user experience (Evens and Van Damme, 2016). Another example of the benefits of big data technology is Apple’s Siri, a voice assistant that can respond to simple user commands (Gandomi and Haider, 2015). Gandomi and Haider (2015) also describe other interesting applications of big data analysis, such as summarising one or several documents so that readers can obtain key information more quickly and easily, or analysing a person’s personality or influence from information in social networks.

Researchers still disagree on the precise definition of big data. The most widely accepted definition at present describes it as information assets characterised by high volume, velocity and variety, whose value can only be extracted through specific technology and analytical methods (De Mauro et al., 2016). In other words, big data contains massive amounts of information of various types, usually generated from new sources. Without suitable technology, it would be hard for big data to benefit people.

Computer technology has developed greatly and been widely adopted in recent years, making data easier to collect, store and analyse. The big data market is growing accordingly: it increased dramatically from 7.06 billion US dollars in 2011 to 49 billion in 2019, and it is expected to more than double by 2026 (Liu, 2019).

The rapid development of technology can bring considerable improvements to people’s lives. However, alongside its benefits, technology can also cause problems. Boyd and Crawford (2012) describe big data as invading personal privacy and reducing personal freedom. Through daily activities such as visiting websites, contacting others by phone, or even listening to music, people give away personal information (Mai, 2016). Privacy leakage in everyday life seems to be common and inevitable.

As Xu et al. (2014) note, the rapid development and wide application of data mining technologies (which find useful information in data) put personal privacy at even greater risk. The more the big data industry develops, the more exposed personal privacy may become.

Bharathi’s (2017) research shows that the top three risk factors brought by big data are data brokering, global exposure of personal data and the lack of governance-based security design. With the development of the internet, most information online is accessible to most users. In addition, data brokers, who collect and sell information about consumers, make the situation even more disordered. Secure governance methods in this field are not yet well developed. Big data techniques improve quickly, but people’s awareness and moral concepts have not kept pace.

Technological progress should not be hindered by these obstacles. It is therefore an important task to find a way to obtain the useful information contained in big data without giving away sensitive information. This task is called Privacy-Preserving Data Mining (PPDM).

PPDM methodologies aim to protect privacy to a certain extent while preserving as much of the data’s value as possible, so that data mining remains effective when applied to the transformed data (Mendes and Vilela, 2017).
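As a rough illustration of mining transformed rather than raw data, the Python sketch below uses randomized response, a classic perturbation technique. It is chosen here only as an example and is not taken from the papers discussed in this essay. The function names, the 30% true rate and the 0.75 truth probability are all assumptions made for the demonstration. Each individual answer is flipped with some probability, so no single reported answer can be trusted, yet the overall proportion can still be estimated.

```python
import random


def randomize(answers, p_truth, rng):
    """Report each yes/no answer truthfully with probability p_truth,
    otherwise report the opposite answer (hypothetical helper)."""
    return [a if rng.random() < p_truth else not a for a in answers]


def estimate_true_rate(reported, p_truth):
    """Estimate the true 'yes' rate from the perturbed answers.

    observed = p_truth * true + (1 - p_truth) * (1 - true), solved for true.
    Requires p_truth != 0.5.
    """
    observed = sum(reported) / len(reported)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic sensitive attribute: roughly 30% of people answer "yes".
true_answers = [rng.random() < 0.30 for _ in range(100_000)]
reported = randomize(true_answers, p_truth=0.75, rng=rng)
estimate = estimate_true_rate(reported, p_truth=0.75)
print(f"estimated 'yes' rate: {estimate:.3f}")
```

The trade-off the essay describes is visible in `p_truth`: values closer to 0.5 hide individual answers better but make the estimate noisier, while values closer to 1 keep more useful information and protect less.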

Apart from innovations in methods and techniques, several other interesting approaches to the problem of Privacy-Preserving Data Mining have been proposed.

Zaïane (2004) pointed out that a priority is to establish common policy definitions and standardise the basic rules so that the people involved do not get confused, for example by determining what kind of information counts as private and what behaviour counts as leaking it. Determining uniform standards before the problems become complex is indeed essential.

Xu et al. (2014) argue that data providers, data collectors, data miners (people who find useful information in data) and decision-makers who act on information from the data have different concerns about privacy security. For example, data providers care about how to control the sensitivity of the information they provide, while data collectors are concerned with how the form of the data can be changed to avoid privacy leakage. What matters is to weigh these concerns to find the best solution. Taking more roles into account should help the solution gain wide acceptance, which makes this a rather humane perspective.

Various approaches have been proposed to solve the Privacy-Preserving Data Mining (PPDM) problem. Nevertheless, further exploration is still required. Researchers in this field need to concentrate on how to balance the loss of useful information against the leakage of sensitive information.

Annotated Bibliography

Bharathi, S.V. (2017) ‘Prioritizing and Ranking the Big Data Information Security Risk Spectrum’, Global Journal of Flexible Systems Management, 18(3), pp.183–201.

Bharathi’s (2017) paper contributes to the assessment of data security risk, a topic that few studies have addressed. Bharathi lists twenty-five risk factors brought by big data and identifies the top three. The risks are described in detail so that readers can understand them clearly. In addition, the paper includes a section that critically evaluates existing research by others. From this paper, readers can obtain background information, understand the privacy risks of big data, and gain some familiarity with the work of other scholars in this field.

Gandomi, A. and Haider, M. (2015) ‘Beyond the hype: Big data concepts, methods, and analytics’, International Journal of Information Management, 35(2), pp. 137-144.

Compared with Bharathi’s (2017) paper, this paper introduces more general concepts. Gandomi and Haider (2015) define big data, covering its origin and detailed features. They then analyse how big data analysis techniques are applied to text, audio, video and social media data, and provide cases and examples of new big data analysis techniques. The paper ends by highlighting future developments in this field. The description and explanation draw on a large number of simple real-life examples, which makes the paper accessible to readers without related background knowledge. The article is peer-reviewed and has been heavily cited. It is a good choice for readers new to this field who want sufficient background information.

Mai, J. (2016) ‘Big data privacy: The datafication of personal information’, The Information Society: An International Journal, 32(3), pp. 192-199.

Mai’s (2016) paper is highly relevant to privacy in the big data era. Compared with Mendes and Vilela’s (2017) research, Mai focuses more on the definition and features of privacy and on the information extracted from big data. This paper is helpful for understanding the definition of privacy, how it varies among individuals, and how to distinguish the private from the public. After reading it, readers can better understand the risks of privacy security problems and the points that need attention in the PPDM process.

Mendes, R. and Vilela, J.P. (2017) ‘Privacy-Preserving Data Mining: Methods, Metrics, and Applications’, IEEE Access, 5, pp. 10562–10582.

Mendes and Vilela (2017) introduce PPDM in this paper, using examples of typical applications of PPDM in relevant fields. They also discuss the challenges and problems PPDM currently faces. The paper, which is more closely related to computer science, is quite up-to-date and presents more specific concepts and opinions. It is better suited to readers who already have background knowledge of big data and privacy security.

Xu, L. et al. (2014) ‘Information Security in Big Data: Privacy and Data Mining’, IEEE Access, 2, pp. 1149-1176.

The contribution of Xu et al.’s (2014) paper is that it offers an interesting new perspective on the PPDM problem. It considers the different concerns of the different roles in the process, and matches specific privacy issues and protection methods to each role. In addition, the paper tries to use game theory to find the optimal solution. It provides useful insights into this field.

References

Boyd, D. and Crawford, K. (2012) ‘Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon’, Information, Communication & Society, 15(5), pp. 662-679.

De Mauro, A., Greco, M. and Grimaldi, M. (2016) ‘A formal definition of Big Data based on its essential features’, Library Review, 65(3), pp. 122-135.

Evens, T. and Van Damme, K. (2016) ‘Consumers’ willingness to share personal data: Implications for newspapers’ business models’, International Journal on Media Management, 18(1), pp. 25-41.

Liu, S. (2019) Forecast of Big Data market size, based on revenue, from 2011 to 2027 (in billion U.S. dollars). Available at: https://www.statista.com/statistics/254266/global-big-data-market-forecast/ (Accessed: 3 May 2019).

Zaïane, O. R. (2004) ‘Toward Standardization in Privacy-Preserving Data Mining’, Embrapa Informática Agropecuária-Artigo em anais de congresso (ALICE), pp. 7-17.
