(PhD thesis) Privacy-Preserving Big Data Management and Analytics in Distributed Environments

Title: Privacy-Preserving Big Data Management and Analytics in Distributed Environments

 

Location: LORIA, Nancy, France

 

Research themes: Big Data, Data Analytics, Privacy, Machine learning

 

Contacts:

Dr. Alfredo Cuzzocrea (University of Calabria & LORIA, alfredo.cuzzocrea@unical.it)

Dr. Abdessamad Imine (LORIA, abdessamad.imine@loria.fr)

 

 

Context:

Nowadays, big data management and analytics, often based on distributed environments, is gaining momentum within the research community (e.g., [1,2,3]). The main issue in big data management is how to manage massive big data repositories effectively and efficiently for a wide variety of typical data management tasks, such as representation, querying, indexing, and partitioning. Big data analytics, on the other hand, is concerned with extracting useful, actionable knowledge from big data repositories for decision-making purposes, by extending classical approaches inherited from decades of data mining and machine learning research. In this context, supporting privacy-preserving big data management and analytics (e.g., [4,5,6]) plays a first-class role, especially with respect to the wide class of emerging big data application scenarios, which range from social networks to bioinformatics, from sensor networks to web recommendation tools, and from e-science systems to e-government systems. In all these application settings, protecting the privacy of sensitive information, for instance personal data (e.g., [7,8,9]) or aggregate data (e.g., [10,11,12]), can clearly be regarded as an enabling technology.

 

Project description:

Following the great deal of interest in privacy-preserving big data management and analytics in distributed environments that has emerged in recent years, the research community has already produced a large body of literature on the topic, which also demonstrates its maturity. Where should future efforts be directed? This PhD proposal aims at answering this challenging question. On the one hand, there is no doubt that theoretical tools for supporting privacy-preserving big data management and analytics in distributed environments represent a very interesting research area to explore. In this context, extending well-consolidated theoretical models for privacy-preserving OLAP to emerging tools such as differential privacy is a promising research direction. This paradigm can sensibly be extended further to more general privacy-preserving big data publishing problems, whose integration with advanced machine learning tools, such as tensor-based big data analytics, constitutes a vibrant area of research with outstanding outcomes in both theoretical contributions and practical achievements. On the other hand, as regards big data analytics proper, another interesting line of research for this PhD proposal is the issue of supporting long-running big data analytics query processing in distributed environments, for instance Cloud stores, in a privacy-preserving manner. Here, the main problem is how to combine the privacy preservation of each single query (e.g., an OLAP query) that makes up the distributed big data analytics task with the privacy preservation of the whole task composed of those queries.
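To make the last problem concrete, the following is a minimal sketch, not the method the proposal will develop. It assumes each single query is a COUNT answered with the standard Laplace mechanism, and that the whole task is accounted for with basic sequential composition of differential privacy, under which the epsilon values of the individual queries add up. All names (PrivacyBudget, private_count, run_task) and the toy data are hypothetical and serve illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivacyBudget:
    """Hypothetical accountant: tracks the epsilon spent by a whole task,
    assuming basic sequential composition (per-query epsilons add up)."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so that floating-point rounding does not reject
        # a query that exactly exhausts the budget.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget of the task is exhausted")
        self.spent += epsilon


def private_count(rows, predicate, epsilon, budget, rng):
    """Answer one COUNT query with the Laplace mechanism.

    A COUNT changes by at most 1 when one row is added or removed,
    so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    budget.charge(epsilon)  # refuse the query before touching the data
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def run_task(rows, predicates, epsilon_per_query, budget, rng):
    """Run an analytics task made of several single COUNT queries."""
    return [
        private_count(rows, predicate, epsilon_per_query, budget, rng)
        for predicate in predicates
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data, invented for illustration only.
    rows = [{"age": age} for age in range(18, 90)]
    budget = PrivacyBudget(total_epsilon=1.0)
    predicates = [lambda r: r["age"] < 30, lambda r: r["age"] >= 65]
    answers = run_task(rows, predicates, 0.5, budget, rng)
    print(answers, "epsilon spent:", budget.spent)
```

In this sketch the guarantee for the whole task is simply the sum of the per-query guarantees, which becomes very loose for long-running tasks made of many queries. Tighter accounting of the task-level guarantee is exactly the kind of question the research line above raises.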

 

The main objectives of the work are thus to devise innovative models, methods and techniques for effectively and efficiently supporting privacy-preserving big data management and analytics in distributed environments, and to provide significant realizations in reference case studies.

 

Bibliography:

[1] D. Agrawal, S. Das, A. El Abbadi, “Big Data and Cloud Computing: Current State and Future Opportunities”. EDBT 2011, pp. 530-533, 2011

 

[2] A. Labrinidis, H.V. Jagadish, “Challenges and Opportunities with Big Data”. Proceedings of the VLDB Endowment 5(12), pp. 2032-2033, 2012

 

[3] P. Zikopoulos, C. Eaton, “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”. McGraw-Hill Osborne Media, 2011

 

[4] R. Lu, H. Zhu, X. Liu, J.K. Liu, J. Shao, “Toward Efficient and Privacy-Preserving Computing in Big Data Era”. IEEE Network 28(4), pp. 46-50, 2014

 

[5] H.-Y. Tran, J. Hu, “Privacy-Preserving Big Data Analytics A Comprehensive Survey”. Journal of Parallel and Distributed Computing 134, pp. 207-218, 2019

 

[6] A. Cuzzocrea, E. Damiani, “Making the Pedigree to Your Big Data Repository: Innovative Methods, Solutions, and Algorithms for Supporting Big Data Privacy in Distributed Settings via Data-Driven Paradigms”. IEEE COMPSAC 2019, pp. 508-516, 2019

 

[7] P. Liang, L. Zhang, L. Kang, J. Ren, “Privacy-Preserving Decentralized ABE for Secure Sharing of Personal Health Records in Cloud Storage”. Journal of Information Security and Applications 47, pp. 258-266, 2019

 

[8] M.H. Au, K. Liang, J.K. Liu, R. Lu, J. Ning, “Privacy-Preserving Personal Data Operation on Mobile Cloud – Chances and Challenges over Advanced Persistent Threat”. Future Generation Computer Systems 79, pp. 337-349, 2018

 

[9] E.G. Komishani, M. Abadi, F. Deldar, “PPTD: Preserving Personalized Privacy in Trajectory Data Publishing by Sensitive Attribute Generalization and Trajectory Local Suppression”. Knowledge-Based Systems 94, pp. 43-59, 2016

 

[10] A. Cuzzocrea, “Privacy-Preserving Big Data Management: The Case of OLAP”. “Big Data – Algorithms, Analytics, and Applications”, Chapman and Hall/CRC, pp. 301-326, 2015

 

[11] A. Cuzzocrea, D. Saccà, “A Constraint-Based Framework for Computing Privacy Preserving OLAP Aggregations on Data Cubes”. ADBIS 2011, pp. 95-106, 2011

 

[12] A. Cuzzocrea, V. Russo, D. Saccà, “A Robust Sampling-Based Framework for Privacy Preserving OLAP”. DaWaK 2008, pp. 97-114, 2008

 

 

Additional Information:

 

Ideal candidate’s skills:

The successful candidate has knowledge of and experience in big data management and analytics, as well as in privacy and (possibly) security for big data.

 

Duration: 3 years

 

Starting date: November 1st, 2020.

 

The required documents for applying are the following:

– CV;

– a motivation letter;

– your degree certificates and transcripts for Bachelor and Master;

– your Master thesis if it is already completed, or otherwise a description of the work in progress;

– all your publications, if any;

– at least one recommendation letter from the person who supervises or supervised your Master thesis (or research project or internship).

 

All the documents should be sent in at most two PDF files: one file should contain the publications, if any, and the other file should contain all the other documents. These two files should be sent to your prospective PhD supervisors.
