We are witnessing an exponential accumulation of personal data on central servers: data automatically gathered by administrations and companies, but also data produced by individuals themselves (e.g., photos, agendas, data produced by smart appliances and quantified-self devices) and deliberately stored in the cloud for convenience. The net effect is, on the one hand, an unprecedented threat to data privacy due to abusive usage and attacks and, on the other hand, difficulties in providing powerful user-centric services (e.g., personal big data) which require crossing data stored today in isolated silos. The Personal Cloud paradigm holds the promise of a Privacy-by-Design storage and computing platform, where each individual can gather her complete digital environment in one place and share it with applications and users while preserving her control. However, this paradigm leaves the privacy and security issues in the user's hands, which leads to a paradox given individuals' limited autonomy in terms of computer security and their ability and willingness to administer sharing policies. The challenge is nevertheless paramount in a society where emerging economic models are all based, directly or indirectly, on exploiting personal data.
While many research works tackle the organization of the user's workspace, the semantic unification of personal information or personal data analytics, the objective of the PETRUS project-team is to tackle the privacy and security challenges from an architectural point of view. Our objective is to help provide a technical solution to the personal cloud paradox. More precisely, our goals are (i) to propose new architectures (encompassing both software and hardware aspects) and administration models (decentralized access and usage control models, data sharing, data collection and retention models) for secure personal cloud data management, (ii) to propose new secure distributed database indexing models, privacy-preserving query processing strategies and data anonymization techniques for the personal cloud, and (iii) to study the economic, legal and societal issues linked to secure personal cloud adoption.
To tackle the challenge introduced above, we identify three main lines of research.
Our contributions also rely on tools (algorithms, protocols, proofs, etc.) from other communities, namely security (cryptography, secure multiparty computation, differential privacy, etc.) and distributed systems (distributed hash tables, gossip protocols, etc.). Beyond the research actions, we structure our software activity around advanced platforms integrating our main research contributions. These platforms are cornerstones for validating our research results through accurate performance measurements, a common practice in the DB community and a prerequisite for targeting the best conferences. They are also a strong vector to federate the team, simplify the bootstrapping of new PhD or master students, conduct multi-disciplinary research and open the way to industrial collaborations and technology transfers. Our main platform, PlugDB, has reached a high level of maturity. It runs on a microcontroller and is integrated in a real PDMS home-box solution deployed in the field for social-medical care. In addition, we are developing a second platform (still at the research prototype stage) to provide the user with a PDMS solution that can be hosted on a cloud platform and uses trusted execution environments (such as Intel SGX) to ensure data privacy and security.
As stated in the software section, the PETRUS research strategy aims at materializing its scientific contributions in an advanced hardware/software platform, with the expectation of producing a real societal impact. Hence, our software activity is structured around a common Secure Personal Cloud platform rather than several isolated demonstrators. This platform will serve as the foundation to develop a few emblematic applications.
Several privacy-preserving applications can be targeted by a Personal Cloud platform, such as: (i) smart disclosure applications allowing the individual to recover her personal data from external sources (e.g., bank, online shopping activity, insurance), integrate them and cross them to perform personal big data tasks (e.g., to improve her budget management); (ii) management of personal medical records for care coordination and well-being improvement; (iii) privacy-aware data management for the IoT (e.g., sensors, quantified-self devices, smart meters); (iv) community-based sensing and community data sharing; (v) privacy-preserving studies (e.g., cohorts, public surveys, privacy-preserving data publishing). Such applications overlap with all the research axes described above, but each of them also presents its own specificities. For instance, smart disclosure applications focus primarily on sharing models and their enforcement, IoT applications require prioritizing embedded data management and sustainability issues, while community-based sensing and privacy-preserving studies call for secure and efficient global query processing.
Among these application domains, one is already receiving particular attention from our team. Indeed, we gained a strong expertise in the management and protection of healthcare data through our past DMSP (Dossier Medico-Social Partagé) field experiment. This expertise is being exploited to develop a dedicated healthcare and well-being personal cloud platform. We are currently deploying 10,000 boxes equipped with PlugDB in the context of the DomYcile project. In this context, we have set up an Inria Innovation Lab with the Hippocad company to industrialize this platform and deploy it at large scale (see the bilateral contract section on the OwnCare II-Lab).
PlugDB is a complete platform dedicated to the secure and ubiquitous management of personal data. It aims at providing an alternative to the systematic centralization of personal data. The PlugDB engine is a personal database server capable of storing data (tuples and documents) in tables and BLOBs, indexing them, querying them in SQL, sharing them through assertional access control policies and enforcing transactional properties (atomicity, integrity, durability).
The prototype version of the PlugDB engine is embedded in a tamper-resistant hardware device combining the security of a smartcard with the storage capacity of NAND Flash. The personal database is stored encrypted in NAND Flash and the PlugDB engine code runs in the tamper-resistant device. Complementary modules make it possible to pre-compile SQL queries for the applications, communicate with the DBMS from a remote Java program, synchronize local data with remote servers (typically used to recover the database in case of a broken or lost device) and participate in distributed computations (e.g., global queries). PlugDB was later extended to run both on secure devices provided by Gemalto and on specific secure devices designed by PETRUS and assembled by electronic SMEs. Mastering the hardware platform opens up new research and experiment opportunities (e.g., support for wireless communication, secure authentication, sensing capabilities, battery power).
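To give a concrete feel for this client-side workflow (pre-compiled queries, local SQL execution and synchronization for recovery), the sketch below mimics it in Python. It is purely illustrative: the class, method names and schema are hypothetical, and sqlite3 only stands in for the embedded PlugDB engine, which actually runs inside the secure device.

```python
import sqlite3

# Hypothetical client-side workflow: sqlite3 stands in for the embedded
# PlugDB engine; none of these names reflect the actual PlugDB API.
class PersonalCloudClient:
    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS health_record ("
            "id INTEGER PRIMARY KEY, patient TEXT, note TEXT, synced INTEGER DEFAULT 0)")

    def precompile(self, sql):
        # Stand-in for query pre-compilation: bind the SQL text once, reuse it later.
        return lambda params=(): self.conn.execute(sql, params).fetchall()

    def insert_note(self, patient, note):
        self.conn.execute(
            "INSERT INTO health_record(patient, note) VALUES (?, ?)", (patient, note))
        self.conn.commit()

    def sync_to(self, remote):
        # Naive synchronization: push not-yet-synced rows (encrypted in the real
        # system) to a remote backup used to recover the database if the device is lost.
        rows = self.conn.execute(
            "SELECT id, patient, note FROM health_record WHERE synced = 0").fetchall()
        remote.extend(rows)
        self.conn.execute("UPDATE health_record SET synced = 1")
        self.conn.commit()
        return len(rows)

backup = []
client = PersonalCloudClient()
client.insert_note("alice", "blood pressure 12/8")
notes_of = client.precompile("SELECT note FROM health_record WHERE patient = ?")
print(notes_of(("alice",)))    # [('blood pressure 12/8',)]
print(client.sync_to(backup))  # 1 row pushed to the recovery backup
```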
The PlugDB engine was first registered at APP (Agence de Protection des Programmes) in 2009, with a new version registered every two years, and the hardware datasheets were registered in 2015. PlugDB has been experimented in the field, notably in the healthcare domain. PlugDB was also used in an educational platform that we set up: SIPD (Système d'Information Privacy-by-Design). SIPD was used at ENSIIE, INSA CVL and UVSQ through the Versailles Sciences Lab fablab to raise students' awareness of privacy protection problems and embedded programming.
PlugDB combines several research contributions from the team, at the crossroads of flash data management, embedded data processing and secure distributed computations. It thus strongly federates all members of our team (permanent members, PhD students and engineers). It is also a vector of visibility, technology transfer and dissemination, and gives us the opportunity to collaborate with researchers from other disciplines around a concrete privacy-enhancing platform.
PlugDB is currently being industrialized in the context of the OwnCare Inria Innovation Lab (II-Lab). In OwnCare, PlugDB acts as a secure personal cloud managing medical and social data for people receiving care at home. It is currently being deployed to over 10,000 patients in the Yvelines district. The industrialization process covers the development of a complete testing environment, the writing of detailed documentation and the development of additional features (e.g., embedded ODBC driver, TPM support, flexible access control model and embedded code upgrade). It has also required the design of a new hardware platform equipped with a battery power supply, introducing new energy consumption issues for the embedded software.
Personal Data Management Systems (PDMS) are emerging at a rapid pace, boosted by smart disclosure initiatives and new regulations such as the GDPR. However, our survey 1 indicates that existing PDMS solutions only partially cover the PDMS data life-cycle and, more importantly, focus on specific privacy threats depending on the employed architecture. To address this issue, we proposed in 1 a logical reference architecture for an extensive (i.e., covering all the major functionalities) and secure (i.e., circumventing all the threats specific to the PDMS context) PDMS. We also discussed several possible physical instances of the architecture and showed that TEEs (Trusted Execution Environments) are a prime option for building a trustworthy PDMS platform 2.
Hence, based on our previous studies, we have developed a first prototype of an extensive and secure PDMS (ES-PDMS) platform using the state-of-the-art TEE technology available today, i.e., Intel Software Guard eXtensions (SGX). The originality of our approach is to achieve extensibility through a set of isolated data-oriented tasks, potentially untrusted by the PDMS owner, running alongside a trusted module which controls the complete workflow and limits data leakage. Our ES-PDMS software stack can be deployed on any SGX-enabled machine (i.e., any relatively recent computer with an Intel CPU). This prototype was presented in a demonstration paper 6 focusing on the security properties of the platform with the help of several concrete scenarios and interactive games.
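The sketch below conveys, in plain Python, the control pattern behind this design: a trusted module drives the workflow, hands raw records to an isolated, potentially untrusted data-oriented task, and only lets a bounded, aggregate result leave the platform. It is a conceptual illustration under assumed names and a hypothetical leakage bound, not the actual SGX implementation.

```python
from typing import Callable, List

MAX_RESULT_BYTES = 64  # hypothetical leakage bound enforced by the trusted module

def trusted_invoke(task: Callable[[List[dict]], object], records: List[dict]):
    # Trusted module: feeds records to the isolated task, then only releases
    # results that fit the declassification bound.
    result = task(list(records))
    if len(repr(result).encode()) > MAX_RESULT_BYTES:
        raise PermissionError("result exceeds the declassification bound")
    return result

def average_heart_rate(records):
    # Untrusted, data-oriented task: computes a single aggregate over raw records.
    values = [r["heart_rate"] for r in records]
    return sum(values) / len(values)

records = [{"heart_rate": 62}, {"heart_rate": 71}, {"heart_rate": 68}]
print(trusted_invoke(average_heart_rate, records))  # 67.0
```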
Xinqing Li's PhD thesis, which started in October 2023, takes up some of the platform's elements and adapts them to the context of a cloud database service (going beyond the PDMS setting) with code extensions potentially vulnerable to certain attacks.
Data minimization is a privacy principle of the GDPR, which states that the collection of personal data must be minimized according to the purpose of the intended processing. We have proposed a privacy-enhancing technology (PET) for the collection of personal data, specifically targeting public sectors such as social services, where applicants fill in forms to apply for certain benefits. Our approach, based on classical logic and game theory, aims to minimize data collection while ensuring that users make informed choices and obtain all the benefits they are due. It offers a practical solution for preserving privacy without compromising service accuracy. We also demonstrated the implementation of our data minimization model in two real-life scenarios concerning French social benefits. This work resulted in an article at EDBT'24 3 and was demonstrated at CSS'23 13. It was done in collaboration with Sabine Frittella (INSA-VDL) and Guillaume Scerri (ENS Paris Saclay).
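As a toy illustration of the principle (not of the logic- and game-theoretic model of the paper), the sketch below collects form fields lazily, only when an eligibility rule actually needs them, so that unnecessary attributes are never requested. All rules, fields and values are hypothetical.

```python
# Toy data-minimization sketch: fields are requested on demand, so attributes
# that no eligibility rule consults are never collected.
class MinimalForm:
    def __init__(self, ask):
        self.ask = ask            # callback that asks the applicant for one field
        self.collected = {}

    def __getitem__(self, field):
        if field not in self.collected:
            self.collected[field] = self.ask(field)   # collect only when needed
        return self.collected[field]

BENEFIT_RULES = {  # hypothetical eligibility predicates over form fields
    "energy_check": lambda f: f["income"] < 1200,
    "housing_aid":  lambda f: f["income"] < 1500 and f["dependents"] >= 1,
}

applicant_answers = {"income": 1600, "dependents": 2, "employer": "ACME"}
form = MinimalForm(ask=lambda field: applicant_answers[field])

eligible = {name for name, rule in BENEFIT_RULES.items() if rule(form)}
print(eligible)        # set(): income too high for both benefits
print(form.collected)  # {'income': 1600}: 'dependents' and 'employer' never collected
```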
The development and adoption of personal data management systems (PDMS) have been fueled by legal and technical means such as smart disclosure, data portability and data altruism. By using a PDMS, individuals can effortlessly gather and share data generated directly by their devices or resulting from their interactions with companies or institutions. In this context, federated learning appears to be a very promising technology, but it requires secure, reliable and scalable aggregation protocols to preserve user privacy and account for potential PDMS dropouts. Despite recent significant progress in secure aggregation for federated learning, we still lack a solution suitable for the fully decentralized PDMS context. In this work, we proposed a family of fully decentralized protocols that are scalable and reliable with respect to dropouts. We focused in particular on the reliability property, which is key in a peer-to-peer system wherein aggregators are system nodes subject to dropouts in the same way as contributor nodes. We showed that in a decentralized setting, reliability raises a tension between the potential completeness of the result and the aggregation cost. We proposed a set of strategies that deal with dropouts and offer different trade-offs between completeness and cost. We extensively evaluated the proposed protocols and showed that they cover the design space, allowing either completeness or cost to be favored in all settings. This work was published at SSDBM 17. We also built a demonstration platform to illustrate these results within the open-source Cozy Cloud PDMS product. This demonstration was published at CoopIS 18 and highlights both the utility of collective computations and the main features of the aggregation protocol.
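To make the aggregation idea concrete, the minimal sketch below uses additive secret sharing: each contributor splits its value into random shares sent to distinct aggregators, so no single node sees an individual value, and a contributor that drops out before delivering its shares is simply excluded. This is an illustrative baseline with hypothetical parameters, not the reliability strategies of the SSDBM paper.

```python
import secrets

PRIME = 2**61 - 1       # modulus for additive secret sharing (assumption)
NUM_AGGREGATORS = 3     # hypothetical number of aggregator nodes

def share(value, n=NUM_AGGREGATORS):
    # Split `value` into n additive shares modulo PRIME.
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

contributions = {"node_a": 12, "node_b": 7, "node_c": 30, "node_d": 5}
dropped = {"node_d"}    # node_d drops out before delivering its shares

# Each surviving contributor sends one share to each aggregator.
aggregator_inbox = [[] for _ in range(NUM_AGGREGATORS)]
for node, value in contributions.items():
    if node in dropped:
        continue
    for i, s in enumerate(share(value)):
        aggregator_inbox[i].append(s)

# Each aggregator only reveals the sum of the shares it received.
partial_sums = [sum(inbox) % PRIME for inbox in aggregator_inbox]
total = sum(partial_sums) % PRIME
print(total)  # 49: sum over surviving contributors, no individual value revealed
```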
This work explores the distributed use of PDMSs in an Opportunistic Network context, where messages are transferred from one device to another without the need for any infrastructure. The objective is to enable complex processing crossing data from thousands of individuals, while guaranteeing the security and fault tolerance of the executions. The proposed approach leverages Trusted Execution Environments to define a new computing paradigm, entitled Edgelet computing, that satisfies validity, resiliency and privacy properties. Contributions include: (1) security mechanisms to protect executions from malicious attacks seeking to plunder personal data, (2) resiliency strategies to tolerate failures and message losses induced by the fully decentralized environment, and (3) extensive validations and practical demonstrations of the proposed methods. In 2023, we demonstrated a prototype at PERCOM'23 16, which highlights the pertinence of the approach through a real medical use-case. We then extended the applicability of the solution beyond the OppNet context by considering TEE-enabled devices communicating through various forms of degraded channels. The demonstration published at EDBT'23 15 showed the pertinence of the approach through execution scenarios integrating heterogeneous secure personal devices. This work is part of Ludovic Javet's PhD 22.
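The toy sketch below illustrates one flavor of such a resiliency strategy: replicating each task fragment on several opportunistically reached devices and keeping the first delivered result. The replication factor, loss model and device names are assumptions for illustration, not the actual Edgelet protocol.

```python
import random

random.seed(1)
REPLICATION = 3              # hypothetical replication factor per task fragment
DELIVERY_PROBABILITY = 0.6   # hypothetical OppNet message-loss model

def run_fragment(fragment_id, devices):
    # Send the fragment to REPLICATION devices; the first delivered result wins.
    for device in random.sample(devices, REPLICATION):
        delivered = random.random() < DELIVERY_PROBABILITY
        if delivered:
            return f"result_of_{fragment_id}_from_{device}"
    return None  # all replicas lost: the fragment must be re-assigned later

devices = [f"device_{i}" for i in range(10)]
results = {f: run_fragment(f, devices) for f in ("f1", "f2", "f3")}
completed = {f: r for f, r in results.items() if r is not None}
print(f"{len(completed)}/3 fragments completed despite message losses")
```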
Since July 2023, a PETRUS spin-off has been incubating around the development of privacy-enhancing technologies (PETs). This project aims to model and implement solutions to protect personal data in different contexts, such as information exchange for remote workers, anonymous reporting of school bullying, or minimizing the collection of personal data in applications. Drawing inspiration from regulations such as the GDPR and from societal issues, the aim is to reconcile privacy protection and legitimate uses in an ever-changing digital environment, with a focus on the secure design, optimization and effective deployment of PETs, while ensuring their explainability and practical applicability.
- Partners: PETRUS, Hippocad
The OwnCare II-Lab (Inria Innovation Lab, Jan 2018-Dec 2021) aimed at conceiving a secure personal medical folder facilitating the organization of medical and social care provided at home to elderly people, and at deploying it in the field. This II-Lab was built in partnership with the Hippocad company, which won, in association with Inria and UVSQ, a public call for tender launched by the Yvelines district to deploy this medical folder over the whole district (10,000 patients). This solution, named DomYcile in the Yvelines district, is based on a home box combining the PlugDB hardware/software technology developed by the PETRUS team (to manage and secure the medical folder) with additional technology developed by Hippocad. The primary result of the OwnCare II-Lab has been to build a concrete industrial solution based on PlugDB and deploy it so far among 3,000 patients in the Yvelines district, despite the Covid pandemic. In 2022, Hippocad became a subsidiary of the La Poste group, opening new opportunities in terms of deployment. Hence, Inria, UVSQ and Hippocad have launched a follow-up of the OwnCare II-Lab for the period Jan 2022-Dec 2025. The goals of the OwnCare2 II-Lab are (1) to integrate our solution in the MaSanté 2022 national roadmap by making it interoperable with external services (without hurting the security provided by the box), (2) to handle, in a privacy-preserving way, new usages like actimetry, teleassistance and global statistics based on IoT techniques, machine learning and decentralized computations, and (3) to try to deploy it at the national/international level. In 2023, a new district (Hauts-de-Seine) decided to deploy the DomYcile solution on its own territory, leading to an extended partnership. This second deployment is planned to begin in the course of 2024.
- Partners: Cozy Cloud, PETRUS
A third CIFRE PhD thesis has been concluded between Cozy Cloud and Julien Mirval from PETRUS. Cozy Cloud is a French startup providing a personal cloud platform. The Cozy product is a software stack that anyone can deploy to run a personal server hosting their personal data and web services. The objective of this thesis is to propose appropriate solutions to effectively train an AI model (e.g., a deep neural network) in a fully distributed system while providing strong security guarantees to the participating nodes. The results, in the form of protocols and secure distributed execution algorithms, were applied to practical cases provided by the Cozy Cloud company (see 18).
Partners: Inria, CNRS, EDHEC, INSA CVL, INSA Lyon, UGA, Université de Lille, Université Rennes 1, UVSQ, CNIL
Digital technologies provide services that can greatly increase quality of life (e.g., connected e-health devices, location-based services or personal assistants). However, these services can also raise major privacy risks, as they involve personal, and sometimes sensitive, data. Indeed, this notion of personal data is the cornerstone of French and European regulations, since processing such data triggers a series of obligations that the data controller must abide by. This raises many multidisciplinary issues, as the challenges are not only technological, but also societal, legal, economic, political and ethical. The objectives of this project are thus to study the threats to privacy introduced by these new services, and to conceive theoretical and technical privacy-preserving solutions that are compatible with French and European regulations and that preserve the users' quality of experience. These solutions will be deployed and assessed, both on the technological and legal sides and in terms of their societal acceptability. To achieve these objectives, we adopt an interdisciplinary approach bringing together many diverse fields: computer science, technology, engineering, social sciences, economics and law.
The project's scientific program focuses on new forms of personal information collection, on training Artificial Intelligence (AI) models that preserve the confidentiality of the personal information they use, on data anonymization techniques, on securing personal data management systems, on differential privacy, on the legal protection of personal data and compliance, and on all the associated societal and ethical considerations. This unifying interdisciplinary research program brings together internationally recognized research teams (from universities, engineering schools and institutions) working on privacy, and the French Data Protection Authority (CNIL).
Partners: CERDI (Université Paris Saclay), LITEM (IMT-BS), PETRUS (Inria-UVSQ).
Despite its seemingly light nature, Youth Privacy Protection in Online Gaming is a very important topic in the field of privacy. Indeed, 94% of minors (under 18 years old) play video games, and 60% of 10-17 year olds play online. While 68% of parents declare that they “feel concerned” by the games their children play online, only 12% actively monitor their children's activities. This topic raises several multidisciplinary questions, which are studied in the context of the YPPOG project, grouping researchers in Law (CERDI), Economics (LITEM) and Computer Science (PETRUS). First, consent: what does it mean for a minor to consent, and how can a minor consent? How can the GDPR be enforced online? Do technical solutions exist, and are they sufficient? Second, the processing of minors' data: how is the data collected, and for what purposes? Can we accompany the stakeholders of the online gaming field to help them apply the GDPR? The French regulator (the CNIL) is very interested in the topic and is monitoring the progress of our project.
As stated above, our approach is multi-disciplinary, mixing computer science, to propose privacy-preserving algorithms and infrastructures to manage youth online data; economics, to understand the constraints of the gaming industry and how to take them into account when it comes to legal aspects; and law, to propose procedures that help companies move to legally compliant processes.
One example of a technical prototype, developed by PETRUS in 2022 (see the git repository), is a wrapper for the publicly available data of one of the most played online games, League of Legends (LoL). In 2023, we extended our wrappers to many different games and gaming platforms (Fortnite, Riot, Steam, ...) and are in the process of recruiting participants for a field study to see whether it is possible to train AI algorithms to detect, based on this open data, whether a player is under or over 18.
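For illustration, such a wrapper typically exposes a thin fetch-and-featurize layer like the hedged sketch below; the base URL, endpoint, key handling and feature names are placeholders rather than the real Riot, Steam or Fortnite APIs (see the project's git repository for the actual wrappers).

```python
import requests

API_BASE = "https://example-game-api.invalid"  # placeholder, not a real endpoint
API_KEY = "YOUR_DEVELOPER_KEY"                 # each platform requires its own key

def fetch_recent_matches(player_id, count=20):
    """Return the player's recent public match summaries as a list of dicts."""
    response = requests.get(
        f"{API_BASE}/players/{player_id}/matches",
        params={"count": count},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

def extract_features(matches):
    # Example play-behavior features that could feed the age-detection study;
    # the field names are hypothetical.
    n = max(len(matches), 1)
    return {
        "matches": len(matches),
        "avg_duration_min": sum(m.get("duration_min", 0) for m in matches) / n,
        "night_play_ratio": sum(m.get("started_at_night", False) for m in matches) / n,
    }
```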