SPLiTS (Secure Programming Languages & Tools for Security) is an Inria project-team (équipe-projet) focusing on defensive system security and compilers. It was created on July 1st, 2023.
SPLiTS’ overall goal is to develop safeguard mechanisms for the confidentiality and integrity of computer systems by providing mathematical proofs that align with currently valid threat models at various levels of abstraction. To accomplish this goal, our research plan is organized into six axes, each addressing a different level of abstraction:
One of the most important concepts in cybersecurity is, arguably, that of a threat model.
A threat model determines the power of an attacker: essentially, it defines what the attacker can and cannot do. For example, a particular threat model may assume that
integer factorization cannot be solved in polynomial time, or that a number produced by a random generator can never be guessed, or that memory boundaries imposed by software can never be crossed by the attacker, or that
certain hardware microarchitectural features such as
the cache memory can be observed.
Threat models are important, in particular, to reason about
what one is defending against by focusing on certain attacker capabilities and abstracting away from any attacks outside the considered threat model. Threat models also represent the most vulnerable point in security research and engineering: they may be contradicted by novel attacks, and assumptions about what an attacker cannot do might be violated in practice.
A remarkable example of a wrong threat model came to light in January 2018, when the Spectre vulnerabilities were revealed. Until then, application security researchers and engineers had assumed that the abstraction provided by the hardware, that is, the hardware architecture, was mostly unbreakable by the attacker (with few exceptions, such as leaks due to the cache memory). Our research plan consists in elucidating the synergies among different threat models in the new Spectre era, and in facing major scientific challenges such as the design of secure and efficient software and hardware that take these new synergies into account.
A session refers to an interaction, devoted to a particular topic, among a number of participants. When a web server communicates with a connected object, session information may include tokens that authenticate the server, which could be used by an attacker to access an actuator and use the device to change the physical world; e.g., the token could be used to operate a smart vacuum cleaner and obtain a map of the house. An attacker modifying the normal flow of a web session, e.g. by manipulating session cookies or tokens, violates session integrity.
Session types were introduced in the mid-nineties as a tool for specifying and analyzing web services in a variant of the π-calculus.
Running a program written in a popular programming language of today, such as JavaScript or Python, delivers roughly the performance of a C program running on a computer from about 20 years ago. Energy consumption follows a similar trend.
Improving the performance of modern languages, either by developing new, better-adapted hardware architectures or by developing new implementation and execution techniques, is therefore a major research subject.
Hop.js, an ahead-of-time JavaScript compiler, draws on a large part of the optimization techniques developed for functional languages (Scheme and the ML family) since the beginning of the 90s. However, these methods alone are not sufficient to obtain performance comparable to that of the best JIT compilers of the moment. To approach it, new, so-called opportunistic optimizations have been invented. They essentially transpose long-used hardware speculation techniques to software. These new software techniques are still in their infancy and constitute the central element of the research program that will be developed in SPLiTS. For the JavaScript language, we will investigate the approach of opportunistic JavaScript typing, as well as TypeScript, a typed version of JavaScript.
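The flavor of these opportunistic optimizations can be conveyed by a minimal, self-contained sketch (not Hop.js's actual implementation, where such speculation is generated by the compiler rather than written by hand): a property access guesses that it always sees objects of the same shape and falls back to a generic lookup when the guess fails.

```javascript
// Minimal sketch of an "opportunistic" (speculative) property access: guess
// that the receiver always has the same shape, approximated here by its
// prototype, and keep a direct fast path for that case.
function makeCachedGetX() {
  let cachedShape = null;                // the speculated shape
  return function getX(o) {
    const shape = Object.getPrototypeOf(o);
    if (shape === cachedShape) {
      return o.x;                        // fast path: the speculation holds
    }
    cachedShape = shape;                 // mis-speculation: retrain the cache
    return Reflect.get(o, "x");          // generic, fully dynamic lookup
  };
}

class Point { constructor(x, y) { this.x = x; this.y = y; } }
const getX = makeCachedGetX();
console.log(getX(new Point(1, 2))); // 1, via the generic path (cache cold)
console.log(getX(new Point(3, 4))); // 3, via the fast path
```

Real compilers apply this scheme in generated machine code, guarding on hidden classes and deoptimizing on mis-speculation, much like hardware branch predictors recover from wrong guesses.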
As noted earlier, an antagonism has recently been highlighted between processor performance and execution security, mainly due to speculation phenomena (e.g. Spectre, mentioned above) whose importance is now well understood and accepted. Bringing a satisfactory answer to this difficult problem, that is to say an answer that combines performance and security, requires dual skills: on the one hand, knowledge and in-depth understanding of the concepts and techniques used in the design of architectures, and on the other hand, a mastery of the formal mathematical tools that allow us to reason about the behavior of processors and to establish proofs of correctness and security.
The Jasmin compiler was designed to achieve predictability and efficiency of the output binary code (for now, the generated code targets the x86 architecture). The compiler is formally verified in the Coq proof assistant. The Jasmin compiler generates binary code with provable security guarantees against a speculative attacker that can measure accesses to the cache memory: indeed, generated Jasmin code is, by construction, vulnerable neither to Spectre-PHT nor to Spectre-STL. When the speculative threat model is not considered, Jasmin generates binary code with performance close to that of the fastest cryptographic code (e.g., cryptographic implementations written in Jasmin enjoy performance competitive with OpenSSL, even slightly beating it). We plan to extend the Jasmin compiler to generalize the guarantees it provides under the speculative and quantum threat models.
At the abstraction level provided by the hardware architecture, programs are assumed to execute sequentially, in the order dictated by the program control flow.
However, at the level of the hardware implementation, program execution is more complex and involves, for example, out-of-order and speculative execution. This complexity at the microarchitectural level was supposed to be transparent for the developer, who should only reason about programs using the abstractions provided by the hardware architecture.
Yet, Spectre attacks, quickly followed by many other attacks, demonstrated how an attacker could make use of speculative execution to exfiltrate secrets that were otherwise highly protected at the architectural level.
The consequences of these speculative attacks can be devastating. For example,
Branch Target Injection (a.k.a. BTI or Spectre v2) allows the attacker to bypass architectural privilege boundaries, i.e. the attacker can control the execution of a
more privileged program, for instance a program belonging to the operating system kernel.
Since their disclosure, speculative attacks have strongly impacted both
academia and industry
1 and
have opened a new era for security for which almost all previous threat models
need to be revisited.
Our broad goal is to provide secure software and hardware under the speculative threat model.
The Jasmin programming language smoothly combines high-level and low-level constructs, so as to support “assembly in the head” programming. Programmers can control many low-level details that are performance-critical: instruction selection and scheduling, what registers to spill and when, etc. The language also features high-level abstractions (variables, functions, arrays, loops, etc.) to structure the source code and make it more amenable to formal verification. The Jasmin compiler produces predictable assembly and ensures that the use of high-level abstractions incurs no run-time penalty.
The semantics is formally defined to allow rigorous reasoning about program behaviors. The compiler is formally verified for correctness (the proof is machine-checked by the Coq proof assistant). This ensures that many properties can be proved on a source program and still apply to the corresponding assembly program: safety, termination, functional correctness…
Jasmin programs can be automatically checked for safety and termination (using a trusted static analyzer). The Jasmin workbench leverages the EasyCrypt toolset for formal verification. Jasmin programs can be extracted to corresponding EasyCrypt programs to prove functional correctness, cryptographic security, or security against side-channel attacks (constant-time).
2023.06.0 is a major release of Jasmin. It contains a few noteworthy changes:
- local functions now use call and ret instructions,
- experimental support for the ARMv7 (i.e., Cortex-M4) architecture,
- a few aspects of the safety checker can be finely controlled through annotations or command-line flags,
- shift and rotation operators have a simpler semantics.
As usual, it also brings various fixes and improvements, such as bit rotation operators and automatic slicing of the input program.
The Hop programming environment consists of a web broker that intuitively combines, in a single architecture, a web server and a web proxy. The broker embeds a Hop interpreter for executing server-side code and a Hop client-side compiler for generating the code that will be executed by the client.
An important effort is devoted to providing Hop with a realistic and efficient implementation. The Hop implementation is validated against web applications that are used on a daily basis. In particular, we have developed Hop applications for authoring and projecting slides, editing calendars, reading RSS streams, and managing blogs.
We have pursued the development of Hop (also sometimes referred to as Hop.js in the rest of this text), our study of efficient JavaScript implementations, and our development of analyses for distributed language sessions and security. These contributions concern improvements of the hopc compiler. They have been integrated in the new version that is about to be released and are so far only described in unpublished internal reports. We also pursued our collaboration with
the University of Montréal on the compilation of the Python
programming language.
In this study, we compared the performance of server-side JavaScript applications programmed using synchronous, asynchronous, and
promise-based APIs. Our findings show that, in general, synchronous
programs execute significantly faster and consume fewer resources
compared to their asynchronous counterparts. The only situation where
we observed better performance for the asynchronous model was in the
case of a highly loaded web server. For that application, under that
exceptional context, we noticed that the asynchronous approach enables
servers to degrade more gracefully than the synchronous one. This suggests that a hybrid architecture combining the best of both worlds, synchronous operations under normal load and asynchronous operations under very heavy load, would be optimal. The Hop.js execution environment, which supports both modes,
is an ideal playground for conducting this sort of experiment.
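For concreteness, the two programming styles compared in the study differ as sketched below (a schematic example assuming a Node.js-compatible runtime and a local file "data.txt"; this is not code taken from the benchmarks).

```javascript
// Schematic comparison of the two styles (assumes a Node.js-compatible
// runtime and a local file "data.txt"; not taken from the benchmark suite).
const fs = require("fs");

// Synchronous API: straight-line control flow, no promise or callback
// allocation, easy for a compiler to optimize.
function sizeSync(path) {
  return fs.readFileSync(path, "utf8").length;
}

// Promise-based API: the same computation, but split into continuations
// around the await point.
async function sizeAsync(path) {
  const txt = await fs.promises.readFile(path, "utf8");
  return txt.length;
}

console.log(sizeSync("data.txt"));
sizeAsync("data.txt").then(console.log);
```

The asynchronous variant splits the computation into continuations around the await point, which is one source of the overheads discussed next.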
We further demonstrated that the overhead incurred by asynchronous executions is a combination of complex control flow, which makes it challenging for compilers to apply simple reduction optimizations, additional memory allocations, and high thread synchronization costs. The asynchronous model also requires intensive collaborations between JavaScript and the implementation language, with numerous exchanges between the two worlds. This places demands on the JavaScript automatic memory management system to track values passed to the implementation language.
This experiment also unveiled that a highly optimized and tuned JavaScript runtime can be defeated by a more general runtime system that enables smoother integration between the core language and foreign libraries. This suggests that such systems may be suitable when intensive collaborations between the core language and foreign libraries are expected. This is an incentive for pursuing the development of the generic runtime system hopc uses.
JavaScript allows functions to be called with a different number of arguments
than the specified number of parameters. The pseudo-variable
arguments packs all the arguments into a heap-allocated
structure and enables the receiver to introspect the received
values. The design and semantics of arguments make it difficult to implement efficiently, yet it is used by millions, perhaps even billions, of applications. Therefore, improving it may have a significant global impact on computing resources.
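For illustration, here is a hypothetical variable-arity function written with arguments, together with its modern rest-parameter counterpart.

```javascript
// A hypothetical variable-arity function written with the historical
// "arguments" object, and its modern rest-parameter counterpart.
function sumLegacy() {
  // "arguments" is an array-like object that a generic implementation
  // heap-allocates on every call.
  let s = 0;
  for (let i = 0; i < arguments.length; i++) s += arguments[i];
  return s;
}

function sumRest(...xs) {
  // Rest parameters express the same intent with a genuine array.
  return xs.reduce((a, b) => a + b, 0);
}

console.log(sumLegacy(1, 2, 3)); // 6
console.log(sumRest(1, 2, 3));   // 6
```

Call sites like sumLegacy, where arguments never escapes and is only accessed through length and indexing, are the ones most amenable to the optimization described below.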
We conducted a study on how modern JavaScript code utilizes the arguments
property. By analyzing Npm, the largest JavaScript repository, we
demonstrated that despite the introduction of alternative constructs
to support variable-arity functions, the historical arguments
property is still widely used, which is unfortunate given how difficult it is to implement efficiently. To address this issue, we presented an
optimization that significantly accelerates its performance. The
optimization can either completely eliminate the construction of the
arguments object or transform the usual heap allocation, which a
generic implementation employs, into a faster stack allocation. We
validated the effectiveness of this optimization through a set of
micro-benchmarks. When applied to the Hop.js compiler, the
optimization resulted in performance gains of one to two orders of
magnitude compared to the generic code. The analysis and optimization
can also be adopted by JIT compilers, as they rely on fast local
analyses instead of lengthy analysis of the entire program.
While the optimization may not have a spectacular impact on individual
programs, its global effect could be significant. Our study revealed
that approximately half of the 3 million JavaScript packages available on
Npm directly or indirectly depend on one or several packages that use
arguments. As a result, our optimization has the potential to
accelerate around 1.5 million packages, which likely play a role in
millions or billions of real applications. Considering the collective
impact of optimizing all these executions, the potential benefits
could be tremendous. We believe this compelling motivation justified undertaking this study.
Python is a popular programming language whose performance is known to
be uncompetitive in comparison to static languages such as C. Although
significant efforts have already accelerated implementations of the
language, more efficient ones are still required. The development of
such optimized implementations is nevertheless hampered by its complex
semantics and the lack of an official formal semantics. We addressed
this issue by developing an approach to define an executable semantics
targeting the development of optimizing compilers. This executable
semantics is written in a format that highlights type checks, boxing and unboxing of primitive values, and function calls, which are all known sources of overhead. We also developed semPy, a partial
evaluator of our executable semantics that can be used to remove
redundant operations when evaluating arithmetic operators.
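To convey the kind of overhead the semantics makes explicit, the following deliberately simplified sketch (written in JavaScript for consistency with the other examples in this report; semPy itself operates on an executable semantics of Python) contrasts a generic, dispatching addition with the residual code a partial evaluator can produce once operand types are known.

```javascript
// Simplified sketch: a generic addition with explicit type checks and
// dispatch, and the residual code obtained by specializing it for the
// case where both operands are known to be numbers.
function genericAdd(a, b) {
  if (typeof a === "number" && typeof b === "number") return a + b; // numeric behavior
  if (typeof a === "string" && typeof b === "string") return a + b; // string behavior
  throw new TypeError("unsupported operand types");
}

// Residual, specialized code: the checks and the dispatch have been
// evaluated away.
function numberAdd(a, b) {
  return a + b;
}

console.log(genericAdd(2, 3)); // 5, via dynamic dispatch
console.log(numberAdd(2, 3));  // 5, no dispatch
```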
To validate our approach, we integrated these behaviors into Zipi, an AoT optimizing Python compiler prototype. Zipi compiles behaviors and dispatches operations to their corresponding behaviors at run time. This increases execution speed, offering performance that rivals PyPy. Although this speedup is limited to arithmetic-heavy programs, behaviors could be extended to other operations or serve alongside other optimization techniques. On some tasks, Zipi displays performance competitive with that of state-of-the-art Python implementations.
We hope semPy and the behavior optimization can contribute to the
ongoing optimization efforts of Python implementations. It appears to
us that they would be well suited for CPython, as they specifically
address the known overhead of this implementation.
This work has been presented at the 16th ACM SIGPLAN International Conference on Software Language Engineering (SLE'23) 8.
Symbolic execution is a program analysis technique commonly used to determine whether programs violate properties and, when violations are found, to generate inputs that trigger them. Applied to security properties such as noninterference, symbolic execution is precise when searching for counterexample pairs of traces that witness insecure information flows; however, it is sound only for a subset of executions, so it cannot prove the correctness of programs whose executions go beyond the given bound. By contrast, abstract-interpretation-based static analysis guarantees soundness but generally lacks the ability to provide counterexample pairs of traces. In this paper, we propose to weave both approaches to obtain the best of the two worlds. We demonstrate this with a series of static analyses, including RedSoundRSE, a static analysis aimed at verifying noninterference. RedSoundRSE provides both semantically sound results and the ability to derive counterexample pairs of traces up to a bound. It relies on a combination of symbolic execution and abstract domains inspired by the well-known notion of reduced product. We formalize RedSoundRSE and prove its soundness as well as its relative precision up to a bound. We also provide a prototype implementation of RedSoundRSE and evaluate it on a sample of challenging examples.
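As an illustration of what such a counterexample looks like, consider the following hypothetical program (not taken from the paper), where a pair of traces with the same public input but different secrets yields observably different public outputs.

```javascript
// Hypothetical example: the public output depends on the secret through an
// implicit flow, so noninterference is violated.
function step(secret, publicInput) {
  let out = publicInput;
  if (secret > 0) out = out + 1; // branching on the secret leaks one bit
  return out;                    // publicly observable result
}

// A counterexample pair of traces: same public input, different secrets,
// observably different public outputs.
console.log(step(1, 10)); // 11
console.log(step(0, 10)); // 10
```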
This work has been presented at VMCAI'23 9.
We propose ProSpeCT, a generic formal processor model providing provably secure speculation for the constant-time policy. For constant-time programs under a non-speculative semantics, ProSpeCT guarantees that speculative and out-of-order execution cause no microarchitectural leaks. This guarantee is achieved by tracking secrets in the processor pipeline and ensuring that they do not influence the microarchitectural state during speculative execution. Our formalization covers a broad class of speculation mechanisms, generalizing prior work. As a result, our security proof covers all known Spectre attacks, including load value injection (LVI) attacks.
In addition to the formal model, we provide a prototype hardware implementation of ProSpeCT on a RISC-V processor and show evidence of its low impact on hardware cost, performance, and required software changes. In particular, the experimental evaluation confirms our expectation that for a compliant constant-time binary, enabling ProSpeCT incurs no performance overhead.
This work has been presented at USENIX Security 2023 7.
We tackle the problem of designing efficient binary-level verification for a subset of information flow properties encompassing constant-time and secret-erasure. These properties are crucial for cryptographic implementations, but are generally not preserved by compilers. Our proposal builds on relational symbolic execution enhanced with new optimizations dedicated to information flow and binary-level analysis, yielding a dramatic improvement over prior work based on symbolic execution. We implement a prototype, Binsec/Rel, for bug-finding and bounded-verification of constant-time and secret-erasure, and perform extensive experiments on a set of 338 cryptographic implementations, demonstrating the benefits of our approach. Using Binsec/Rel, we also automate two prior manual studies on preservation of constant-time and secret-erasure by compilers for a total of 4148 and 1156 binaries respectively. Interestingly, our analysis highlights incorrect usages of volatile data pointers for secret erasure and shows that scrubbing mechanisms based on volatile function pointers can introduce additional register spilling which might break secret-erasure. We also discovered that gcc -O0 and backend passes of clang introduce violations of constant-time in implementations that were previously deemed secure by a state-of-the-art constant-time verification tool operating at the LLVM level, showing the importance of reasoning at binary level.
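The constant-time property at stake can be illustrated by the following schematic selection routines (written in JavaScript for consistency with the other examples in this report; the analyses above target binaries compiled from C or assembly).

```javascript
// Schematic illustration of constant-time programming: a secret-dependent
// branch versus a branchless, data-independent selection.
function selectLeaky(secretBit, a, b) {
  // Control flow depends on the secret: timing and branch-predictor state
  // may leak it.
  return secretBit ? a : b;
}

function selectConstantTime(secretBit, a, b) {
  // Branchless selection on 32-bit integers: mask is all ones when
  // secretBit is 1 and all zeros when it is 0.
  const mask = -secretBit | 0;
  return (a & mask) | (b & ~mask);
}

console.log(selectLeaky(1, 7, 9));        // 7
console.log(selectConstantTime(1, 7, 9)); // 7
console.log(selectConstantTime(0, 7, 9)); // 9
```

The first variant branches on the secret, so its control flow (hence its timing) depends on it; the second computes the same result with secret-independent control flow.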
This work has been published as a journal article in ACM Transactions on Privacy and Security 2.
We revisit the problem of erasing sensitive data from memory and registers during return from a cryptographic routine. While the problem and related attacker model is fairly easy to phrase, it turns out to be surprisingly hard to guarantee security in this model when implementing cryptography in common languages such as C/C++ or Rust. We revisit the issues surrounding zeroization and then present a principled solution, in the sense that it guarantees that sensitive data is erased and clearly defines when this happens. We implement our solution as an extension to the formally verified Jasmin compiler and extend the correctness proof of the compiler to cover zeroization. We show that the approach integrates seamlessly with state-of-the-art protections against microarchitectural attacks by integrating zeroization into Libjade, a cryptographic library written in Jasmin with systematic protections against timing and Spectre-v1 attacks. Benchmarks show that in many cases the overhead of zeroization is barely measurable and that it stays below 2% except for highly optimized symmetric crypto routines on short inputs. This paper has been accepted at CHES 2024.
The area of post-quantum cryptography (PQC) focuses on classical cryptosystems that are provably secure against quantum adversaries. PQC is based on computational problems that are conjectured to be hard for quantum computers, e.g., the learning with errors problem 13. Simply relying on such assumptions, however, is insufficient to ensure security against quantum attackers; one must also verify that a security reduction holds in the quantum setting.
A natural question to ask is therefore whether we need a fundamentally different approach to the design of formal verification tools to capture these results, which seem tantalizingly close to the classical setting. For example, Unruh 14 suggests that the EasyCrypt logic is not sound for quantum adversaries. Concretely, 14 claims that the CHSH protocol, which is secure in the classical setting but not in the quantum setting, can be proved secure in EasyCrypt. While this point is moot, because the EasyCrypt logic was not designed (or claimed) to be sound for the quantum setting, it does raise an important question that we addressed in the paper 12:
While the primary advantage of the logic proposed in 12 is its usability, a notable drawback is its limited expressivity. Numerous valid proof techniques for post-quantum cryptography cannot be articulated within this logic. Consequently, we have initiated a collaborative effort to define a new logic for proving post-quantum cryptography, aiming for expressiveness comparable to 14 while maintaining a high level of usability as seen in 12.
Session types describe communication protocols involving two or more
participants by specifying the sequence of exchanged messages and
their functionality (sender, receiver and type of carried data). They
may be viewed as the analogue, for concurrency and distribution, of
data types for sequential computation. They were originally conceived as a static analysis technique for a variant of the π-calculus.
The aim of session types is to ensure safety properties for
sessions, such as the absence of communication errors (no
type mismatch in exchanged data) and deadlock-freedom (no
standstill unless all participants have terminated). When describing multiparty
protocols, session types often target also the liveness property
of progress or lock-freedom (no participant waits
forever).
While binary sessions can be described by a single session type,
multiparty sessions require two kinds of types: a global type
that describes the whole session protocol, and local types
that describe the individual contributions of the participants to
the protocol. The key requirement to achieve safety properties such
as deadlock-freedom is that the local types of the processes
implementing the participants be obtained as projections from the
same global type. To ensure progress, global types must satisfy
additional well-formedness requirements.
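As a small illustrative example (using one common notation for multiparty session types; notations vary across papers), a protocol in which participant p sends an integer to q and q then sends a boolean to r can be described by the global type G below; its projections give the local types against which the processes implementing p, q and r are type-checked:

\[
\begin{array}{l}
G \;=\; \mathsf{p} \rightarrow \mathsf{q} : \langle \mathsf{Int} \rangle .\; \mathsf{q} \rightarrow \mathsf{r} : \langle \mathsf{Bool} \rangle .\; \mathsf{end} \\[4pt]
G\!\upharpoonright\!\mathsf{p} \;=\; \mathsf{q}\,!\,\langle \mathsf{Int} \rangle .\; \mathsf{end}
\qquad
G\!\upharpoonright\!\mathsf{q} \;=\; \mathsf{p}\,?\,\langle \mathsf{Int} \rangle .\; \mathsf{r}\,!\,\langle \mathsf{Bool} \rangle .\; \mathsf{end}
\qquad
G\!\upharpoonright\!\mathsf{r} \;=\; \mathsf{q}\,?\,\langle \mathsf{Bool} \rangle .\; \mathsf{end}
\end{array}
\]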
What makes session types particularly attractive is that they offer several advantages at once: 1) static safety guarantees, 2) automatic check of protocol implementation correctness, based on local types, and 3) a strong connection with linear logics and with concurrency models such as communicating automata, graphical choreographies and message-sequence charts.
Choreographies are global specifications for multiparty
communication protocols, very close in spirit to multiparty session
types. A classical question for choreographies is whether they are
realizable by means of a distributed implementation. We have been addressing this question using branching pomsets, a recently proposed model for concurrent communicating processes, as an intermediate model. We
have also investigated the relation between branching pomsets and
several classes of event structures, some of which have already been
used to model multiparty session types 1.
Choreographic languages describe possible sequences of interactions
among a set of agents. Typical models are based on languages or
automata over sending and receiving actions. Pomsets provide a more
compact alternative by using a partial order to explicitly represent
causality and concurrency between these actions. However, pomsets
offer no representation of choices; thus a set of pomsets is required to represent branching behavior. For example, if an agent Alice can send one of two possible messages to Bob three times, one would need a set of eight pomsets, one for each combination of messages. Branching pomsets extend pomsets with an explicit branching structure that can represent Alice's behavior with a single, compact object.
To celebrate the 30th edition of EXPRESS (Expressiveness in Concurrency) and the 20th edition of SOS (Structural Operational Semantics),
we presented a retrospective view on how session types can be
expressed in a type theory for the standard π-calculus.
Nowadays most applications are distributed, that is, they run on several computers: a mobile device for the graphical user interface; a gateway for storing data in a local area; a remote server of a large cloud platform for resource-demanding computing; an object connected to the Internet of Things (IoT); etc. For many different reasons, this makes programming much more difficult than it was when only a single computer was involved:
The former Indes team, whose work is now pursued in SPLiTS, the Northwestern team, and the Collège de France team study programming languages and have each created complementary solutions that address the aforementioned problems. Combined, these solutions could lead to a robust and secure execution environment for web and IoT programming. SPLiTS will bring its expertise in secure web programming, Collège de France its expertise in synchronous reactive programming, and Northwestern its expertise in secure execution environments and run-time validation of security properties of program executions. Finally, Northwestern will also contribute its expertise in medical prescriptions, which will be the main application domain of the secure execution environment the participants aim to develop.
The main objective of the collaboration is the development of a robust and secure integrated programming environment for reactive applications suitable for web and IoT applications. The programming of medical prescriptions will be our favored application domain. We will base our work on three pillars: Hop.js, the contract system designed for the Racket language, and HipHop.js, a domain specific language for reactive programming within Hop.js.
Ilaria Castellani visited the team of Nobuko Yoshida at the University of Oxford for two weeks and Emilio Tuosto at the Gran Sasso Science Institute for one week.
This exploratory action (action exploratoire) is located in both Rennes and Sophia-Antipolis.
JavaScript programs are typically executed by a JIT compiler, able to handle efficiently the dynamic aspects of the language. However, JIT compilers are not always viable or sensible (e.g., on constrained IoT systems, due to secured read-only memory (W^X protection)).
The CISC project (Certified IoT Secure Compilation) is funded by the ANR for 42 months, ending in September 2023. The goal of the CISC project is to provide strong security and privacy guarantees for IoT applications by means of a language to orchestrate IoT applications from the microcontroller to the cloud. Tamara Rezk coordinates this project, and Manuel Serrano and Ilaria Castellani participate in the project. The partners of this project are the INRIA teams Celtique and SPLiTS, and Collège de France.
SVP PEPR Cybersecurity. We participate in a project concerned with the verification of security protocols. Partners in this project are CNRS IRISA Rennes (coordinator Stéphanie Delaune), Inria, University of Paris-Saclay, University of Lorraine, Université Côte d’Azur, and ENS Rennes. The funds allocated to our team in this collaboration are 333 kEuros. The corresponding researcher for this contract is Benjamin Grégoire.
We organized the inaugural SPLiTS workshop on September 29th, 2023, at Belles Rives, Antibes.