Research in Parkas focuses on the design, semantics, and compilation of
programming languages which allow going from parallel deterministic
specifications to target embedded code executing on sequential or multi-core
architectures.
We are driven by the ideal of a mathematical and executable language used
both to program and simulate a wide variety of systems, including real-time
embedded controllers in interaction with a physical environment (e.g.,
fly-by-wire, engine control), computationally intensive applications (e.g.,
video), and compilers that produce provably correct and efficient code.
The team bases its research on the foundational work of Gilles Kahn on the semantics of deterministic parallelism, the theory and practice of synchronous languages and typed functional languages, synchronous circuits, modern (polyhedral) compilation, and formal models to prove the correctness of low-level code.
To realize our research program, we develop languages (Lucid Synchrone,
ReactiveML, Lucy-n, Zelus), compilers,
contributions to open-source projects (Sundials/ML), and formalizations in Interactive Theorem
Provers of language semantics (Vélus and n-synchrony).
These software projects constitute essential “laboratories”: they
ground our scientific contributions, guide and validate our research
through experimentation, and are an important vehicle for long-standing
collaborations with industry.
We study the definition of languages for reactive and Cyber-Physical Systems in which distributed control software interacts closely with physical devices. We focus on languages that mix discrete-time and continuous-time; in particular, the combination of synchronous programming constructs with differential equations, relaxed models of synchrony for distributed systems communicating via periodic sampling or through buffers, and the embedding of synchronous features in a general purpose ML language.
The Scade language, founded on synchronous principles, is ideal for programming embedded
software and is used routinely in the most critical applications. But
embedded design also involves modeling the control software together with
its environment made of physical devices that are traditionally defined by
differential equations that evolve on a continuous-time basis and
approximated with a numerical solver. Furthermore, compilation usually
produces single-loop code, but implementations increasingly involve multiple
and multi-core processors communicating via buffers and shared-memory.
The major player in embedded design for cyber-physical systems is
undoubtedly
Simulink,
with Modelica a more recent entrant.
Models created in these tools are used not only for simulation,
but also for test-case generation, formal verification, and
translation to embedded code.
That said, many foundational and practical aspects are not well treated by
existing theory (for instance, hybrid automata) or by current tools.
In particular, features that mix discrete and continuous time often suffer
from inadequacies and bugs.
This results in a broken development chain: for the most critical
applications, the model of the controller must be reprogrammed into either
sequential or synchronous code, and properties verified on the source model
have to be reverified on the target code.
There is also the question of how much confidence can be placed in the code
used for simulation.
We attack these issues through the development of the Zelus research
prototype, industrial collaborations with the SCADE team at
ANSYS/Esterel-Technologies, and collaboration with Modelica developers at
Dassault-Systèmes and the Modelica association.
Our approach is to develop a conservative extension of a synchronous
language capable of expressing in a single source text a model of the
control software and its physical environment, to simulate the whole using
off-the-shelf numerical solvers, and to generate target embedded code.
Our goal is to increase faithfulness and confidence in both what is actually
executed on platforms and what is simulated.
The goal of building a language on a strong mathematical basis for hybrid
systems is shared with the Ptolemy project at UC Berkeley; our approach is
distinguished by building our language on a synchronous semantics, reusing
and extending classical synchronous compilation techniques.
Adding continuous time to a synchronous language gives a richer programming
model where reactive controllers can be specified in idealized physical
time.
An example is the so-called quasi-periodic architecture studied by Caspi,
where independent processors execute periodically and communicate by
sampling.
We have applied Zelus to model a class of quasi-periodic protocols and to
analyze an abstraction proposed for model-checking such systems.
Communication-by-sampling is suitable for control applications where value
timeliness is paramount and lost or duplicate values tolerable, but other
applications—for instance, those involving video streams—seek a
different trade-off through the use of bounded buffers between processes.
We developed the n-synchronous model and the programming language
Lucy-n to treat this issue.
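The idea behind communication through bounded buffers can be sketched on finite activation patterns. The following OCaml fragment is an illustration, not part of Lucy-n: clocks are given as boolean words over one period, and the required buffer size is the maximum number of values written but not yet read.

```ocaml
(* A clock is a list of booleans: true means the process is activated
   at that instant. The producer writes one value per activation; the
   consumer reads one. The buffer size needed is the maximum backlog
   (values written but not yet read) over the whole pattern. A
   negative backlog would mean the consumer reads an absent value. *)
let buffer_size producer consumer =
  let rec go backlog worst p c =
    match p, c with
    | [], [] -> worst
    | pw :: p', cr :: c' ->
        let backlog =
          backlog + (if pw then 1 else 0) - (if cr then 1 else 0)
        in
        go backlog (max worst backlog) p' c'
    | _ -> invalid_arg "clocks must have the same length"
  in
  go 0 0 producer consumer

(* Producer (1 1 0 0) against consumer (0 0 1 1): two values are
   buffered before the first read. *)
let () =
  assert (buffer_size [true; true; false; false]
                      [false; false; true; true] = 2)
```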
We develop compilation techniques for sequential and multi-core processors, and efficient parallel run-time systems for computationally intensive real-time applications (e.g., video and streaming). We study the generation of parallel code from synchronous programs, compilation techniques based on the polyhedral model, and the exploitation of synchronous Static Single Assignment (SSA) representations in general purpose compilers.
We consider distribution and parallelism as two distinct concepts.
We also see a strong relation between the foundations of synchronous languages and the design of compiler intermediate representations for concurrent programs. These representations are essential to the construction of compilers enabling the optimization of parallel programs and the management of massively parallel resources. Polyhedral compilation is one of the most popular research avenues in this area. Indirectly, the design of intermediate representations also triggers exciting research on dedicated runtime systems supporting parallel constructs. We are particularly interested in the implementation of non-blocking dynamic schedulers interacting with decoupled, deterministic communication channels to hide communication latency and optimize local memory usage.
While distribution and parallelism issues arise in all areas of computing, our programming language perspective pushes us to consider four scenarios:
We work on a multitude of research experiments, algorithms, and prototypes related to one or more of these scenarios. Our main efforts have focused on extending the code generation algorithms for synchronous languages and on developing more scalable and widely applicable polyhedral compilation methods.
Compilers are complex software and not immune from bugs. We work on validation and proof tools for compilers to relate the semantics of source programs with the corresponding executable code.
The formal validation of a compiler for a synchronous language, or more generally for a language based on synchronous block diagrams, promises to reduce the likelihood of compiler-introduced bugs, the cost of testing, and also to ensure that properties verified on the source model hold of the target code. Such a validation would be complementary to existing industrial qualifications which certify the development process and not the functional correctness of a compiler. The scientific interest is in developing models and techniques that both facilitate the verification and allow for convenient reasoning over the semantics of a language and the behavior of programs written in it.
Most embedded systems evolve in an open, noisy environment that they only perceive through noisy sensors (e.g., accelerometers, cameras, or GPS).
Another level of uncertainty comes from interactions with other autonomous entities (e.g., surrounding cars, or pedestrians crossing the street).
Yet, to date, existing tools for cyber-physical systems have had limited support for modeling uncertainty, simulating the behavior of such systems, or inferring parameters from noisy observations.
The classic approach consists in hand-coding robust stochastic controllers.
But this solution is limited to well-understood and relatively simple tasks like lane-following assistance.
No such controller can handle, for example, the difficult-to-anticipate behavior of a pedestrian crossing the street.
A modern alternative is to rely on deep-learning techniques.
But neural networks are black-box models that are notoriously difficult to understand and verify.
Training them requires huge amounts of curated data and computing resources, which can be problematic for corner-case scenarios in embedded control systems.
Over the last few years, Probabilistic Programming Languages (PPL) have been introduced to describe probabilistic models and automatically infer distributions of parameters from observed data. Compared to deep-learning approaches, probabilistic models show great promise: they overtly represent uncertainty, and they enable explainable models that can capture both expert knowledge and observed data.
A probabilistic reactive language provides the facilities of a synchronous language to write control software, with probabilistic constructs to model uncertainties and perform inference-in-the-loop. This approach offers two key advantages for the design of embedded systems with uncertainty: 1) Probabilistic models can be used to simulate an uncertain environment for early stage design and incremental development. 2) The embedded controller itself can rely on probabilistic components which implement skills that are out of reach for classic automatic controllers.
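A minimal, deterministic illustration of inference-in-the-loop (not ProbZelus code; the hidden state, the sensor, and the 0.9 accuracy figure are all invented for the example): at each reaction, the controller updates a belief about a hidden boolean state with Bayes' rule rather than trusting the raw sensor.

```ocaml
(* A hidden boolean state (say, "a pedestrian is crossing") observed
   through a noisy sensor that is correct with probability 0.9. Each
   reaction updates the belief with Bayes' rule; the controller then
   reads the belief instead of the raw sensor value. *)
let sensor_accuracy = 0.9

let update belief observation =
  let likelihood_true =
    if observation then sensor_accuracy else 1.0 -. sensor_accuracy in
  let likelihood_false =
    if observation then 1.0 -. sensor_accuracy else sensor_accuracy in
  let p = likelihood_true *. belief in
  let q = likelihood_false *. (1.0 -. belief) in
  p /. (p +. q)

(* From an uninformative prior, three consistent observations drive
   the belief close to certainty. *)
let () =
  assert (List.fold_left update 0.5 [true; true; true] > 0.98)
```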
Embedded control software defines the interactions of specialized hardware with the physical world. It normally ticks away unnoticed inside systems like medical devices, trains, aircraft, satellites, and factories. This software is complex and great effort is required to avoid potentially serious errors, especially over many years of maintenance and reuse.
Engineers have long designed such systems using block diagrams and state
machines to represent the underlying mathematical models.
One of the key insights behind synchronous programming languages is that
these models can be executable and serve as the base for simulation,
validation, and automatic code generation.
This approach is sometimes termed Model-Based Development (MBD).
The SCADE language and associated code generator allow the application of
MBD in safety-critical applications.
They incorporate ideas from Lustre, Lucid Synchrone, and other programming languages.
Modern embedded systems are increasingly conceived as rich amalgams of software, hardware, networking, and physical processes. The terms Cyberphysical System (CPS) or Internet-of-Things (IoT) are sometimes used as labels for this point of view.
In terms of modeling languages, the main challenges are to specify both
discrete and continuous processes in a single hybrid language, give
meaning to their compositions, simulate their interactions, analyze the
behavior of the overall system, and extract code either for target control
software or more efficient, possibly online, simulation.
Languages like Simulink and Modelica are already used in the design and
analysis of embedded systems; it is more important than ever to understand
their underlying principles and to propose new constructs and analyses.
Sundials/ML is a comprehensive OCaml interface to the Sundials suite of numerical solvers (CVODE, CVODES, IDA, IDAS, KINSOL). Its structure mostly follows that of the Sundials library, both for ease of reading the existing documentation and for adapting existing source code, but several changes have been made for programming convenience and to increase safety, namely:
- solver sessions are mostly configured via algebraic data types rather than multiple function calls,
- errors are signalled by exceptions rather than return codes (including from user-supplied callback routines),
- user data is shared between callback routines via closures (partial applications of functions),
- vectors are checked for compatibility (using a combination of static and dynamic checks), and
- explicit free commands are not necessary since OCaml is a garbage-collected language.
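These design choices can be illustrated with a small mock. This is not the actual Sundials/ML API; every name below is invented to sketch the idioms listed above: configuration as an algebraic data type, errors as exceptions, and user data captured by closures rather than a void pointer.

```ocaml
exception Solver_failure of string

(* Configuration as an algebraic data type rather than set_* calls. *)
type tolerance = Default | Scalar of { reltol : float; abstol : float }

(* The right-hand side is an ordinary closure: any user data it needs
   is captured by partial application. Here a fixed-step Euler loop
   stands in for a real solver such as CVODE. *)
let make_solver ~rhs ~tol =
  (match tol with
   | Scalar { reltol; _ } when reltol <= 0.0 ->
       raise (Solver_failure "tolerance must be positive")
   | _ -> ());
  fun y0 ~steps ~h ->
    let y = ref y0 and t = ref 0.0 in
    for _ = 1 to steps do
      y := !y +. h *. rhs !t !y;
      t := !t +. h
    done;
    !y

(* Integrate y' = -y from y(0) = 1 over one time unit: the result is
   close to exp (-1). *)
let () =
  let solve = make_solver ~rhs:(fun _t y -> -. y) ~tol:Default in
  let y1 = solve 1.0 ~steps:1000 ~h:0.001 in
  assert (y1 > 0.36 && y1 < 0.38)
```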
Heptagon is an experimental language for the implementation of embedded real-time reactive systems. It is developed inside the Synchronics large-scale initiative, in collaboration with Inria Rhône-Alpes. It is essentially a subset of Lucid Synchrone, without type inference, type polymorphism, or higher-order features. It is thus a Lustre-like language extended with hierarchical automata in a form very close to SCADE 6. The intention in creating this new language and compiler is to develop new aggressive optimization techniques for sequential C code and compilation methods for generating parallel code for different platforms. This explains many of the simplifications made to ease the development of compilation techniques.
The current version of the compiler includes the following features:
- Discrete controller synthesis within compilation: the language is equipped with a behavioral contract mechanism in which assumptions can be described, together with an "enforce" property part. The semantics of the latter is that the property should be enforced by controlling the behavior of the node equipped with the contract. The property is enforced by an automatically built controller, which acts on free controllable variables given by the programmer. This extension was named BZR in previous works.
- Expression and compilation of array values with modular memory optimization. The language supports arrays and operations on them (access, modification, iterators). Using location annotations, the programmer can avoid unnecessary array copies.
ZRun is an executable semantics of a synchronous data-flow language. It takes the form of a purely functional interpreter and is implemented in OCaml. The input of Zrun is a large subset of the language Zélus, restricted to its discrete-time (synchronous) part. The basic primitives are those of Lustre: a unit non-initialized delay (pre), the initialization operator (->), the initialized delay (fby), and streams defined by mutually recursive definitions. It also provides richer programming constructs that were introduced in Lucid Synchrone and Scade 6 but are absent from Lustre: the by-case definition of streams, the last computed value of a signal, hierarchical automata with parameters, stream functions with static parameters that are known either at compile time or at instantiation time, and two forms of iterators on arrays: "forward", which iterates in time, and "foreach", which iterates in space.
The objective of this prototype is to give a reference executable semantics that is independent of a compiler. It can be used, e.g., as an oracle for compiler testing, to execute unfinished programs or programs that are semantically correct but are statically rejected by the compiler.
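The flavor of the primitives Zrun interprets can be sketched over finite streams represented as OCaml lists. This is a simplification of Zrun's actual implementation, which in particular handles mutually recursive stream definitions by fixpoint iteration.

```ocaml
(* Finite streams as lists. *)
let rec take n = function
  | x :: xs when n > 0 -> x :: take (n - 1) xs
  | _ -> []

(* v fby xs : the initialized unit delay; same length as xs. *)
let fby v xs = take (List.length xs) (v :: xs)

(* pre xs : the non-initialized delay; the first instant is absent,
   represented here by None. *)
let pre xs = take (List.length xs) (None :: List.map Option.some xs)

(* xs -> ys : the first value of xs, then ys. *)
let arrow xs ys =
  match xs, ys with
  | x :: _, _ :: ys' -> x :: ys'
  | _ -> []

let () =
  assert (fby 0 [1; 2; 3] = [0; 1; 2]);
  assert (pre [1; 2] = [None; Some 1]);
  assert (arrow [7; 7; 7] [1; 2; 3] = [7; 2; 3])
```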
Branch Master (2000), v1.x: first-order language, streams, hierarchical automata, by-case definition of streams, operator last.
Branch Works (2023), v2.x: static higher-order, hierarchical automata with parameters, valued signals, arrays, "forward" and "foreach" iterations.
Vélus is a compiler for a subset of
Lustre and Scade that is specified in the Coq 31
Interactive Theorem Prover (ITP). It integrates the CompCert C compiler
27, 25 to define the semantics of
machine operations (integer addition, floating-point multiplication,
etcetera) and to generate assembly code for different architectures.
Work continued this year on this long-running project in two main directions: improving the compilation of shared variables, and developing constructive denotational models to facilitate interactive verification.
Last values are a useful enhancement of state machines.
They allow access to the previous value of a variable relative to the whole state machine, rather than just the current state.
Importantly, they permit implicit completion: for a shared variable x, if no explicit definition is given, then x = last x.
This construct was already added to the Coq-based semantics, but it was compiled away early in the compiler, which often results in unnecessary copying in the generated code.
This year, we modified the compiler so that last variables are carried right through to the back-end passes, which facilitates the elimination of redundant assignments and the associated correctness proofs.
This improvement required changes to all the intermediate languages, semantic definitions, and compilation algorithms.
In particular, we found it necessary to modify the Stc intermediate language to include both `next` and `last` definitions.
In terms of expressivity, only one such form is necessary, but having both facilitates optimizations and their correctness proofs, at the expense of more complicated semantics and scheduling.
To date we have focused on proving the correctness of compilation passes.
This involves specifying semantic models to define the input/output relation
associated with a program, implementing compilation functions to transform
the syntax of a program, and proving that the relation is unchanged by the
functions. In addition to specifying compiler correctness, semantic models
can also serve as a base for verifying individual programs. The challenge is
to present and manipulate such detailed specifications in interactive
proofs. The potential advantage is to be able to reason on abstract models
and to obtain, via the compiler correctness theorem, proofs that apply to
generated code. Making this idea work requires solving several scientific
and technical challenges.
This year we continued developing a Kahn-style semantics in Coq using C. Paulin-Mohring's library 29. The model now treats the dataflow core of Lustre as presented in our EMSOFT 2021 article 26, with the generalization to enumerated types. We show that, under specific conditions, the denotational model satisfies the relational predicates used in the compiler correctness proof. This allows us to strengthen the overall compiler correctness theorem. Rather than state “If a semantics exists for a program, then it is preserved by the generated code”, we show that “Under specific conditions, a semantics exists and it is preserved by the generated code”. The “specific conditions” are, as usual, that the source program satisfies typing and clock typing rules, but also that it is not subject to run-time errors. Run-time errors cannot be ignored in our context of end-to-end proof. The CompCert definitions of several arithmetic and logical operators are partial; for example, integer division by zero is not defined. Such partiality simply propagates to the Vélus relational model, but the denotational model is a total function and operator failures must thus be modeled explicitly. We expressed the absence of run-time errors as a predicate over the dynamic behavior of a program. We implemented a simple static analysis that nevertheless suffices for many practical programs, and showed that it is a sufficient condition for the absence of run-time errors. The next version of the Vélus compiler will print warning messages if the source program uses features not treated in the denotational model or if the simple static analysis cannot guarantee the absence of errors. In this case, it becomes the user's responsibility to show that run-time errors cannot occur. Our denotational model clarifies several points about the clock typing and Kahn semantics of the function reset operator.
We are currently formalizing an alternative model for the function reset operator to further improve our understanding of this topic. We have started drafting an article on these results.
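The way operator partiality is made explicit in a total model can be sketched as follows. This is an illustration with integer division in OCaml, not the Coq development itself.

```ocaml
(* In a total (denotational) model, a partial operator such as
   integer division cannot simply "not be defined": failure is made
   explicit with an option value. *)
let div_opt x y = if y = 0 then None else Some (x / y)

(* Lifting to streams (finite lists here): failures are propagated
   pointwise. *)
let lift2 f xs ys = List.map2 f xs ys

(* The absence-of-run-time-errors predicate asks that None never
   occurs in the program's dynamic behavior. *)
let error_free s = List.for_all Option.is_some s

let () =
  let s = lift2 div_opt [4; 9; 1] [2; 3; 0] in
  assert (s = [Some 2; Some 3; None]);
  assert (not (error_free s))
```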
External collaborators:
Michel Angot,
Vincent Bregeon,
and
Matthieu Boitrel (Airbus).
It is sometimes desirable to compile a single synchronous language program into multiple tasks for execution by a real-time operating system. We have been investigating this question from three different perspectives.
In this approach, the top-level node of a Lustre program is
distinguished from inner nodes. It may contain special annotations
to specify the triggering and other details of node instances from
which separate “tasks” are to be generated. Special operators are
introduced to describe the buffering between top-level
instances. Notably, different forms of the when and
current operators are provided. Some of the operators are
under-specified and a constraint solver is used to determine their
exact meaning, that is, whether the signal is delayed by zero, one,
or more cycles of the receiving clock, which depends on the
scheduling of the source and destination nodes. Scheduling is
formalized as a constraint solving problem based on latency
constraints between some pairs of input/outputs that are specified
by the designer.
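In the simplest setting, two communicating task instances with the same period, the delay chosen by the solver reduces to comparing scheduling positions. The sketch below uses hypothetical names; the real formulation involves multiple rates, buffers, and latency constraints solved together.

```ocaml
(* Whether a communicated value is delayed by zero or one cycle
   depends on the relative scheduling of the source and destination
   tasks within the period: the destination sees the current value
   only if it runs after the source has produced it. *)
let delay ~src_phase ~dst_phase =
  if src_phase < dst_phase then 0 else 1

let () =
  assert (delay ~src_phase:0 ~dst_phase:1 = 0);  (* same-cycle read *)
  assert (delay ~src_phase:1 ~dst_phase:0 = 1)   (* one-cycle delay *)
```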
This year we presented our previous results at ECRTS 2023 16.
We also looked more closely at the possibility of eliminating inter-period
instantaneous cycles by adding constraints to the ILP scheduling problem.
This problem is related to the detection of feedback arc sets for which
there are two well-known encodings. Unfortunately, they can both induce a
very large number of additional variables and constraints in the ILP
encoding. This is not surprising since the base problem is NP-hard. We thus
worked on mitigating heuristics. On the positive side, it turns out that our
existing data-dependency and end-to-end latency constraints are readily
generalized to allow for “variable concomitance” which may sometimes be
useful for breaking instantaneous cycles. In particular, we can require that
the end-to-end latency along a cycle of data dependencies be strictly
greater than zero. The ILP solver is then free to break dependencies by
either scheduling the components in different phases or choosing
concomitance values to prevent cycles during microscheduling. We presented
these preliminary results at the Synchron 2023 workshop.
In the collaboration with Airbus we extended the prototype compiler with a hyper-period
expansion pass to permit an integration with the Lopht compiler.
This work is funded by direct industrial contracts with Airbus.
Zelus is our laboratory for experimenting with our research on programming languages for hybrid systems. It is devoted to the design and implementation of systems that mix discrete-time and continuous-time signals, and of systems built from those signals. It is first a synchronous language reminiscent of Lustre and Lucid Synchrone, with the ability to define functions that manipulate continuous-time signals defined by Ordinary Differential Equations (ODEs) and zero-crossing events. The language is functional in the sense that a system is a function from signals to signals (not a relation). It provides features from ML languages like higher-order functions and parametric polymorphism, as well as dedicated static analyses.
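The discrete/continuous mix can be illustrated with the classic bouncing-ball model written directly in OCaml, with a fixed-step Euler loop standing in for the numerical solver and an explicit sign test standing in for zero-crossing detection. This is a hand-written sketch of the kind of model Zelus expresses, not code generated by its compiler.

```ocaml
(* A ball dropped from 10 m: continuous dynamics (y' = v, v' = -g)
   punctuated by a zero-crossing event on y that triggers a discrete
   reset of the velocity (a partially elastic bounce). *)
let simulate ~h ~steps =
  let g = 9.81 and k = 0.8 in
  let y = ref 10.0 and v = ref 0.0 and bounces = ref 0 in
  for _ = 1 to steps do
    let y' = !y +. h *. !v and v' = !v -. h *. g in
    if y' < 0.0 then begin          (* zero-crossing detected on y *)
      incr bounces;
      v := -. k *. !v               (* discrete reset of the velocity *)
    end else begin
      y := y'; v := v'
    end
  done;
  !bounces

(* Over five simulated seconds the ball bounces a few times. *)
let () =
  let b = simulate ~h:0.001 ~steps:5000 in
  assert (b >= 1 && b <= 5)
```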
The language, its compiler and examples (release 2.1) are on GitHub. It is also available as an OPAM package. All the installation machinery has been greatly simplified.
The implementation of Zelus is now relatively mature. The language has been used in a collection of advanced projects, the most important of recent years being the design and implementation of ProbZelus on top of Zelus. This experiment called for several deep internal changes in the Zelus language.
One of the biggest difficulties we faced when implementing Zélus was the
lack of a tool to automatically test the compiler and to prototype
language extensions before deciding how to incorporate them in the language
and how to compile them. This is what first motivated our work on an
executable semantics. The tool Zrun now works well; it is detailed in the section below. Based on it, we have started a new implementation of Zélus with the objective that every pass of the compiler can be tested, using Zrun as an oracle.
External collaborators:
In 2023, we finished an experiment that started right after the COVID pandemic, during the preparation of a Master's course given at the University of Bamberg in July 2020 (M. Pouzet, as an invited professor). This work was presented at the EMSOFT conference in September 2023 and at the SYNCHRON Workshop in November 2023, in Kiel (Germany). It is published in the ACM TECS journal.
The purpose of this work is the definition of a formal and executable semantics for a reactive language that can be used as an oracle for compiler testing and for the formal verification of compiler steps. We have considered a comprehensive synchronous language with programming constructs that exist in several compilers (developed at PARKAS and elsewhere): its core is a language subset reminiscent of Lustre, including the definition of stream functions and streams defined by mutually recursive definitions, the point-wise application of combinational operations, and the delay operator. It is extended with constructs that do not exist in Lustre, like the by-case definition of streams, hierarchical automata, and the modular reset. These constructs are part of Vélus, Scade, and Zélus (developed at PARKAS), for example, and of LustreC (developed at ENSEEIHT, Toulouse). Two main approaches have been considered in the literature for defining the semantics of a language with such constructs: (i) an indirect collapsing semantics based on a source-to-source translation of high-level constructs into a data-flow core language whose semantics is precisely specified and is the entry point for code generation; (ii) a relational synchronous semantics, either state-based or stream-based, that applies directly to the source. The latter defines what a valid synchronous reaction is but hides, on purpose, whether a semantics exists, is unique, and can be computed. Hence, it is not executable and thus cannot be used for compiler testing.
In this work, we define an executable semantics for a language that has all of the above programming constructs together. It applies directly to the source language, before static checks and compilation steps. It is constructive in the sense that the language in which the semantics is defined is a statically typed functional language with call-by-value and strong normalization, i.e., it is expressible in a proof assistant where all functions terminate. It leads to a reference, purely functional, interpreter. This semantics is modular and can account for possible errors, allowing us to establish what property is ensured by each static verification performed by the compiler. It also clarifies how causality is treated in Scade compared with Esterel.
This semantics can serve as an oracle for compiler testing and validation; to prototype novel language constructs before they are implemented; to execute models that are unfinished, or that are correct but rejected by the compiler; and to prove the correctness of compilation steps.
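The constructive form of the semantics can be sketched as follows: a node denotes an initial state and a step function, and a construct like the modular reset is an ordinary function over such values. This is a deliberate simplification of the actual semantics, which also treats causality and possible errors.

```ocaml
(* A synchronous node as an initial state and a step function: each
   reaction maps a state and an input to an output and a new state. *)
type ('s, 'i, 'o) node = { init : 's; step : 's -> 'i -> 'o * 's }

(* A counter: o = 0 fby (o + i). *)
let counter = { init = 0; step = (fun s i -> (s, s + i)) }

(* Modular reset: when the reset condition is true, the body reacts
   from its initial state. *)
let reset n =
  { init = n.init;
    step = (fun s (r, i) -> n.step (if r then n.init else s) i) }

(* Run a node on a finite list of inputs, threading the state. *)
let run n inputs =
  let _, outs =
    List.fold_left
      (fun (s, acc) i -> let o, s' = n.step s i in (s', o :: acc))
      (n.init, []) inputs
  in
  List.rev outs

let () =
  assert (run counter [1; 1; 1] = [0; 1; 2]);
  assert (run (reset counter)
            [(false, 1); (false, 1); (true, 1); (false, 1)]
          = [0; 1; 0; 1])
```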
In terms of expressiveness, we went a little further with the treatment of
array operators and two forms of iteration: the iteration in time, named
forward, and the iteration in space, named foreach. The former was studied by B. Pauget in his PhD thesis: it consists in iterating a stream function on an array, interpreted as a finite stream, and is reminiscent of "time refinement" (Caspi and Mikac, 2005; Mandel, Pouzet and Pasteur, 2015). We also added static parameters and a limited form of higher-order (functions of functions but no streams of functions). These extensions are part of the source code distribution.
Our long-term objective is to define an executable semantics for the Zélus language, dealing with both discrete and continuous-time constructs. For the moment, only the discrete-time subset of the Zélus language is considered. Treating the whole language would lead to the very first operational semantics for a hybrid systems modeling language.
The semantics is implemented as an interpreter in a purely functional style, in OCaml. The source code of this development is available at the Zrun repository.
External collaborators:
Jean-Louis Colaco (ANSYS, Toulouse).
In his PhD, Baptiste Pauget studied the design, semantics, and compilation of a reactive language extended with array operators. Such operations are used in classical control-systems applications (e.g., Kalman filtering, linear algebra operations) and more recent ones that involve optimization and machine-learning algorithms (e.g., neural networks). Existing languages, e.g., Lustre and Scade, but more widely all existing block-diagram languages used for model-based design, e.g., Simulink, are too limited in terms of expressiveness and modularity. More problematically, the generated code is not as efficient as it should be. The consequence is that designers may have to model their system in one language (e.g., Simulink, Scade) and then re-implement it in C code. One difficulty is that the generated code contains many useless array copies that are difficult to remove. This problem exists in every purely functional language: how to generate code for functional arrays with in-place modifications and compile-time static allocation of memory.
Baptiste Pauget addressed three aspects, with the support of a compiler prototype. (i) He developed a Hindley-Milner type system specifying sizes in the form of multivariate polynomials. This proposal makes it possible to verify and infer most sizes in a modular way. (ii) He explored an alternative compilation method, based on a memory-aware declarative language named MADL. It aims to reconcile the data-flow style with a precise specification of memory locations; the modular size description is a key element of this. In this language, copies must be explicit. Several programming constructs (e.g., concat, append, reverse, transpose) generate no code: they define a particular view of a memory location. MADL comes with an original type system that associates a location with every expression; type checking ensures that programs can be statically scheduled. (iii) Finally, he proposed an iteration construct inspired by Sisal which complements the existing iterators. By treating arrays as finite sequences, it gives access to Scade's sequential constructs (automata) during iterations. In addition, it makes it possible to describe, in a declarative manner, efficient implementations of algorithms such as the Cholesky decomposition. This controllable compilation is a necessary first step toward compiling to GPUs.
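The notion of a copy-free view can be sketched with strides. This is an illustration of the idea only, not MADL's actual representation of memory locations.

```ocaml
(* A "view" reinterprets the same memory through a different indexing
   function: an operation like transpose produces no code and copies
   no data, it only changes extents and strides. *)
type view = { data : float array; rows : int; cols : int;
              row_stride : int; col_stride : int; off : int }

let get v i j = v.data.(v.off + i * v.row_stride + j * v.col_stride)

let of_matrix data rows cols =
  { data; rows; cols; row_stride = cols; col_stride = 1; off = 0 }

(* Transpose swaps extents and strides: no copy, no loop. *)
let transpose v =
  { v with rows = v.cols; cols = v.rows;
           row_stride = v.col_stride; col_stride = v.row_stride }

let () =
  let m = of_matrix [| 1.; 2.; 3.; 4.; 5.; 6. |] 2 3 in
  assert (get m 0 1 = 2.);
  assert (get (transpose m) 1 0 = 2.)
```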
In his PhD thesis, started in 2023, Grégoire Bussone pursues this work on the design, semantics, and implementation of a synchronous language, dealing with aggressive optimization techniques.
External collaborators:
Jean-Louis Colaco (ANSYS, Toulouse).
In this work, we present a compile-time analysis for tracking the size of data-structures in a statically typed and strict functional language. This information is valuable for static checking and code generation. Rather than relying on dependent types, we propose a type-system close to that of ML: polymorphism is used to define functions that are generic in types and sizes; both can be inferred. This approach is convenient, in particular for a language used to program critical embedded systems, where sizes are indeed known at compile-time. By using sizes that are multivariate polynomials, we obtain a good compromise between the expressiveness of the size language and its properties (verification, inference).
We define a minimal functional language that is sufficient to capture size constraints in types, present its dynamic semantics, the type system and inference algorithm. Last, we sketch some practical extensions that matter for a more realistic language.
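A minimal sketch of why multivariate polynomials are a convenient size language (the representation below is hypothetical, chosen for illustration): sizes are polynomials over size variables, so the sizes produced by operations like concatenation or flattening can be computed and compared symbolically, without dependent types.

```python
# Hypothetical sketch: a size is a multivariate polynomial, encoded as a
# Counter mapping a monomial (sorted tuple of size variables) to its
# integer coefficient.

from collections import Counter

def poly(**monomials):
    # poly(n=1) denotes the polynomial "n"; poly(n=2, m=1) denotes "2n + m".
    return Counter({(k,): v for k, v in monomials.items()})

def add(p, q):
    # Size of concat : 'a[n] * 'a[m] -> 'a[n + m]
    return p + q

def mul(p, q):
    # Size of flatten : 'a[n][m] -> 'a[n * m]
    r = Counter()
    for mp, cp in p.items():
        for mq, cq in q.items():
            r[tuple(sorted(mp + mq))] += cp * cq
    return r

n, m = poly(n=1), poly(m=1)
concat_size = add(n, m)                      # n + m
flatten_size = mul(n, m)                     # n * m
assert concat_size == poly(n=1, m=1)
assert flatten_size == Counter({('m', 'n'): 1})
```

Equality of such polynomials is decidable by comparing normalized coefficient maps, which is what makes modular verification and inference of sizes tractable.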
This work was presented at the ARRAY conference (co-located with PLDI) in June 2023, at the international workshop SYNCHRON in December 2023, and at a seminar of the GDR GPL (compilation group). It is published by ACM (ARRAY'23). This work is part of the PhD thesis of B. Pauget, defended in December 2023 28.
Grégoire Bussone started his PhD in April 2023. He studies the use of translation validation techniques applied to a realistic synchronous language compiler. The objective is to deal with the compilation of array operations and, more generally, memory locations. Arrays are not supported in Vélus for the moment. The problem is difficult and occurs in two situations: avoiding copies for functional iterators (e.g., map, fold, transpose, concat, reverse); and optimizing the representation of the state in the final target code (e.g., C) to avoid useless copies for states whose lifetimes never intersect (a classical situation that arises with a Scade-like hierarchical automaton where all states are entered by reset). For this work, we follow a translation validation approach, relying on an untrusted compiler and an independent but trustworthy validation step. We also target a richer and type-safe language back-end (here, Rust) instead of C, to transmit some of the invariants from the source. In the longer term, the purpose is to implement and machine-check the correctness of compilation techniques for a synchronous language with arrays and their efficient compilation.
During 2023, several compilation steps implemented in the Zélus compiler have been implemented as translation validation functions proved correct in Coq, notably inlining, renaming, scheduling, and normalization. Internally, the technique employs the "locally nameless" representation introduced by Charguéraud. The input language is, for the moment, a simple subset of Zélus. The treatment of MADL is under way.
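The translation validation architecture can be sketched as follows (a toy model with hypothetical names, not the Coq development): an untrusted pass transforms the program at each run, and an independent checker validates that particular run, instead of proving the pass correct once and for all.

```python
# Toy translation validation: an untrusted scheduling pass reorders
# dataflow equations (x, deps) so that every variable is defined before
# use; a separate, simpler checker validates each output.

def schedule(eqs):
    # Untrusted pass: topological sort of equations by data dependencies.
    done, out = set(), []
    pending = dict(eqs)
    while pending:
        ready = [x for x, deps in pending.items() if set(deps) <= done]
        if not ready:
            raise ValueError("causality loop")
        for x in ready:
            out.append((x, pending.pop(x)))
            done.add(x)
    return out

def validate(src, tgt):
    # Checker: the target has exactly the source's equations, and every
    # variable is defined before it is read.
    if sorted(src) != sorted(tgt):
        return False
    defined = set()
    for x, deps in tgt:
        if not set(deps) <= defined:
            return False
        defined.add(x)
    return True

src = [("y", ("x",)), ("x", ()), ("z", ("x", "y"))]
tgt = schedule(src)
assert validate(src, tgt)
```

The point of the approach is that only `validate` needs to be proved correct (in Coq, in the actual work); the pass itself can be arbitrarily complex and remain untrusted.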
External collaborators:
Louis Mandel (IBM), Erik Atkinson, Michael Carbin and Ellie Y. Cheng (MIT), Waïss Azizian, Marc Lelarge (Inria), Christine Tasson (ISAE-Supaero).
Synchronous languages were introduced to design and implement real-time embedded systems, with a (justified) emphasis on determinacy. Yet, they interact with a physical environment that is only partially known and are implemented on architectures subject to failures and noise (e.g., channels, variable communication delays, or computation times). Dealing with uncertainties is useful for online monitoring, learning, statistical testing, or to build simplified models for faster simulation. Existing synchronous languages provide limited support for modeling the non-deterministic behaviors that are omnipresent in embedded systems. ProbZelus is a probabilistic extension of the synchronous language Zelus for the design of reactive probabilistic models in interaction with an environment.
This year we continued this project along three main directions: 1) new semantic models, 2) static analysis for semi-symbolic inference, and 3) embedding ProbZelus ideas in Julia.
In ProbZelus, the semantics of probabilistic models is only defined for scheduled equations. This is a significant limitation compared to synchronous dataflow languages, where sets of mutually recursive equations are not ordered. This is a key requirement for commercial synchronous dataflow languages, where programs are written using a block-diagram graphical interface: scheduling should not depend on the placement of the blocks, which motivates their definition as mutually recursive equations. Besides, the compiler implements a series of source-to-source transformations which often introduce new variables in arbitrary order; scheduling local declarations is one of the very last compilation passes. The original semantics of ProbZelus is thus far from what is exposed to the programmer and prevents reasoning about most program transformations and compilation passes.
Building on existing semantics for deterministic synchronous languages, we proposed two schedule-agnostic semantics for ProbZelus. The key idea is to interpret probabilistic expressions as streams of un-normalized density functions which map random-variable values to a result and a positive score. The co-iterative semantics extends the original semantics to interpret mutually recursive equations using a fixpoint operator. The relational semantics directly manipulates streams and is thus a better fit to reason about program equivalence. We use the relational semantics to prove the correctness of a program transformation required to run the Assumed Parameter Filter (APF), an optimized inference algorithm for state-space models with constant parameters.
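The notion of a score-weighted interpretation can be conveyed with a minimal, self-contained sketch (a toy particle interpretation, not the ProbZelus semantics): at each step, a probabilistic expression produces a value together with a positive score, and `observe` multiplies the score by an un-normalized density.

```python
# Toy weighted-sample interpretation: each particle carries (value, score);
# `sample` draws a value, `observe` scales the score by a density.

import math, random

def gauss_pdf(mean, std, y):
    return math.exp(-((y - mean) ** 2) / (2 * std * std)) / (std * math.sqrt(2 * math.pi))

def step(particle, obs):
    # One synchronous step of a toy model: x_t = x_{t-1} + noise; observe obs.
    x, w = particle
    x = x + random.gauss(0.0, 1.0)            # sample: score unchanged
    w = w * gauss_pdf(x, 0.5, obs)            # observe: score scaled by density
    return (x, w)

random.seed(0)
particles = [(0.0, 1.0) for _ in range(1000)]
for obs in [0.1, 0.2, 0.3]:
    particles = [step(p, obs) for p in particles]

# Normalizing the scores yields a posterior estimate of x.
total = sum(w for _, w in particles)
est = sum(x * w for x, w in particles) / total
assert abs(est) < 2.0   # loose sanity check on the toy estimate
```

The schedule-agnostic semantics described above generalizes this picture from ordered steps to mutually recursive equations, via a fixpoint operator (co-iterative version) or directly on streams (relational version).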
A preliminary version of this work is available online 24. The work on the APF-based inference engine (static analysis, compilation, runtime) was presented by G. Bussone at the Journées Francophones des Langages Applicatifs (JFLA) 2023 18.
(with Erik Atkinson, Michael Carbin, L. Mandel).
Advanced probabilistic inference algorithms combine exact and approximate inference to improve performance in probabilistic programs, often using various heuristics and optimizations. The inference engine tries to compute exact solutions as much as possible and falls back to approximate sampling when symbolic computations fail. The dynamic nature of these systems comes at a cost: 1) the heuristics are not guaranteed to be globally optimal, and 2) the inference behavior is unpredictable.
We propose a new probabilistic language with a semi-symbolic inference engine based on our previous work on delayed sampling and semi-symbolic inference. In this language, the user can annotate the program with constraints on the representation of random variables (e.g., sampled or symbolic). A specialized static analysis then checks at compile time whether these constraints are satisfiable.
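The sampled/symbolic distinction can be illustrated with a small sketch (hypothetical class and method names, not the language itself): a Gaussian random variable stays symbolic as long as operations on it have closed forms (e.g., affine maps), and is forced to a concrete sample only when a non-conjugate operation needs its value.

```python
# Hedged sketch of semi-symbolic inference: a random variable is a
# symbolic Gaussian (mean, var) until a non-closed-form use forces it
# to a sampled value.

import random

class RV:
    def __init__(self, mean, var):
        self.mean, self.var = mean, var
        self.value = None                     # None => still symbolic

    def affine(self, a, b):
        # a*X + b of a Gaussian is Gaussian: stays symbolic.
        return RV(a * self.mean + b, a * a * self.var)

    def force(self):
        # Fallback: draw a concrete sample when symbolic computation fails.
        if self.value is None:
            self.value = random.gauss(self.mean, self.var ** 0.5)
        return self.value

random.seed(1)
x = RV(0.0, 1.0)
y = x.affine(2.0, 1.0)                        # still symbolic: N(1, 4)
assert (y.mean, y.var) == (1.0, 4.0) and y.value is None
z = y.force() ** 2                            # squaring: no closed form here
assert y.value is not None
```

An annotation such as "y must remain symbolic" would then be a static constraint that the analysis described above checks at compile time, rejecting programs where a `force` is reachable on an annotated variable.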
A short version of this work was presented at the VeriProP workshop at the International Conference on Computer Aided Verification (CAV) 2023 23.
(with Marc Lelarge and Waïss Azizian).
We continued our work on OnlineSampling.jl, an embedded reactive probabilistic language in Julia. Inspired by ProbZelus, we designed a domain-specific language for describing reactive probabilistic models using Julia macros. Following ProbZelus ideas, the inference method is a Rao-Blackwellised particle filter, a semi-symbolic algorithm which tries to compute closed-form solutions analytically and falls back to a particle filter when symbolic computations fail. For Gaussian random variables with linear relations, we use belief propagation instead of delayed sampling when the factor graph is a tree. We can thus compute exact solutions for a broader class of models.
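The kind of exact computation available in the linear-Gaussian case can be shown with a one-function sketch (illustrative, not the OnlineSampling.jl implementation): conditioning a Gaussian prior on a noisy linear observation has a closed-form (Kalman-style) posterior, which a plain particle filter could only approximate.

```python
# Exact conjugate update for x ~ N(prior_mean, prior_var) after observing
# y = x + noise, with noise ~ N(0, obs_var).

def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    k = prior_var / (prior_var + obs_var)     # Kalman gain
    mean = prior_mean + k * (obs - prior_mean)
    var = (1 - k) * prior_var
    return mean, var

mean, var = gaussian_posterior(0.0, 1.0, 2.0, 1.0)
assert (mean, var) == (1.0, 0.5)
```

Belief propagation on a tree-shaped factor graph chains such closed-form updates along the edges, which is why it yields exact posteriors for a broader class of linear-Gaussian models than delayed sampling.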
This work was accepted at the SPIGM workshop at the International Conference on Machine Learning (ICML) 20.
Our work on multi-clock Lustre programs is funded by contracts with Airbus.
The ANR JCJC project “FidelR” led by T. Bourke began in 2020 and ended in December 2023.