The last decade has witnessed a remarkable convergence between several sub-domains of the calculus of variations, namely optimal transport (and its many generalizations), the infinite-dimensional geometry of diffeomorphism groups, and inverse problems in imaging (in particular sparsity-based regularization). This convergence is due to (i) the mathematical objects manipulated in these problems, namely sparse measures (e.g. couplings in transport, edge locations in imaging, displacement fields for diffeomorphisms), and (ii) the use of similar numerical tools from non-smooth optimization and geometric discretization schemes. Optimal Transportation, diffeomorphisms and sparsity-based methods are powerful modeling tools that impact a rapidly expanding list of scientific applications and call for efficient numerical strategies. Our research program shows the important part played by the team members in the development of these numerical methods and their application to challenging problems.
Optimal Mass Transportation is a mathematical research topic which started two centuries ago with Monge's work on the “Théorie des déblais et des remblais” (see 104).
This engineering problem consists in minimizing the transport cost between two given mass densities. In the 1940s, Kantorovich 113 introduced a powerful linear relaxation together with its dual formulation. The Monge-Kantorovich problem became a specialized research topic in optimization, and Kantorovich obtained the 1975 Nobel prize in economics for his contributions to resource allocation problems. Since the seminal discoveries of Brenier in the 1990s 67, Optimal Transportation has received renewed attention from mathematical analysts, and the Fields Medal awarded in 2010 to C. Villani, who made important contributions to Optimal Transportation and wrote the modern reference monographs 137, 138, arrived at a culminating moment for this theory. Optimal Mass Transportation is today a mature area of mathematical analysis with a constantly growing range of applications. Optimal Transportation has also received a lot of attention from probabilists (see for instance the recent survey 117 for an overview of the Schrödinger problem, a stochastic variant of the Benamou-Brenier dynamical formulation of optimal transport). The development of numerical methods for Optimal Transportation and related problems is a difficult topic and comparatively underdeveloped. This research field has experienced a surge of activity in the last five years, with important contributions of the Mokaplan group (see the list of important publications of the team). We describe below a few recent and less recent Optimal Transportation concepts and methods which are connected to the future activities of Mokaplan:
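On discrete measures the Kantorovich relaxation is a finite linear program. As a toy illustration (our own minimal sketch, unrelated to the team's solvers), one can exploit Birkhoff's theorem: for uniform marginals of equal size the linear program is minimized at a permutation matrix, so for tiny instances the optimum can simply be enumerated.

```python
import itertools
import numpy as np

def kantorovich_uniform(x, y):
    """Solve the discrete Monge-Kantorovich problem between two uniform
    point clouds x, y (same size) with squared-distance ground cost.
    By Birkhoff's theorem the linear program attains its minimum at a
    permutation matrix, so for tiny n we simply enumerate."""
    n = len(x)
    cost = np.array([[(xi - yj) ** 2 for yj in y] for xi in x])
    best_perm, best_val = None, np.inf
    for perm in itertools.permutations(range(n)):
        val = sum(cost[i, perm[i]] for i in range(n)) / n
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val

# In 1D the optimal assignment is the monotone (sorted) matching.
perm, val = kantorovich_uniform([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
print(perm, val)   # (0, 1, 2), cost 0.25
```

Enumeration is of course exponential in n; practical solvers rely on linear programming, the auction algorithm, or the entropic regularization discussed later.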
Brenier's theorem 70 characterizes the unique optimal map as the gradient of a convex potential. Optimal Transportation may therefore be interpreted as an infinite-dimensional optimisation problem under a “convexity constraint”: the solution of this infinite-dimensional optimisation problem is a convex potential. This connects Optimal Transportation to “convexity constrained” non-linear variational problems such as, for instance, Newton's problem of the body of minimal resistance.
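In one dimension, Brenier's theorem reduces to an elementary statement: the gradient (derivative) of a convex potential is a non-decreasing map, so the optimal map between empirical measures is the monotone rearrangement of the samples. A minimal illustrative sketch (our own, not from the cited works):

```python
import numpy as np

def brenier_map_1d(x, y):
    """Monotone matching sending the i-th smallest x to the i-th
    smallest y: the 1D analogue of the gradient of a convex potential,
    hence the optimal map for the squared-distance cost."""
    order_x = np.argsort(x)
    sorted_y = np.sort(y)
    T = np.empty(len(x))
    T[order_x] = sorted_y      # i-th smallest x goes to i-th smallest y
    return T

x = np.array([2.0, 0.0, 1.0])
y = np.array([5.0, 3.0, 4.0])
T = brenier_map_1d(x, y)
print(T)                       # [5. 3. 4.]: monotone in x, hence optimal
```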
The value function of the optimal transport problem is also known to define a distance between the source and target densities, called the Wasserstein distance, which plays a key role in many applications such as image processing.
A formal substitution of the optimal transport map as the gradient of a convex potential in the mass conservation constraint (a Jacobian equation) gives a non-linear Monge-Ampère equation. Caffarelli 74 used this result to extend the regularity theory for the Monge-Ampère equation. In the last ten years, it also motivated new research on numerical solvers for non-linear degenerate elliptic equations (see 96, 121, 60, 59 and the references therein). Geometric approaches based on Laguerre diagrams and discrete data 124 have also been developed. Monge-Ampère based Optimal Transportation solvers have recently given the first linear-cost computations of (smooth) Optimal Transportation maps.
In recent years, the classical Optimal Transportation problem has been extended in several directions. First, different ground costs measuring the “physical” displacement have been considered. In particular, well-posedness for a large class of convex and concave costs has been established by Gangbo and McCann 103. Optimal Transportation techniques have been applied, for example, to a Coulomb ground cost in quantum chemistry in relation with Density Functional Theory 90: given the density of electrons, Optimal Transportation models their relative positions and the resulting potential energy. For more than 2 electrons (and therefore more than 2 densities) the natural extension of Optimal Transportation is the so-called Multi-marginal Optimal Transport (see 129 and the references therein). Another instance of multi-marginal Optimal Transportation arises in the so-called Wasserstein barycenter problem between an arbitrary number of densities 44. An interesting overview of this emerging field of optimal transport and its applications can be found in the recent survey of Ghoussoub and Pass 128.
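In one dimension the Wasserstein barycenter admits a closed form through quantile functions: the barycenter's quantile function is the weighted average of the input quantile functions. For empirical measures with equally many points this amounts to averaging sorted samples, which gives a quick way to illustrate the notion (a sketch of ours, not a general solver):

```python
import numpy as np

def barycenter_1d(samples, weights):
    """W2 barycenter of 1D empirical measures with equal numbers of
    points: average the sorted samples (i.e. the quantile functions)."""
    sorted_samples = [np.sort(np.asarray(s, dtype=float)) for s in samples]
    return sum(w * s for w, s in zip(weights, sorted_samples))

mu = np.array([0.0, 1.0, 2.0])
nu = np.array([4.0, 2.0, 6.0])
bar = barycenter_1d([mu, nu], [0.5, 0.5])
print(bar)   # [1.  2.5 4. ]: averages of the sorted supports
```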
Optimal transport has found many applications, starting from its relation with several physical models such as the semi-geostrophic equations in meteorology 108, 93, 92, 55, 120, mesh adaptation 119, the reconstruction of the early mass distribution of the Universe 101, 68 in astrophysics, and the numerical optimisation of reflectors following the Optimal Transportation interpretation of Oliker 73 and Wang 139. Extensions of OT such as multi-marginal transport have potential applications in Density Functional Theory (DFT), in generalized solutions of the Euler equations 69, and in statistics and finance 53, 102. Recently, there has been a spread of interest in applications of OT methods in imaging sciences 63, statistics 61 and machine learning 94. This is largely due to the emergence of fast numerical schemes to approximate the transportation distance and its generalizations, see for instance 57. Figure 1 shows an example of application of OT to color transfer. Figure 2 shows an example of application in computer graphics to interpolate between input shapes.
While the optimal transport problem, in its original formulation, is a static problem (no time evolution is considered), it makes sense in many applications to consider time evolution instead. This is relevant for instance in applications to fluid dynamics, or in medical imaging to perform registration of organs and model tumor growth.
In this perspective, optimal transport in Euclidean space corresponds to an evolution where each particle of mass moves in a straight line. This interpretation corresponds to the Computational Fluid Dynamics (CFD) formulation proposed by Benamou and Brenier in 54. These solutions are time curves in the space of densities and geodesics for the Wasserstein distance. The CFD formulation relaxes the non-linear mass conservation constraint into a time-dependent continuity equation; a remarkable feature of this dynamical formulation is that it can be re-cast as a convex but highly non-smooth optimization problem. This convex dynamical formulation finds many non-trivial extensions and applications, see for instance 56. The CFD formulation also appears as a limit case of Mean Field Games (MFGs), a large class of economic models introduced by Lasry and Lions 115 leading to a system coupling a Hamilton-Jacobi equation with a Fokker-Planck equation. In contrast, the Monge case where the ground cost is the Euclidean distance leads to a static system of PDEs 64.
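The “particles move in straight lines” picture can be made concrete in one dimension, where the Wasserstein geodesic between empirical measures is obtained by linearly interpolating the sorted samples. A minimal sketch of this displacement interpolation (an illustration of ours, not the CFD solver of 54):

```python
import numpy as np

def displacement_interp(x, y, t):
    """McCann's displacement interpolation in 1D: each particle of the
    sorted matching travels in a straight line from x to y; the measure
    at time t is supported on (1-t)*x + t*y."""
    return (1 - t) * np.sort(x) + t * np.sort(y)

x = np.array([0.0, 1.0])
y = np.array([4.0, 9.0])
mid = displacement_interp(x, y, 0.5)
print(mid)   # [2. 5.]: the midpoint of each particle's trajectory
```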
Another extension is, instead of considering geodesics for the transportation metric (i.e. minimizing the Wasserstein distance to a target measure), to make the density evolve so as to minimize some functional. Computing the steepest descent direction with respect to the Wasserstein distance defines a so-called Wasserstein gradient flow, also known as a JKO gradient flow after the initials of its authors 112. This is a popular tool to study a large class of non-linear diffusion equations. Two interesting examples are the Keller-Segel system for chemotaxis 111, 82 and a model of congested crowd motion proposed by Maury, Santambrogio and Roudneff-Chupin 123. From the numerical point of view, these schemes are understood to be the natural analogue of the implicit Euler scheme for linear parabolic equations. The resolution is however costly, as it involves taking the derivative in the Wasserstein sense of the relevant energy, which in turn requires the resolution of a large-scale convex but non-smooth minimization problem.
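To fix ideas on this analogy: one implicit Euler step for the linear heat equation requires solving a linear system, just as one JKO step requires solving a minimization problem. A minimal sketch on a periodic 1D grid (illustrative only, unrelated to the team's solvers):

```python
import numpy as np

def implicit_heat_step(u, tau, dx):
    """One backward Euler step for u_t = u_xx on a periodic grid:
    solve (I - tau*Laplacian) u_next = u. The scheme is unconditionally
    stable, the role played by the JKO scheme in the Wasserstein setting."""
    n = len(u)
    L = np.zeros((n, n))                 # periodic finite-difference Laplacian
    for i in range(n):
        L[i, i] = -2.0
        L[i, (i - 1) % n] = 1.0
        L[i, (i + 1) % n] = 1.0
    A = np.eye(n) - (tau / dx**2) * L
    return np.linalg.solve(A, u)

u0 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
u1 = implicit_heat_step(u0, tau=0.1, dx=1.0)
print(u1, u1.sum())                      # mass is conserved, peak spreads out
```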
To tackle more complicated warping problems, such as those encountered in medical image analysis, one unfortunately has to drop the convexity of the functional defining the gradient flow. This gradient flow can either be understood as defining a geodesic on the (infinite-dimensional) group of diffeomorphisms 52, or on an (infinite-dimensional) space of curves or surfaces 140. The de facto standard to define, analyze and compute these geodesics is the “Large Deformation Diffeomorphic Metric Mapping” (LDDMM) framework of Trouvé, Younes, Holm and co-authors 52, 107. While in the CFD formulation of optimal transport, the metric on infinitesimal deformations is just the
Besides image warping and registration in medical image analysis, a key problem in nearly all imaging applications is the reconstruction of high-quality data from low-resolution observations. This field, commonly referred to as “inverse problems”, is very often concerned with the precise location of features such as point sources (modeled as Dirac masses) or sharp contours of objects (modeled as gradients being Dirac masses along curves). The underlying intuition behind these ideas is the so-called sparsity model (either of the data itself, its gradient, or other more complicated representations such as wavelets, curvelets, bandlets 122 and learned representations 141).
The huge interest in these ideas started mostly from the introduction of convex methods to serve as proxy for these sparse regularizations. The most well known is the
However, the theoretical analysis of sparse reconstructions involving real-life acquisition operators (such as those found in seismic imaging, neuro-imaging, astro-physical imaging, etc.) is still mostly an open problem. A recent research direction, triggered by a paper of Candès and Fernandez-Granda 77, is to study directly the infinite dimensional problem of reconstruction of sparse measures (i.e. sum of Dirac masses) using the total variation of measures (not to be mistaken for the total variation of 2-D functions). Several works 76, 98, 97 have used this framework to provide theoretical performance guarantees by basically studying how the distance between neighboring spikes impacts noise stability.
In image processing, one of the most popular methods is total variation regularization 132, 71. It favors low-complexity images that are piecewise constant, see Figure 4 for some examples. Besides applications in image processing, sparsity-related ideas have also had a deep impact in statistics 134 and machine learning 48. As a typical example, for applications to recommendation systems, it makes sense to consider sparsity of the singular values of matrices, which can be relaxed using the so-called nuclear norm (a.k.a. trace norm) 47. The underlying methodology is to use low-complexity regularization models, which turns out to be equivalent to using partly-smooth regularization functionals 118, 136 that enforce the solution to belong to a low-dimensional manifold.
The dynamical formulation of optimal transport creates a link between optimal transport and geodesics on diffeomorphism groups. This formal link has at least two strong implications that Mokaplan will elaborate on: (i) the development of novel models that bridge the gap between these two fields; (ii) the introduction of novel fast numerical solvers based on ideas from both non-smooth optimization techniques and Bregman metrics.
In a similar line of ideas, we believe a unified approach is needed to tackle both sparse regularization in imaging and various generalized OT problems. Both require solving related non-smooth, large-scale optimization problems. Ideas from proximal optimization have proved crucial to address problems in both fields (see for instance 54, 131). Transportation metrics are also the correct way to compare and regularize variational problems that arise in image processing (see for instance the Radon inversion method proposed in 57) and machine learning (see 94).
Since its creation, the Mokaplan team has made important contributions in Optimal Transport both on the theoretical and the numerical side, together with applications to fluid mechanics, the simulation of biological systems, and machine learning. We have also contributed to the field of inverse problems in signal and image processing (super-resolution, nonconvex low-rank matrix recovery). In 2022, the team was renewed with the following research program, which broadens our spectrum and addresses exciting new problems.
Freeform Optics, Fluid Mechanics (Incompressible Euler, Semi-Geostrophic equations), Quantum Chemistry (Density Functional Theory), Statistical Physics (Schrödinger problem), Porous Media.
Full Waveform Inversion (Geophysics), Super-resolution microscopy (Biology), Satellite imaging (Meteorology)
Mean-field games, spatial economics, principal-agent models, taxation, nonlinear pricing.
The Semigeostrophic equations are a frontogenesis model in atmospheric science. Existence of solutions, both from the theoretical and the numerical point of view, relies on a change of variables in which the pressure gradient is interpreted as an Optimal Transport map between the density of the fluid and its push-forward. Thanks to recent advances in numerical Optimal Transportation, the computation of large-scale discrete approximations can be envisioned. We study in 25 the use of Entropic Optimal Transport and its companion Sinkhorn algorithm.
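The Sinkhorn algorithm mentioned above alternately rescales the rows and columns of the Gibbs kernel exp(-C/eps) until both marginals are matched. A minimal sketch on a toy discrete problem (illustrative only, not the large-scale implementation studied in 25):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iter=500):
    """Entropic OT: find scalings u, v so that the coupling
    diag(u) K diag(v), with K = exp(-C/eps), has marginals mu and nu."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)          # match column marginals
        u = mu / (K @ v)            # match row marginals
    return u[:, None] * K * v[None, :]

x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2  # squared-distance ground cost
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
P = sinkhorn(mu, nu, C, eps=1.0)
print(P.sum(axis=1), P.sum(axis=0))  # marginals converge to mu and nu
```

Smaller values of eps give couplings closer to the unregularized optimum but slow down convergence; log-domain stabilization is then needed in practice.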
Motivated by the Derrida-Lebowitz-Speer-Spohn (DLSS) quantum drift equation, which is the Wasserstein gradient flow of the Fisher information, we study in detail in 27 the solutions of the corresponding implicit Euler scheme. We also take advantage of the convex (but non-smooth) nature of the corresponding variational problem to propose a numerical method based on the Chambolle-Pock primal-dual algorithm.
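The Chambolle-Pock primal-dual algorithm alternates a dual ascent step, a primal descent step and an over-relaxation. Here is a minimal sketch of ours on a toy 1D total-variation denoising problem (illustrative only, not the scheme of 27):

```python
import numpy as np

def chambolle_pock_tv(f, lam, n_iter=1000):
    """Chambolle-Pock iterations for the 1D ROF model
        min_x ||Dx||_1 + (lam/2)*||x - f||^2,
    with D the forward-difference operator."""
    n = len(f)
    D = -np.eye(n - 1, n) + np.eye(n - 1, n, k=1)   # forward differences
    sigma = tau = 0.4                               # sigma*tau*||D||^2 < 1
    x, x_bar, y = f.copy(), f.copy(), np.zeros(n - 1)
    for _ in range(n_iter):
        # dual step: prox of the conjugate of ||.||_1 is a projection
        y = np.clip(y + sigma * D @ x_bar, -1.0, 1.0)
        x_old = x
        # primal step: prox of the quadratic data term, in closed form
        x = (x_old - tau * D.T @ y + tau * lam * f) / (1 + tau * lam)
        x_bar = 2 * x - x_old                       # over-relaxation
    return x

f = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])       # noisy two-level signal
x_den = chambolle_pock_tv(f, lam=10.0)
print(x_den.round(2))                               # nearly piecewise constant
```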
In 20, a new derivation of the Euler-Lagrange equation of a total-variation regularization problem with a Wasserstein penalization is obtained; it is interesting as one easily deduces some regularity of the Lagrange multiplier for the non-negativity constraint. A numerical implementation is also described.
We propose in 32 a variational approach to approximate measures by measures uniformly distributed over a 1-dimensional set. The problem consists in minimizing a Wasserstein distance as a data term with a regularization given by the length of the support. As it is challenging to prove existence of solutions to this problem, we propose a relaxed formulation, which always admits a solution. In the sequel we show that if the ambient space is
This work 23, 30 is concerned with the recovery of piecewise constant images from noisy linear measurements. We study the noise robustness of a variational reconstruction method, which is based on total (gradient) variation regularization. We show that, if the unknown image is the superposition of a few simple shapes, and if a non-degenerate source condition holds, then, in the low noise regime, the reconstructed images have the same structure: they are the superposition of the same number of shapes, each a smooth deformation of one of the unknown shapes. Moreover, the reconstructed shapes and the associated intensities converge to the unknown ones as the noise goes to zero.
A classical tool for approximating integrals is the Laplace method. The first-order, as well as the higher-order, Laplace formula is most often written in coordinates without any geometric interpretation. In 9, motivated by a situation arising, among others, in optimal transport, we give a geometric formulation of the first-order term of the Laplace method. The central tool is the Kim–McCann Riemannian metric, which was introduced in the field of optimal transportation. Our main result expresses the first-order term with standard geometric objects such as volume forms, Laplacians, covariant derivatives and scalar curvatures of two different metrics arising naturally in the Kim–McCann framework. In passing, we give an explicitly quantified version of the Laplace formula, as well as examples of applications.
We investigate in 15 the convergence rate of the optimal entropic cost
A key inequality which underpins the regularity theory of optimal transport for costs satisfying the Ma–Trudinger–Wang condition is the Pogorelov second derivative bound. This translates into an a priori interior C1 estimate for smooth optimal maps. Here we give a new derivation of this estimate, which relies in part on Kim, McCann and Warren's observation that the graph of an optimal map becomes a volume-maximizing spacelike submanifold when the product of the source and target domains is endowed with a suitable pseudo-Riemannian geometry that combines both the marginal densities and the cost.
We present a new class of gradient-type optimization methods that extends vanilla gradient descent, mirror descent, Riemannian gradient descent, and natural gradient descent. Our approach involves constructing a surrogate for the objective function in a systematic manner, based on a chosen cost function. This surrogate is then minimized using an alternating minimization scheme. Using optimal transport theory we establish convergence rates based on generalized notions of smoothness and convexity. We provide local versions of these two notions when the cost satisfies a condition known as nonnegative cross-curvature. In particular our framework provides the first global rates for natural gradient descent and the standard Newton's method.
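A concrete member of this family of methods is mirror descent with the entropic mirror map (exponentiated gradient) on the probability simplex, which corresponds to a Kullback-Leibler cost. The sketch below is a toy illustration of ours, not the paper's general framework:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, step, n_iter=200):
    """Mirror descent with the entropic mirror map: multiplicative
    update followed by renormalization (Bregman projection onto the
    probability simplex)."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x * np.exp(-step * grad(x))
        x = x / x.sum()
    return x

# Minimize the linear function <c, x> over the simplex: the minimum is
# attained at the vertex carrying the smallest coefficient of c.
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent_simplex(lambda x: c, np.full(3, 1.0 / 3.0), step=0.5)
print(x.round(3))   # mass concentrates on coordinate 1
```

Unlike projected gradient descent, the iterates stay strictly positive by construction, which is why the entropic geometry is natural on the simplex.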
The Burer-Monteiro factorization is a classical reformulation of optimization problems where the unknown is a matrix, when this matrix is known to have low rank. Rigorous analysis has been provided for this reformulation, when solved with first-order algorithms, but second-order algorithms oftentimes perform better in practice. We have established convergence rates for a representative second-order algorithm in a simplified setting. An article is in preparation.
In 31, a stochastic primal-dual hybrid gradient method for large-scale inverse problems (with applications mostly to medical imaging), proposed some years ago by A. Chambolle and collaborators, is analysed. The new result describes how the parameters can be modified/updated at each iteration in a way which still ensures (almost sure) convergence, and proposes some heuristic rules which fit into the framework and effectively improve the rate of convergence in practical experiments. The proceeding 21, in collaboration with U. Bordeaux, shows some convergence guarantees for a particular implementation of a “plug-and-play” image restoration method, where the regularizer for inverse problems is based on a denoising neural network. A more developed journal version has been submitted 39.
The proceeding 22, in collaboration with the computer imaging group at TU Graz (Austria), implements as a toy model a stochastic diffusion equation for sampling image priors based on Gaussian Mixture models, with exact formulas.
In a different direction, the proceeding 19, also with TU Graz, considers the issue of parameter learning for a better discretization of variational regularizers allowing for singularities (the “total generalized variation” of Bredies, Kunisch and Pock). A theoretical analysis of this model and of more standard total-variation regularization models is found in the new preprint 37, which introduces a novel approach, much simpler than the previous ones, for studying the stability of the discontinuity sets in elementary denoising models.
The publications 14, 17, 16, 33 are related to “free discontinuity problems” in materials science, with applications to fracture growth or shape optimisation. In 17, 16 we discuss compactness for functionals which appear in the variational approach to fracture; in particular, 16 gives a new, very general, and in some sense more natural proof of compactness with respect to the previous results. The preprint 33 was submitted to the proceedings of the ICIAM conference; it contains in an appendix a relatively simple presentation (in a simpler case) of the proof of Poincaré / Poincaré-Korn inequalities with small jump set developed in the past 10 years by A. Chambolle.
An application to shape optimization (of an object dragged in a Stokes flow) is presented in 14, while 34, 35 address other types of shape optimization problems in a more classical framework.
The new article 18 of Chambolle, DeGennaro and Morini generalizes to non-homogeneous flows an implicit approach for mean curvature flow of surfaces introduced in the 1990s by Luckhaus and Sturzenhecker, and by Almgren, Taylor and Wang. Developments in the fully discrete case are under study, with striking results which should appear in 2024; in the meantime, a short description of the possible anisotropies (or surface tensions) which arise on discrete lattices was published in 13.
A different dynamics of interfaces, based on
In 41 we consider the problem of optimal approximation of a target measure by an atomic measure with
We introduce a time discretization for Wasserstein gradient flows based on the classical Backward Differentiation Formula of order two. The main building block of the scheme is the notion of geodesic extrapolation in the Wasserstein space, which in general is not uniquely defined. We propose several possible definitions for such an operation, and we prove convergence of the resulting scheme to the limit PDE, in the case of the Fokker-Planck equation. For a specific choice of extrapolation we also prove a more general result, that is convergence towards EVI flows. Finally, we propose a variational finite volume discretization of the scheme which numerically achieves second order accuracy in both space and time.
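For readers unfamiliar with the classical formula that this scheme transplants to the Wasserstein setting, here is a minimal sketch of BDF2 on a scalar gradient flow x' = -a x (a toy of ours, not the paper's Wasserstein discretization):

```python
import math

def bdf2_decay(x0, a, tau, n_steps):
    """Second-order Backward Differentiation Formula for x' = -a*x:
       (3*x_{n+1} - 4*x_n + x_{n-1}) / (2*tau) = -a*x_{n+1},
    initialized with one backward Euler step (BDF2 needs two past values,
    the role played by geodesic extrapolation in the Wasserstein scheme)."""
    x_prev = x0
    x = x0 / (1 + a * tau)                       # backward Euler start-up
    for _ in range(n_steps - 1):
        x, x_prev = (4 * x - x_prev) / (3 + 2 * a * tau), x
    return x

# Second-order accuracy: the error against exp(-1) shrinks like tau^2.
approx = bdf2_decay(1.0, a=1.0, tau=0.01, n_steps=100)
print(approx, math.exp(-1.0))
```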
Using the dual formulation only, we show that regularity of unbalanced optimal transport, also called entropy-transport, is inherited from the regularity of standard optimal transport. We then provide detailed examples of Riemannian manifolds and costs for which unbalanced optimal transport is regular. Among all entropy-transport formulations, the Wasserstein-Fisher-Rao metric, also called Hellinger-Kantorovich, stands out since it admits a dynamic formulation, which extends the Benamou-Brenier formulation of optimal transport. After demonstrating the equivalence between the dynamic and static formulations on a closed Riemannian manifold, we prove a polar factorization theorem, similar to the one due to Brenier and McCann. As a byproduct, we formulate the Monge-Ampère equation associated with the Wasserstein-Fisher-Rao metric, which also holds for more general costs.
We propose an entropic approximation approach for optimal transportation problems with a supremal cost.
We establish
We study the quantitative stability of the mapping that to a measure associates its pushforward measure by a fixed (non-smooth) optimal transport map. We exhibit a tight Hölder-behavior for this operation under minimal assumptions. Our proof essentially relies on a new bound that quantifies the size of the singular sets of a convex and Lipschitz continuous function on a bounded domain.
Wasserstein barycenters define averages of probability measures in a geometrically meaningful way. Their use is increasingly popular in applied fields, such as image, geometry or language processing. In these fields however, the probability measures of interest are often not accessible in their entirety and the practitioner may have to deal with statistical or computational approximations instead. In this article, we quantify the effect of such approximations on the corresponding barycenters. We show that Wasserstein barycenters depend in a Hölder-continuous way on their marginals under relatively mild assumptions. Our proof relies on recent estimates that quantify the strong convexity of the dual quadratic optimal transport problem and a new result that allows us to control the modulus of continuity of the push-forward operation under a (not necessarily smooth) optimal transport map.
We investigate the notion of Wasserstein median as an alternative to the Wasserstein barycenter, which has become popular but may be sensitive to outliers. In terms of robustness to corrupted data, we indeed show that Wasserstein medians have a breakdown point of approximately 1/2. We give explicit constructions of Wasserstein medians in dimension one which enable us to obtain
CIFRE PhD thesis scholarship (
He is also one of the 4 editors of “Interfaces and Free Boundaries”.