# Major Mathematics, Statistical and Machine Learning

Statistical and machine learning methods are of prime importance in many domains. Beyond practical performance and their use as black-box methods, one of the modern mathematical challenges is to provide a complete theoretical analysis and a sharp understanding of these approaches. The goal of this major is to provide a modern overview of the construction and behavior of these methods.

This major comprises:
– refresher courses at the beginning of the academic year
– a core curriculum during the first semester
– two additional courses to be chosen among those offered in the other majors (see the welcome page for a complete list).

## Neural networks

The goal of this course is twofold:

• Introduce the principles behind deep neural networks and their implementation for addressing classification and regression problems.
• Give an overview of the mathematical tools associated with modern learning methods based on these networks.

The course will start with the universal approximation property of neural networks. We will then investigate the improvement that depth brings to neural networks for precise function approximation at a given computational cost.

Tools for handling the difficulties that arise when training on large data sets will be provided, together with some convergence results.

Finally, statistical results on generalisation guarantees for deep neural networks will be presented, both in the (classical) underfitting scenario and in the overfitting case, leading to the ‘double-descent’ phenomenon.

## Sparsity and high dimension

Sparsity and convexity are ubiquitous notions in Machine Learning and Statistics. In this course, we study the mathematical foundations of some powerful methods based on convex relaxation: L1-regularisation techniques in Statistics and Signal Processing; Nuclear Norm minimization in Matrix Completion; K-means and Graph Clustering.

These approaches turn out to be Semi-Definite representable (SDP) and hence tractable in practice.

The theoretical part of the course will focus on the performance guarantees for these approaches and for the corresponding algorithms under the sparsity assumption. The practical part of the course will present the standard SDP solvers for these learning problems.
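As a small illustration of the L1-regularisation techniques mentioned above, the following is a minimal sketch of the Lasso solved by proximal gradient descent (ISTA). This is a simple first-order method chosen here for brevity, not one of the SDP solvers covered in the course; the function names `soft_threshold` and `ista`, the step size and the toy data are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(X, y, w, lam):
    """Value of 0.5 * ||y - Xw||^2 + lam * ||w||_1."""
    return 0.5 * np.sum((y - X @ w) ** 2) + lam * np.sum(np.abs(w))

def ista(X, y, lam, n_iter=500):
    """Minimise the Lasso objective by proximal gradient descent (ISTA)."""
    # Step 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                  # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Toy data (hypothetical): a sparse signal with 3 non-zero coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true
w_hat = ista(X, y, lam=1.0)
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is the mechanism by which the L1 penalty promotes sparse solutions.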

Keywords: L1-regularisation; Matrix Completion; K-Means; Graph Clustering; Semi-Definite Programming

## Graphs and ecological networks

Graphs, whose first uses are mentioned in the 16th century, are mathematical objects that have been widely used since the earliest network investigations, that is, investigations of relationships between individuals in a broad sense. From social networks to the internet, graphs are central objects in the analysis of many data sets. Ecosystem relationships, ranging from relationships between species (predation, interactions between plants and pollinating insects, etc.) to social relationships between individuals (sociality among primates, etc.), offer many possible applications of graph modelling and network analysis.

In this course, we will investigate the framework of graph theory and network science. We will provide an introduction to modern research problems in ecosystem studies. We will draw in turn on discrete mathematics, statistics and machine learning.

We will address both theoretical and practical (case studies in ecology) questions.

Theoretical keywords: Basics / definitions (graphs, paths, etc.) – Metrics – Clustering methods – Spectral methods – Random graph models – Graphical models (graph inference) – Signal processing on graphs – Multi-level graphs (time, space, link types) – Embedding methods (optional)

Case studies: Contact networks between animals. Interaction networks between species in a marine and/or alpine environment. Considerations on the relevance of a graph for biodiversity support.

## Optimal transport for statistical learning

The goal of this course is to present an overview of optimal transport theory and some of its applications in data science.

The first part of the course will detail the Monge-Kantorovich problem, its formulation as a linear programming problem and the use of convex duality, together with the distances (the so-called Wasserstein distances) that optimal transport defines on probability measures. Geodesics and barycenters in the Wasserstein space, which are of prime importance for the interpolation and comparison of data, will also be introduced.
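For reference, the objects named above can be written as follows in the discrete case. The notation is an assumption made for this sketch and is not taken from the course material: $\mu = \sum_i a_i \delta_{x_i}$ and $\nu = \sum_j b_j \delta_{y_j}$ are two discrete probability measures, $C_{ij} = c(x_i, y_j)$ is a cost matrix, and $d$ is a distance on the underlying space.

$$
\begin{aligned}
\text{(Kantorovich, primal)} \quad & \min_{P \in U(a,b)} \sum_{i,j} P_{ij}\, C_{ij},
\qquad U(a,b) = \{ P \ge 0 : P\mathbf{1} = a,\; P^{\top}\mathbf{1} = b \}, \\
\text{(dual)} \quad & \max_{f,\,g} \; \sum_i a_i f_i + \sum_j b_j g_j
\quad \text{subject to} \quad f_i + g_j \le C_{ij} \ \text{for all } i, j, \\
\text{(Wasserstein distance)} \quad & W_p(\mu, \nu) = \Big( \min_{P \in U(a,b)} \sum_{i,j} P_{ij}\, d(x_i, y_j)^p \Big)^{1/p}, \qquad p \ge 1.
\end{aligned}
$$

The primal problem is a linear program in the transport plan $P$, and by linear programming duality its optimal value equals that of the dual problem.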

The second part of the course will be dedicated to numerical methods for solving optimal transport problems, with a specific focus on methods (in particular the Sinkhorn algorithm) suited to high dimension and to unstructured data.
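To give an idea of the kind of numerical method mentioned above, here is a minimal sketch of the Sinkhorn algorithm for entropy-regularised optimal transport between two histograms. The function name `sinkhorn`, the regularisation strength `reg`, the iteration count and the toy data are assumptions chosen for illustration; a practical implementation would typically work in the log domain for numerical stability and use a convergence test instead of a fixed number of iterations.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.1, n_iter=1000):
    """Entropy-regularised optimal transport between histograms a and b.

    Returns the transport plan and its transport cost sum(plan * cost).
    """
    K = np.exp(-cost / reg)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # rescale to match the column marginal b
        u = a / (K @ v)                # rescale to match the row marginal a
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))

# Toy example (hypothetical data): two histograms on a regular grid of [0, 1],
# with the squared Euclidean distance as ground cost.
x = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
plan, value = sinkhorn(a, b, cost)
```

Each iteration only involves matrix-vector products with the kernel `K`, which is what makes this scheme attractive for large problems.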

The third part will present a selection of applications of optimal transport and Wasserstein distances in statistical learning, e.g., Wasserstein GANs, transfer learning, data generation models, …