COMPiLE Reading Group

We organize a reading group on COMPutational LEarning, specifically on topics related to Numerics/Matrix Theory/Graph Theory with particular emphasis on their application/connection with Machine Learning, Data Mining and Network Science.

We meet regularly to read a paper of common interest together, with the goal of providing some motivation to stay up to date with new papers, or to catch up on papers you have been meaning to read but kept postponing. Recently, as an alternative, we have also decided to let people present their own research work, so that everyone in the group knows what the others are doing, with the aim of fostering the exchange of ideas and collaborations.

If you are interested in participating, please complete the form below to be notified about the meetings:
Google Form: Subscription to Reading Group mailing list

If you have suggestions for papers to read, you can propose your ideas here:
Google Form: Suggestions for papers to read

If you want inspiration for a possible paper to read, here is a list of the papers that have been suggested so far:
Suggested papers for COMPiLE Reading Group

The reading group is organized by our early-career group members and supervised/coordinated by Nicola Guglielmi and Francesco Tudisco.

Organizer and period:
Bernardo Collufio, GSSI: December 2024 -- Present
Piero Deidda, GSSI: January 2024 -- Present
Arturo De Marinis, GSSI: February 2023 -- Present
Dayana Savostianova, GSSI: November 2021 -- Summer 2022

Academic Year 2024/25

Presenter Title, Abstract & Other Info Date
Arturo De Marinis Approximation properties of neural ODEs

We study the universal approximation property (UAP) of shallow neural networks whose activation function is defined as the flow of a neural ODE. We prove the UAP for the space of such shallow neural networks in the space of continuous functions. In particular, we also prove the UAP with the weight matrices constrained to have unit norm. Furthermore, we are able to bound from above the Lipschitz constant of the flow of the neural ODE, which tells us how much a perturbation in the input is amplified or shrunk in the output. If the upper bound is large, then so may be the Lipschitz constant, leading to the undesirable situation where certain small perturbations in the input cause large changes in the output. Therefore, we compute a perturbation to the weight matrix of the neural ODE such that the flow of the perturbed neural ODE has a Lipschitz constant bounded from above as we desire. This leads to a stable flow and hence to a stable shallow neural network. However, the stabilized shallow neural network with unit norm weight matrices no longer satisfies the universal approximation property. Nevertheless, we are able to prove approximation bounds that tell us how poorly and how accurately a continuous target function can be approximated by the stabilized shallow neural network.
November 27, 2024
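A minimal numerical sketch (not the construction from the talk) of a shallow network whose activation is the flow map of a neural ODE: the flow is approximated with forward Euler, and the amplification of small input perturbations is estimated by Monte Carlo sampling. The tanh vector field, the unit spectral norm constraint and all sizes are illustrative assumptions.

```python
# Minimal sketch: a "shallow network" whose activation is the flow map of a neural
# ODE, integrated with forward Euler, plus a crude Monte Carlo estimate of how the
# network amplifies small input perturbations.  All choices below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 4                                               # state dimension (illustrative)
A = rng.standard_normal((d, d)) / np.sqrt(d)        # neural ODE weight matrix

def flow(x, T=1.0, steps=50):
    """Forward-Euler approximation of the flow of x' = tanh(A x) at time T."""
    h = T / steps
    for _ in range(steps):
        x = x + h * np.tanh(A @ x)
    return x

# Shallow network: x -> W2 @ flow(W1 @ x), with W1 of unit (spectral) norm,
# one illustrative reading of the "unit norm" constraint in the abstract.
W1 = rng.standard_normal((d, d)); W1 /= np.linalg.norm(W1, 2)
W2 = rng.standard_normal((1, d))
net = lambda x: W2 @ flow(W1 @ x)

# Crude estimate of the amplification of small input perturbations.
eps, ratios = 1e-4, []
for _ in range(200):
    x = rng.standard_normal(d)
    dx = rng.standard_normal(d); dx *= eps / np.linalg.norm(dx)
    ratios.append(np.linalg.norm(net(x + dx) - net(x)) / eps)
print("max observed amplification of input perturbations:", max(ratios))
```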

Academic Year 2023/24

Presenter Title, Abstract & Other Info Date
Angelo Alberto Casulli Learning Poisson Equation with Rank-Structured Matrices

We explore how rank-structured matrices can be leveraged to efficiently represent and learn elliptic operators, with a focus on the 1D Poisson equation as a model problem. We begin by introducing key concepts in numerical linear algebra, including the use of rank-structured matrices to approximate the discretized Poisson operator. The talk then delves into techniques from randomized linear algebra, showing how these methods can be applied to recover rank-structured matrices from matrix-vector products. This approach enables the reconstruction of the Poisson operator’s discretization by solving only a small number of Poisson equations. Finally, we extend the discussion to the 2D Poisson problem, highlighting the additional challenges that arise in this more complex setting.
October 23, 2024
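A small sketch of one ingredient behind the talk, under illustrative sizes and tolerances: an off-diagonal block of the inverse of the discretized 1D Poisson operator (which in 1D is in fact rank one) can be probed through a handful of Poisson solves, in the spirit of randomized recovery from matrix-vector products.

```python
# Sketch of one ingredient (illustrative sizes and tolerances): an off-diagonal
# block of the inverse of the 1D Poisson matrix is numerically low rank (here in
# fact rank one), and its range can be probed by a few Poisson solves only.
import numpy as np

n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # discretized 1D Poisson operator

rows = slice(0, n // 2)                                    # an off-diagonal block of inv(A)
cols = slice(n // 2, n)

# Probe the block using only linear solves with A (i.e. Poisson solves).
k = 10
Omega = np.random.default_rng(1).standard_normal((n, k))
Omega[rows, :] = 0.0                                       # support only on the chosen columns
Y = np.linalg.solve(A, Omega)[rows, :]                     # k Poisson solves -> range sketch

exact_block = np.linalg.inv(A)[rows, cols]                 # formed explicitly only for checking
s = np.linalg.svd(exact_block, compute_uv=False)
print("leading singular values of the block:", np.round(s[:5], 6))

Q, _ = np.linalg.qr(Y)                                     # compress the sketch
approx = Q @ (Q.T @ exact_block)
rel_err = np.linalg.norm(approx - exact_block) / np.linalg.norm(exact_block)
print(f"relative error of the rank-{k} sketch-based approximation: {rel_err:.2e}")
```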
Francesco Fabbri β-Variational Autoencoder Graph Neural Network for Synthetic Generation of Abdominal Aorta Aneurisms

Synthetic data, particularly in the medical field, is artificially created to replicate real medical data's statistical properties and patterns. This process is essential for addressing privacy concerns, as it allows researchers to work with data that mirrors real patient information without compromising individual privacy. Generative algorithms play a crucial role in this process by leveraging the probability distribution of the original data to create synthetic data through random sampling from feature-rich latent spaces. Creating synthetic medical data from a small patient population by extracting key anatomical features offers a viable solution to the challenge of obtaining large databases of real patients. This approach allows for the development of a virtual patient population that encompasses a wide range of anatomical and physiological conditions, facilitating extensive clinical tests and statistical analyses. The medical field increasingly demands patient-specific testing, necessitating extensive validation before adopting new medical processes. While linear approaches are robust, they often cannot capture non-linear features, leading to large latent spaces. In contrast, neural network-based reduced-order models can effectively capture these non-linear features with a reduced latent space. Concurrently, the use of generative algorithms and these advanced modeling tools aids in developing effective data generators. In this work, starting from a reduced real-world dataset, we propose a β-Variational Autoencoder Graph Neural Network-based framework that will allow for the creation of synthetic Abdominal Aorta Aneurisms (AAA) that maintain the statistical correlations and anatomical relationships present in real-world datasets using a small-dimensional latent space.
October 16, 2024
Rinat Kamalov Non-Chebyshev Uniform Approximation

Consider the problem of the best uniform approximation of a continuous function by linear combinations of a finite system of functions (not necessarily Chebyshev) under arbitrary linear constraints on a convex domain. By modifying the concept of alternance and the Remez iterative procedure, we present a method that demonstrates its efficiency on numerical problems. Under certain favorable assumptions, we prove a linear convergence rate. Special focus is dedicated to systems of complex exponentials, Gaussian functions, Cauchy functions, and lacunary algebraic and trigonometric polynomials. We explore applications to signal processing, switching dynamical systems, and Markov-Bernstein type inequalities.
May 3, 2024
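The Remez-type exchange method from the talk is not reproduced here; the sketch below shows only a baseline approach: the discretized best uniform approximation by a span of given, non-Chebyshev basis functions, computed as a linear program on a fixed grid. The target function and basis are illustrative.

```python
# Baseline sketch (not the Remez-type method from the talk): discretized best
# uniform approximation of f by a span of given (non-Chebyshev) basis functions,
# solved as a linear program on a fixed grid.  Basis and f are illustrative.
import numpy as np
from scipy.optimize import linprog

f = lambda x: np.abs(x - 0.3)                  # target function (illustrative)
basis = [lambda x: np.ones_like(x),            # a non-Chebyshev system: 1, x, two Gaussians
         lambda x: x,
         lambda x: np.exp(-10 * (x - 0.2) ** 2),
         lambda x: np.exp(-10 * (x + 0.4) ** 2)]

x = np.linspace(-1, 1, 400)
G = np.column_stack([g(x) for g in basis])     # grid samples of the basis
fx = f(x)

m = G.shape[1]
# Variables: coefficients c (free) and the uniform error t; minimize t
# subject to |G c - f| <= t at every grid point.
c_obj = np.r_[np.zeros(m), 1.0]
A_ub = np.r_[np.c_[G, -np.ones(len(x))], np.c_[-G, -np.ones(len(x))]]
b_ub = np.r_[fx, -fx]
res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * m + [(0, None)])

coeffs, err = res.x[:m], res.x[-1]
print("coefficients:", np.round(coeffs, 4))
print("best uniform error on the grid:", err)
```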
Sara Tarquini Testing Quantum and Simulated Annealers on the Drone Delivery Packing Problem

Using drones to perform human-related tasks can play a key role in various fields, such as disaster response, agriculture, defense, healthcare, and many others. The drone delivery packing problem (DDPP) arises in the context of logistics in response to increasing demand in the delivery process, along with the necessity of lowering human intervention. The DDPP is usually formulated as a combinatorial optimization (CO) problem, aiming to minimize drone usage with specific battery constraints while ensuring timely, consistent deliveries with fixed locations and energy budget. In the work I am presenting, we propose two alternative formulations of the DDPP as a quadratic unconstrained binary optimization (QUBO) problem and test the performance of quantum annealing (QA) approaches, as compared to simulated annealing (SA) and classical branch-and-bound global optimization tools. Motivated by the rapidly progressing technological advances realizing upgraded quantum devices, the goal is to identify, with the DDPP as a prototype of a complex CO problem, the advantages and limitations of modern quantum annealing hardware, and the extent to which quantum annealers have the potential to represent a realistic alternative to simulated annealing and deterministic approaches. The talk will start with an introduction aimed at helping the audience understand the basics of quantum annealing technology and optimization methods.
April 19, 2024
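The DDPP QUBO formulations from the talk are not reproduced here; the sketch below only illustrates the generic pipeline they feed into: minimizing a QUBO objective x^T Q x over binary vectors with a single-bit-flip simulated annealer, on a random symmetric Q.

```python
# Generic sketch (not the DDPP formulation from the talk): minimizing a QUBO
# objective x^T Q x over binary x with a simple single-bit-flip simulated annealer.
import numpy as np

rng = np.random.default_rng(0)
n = 30
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2      # random symmetric QUBO matrix

energy = lambda x: x @ Q @ x

x = rng.integers(0, 2, n).astype(float)
E = energy(x)
best_x, best_E = x.copy(), E

T0, T1, steps = 2.0, 1e-3, 20000
for step in range(steps):
    T = T0 * (T1 / T0) ** (step / steps)                # geometric cooling schedule
    i = rng.integers(n)
    x_new = x.copy(); x_new[i] = 1 - x_new[i]           # flip one bit
    E_new = energy(x_new)
    if E_new < E or rng.random() < np.exp(-(E_new - E) / T):
        x, E = x_new, E_new
        if E < best_E:
            best_x, best_E = x.copy(), E
print("best energy found:", best_E)
```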
Miryam Gnazzo An oracle for the solution of nearness problems in matrix theory

Given a matrix A with a certain property, a matrix nearness problem consists of finding the closest matrix to A that loses that property. Solving nearness problems may be a challenging task, and several methods have been developed for the numerical solution of this class of problems. Depending on the property associated with the nearness problem, we may need a different solution technique. In the first part of the talk, I will provide an introduction to this class of problems, including a way to extend the notion of matrix nearness problems to the setting of matrix polynomials. In the second part, I will describe a new method for the solution of a wide class of nearness problems in matrix theory, including its extension to matrix polynomials. The key idea is to use an oracle, which can give us information on the solution of the problem, and to combine this output with a Riemannian optimization-based solver. It is also possible to adapt the same methodology to the setting of matrix nearness problems with a prescribed linear structure for the perturbations. A few examples of this possibility will be provided during the talk.
March 14, 2024
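As a concrete warm-up for the talk (not the oracle-based method it describes), the simplest matrix nearness problem has a closed-form solution: the distance from a nonsingular A to the nearest singular matrix is its smallest singular value, attained by a rank-one correction.

```python
# The simplest matrix nearness problem, as a concrete warm-up (not the oracle-based
# method from the talk): the nearest singular matrix to a nonsingular A.  In both
# the spectral and the Frobenius norm the distance is the smallest singular value
# of A, attained by subtracting a rank-one correction.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

U, s, Vt = np.linalg.svd(A)
A_nearest = A - s[-1] * np.outer(U[:, -1], Vt[-1, :])    # remove the smallest singular value

print("distance to singularity (sigma_min):", s[-1])
print("smallest singular value of the perturbed matrix:",
      np.linalg.svd(A_nearest, compute_uv=False)[-1])    # ~ 0 up to rounding
```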
Shafqat Ali Stabilized reduced basis methods for parametrized viscous flows

In this talk I will discuss the stability of the Reduced Basis (RB) Method. In the RB approximation of saddle point problems, the Galerkin projection on the reduced space does not guarantee inf-sup approximation stability, even if a stable high-fidelity method was used to generate the snapshots. For problems in computational fluid dynamics, the lack of inf-sup stability is reflected by the inability to accurately approximate the pressure field. In this context, inf-sup stability is usually recovered through the enrichment of the velocity space with suitable supremizer functions. The main goal of this work is to propose an alternative approach, which relies on the residual-based stabilization techniques customarily employed in the Finite Element literature, such as Brezzi-Pitkäranta, Franca-Hughes, streamline upwind Petrov-Galerkin, and Galerkin Least Squares. In the spirit of the offline-online reduced basis computational splitting, two such options are proposed, namely offline-only stabilization and offline-online stabilization. These approaches are then compared to (and combined with) the state-of-the-art supremizer enrichment approach. Numerical results highlight that the proposed methodology allows us to obtain smaller reduced basis spaces (i.e., neglecting supremizer enrichment) for which a modified inf-sup stability is still preserved at the reduced order level.
January 23, 2024
Emanuele Zangrando Loss landscapes and optimization in over-parameterized non-linear systems and neural networks, by Chaoyue Liu, Libin Zhu and Mikhail Belkin

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. The purpose of this work is to propose a modern view and a general mathematical framework for loss landscapes and efficient optimization in over-parameterized machine learning models and systems of non-linear equations, a setting that includes over-parameterized deep neural networks. Our starting observation is that optimization landscapes corresponding to such systems are generally not convex, even locally around a global minimum, a condition we call essential non-convexity. We argue that instead they satisfy PL⁎, a variant of the Polyak-Łojasiewicz condition on most (but not all) of the parameter space, which guarantees both the existence of solutions and efficient optimization by (stochastic) gradient descent (SGD/GD). The PL⁎ condition of these systems is closely related to the condition number of the tangent kernel associated to a non-linear system showing how a PL⁎-based non-linear theory parallels classical analyses of over-parameterized linear equations. We show that wide neural networks satisfy the PL⁎ condition, which explains the (S)GD convergence to a global minimum. Finally we propose a relaxation of the PL⁎ condition applicable to “almost” over-parameterized systems.

Read the paper
December 6, 2023
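A small numerical illustration of the tangent-kernel viewpoint in the paper: for the squared loss, the squared gradient norm is bounded below by the smallest eigenvalue of the tangent kernel times twice the loss, so a positive smallest eigenvalue certifies a local PL*-type inequality. The tiny model and the finite-difference Jacobian are illustrative choices, not the paper's setup.

```python
# Numerical sketch of the tangent-kernel viewpoint: for the squared loss
# L(w) = 0.5 * ||f(w) - y||^2 one has ||grad L||^2 = r^T K r >= lambda_min(K) * 2 L,
# with K = J J^T the tangent kernel, so lambda_min(K) > 0 gives a local PL*-type bound.
# The tiny model and the finite-difference Jacobian below are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, width, p_in = 5, 40, 2                      # 5 data points, one hidden layer of width 40
X = rng.standard_normal((n, p_in))
y = rng.standard_normal(n)

def unpack(w):
    W = w[:width * p_in].reshape(width, p_in)
    a = w[width * p_in:]
    return W, a

def f(w):                                      # network outputs on all data points
    W, a = unpack(w)
    return np.tanh(X @ W.T) @ a / np.sqrt(width)

w = rng.standard_normal(width * p_in + width)

def jacobian(w, eps=1e-6):                     # finite-difference Jacobian of f
    J = np.empty((n, w.size))
    for j in range(w.size):
        e = np.zeros(w.size); e[j] = eps
        J[:, j] = (f(w + e) - f(w - e)) / (2 * eps)
    return J

J = jacobian(w)
K = J @ J.T                                    # tangent kernel (n x n)
lam_min = np.linalg.eigvalsh(K)[0]

r = f(w) - y
L = 0.5 * r @ r
grad_sq = r @ K @ r                            # ||grad L||^2 = r^T J J^T r
print("lambda_min(K):", lam_min)
print("||grad L||^2 =", grad_sq, ">= 2 * lambda_min * L =", 2 * lam_min * L)
```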
Anton Savostianov Representations for partially exchangeable arrays of random variables, by David J. Aldous

Consider an array of random variables (Xi,j), 1 ≤ i,j < ∞, such that permutations of rows or of columns do not alter the distribution of the array. We show that such an array may be represented as functions f(α, ξi, ηj, λi,j) of underlying i.i.d. random variables. This result may be useful in characterizing arrays with additional structure. For example, we characterize random matrices whose distribution is invariant under orthogonal rotation, confirming a conjecture of Dawid.

Read the paper
November 15, 2023

Academic Year 2022/23

Presenter Title, Abstract & Other Info Date
Stefano Sicilia Why are big data matrices approximately low rank?, by Madeleine Udell and Alex Townsend

Matrices of (approximate) low rank are pervasive in data science, appearing in movie preferences, text documents, survey data, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention paid to explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an $m \times n$ matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as $\mathcal{O}(\log(m + n))$. Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.

Read the paper
October 17, 2023
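A small experiment in the spirit of the paper, with an illustrative link function and sizes: entries are generated by a smooth function of bounded latent row and column variables, and although the resulting matrices are formally full rank, their numerical rank at a fixed tolerance grows very slowly with the dimension.

```python
# Small experiment in the spirit of the paper (link function and sizes are illustrative):
# entries X_ij = g(u_i, v_j) with a smooth g of bounded latent variables.  The matrix is
# formally full rank, yet very few singular values matter at a fixed tolerance.
import numpy as np

rng = np.random.default_rng(0)
for m in [200, 400, 800, 1600]:
    u = rng.uniform(-1, 1, m)                   # row latent variables
    v = rng.uniform(-1, 1, m)                   # column latent variables
    X = np.exp(-np.subtract.outer(u, v) ** 2)   # smooth (analytic) link function g

    s = np.linalg.svd(X, compute_uv=False)
    num_rank = np.sum(s > 1e-8 * s[0])          # numerical rank at a fixed tolerance
    print(f"m = n = {m:5d}   numerical rank = {num_rank}   (log(m+n) ~ {np.log(2 * m):.1f})")
```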
Miryam Gnazzo Robust Rational Approximations of Nonlinear Eigenvalue Problems, by Stefan Güttel, Gian Maria Negri Porzio and Françoise Tisseur

We develop algorithms that construct robust (i.e., reliable for a given tolerance and scaling independent) rational approximants of matrix-valued functions on a given subset of the complex plane. We consider matrix-valued functions provided in both split form (i.e., as a sum of scalar functions times constant coefficient matrices) and as a black box form. We develop a new error analysis and use it for the construction of stopping criteria, one for each form. Our criterion for split forms adds weights chosen relative to the importance of each scalar function, leading to the weighted adaptive Antoulas--Anderson (AAA) algorithm, a variant of the set-valued AAA algorithm that can guarantee to return a rational approximant with a user-chosen accuracy. We propose two-phase approaches for black box matrix-valued functions that construct a surrogate AAA approximation in phase one and refine it in phase two, leading to the surrogate AAA algorithm with exact search and the surrogate AAA algorithm with cyclic Leja--Bagby refinement. The stopping criterion for black box matrix-valued functions is updated at each step of phase two to include information from the previous step. When convergence occurs, our two-phase approaches return rational approximants with a user-chosen accuracy. We select problems from the NLEVP collection that represent a variety of matrix-valued functions of different sizes and properties and use them to benchmark our algorithms.

Read the paper
July 6, 2023
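For orientation, here is a bare-bones sketch of the plain scalar AAA iteration (greedy support points plus a Loewner-matrix SVD for the barycentric weights); the weighted, set-valued and surrogate variants and the stopping criteria developed in the paper are not reproduced. Sample set, function and tolerance are illustrative.

```python
# Bare-bones scalar AAA sketch (greedy support points + Loewner SVD), only to fix
# ideas; the paper's weighted/set-valued/surrogate variants and stopping criteria
# are not reproduced here.  Sample set, function and tolerance are illustrative.
import numpy as np

def aaa(F, Z, tol=1e-12, max_terms=30):
    """Barycentric rational approximation of values F on sample points Z."""
    Z, F = np.asarray(Z, dtype=complex), np.asarray(F, dtype=complex)
    mask = np.ones(len(Z), dtype=bool)         # points not yet taken as support points
    zj, fj, wj = [], [], []
    R = np.full(len(Z), F.mean())
    for _ in range(max_terms):
        j = np.argmax(np.abs(F - R) * mask)    # greedy choice of next support point
        zj.append(Z[j]); fj.append(F[j]); mask[j] = False
        C = 1.0 / (Z[mask, None] - np.array(zj)[None, :])         # Cauchy matrix
        A = (F[mask, None] - np.array(fj)[None, :]) * C           # Loewner matrix
        _, _, Vh = np.linalg.svd(A)
        wj = Vh[-1].conj()                     # weights = last right singular vector
        num = C @ (wj * np.array(fj))
        den = C @ wj
        R = F.copy(); R[mask] = num / den      # rational approximant on the sample set
        if np.max(np.abs(F - R)) <= tol * np.max(np.abs(F)):
            break
    return np.array(zj), np.array(fj), wj

Z = np.linspace(-1, 1, 1000)
F = np.tanh(20 * Z)                            # a function with a steep gradient
zj, fj, wj = aaa(F, Z)
print("number of support points used:", len(zj))
```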
Piero Deidda Spectral Decompositions using One-Homogeneous Functionals, by Martin Burger, Guy Gilboa, Michael Moeller, Lina Eckardt and Daniel Cremers

This paper discusses the use of absolutely one-homogeneous regularization functionals in a variational, scale space, and inverse scale space setting to define a nonlinear spectral decomposition of input data. We present several theoretical results that explain the relation between the different definitions. Additionally, results on the orthogonality of the decomposition, a Parseval-type identity and the notion of generalized (nonlinear) eigenvectors closely link our nonlinear multiscale decompositions to the well-known linear filtering theory. Numerical results are used to illustrate our findings.

Read the paper
May 31, 2023
Emanuele Natale On the Random Subset Sum Problem and Neural Networks

The Random Subset Sum Problem (RSSP) is a fundamental problem in mathematical optimization, especially in the understanding of the statistical behavior of integer linear programs. Recently, the theory related to the problem has found applications also in theoretical machine learning, providing key tools for the proof of the Strong Lottery Ticket Hypothesis (SLTH) for dense neural network architectures. In this talk, I will present two recent joint works that push the application of the RSSP further. First, we provide a proof of the SLTH for convolutional neural networks, generalizing the original result. Second, we leverage those ideas to provide a novel way to build neuromorphic hardware.
March 23, 2023
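A tiny brute-force illustration of the random subset sum phenomenon underlying the SLTH proofs mentioned in the talk: with n i.i.d. Uniform(-1,1) samples, every target in [-1,1] tends to be well approximated by some subset sum, and the worst-case error shrinks quickly as n grows. Brute force limits this check to small n.

```python
# Tiny brute-force illustration of the random subset sum phenomenon behind the
# SLTH proofs: with n i.i.d. Uniform(-1,1) samples, every target in [-1,1] is
# typically well approximated by some subset sum.  (Brute force, so small n only.)
import itertools
import numpy as np

rng = np.random.default_rng(0)
targets = np.linspace(-1, 1, 21)

for n in [8, 12, 16]:
    x = rng.uniform(-1, 1, n)
    masks = np.array(list(itertools.product([0, 1], repeat=n)))   # all 2^n subsets
    sums = masks @ x
    worst = max(np.min(np.abs(sums - t)) for t in targets)
    print(f"n = {n:2d}   worst approximation error over the targets: {worst:.2e}")
```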
Arturo De Marinis The Forward-Forward Algorithm: Some Preliminary Investigations, by Geoffrey Hinton

The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth further investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes could be separated in time, the negative passes could be done offline, which would make the learning much simpler in the positive pass and allow video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.

Read the paper
February 27, 2023
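A minimal single-layer sketch of the Forward-Forward idea, not Hinton's full recipe (no layer normalization, no label embedding, plain full-batch gradient steps): goodness is the sum of squared activities, trained with a local logistic objective to sit above a threshold on positive data and below it on negative data. The toy data and threshold are illustrative.

```python
# Minimal single-layer sketch of the Forward-Forward idea (not Hinton's full recipe:
# no layer normalization, no label embedding; toy data and threshold are illustrative).
# Goodness = sum of squared activities; push it above a threshold on positive data
# and below it on negative data with a local logistic objective.
import numpy as np

rng = np.random.default_rng(0)
d, hidden, theta, lr = 20, 64, 2.0, 0.03

# Toy "positive" data from one distribution and "negative" data from another.
pos = rng.standard_normal((500, d)) + 1.0
neg = rng.standard_normal((500, d)) - 1.0

W = rng.standard_normal((hidden, d)) / np.sqrt(d)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    for X, y in [(pos, 1.0), (neg, 0.0)]:
        H = np.maximum(X @ W.T, 0.0)             # forward pass, ReLU activities
        g = np.sum(H ** 2, axis=1)               # per-sample goodness
        p = sigmoid(g - theta)                   # P(sample judged "positive")
        # local gradient of the logistic loss: dL/dW = mean_i (p_i - y) * 2 * h_i x_i^T
        grad = 2.0 * ((p - y)[:, None] * H).T @ X / len(X)
        W -= lr * grad

H = np.maximum(np.vstack([pos, neg]) @ W.T, 0.0)
g = np.sum(H ** 2, axis=1)
acc = np.mean((g > theta) == np.r_[np.ones(len(pos)), np.zeros(len(neg))])
print("separation accuracy of the goodness threshold:", acc)
```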

Academic Year 2021/22

Presenter Title, Abstract & Other Info Date
Pierluigi Crescenzi Proving the Lottery Ticket Hypothesis for Convolutional Neural Networks, by Arthur da Cunha, Emanuele Natale, Laurent Viennot

The lottery ticket hypothesis states that a randomly-initialized neural network contains a small subnetwork which, when trained in isolation, can compete with the performance of the original network. Recent theoretical works proved an even stronger version: every sufficiently overparameterized (dense) neural network contains a subnetwork that, even without training, achieves accuracy comparable to that of the trained large network. These works left as an open problem to extend the result to convolutional neural networks (CNNs). In this work we provide such generalization by showing that, with high probability, it is possible to approximate any CNN by pruning a random CNN whose size is larger by a logarithmic factor.

Read the paper
April 27, 2022
Arturo De Marinis Structure-preserving deep learning, by Elena Celledoni, Matthias J. Ehrhardt, Christian Etmann, Robert I McLachlan, Brynjulf Owren, Carola-Bibiane Schönlieb, Ferdia Sherry

Over the past few years, deep learning has risen to the foreground as a topic of massive interest, mainly as a result of successes obtained in solving large-scale image processing tasks. There are multiple challenging mathematical problems involved in applying deep learning: most deep learning methods require the solution of hard optimisation problems, and a good understanding of the tradeoff between computational effort, amount of data and model complexity is required to successfully design a deep learning approach for a given problem. A large amount of progress made in deep learning has been based on heuristic explorations, but there is a growing effort to mathematically understand the structure in existing deep learning methods and to systematically design new deep learning methods to preserve certain types of structure in deep learning. In this article, we review a number of these directions: some deep neural networks can be understood as discretisations of dynamical systems, neural networks can be designed to have desirable properties such as invertibility or group equivariance, and new algorithmic frameworks based on conformal Hamiltonian systems and Riemannian manifolds to solve the optimisation problems have been proposed. We conclude our review of each of these topics by discussing some open problems that we consider to be interesting directions for future research.

Read the paper
April 6, 2022
Giuseppe Lipardi Computing semiclassical quantum dynamics with Hagedorn wavepackets, Erwan Faou, Vasile Gradinaru, and Christian Lubich

We consider the approximation of multiparticle quantum dynamics in the semiclassical regime by Hagedorn wavepackets, which are products of complex Gaussians with polynomials that form an orthonormal $L^2$ basis and preserve their type under propagation in Schrödinger equations with quadratic potentials. We build a fully explicit, time-reversible time-stepping algorithm to approximate the solution of the Hagedorn wavepacket dynamics. The algorithm is based on a splitting between the kinetic and potential part of the Hamiltonian operator, as well as on a splitting of the potential into its local quadratic approximation and the remainder. The algorithm is robust in the semiclassical limit. It reduces to the Strang splitting of the Schrödinger equation in the limit of the full basis set, and it advances positions and momenta by the Störmer–Verlet method for the classical equations of motion. The algorithm allows for the treatment of multiparticle problems by thinning out the basis according to a hyperbolic cross approximation and of high-dimensional problems by Hartree-type approximations in a moving coordinate frame.

Read the paper
March 30, 2022
Martino Caliaro Stability analysis of a chain of non-identical vehicles under bilateral cruise control

Bilateral cruise control (BCC) suppresses traffic flow instabilities. Previously, for simplicity of analysis, vehicles in BCC traffic flow were assumed to be identical, i.e., using the same gains for control. In this study, we analyze the stability of an inhomogeneous vehicular chain in which the gains used by different vehicles are not the same. Not unexpectedly, the mathematical analysis becomes more difficult, and leads to a quadratic eigenvalue problem. We study several different cases, and show that a chain of vehicles under bilateral cruise control is stable even when the vehicles do not all have the same control system properties. Numerical simulations validate the analysis.

Read the paper
March 23, 2022
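The stability analysis in the talk leads to a quadratic eigenvalue problem. The sketch below shows the standard companion linearization of (λ²M + λC + K)x = 0 into a generalized eigenvalue problem and a check on the real parts of the eigenvalues; the matrices are random placeholders, not the vehicle-chain model.

```python
# The stability analysis leads to a quadratic eigenvalue problem (QEP).  Sketch of
# the standard companion linearization of (lambda^2 M + lambda C + K) x = 0 into a
# generalized eigenproblem; M, C, K below are random placeholders, not the vehicle model.
import numpy as np
from scipy.linalg import eigvals

rng = np.random.default_rng(0)
n = 5
M = np.eye(n)
C = rng.standard_normal((n, n)); C = C @ C.T + n * np.eye(n)   # "damping"
K = rng.standard_normal((n, n)); K = K @ K.T                   # "stiffness"

# First companion form:  A z = lambda B z  with  z = [x; lambda x].
Z, I = np.zeros((n, n)), np.eye(n)
A = np.block([[Z, I], [-K, -C]])
B = np.block([[I, Z], [Z, M]])

lam = eigvals(A, B)
print("max real part of the QEP eigenvalues:", lam.real.max())
print("asymptotically stable:", lam.real.max() < 0)
```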
Vishnu Sanjay Hodge Laplacians on Graphs, Lek-Heng Lim

This is an elementary introduction to the Hodge Laplacian on a graph, a higher-order generalization of the graph Laplacian. We will discuss basic properties including cohomology and Hodge theory. The main feature of our approach is simplicity, requiring only knowledge of linear algebra and graph theory. We have also isolated the algebra from the topology to show that a large part of cohomology and Hodge theory is nothing more than the linear algebra of matrices satisfying AB=0. For the remaining topological aspect, we cast our discussions entirely in terms of graphs as opposed to less-familiar topological objects like simplicial complexes.

Read the paper
March 16, 2022
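A tiny worked example of the construction in the paper: incidence matrices B1 (vertices-edges) and B2 (edges-triangles) for a small complex, the defining relation B1 B2 = 0, and the 1-Hodge Laplacian L1 = B1ᵀB1 + B2B2ᵀ, whose kernel dimension counts the 1-dimensional holes. The example complex is illustrative.

```python
# Tiny worked example of the 1-Hodge Laplacian L1 = B1^T B1 + B2 B2^T on a small
# simplicial complex (4 vertices, 5 oriented edges, one filled triangle).  The
# dimension of ker(L1) counts the 1-dimensional holes; the complex is illustrative.
import numpy as np

# Oriented edges: e0=(0,1), e1=(0,2), e2=(1,2), e3=(1,3), e4=(2,3)
B1 = np.array([[-1, -1,  0,  0,  0],      # vertex 0
               [ 1,  0, -1, -1,  0],      # vertex 1
               [ 0,  1,  1,  0, -1],      # vertex 2
               [ 0,  0,  0,  1,  1]])     # vertex 3

# One filled triangle [0,1,2]: boundary = e0 - e1 + e2
B2 = np.array([[ 1], [-1], [ 1], [ 0], [ 0]])

assert np.all(B1 @ B2 == 0)               # the defining relation "AB = 0"

L1 = B1.T @ B1 + B2 @ B2.T                # 1-Hodge Laplacian (5 x 5)
eigvals = np.linalg.eigvalsh(L1)
print("eigenvalues of L1:", np.round(eigvals, 6))
print("dim ker(L1) (number of 1-d holes):", int(np.sum(np.abs(eigvals) < 1e-10)))
```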
Helena Biscevic Time-symmetric integration in astrophysics

Calculating the long term solution of ordinary differential equations, such as those of the N-body problem, is central to understanding a wide range of dynamics in astrophysics, from galaxy formation to planetary chaos. Because generally no analytic solution exists to these equations, researchers rely on numerical methods which are prone to various errors. In an effort to mitigate these errors, powerful symplectic integrators have been employed. But symplectic integrators can be severely limited because they are not compatible with adaptive stepping and thus they have difficulty accommodating changing time and length scales. A promising alternative is time-reversible integration, which can handle adaptive time stepping, but the errors due to time-reversible integration in astrophysics are less understood. The goal of this work is to study analytically and numerically the errors caused by time-reversible integration, with and without adaptive stepping. We derive the modified differential equations of these integrators to perform the error analysis. As an example, we consider the trapezoidal rule, a reversible non-symplectic integrator, and show it gives secular energy error increase for a pendulum problem and for a Hénon–Heiles orbit. We conclude that using reversible integration does not guarantee good energy conservation and that, when possible, use of symplectic integrators is favored. We also show that time-symmetry and time-reversibility are properties that are distinct for an integrator.

Read the paper
March 8, 2022
Pierpaolo Bilotto On the spectra of general random graphs, Fan Chung and Mary Radcliffe

We consider random graphs such that each edge is determined by an independent random variable, where the probability of each edge is not assumed to be equal. We use a Chernoff inequality for matrices to show that the eigenvalues of the adjacency matrix and the normalized Laplacian of such a random graph can be approximated by those of the weighted expectation graph, with error bounds dependent upon the minimum and maximum expected degrees. In particular, we use these results to bound the spectra of random graphs with given expected degree sequences, including random power law graphs. Moreover, we prove a similar result giving concentration of the spectrum of a matrix martingale on its expectation.

Read the paper
March 2, 2022
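A small experiment in the spirit of the paper, with an illustrative Chung-Lu-type choice of edge probabilities: the leading eigenvalues of the sampled adjacency matrix are compared with those of the expectation matrix; the fluctuations are expected to be roughly on the order of the square root of the maximum expected degree.

```python
# Small experiment in the spirit of the paper: a random graph with heterogeneous,
# independent edge probabilities p_ij; the leading eigenvalues of the adjacency
# matrix concentrate around those of the expectation matrix P.  The Chung-Lu-type
# probabilities below are an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
w = rng.uniform(5, 50, n)                       # expected degrees
P = np.minimum(np.outer(w, w) / w.sum(), 1.0)   # edge probabilities p_ij
np.fill_diagonal(P, 0.0)

U = rng.random((n, n))
A = np.triu((U < P).astype(float), 1)           # sample each edge independently
A = A + A.T

eig_A = np.sort(np.linalg.eigvalsh(A))[::-1]
eig_P = np.sort(np.linalg.eigvalsh(P))[::-1]
print("top eigenvalues of A:", np.round(eig_A[:3], 2))
print("top eigenvalues of P:", np.round(eig_P[:3], 2))
print("sqrt of max expected degree (rough scale of the fluctuations):", np.sqrt(w.max()))
```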
Mattia Manucci Physics-informed machine learning for reduced-order modeling of nonlinear problems

A reduced basis method based on a physics-informed machine learning framework is developed for efficient reduced-order modeling of parametrized partial differential equations (PDEs). A feedforward neural network is used to approximate the mapping from the time-parameter to the reduced coefficients. During the offline stage, the network is trained by minimizing the weighted sum of the residual loss of the reduced-order equations, and the data loss of the labeled reduced coefficients that are obtained via the projection of high-fidelity snapshots onto the reduced space. Such a network is referred to as physics-reinforced neural network (PRNN). As the number of residual points in time-parameter space can be very large, an accurate network – referred to as physics-informed neural network (PINN) – can be trained by minimizing only the residual loss. However, for complex nonlinear problems, the solution of the reduced-order equation is less accurate than the projection of high-fidelity solution onto the reduced space. Therefore, the PRNN trained with the snapshot data is expected to have higher accuracy than the PINN. Numerical results demonstrate that the PRNN is more accurate than the PINN and a purely data-driven neural network for complex problems. During the reduced basis refinement, the PRNN may obtain higher accuracy than the direct reduced-order model based on a Galerkin projection. The online evaluation of PINN/PRNN is orders of magnitude faster than that of the Galerkin reduced-order model.

Read the paper
February 23, 2022
Tommaso Tonolo Mean Field Analysis of Hypergraph Contagion Model, Desmond J. Higham and Henry-Louis de Kergorlay

We typically interact in groups, not just in pairs. For this reason, it has recently been proposed that the spread of information, opinion or disease should be modelled over a hypergraph rather than a standard graph. The use of hyperedges naturally allows for a nonlinear rate of transmission, in terms of both the group size and the number of infected group members, as is the case, for example, when social distancing is encouraged. We consider a general class of individual-level, stochastic, susceptible-infected-susceptible models on a hypergraph, and focus on a mean field approximation proposed in [Arruda et al., Phys. Rev. Res., 2020]. We derive spectral conditions under which the mean field model predicts local or global stability of the infection-free state. We also compare these results with (a) a new condition that we derive for decay to zero in mean for the exact process, (b) conditions for a different mean field approximation in [Higham and de Kergorlay, Proc. Roy. Soc. A, 2021], and (c) numerical simulations of the microscale model.

Read the paper
February 23, 2022
Arturo De Marinis Regularized nonlinear acceleration, by Damien Scieur, Alexandre d’Aspremont, Francis Bach

We describe a convergence acceleration technique for unconstrained optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.

Read the paper
February 9, 2022
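A sketch of the acceleration scheme described in the abstract, applied to plain gradient descent on a toy ill-conditioned quadratic: the extrapolation weights come from a small regularized linear system built from successive differences of the iterates. Step size, regularization and window length are illustrative.

```python
# Sketch of regularized nonlinear acceleration applied to plain gradient descent on a
# toy ill-conditioned quadratic.  The extrapolation weights solve a small regularized
# linear system built from successive residuals; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 50
Q = rng.standard_normal((d, d)); Q = Q @ Q.T + 1e-2 * np.eye(d)   # ill-conditioned quadratic
b = rng.standard_normal(d)
x_star = np.linalg.solve(Q, b)
grad = lambda x: Q @ x - b

def rna(X, lam=1e-8):
    """Extrapolate from iterates X = [x_0, ..., x_k] (columns)."""
    R = np.diff(X, axis=1)                      # residuals r_i = x_{i+1} - x_i
    RR = R.T @ R
    z = np.linalg.solve(RR + lam * np.linalg.norm(RR) * np.eye(RR.shape[0]),
                        np.ones(RR.shape[0]))
    c = z / z.sum()                             # normalized extrapolation weights
    return X[:, :-1] @ c

# Run gradient descent, keeping the last k+1 iterates for the extrapolation.
x, step, k = np.zeros(d), 1.0 / np.linalg.eigvalsh(Q)[-1], 10
iterates = [x.copy()]
for _ in range(200):
    x = x - step * grad(x)
    iterates.append(x.copy())

X = np.column_stack(iterates[-(k + 1):])
x_acc = rna(X)
print("error of last GD iterate:       ", np.linalg.norm(x - x_star))
print("error of RNA extrapolated point:", np.linalg.norm(x_acc - x_star))
```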
Konstantin Prokopchik Consensus Dynamics and Opinion Formation on Hypergraphs

In this chapter, we derive and analyse models for consensus dynamics on hypergraphs. As we discuss, unless there are nonlinear node interaction functions, it is always possible to rewrite the system in terms of a new network of effective pairwise node interactions, regardless of the initially underlying multi-way interaction structure. We thus focus on dynamics based on a certain class of non-linear interaction functions, which can model different sociological phenomena such as peer pressure and stubbornness. Unlike for linear consensus dynamics on networks, we show how our nonlinear model dynamics can cause shifts away from the average system state. We examine how these shifts are influenced by the distribution of the initial states, the underlying hypergraph structure and different forms of non-linear scaling of the node interaction function.

Read the paper
January 26, 2022
Simone Fioravanti EigenGame: PCA as Nash Equilibrium

We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient based updates. The resulting algorithm -- which combines elements from Oja's rule with a generalized Gram-Schmidt orthogonalization -- is naturally decentralized and hence parallelizable through message passing. We demonstrate the scalability of the algorithm with experiments on large image datasets and neural network activations. We discuss how this new view of PCA as a differentiable game can lead to further algorithmic developments and insights.

Read the paper
December 9, 2021
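A sketch of the EigenGame utilities and sphere-constrained gradient ascent on a toy covariance matrix; the step size, iteration count and the sequential (rather than decentralized) update order are illustrative simplifications of the algorithm in the paper.

```python
# Sketch of the EigenGame idea: each "player" v_i ascends its own utility
#   u_i = <v_i, M v_i> - sum_{j<i} <v_i, M v_j>^2 / <v_j, M v_j>
# by Riemannian gradient steps on the unit sphere.  Step size, iteration count and
# the sequential (non-parallel) update order below are illustrative simplifications.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20)) @ np.diag(np.linspace(1, 5, 20))  # toy data
M = X.T @ X / len(X)                                                 # covariance matrix
k, eta, iters = 4, 0.01, 3000

V = rng.standard_normal((M.shape[0], k))
V /= np.linalg.norm(V, axis=0)

for _ in range(iters):
    for i in range(k):
        grad = 2 * (M @ V[:, i])
        for j in range(i):                        # penalize alignment with parent players
            Mvj = M @ V[:, j]
            grad -= 2 * (V[:, i] @ Mvj) / (V[:, j] @ Mvj) * Mvj
        grad -= (grad @ V[:, i]) * V[:, i]        # project onto the sphere's tangent space
        V[:, i] += eta * grad
        V[:, i] /= np.linalg.norm(V[:, i])

# Compare with the top eigenvectors from a direct eigendecomposition.
w, U = np.linalg.eigh(M)
U_top = U[:, ::-1][:, :k]
print("alignment |cos| with the top eigenvectors:",
      np.round(np.abs(np.sum(V * U_top, axis=0)), 4))
```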
Dayana Savostianova Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching

Data Poisoning attacks modify training data to maliciously control a model trained on such data. In this work, we focus on targeted poisoning attacks which cause a reclassification of an unmodified test image and as such breach model integrity. We consider a particularly malicious poisoning attack that is both "from scratch" and "clean label", meaning we analyze an attack that successfully works against new, randomly initialized models, and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data. Previous poisoning attacks against deep neural networks in this setting have been limited in scope and success, working only in simplified settings or being prohibitively expensive for large datasets. The central mechanism of the new attack is matching the gradient direction of malicious examples. We analyze why this works, supplement with practical considerations, and show its threat to real-world practitioners, finding that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset. Finally we demonstrate the limitations of existing defensive strategies against such an attack, concluding that data poisoning is a credible threat, even for large-scale deep learning systems.

Read the paper
November 24, 2021