David Picard

IMAGINE/LIGM, École des Ponts ParisTech
6-8, Av Blaise Pascal - Cité Descartes
Champs-sur-Marne
77455 Marne-la-Vallée cedex 2

Email  /  GitHub  /  Google Scholar  /  Resume

Research

My research interests lie in machine learning for computer vision. On the machine learning side, this covers deep learning, kernel methods and asynchronous optimization; on the computer vision side, image representation learning, scene understanding and video analysis.

PhD Students


Current

  • Yue Zhu, 2020-2023, Interactive 3D estimation of human posture in the working environment using deep neural networks, with Nicolas Thome at CNAM de Paris
  • Thibaut Issenhuth, 2020-2023, Interactive Generative Models, with Jérémie Mary at Criteo
  • Victor Besnier, 2019-2022, Safety in Machine Learning Systems, with Alexandre Briot and Abdelillah Ymlahi-Ouazzani at Valeo
  • Thomas Luka, 2019-2022, Cross-modal Representation Learning
  • Marie-Morgane Paumard, 2017-2020, Ancient Fragment Re-assembly using Deep Learning (with H. Tabia)

Former

  • Pierre Jacob, 2020, High-order statistics for representation learning (with A. Histace and E. Klein)
  • Diogo Luvizon, 2019, 2D/3D Pose Estimation and Action Recognition (with H. Tabia), now at Samsung
  • Jérôme Fellus, 2017, Machine learning using asynchronous gossip exchange (with P.H. Gosselin), now a postdoc at Irisa
  • Romain Negrel, 2014, Representation learning for image retrieval (with P.H. Gosselin), now associate professor at Esiee

Recent publications

Full list: scholar / dblp / hal

DIABLO: Dictionary-based attention block for deep metric learning


Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein
Pattern Recognition Letters, 2020
arxiv / doi /

In this paper, we propose DIABLO, a dictionary-based attention method for image embedding. DIABLO produces richer representations by aggregating only visually related features, while being easier to train than other attention-based methods in deep metric learning. This is confirmed experimentally on four deep metric learning datasets (CUB-200-2011, Cars-196, Stanford Online Products, and In-Shop Clothes Retrieval), on which DIABLO achieves state-of-the-art performance.
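
To make the idea of dictionary-based aggregation concrete, here is a minimal sketch assuming a hypothetical dictionary of atoms and a softmax soft-assignment controlled by a `temperature` parameter. It is not DIABLO's actual formulation, only the generic pattern of pooling each local feature mostly with the atom it resembles.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dictionary_attention(features, dictionary, temperature=1.0):
    """Aggregate local features per dictionary atom using soft assignments.

    features: list of n local descriptors (each a list of d floats)
    dictionary: list of K atoms (each a list of d floats), hypothetical here
    Returns K pooled vectors, one per atom, each of length d.
    """
    d = len(features[0])
    # Soft-assign every local feature to the dictionary atoms.
    assignments = [
        softmax([dot(x, atom) / temperature for atom in dictionary])
        for x in features
    ]
    pooled = []
    for k in range(len(dictionary)):
        weights = [a[k] for a in assignments]
        total = sum(weights)
        # Weighted mean of the features, dominated by those close to atom k.
        pooled.append(
            [sum(w * x[j] for w, x in zip(weights, features)) / total for j in range(d)]
        )
    return pooled
```

With a sharp assignment, each pooled vector collects mostly the features closest to one atom, which matches the intuition of aggregating only visually related features together.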

Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition


Diogo Luvizon, David Picard, Hedi Tabia
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
arxiv / doi /

In this work, we propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. We show that a single architecture can solve both problems efficiently while achieving state-of-the-art or comparable results on each task at a throughput of more than 100 frames per second. The proposed method benefits from a high degree of parameter sharing between the two tasks by unifying still-image and video-clip processing in a single pipeline, which allows the model to be trained seamlessly and simultaneously with data from different categories.

Deepzzle: Solving Visual Jigsaw Puzzles with Deep Learning and Shortest Path Optimization


Marie-Morgane Paumard, David Picard, Hedi Tabia
IEEE Transactions on Image Processing, 2020
doi /

We tackle the image reassembly problem when the fragments are separated by wide gaps, so that the continuity of patterns and colors is mostly unusable. The spacing emulates the erosion suffered by archaeological fragments. We use a two-step method to obtain the reassemblies: 1) a neural network predicts the positions of the fragments despite the gaps between them; 2) a graph is built from these predictions, and the best reassemblies are found as shortest paths in this graph.

Human pose regression by combining indirect part detection and contextual information


Diogo Luvizon, David Picard, Hedi Tabia
Computers and Graphics, 2019
arxiv / code / doi /

In this paper, we tackle the problem of human pose estimation from still images, a very active topic, especially due to its many applications, from image annotation to human-machine interfaces. We use the soft-argmax function to convert feature maps directly into body joint coordinates, resulting in a fully differentiable framework. Our method learns heat map representations indirectly, without the additional step of generating artificial ground-truth heat maps.
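
The soft-argmax replaces the non-differentiable argmax by the expected coordinates under a spatial softmax. A minimal pure-Python sketch, assuming coordinates normalized to [0, 1] and a hypothetical sharpness parameter `beta`:

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Differentiable estimate of the (x, y) peak location of a 2D map.

    heatmap: list of H rows, each a list of W scores (H, W > 1).
    Returns the expected (x, y) coordinates, normalized to [0, 1].
    """
    H, W = len(heatmap), len(heatmap[0])
    m = max(max(row) for row in heatmap)
    # Spatial softmax over all H * W locations (shifted by the max for stability).
    exps = [[math.exp(beta * (v - m)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)
    x = y = 0.0
    for i in range(H):
        for j in range(W):
            p = exps[i][j] / total
            # Expectation of the normalized column (x) and row (y) indices.
            x += p * j / (W - 1)
            y += p * i / (H - 1)
    return x, y
```

Since every step is differentiable, gradients flow from the joint coordinates back to the feature map, which is why no hand-made target heat maps are required.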

Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings


Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein
International Conference on Computer Vision, 2019
arxiv / code /

In this paper, we tackle the problem of deep features being scattered in the feature space with a distribution-aware regularization named HORDE. This regularizer enforces visually close images to have deep features with the same distribution, which are well localized in the feature space. We provide a theoretical analysis supporting this regularization effect. We also show the effectiveness of our approach by obtaining state-of-the-art results on four well-known datasets (CUB-200-2011, Cars-196, Stanford Online Products and In-Shop Clothes Retrieval).
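
A minimal sketch of the moment-matching idea, assuming that high-order moments of the local features of an image are approximated by averaging elementwise products of random projections. The projection size, the orders used and the squared-distance penalty between two visually close images are illustrative assumptions, not the exact HORDE estimator.

```python
import random


def project(W, x):
    # Matrix-vector product: W is a list of rows, x a list of floats.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def approx_moment(features, projections):
    """Approximate a high-order moment of a set of local features.

    The order equals len(projections): each feature x is mapped to the
    elementwise product of the projections W_l x, then averaged over features.
    """
    out_dim = len(projections[0])
    acc = [0.0] * out_dim
    for x in features:
        phi = [1.0] * out_dim
        for W in projections:
            phi = [a * b for a, b in zip(phi, project(W, x))]
        acc = [a + b for a, b in zip(acc, phi)]
    return [a / len(features) for a in acc]


def moment_matching_penalty(feats_a, feats_b, max_order=3, out_dim=8, seed=0):
    """Sum over orders 2..max_order of squared distances between the
    approximated moments of two sets of local features (hypothetical setup)."""
    rng = random.Random(seed)
    d = len(feats_a[0])
    penalty = 0.0
    for order in range(2, max_order + 1):
        # Fixed random projections, shared by both images for this order.
        projections = [
            [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(out_dim)]
            for _ in range(order)
        ]
        ma = approx_moment(feats_a, projections)
        mb = approx_moment(feats_b, projections)
        penalty += sum((u - v) ** 2 for u, v in zip(ma, mb))
    return penalty
```

Minimizing such a penalty for pairs of similar images pushes their local features toward the same distribution rather than only matching their pooled means.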

Efficient Codebook and Factorization for Second Order Representation Learning


Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein
International Conference on Image Processing, 2019
arxiv /

To build richer representations, high-order statistics have been exploited with excellent performance, but they produce higher-dimensional features. While this drawback has been partially addressed with factorization schemes, the compactness of first-order models has never been recovered, or only at the cost of a strong drop in performance. By jointly integrating a codebook strategy into a factorization scheme, our method produces compact representations while retaining second-order performance with few additional parameters.
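
The factorization side of this trade-off can be sketched as follows: the full second-order statistic of d-dimensional local features has d² entries, whereas projecting it onto K pairs of vectors (u_k, v_k) gives K values that can be computed without ever forming the d×d matrix. This minimal sketch assumes hypothetical projection matrices `U` and `V` and leaves out the codebook part of the method.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def full_second_order(features):
    """Mean outer product M (d x d) of the local features: d * d values per image."""
    d, n = len(features[0]), len(features)
    return [[sum(x[i] * x[j] for x in features) / n for j in range(d)] for i in range(d)]


def factorized_second_order(features, U, V):
    """Compact K-dimensional projection of the second-order statistic.

    Entry k equals u_k^T M v_k, computed as the mean of (u_k . x)(v_k . x)
    over the local features x, without building M.
    """
    n = len(features)
    return [sum(dot(u, x) * dot(v, x) for x in features) / n for u, v in zip(U, V)]
```

In a learned model, `U` and `V` would be trained parameters costing 2Kd values, while each image is described by K numbers instead of d².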

Distributed optimization for deep learning with gossip exchange


Michael Blot, David Picard, Nicolas Thome, Matthieu Cord
Neurocomputing, 2019
arxiv / doi /

We address the issue of speeding up the training of convolutional neural networks by studying a distributed method adapted to stochastic gradient descent. Our parallel optimization setup uses several threads, each applying individual gradient descent steps to a local variable. We propose a new way of sharing information between threads based on gossip algorithms, which have good consensus convergence properties. Our method, called GoSGD, has the advantage of being fully asynchronous and decentralized.
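
A toy, single-process simulation of gossip-based sharing in the spirit of this setup (not the GoSGD implementation): each worker keeps a local variable `x_i` and a weight `w_i`; occasionally a worker halves its weight and pushes its variable to a random peer, which replaces its own variable by the weighted average. The quadratic local objectives, step size and exchange probability are illustrative assumptions.

```python
import random


def gossip_exchange(params, weights, sender, receiver):
    """Sum-weight gossip step: the sender keeps half of its weight and pushes
    the other half, together with its variable, to the receiver, which takes
    the weighted average of the two variables."""
    weights[sender] /= 2.0
    w_s, w_r = weights[sender], weights[receiver]
    params[receiver] = (w_r * params[receiver] + w_s * params[sender]) / (w_r + w_s)
    weights[receiver] = w_r + w_s


def gossip_sgd(targets, steps=2000, lr=0.05, p_exchange=0.2, seed=0):
    """Worker i minimizes (x - targets[i]) ** 2 / 2 on its own copy x_i and
    occasionally gossips with a random peer (needs at least two workers)."""
    rng = random.Random(seed)
    n = len(targets)
    params = [0.0] * n
    weights = [1.0 / n] * n
    for _ in range(steps):
        i = rng.randrange(n)  # asynchronous: one worker acts at a time
        params[i] -= lr * (params[i] - targets[i])  # local gradient step
        if rng.random() < p_exchange:
            j = rng.choice([k for k in range(n) if k != i])
            gossip_exchange(params, weights, i, j)
    return params, weights
```

Each exchange leaves both Σ w_i and Σ w_i x_i unchanged, so information is mixed between workers without a central parameter server or global synchronization.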

Leveraging Implicit Spatial Information in Global Features for Image Retrieval


Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein
International Conference on Image Processing, 2018
arxiv / doi /

Most image retrieval methods use global features that aggregate local distinctive patterns into a single representation. However, the aggregation process destroys the relative spatial information by considering orderless sets of local descriptors. We propose to integrate relative spatial information into the aggregation process by taking into account co-occurrences of local patterns in a tensor framework.

Jigsaw Puzzle Solving Using Local Feature Co-Occurrences in Deep Neural Networks


Marie-Morgane Paumard, David Picard, Hedi Tabia
International Conference on Image Processing, 2018
arxiv / doi /

Archaeologists are in dire need of automated object reconstruction methods. Fragment reassembly is close to puzzle solving, which computer vision algorithms can address. Since classical methods are outperformed by deep learning on most image-related tasks, we study a classification method that can solve jigsaw puzzles. In this paper, we focus on classifying the relative position of fragments: given a pair of fragments, we compute their local relation (e.g., on top). We propose several enhancements over the state of the art in this domain, which our method outperforms by 25%.

Image Reassembly Combining Deep Learning and Shortest Path Problem


Marie-Morgane Paumard, David Picard, Hedi Tabia
European Conference on Computer Vision, 2018
arxiv / doi /

This paper addresses the problem of reassembling images from disjoint fragments. More specifically, given an unordered set of fragments, we aim at reassembling one or several possibly incomplete images. The main contributions of this work are: (1) several deep neural architectures to predict the relative position of image fragments that outperform the previous state of the art; (2) casting the reassembly problem as a shortest-path problem in a graph, for which we provide several construction algorithms depending on the available information; (3) a new dataset of images taken from the Metropolitan Museum of Art (MET) dedicated to image reassembly, for which we provide a clear setup and a strong baseline.
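
One simple way to cast placement as a shortest-path problem, given as a sketch under stated assumptions: a network (not shown) outputs `probs[f][p]`, the probability that fragment `f` sits at position `p` around a fixed central fragment; each edge of the graph places one fragment at a free position with cost -log p, so the cheapest complete path is the most likely joint placement. The paper provides several graph-construction algorithms; this is only one illustrative choice, solved here with Dijkstra's algorithm.

```python
import heapq
import math


def reassemble(probs):
    """Assign each fragment to a distinct position via a shortest path.

    probs[f][p]: hypothetical probability that fragment f belongs at position p.
    A node is (number of fragments placed, set of occupied positions); placing
    the next fragment f at a free position p costs -log probs[f][p].
    Returns (positions for fragments 0..n-1, total cost).
    """
    n_frag, n_pos = len(probs), len(probs[0])
    heap = [(0.0, ())]  # (path cost, positions chosen for fragments 0..k-1)
    settled = set()
    while heap:
        cost, placement = heapq.heappop(heap)
        state = (len(placement), frozenset(placement))
        if state in settled:
            continue  # a cheaper path to this node was already expanded
        settled.add(state)
        if len(placement) == n_frag:
            # Edge costs are non-negative, so the first complete node is optimal.
            return list(placement), cost
        f = len(placement)
        for p in range(n_pos):
            if p not in placement and probs[f][p] > 0:
                heapq.heappush(heap, (cost - math.log(probs[f][p]), placement + (p,)))
    return None, math.inf
```

For instance, with `probs = [[0.5, 0.4, 0.1], [0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]`, greedily placing fragment 0 at position 0 would force a poor position for fragment 1, whereas the shortest path returns the placement `[1, 0, 2]`.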

Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings


Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord
ACM SIGIR Conference on Research and Development in Information Retrieval, 2018
arxiv / doi /

Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data and to recent advances in machine learning capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach compared to previous state-of-the-art models and present qualitative results on computational cooking use cases.
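
A common way to align two modalities in a shared space is a bidirectional triplet loss over cosine similarities, with retrieval done by ranking gallery items by similarity to the query. The sketch below assumes precomputed image and recipe embeddings where `img_emb[i]` matches `rec_emb[i]`, and a hypothetical margin of 0.3; the paper's actual learning scheme may differ.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge loss: the matching item must be more similar than the
    # non-matching one by at least `margin`.
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def bidirectional_loss(img_emb, rec_emb, margin=0.3):
    """Image->recipe plus recipe->image triplet losses over a batch,
    using every other item of the batch as a negative."""
    n = len(img_emb)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                loss += triplet_loss(img_emb[i], rec_emb[i], rec_emb[j], margin)  # image query
                loss += triplet_loss(rec_emb[i], img_emb[i], img_emb[j], margin)  # recipe query
    return loss


def retrieve(query, gallery):
    """Indices of gallery items sorted from most to least similar to the query."""
    return sorted(range(len(gallery)), key=lambda k: -cosine(query, gallery[k]))
```

Because both directions are penalized, the same shared space serves picture-to-recipe and recipe-to-picture queries.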

2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning


Diogo Luvizon, David Picard, Hedi Tabia
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
arxiv / code / doi /

Action recognition and human pose estimation are closely related, but both problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for joint 2D and 3D pose estimation from still images and human action recognition from video sequences. We show that a single architecture can solve both problems efficiently while still achieving state-of-the-art results. Additionally, we demonstrate that end-to-end optimization leads to significantly higher accuracy than separate learning. The proposed architecture can be trained seamlessly and simultaneously with data from different categories.




Design and source code from Jon Barron's website