Invited Speakers
In alphabetical order.

Tat-Jun Chin
The University of Adelaide, Australia
Title:
Visual SLAM: Why Bundle Adjust?
Abstract:
Bundle adjustment plays a vital role in feature-based monocular SLAM. In
many modern SLAM pipelines, bundle adjustment is performed to estimate the
6DOF camera trajectory and 3D map (3D point cloud) from the input feature
tracks. However, two fundamental weaknesses plague SLAM systems based on
bundle adjustment. First, the need to carefully initialise bundle
adjustment means that all variables, in particular the map, must be
estimated as accurately as possible and maintained over time, which makes
the overall algorithm cumbersome. Second, since estimating the 3D structure
(which requires sufficient baseline) is inherent in bundle adjustment, the
SLAM algorithm will encounter difficulties during periods of slow motion or
pure rotational motion. We propose a different SLAM optimisation core:
instead of bundle adjustment, we conduct rotation averaging to
incrementally optimise only camera orientations. Given the orientations, we
estimate the camera positions and 3D points via a quasi-convex formulation
that can be solved efficiently and globally optimally. Our approach not
only obviates the need to estimate and maintain the positions and 3D map at
keyframe rate (which enables simpler SLAM systems), but is also better able
to handle slow or purely rotational motions.

Angjoo Kanazawa
Google NYC, UC Berkeley
Title:
Recovering 3D meshes of humans and animals in the wild
Abstract:
The lack of ground truth 3D annotation for images in the wild is one of the
main challenges in employing a deep learning based solution for 3D
reconstruction. This is particularly so for non-rigid, dynamically moving
objects such as humans and animals, where capturing their 3D data requires
heavily instrumented environments and actors. The problem is that models
trained on images obtained in such constrained environments do not
generalize to the complexity of the real world. In this talk I will discuss
several approaches that we have taken to overcome this problem. The key
idea is to develop methods that can take advantage of unpaired data
sources, some of which contain 3D annotations and some of which only have
2D annotations, by means of analysis-by-synthesis or inverse graphics.
Specifically, I will discuss our work on recovering the 3D mesh of a person
from a single image and its extension to video, where we take advantage of
unlabeled Internet videos. Such a system can be used to train a simulated
character to learn to act by watching YouTube videos, and to build a system
that learns to predict 3D human motion from video. I will also discuss our
latest work on recovering 3D zebras in the wild, where we mine textures
from real images to create a synthetic dataset that we use to train a model
that can recover the 3D shape, pose, and texture of a zebra from a single image.
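
The analysis-by-synthesis idea above has a concrete core: predicted 3D joints are pushed through a predicted camera and compared against annotated 2D keypoints, so images with only 2D labels still provide a training signal. A minimal sketch of such a reprojection loss with a weak-perspective camera (in the spirit of the speaker's HMR line of work; tensor names are illustrative):

```python
# 2D reprojection loss: lets 2D-only annotations supervise 3D predictions.
# Predicted 3D joints are projected with a predicted weak-perspective
# camera and compared to annotated 2D keypoints. Names are illustrative.
import torch

def reprojection_loss(joints_3d, cam, keypoints_2d, visibility):
    """joints_3d:    (B, J, 3) predicted 3D joints.
    cam:          (B, 3) weak-perspective camera [scale, tx, ty].
    keypoints_2d: (B, J, 2) annotated 2D keypoints.
    visibility:   (B, J) 1 where a keypoint is annotated, else 0."""
    s = cam[:, None, 0:1]                    # (B, 1, 1) scale
    t = cam[:, None, 1:3]                    # (B, 1, 2) image translation
    projected = s * joints_3d[..., :2] + t   # drop z (orthographic), then scale
    err = (projected - keypoints_2d).abs().sum(dim=-1)   # L1 error per joint
    return (visibility * err).sum() / visibility.sum().clamp(min=1)
```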

Vladlen Koltun
Intel, USA
Title:
Open3D: A Modern Library for 3D Data Processing
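
For a flavour of what the library covers, here is a minimal example using Open3D's Python API (calls as in recent releases; the input file path is illustrative):

```python
# A small taste of Open3D: load a point cloud, downsample it, estimate
# normals, and visualize. The input file path is illustrative.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")        # read a point cloud from disk
pcd = pcd.voxel_down_sample(voxel_size=0.02)     # uniform voxel downsampling
pcd.estimate_normals(                            # per-point normals via local PCA
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
o3d.visualization.draw_geometries([pcd])         # interactive viewer
```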

Vincent Lepetit
Ecole des Ponts, France
Title:
Several Methods for 3D Reconstruction in the Wild
Abstract:
In this talk, I will present several approaches we recently developed for different 3D reconstruction-in-the-wild problems. The first approach recovers very accurate 3D meshes for objects seen in images taken in the wild. These meshes are also lightweight, as they are made of few triangles, and can easily be used in robotics and augmented reality applications. The second approach recovers accurate depth estimates of scenes from monocular images, with sharp occluding contours, thanks to a very simple technique. The last approach introduces a new paradigm for matching images captured under very different conditions (different seasons, or day and night, for example), which can be used for structure-from-motion with challenging images.
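
The abstract does not detail the new matching paradigm, but the baseline it must improve on is easy to state: match local descriptors between two images and keep only mutual nearest neighbours, a check that degrades badly under season or day/night changes. A generic sketch of that baseline (explicitly not the method from the talk):

```python
# Generic descriptor-matching baseline with the standard mutual
# nearest-neighbour check. Cross-condition matching (seasons, day/night)
# is where such baselines fail and condition-invariant descriptors are
# needed. This is NOT the paradigm from the talk.
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """desc_a: (Na, D), desc_b: (Nb, D) L2-normalized descriptors.
    Returns (K, 2) index pairs that are each other's nearest neighbour."""
    sim = desc_a @ desc_b.T               # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)            # best match in b for each a
    nn_ba = sim.argmax(axis=0)            # best match in a for each b
    idx = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx          # keep only mutual agreements
    return np.stack([idx[mutual], nn_ab[mutual]], axis=1)
```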

Marc Pollefeys
ETH Zurich, Microsoft
Title:
20 years of 3D reconstruction in the wild
Abstract:
3D reconstruction of rigid shapes such as scenes and objects can be achieved from different types of input data (3D, 2.5D, 2D). In this talk, I will present new directions using deep learning to reconstruct rigid shapes from point clouds and from images. In the first part of the talk, I will give an overview of methods aimed at learning descriptors of 3D objects from different 3D data representations, focusing in particular on point clouds. I will show how current approaches are moving towards unsupervised learning, using different architectures and learning methodologies. In the second part of the talk, I will focus instead on real-time 3D reconstruction from images. Starting from monocular depth prediction, I will show how deep learning can help monocular SLAM in reconstructing the 3D shape of a room. Finally, I will give an overview of current problems related to depth prediction for 3D reconstruction, such as layered depth image (LDI) prediction, and depth prediction and SLAM from omnidirectional images.
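
As a point of reference for the point-cloud descriptor part, the canonical architecture such methods build on is a PointNet-style encoder: a shared per-point MLP followed by a permutation-invariant max-pool that yields a global descriptor. A minimal sketch (a generic baseline, not a specific method from the talk):

```python
# Minimal PointNet-style encoder: a shared per-point MLP followed by a
# permutation-invariant max-pool produces a global descriptor of the cloud.
# Canonical baseline architecture, not a specific method from the talk.
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    def __init__(self, descriptor_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(            # applied to every point alike
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, descriptor_dim))

    def forward(self, points):
        """points: (B, N, 3) -> (B, descriptor_dim) global descriptor."""
        feats = self.mlp(points)             # (B, N, descriptor_dim)
        return feats.max(dim=1).values       # order-invariant pooling

# Usage: desc = PointCloudEncoder()(torch.randn(8, 1024, 3))  # -> (8, 256)
```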