Tianhao Zhang's Homepage

We present an approach for one-shot learning from a video of a human by using human and robot demonstration data from a variety of previous tasks to build up prior knowledge through meta-learning. Then, combining this prior knowledge and only a single video demonstration from a human, the robot can perform the task that the human demonstrated. We show experiments on both a PR2 arm and a Sawyer arm, demonstrating that after meta-learning, the robot can learn to place, push, and pick-and-place new objects using just one video of a human performing the manipulation.

We describe how consumer-grade virtual reality headsets and hand tracking hardware can be used to naturally teleoperate robots to perform complex tasks. We also describe how imitation learning can train deep neural network policies (mapping from pixels to actions) that acquire the demonstrated skills. Our experiments showcase the effectiveness of our approach for learning visuomotor skills.

We present a meta-imitation learning method that enables a robot to learn how to learn more efficiently, allowing it to acquire new skills from just a single demonstration. Unlike prior methods for one-shot imitation, our method can scale to raw pixel inputs and requires data from significantly fewer prior tasks for effective learning of new skills. Our experiments on both simulated and real robot platforms demonstrate the ability to learn new tasks, end-to-end, from a single visual demonstration.
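As a rough sketch of what "learning how to learn" can mean here, assume a gradient-based (MAML-style) meta-learning formulation. The abstract above does not spell out the objective, so every symbol below is an illustrative assumption: a policy f with parameters theta, an imitation loss L for task T_i, a step size alpha, and two demonstrations d_i^a and d_i^b of the same task.

```latex
% Inner step: adapt to task T_i from a single demonstration d_i^a
\theta_i' = \theta - \alpha \, \nabla_\theta \, \mathcal{L}_{\mathcal{T}_i}\!\left(f_\theta;\, d_i^{a}\right)

% Outer objective: the adapted policy should imitate a second demonstration d_i^b
\min_\theta \; \sum_{\mathcal{T}_i} \mathcal{L}_{\mathcal{T}_i}\!\left(f_{\theta_i'};\, d_i^{b}\right)
```

Under this reading, meta-training tunes theta so that one gradient step on a single new demonstration already yields a useful policy.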

Model predictive control (MPC) is an effective control method, but practical constraints limit it to planning over short horizons. We propose a general policy improvement scheme for MPC, hindsight iterative MPC (HIMPC), which incorporates long-term reasoning into MPC's short-horizon planning and demonstrates superior empirical performance in simulated and real contact-rich manipulation tasks.
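To make the short-horizon setting concrete, here is a minimal sketch of plain receding-horizon MPC on a toy 1-D point mass. It is not HIMPC and is not taken from the paper: the dynamics, cost, discretized action set, and horizon length are all assumptions chosen for illustration.

```python
import itertools

DT = 0.1                     # integration step (assumed)
ACTIONS = (-1.0, 0.0, 1.0)   # discretized accelerations (assumed)
HORIZON = 3                  # deliberately short planning horizon (assumed)


def step(pos, vel, acc):
    """Advance a 1-D point mass by one time step (semi-implicit Euler)."""
    vel = vel + acc * DT
    pos = pos + vel * DT
    return pos, vel


def stage_cost(pos, vel):
    """Quadratic cost that prefers resting at the origin."""
    return pos ** 2 + 0.1 * vel ** 2


def plan(pos, vel):
    """Search every action sequence over the horizon and return the
    first action of the cheapest one."""
    best_cost, best_first = float("inf"), 0.0
    for seq in itertools.product(ACTIONS, repeat=HORIZON):
        p, v, total = pos, vel, 0.0
        for acc in seq:
            p, v = step(p, v, acc)
            total += stage_cost(p, v)
        if total < best_cost:
            best_cost, best_first = total, seq[0]
    return best_first


def run_mpc(pos=1.0, vel=0.0, n_steps=100):
    """Receding-horizon loop: replan at every step, apply only one action."""
    for _ in range(n_steps):
        acc = plan(pos, vel)
        pos, vel = step(pos, vel, acc)
    return pos, vel


if __name__ == "__main__":
    print(run_mpc())
```

The planner only ever sees HORIZON steps ahead, and the cost of exhaustive search grows exponentially with that horizon. This is one simple way to see why practical MPC tends to plan short.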

We propose PLATO, an algorithm that trains complex control policies with supervised learning, using model-predictive control (MPC) to generate the supervision. PLATO uses an adaptive training method to modify the behavior of MPC to gradually match the learned policy, in order to generate training samples at states that are likely to be visited by the policy while avoiding highly undesirable on-policy actions. We prove that this type of adaptive MPC expert produces supervision that leads to good long-horizon performance of the resulting policy, and empirically demonstrate that MPC can still avoid dangerous on-policy actions in unexpected situations during training.

Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

Tianhao Zhang, Gregory Kahn, Sergey Levine, Pieter Abbeel

In the IEEE International Conference on Robotics and Automation (ICRA), 2016.
Also, in Neural Information Processing Systems (NIPS) Deep Reinforcement Learning Workshop, 2015.

Model predictive control (MPC) is crucial for underactuated systems such as autonomous aerial vehicles, but its application can be computationally demanding. We propose to combine MPC with reinforcement learning in the framework of guided policy search (GPS). The resulting neural network policy can successfully control the robot at a fraction of the computational cost of MPC.

Deterministic neural networks (DNNs) have been shown to be effective policies with good generalization in robot control. However, their determinism restricts DNNs to modelling only uni-modal controls. Training DNNs on multi-modal controls can be inefficient or produce meaningless results. In contrast, stochastic neural networks (SNNs) can learn one-to-many mappings. In this paper, we introduce SNNs as control policies and extend an existing learning algorithm for feed-forward SNNs to recurrent ones.
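A toy illustration of the uni-modal limitation follows. It does not implement an SNN; it assumes a hypothetical steering scenario with two equally valid demonstrated actions, and compares a least-squares deterministic fit against simply sampling from the demonstrations.

```python
import random


def fit_deterministic(targets):
    """Least-squares optimal constant prediction, i.e. the sample mean."""
    return sum(targets) / len(targets)


def sample_stochastic(targets, rng):
    """Toy stochastic policy: draw an action from the empirical distribution."""
    return rng.choice(targets)


def demo():
    # Hypothetical data: pass an obstacle by steering left (-1) or right (+1).
    demonstrations = [-1.0] * 50 + [1.0] * 50
    rng = random.Random(0)
    deterministic_action = fit_deterministic(demonstrations)
    stochastic_actions = [sample_stochastic(demonstrations, rng) for _ in range(10)]
    return deterministic_action, stochastic_actions


if __name__ == "__main__":
    print(demo())
```

The deterministic fit averages the two modes and steers straight at the obstacle, while every sampled action is one of the demonstrated modes. A one-to-many mapping is what lets a stochastic policy avoid that averaging.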

We propose an algorithm to automatically identify window regions on exterior-facing building facades in a colored 3D point cloud generated using data captured from an ambulatory backpack sensor system outfitted with multiple LiDAR sensors and cameras. Our work is based on an R-CNN-inspired algorithm with novel filtering and preprocessing techniques. We use multiscale combinatorial grouping (MCG) for region proposal generation, pass the proposals to a convolutional neural network (CNN), and train a random forest on the CNN output vectors.

Spring 2016 - CS188

Head Student Instructor

Spring 2015 - CS188.1x (MOOC)

Course Moderator

Education

University of California, Berkeley

Aug 2016 - present

PhD, Electrical Engineering and Computer Science

Advisor: Pieter Abbeel

University of California, Berkeley

Aug 2012 - May 2016

B.A., Computer Science and Statistics

Cumulative GPA: 3.92

Selected Coursework: (2xx - graduate courses)

(CS287) Advanced Robotics (A+; Rank: 2/34)

(Stat241A/CS281A) Statistical Learning Theory (A+)

(CS189) Machine Learning (A+; Rank: 1/297)

(CS188) Artificial Intelligence (A+)

(CS280) Computer Vision (A)

(CS288) Natural Language Processing (A)

(CS170) Efficient Algorithms (A)

(CS294-12) Deep Reinforcement Learning (A)

(Math110) Linear Algebra (A)

(Math104) Real Analysis (A)

Honors:

EECS Honors Degree Graduate (expected)

Dean's Honors List (five semesters)

Personal

Piano

I'm a fan of Chopin. Here are some of my recordings (list to be expanded):

Nocturne in E-flat major, Op. 9 No. 2 YouTube Sheet

Nocturne in F minor, Op. 55 No. 1 YouTube Sheet

Nocturne in B-flat minor, Op. 9 No. 1 (in the works)

Polonaise in A major, Op. 40 No. 1 (a.k.a. Military Polonaise) (in the works)

Photography

Coming soon. (in fact, maybe much later...)

Contact

Email: tianhao.z AT eecs.berkeley.edu

Publications

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

Tianhe Yu, Chelsea Finn, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine

Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation

Tianhao Zhang*, Zoe McCarthy*, Owen Jow, Dennis Lee, Xi (Peter) Chen, Ken Goldberg, Pieter Abbeel

One-Shot Visual Imitation Learning via Meta-Learning

Chelsea Finn*, Tianhe Yu*, Tianhao Zhang, Pieter Abbeel, Sergey Levine

Learning from the Hindsight Plan -- Episodic MPC Improvement

Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel

PLATO: Policy Learning using Adaptive Trajectory Optimization

Gregory Kahn, Tianhao Zhang, Sergey Levine, Pieter Abbeel

Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

Tianhao Zhang, Gregory Kahn, Sergey Levine, Pieter Abbeel

Projects

Towards Stochastic Neural Network Control Policies

Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN

Craig Hiller, David Zhang, Tianhao Zhang, Zihao Zhang

Teaching

UC Berkeley CS188 (Introduction to Artificial Intelligence)

Instructors: Prof. Pieter Abbeel and Prof. Anca Dragan

Spring 2016 - CS188

Spring 2015 - CS188.1x (MOOC)
