
I'm Tianhao Zhang, a first-year PhD student in EECS at the University of California, Berkeley, advised by Pieter Abbeel. Before that, I was an undergraduate at Berkeley with a double major in Computer Science and Statistics, where I worked as a research assistant at the Berkeley Artificial Intelligence Research (BAIR) Lab under the supervision of Pieter Abbeel and Sergey Levine.


Sep 15, 2016 New preprint: Learning from the Hindsight Plan -- Episodic MPC Improvement
Aug 29, 2016 Excited to officially join Prof. Pieter Abbeel's group as a PhD student at UC Berkeley!
Jun 6, 2016 I started my summer research internship at Microsoft Research Redmond, working with Lihong Li, Ming-Wei Chang, Wen-Tau Yih, Chong Wang, and Dengyong Zhou.
May 17, 2016 I gave a short spotlight talk at ICRA 2016 about our work MPC-GPS. [slides]
Mar 2, 2016 New preprint: Policy Learning using Adaptive Trajectory Optimization.
Jan 19, 2016 I was appointed the Head (U)GSI for CS188.
Jan 14, 2016 Our paper on MPC-GPS was accepted to ICRA 2016.
Oct 29, 2015 Our paper on MPC-GPS was accepted to the Deep Reinforcement Learning Workshop at NIPS 2015.


Learning from the Hindsight Plan -- Episodic MPC Improvement

Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel

In the IEEE International Conference on Robotics and Automation (ICRA), 2017.


Model predictive control (MPC) is an effective control method, but practical constraints limit it to planning over short horizons. We propose a general policy improvement scheme for MPC, hindsight iterative MPC (HIMPC), which incorporates long-term reasoning into MPC's short-horizon planning and demonstrates superior empirical performance on simulated and real contact-rich manipulation tasks.
arXiv Video Website

PLATO: Policy Learning using Adaptive Trajectory Optimization

Gregory Kahn, Tianhao Zhang, Sergey Levine, Pieter Abbeel

In the IEEE International Conference on Robotics and Automation (ICRA), 2017.


We propose PLATO, an algorithm that trains complex control policies with supervised learning, using model-predictive control (MPC) to generate the supervision. PLATO uses an adaptive training method to modify the behavior of MPC to gradually match the learned policy, in order to generate training samples at states that are likely to be visited by the policy while avoiding highly undesirable on-policy actions. We prove that this type of adaptive MPC expert produces supervision that leads to good long-horizon performance of the resulting policy, and empirically demonstrate that MPC can still avoid dangerous on-policy actions in unexpected situations during training.
arXiv Video Website

Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

Tianhao Zhang, Gregory Kahn, Sergey Levine, Pieter Abbeel

In the IEEE International Conference on Robotics and Automation (ICRA), 2016.
Also in the Neural Information Processing Systems (NIPS) Deep Reinforcement Learning Workshop, 2015.


Model predictive control (MPC) is crucial for underactuated systems such as autonomous aerial vehicles, but its application can be computationally demanding. We propose to combine MPC with reinforcement learning in the framework of guided policy search (GPS). The resulting neural network policy can successfully control the robot at a fraction of the computational cost of MPC.
arXiv Video Slides


Towards Stochastic Neural Network Control Policies

UC Berkeley CS287 (Advanced Robotics) and CS281A (Statistical Learning Theory) Final Project (Fall 2015).


Deterministic neural networks (DNNs) have been shown to be effective policies with good generalization in robot control. However, their determinism restricts DNNs to modeling only uni-modal controls; training DNNs on multi-modal controls can be inefficient or yield poor results. In contrast, stochastic neural networks (SNNs) are able to learn one-to-many mappings. In this paper, we introduce SNNs as control policies and extend an existing learning algorithm for feed-forward SNNs to recurrent ones.
PDF Poster

Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN

Craig Hiller, David Zhang, Tianhao Zhang, Zihao Zhang

UC Berkeley CS280 (Computer Vision) Final Project (Spring 2015).


We propose an algorithm to automatically identify window regions on exterior-facing building facades in a colored 3D point cloud generated from data captured by an ambulatory backpack sensor system outfitted with multiple LiDAR sensors and cameras. Our work is based on an R-CNN-inspired algorithm with novel filtering and preprocessing techniques: we use multiscale combinatorial grouping (MCG) to generate region proposals, pass the proposals to a convolutional neural network (CNN), and train a random forest on the CNN output vectors.
PDF


UC Berkeley CS188 (Introduction to Artificial Intelligence)

Instructors: Prof. Pieter Abbeel and Prof. Anca Dragan

Spring 2016 - CS188

Head Student Instructor

Spring 2015 - CS188.1x (MOOC)

Course Moderator



University of California, Berkeley
Aug 2016 - present
PhD, Electrical Engineering and Computer Science
Advisor: Pieter Abbeel

University of California, Berkeley
Aug 2012 - May 2016
B.A., Computer Science and Statistics
Cumulative GPA: 3.92
Selected Coursework: (2xx - graduate courses)
(CS287) Advanced Robotics (A+; Rank: 2/34)
(Stat241A/CS281A) Statistical Learning Theory (A+)
(CS189) Machine Learning (A+; Rank: 1/297)
(CS188) Artificial Intelligence (A+)
(CS280) Computer Vision (A)
(CS288) Natural Language Processing (A)
(CS170) Efficient Algorithms (A)
(CS294-12) Deep Reinforcement Learning (A)
(Math110) Linear Algebra (A)
(Math104) Real Analysis (A)
EECS Honors Degree Graduate (expected)
Dean's Honors List (five semesters)


I'm a fan of Chopin's nocturnes. Here are some of my recordings (list to be expanded):


Email: tianhao.z AT