E-mail : yifei.xu@weride.ai
Cell Phone : +1 (424) 535-6710
Personal Website : https://yfxu.top
The Center for Vision, Cognition, Learning, and Autonomy (VCLA) is affiliated with the Departments of Statistics and Computer Science at UCLA. The center starts from computer vision and expands to other disciplines. Its objective is to pursue a unified framework for representation, learning, inference and reasoning, and to build intelligent computer systems for real-world applications.
Professor Wu is a professor of Statistics at the University of California, Los Angeles. He is interested in statistical modeling, computing and learning. In particular, he is interested in generative models and unsupervised learning.
Shanghai Jiao Tong University
B. S. Eng. in Computer Science, Zhiyuan College
SJTU Excellent Bachelor's Degree Thesis (Top 1% in 3600 Undergraduates)
The ACM Honored Class is a pilot computer science class in China.
Over the past 10 years, ACM Class students have received hundreds of honors and awards. ACM Class students won the ACM International Collegiate Programming Contest (ICPC) World Championship three times, in 2002, 2005 and 2010.
ACM Class students have published more than 40 academic papers as first authors at NIPS, WWW, SIGIR, SIGMOD, SIGKDD, ICML, AAAI and other important international conferences and journals.
Zhiyuan College, within Shanghai Jiao Tong University, is an institute that provides an elite education for its students. It aims to train them to become future leaders in science and technology.
To be admitted to Zhiyuan College, a student must rank at the top of more than 17,000 undergraduate students at SJTU. Currently, 461 students are enrolled in Zhiyuan College.
By September 2016, 359 students had graduated from Zhiyuan College: 327 (91%) went on to pursue further studies, 273 (76%) were admitted to universities in the top 100 of the QS World University Rankings 2016, and 250 (70%) went on to pursue Ph.D. degrees.
Shanghai Jiao Tong University (SJTU), one of the higher education institutions in China with the longest history and a world-renowned reputation, is a key university directly under the administration of the Ministry of Education (MOE) of the People's Republic of China and co-constructed by the MOE and the Shanghai Municipal Government. SJTU has become a comprehensive, research-oriented, and internationalized top university in China.
GPA: 3.83 / 4.0 (Major GPA) (A+ = 4.3)
The CSST office administers the CSST Summer Program, which brings outstanding third-year undergraduate students interested in Ph.D. studies, nominated by top-tier universities in the People's Republic of China (PRC) and Japan, to conduct 10 weeks of intensive research training with UCLA faculty mentors. This 10-week program offers emerging scholars premier research training in a cutting-edge scientific environment that fosters cross-disciplinary collaborations.
GPA: 4.0 / 4.0
Research Intern
Paper Abstract
This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a non-linear generalization of factor analysis. In this model, the mapping from the latent factors to the observed vector is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates between the following two steps: (1) Inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent. (2) Learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent.
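For intuition, here is a minimal PyTorch sketch of the two alternating steps described above. The toy generator architecture, step sizes, number of Langevin steps, and the noise level sigma are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of alternating back-propagation (assumed toy setup).
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps latent factors z to an observed vector x (non-linear factor analysis)."""
    def __init__(self, latent_dim=8, data_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, data_dim),
        )
    def forward(self, z):
        return self.net(z)

def langevin_infer_z(g, x, z, steps=20, step_size=0.1, sigma=0.3):
    """Inferential back-propagation: infer latent factors by Langevin dynamics."""
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        # log joint (up to a constant): Gaussian reconstruction term + N(0, I) prior on z
        log_joint = -((x - g(z)) ** 2).sum() / (2 * sigma ** 2) - (z ** 2).sum() / 2
        grad = torch.autograd.grad(log_joint, z)[0]
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()

def train(x_batch, latent_dim=8, epochs=100):
    g = Generator(latent_dim, x_batch.shape[1])
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)
    z = torch.randn(x_batch.shape[0], latent_dim)      # latent factors, one per example
    for _ in range(epochs):
        z = langevin_infer_z(g, x_batch, z)            # step 1: inferential back-propagation
        loss = ((x_batch - g(z)) ** 2).mean()          # step 2: learning back-propagation
        opt.zero_grad(); loss.backward(); opt.step()
    return g, z

# toy usage: 128 observations of dimension 64
g, z = train(torch.randn(128, 64))
```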
Paper Abstract
This paper proposes a framework for generative learning of hierarchical structure of visual objects, based on training hierarchical random field models. The resulting model, which we call structured sparse FRAME model, is a straightforward variation on decomposing the original sparse FRAME model into multiple parts that are allowed to shift their locations, orientations and scales, so that the resulting model becomes a reconfigurable template.
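For reference, a schematic form of the sparse FRAME density and its structured variant, written from my reading of the sparse FRAME literature; the notation (selected wavelets B_{x_i,s_i,alpha_i}, rectification h, part-level displacements Delta_k) is assumed rather than quoted from the paper.

```latex
% Schematic only: notation assumed, not quoted from the paper.
% Sparse FRAME: a log-linear model on a sparse set of selected wavelets
% B_{x_i, s_i, \alpha_i} (location x_i, scale s_i, orientation \alpha_i),
% with a rectification h(\cdot) and reference distribution q(I):
p(I;\, B, \lambda) \;=\; \frac{1}{Z(\lambda)}
  \exp\!\Big( \sum_{i=1}^{n} \lambda_i \, h\big(\langle I,\, B_{x_i, s_i, \alpha_i} \rangle\big) \Big)\, q(I)

% Structured version: the wavelets are grouped into parts, and each part k may
% shift its location, scale and orientation by (\Delta x_k, \Delta s_k, \Delta\alpha_k),
% which turns the template into a reconfigurable one:
p(I;\, B, \lambda, \Delta) \;=\; \frac{1}{Z(\lambda)}
  \exp\!\Big( \sum_{k} \sum_{i \in \text{part}\, k}
  \lambda_i \, h\big(\langle I,\, B_{x_i + \Delta x_k,\; s_i + \Delta s_k,\; \alpha_i + \Delta \alpha_k} \rangle\big) \Big)\, q(I)
```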
Research Intern
Paper Abstract
The core of a self-supervised learning method for pre-training language models includes the design of appropriate data augmentation and corresponding pre-training task(s). Most data augmentations in language model pre-training are context-independent. The seminal contextualized augmentation recently proposed by the ELECTRA requires a separate generator, which leads to extra computation cost as well as the challenge in adjusting the capability of its generator relative to that of the other model component(s). We propose a self-augmented strategy (SAS) that uses a single forward pass through the model to augment the input data for model training in the next epoch. Essentially our strategy eliminates a separate generator network and uses only one network to generate the data augmentation and undertake two pre-training tasks (the MLM task and the RTD task) jointly, which naturally avoids the challenge in adjusting the generator's capability as well as reduces the computation cost. Additionally, our SAS is a general strategy such that it can seamlessly incorporate many new techniques emerging recently or in the future, such as the disentangled attention mechanism recently proposed by the DeBERTa model. Our experiments show that our SAS is able to outperform the ELECTRA and other state-of-the-art models in the GLUE tasks with the same or less computation cost.
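As a concrete illustration of the single-network idea, here is a minimal PyTorch sketch: one shared encoder carries both an MLM head and an RTD head, and the MLM predictions from each epoch's forward pass are sampled to build the corrupted input for the next epoch. The tiny encoder, vocabulary, masking rate and loss weight are assumptions for illustration, not the SAS paper's architecture or hyper-parameters.

```python
# Minimal sketch of the single-network MLM + RTD self-augmentation idea (assumed toy setup).
import torch
import torch.nn as nn

VOCAB, MASK_ID, D = 1000, 0, 64

class SingleNetwork(nn.Module):
    """One shared encoder with two heads: MLM (generates) and RTD (discriminates)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mlm_head = nn.Linear(D, VOCAB)   # predict the original token
        self.rtd_head = nn.Linear(D, 1)       # predict replaced vs. original

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))
        return self.mlm_head(h), self.rtd_head(h).squeeze(-1)

model = SingleNetwork()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()

originals = torch.randint(1, VOCAB, (8, 32))              # toy batch of sequences
corrupt_pos = torch.rand(originals.shape) < 0.15          # positions selected for corruption
augmented = originals.masked_fill(corrupt_pos, MASK_ID)   # epoch 0: plain masking

for epoch in range(3):
    mlm_logits, rtd_logits = model(augmented)             # one forward pass, two tasks
    mlm_loss = ce(mlm_logits[corrupt_pos], originals[corrupt_pos])    # recover original tokens
    rtd_loss = bce(rtd_logits, (augmented != originals).float())      # detect replaced tokens
    loss = mlm_loss + 10.0 * rtd_loss                     # loss weight is an assumption
    opt.zero_grad(); loss.backward(); opt.step()

    # Self-augmentation: reuse this pass's MLM predictions to build the corrupted
    # input for the next epoch -- no separate generator network is needed.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=mlm_logits).sample()
        corrupt_pos = torch.rand(originals.shape) < 0.15
        augmented = torch.where(corrupt_pos, sampled, originals)
```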
Research Intern
Paper Abstract
We propose a generative model of point clouds in the forms of an energy-based model, where the energy function is parameterized by an input-permutation-invariant bottom-up neural network. The energy function learns a coordinate encoding of each point and then aggregates all individual point features into an energy for the whole point cloud. We show that our model can be derived from the discriminative PointNet. The model is trained by MCMC-based maximum likelihood learning (as well as its variants), without the help of any assisting networks like those in GANs and VAEs. Our model does not rely on hand-crafting distance metric for point clouds in generation. It synthesizes point clouds that match to the observed examples. The learned point cloud representation can be useful for point cloud classification. Experiments demonstrate the advantages of the proposed model. Furthermore, we can learn a short-run MCMC toward the energy-based model as a flow-like generator for point cloud reconstruction and interpretation.
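The following PyTorch sketch illustrates the two ingredients described above: a permutation-invariant energy (a shared per-point network followed by symmetric pooling) and short-run Langevin sampling used in a maximum-likelihood-style update. Network sizes, the pooling choice and the MCMC hyper-parameters are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of a point-cloud energy-based model with short-run Langevin sampling.
import torch
import torch.nn as nn

class PointCloudEnergy(nn.Module):
    """E(X): per-point encoding followed by symmetric pooling -> scalar energy."""
    def __init__(self, point_dim=3, hidden=128):
        super().__init__()
        self.point_net = nn.Sequential(       # applied to every point independently
            nn.Linear(point_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)      # energy of the whole point cloud

    def forward(self, x):                     # x: (batch, num_points, point_dim)
        feats = self.point_net(x)             # (batch, num_points, hidden)
        pooled = feats.mean(dim=1)            # permutation-invariant aggregation
        return self.head(pooled).squeeze(-1)  # (batch,)

def short_run_langevin(energy, x, steps=30, step_size=0.01):
    """Synthesize point clouds by gradient-based MCMC on the energy landscape."""
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(x)
    return x.detach()

# One maximum-likelihood-style update on toy data: lower the energy of observed
# clouds, raise the energy of synthesized clouds.
energy = PointCloudEnergy()
opt = torch.optim.Adam(energy.parameters(), lr=1e-4)
observed = torch.randn(16, 256, 3)                       # toy observed point clouds
synthesized = short_run_langevin(energy, torch.randn_like(observed))
loss = energy(observed).mean() - energy(synthesized).mean()
opt.zero_grad(); loss.backward(); opt.step()
```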
Research Intern
Paper Abstract
Autonomous driving is a challenging multiagent domain which requires optimizing complex, mixed cooperative-competitive interactions. Learning to predict contingent distributions over other vehicles' trajectories simplifies the problem, allowing approximate solutions by trajectory optimization with dynamic constraints. We take a model-based approach to prediction, in order to make use of structured prior knowledge of vehicle kinematics, and the assumption that other drivers plan trajectories to minimize an unknown cost function. We introduce a novel inverse optimal control (IOC) algorithm to learn other vehicles' cost functions in an energy-based generative model. Langevin Sampling, a Monte Carlo based sampling algorithm, is used to directly sample the control sequence. Our algorithm provides greater flexibility than standard IOC methods, and can learn higher-level, non-Markovian cost functions defined over entire trajectories. We extend weighted feature-based cost functions with neural networks to obtain NN-augmented cost functions, which combine the advantages of both model-based and model-free learning. Results show that model-based IOC can achieve state-of-the-art vehicle trajectory prediction accuracy, and naturally take scene information into account.
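To make the sampling step concrete, here is a minimal PyTorch sketch of drawing a control sequence by Langevin dynamics under a trajectory cost. The unicycle-style kinematics and the hand-specified quadratic cost stand in for the paper's learned (NN-augmented) cost; all hyper-parameters are illustrative assumptions.

```python
# Minimal sketch of Langevin sampling over a control sequence under a trajectory cost.
import torch

def rollout(x0, controls, dt=0.1):
    """Unroll simple kinematics: state = (x, y, heading, speed), control = (accel, yaw rate)."""
    states, x = [], x0
    for u_t in controls:                       # controls: (T, 2)
        px, py, th, v = x
        x = torch.stack([px + v * torch.cos(th) * dt,
                         py + v * torch.sin(th) * dt,
                         th + u_t[1] * dt,
                         v + u_t[0] * dt])
        states.append(x)
    return torch.stack(states)                 # (T, 4)

def cost(states, controls, goal):
    """Energy of a whole trajectory (here: distance to goal + control effort)."""
    return ((states[-1, :2] - goal) ** 2).sum() + 0.1 * (controls ** 2).sum()

def langevin_controls(x0, goal, T=20, steps=50, step_size=0.05):
    """Directly sample the control sequence with gradient-based MCMC."""
    u = torch.zeros(T, 2)
    for _ in range(steps):
        u = u.detach().requires_grad_(True)
        energy = cost(rollout(x0, u), u, goal)
        grad = torch.autograd.grad(energy, u)[0]
        u = u - 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(u)
    return u.detach()

x0 = torch.tensor([0.0, 0.0, 0.0, 1.0])        # start at origin, heading east at 1 m/s
u = langevin_controls(x0, goal=torch.tensor([5.0, 2.0]))
print(rollout(x0, u)[-1])                      # final state under the sampled controls
```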
Research Assistant
This was a competition held by Alibaba. The goal is to retrieve the most similar picture for a given query picture from a database of one million web pictures. Our model has three parts: saliency detection, CNN classification, and text matching. I was in charge of saliency detection and classification. Our team ranked in the top 16 of the competition (over 2,000 teams). A rough sketch of the underlying retrieval idea follows below.
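For flavor, a minimal sketch of content-based retrieval with a global CNN feature and cosine similarity; the pretrained ResNet backbone here is an illustrative stand-in, not the competition system's saliency detection + classification + text matching pipeline.

```python
# Minimal sketch of CNN-feature image retrieval (assumed stand-in pipeline).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()              # keep the 512-d global feature
backbone.eval()

# ImageNet mean/std normalization omitted for brevity.
preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

@torch.no_grad()
def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return torch.nn.functional.normalize(backbone(img), dim=1)

def retrieve(query_path, gallery_paths, top_k=5):
    query = embed(query_path)
    gallery = torch.cat([embed(p) for p in gallery_paths])        # (N, 512)
    scores = (gallery @ query.T).squeeze(1)                       # cosine similarity
    return [gallery_paths[i] for i in scores.topk(top_k).indices.tolist()]
```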
Paper Abstract
This paper proposes a novel approach to meet users' multi-dimensional requirements in clothing image retrieval. We propose the Hybrid Topic (HT) model to learn the intricate semantic representation of the descriptors above. The model provides an effective multi-dimensional representation of clothes and is able to perform automatic image annotation by probabilistic reasoning from image search. Furthermore, we develop a demand-adaptive retrieval strategy which refines users' specific requirements and removes users' unwanted features. Our experiments show that the HT method significantly outperforms the deep neural network methods.
Research Intern
This work is based on "Joint Cascade Face Detection and Alignment" and "Unconstrained Face Alignment via Cascaded Compositional Learning". We aim to add domain partitioning to the joint cascade face detection and alignment method.
Yifei Xu, Zeng Huang, Ying Nian Wu, Sergey Tulyakov, "Energy-based Implicit Function for 3D Shape Representation," in review.
Jianwen Xie, Yaxuan Zhu, Yifei Xu, Dingcheng Li, Ping Li, "Generative Learning with Latent Space Flow-based Prior Model," in Proc. 37th AAAI Conference on Artificial Intelligence (AAAI), 2023.
Yifei Xu†, Jingqiao Zhang†, Ru He†, Liangzhu Ge†, Chao Yang, Cheng Yang, Ying Nian Wu, "SAS: Self-Augmented Strategy for Language Model Pre-training," in Proc. 36th AAAI Conference on Artificial Intelligence (AAAI), 2022.
Yifei Xu†, Jianwen Xie†, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu, "Generative PointNet: Deep Energy-Based Learning on Point Sets for 3D Generation and Reconstruction," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Yifei Xu, Jianwen Xie, Tianyang Zhao, Chris Baker, Yibiao Zhao, Ying Nian Wu, "Energy-Based Continuous Inverse Optimal Control," IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022; NeurIPS Workshop on Machine Learning for Autonomous Driving, 2020.
Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou Wang, Ying Nian Wu, "Convolutional Spatial Fusion for Multi-Agent Trajectory Prediction," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Jianwen Xie, Yifei Xu, Erik Nijkamp, Ying Nian Wu, Song-Chun Zhu, "Generative Hierarchical Structure Learning of Sparse FRAME Models," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Zhengzhong Zhou, Yifei Xu, Jingjin Zhou, Liqing Zhang, "Interactive Image Search for Clothing Recommendation," The 24th ACM International Conference on Multimedia (ACM MM), 2016.
Academic Excellence Scholarship at SJTU Prize B, C, B (Top 10%, 20%, 10% in University)
Interdisciplinary Contest In Modeling 2016 Meritorious
UCLA CSST Scholarship and CSST Award (2 in CSST Program CS Major)
'ele' Scholarship for outstanding CS students (6 in university each year)
'YuanKang' Scholarship for outstanding research (5 in university each year)
SJTU Excellent Bachelor's Degree Thesis (Top 1% in 3600 Undergraduates)
Project of "Programming"
Project of "Programming Practice"
Project of "Data Struct" which include AVL tree, Hashmap, Linklist, etc.
A compiler which transform C code into MIPS code.
A virus runs on Linux in order to have the super authority.
Compress a trajectory with lossless and lossy method.
The implementation of "Recognizing Implicit Discourse Relations in the Penn Discourse Treebank".
Multiple machine learning algorithms including Regression, Clustering, Boosting, MCMC, VAE, GAN, EBM, EM ...
Multi-label text classification via ELMo and label attention
Including clustering algorithms, Monte Carlo algorithms, VAE, GAN, DCGAN, EBM, ABP, the EM algorithm, etc.