- Research Interests
- Current Projects
- Previous Projects
I am interested in many different aspects of Computer Vision:
- Large-scale representation learning on 3D point clouds
- Instance segmentation and object detection in 3D point clouds
- Generative models for video (frame prediction and interpolation)
- Representation learning and feature disentanglement
- Statistical Learning
Object Detection in Point Clouds
3D Reconstruction from a Single Image
High-quality single-view 3D reconstruction using implicit functions and local features. Please refer to our publication [project site].
Fast Adaptive Meta-Learning Based Branching for Mixed Integer Programming
We propose a meta-learning framework for variable selection in the branch-and-bound algorithm for Mixed Integer Programming (MIP). Using meta-learning, we train a base model on samples from various MIP problems to mimic the general variable selection behavior of Strong Branching (SB). The base model can also adapt to mimic the behavior of SB on a specific MIP instance, and it is easily updated from the current state every so often during the branch-and-bound search. Experiments on benchmark instances indicate that our model produces more accurate variable rankings than the state-of-the-art method using SVM ranking [Khalil et al., 2016], even when trained with fewer online training samples.
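As a rough illustration of the instance-specific adaptation step, the sketch below fine-tunes a base scorer on a handful of samples labeled with SB scores. It is not the model from the project: the linear scorer, the pairwise logistic ranking loss, and the names `adapt` and `ranking_accuracy` are all assumptions chosen to keep the example small.

```python
import numpy as np

def _preferred_pairs(sb_scores):
    # Index pairs (i, j) where Strong Branching scores candidate i above j.
    return np.where(sb_scores[:, None] > sb_scores[None, :])

def pairwise_loss_and_grad(w, X, sb_scores):
    """Pairwise logistic ranking loss for a linear scorer s = X @ w (assumed model)."""
    i_idx, j_idx = _preferred_pairs(sb_scores)
    s = X @ w
    margin = s[i_idx] - s[j_idx]
    loss = np.mean(np.log1p(np.exp(-margin)))
    # d/dw log(1 + exp(-m)) = -sigmoid(-m) * (x_i - x_j)
    coeff = -1.0 / (1.0 + np.exp(margin))
    grad = (coeff[:, None] * (X[i_idx] - X[j_idx])).mean(axis=0)
    return loss, grad

def adapt(w_base, X, sb_scores, lr=0.5, steps=200):
    """Adapt the base weights to one instance with a few gradient steps."""
    w = np.array(w_base, dtype=float)
    for _ in range(steps):
        _, grad = pairwise_loss_and_grad(w, X, sb_scores)
        w -= lr * grad
    return w

def ranking_accuracy(w, X, sb_scores):
    """Fraction of SB-preferred pairs that the scorer orders the same way."""
    i_idx, j_idx = _preferred_pairs(sb_scores)
    s = X @ w
    return float(np.mean(s[i_idx] > s[j_idx]))

# Synthetic stand-in for candidate-variable features and their SB scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
sb_scores = X @ rng.normal(size=5)
w_adapted = adapt(np.zeros(5), X, sb_scores)
```

In this toy setting the "base model" is just a zero weight vector; in a meta-learning setup it would be the weights learned across many MIP problems, and `adapt` would be called again whenever new SB-labeled samples are collected during the search.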
Stochastic Dynamics for Video Infilling [paper] [project site]
We introduce a stochastic generation framework (SDVI) to infill long intervals in video sequences. SDVI consists of two parts: (1) a bi-directional constraint propagation to guarantee the spatial-temporal coherency among frames, (2) a stochastic sampling process to generate dynamics from the inferred distributions. Experimental results show that SDVI can generate clear and varied sequences.
Variation Disentanglement Siamese Net [project site] [code]
A project studying unsupervised disentanglement of image representations, supervised by Prof. Shih-Fu Chang:
- Designed a semi-supervised network that disentangles the label-related and variance representation of a manifold.
- Introduced a Shannon-Entropy based adversarial training counterpart to help the disentanglement, expelling any label-related information from the variance representation.
- Set up recognition and reconstruction tasks to keep the feature embedding rich.
- Implemented the proposed network and ran experiments on both the MNIST and CASIA Face datasets. Conducted disentanglement, reconstruction, and cross-stitch evaluations, as well as t-SNE clustering, on the learned representations.
Figures: model; feature space interpolation; feature space clustering.
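One way to read the Shannon-entropy adversarial term is as follows: a classifier tries to predict the label from the variance representation, and the encoder is rewarded when that classifier's output distribution has high entropy, meaning it carries no usable label information. The sketch below only computes that entropy from classifier logits. The function names and the exact form of the objective are assumptions for illustration, not the project's code.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_entropy(logits, eps=1e-12):
    """Average Shannon entropy (in nats) of the predicted class distributions."""
    p = softmax(logits)
    return float(-(p * np.log(p + eps)).sum(axis=1).mean())

def encoder_adversarial_loss(logits):
    # Assumed form: the encoder minimizes negative entropy, which pushes
    # the label classifier toward uniform predictions on the variance code.
    return -mean_entropy(logits)

# Logits a label classifier might produce from the variance representation.
uninformative = np.zeros((4, 10))   # uniform predictions over 10 classes
informative = np.zeros((4, 10))
informative[:, 0] = 100.0           # confident predictions: label info leaked
```

The entropy is largest, `log(num_classes)`, when the predictions are uniform, so a variance representation that leaks no label information attains the lowest value of this loss.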
The project is led by Prof. John Kender and studies the relationship between speakers' gestures and audience attention. I have:
- Designed and implemented a distance matching algorithm based on “Time Warp Edit Distance” (TWED) between every pair of eye-tracking trajectories to obtain an eye-tracking distance matrix.
- Tuned the hyper-parameters of the TWED-based algorithm by comparing eye movement trails.
- Clustered subjects using “fast multi-scale detection of relevant communities” on the distance matrix.
Figures: attention fixation; eye trajectory and correlation.
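The pairwise distance step can be sketched as below, following the standard dynamic-programming formulation of Time Warp Edit Distance with a Euclidean point distance. This is a minimal sketch, not the project's implementation: the function names, the input layout (a list of `(points, timestamps)` pairs), and the default values of the stiffness `nu` and deletion penalty `lam` are placeholders.

```python
import numpy as np

def twed(a, ta, b, tb, nu=0.001, lam=1.0):
    """Time Warp Edit Distance between two timestamped trajectories.

    a, b   : sequences of points (scalars or d-dimensional gaze positions)
    ta, tb : increasing, non-negative timestamps of the same lengths
    nu     : stiffness (weight of time differences), placeholder default
    lam    : constant penalty for a delete operation, placeholder default
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    # Pad both series with a zero sample at time 0, as in the usual definition.
    a = np.vstack([np.zeros((1, a.shape[1])), a])
    b = np.vstack([np.zeros((1, b.shape[1])), b])
    ta = np.concatenate([[0.0], np.asarray(ta, dtype=float)])
    tb = np.concatenate([[0.0], np.asarray(tb, dtype=float)])

    n, m = len(a), len(b)
    D = np.full((n, m), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            delete_a = (D[i - 1, j] + np.linalg.norm(a[i] - a[i - 1])
                        + nu * (ta[i] - ta[i - 1]) + lam)
            delete_b = (D[i, j - 1] + np.linalg.norm(b[j] - b[j - 1])
                        + nu * (tb[j] - tb[j - 1]) + lam)
            match = (D[i - 1, j - 1]
                     + np.linalg.norm(a[i] - b[j])
                     + np.linalg.norm(a[i - 1] - b[j - 1])
                     + nu * (abs(ta[i] - tb[j]) + abs(ta[i - 1] - tb[j - 1])))
            D[i, j] = min(delete_a, delete_b, match)
    return float(D[n - 1, m - 1])

def twed_distance_matrix(trajectories, nu=0.001, lam=1.0):
    """Symmetric matrix of TWED values for a list of (points, timestamps) pairs."""
    k = len(trajectories)
    M = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            (a, ta), (b, tb) = trajectories[i], trajectories[j]
            M[i, j] = M[j, i] = twed(a, ta, b, tb, nu, lam)
    return M

# Two toy gaze trajectories sampled at the same timestamps.
t = [0.0, 1.0, 2.0, 3.0]
horizontal = [[0, 0], [1, 0], [2, 0], [3, 0]]
diagonal = [[0, 0], [1, 1], [2, 2], [3, 3]]
M = twed_distance_matrix([(horizontal, t), (diagonal, t), (horizontal, t)])
```

The resulting matrix `M` is what a community-detection or clustering step would consume. Tuning `nu` and `lam` changes how strongly timing differences and skipped samples are penalized.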