Kun Li 李坤

Associate Professor

Doctoral Supervisor

School of Computer Science and Technology,

Tianjin University (Peiyang University)

Room B422, Building 55, Tianjin University, Jinnan District, Tianjin 300350, China


E-mail: lik@tju.edu.cn



Brief Introduction

Kun Li received the B.E. degree from Beijing University of Posts and Telecommunications, Beijing, China, in 2006, and the master's and Ph.D. degrees from Tsinghua University, Beijing, China, in 2011. She is an Associate Professor at the School of Computer Science and Technology, Tianjin University, China. She visited EPFL, Lausanne, Switzerland, from July to August 2012 and from October 2014 to October 2015. Her research interests include dynamic scene 3-D reconstruction and image/video processing. In these fields, she has published over 25 research papers in peer-reviewed journals and conferences (e.g., IEEE TIP, JSTSP, TCSVT, and TCybernetics; IEEE ICME and ICIP; and ECCV). She has served as a reviewer for several journals (such as IEEE TVCG, TMM, and TCybernetics) and as a TPC member for a number of conferences. She was selected into the Elite Peiyang Scholar Program and the Reserved Peiyang Scholar Program of Tianjin University in 2012 and 2016, respectively.


Call for Papers: ICME 2017 Special Session: "3D Content Creation for Virtual Reality".


Education


Professional Experience


Research Interests


Research Projects

Sparse Non-rigid Registration of 3D Shapes

Non-rigid registration of 3D shapes is an essential task of increasing importance as commodity depth sensors become more widely available for scanning dynamic scenes. Non-rigid registration is much more challenging than rigid registration because it estimates a set of local transformations instead of a single global transformation, and hence is prone to overfitting due to underdetermination. The common wisdom in previous methods is to impose an L2-norm regularization on the local transformation differences. However, the L2-norm regularization tends to bias the solution towards outliers and noise with heavy-tailed distributions, which is verified by the poor goodness-of-fit of the Gaussian distribution over transformation differences. In contrast, the Laplacian distribution fits the transformation differences well, suggesting the use of a sparsity prior. We propose a sparse non-rigid registration (SNR) method with an L1-norm regularized model for transformation estimation, which is effectively solved by an alternating direction method (ADM) under the augmented Lagrangian framework. We also devise a multi-resolution scheme for robust and progressive registration. Results on both public datasets and our scanned datasets show the superiority of our method, particularly in handling large-scale deformations as well as outliers and noise.
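
For intuition, here is a minimal sketch of this optimization pattern: an L1 penalty on the differences between neighboring transformations, solved by ADMM-style updates with soft thresholding. The data matrix A, target vector b, difference operator D, and all parameter values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft thresholding: the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_registration_admm(A, b, D, lam=0.1, rho=1.0, n_iter=100):
    """Minimize ||A x - b||^2 + lam * ||D x||_1 with ADMM (illustrative sketch).

    A : data (alignment) matrix, b : stacked target positions,
    D : difference operator between neighboring local transformations.
    """
    x = np.zeros(A.shape[1])
    z = np.zeros(D.shape[0])          # auxiliary variable standing in for D x
    u = np.zeros(D.shape[0])          # scaled dual variable
    AtA, Atb = A.T @ A, A.T @ b
    DtD = D.T @ D
    for _ in range(n_iter):
        # x-update: a quadratic subproblem (data term + coupling to z)
        x = np.linalg.solve(AtA + rho * DtD, Atb + rho * D.T @ (z - u))
        # z-update: soft thresholding enforces sparse transformation differences
        z = soft_threshold(D @ x + u, lam / rho)
        # dual update
        u += D @ x - z
    return x
```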

SPA: Sparse Photorealistic Animation Using a Single RGB-D Camera

Photorealistic animation is a desirable technique for computer games and movie production. We propose a new method to synthesize plausible videos of human actors performing new motions using a single cheap RGB-D camera. A small database is captured in an ordinary office environment; this capture is needed only once and can be reused to synthesize different motions. We propose a marker-less performance capture method using sparse deformation to obtain the geometry and pose of the actor at each time instant in the database. Then, we synthesize an animation video of the actor performing the new motion defined by the user. An adaptive model-guided texture synthesis method based on weighted low-rank matrix completion is proposed to reduce sensitivity to noise and outliers, which enables us to easily create photorealistic animation videos with new motions that differ from the motions in the database. Experimental results on a public dataset and our captured dataset verify the effectiveness of the proposed method.

Foreground-Background Separation From Video Clips via Motion-assisted Matrix Restoration

Separation of video streams into foreground and background components is a useful and important technique in video analysis, making recognition, classification, and scene analysis more efficient. In this paper, we propose a motion-assisted matrix restoration (MAMR) model for foreground-background separation from video clips. The backgrounds across frames are modeled by a low-rank matrix, while the foreground objects are modeled by a sparse matrix. To facilitate efficient foreground-background separation, a dense motion field is estimated for each frame and mapped into a weighting matrix that indicates the likelihood that each pixel belongs to the background. Anchor frames are selected in the dense motion estimation to overcome the difficulty of detecting slowly moving or camouflaged objects. The foreground is then computed by background subtraction using the recovered background image. In addition, we extend our model to a robust MAMR model (RMAMR) that is robust to noise for practical applications. In experiments, we compare our MAMR and RMAMR models with other state-of-the-art methods on challenging datasets. Experimental results demonstrate that our method is versatile for surveillance videos with different types of motion and lighting conditions, and outperforms many other state-of-the-art methods.

Keywords: Background extraction, optical flow, motion detection, matrix restoration, video surveillance
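
A minimal sketch of the underlying decomposition, assuming a weighted robust-PCA-style objective: the frame matrix D is split into a low-rank background L and a sparse foreground S, with the motion-derived weight matrix W scaling the sparsity penalty per pixel. The solver below follows a generic inexact augmented Lagrangian scheme; the parameter defaults and the exact use of W are simplifications, not the published algorithm.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def weighted_low_rank_sparse(D, W, lam=None, mu=None, n_iter=100):
    """Split D (pixels x frames) into low-rank background L and sparse foreground S:
        min ||L||_* + lam * ||W * S||_1   s.t.   D = L + S,
    where a large weight in W (pixel likely background) discourages assigning that
    pixel's residual to the foreground S.
    """
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = 1.25 / np.linalg.norm(D, 2) if mu is None else mu
    L, S, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
    for _ in range(n_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)                            # background update
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam * W / mu, 0.0)   # weighted shrinkage
        Y += mu * (D - L - S)                                        # multiplier update
    return L, S
```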

Non-Rigid Structure from Motion via Sparse Representation

This work proposes a new approach to non-rigid structure from motion with occlusion, based on sparse representation. We address the occlusion problem using a recent development in sparse representation, matrix completion, which can recover an observation matrix with a high percentage of missing data and also suppress noise and outliers in the known entries. We introduce a sparse transform into the joint estimation of 3D shapes and motions: the 3D shape trajectory space is fitted with a wavelet basis to better model complex motion. Experimental results on datasets with and without occlusion show that our method estimates 3D shapes and motions better than state-of-the-art algorithms.
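
As a rough illustration of the matrix-completion step (not the paper's exact solver), the sketch below fills in missing 2D track entries by iterative singular value thresholding, re-imposing the observed entries after every shrinkage step. The threshold tau and iteration count are placeholders.

```python
import numpy as np

def complete_track_matrix(W_obs, mask, tau=1.0, n_iter=200):
    """Fill in missing entries of the 2D point-track observation matrix.

    W_obs : (2F, P) observation matrix with arbitrary values where mask is False
    mask  : (2F, P) boolean array, True for observed entries
    """
    X = np.where(mask, W_obs, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # low-rank shrinkage
        X = np.where(mask, W_obs, X_low)                     # keep observed entries
    return X
```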

Video Super-resolution Using an Adaptive Superpixel-guided Auto-Regressive Model

This work proposes a video super-resolution method based on an adaptive superpixel-guided auto-regressive (AR) model. The key frames are automatically selected and super-resolved by a sparse regression method. The non-key frames are super-resolved by simultaneously exploiting spatio-temporal correlations: the temporal correlation is exploited by an optical flow method, while the spatial correlation is modelled by a superpixel-guided AR model. Experimental results show that the proposed method outperforms existing benchmark methods in terms of both subjective visual quality and objective peak signal-to-noise ratio (PSNR). The proposed method also has the shortest running time among the compared state-of-the-art methods, which makes it suitable for practical applications.
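
A small sketch of the superpixel-guided idea, under the assumption that AR weights are computed from intensity similarity and restricted to neighbours inside the same superpixel; the actual weight design in the paper may differ.

```python
import numpy as np

def superpixel_ar_weights(window_intensity, window_labels, center, sigma=10.0):
    """Hypothetical superpixel-guided AR weights for one pixel (illustrative sketch):
    neighbours in a local window are weighted by intensity similarity, but only if
    they lie in the same superpixel as the centre pixel, so the regression never
    mixes different regions.

    window_intensity : (k, k) gray values around the pixel
    window_labels    : (k, k) superpixel labels around the pixel
    center           : (row, col) of the pixel inside the window
    """
    cy, cx = center
    diff = window_intensity.astype(float) - float(window_intensity[cy, cx])
    w = np.exp(-diff ** 2 / (2 * sigma ** 2))
    w[window_labels != window_labels[cy, cx]] = 0.0   # restrict to the same superpixel
    w[cy, cx] = 0.0                                    # the pixel does not predict itself
    s = w.sum()
    return w / s if s > 0 else w
```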

Graph-based Segmentation for RGB-D Data Using 3-D Geometry Enhanced Superpixels

With the advances of depth sensing technologies, color images plus depth information (referred to as RGB-D data hereafter) are increasingly popular for comprehensive description of 3-D scenes. This paper proposes a two-stage segmentation method for RGB-D data: 1) oversegmentation by 3-D geometry enhanced superpixels; and 2) graph-based merging with label cost from superpixels. In the oversegmentation stage, 3-D geometrical information is reconstructed from the accompanying depth map. Then, a K-means-like clustering method is applied to the RGB-D data for oversegmentation, using an 8-D distance metric constructed from both color and 3-D geometrical information. In the merging stage, treating each superpixel as a node, a graph-based model is set up to relabel the superpixels into semantically coherent segments. In the graph-based model, RGB-D proximity, texture similarity, and boundary continuity are incorporated into the smoothness term to exploit the correlations of neighboring superpixels. To obtain a compact labeling, the label term is designed to penalize labels linking to similar superpixels that likely belong to the same object. Both the proposed 3-D geometry enhanced superpixel clustering method and the graph-based segmentation method from superpixels are evaluated by quantitative results and visual comparisons. By fusing the color and depth information, the proposed methods achieve superior segmentation performance over several state-of-the-art algorithms.

Keywords: Graph-based segmentation, RGB-D Data, Superpixel oversegmentation
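
A hedged sketch of the 8-D distance used in the K-means-like clustering, assuming the feature vector stacks three color channels, three reconstructed 3-D coordinates, and two image coordinates; the weights and exact feature composition here are illustrative.

```python
import numpy as np

def rgbd_superpixel_distance(f1, f2, w_color=1.0, w_geom=1.0, w_pos=0.5):
    """8-D distance between a pixel feature and a cluster center, assuming the
    feature layout [L, a, b, X, Y, Z, u, v]: 3 color channels, 3 reconstructed
    3-D coordinates, and 2 image coordinates. The weights are placeholders.
    """
    d_color = np.sum((f1[0:3] - f2[0:3]) ** 2)
    d_geom  = np.sum((f1[3:6] - f2[3:6]) ** 2)
    d_pos   = np.sum((f1[6:8] - f2[6:8]) ** 2)
    return np.sqrt(w_color * d_color + w_geom * d_geom + w_pos * d_pos)
```

In an SLIC-style loop, each pixel would be assigned to the nearest seed under this distance, and seeds updated as the mean 8-D feature of their assigned pixels.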

Color-Guided Depth Recovery From RGB-D Data Using an Adaptive Autoregressive Model

This work proposes an adaptive color-guided autoregressive (AR) model for high-quality depth recovery from low-quality measurements captured by depth cameras. We observe and verify that the AR model tightly fits depth maps of generic scenes. The depth recovery task is formulated as a minimization of AR prediction errors subject to measurement consistency. The AR predictor for each pixel is constructed from both the local correlation in the initial depth map and the nonlocal similarity in the accompanying high-quality color image. We analyze the stability of our method from a linear system point of view, and design a parameter adaptation scheme to achieve stable and accurate depth recovery. Quantitative and qualitative results show that our method outperforms four state-of-the-art schemes. Being able to handle various types of depth degradation, the proposed method is versatile for mainstream depth sensors such as ToF cameras and Kinect, as demonstrated by experiments on real systems.

Keywords: Depth recovery (upsampling, inpainting, denoising), autoregressive model, ToF camera, Kinect
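
The formulation "AR prediction errors subject to measurement consistency" can be sketched as one sparse linear system. In the toy version below, the AR weights come only from a grayscale guidance image over a 3x3 window (a stand-in for the paper's combined local/nonlocal predictor), and the parameters are arbitrary.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def ar_depth_recovery(init_depth, color_gray, observed, sigma_c=10.0, lam=5.0):
    """Toy color-guided AR depth recovery:
        min_d  sum_x (d_x - sum_y a_xy d_y)^2 + lam * sum_{x observed} (d_x - g_x)^2

    init_depth : (H, W) initial low-quality depth (arbitrary where missing)
    color_gray : (H, W) registered grayscale guidance image
    observed   : (H, W) boolean mask of reliable depth measurements
    """
    H, W = init_depth.shape
    N = H * W
    idx = np.arange(N).reshape(H, W)
    rows, cols, vals = [], [], []
    for y in range(H):
        for x in range(W):
            # AR weights over the 3x3 neighbourhood, from color similarity
            nbrs, w = [], []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy, dx) == (0, 0) or not (0 <= ny < H and 0 <= nx < W):
                        continue
                    diff = float(color_gray[y, x]) - float(color_gray[ny, nx])
                    nbrs.append(idx[ny, nx])
                    w.append(np.exp(-diff * diff / (2 * sigma_c ** 2)))
            w = np.array(w) / (np.sum(w) + 1e-12)
            rows += [idx[y, x]] * len(nbrs)
            cols += list(nbrs)
            vals += list(w)
    A = sparse.csr_matrix((vals, (rows, cols)), shape=(N, N))
    P = sparse.identity(N) - A                                  # AR prediction error operator
    M = sparse.diags(observed.ravel().astype(float))            # measurement consistency
    Q = (P.T @ P + lam * M).tocsr()
    d = spsolve(Q, lam * (M @ init_depth.ravel()))
    return d.reshape(H, W)
```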

Temporal-Dense Dynamic 3D Reconstruction with Low Frame Rate Cameras

Temporal-dense 3D reconstruction of dynamic scenes is a challenging and important research topic in signal processing. Although dynamic scenes can be captured by multiple high frame rate cameras, the high price and large storage requirements remain problematic for practical applications. To address this problem, we propose a new method for temporal-densely capturing and reconstructing dynamic scenes with low frame rate cameras, consisting of spatio-temporal sampling, spatio-temporal interpolation, and spatio-temporal fusion. With this method, not only shapes but also textures are recovered. The method can be extended to even denser temporal reconstruction by simply adding more cameras or using cameras with higher frame rates. Experimental results show that temporal-dense dynamic 3D reconstruction can be achieved with low frame rate cameras by our proposed method.

3D Motion Estimation via Matrix Completion

3D motion estimation from multi-view video sequences is of vital importance for high-quality dynamic scene reconstruction. In this paper, we propose a new 3D motion estimation method based on matrix completion. Taking a reconstructed 3D mesh as the underlying scene representation, this method automatically estimates the motions of 3D objects. A “separating + merging” framework is introduced for multi-view 3D motion estimation. In the separating step, initial motions are first estimated for each view together with a neighboring view. Then, in the merging step, the motions obtained from all views are merged together and optimized by a low-rank matrix completion method. The most accurate motion estimate for each vertex in the recovered matrix is further selected by three spatio-temporal criteria. Experimental results on datasets with synthetic and real motions show that our method can reliably estimate 3D motions.

Markerless Shape and Motion Capture from Multi-view Video Sequences

This work proposes a new markerless shape and motion capture approach from multi-view video sequences. The shape recovery method consists of two steps: separating and merging. In the separating step, the depth map represented by the point cloud for each view is generated by a proposed variational model, which is regularized by four constraints to ensure the accuracy and completeness of the reconstruction. Then, in the merging step, the point clouds of all the views are merged together and reconstructed into a 3D mesh using a marching cubes method with silhouette constraints. The proposed shape recovery method is tested on two publicly available datasets and the datasets captured by a real multi-camera system. Experiments show that the geometric details are faithfully preserved in each estimated depth map. The 3D meshes reconstructed from the estimated depth maps are watertight and present rich geometric details, even for non-convex objects. Taking the 3D mesh as the underlying scene representation, a robust volumetric deformation technique is proposed to automatically capture motion of 3D objects, especially human performances. Our tracking method can capture non-rigid motions even for arbitrarily dressed humans without the aid of markers. Overall, the proposed shape and motion capture method is accurate, conceptually simple, and easy to implement.

Collaborative Color Calibration for Multi-Camera Systems

This work proposes a collaborative color calibration method for multi-camera systems with a novel omnidirectional color checker. The designed cylindrical color checker contains a periodic array of color patches and is visible to all the cameras without manual adjustment. For color calibration, accurate global correspondences are first generated by local descriptors and area-based correlation methods. Then, the multi-camera color calibration problem is formulated as an overdetermined linear system, into which dynamic range shaping is incorporated to ensure high contrast in the captured images. The cameras are calibrated with the parameters obtained by solving the linear system. According to experimental results on both synthetic and real-system datasets, the proposed method achieves high inter-camera color consistency and high dynamic range. Thanks to the omnidirectionality of the designed color checker and the generality of the linear system formulation, the proposed method is applicable to various multi-camera systems.
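
As a simplified illustration of the "overdetermined linear system" view (without the joint multi-camera coupling or dynamic range shaping of the paper), one can fit a per-camera affine color correction to the patch correspondences by least squares; the 3x4 affine model and the function names below are assumptions for the sketch.

```python
import numpy as np

def fit_color_correction(observed_rgb, target_rgb):
    """Fit a per-camera affine color correction M (3x4) in the least-squares sense,
    mapping observed patch colors to the shared target colors:
        [R' G' B']^T = M [R G B 1]^T

    observed_rgb, target_rgb : (N, 3) arrays of corresponding patch colors
    """
    X = np.hstack([observed_rgb, np.ones((observed_rgb.shape[0], 1))])  # (N, 4)
    M, *_ = np.linalg.lstsq(X, target_rgb, rcond=None)                  # (4, 3)
    return M.T                                                          # (3, 4)

def apply_color_correction(M, image):
    """Apply the affine correction to an (H, W, 3) float image in [0, 255]."""
    H, W, _ = image.shape
    X = np.hstack([image.reshape(-1, 3), np.ones((H * W, 1))])
    return np.clip(X @ M.T, 0, 255).reshape(H, W, 3)
```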

Multi-Camera and Multi-Lighting Dome


During my Ph.D. studies, our lab constructed a dome to record the geometry, texture, and motion of human actors in a dedicated multiple-camera studio with controlled lighting and a chroma-key background. The dome is 6 meters in diameter, which provides enough space for the actors to perform. Forty Point Grey Flea2 cameras are arranged in rings on the dome, and 320 LEDs are evenly spaced over its hemisphere. The system was built in cooperation with Bennett Wilburn at MSRA.

Color Transfer Based on Wavelet Transform

A color transfer technique based on the Adaptive Directional Wavelet Transform with Quincunx Sampling (ADWQS) is proposed to transfer color from a color reference image to a grayscale target image. The technique requires no human intervention. We first transform both the reference and the target images into the wavelet domain using ADWQS, and then apply a block-based matching technique to complete the color transfer. The inherent smoothing property of the wavelet transform greatly reduces mismatches in the color transfer process. Moreover, we find that searching for the best match only in the LL subband speeds up the algorithm by a factor of five while maintaining good performance. In addition, the proposed method places no constraint on image size, i.e., the color reference image and the grayscale target image can be of different sizes. Finally, we compared the proposed scheme with a method that does not use the wavelet transform, as well as with methods based on conventional discrete wavelets and other directional wavelets. ADWQS yields the best color transfer results, benefiting from its symmetry and directional selectivity.
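
A toy sketch of the LL-subband matching idea, with a standard 'haar' wavelet standing in for ADWQS and block mean/std of luminance as the matching statistic; image sizes are assumed even, the reference is assumed to be in Lab space with L scaled to [0, 255], and the real method's matching criterion may differ.

```python
import numpy as np
import pywt

def wavelet_block_color_transfer(gray_target, ref_lab, block=8):
    """Match blocks in the LL subband only, then copy the chrominance of the
    best-matching reference block to the corresponding target block.

    gray_target : (H, W) grayscale image in [0, 255], H and W even
    ref_lab     : (H', W', 3) reference image in Lab space, L scaled to [0, 255]
    """
    ll_t, _ = pywt.dwt2(gray_target.astype(float), 'haar')
    ll_r, _ = pywt.dwt2(ref_lab[:, :, 0].astype(float), 'haar')

    def block_stats(ll, b):
        stats, coords = [], []
        for y in range(0, ll.shape[0] - b + 1, b):
            for x in range(0, ll.shape[1] - b + 1, b):
                p = ll[y:y + b, x:x + b]
                stats.append((p.mean(), p.std()))
                coords.append((y, x))
        return np.array(stats), coords

    s_r, c_r = block_stats(ll_r, block)
    s_t, c_t = block_stats(ll_t, block)

    out = np.zeros(gray_target.shape + (3,))
    out[:, :, 0] = gray_target                         # keep the target luminance
    for (m, s), (y, x) in zip(s_t, c_t):
        j = np.argmin((s_r[:, 0] - m) ** 2 + (s_r[:, 1] - s) ** 2)  # best reference block
        ry, rx = c_r[j]
        # the LL subband is half resolution, so block coordinates scale by 2
        out[2*y:2*(y+block), 2*x:2*(x+block), 1:] = \
            ref_lab[2*ry:2*(ry+block), 2*rx:2*(rx+block), 1:]
    return out   # Lab result; convert back to RGB for display
```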

Image Compression Based on Directional Wavelets

The discrete wavelet transform is an effective tool for generating scalable streams, but it cannot efficiently represent edges that are not aligned in horizontal or vertical directions, while natural images often contain rich edges and textures of this kind. Hence, intensive research has recently focused on directional wavelets, which can effectively represent the directional attributes of images. Specifically, there are two categories of directional wavelets: redundant wavelets (RW) and adaptive directional wavelets (ADW). One representative redundant wavelet is the dual-tree discrete wavelet transform (DDWT), while adaptive directional wavelets can be further categorized into two types: with or without side information. Moreover, adaptive directional wavelets with side information mainly have two sampling modes: orthogonal sampling and quincunx sampling. We compared their directional basis functions and image compression performance. Our experiments show that the adaptive directional wavelet without side information performs worst, because its directional selectivity is significantly inferior to that of wavelets with side information and is further degraded by quantization. On the whole, directional wavelets perform well. Modifying context-based arithmetic coding to exploit the directional features of these wavelets may further improve their compression performance.

 


Selected Publications

Journal

Conference


Honors and Awards


Teaching


Academic Activities


Dataset


Code


 

Links

School of Computer Science and Technology

Tianjin University (Peiyang University)

Compressive Sensing Resources

ICME 2017 Special Session: 3D Content Creation for Virtual Reality


 

Now recruiting Ph.D. students, master's students, and undergraduate interns...