Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes
Visual saliency is the ability of a vision system to promptly select the most relevant data in a scene, reducing the amount of visual data that must be processed. It has therefore attracted interest in computer vision for complex tasks such as object detection, object recognition, and video compression.
In this paper, we introduce a novel unsupervised method for detecting visual saliency in videos of natural scenes.
High-saliency areas of a natural scene are the small portions that carry the most important information and can be identified easily by the human visual system.
An overview of the proposed approach is depicted in the following figure.
We begin by extracting the feature matrix, X, of a video and segmenting the video into super-voxels. A dictionary, D, is learned online, and the video is then represented by F in terms of the coefficients Y obtained from group lasso regularization over the dictionary. Next, the salient part, captured by the sparse matrix S, and the non-salient part, captured by the low-rank matrix L, are obtained via a low-rank minimization technique (Robust PCA). Finally, a saliency map is generated from the L1 norms of the columns of S belonging to each super-voxel.
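The final step above, scoring each super-voxel by the L1 norms of its columns in S, can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the function and variable names (`supervoxel_saliency`, `labels`) are hypothetical.

```python
import numpy as np

def supervoxel_saliency(S, labels):
    """Score each super-voxel by the mean L1 norm of the columns of
    the sparse matrix S assigned to it, then normalize the scores to
    [0, 1] to form a saliency map. labels[j] gives the super-voxel id
    of column j. Names are illustrative, not from the paper's code."""
    col_l1 = np.abs(S).sum(axis=0)               # L1 norm of each column
    ids = np.unique(labels)
    scores = np.array([col_l1[labels == k].mean() for k in ids])
    lo, hi = scores.min(), scores.max()
    norm = (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)
    return dict(zip(ids.tolist(), norm.tolist()))
```

Super-voxels whose columns in S have large L1 norms (i.e., are far from the low-rank background model) receive scores near 1 and are marked as salient.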
The video is thus represented as coefficients of atoms from a dictionary learned from its feature matrix, and decomposed into salient and non-salient parts.
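The coefficients over the dictionary are obtained by group-lasso-regularized sparse coding, which can be solved by proximal gradient descent with block soft-thresholding. The sketch below is a generic ISTA solver under that formulation, not the paper's solver; names such as `group_lasso_ista` are illustrative.

```python
import numpy as np

def group_soft_threshold(y, lam):
    """Proximal operator of the group-lasso penalty lam * ||y||_2:
    shrinks the whole group toward zero, zeroing it entirely when
    its norm falls below lam (block soft-thresholding)."""
    norm = np.linalg.norm(y)
    if norm <= lam:
        return np.zeros_like(y)
    return (1.0 - lam / norm) * y

def group_lasso_ista(D, x, groups, lam=0.1, n_iter=200):
    """Minimize 0.5*||x - D y||^2 + lam * sum_g ||y_g||_2 by
    proximal gradient descent (ISTA). `groups` is a list of index
    arrays, one per group of dictionary atoms."""
    y = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)             # gradient of the data term
        z = y - step * grad
        for g in groups:                     # prox applied group-wise
            z[g] = group_soft_threshold(z[g], step * lam)
        y = z
    return y
```

Because the penalty acts on whole groups, atoms belonging to the same group are selected or discarded together, which is how the grouping information from super-voxels enters the representation.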
We propose to use group lasso regularization to find the sparse representation of a video, which benefits from grouping information provided by super-voxels and extracted features from the cuboids. We find saliency regions by decomposing the feature matrix of a video into low-rank and sparse matrices by using Robust Principal Component Analysis (RPCA) matrix recovery method.
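The low-rank/sparse decomposition can be computed with a standard Principal Component Pursuit solver. Below is a textbook inexact augmented Lagrangian (ADMM-style) sketch of RPCA in numpy, offered as an illustration of the technique rather than the exact solver used in the paper; parameter choices (`lam`, `mu`, `rho`) follow common defaults from the RPCA literature.

```python
import numpy as np

def rpca(F, lam=None, mu=None, rho=1.5, n_iter=100, tol=1e-7):
    """Decompose F into low-rank L and sparse S by solving
    min ||L||_* + lam*||S||_1  s.t.  L + S = F
    with an inexact augmented Lagrangian method. A generic sketch,
    not the paper's implementation."""
    m, n = F.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 1.25 / np.linalg.norm(F, 2)
    mu_bar = mu * 1e7                      # cap on the penalty parameter
    L = np.zeros_like(F)
    S = np.zeros_like(F)
    Y = np.zeros_like(F)                   # dual variable
    for _ in range(n_iter):
        # singular value thresholding for the nuclear-norm term
        U, sig, Vt = np.linalg.svd(F - S + Y / mu, full_matrices=False)
        L = U @ (np.maximum(sig - 1.0 / mu, 0.0)[:, None] * Vt)
        # elementwise soft-thresholding for the l1 term
        T = F - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Y = Y + mu * (F - L - S)           # dual ascent on the constraint
        mu = min(mu * rho, mu_bar)
        if np.linalg.norm(F - L - S) <= tol * np.linalg.norm(F):
            break
    return L, S
```

In our setting F is the group-lasso representation of the video, L models the repetitive (non-salient) background, and the columns of S with large magnitude point to salient super-voxels.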
We have evaluated our method on four different data sets: the INB dataset, which consists of 18 high-resolution movie clips of natural outdoor scenes; the UCF Sports Action dataset; the UCF Saliency dataset; and the Hollywood2 Actions dataset, a large-scale dataset with camera motion and clutter. Some qualitative results are shown in the following figure:
Examples of frames from (a) UCF Sports dataset videos, (b) super-voxels, and (c) our results, showing the most salient regions along with gaze points (in red, accounting for calibration errors).
We used the same experimental setup as described above for the INB dataset. On all datasets, our method gives better results in terms of AUC.
AUC scores for videos in UCF Sports data set based on Default-Labeling configuration.
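AUC here measures how well the saliency map ranks fixated pixels above non-fixated ones. A minimal sketch of this evaluation, computing the AUC as a rank statistic over a saliency map and a binary fixation mask (the exact labeling protocol in the paper may differ; names are illustrative):

```python
import numpy as np

def saliency_auc(sal_map, fixation_mask):
    """AUC of a saliency map against gaze data: the probability that
    a randomly chosen fixated pixel gets a higher saliency value than
    a randomly chosen non-fixated pixel (Mann-Whitney statistic),
    with ties counted as 0.5. A simplified protocol for illustration."""
    pos = sal_map[fixation_mask > 0].ravel()    # fixated pixels
    neg = sal_map[fixation_mask == 0].ravel()   # background pixels
    diffs = pos[:, None] - neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()
```

A score of 1.0 means every fixated pixel outranks every background pixel; 0.5 corresponds to chance-level ranking.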
Nasim Souly, Mubarak Shah, Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes, International Journal of Computer Vision, Aug 2015.