Scene Understanding by Statistical Modeling of Motion Patterns
Introduction
We present a novel method for the discovery and statistical representation of motion patterns in a scene observed by a static camera. Related methods that learn patterns of activity rely on trajectories obtained from object detection and tracking systems, which are unreliable in complex scenes of crowded motion. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low-level cues of noisy optical flow, K-means is employed to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered, and instances of motion patterns are computed using a simple motion model by linking components across space and time. Motion patterns are then initialized, and membership of instances in different motion patterns is established using the KL divergence between mixture distributions of pattern instances. Finally, a pixel-level representation of motion patterns is proposed by deriving the conditional expectation of optical flow. Results of extensive experiments are presented for multiple surveillance sequences containing numerous patterns involving both pedestrian and vehicular traffic.
Problem

Video Sequence:
 Static camera
 Structured scene
 High density crowds
 Multiple flows

Goal:
 Learn patterns of motion
 Statistical distribution

Applications:
 Anomaly detection, prior motion model, persistent tracking
Figure: Examples of scenes to be analyzed and desirable patterns
Gaussian Mixture Formulation
 Compute optical flow

Define

A single Gaussian approximates a motion blob
Process

Gaussian component estimation
 Temporal quantization
 K-means clustering in 4D space
 No optimization
 Insensitive to choice of K
 Numerous, low variance clusters
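
The component estimation step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each flow sample is a 4D row (x, y, u, v) of position plus Cartesian flow, which is one plausible parameterization of the 4D space. The function name `kmeans_gaussian_components` and its parameters are hypothetical. Each K-means cluster is summarized directly as one Gaussian, with no further optimization.

```python
import numpy as np

def kmeans_gaussian_components(points, k, n_iter=20, seed=0):
    """Cluster 4D flow samples with plain K-means and summarize each
    cluster as a Gaussian (weight, mean, covariance).
    Assumption: rows of `points` are (x, y, u, v)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialize centers from k distinct random samples.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    components = []
    for c in range(k):
        members = points[labels == c]
        if len(members) < 2:
            continue  # too few samples to estimate a covariance
        components.append({
            "weight": len(members) / len(points),
            "mean": members.mean(axis=0),
            # Small ridge keeps the covariance invertible for tight clusters.
            "cov": np.cov(members, rowvar=False) + 1e-6 * np.eye(4),
        })
    return components
```

In this sketch, clusters with fewer than two samples are dropped, so the returned weights may sum to slightly less than one. Choosing a generous K simply yields more, tighter clusters.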

Component Filtering
 Optical flow is noisy
 Filter high directional variance components
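
One way to filter components by directional variance is shown below. This is a sketch under stated assumptions: it measures spread with the circular variance of the flow directions of each cluster's member samples, and the threshold `max_circ_var` is a hypothetical value. The source does not specify which variance measure or threshold is used.

```python
import numpy as np

def circular_variance(u, v):
    """Circular variance of flow directions: 0 means all vectors point the
    same way, values near 1 mean directions are spread around the circle."""
    angles = np.arctan2(v, u)
    resultant = np.hypot(np.cos(angles).mean(), np.sin(angles).mean())
    return 1.0 - resultant

def filter_components(clusters, max_circ_var=0.2):
    """Keep only clusters whose member flow directions are consistent.
    Assumption: `clusters` is a list of (n, 4) arrays with rows (x, y, u, v)."""
    return [c for c in clusters
            if circular_variance(c[:, 2], c[:, 3]) <= max_circ_var]
```

Circular variance is used here instead of an ordinary variance of angles because it is unaffected by the wrap-around at plus or minus pi.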

Pattern Instance Estimation
 Sequences of components form spatiotemporal worms (instances)
 Pattern instances are temporally bounded
 A pattern itself is periodic

Intercomponent Transition
 Pattern instance occurs over several clips

Two components i and j form an instance if,
 i and j are temporally proximal,
 j is 'reachable' from i
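
The two conditions above can be illustrated with a simple constant-velocity test. This is only a sketch, since the source does not give the exact motion model: component i is pushed forward along its mean flow, and the predicted position is scored under the spatial Gaussian of component j. The dictionary keys (`clip`, `mean`, `cov`), the (x, y, u, v) layout, and the parameters `frames_per_clip` and `max_gap` are all assumptions made for illustration.

```python
import numpy as np

def transition_score(comp_i, comp_j, frames_per_clip=1.0, max_gap=2):
    """Score in [0, 1] for a transition from component i to component j.
    Assumption: each component is a dict with 'clip' (int index),
    'mean' (x, y, u, v) and 'cov' (4x4 covariance)."""
    gap = comp_j["clip"] - comp_i["clip"]
    # Temporal proximity: j must follow i within a few clips.
    if gap < 1 or gap > max_gap:
        return 0.0
    # Constant-velocity prediction of where i's motion blob ends up.
    predicted = comp_i["mean"][:2] + comp_i["mean"][2:] * gap * frames_per_clip
    diff = predicted - comp_j["mean"][:2]
    cov_xy = comp_j["cov"][:2, :2]
    # Squared Mahalanobis distance of the prediction under j's spatial Gaussian.
    d2 = diff @ np.linalg.solve(cov_xy, diff)
    return float(np.exp(-0.5 * d2))
```

A score near 1 means the predicted position falls at the center of j, while a score near 0 means j is not reachable from i under this model.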

Instance Learning

Define a planar graph G = (V, E)
 V = {components from all video clips}
 E = {edges between temporally proximal components, weighted by transition probability}
 Weakly connected component analysis on G
 Connected components are pattern instances
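
Weakly connected component analysis can be sketched with a union-find structure, which ignores edge direction when grouping nodes. The function name, the `(i, j, prob)` edge format, and the `min_prob` cutoff are assumptions for illustration; the source does not state how edge weights are thresholded.

```python
def weakly_connected_components(n_nodes, edges, min_prob=0.5):
    """Group nodes 0..n_nodes-1 into weakly connected components.
    Assumption: `edges` is an iterable of (i, j, prob) directed edges and
    edges with prob below `min_prob` are ignored."""
    parent = list(range(n_nodes))

    def find(a):
        # Follow parent links to the root, halving the path along the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j, prob in edges:
        if prob >= min_prob:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two groups

    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())

# Each returned group is one candidate pattern instance.
instances = weakly_connected_components(
    6, [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.7), (2, 3, 0.1)])
```

In this example the weak edge between nodes 2 and 3 is dropped, so the two chains stay separate and node 5 remains a singleton.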
Figure: Left: One instance each from 4 patterns. Right: More instances for each of the 4 patterns.

Motion Patterns
 Multiple Instances per pattern
 Each instance is a Gaussian mixture
 KL divergence defines similarity between instances
 Approximate with Monte Carlo sampling
 Graph connected component analysis
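
The KL divergence between two Gaussian mixtures has no closed form, which is why a Monte Carlo approximation is used. A minimal sketch of such an estimator follows: it draws samples from the first mixture and averages the log-density ratio. The function names, the `(weights, means, covs)` tuple layout, and the sample count are assumptions, not details taken from the source.

```python
import numpy as np

def gmm_logpdf(x, weights, means, covs):
    """Log density of a Gaussian mixture at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    log_terms = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        log_terms.append(np.log(w) - 0.5 * (maha + logdet + d * np.log(2 * np.pi)))
    # Log-sum-exp over components for numerical stability.
    return np.logaddexp.reduce(np.stack(log_terms), axis=0)

def gmm_sample(n, weights, means, covs, rng):
    """Draw n samples: pick a component per sample, then sample from it."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in idx])

def kl_monte_carlo(p, q, n=2000, seed=0):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) for x ~ p.
    Assumption: p and q are (weights, means, covs) tuples."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *p, rng)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))
```

KL divergence is asymmetric, so a symmetric similarity between two instances could be formed, for example, as `kl_monte_carlo(p, q) + kl_monte_carlo(q, p)`; whether the original method symmetrizes in this way is not stated in the source.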

Conditional Expectation of flow
 Compute conditional expected orientation / magnitude given a pixel
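
For a Gaussian mixture over position and flow, the conditional expectation of flow at a pixel has a standard closed form: a responsibility-weighted sum of each component's conditional mean. The sketch below illustrates this under the assumption that the mixture is defined over (x, y, u, v); the function name and argument layout are hypothetical. Orientation and magnitude are read off the expected flow vector here, which is a simplification if the pattern is modeled directly in orientation and magnitude.

```python
import numpy as np

def expected_flow(pixel, weights, means, covs):
    """Conditional expectation of flow (u, v) given a pixel (x, y).
    Assumption: each component is a 4D Gaussian over (x, y, u, v)."""
    pixel = np.asarray(pixel, dtype=float)
    log_resp, cond_means = [], []
    for w, mu, cov in zip(weights, means, covs):
        s_xx, s_fx = cov[:2, :2], cov[2:, :2]
        diff = pixel - mu[:2]
        sol = np.linalg.solve(s_xx, diff)
        _, logdet = np.linalg.slogdet(s_xx)
        # Unnormalized log responsibility of this component for the pixel
        # (the 2*pi constant is shared by all components and cancels).
        log_resp.append(np.log(w) - 0.5 * (diff @ sol + logdet))
        # Conditional mean of flow given position for a single Gaussian.
        cond_means.append(mu[2:] + s_fx @ sol)
    log_resp = np.array(log_resp)
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    flow = resp @ np.array(cond_means)
    orientation = float(np.arctan2(flow[1], flow[0]))
    magnitude = float(np.hypot(flow[0], flow[1]))
    return flow, orientation, magnitude
```

Evaluating this function at every pixel would give a dense orientation and magnitude map for one motion pattern.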