Identifying Behaviors in Crowd Scenes
Paper: Identifying Behaviors in Crowd Scenes Using Stability Analysis for Dynamical Systems
Contact: Berkan Solmaz, Brian E. Moore, and Mubarak Shah
Figure 1. Five specific collaborative behaviors in crowds.
Videos of crowd scenes present challenging problems in computer vision. High object densities in real-world situations make individual object recognition and tracking impractical; understanding crowd behaviors, without knowing the actions of individuals, is often advantageous. Automated detection of crowd behaviors has numerous applications, such as prediction of congestion, which may help avoid unnecessary crowding or clogging, and discovery of abnormal behaviors or flow, which may help avoid tragic incidents. The aim of this particular work is to identify five common and specific crowd behaviors (Fig. 1), which we call bottlenecks, fountainheads, lanes, arches, and blocking. In the algorithm, a scene is overlaid by a grid of particles, initializing a dynamical system defined by the optical flow. Time integration of the dynamical system provides particle trajectories that represent the motion in the scene; these trajectories are used to locate regions of interest in the scene. Linear approximation of the dynamical system provides behavior classification through the Jacobian matrix; the eigenvalues determine the dynamic stability of points in the flow, and each type of stability corresponds to one of the five crowd behaviors. The eigenvalues are only considered in the regions of interest, consistent with the linear approximation and the implicated behaviors. The algorithm is repeated over sequential clips of a video in order to record changes in eigenvalues, which may imply changes in behavior. The method was tested on over 60 crowd and traffic videos.
Using a Lagrangian particle dynamics model, crowds can be treated as collections of mutually interacting particles.
Figure 2. A sample sequence (left), the computed optical flow (center), and the particle trajectories (right).
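The particle advection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes forward Euler integration, and the names `make_grid`, `advect`, and `sink_flow` are hypothetical. The flow is modeled as a callable `flow(x, y, t)` returning a velocity `(u, v)`; in practice the velocity would be sampled from the optical flow computed for each frame.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical interface: flow(x, y, t) returns the velocity (u, v) at that position and time.
FlowField = Callable[[float, float, float], Point]


def make_grid(width: int, height: int, step: int) -> List[Point]:
    """Overlay the scene with a regular grid of particle start positions."""
    return [(float(x), float(y))
            for y in range(0, height, step)
            for x in range(0, width, step)]


def advect(particles: List[Point], flow: FlowField,
           n_steps: int, dt: float = 1.0) -> List[List[Point]]:
    """Integrate each particle through the flow with forward Euler steps."""
    trajectories = [[p] for p in particles]
    for k in range(n_steps):
        t = k * dt
        for traj in trajectories:
            x, y = traj[-1]
            u, v = flow(x, y, t)
            traj.append((x + dt * u, y + dt * v))
    return trajectories


def sink_flow(x: float, y: float, t: float) -> Point:
    """Synthetic test flow (an assumption): everything drifts toward (50, 50)."""
    return (-0.1 * (x - 50.0), -0.1 * (y - 50.0))


grid = make_grid(100, 100, 25)
trajectories = advect(grid, sink_flow, n_steps=60)
end_points = [traj[-1] for traj in trajectories]
```

With the synthetic sink flow, every trajectory ends near (50, 50); such accumulation points are the kind of location the trajectories are meant to expose as regions of interest.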
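Consider a continuous dynamical system,
$$\dot{w} = F(w), \qquad w(t) = [x(t), y(t)]^T, \qquad F(w) = [u(w), v(w)]^T,$$
where x and y are the particle positions and u and v represent particle velocities in the x and y directions, respectively. A first step in understanding solution behavior for the system is finding critical points w* such that F(w*) = 0. Behavior of trajectories near a point w* is determined by linearizing the system about w*. To find a linearization, let z = w - w*, which means,
$$\dot{z} = \dot{w} = F(w^* + z).$$
By Taylor's theorem,
$$F(w^* + z) = F(w^*) + J_F(w^*)\, z + \text{higher-order terms},$$
where JF denotes the Jacobian matrix for F,
$$J_F = \begin{bmatrix} \partial u / \partial x & \partial u / \partial y \\ \partial v / \partial x & \partial v / \partial y \end{bmatrix}.$$
F(w*) = 0 implies a linearization of the system about w*,
$$\dot{z} = J_F(w^*)\, z,$$
where the solutions are completely defined by the initial conditions and the eigenvalues of the matrix JF, which are solutions of the characteristic equation λ^2 - τλ + Δ = 0, where τ is the trace and Δ is the determinant of the matrix. It is easy to show that
$$\lambda_{1,2} = \frac{\tau \pm \sqrt{\tau^2 - 4\Delta}}{2}, \qquad \Delta = \lambda_1 \lambda_2, \qquad \tau = \lambda_1 + \lambda_2,$$
where λ1 and λ2 are the eigenvalues, yielding important information about the flow, as depicted in Fig. 3.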
Figure 3. Five flows corresponding to Δ and τ, along with the related crowd behaviors.
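As a rough illustration of the linearization, the Jacobian of a velocity field can be approximated numerically and its trace, determinant, and eigenvalues read off. The sketch below assumes central finite differences and a velocity field given as a callable `F(x, y)` returning `(u, v)`; the function names and the step size `h` are illustrative choices, not taken from the paper.

```python
import cmath


def jacobian(F, x, y, h=1e-4):
    """Approximate J_F at (x, y) by central differences; F(x, y) returns (u, v)."""
    u_xp, v_xp = F(x + h, y)
    u_xm, v_xm = F(x - h, y)
    u_yp, v_yp = F(x, y + h)
    u_ym, v_ym = F(x, y - h)
    return (((u_xp - u_xm) / (2 * h), (u_yp - u_ym) / (2 * h)),
            ((v_xp - v_xm) / (2 * h), (v_yp - v_ym) / (2 * h)))


def trace_det(J):
    """Return (tau, delta): the trace and determinant of a 2x2 matrix."""
    (a, b), (c, d) = J
    return a + d, a * d - b * c


def eigenvalues(tau, delta):
    """Roots of lambda^2 - tau*lambda + delta = 0 (complex in general)."""
    root = cmath.sqrt(tau * tau - 4 * delta)
    return (tau + root) / 2, (tau - root) / 2


def rotation(x, y):
    """Synthetic circular flow around the origin (an assumption for the demo)."""
    return (-y, x)


J = jacobian(rotation, 0.0, 0.0)
tau, delta = trace_det(J)
lam1, lam2 = eigenvalues(tau, delta)
```

For the circular test flow, the trace is zero and the determinant is positive, so the two eigenvalues form a purely imaginary conjugate pair.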
We can consider the flows arising from JF, as depicted in Fig. 3, in connection with specific crowd behaviors.
Bottlenecks:
If Δ > 0 and τ < 0, then particle trajectories from many points converge to one location, i.e. many pedestrians or vehicles from various locations enter through one narrow passage. Hence, we define a bottleneck to be the mouth of any narrow passage through which pedestrians regularly pass.
Fountainheads:
When Δ > 0 and τ > 0, particle trajectories diverge from one location. This behavior is noticed when pedestrians leave a narrow passage, continuing in many separate directions, and we call the mouth of such a passage a fountainhead. This behavior is the opposite of a bottleneck, so fountainheads are detected as bottlenecks in backward time.
Lane Formation:
In crowd situations, lanes of flow in opposite directions naturally form, as pedestrians moving against the flow step aside to avoid collision and end up moving with other pedestrians with the same general direction and speed. In such instances, the motion near an individual is negligible, relative to other nearby individuals, because they are all moving together. This is precisely the behavior we see in what we call a lane, and the behavior is well-described by non-isolated critical points, rendering Δ = 0 along the path of the lane.
Ring/Arch Formation:
Motion described by Δ > 0 and τ = 0 is characteristic of crowd flow that is curved or circular. This behavior may be typical of a crowd scene in which pedestrians must maneuver around obstacles, forming an arch. In this case, the eigenvalues of the Jacobian matrix are purely imaginary complex conjugates, and we look for this eigenvalue response along oblique paths over which many trajectories may pass.
Blocking:
Local flow in which particles bounce off each other in somewhat random directions, unable to proceed in the desired direction, is represented by Δ < 0. This is characteristic behavior of people in densely populated scenes where the surrounding crowd prevents the desired motion of many individuals. We define this behavior as blocking, because pedestrians moving in opposite directions block each other as crowd density increases, preventing either group from advancing. In some situations the density of the crowd may lead to gridlock and no particle motion, in which case the optical flow is zero.
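The five conditions on Δ and τ listed above can be collected into a single lookup. The sketch below is only an illustration of that mapping: the function name `classify_behavior` and the tolerance `eps` used to decide when a value counts as zero are assumptions, since estimates from real optical flow are never exactly zero.

```python
def classify_behavior(delta: float, tau: float, eps: float = 1e-6) -> str:
    """Map the determinant (delta) and trace (tau) of the Jacobian to a behavior."""
    if delta < -eps:
        return "blocking"       # delta < 0
    if abs(delta) <= eps:
        return "lane"           # delta = 0 (non-isolated critical points)
    if abs(tau) <= eps:
        return "ring/arch"      # delta > 0, tau = 0
    # delta > 0: the sign of the trace separates converging from diverging flow
    return "bottleneck" if tau < 0 else "fountainhead"


labels = [
    classify_behavior(1.0, -2.0),   # converging flow
    classify_behavior(1.0, 2.0),    # diverging flow
    classify_behavior(0.0, 1.0),
    classify_behavior(1.0, 0.0),
    classify_behavior(-1.0, 0.5),
]
```

Reversing time negates the velocity field and therefore the Jacobian, which flips the sign of τ and leaves Δ unchanged for a 2x2 matrix; in this lookup that swaps the bottleneck and fountainhead labels, in line with detecting fountainheads as bottlenecks in backward time.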
The method was tested on real video sequences downloaded from the web, representing crowd and traffic scenes. To evaluate method performance, we compared detections against manually generated ground truth, consisting of points for bottlenecks, fountainheads, and blocking, and regions for lanes and arches, on all videos. Following the PASCAL VOC challenge, detection accuracy is based on the overlap of the detected region and the ground truth. For lanes and arches we require an overlap of more than 40%, a relaxation of the PASCAL measure appropriate for our problem. Similarly, the region around points that identify bottlenecks, fountainheads, or blocking is required to overlap with the analogous region from the ground truth; we require that the Euclidean distance between the detected point and the ground truth be sufficiently small, typically within 40 pixels. The results are shown in Table 1. Some sample movies and outputs are shown in Fig. 4.
Figure 4. Scenes from 15 real video sequences, each showing the behaviors that are detected by the method.
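The two acceptance criteria used in the evaluation can be sketched as follows. This is an illustration under stated assumptions: regions are represented as sets of pixel coordinates, overlap is taken as intersection over union in the PASCAL VOC sense, and the helper names (`overlap_ratio`, `region_match`, `point_match`, `rect`) are hypothetical. The 40% and 40-pixel thresholds come from the text above.

```python
import math


def overlap_ratio(detected: set, truth: set) -> float:
    """Intersection over union of two regions given as sets of (x, y) pixels."""
    union = detected | truth
    if not union:
        return 0.0
    return len(detected & truth) / len(union)


def region_match(detected: set, truth: set, min_overlap: float = 0.4) -> bool:
    """Lanes and arches: accept when the overlap exceeds 40%."""
    return overlap_ratio(detected, truth) > min_overlap


def point_match(detected, truth, max_dist: float = 40.0) -> bool:
    """Bottlenecks, fountainheads, blocking: accept when within 40 pixels."""
    dx = detected[0] - truth[0]
    dy = detected[1] - truth[1]
    return math.hypot(dx, dy) <= max_dist


def rect(x0: int, y0: int, x1: int, y1: int) -> set:
    """Pixel set of an axis-aligned rectangle (demo helper only)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


truth_region = rect(0, 0, 10, 10)
good = region_match(rect(2, 0, 12, 10), truth_region)   # overlap 80 / 120
poor = region_match(rect(5, 0, 15, 10), truth_region)   # overlap 50 / 150
near = point_match((130, 120), (100, 100))               # about 36 pixels apart
far = point_match((130, 140), (100, 100))                # 50 pixels apart
```

In this example the first region and the first point are accepted, while the second region and the second point are rejected.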
Berkan Solmaz, Brian Moore, and Mubarak Shah, Identifying Behaviors in Crowd Scenes Using Stability Analysis for Dynamical Systems, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2012.