Statistical Inference of Motion in the Invisible
This work focuses on the unexplored problem of inferring the motion of objects that are invisible to all cameras in a multiple-camera setup. Given object trajectories within the fields of view (FOVs) of disjoint cameras, we introduce constraints on the behavior of objects as they travel through the unobservable areas that lie in between. These constraints include vehicle following (the trajectories of vehicles adjacent to each other at entry and exit are time-shifted relative to each other), collision avoidance (no two trajectories pass through the same location at the same time), and temporal smoothness (the allowable movements of vehicles are restricted by physical limits). The constraints are embedded in a generalized, global cost function for the entire scene that incorporates the influences of all objects. A bounded minimization using an interior point algorithm then yields trajectory representations that define the exact dynamics and behavior of objects while they are invisible. Finally, a statistical representation of motion in the entire scene is estimated to obtain a probabilistic distribution over individual behaviors, such as turns, constant-velocity motion, deceleration to a stop, and acceleration from rest, for evaluation and visualization. Experiments are reported on real-world videos from multiple disjoint cameras in the NGSIM data set, and qualitative as well as quantitative analyses confirm the validity of our approach.
In the above figure, the first image depicts the input to our method: correspondences across multiple disjoint cameras. In this case there are five cameras; the FOV of each camera is shown in a different color, whereas the invisible region is shown in black. Given this input, we reconstruct individual trajectories using the constraints introduced in this paper. Next, the reconstructed trajectories are used to infer the expected behavior at each location in the scene, shown as thick colored regions, where the direction of motion is indicated by the HSV color wheel. We also infer different behaviors, such as stopping and turning, from the reconstructed trajectories.
- The input variables of the problem are the correspondences, i.e., a vehicle’s position, velocity, and time when it enters and exits the invisible region (or equivalently exits a camera’s field of view and enters another’s).
- A path (a set of 2D locations traversed by a vehicle) is obtained by connecting the initial and final locations such that the derivative of the path is computable at all points, i.e., there are no sudden turns or bends. The path so obtained does not contain any information about time. Intersecting paths indicate that a collision is possible.
- Since inference of motion in invisible regions is a severely under-constrained problem, we impose some priors over the motion of vehicles as they travel through the region. These priors, explained below, are used as constraints that later allow us to reconstruct complete trajectories in the invisible region.
- Collision Avoidance: Vehicles are driven by intelligent drivers who tend to avoid collisions with each other. This implies that the probability that a vehicle occupies a location at a particular time is low if the same location is occupied by another vehicle at that time. Consider the two vehicle trajectories shown in (a) in the figure below, where the black trajectory shows a vehicle making a left turn while the vehicle with the green trajectory moves straight. The corresponding 2D paths intersect at the point marked with a red sphere.
- Vehicle Following: This constraint reduces the solution space by ensuring that the relative positions of adjacent and nearby vehicles remain consistent throughout their travel in the invisible region. It is inspired by transportation theory, where vehicle-following models describe the relationship between vehicles as they move on the roadway. The vehicle-following constraint enforces the condition that the trajectories of adjacent vehicles following the same path are time-shifted versions of each other, as can be seen in (c) in the figure below, where the red and yellow trajectories belong to the leading and following vehicles, respectively.
- Smoothness: The smoothness constraint restricts the allowable movements of vehicles based on physical limits, as happens in real life. It prevents the solution from having abrupt acceleration or deceleration as well as sudden stops. In (d) below, the orange trajectory has a low smoothness cost, whereas the black trajectory has a higher cost due to abrupt deceleration at the beginning.
- Stopping Behavior Localization: The above three constraints do not completely specify the solution, because trivial solutions with high values of acceleration and deceleration can exist. This happens when a vehicle is made to stop with high deceleration and stays stopped as long as possible before leaving the invisible region with high acceleration, while still satisfying the smoothness constraint. This constraint dictates that the stopping point of a vehicle cannot be arbitrarily far from the possible collision locations, essentially localizing the move-stop-move events in space and time.
- In order to make the problem tractable, we reduce the parameters defining a trajectory to three: deceleration, stopping duration, and acceleration. Constant velocity corresponds to the case in which all parameters are zero. Thus, we can model all cases from constant velocity to complete stopping by varying the values of these variables.
- We impose a prioritizing function on the trajectories using an earliest-exit-first ordering. The cost for each vehicle is the sum of the costs due to collision, vehicle following, and smoothness, including the penalty for trivial solutions. The parameters are bounded, and the cost is minimized with an interior point algorithm whose initialization is provided by a uniform grid search over the parameter space.
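The three-parameter trajectory model and the grid-search initialization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the motion profile (brake immediately at entry, wait, then accelerate back to the entry speed), the scenario constants, the cost weights, and the exit-consistency term are all hypothetical, the vehicle-following cost is omitted, and the interior point refinement that would follow the grid search is left out.

```python
import itertools

def position(t, v0, dec, wait, acc):
    """Distance travelled along the path at time t (t = 0 at entry)."""
    if dec <= 0:                        # no braking: constant velocity
        return v0 * t
    t_stop = v0 / dec                   # time needed to brake to rest
    s_stop = v0 * v0 / (2.0 * dec)      # distance covered while braking
    if t <= t_stop:
        return v0 * t - 0.5 * dec * t * t
    if t <= t_stop + wait or acc <= 0:
        return s_stop                   # standing still
    tau = t - t_stop - wait             # time since the vehicle restarted
    t_acc = v0 / acc                    # time to regain the entry speed
    if tau <= t_acc:
        return s_stop + 0.5 * acc * tau * tau
    return s_stop + v0 * v0 / (2.0 * acc) + v0 * (tau - t_acc)

# Hypothetical scenario (all values are assumptions for illustration).
V0 = 10.0            # entry speed
PATH_LEN = 100.0     # length of the path through the invisible region
T_EXIT = 16.0        # observed exit time, measured from entry
CROSS_AT = 50.0      # distance along the path where another path crosses
BUSY = (4.0, 7.0)    # interval in which another vehicle occupies the crossing
GAP = 5.0            # minimum allowed distance to the occupied crossing

def collides(params):
    """True if the vehicle is near the crossing while it is occupied."""
    steps = int((BUSY[1] - BUSY[0]) / 0.5) + 1
    for k in range(steps):
        t = BUSY[0] + 0.5 * k
        if abs(position(t, V0, *params) - CROSS_AT) < GAP:
            return True
    return False

def cost(params):
    """Collision + smoothness + exit-consistency (weights are assumed)."""
    dec, wait, acc = params
    collision = 1000.0 if collides(params) else 0.0
    smoothness = 0.1 * (dec * dec + acc * acc)
    exit_error = (position(T_EXIT, V0, dec, wait, acc) - PATH_LEN) ** 2
    return collision + smoothness + exit_error

# Uniform grid over the bounded parameters (deceleration, wait, acceleration).
grid = itertools.product(range(0, 6), range(0, 9), range(0, 6))
best = min(grid, key=cost)
print("best (dec, wait, acc):", best, "cost:", cost(best))
```

In this toy scenario the all-zero (constant-velocity) setting reaches the crossing while it is occupied, so the search is pushed toward a move-stop-move solution that also matches the observed exit time. In a fuller pipeline, the grid minimum would serve only as the starting point for a bounded continuous optimizer.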
We ran our experiments on two datasets from NGSIM (see  for details). The first invisible region was from Lankershim, 8:30am to 8:45am, located at the intersection of Lankershim/Universal Hollywood Dr. (LA), with a total of 1211 vehicles passing through the region. The second invisible region was from Peachtree, 4:00pm to 4:15pm, located at the intersection of Peachtree/10th Street NE (Atlanta), with 657 vehicles passing through the region. Both were typical four-legged intersections with three possible paths for a vehicle entering a particular leg, resulting in 12 paths in total. The following figure shows the trajectories that were output for both datasets.
First, we present some qualitative results. In the figure below, the black trajectory corresponds to the vehicle under consideration, while the proximal vehicles with which it could possibly collide are shown in color. In (a) and (c), the trajectories are drawn assuming constant velocity for each vehicle. In (a), the vehicle collides with one other vehicle, whereas in (c), the vehicle under consideration collides with six different vehicles. The locations of collision are shown with red spheres, which are partially hidden by other vehicles. Notice the change in shape in (b) and (d) after inferring motion for all trajectories, with the outcome that none of the trajectories collides with the black trajectory. Both the vehicle-following and smoothness constraints are also visibly in effect in both examples.
In the next figure, each row is an example of trajectory reconstruction. The vehicle under consideration is shown with squares: yellow depicts constant velocity, red is the proposed method, and green marks the ground truth. The rest of the vehicles are shown in black. In the first row, reconstruction with constant velocity causes collisions at t = 381 and t = 521, and in the second row, between t = 1200 and t = 1500. In contrast, the proposed method and the ground truth allow the vehicles to pass without any collision.
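The constant-velocity baseline used in these comparisons, and the way it produces collisions, can be illustrated with a small sketch. It assumes straight paths between the entry and exit points and a simple distance threshold as the collision test; the function names, the vehicle tuples, and the numbers are hypothetical and chosen only for illustration.

```python
import math

def constant_velocity(entry_xy, exit_xy, t_in, t_out, t):
    """Position at time t on a straight path travelled at constant speed."""
    u = (t - t_in) / (t_out - t_in)     # fraction of the traversal completed
    return (entry_xy[0] + u * (exit_xy[0] - entry_xy[0]),
            entry_xy[1] + u * (exit_xy[1] - entry_xy[1]))

def collision_times(veh_a, veh_b, min_gap, step=1.0):
    """Sampled times at which both vehicles are inside the invisible
    region and closer to each other than min_gap.

    Each vehicle is a tuple (entry_xy, exit_xy, t_in, t_out).
    """
    start = max(veh_a[2], veh_b[2])
    end = min(veh_a[3], veh_b[3])
    hits = []
    t = start
    while t <= end:
        ax, ay = constant_velocity(*veh_a, t)
        bx, by = constant_velocity(*veh_b, t)
        if math.hypot(ax - bx, ay - by) < min_gap:
            hits.append(t)
        t += step
    return hits

# Two crossing vehicles: one east-bound, one north-bound (assumed values).
east = ((0.0, 50.0), (100.0, 50.0), 0.0, 10.0)
north = ((50.0, 0.0), (50.0, 100.0), 0.0, 10.0)
north_late = ((50.0, 0.0), (50.0, 100.0), 3.0, 13.0)

print(collision_times(east, north, min_gap=5.0))
print(collision_times(east, north_late, min_gap=5.0))
```

When both vehicles enter at the same time they meet at the crossing point, whereas delaying one of them by a few time units removes the conflict; constant-velocity interpolation cannot express the braking and waiting that achieve such a delay inside the region.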
Finally, the following figure shows the error profile for our method (yellow) vs. constant velocity (black) for both datasets. As can be seen, our method has a lower error magnitude and thus provides more accurate inference. Panel (c) shows ROC curves for our method (solid) vs. constant velocity (dashed) for Lankershim (red) and Peachtree (green). The x-axis is the distance threshold in feet, while the y-axis gives the percentage of points that lie within that threshold distance of the ground truth.
In the following figure, each row is the Mixture of Gaussians representation for a particular path using constant velocity, the proposed method, and the ground truth. The patterns in the second and third columns are similar and capture acceleration, deceleration, start, and stop behaviors, whereas in the first column all Gaussians have the same variance due to constant velocity.
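The threshold curve described above (percentage of points within a given distance of the ground truth) can be computed as in the following sketch. The function name and the sample points are hypothetical; the only assumption about the data is that predicted and ground-truth positions are paired point by point.

```python
import math

def percent_within(predicted, truth, thresholds):
    """For each distance threshold, the percentage of predicted points
    that lie within that distance of the paired ground-truth point."""
    errors = [math.hypot(px - gx, py - gy)
              for (px, py), (gx, gy) in zip(predicted, truth)]
    return [100.0 * sum(e <= th for e in errors) / len(errors)
            for th in thresholds]

# Illustrative points with errors of 0, 5, 10, and 20 units.
predicted = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (0.0, 20.0)]
truth = [(0.0, 0.0)] * 4
curve = percent_within(predicted, truth, thresholds=[1.0, 5.0, 10.0, 25.0])
print(curve)
```

A curve that rises faster toward 100% indicates that more reconstructed points fall close to the ground truth at small thresholds.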
Scene Structure & Status Inference
Given the inferred motion and behavior of objects in the invisible regions, we propose to estimate some key aspects of the scene structure and status, both to show the importance and usefulness of our framework and to allow evaluation. The following figure shows the stopping times (probability of green signal) for each of the eight possible legs. In each graph, the x-axis is time and the y-axis is the probability from our method (blue) and the ground truth (black), which are evidently well aligned in time.
The probability maps of stopping positions inferred for both data sets are shown in the following figure. They are consistent with real behavior, since vehicles stop and queue before the signal.
Haroon Idrees, Imran Saleemi, and Mubarak Shah, Statistical Inference of Motion in the Invisible, 12th European Conference on Computer Vision (ECCV), Florence, Italy, October 7-13, 2012. [Video of Presentation]