The problem of calculating a flow field is in general undetermined due to the finite number of identifiable points. The solution is additionally non-unique due, for instance, to possible motion of the observer. This can lead to the connection between the observed motion in the image and the generating motion in three dimensions being ambiguous.
Definition of the Flow Field
Let i ( x , t ) be the image intensity as a function of time t and position,![]() |
[11] |
If an image element at position x at time t is at position x+δx at time t+δt and source element range, source element orientation and illumination conditions are unaltered, then intensity will be conserved at the transformed image location, with the result that
![]() |
[12] |
The optic flow field is defined by
![]() |
[13] |
.. with component form
![]() |
[14] |
To first order, in
![]() |
[15] |
Equation 12 may be rewritten by means of equation 15 as
![]() |
[16] |
If the intensity may be expanded to first order in the variations as a Taylor series, an assumption which will be invalid across discontinuities of intensity, then
![]() |
[17] |
Equating corresponding orders of
yields the motion constraint
equation of optic flow
![]() |
[18] |
A statement of the problem
An optic flow field, u(x,t), defined by equation 13, is a vector field which consists of a collection of smooth (analytic) segments separated by closed contours across which the flow is discontinuous. The discontinuity may be caused by an occluding edge, such as an object moving relative to a background, or be due to the motion of a part of an object, a rotating cube, for example, where the optic flow associated with two faces may be different, although both are derived from a single three-dimensional motion. Reliable estimates of the field may be made only in the vicinity of spatial changes of intensity, dots or edges, therefore only a sparse reliable estimate is available.The main problem is to convert a sparse flow field into a self-consistent dense flow field over the whole image. The problem is complicated by the initial values being subject to noise and the fact that if the initial estimate is derived from an intensity edge then only the component of flow that is normal to it may be obtained (the aperture problem). In addition, the contours of discontinuity of flow must be determined, since consistency does not apply across them.
We may think of the temporal sequence as a block of data in two spatial and one temporal dimensions, with the face of the block growing in the temporal direction. We wish to determine the optic flow, a vector field, u, defined throughout the data volume of (x,t), {-L/2 ≤ x,y ≤ L/2},
{0 ≤ t ≤ T}. Two distinct problems arise. Firstly when the flow field is entirely unknown and must be established ab initio, the initialisation problem. Secondly when the flow field is known for a set of previous time values, {0 ≤ t ≤ T}, and must be determined for t = T + Δt, the updating problem.
Initialisation
- Make an initial sparse estimate of the flow field in
the neighbourhood of intensity boundaries. Such an
estimate is inevitably noisy and subject to the aperture
problem
- Determine the location of possible flow boundaries
- Extend the sparse estimate to a dense flow field by
linear interpolation within the smooth flow segments
- Impose local consistency of the field by relaxing the
estimate, without, however, propagating the field across
possible flow boundaries
The underlying assumption of the above algorithm is that global consistency of flow follows from the overall imposition of local consistency. This assumption is consistent with the property of a piecewise smooth field.
Updating
For the problem of updating, a dense flow field exists for t=T, therefore any determination of flow at t = T + Δt must be consistent, not only with itself, but also with the field at t=T. Significant changes between the flows at T and T + Δt are caused by changes in the observed scene such that flow values may alter, flow boundaries may move, new flow segments may appear, either from the periphery of the image or because previously insignificant regions become resolved. The consistency requirement should be sufficiently flexible to allow for these possibilities.A solution to the problem of updating a flow field is as follows.
- Determine the location of the temporal boundaries at
time T + Δt.
- Determine the temporal and spatial gradients at T
+ Δt.
- Relax the flow field at time T to obtain
consistency with the intensity gradients at time T
+ Δt.
Updating is significantly simpler than initialisation. We shall consider the implementation of the procedures for obtaining the flow field in more detail, first initialisation and then updating.
Initialisation and Temporal Boundaries
The image is first segmented spatially into disjoint regions separated by closed contour boundaries. The components of flow vectors normal to the boundaries are then obtained from ratios of the temporal and spatial derivatives of intensity. Each component has a corresponding measure of reliability, derived from the spatial edge strength at the boundary. It is necessary that the spatial segmentation be robust and accurate since it is a prerequisite for the validity of the succeeding flow analysis. The method of spatial segmentation is based on spatial co-occurrence segmentation followed by relaxation [HB90] , and the use of the expectation-maximisation algorithm for co-occurrence parameter analysis.An initial estimate of the flow
![]() |
[19] |
is given by
![]() |
[20] |
since this assignment satisfies the optic flow equation (18).
Spatial derivatives
Previously, the spatial derivatives were calculated using a one-dimensional operator for δ/δX and δ/δY. The effect of this is to render them very sensitive to noise and signal intensity variation across their input values. The operators have been corrected to conform to the `bow-tie' operator style, thus introducing a degree of spatial averaging. The Spacek operator is first calculated in one dimension as before, but now it is spread across a bow-tie operator, then all the contributions are subject to weightings calculated from the binomial expansion coefficients in the perpendicular direction to the axis. This is shown for a 5-point spacek with respect to X in Figure . Temporal averaging is achieved by doing the same operation over a temporal range of frames centred around the required frame, summing the results and dividing by the number of frames summed. The Y operator is similar but rotated through 90 degrees.
5-point Spacek X operator |
→ |
Extended into Y |
| ↓ | ||
After binomial weightings |
← |
After 'bow-tie' operation |
Figure 3 : Spatial derivative calculation
The bow-tie operator was chosen because it separates the information used for the δ/&deltaX and δ/&deltaY calculations. The binomial weightings ensure that pixels common to both direction operators will be given as similar a weighting as possible when the information gained from them is combined in both X and Y.
Temporal derivatives
The Temporal derivative was similarly one-dimensional. We have extended this to be in effect a three-dimensional Spacek operator, as shown in Figure 4. The central (red) squares are first given the normal spacek values then these are propogated out to fill their spacek frame, finally the binomial weightings are applied in both X and Y. In this fashion we average the time derivative over X and Y.
Figure 4 : Temporal derivative layout
The spatial derivatives are two-dimensionally averaged versions of a Canny edge detector obtained by taking the outer product of an N-element Canny operator with a smoothing kernel such as that defined by the coefficients of the binomial expansion of 2(-N+1)(1+1)(N-1). The time derivatives are given by the outer product of
![]() |
[21] |
where E is the Canny operator acting in the time direction, and a similar smoothing kernel in the two spatial dimensions, while
denotes convolution.
Initial Estimates
The initial estimates are reliable only in the vicinity of strong edges, hence only those pixels for which![]() |
[22] |
exceeds a threshold are accepted as pixels from which flow may be estimated. The probability distribution of velocity for such pixels is taken as Gaussian, with mean given by equation 20 and standard deviation inversely proportional to edge strength, following from equation 22. Initial velocity distributions for pixels within a region are also assumed to be Gaussian, having means obtained by interpolation from the values on the boundary, and standard deviations again following from equation 22.
Further, there is an assumption in the flow-relaxation that edges will be step edges, and not roof or valley edges. These edges are shown in figure 5, the blue edges have neighbours that are either both above or both below themselves, this is referred to as a roof or valley edge. Problems are caused because the derivatives will then change very dramatically from one pixel to the next, leading to estimates of flow which also vary dramatically over a small spatial range.
Allowable step profile |
Roof/valley profiles |
|
| Figure 5 : Edge types | ||
We counter these occurrences by taking a 3x3 median filter of the values for flow around the pixels which are on edge. This results in a smoother transition for the flow vectors, but does not affect the non-boundary pixels.
The probability for a pixel to be interior or boundary to a region of smooth optic flow with respect to a given direction follows from the position of the intensities of the corresponding Δ-pair of pixels in the temporal co-occurrence matrix. As a first approximation it is proportional to their displacement from the leading diagonal.
As with other methods of flow determination, [Horn] [nagel1] [nagel2] [uras1] the initial estimates are reliable only in the vicinity of prominent edges. Computational effort and hence time is required to propagate the flow into the interior of a bland region. We surmount this problem by using the intensity segmentation into regions already available and assuming that the flow within such regions is smooth, so that an initial estimate of flow may be made by interpolation from the values at the boundary. The adjustment made by relaxation is then greatly reduced, with consequent easing of the computational burden. In addition the relaxation can incorporate information from the previous frames, thereby ensuring interframe consistency and further accelerating the convergence of the relaxation.
Interpolation
Consider a pixel at (x,y) within a closed contour. Suppose that the components of the flow has been determined everywhere on the boundary from the ratios of temporal and spatial intensity gradients, together with associated reliabilities, expressed via their standard deviations which are proportional to the edge strengths in the two coordinate directions. The initialising estimate of the flow at the pixel is a weighted average of the four flow values which are at the nearest boundary points in the cardinal directions together with the flow estimate at (x,y) taken from the ratios of the local values of the intensity gradients. The weighting factors are inversely proportional to the distance to the boundary point and directly proportional to the reliability of the flow estimate. This initial estimate is sufficient to initialise the flow field of an image sequence.Let the pixel of interest have coordinates (x0,y0) and let the nearest boundary pixels have coordinates (x1,y1),… ,(x4,y4) respectively. This is illustrated in Figure 6.
Figure 6 : Interpolation of flow from the boundary points |
Each pixel has an initial estimate of the flow field,
![]() |
[23] |
obtained from equation 20, together with estimates of the reliabilities of the components, expressed via standard deviations, with
![]() |
[24] |
Interpolation estimates of the value of the flow components at (x0,y0) are given in terms of those of its neighbours as
![]() |
[25] |
where
![]() |
[26] |
The above estimates of the flow at (x0,y0) which have been obtained by interpolation from the boundary values are combined with the estimate obtained from equation 20
![]() |
[27] |
with corresponding standard deviations, σx0 and σy0 respectively, to yield the combined estimates
![]() |
[28] |
These estimates are now input to a relaxation labelling algorithm in order to achieve overall consistency of flow estimation.
Determination of the Temporal Boundaries
The representation of the image sequence data as a 2+1 dimensional block, two spatial and one temporal, makes it natural to consider the statistics of adjacent pixels, both spatially and temporally, as expressed via the co-occurrence matrices. Just as a spatial intensity boundary is characterised by an off-diagonal distribution in a spatial co-occurrence matrix, so too is a temporal intensity boundary by an off-diagonal peak in a temporal matrix. Flow boundaries can only occur at temporal boundaries, therefore the possible flow boundaries may be identified by a co-occurrence segmentation based on temporal co-occurrence matrices. In the initial relaxation of flow any temporal co-occurrence boundary is treated as a flow boundary, in as much as flow is not propagated across it.| ← Segmentation | Relaxation theory → |

















