One of the more interesting parts of the ball catching robot is the algorithm used to estimate the 3D trajectory of the ball while it is flying through the air. This post describes in more detail how this algorithm works.
To start with, let’s define some assumptions and known parameters:
- The object whose trajectory we wish to estimate is flying through the air according to some parametric model $ \mathbf{traj}(t) $ mapping a time $ t $ to a 3D position $ \mathbf{x} $. To describe the algorithm, we use a simple trajectory model $ \mathbf{traj}(t)=-\frac{1}{2} \mathbf{g} t^2 + \mathbf{v}_0 t + \mathbf{x}_0 $, where $ \mathbf{g} $ is the acceleration due to gravity, and $ \mathbf{x}_0 $ and $ \mathbf{v}_0 $ are the initial position and velocity of the object. $ \mathbf{g} $, $ \mathbf{x}_0 $, and $ \mathbf{v}_0 $ are all 3D vectors.
- We have a set of cameras and object detection algorithms that can determine the object's 2D position at some sampling rate over time. In other words, each camera gives a sequence of 2D position and time pairs describing where and when the camera observed the object.
- We have a function $ \mathbf{proj}(\mathbf{x}) $ mapping a 3D position $ \mathbf{x} $ to the 2D position observed by a camera. This requires calibrating the cameras to account for the lens projection, and using a rigid body transformation to describe the orientation and position of the camera. A minimal code sketch of both model functions follows this list.
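To make these models concrete, here is a minimal sketch in Python. The function names, the gravity convention, and the simple pinhole camera model are illustrative assumptions of mine, not taken from the actual implementation:

```python
import numpy as np

# Gravity as a 3D vector; the sign convention matches the -1/2 g t^2 term
# above (z pointing up is an assumption of this sketch).
g = np.array([0.0, 0.0, 9.81])

def traj(t, x0, v0):
    """Ballistic trajectory model: -1/2 g t^2 + v0 t + x0."""
    return -0.5 * g * t**2 + v0 * t + x0

def proj(x, R, c, K):
    """Project a 3D world point into a camera's 2D image.

    R (3x3 rotation) and c (camera center) are the rigid body transform
    from calibration; K is a 3x3 pinhole intrinsic matrix. A real
    implementation would also model lens distortion.
    """
    x_cam = R @ (x - c)   # world coordinates -> camera coordinates
    u = K @ x_cam         # apply the lens/intrinsic model
    return u[:2] / u[2]   # perspective divide -> 2D image position
```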
The algorithm should allow for more advanced trajectory models (as long as they are parametric in $ t $), but this trajectory model is sufficient for my application. If you are trying to intercept an ICBM, this won’t be good enough… but if that describes you, hopefully you aren’t learning anything new from a blog post about a LEGO robot.
Note that the algorithm works for any number of cameras, even just one. However, the optimization problem becomes more stable as more cameras are added.
With this information, we can attempt to estimate the trajectory of an object being observed by the cameras. The algorithm works by finding the trajectory that minimizes reprojection error. The basic idea is to minimize the error between the estimated trajectory and the actual trajectory; since we don’t know the actual trajectory, we instead measure the error between the observed 2D positions and the 2D projections of the estimated trajectory, both of which are known. Given a trajectory estimate $ \mathbf{traj}(t) $, the total reprojection error $ f $ is defined by:
$$ f(\mathbf{x}_0, \mathbf{v}_0) = \sum_{cam \in \mathbf{C}} \sum_{ob \in \mathbf{O}_{cam}} \mathbf{d}(\mathbf{x}_{ob}, \mathbf{proj}_{cam}(\mathbf{traj}(t_{ob})))^2 $$
where:
- $ \mathbf{d}(\mathbf{x}, \mathbf{\hat x}) $ is the Euclidean distance between $ \mathbf{x} $ and $ \mathbf{\hat x} $,
- $ \mathbf{C} $ is the set of cameras,
- $ \mathbf{O}_{cam} $ is the set of observations from camera $ cam $, and
- $ t_{ob} $ and $ \mathbf{x}_{ob} $ are the time and 2D position of the observation $ ob $.
Conceptually, this is computing the sum of the squared 2D distances between the projection of the estimated 3D position of the object $ \mathbf{traj}(t_{ob}) $ at the time corresponding to the observation, and the 2D observation $ \mathbf{x}_{ob} $.
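A minimal sketch of this error computation, continuing with the `traj` and `proj` functions above (the flat parameter vector and the data layout for `cameras` and `observations` are my own assumptions):

```python
def reprojection_residuals(params, cameras, observations):
    """Stacked 2D reprojection errors for all observations.

    params: the 6 trajectory parameters [x0 (3), v0 (3)].
    cameras: a list of (R, c, K) calibration tuples, one per camera.
    observations: per-camera lists of (t_ob, x_ob) pairs, where x_ob is
        the observed 2D position (a NumPy array) at time t_ob.
    """
    x0, v0 = params[:3], params[3:6]
    residuals = []
    for (R, c, K), obs in zip(cameras, observations):
        for t_ob, x_ob in obs:
            # 2D error between the projected estimate and the observation
            residuals.append(proj(traj(t_ob, x0, v0), R, c, K) - x_ob)
    return np.concatenate(residuals)

def f(params, cameras, observations):
    """Total reprojection error: the squared norm of the stacked residuals."""
    r = reprojection_residuals(params, cameras, observations)
    return float(r @ r)
```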
Any suitable optimization technique can be used to find the trajectory minimizing the error $ f $. The variables being optimized are the trajectory parameters $ \mathbf{x}_0 $ and $ \mathbf{v}_0 $. I used Levenberg-Marquardt to minimize $ f $.
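For example, SciPy exposes MINPACK's Levenberg-Marquardt through `least_squares` with `method="lm"`, which operates directly on the stacked residual vector from the sketch above rather than on $ f $ itself (the zero initial guess here is purely illustrative; seeding from a rough triangulation of the first observations would work better in practice):

```python
from scipy.optimize import least_squares

initial_guess = np.zeros(6)  # [x0 (3), v0 (3)]; illustrative only
result = least_squares(reprojection_residuals, initial_guess,
                       args=(cameras, observations), method="lm")
x0_est, v0_est = result.x[:3], result.x[3:6]
```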
A useful extension to this method is supporting observations from cameras with unknown synchronization. The algorithm described above assumes we have a globally synchronized clock to record the time of each observation. We can relax this by requiring only that each camera has its own consistent clock for reporting observations, and adding a per-camera offset to the time of each observation to find the shift between the clocks of the cameras:
$$ f(\mathbf{x}_0, \mathbf{v}_0, dt) = \sum_{cam \in \mathbf{C}} \sum_{ob \in \mathbf{O}_{cam}} \mathbf{d}(\mathbf{x}_{ob}, \mathbf{proj}_{cam}(\mathbf{traj}(t_{ob} + dt_{cam})))^2 $$
where $ dt_{cam} $ is the time offset for each camera. Allowing the offsets of all cameras to float results in an ambiguous set of solutions, since a constant shift applied to every offset can be absorbed into the trajectory parameters. To avoid this, fix the offset of the first camera: $ dt_0 = 0 $. When minimizing this new function, in addition to estimating the trajectory, we also estimate the time shift $ dt $ of each camera relative to the first camera.
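Extending the earlier residual sketch, the parameter vector simply grows by one offset per camera beyond the first (again a sketch under the same assumed data layout):

```python
def residuals_with_offsets(params, cameras, observations):
    """Residuals for the extended model.

    params: [x0 (3), v0 (3), dt_1, ..., dt_{n-1}] for n cameras; dt_0 is
    pinned to zero to remove the ambiguity discussed above.
    """
    x0, v0 = params[:3], params[3:6]
    dts = np.concatenate(([0.0], params[6:]))  # dt_0 fixed at 0
    residuals = []
    for dt_cam, (R, c, K), obs in zip(dts, cameras, observations):
        for t_ob, x_ob in obs:
            # shift the camera-local observation time into the global clock
            residuals.append(proj(traj(t_ob + dt_cam, x0, v0), R, c, K) - x_ob)
    return np.concatenate(residuals)
```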
My implementation of this algorithm for the LEGO ball catching robot can be found on GitHub.