Klein and Murray took the PTAM project (here PTAM-d, for PTAM running on a desktop computer) and adapted the code to run on an iPhone 3G (here PTAM-m, for PTAM running on a mobile device such as a smartphone). Of course, everything needs to become more lightweight so that the smartphone can actually handle it.
Bringing computationally expensive programs to smartphones poses a couple of problems (or "challenges"). Obviously, the CPU in such a small device is rather slow compared to a desktop computer. The iPhone used has a CPU running at approx. 400 MHz, which is not much, but still quite a number for a small device (especially a couple of years ago). The next issue is the camera, which is also much slower than a webcam. The iPhone's camera has a frame rate below 15 Hz (where a webcam might have double this number), and it has a rolling shutter (the rows of a frame are not captured at the same instant, so you do not get the whole frame at once). Klein and Murray also argue that the camera has a narrow field of view, and since the original PTAM requires a wide-angle camera (the demonstration videos look almost as if they were shot with fish-eye lenses), this meant even more adaptation work.
One of the changes is a reduction in the number of map points. The desktop version takes every point found into consideration and adds it to the map, which can add up to over 1000 map points. PTAM-m is limited to 35 points, which reduces processing costs significantly. To not lose too much accuracy, an image pyramid is calculated from each 240 by 320 pixel video frame, up to level five (with a size of 15 by 20 pixels). On this image pyramid, corner detection (Shi-Tomasi corners) is performed. With fewer simultaneous map points there is less work per frame, so this runs faster (or at least smoothly on a low-end CPU). There are more reductions and limitations compared to PTAM-d.
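To make the pyramid sizes concrete, here is a minimal sketch in plain Python. It assumes that level one is the full 240 by 320 frame and that every further level is produced by averaging 2x2 pixel blocks; the text above does not say how PTAM-m actually downsamples, so the function names `halve` and `build_pyramid` and the averaging step are illustrative assumptions only.

```python
def halve(image):
    """Return a half-resolution copy by averaging each 2x2 block of pixels.

    Assumption: plain 2x2 averaging; the real downsampling filter may differ.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def build_pyramid(image, levels=5):
    """Level 1 is the full frame; each further level halves both dimensions."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(halve(pyramid[-1]))
    return pyramid


# Synthetic greyscale frame with 240 rows and 320 columns (a simple gradient),
# standing in for a real video frame.
frame = [[(r + c) % 256 for c in range(320)] for r in range(240)]
pyramid = build_pyramid(frame)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

Halving 240 by 320 four times gives 120 by 160, 60 by 80, 30 by 40 and finally 15 by 20, which matches the level-five size quoted above.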
Tracking motion on a smartphone needs to address potential motion blur, which occurs quickly because the built-in cameras are tiny and not as light-sensitive as a webcam, let alone a dedicated camera. The user will most likely move the device rather slowly, because the device is literally the display the user wants to look at. Nevertheless, when the lighting conditions are not perfect the camera will produce blurry (and noisy) images. Where PTAM-d does quite extensive work to compensate for blur using a feature search around FAST corners, PTAM-m omits this completely. Instead, PTAM-m performs a feature point search in the first level of the image pyramid, evaluating a 4-pixel radius around a feature using a zero-normalised sum of squared differences against an 8x8 pixel template.
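The patch search described above can be sketched roughly as follows. This is not the PTAM-m code: I am assuming that "zero-normalised" means each patch has its own mean subtracted before the squared differences are summed (which makes the score insensitive to a uniform brightness change), and the helper names `patch`, `zssd` and `search` are made up for this illustration.

```python
import random


def patch(image, top, left, size=8):
    """Extract a size x size patch as a flat list of pixel values."""
    return [image[top + r][left + c] for r in range(size) for c in range(size)]


def zssd(a, b):
    """Zero-mean SSD: subtract each patch's own mean, then sum squared differences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum(((x - mean_a) - (y - mean_b)) ** 2 for x, y in zip(a, b))


def search(image, template, top, left, radius=4, size=8):
    """Score every offset within `radius` pixels of the predicted position.

    Returns (best_score, best_top, best_left), or None if no candidate patch
    fits inside the image.
    """
    rows, cols = len(image), len(image[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx > radius * radius:
                continue  # stay inside the circular search region
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > rows or l + size > cols:
                continue  # patch would fall off the image
            score = zssd(template, patch(image, t, l, size))
            if best is None or score < best[0]:
                best = (score, t, l)
    return best


# Demo on synthetic data: take a template at (20, 30), brighten the whole image
# by 10 (values are deliberately not clipped), and search around a slightly
# wrong predicted position.
rng = random.Random(0)
reference = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
template = patch(reference, 20, 30)
brighter = [[value + 10 for value in row] for row in reference]
score, best_top, best_left = search(brighter, template, top=22, left=28)
print(score, best_top, best_left)
```

Because the mean is removed from both patches, the brightness offset cancels out and the search still lands on the position the template was taken from.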
The paper describes the methods and techniques used in the PTAM project.
Usually, AR systems only work properly when the running system has some prior knowledge of the surrounding area. This knowledge could be pre-acquired information about an object, or a known marker (a marker could, for example, be a monochrome pattern that is put, like a sticker, onto an object). Such AR systems can then recognize and track their known object in the video stream, but only as long as it is within the camera's field of view. As soon as the object moves away and is no longer visible, the tracking stops.
PTAM works in completely unknown environments and uses "extensible tracking" techniques to achieve this. It requires a static and fairly small environment, meaning that it is not designed to track and augment constantly moving objects, or to be moved (by the user) across a big area like a city. Most other systems that perform tracking and mapping link the two tasks directly and run both at every video frame. Because most of those systems are used in the robotics field, this approach may be sufficient, since robots tend to move slowly and in a predictable manner. But this is most likely not the case for a camera held by a user (who doesn't know or even care what is happening inside the device she is holding, but just wants it to work smoothly and accurately). PTAM splits tracking and mapping into two independent tasks which run in different threads. With the mapping process separated, it is no longer necessary to process every new frame, which usually carries lots of redundant information; the mapper can instead work on fewer, more useful frames and produce more accurate results. The approach within PTAM is adopted from SLAM. Where the original SLAM methods (in the robotics field) use laser sensors, in this case we obviously don't have lasers but only one camera, which is why we talk about "monocular SLAM" here.
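The split into a tracking thread and a mapping thread can be illustrated with a toy sketch. Everything in it is a stand-in: the frames are just integers, the "pose" and "map" updates are placeholders, and the rule of handing every tenth frame to the mapper is my own assumption for the demo (how PTAM really decides when to hand a frame to the mapper is not covered here).

```python
import queue
import threading


def tracker(frames, keyframes, poses, keyframe_every=10):
    """Handle every frame; hand only an occasional frame to the mapper."""
    for index in frames:
        poses.append(index)  # placeholder for a per-frame camera pose estimate
        if index % keyframe_every == 0:  # hypothetical selection rule
            keyframes.put(index)
    keyframes.put(None)  # sentinel: no more frames will arrive


def mapper(keyframes, map_keyframes):
    """Run in its own thread and integrate frames whenever they arrive."""
    while True:
        index = keyframes.get()
        if index is None:
            break
        map_keyframes.append(index)  # placeholder for the map update


keyframes = queue.Queue()
poses, map_keyframes = [], []
mapping_thread = threading.Thread(target=mapper, args=(keyframes, map_keyframes))
tracking_thread = threading.Thread(target=tracker, args=(range(100), keyframes, poses))
mapping_thread.start()
tracking_thread.start()
tracking_thread.join()
mapping_thread.join()
print(len(poses), len(map_keyframes))
```

The tracker touches all 100 frames while the mapper only sees 10 of them, so the expensive mapping work never holds up per-frame tracking.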
It took quite some time to get everything working. Here is a quick breakdown. Compiling works fine, but PTAM isn't doing anything meaningful so far. Everything was done using Visual Studio 2010 Ultimate.
- get the PTAM source code
- unpack it and read the included README.txt
- get all dependencies named in the README.txt
  - gvars3 (see 3. )
  - libcvd (see 3. )
  - glew 1.6.0
- I noticed that the code revision for gvars3 and libcvd which is referenced in the README.txt doesn't work quite well. So I took the HEAD revision (date: 19.05.2011)
- unpack all dependencies (some come as self-extracting or install packages, so it is necessary to watch where the files go and move them all to one place)
- adjust the include/library paths of libcvd and gvars3 and compile both (using the included VC2005 projects)
  - gvars3 include:
  - libcvd include:
  - libcvd lib:
- (from now on everything in the PTAM project) adjust the include/lib paths as well
  - include paths:
    include;$(ProjectDir)..\ptam dependencies;$(ProjectDir)..\ptam dependencies\gvars3;$(ProjectDir)..\ptam dependencies\pthreads\Pre-built.2\include;$(ProjectDir)..\ptam dependencies\libcvd
  - lib paths:
    $(ProjectDir)..\ptam dependencies\pthreads\Pre-built.2\lib;$(ProjectDir)..\ptam dependencies\gvars3\lib;$(ProjectDir)..\ptam dependencies\libcvd\lib;lib
- open SymEigen.h and search for
  //calculate eigenvalues (at line 184)
  replace in the next line the two calls of
- add /FORCE:MULTIPLE to the linker options (suggested by JC) and compile
- copy the following DLLs from the dependencies to the Release folder
  - lapack_win32_MT.dll, blas_win32_MT.dll
- run PTAM.exe