The related work described so far tracks motion using optical flow algorithms. These produce satisfactory results but do not yet cover the full potential of an AR tracking system. Other approaches use interest-point-based algorithms, which are commonly known to be computationally very expensive. SIFT descriptors are probably the most widely used, although they are among the most expensive. Nevertheless, some improvements have been achieved with both SIFT and SURF.
Similar to PTAM, Wagner et al. also use separate detection and tracking systems. The detection system tries to find known targets in the current camera image using a modified SIFT algorithm. Instead of calculating the rather expensive Difference of Gaussians (DoG), they run a FAST corner detector over multiple scales. Memory consumption is reduced by using 36-dimensional descriptors instead of SIFT's original 128 dimensions. The resulting descriptors are matched against entries in multiple spill trees, a data structure similar to the k-d tree used in the original SIFT.
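To make the corner detection step more concrete, the following is a minimal sketch of a FAST-style segment test on a single scale. It is not taken from Wagner et al.: the 9-of-16 rule, the threshold of 20 and all function names are assumptions chosen for illustration. Running the same test on each level of a down-scaled image stack would give a multi-scale detector.

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, allowing wrap-around."""
    best = run = 0
    for flag in flags + flags:  # walking the ring twice handles wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return min(best, len(flags))


def is_fast_corner(img, x, y, threshold=20, arc=9):
    """Segment test: (x, y) is a corner if at least `arc` contiguous circle
    pixels are all brighter or all darker than the centre by `threshold`.
    Both parameter values are assumptions, not values from the source."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [value > centre + threshold for value in ring]
    darker = [value < centre - threshold for value in ring]
    return max(longest_circular_run(brighter),
               longest_circular_run(darker)) >= arc


def detect_corners(img, threshold=20, arc=9):
    """Return (x, y) for every corner, skipping a 3-pixel border."""
    height, width = len(img), len(img[0])
    return [(x, y)
            for y in range(3, height - 3)
            for x in range(3, width - 3)
            if is_fast_corner(img, x, y, threshold, arc)]
```

Here `img` is a grey-scale image stored as a list of rows. The test only compares intensities, which is why it is much cheaper than building a DoG scale space.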
-- unfinished --
Klein and Murray adapted PTAM, which originally runs on a desktop computer (here PTAM-d), to run on an iPhone 3G (here PTAM-m, for PTAM running on a mobile device such as a smartphone). Of course, everything needs to become more lightweight so that the smartphone can actually handle it.
Bringing computationally expensive programs to smartphones poses a couple of problems (or "challenges"). Obviously, the CPU in such a small device is rather slow compared to a desktop computer. The iPhone used has a CPU running at approx. 400 MHz, which is not much, but still quite a lot for a small device (especially a couple of years ago). The next issue is the camera, which is also much slower than a webcam. The iPhone's camera has a frame rate below 15 Hz (a webcam might reach double this number), and it has a rolling shutter (so the frame is not captured all at once but line by line). Klein and Murray also argue that the camera has a narrow field of view. Since the original PTAM requires a wide-angle camera (the demonstration videos look almost as if fish-eye lenses had been used), this meant even more adaptation work.
One of the changes is the reduction in the number of map points. The desktop version takes every point found into consideration and adds it to the map, which can add up to over 1000 map points. PTAM-m is limited to 35 points, which reduces processing costs significantly. To not lose too much accuracy, an image pyramid with five levels is calculated from each 240 by 320 pixel video frame, so the coarsest level measures 15 by 20 pixels. Corner detection (Shi-Tomasi corners) is performed on this image pyramid. Limiting the number of map points handled at the same time lets this run faster, that is, smoothly on a low-end CPU. PTAM-m has further reductions and limitations compared to PTAM-d.
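The pyramid sizes mentioned above follow from halving the frame four times. A minimal sketch, assuming each level is produced by averaging 2x2 pixel blocks (the source does not say how PTAM-m down-samples, and the function names are hypothetical):

```python
def halve(img):
    """Down-sample a grey-scale image (list of rows) by averaging 2x2 blocks."""
    height, width = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(width)]
            for y in range(height)]


def build_pyramid(frame, levels=5):
    """Return [full resolution, half, quarter, ...] with `levels` entries."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(halve(pyramid[-1]))
    return pyramid


# A blank frame stored as 240 rows of 320 pixels (the orientation is an assumption).
frame = [[0] * 320 for _ in range(240)]
sizes = [(len(level), len(level[0])) for level in build_pyramid(frame)]
print(sizes)  # [(240, 320), (120, 160), (60, 80), (30, 40), (15, 20)]
```

Each level holds a quarter of the pixels of the previous one, so the whole pyramid costs only about a third more memory than the original frame.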
Tracking motion on a smartphone needs to address potential motion blur, which occurs quickly because the built-in cameras are tiny and not as light-sensitive as a webcam, let alone a dedicated camera. The user will most likely move the device rather slowly, because the device is literally the display that the user wants to look at. Nevertheless, when the lighting conditions are not perfect, the camera will produce blurry (and noisy) images. Where PTAM-d does quite extensive work to compensate for blur using a feature search around FAST corners, PTAM-m omits this completely. Instead, PTAM-m performs a feature point search in the first level of the image pyramid, evaluating a 4-pixel radius around a feature using a zero-mean sum of squared differences (ZSSD) against an 8x8 pixel template.
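The template search can be sketched as follows. This is an illustration under assumptions, not the PTAM-m code: positions refer to the top-left corner of the 8x8 patch, the search region is a disc of radius 4 around the predicted position, and all names are hypothetical.

```python
def crop(img, x, y, size):
    """Cut a size x size patch whose top-left corner is at (x, y)."""
    return [row[x:x + size] for row in img[y:y + size]]


def zssd(patch_a, patch_b):
    """Zero-mean sum of squared differences between two equally sized patches.
    Subtracting each patch's mean makes the score insensitive to a uniform
    brightness offset between the two patches."""
    a = [value for row in patch_a for value in row]
    b = [value for row in patch_b for value in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum(((va - mean_a) - (vb - mean_b)) ** 2 for va, vb in zip(a, b))


def search(img, template, pred_x, pred_y, radius=4):
    """Try every offset within `radius` pixels of the predicted position and
    return (score, x, y) of the best match, or None if no patch fits."""
    size = len(template)
    height, width = len(img), len(img[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy > radius * radius:
                continue  # outside the circular search region
            x, y = pred_x + dx, pred_y + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue  # patch would leave the image
            score = zssd(crop(img, x, y, size), template)
            if best is None or score < best[0]:
                best = (score, x, y)
    return best
```

With a radius of 4 the loop evaluates at most 49 candidate positions per feature, which keeps the cost per frame small and predictable.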