In a pair of blog posts published today, Facebook gave us more insight into how the Oculus Quest performs inside-out tracking. The posts offer detailed background on how the tracking technology made its way from team to team inside Oculus, and how labs from Zurich to Seattle delivered what we have today: a seamless transition from tethered to untethered, and a balance of power and freedom.
We could sit here and try to tell you exactly how it works, but the truth is, we have never built a VR headset ourselves. We have plenty of experience reviewing them and playing Beat Saber, but we have yet to create our own VR hardware. So instead of reading our take, which may or may not make sense of their technology, you can read the meat of Facebook's blog post below.
The Oculus Insight system uses a custom hardware architecture and advanced computer vision algorithms — including visual-inertial mapping, place recognition, and geometry reconstruction — to establish the location of objects in relation to other objects within a given space. This novel algorithm stack enables a VR device to pinpoint its location, identify aspects of room geometry (such as floor location), and track the positions of the headset and controllers with respect to a 3D map that is generated and constantly updated by Insight. The data used for this process comes from three types of sensors built into the Quest and Rift S hardware:
- Linear acceleration and rotational velocity data from IMUs in the headset and controllers are integrated to track the orientation and position of each with low latency.
- Image data from cameras in the headset helps generate a 3D map of the room, pinpointing landmarks like the corners of furniture or the patterns on your floor. These landmarks are observed repeatedly, which enables Insight to compensate for drift (a common challenge with IMUs, where even tiny measurement discrepancies build up over time, resulting in inaccurate location tracking).
- Infrared LEDs in the controllers are detected by the headset cameras, letting the system bound the controller position drift caused by integrating multiple IMUs.
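The first bullet can be made concrete with a small dead-reckoning sketch of our own (this is illustrative code, not Facebook's): it integrates one gyroscope and accelerometer sample into orientation and position, which is exactly the step where tiny measurement errors compound into the drift the post describes. The function name, the small-rotation (Rodrigues) update, and the flat gravity vector are our assumptions.

```python
import numpy as np

def integrate_imu(orientation, position, velocity, gyro, accel, dt,
                  gravity=np.array([0.0, 0.0, -9.81])):
    """Dead-reckon one IMU step. `orientation` is a 3x3 body-to-world
    rotation matrix; `gyro` is rad/s, `accel` is m/s^2 in the body frame."""
    # Update orientation from rotational velocity via the Rodrigues formula.
    angle = gyro * dt
    theta = np.linalg.norm(angle)
    if theta > 1e-12:
        axis = angle / theta
        K = np.array([[0.0, -axis[2], axis[1]],
                      [axis[2], 0.0, -axis[0]],
                      [-axis[1], axis[0], 0.0]])
        dR = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
        orientation = orientation @ dR
    # Rotate the measured acceleration into the world frame, remove gravity,
    # then integrate twice: acceleration -> velocity -> position.
    world_accel = orientation @ accel + gravity
    position = position + velocity * dt + 0.5 * world_accel * dt**2
    velocity = velocity + world_accel * dt
    return orientation, position, velocity
```

Any bias in `accel` gets integrated twice, so position error grows quadratically with time, which is why the camera-based corrections in the second and third bullets matter.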
As you move, Oculus Insight detects pixels in images with high contrast, such as the corners of a window. These high-contrast image regions are tracked and associated over time, from image to image. Given a long enough baseline of observations, Oculus Insight is able to triangulate the 3D position of each point in your surroundings. This forms the basis of the system’s 3D environment map.
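The triangulation step described above can be sketched in a few lines. This is our own illustrative direct linear transform (DLT), assuming the 3x4 projection matrices of two camera views of the same landmark are known; Insight's actual pipeline is far more elaborate.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover the 3D point whose projections through cameras P1 and P2
    (3x4 projection matrices) land at pixel coordinates x1 and x2.
    Each observation contributes two linear constraints on the
    homogeneous point X; the SVD finds the best least-squares solution."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                # null vector of A (homogeneous 3D point)
    return X[:3] / X[3]       # dehomogenize
```

The wider the baseline between the two observations, the better conditioned this system is, which is why the post stresses needing "a long enough baseline of observations."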
As you can tell from that excerpt, the company has a firm grasp on how this was all put together, and it offers some great, detailed descriptions for anyone trying to build something of their own. 3D maps have always been important to AR, and now VR is using the same technology with SLAM (simultaneous localization and mapping). Again, instead of us trying to explain SLAM to you, we will let the professionals do what they do best: talk about technology and how it works. Check out their statement below.
Academic research has been done on SLAM techniques for several decades, but the technology has only recently become mature enough for consumer applications, such as driverless cars and mobile AR apps. Facebook previously released a version of SLAM for AR on mobile devices which uses a single camera and inertial measurement unit (IMU) to track a phone’s position and enable world-locked content — content that’s visually anchored to real objects in the world. Oculus Insight is the second generation of this library, and it incorporates significantly more information from a combination of multiple IMUs and ultra-wide-angle cameras, as well as infrared LEDs to jointly track the 6DoF position of a VR headset and controllers.
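The essence of fusing a drifting IMU with drift-free camera observations can be shown with a toy one-dimensional filter. The function, names, and fixed gain here are entirely our own illustration (real systems use Kalman-style filters or bundle adjustment, not a constant gain): the fast IMU increment is always applied, and an occasional camera fix of a known landmark pulls the estimate back.

```python
def fuse_step(estimate, imu_delta, camera_fix=None, gain=0.2):
    """One fusion step: apply the fast (but drifting) IMU increment, then,
    if a camera observation of a known landmark arrived, pull the estimate
    part of the way toward that drift-free fix."""
    estimate += imu_delta
    if camera_fix is not None:
        estimate += gain * (camera_fix - estimate)
    return estimate

# A stationary headset whose IMU has a small bias of 0.01 per step:
# without camera fixes the estimate drifts without bound; with fixes
# every step, it stays bounded near bias * (1 - gain) / gain = 0.04.
drifting, fused = 0.0, 0.0
for _ in range(200):
    drifting = fuse_step(drifting, 0.01)             # IMU only: drifts
    fused = fuse_step(fused, 0.01, camera_fix=0.0)   # IMU + camera: bounded
```

After 200 steps the IMU-only estimate has drifted to 2.0, while the fused one settles near 0.04: the same bounded-drift behavior the Insight landmarks provide.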