You Can Now Generate Meta’s Photoreal Avatars With an iPhone Scan

You can now generate Meta’s new prototype photoreal avatars with an iPhone scan.

The ‘Codec Avatars’ were first showcased by Meta (then Facebook) in March 2019. The avatars are powered by multiple neural networks and can be driven in real time by a prototype VR headset equipped with five cameras: two internal cameras, one viewing each eye, and three external cameras viewing the lower face.

Since their unveiling in 2019, the Codec Avatars have undergone multiple evolutions. More realistic eyes have been added, and one version can be driven by eye tracking and microphone input. Most recently, a 2.0 version brought the avatars to near-complete realism.

Previously, generating an individual’s Codec Avatar required a specialized capture rig known as MUGSY, which has 171 high-resolution cameras.

MUGSY Codec Avatars Capture Rig has 171 high-resolution cameras

Meta’s newest research removes the requirement for the rig, allowing users to generate a photoreal Codec Avatar with a smartphone scan using the phone’s front-facing depth sensor, such as the one on any iPhone equipped with Face ID. To do so, a user first pans the phone around their neutral face, then scans again while copying a series of 65 facial expressions.

The scanning process takes around three and a half minutes on average, according to the researchers. Generating the full-detail photoreal avatar, however, takes about six hours on a machine equipped with four high-end GPUs. In a product, this processing would likely happen on cloud GPUs rather than on the user’s device.

Meta Codec Avatars with phone scanning

How have the researchers trimmed a process that typically requires over 100 cameras down to one that needs only an iPhone’s depth sensor? The key is a Universal Prior Model (UPM) “hypernetwork”: a neural network that generates the weights for another neural network, in this case the person-specific Codec Avatar. The researchers trained the UPM by scanning the faces of 255 individuals with a specialized capture rig similar to MUGSY but using 90 cameras.
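The hypernetwork idea can be illustrated with a toy example. The sketch below is not Meta's actual architecture; all dimensions, layer shapes, and names are hypothetical. It shows the core mechanism: one network takes an identity embedding and emits the parameters of a second, person-specific network, so different identities yield differently behaving networks from the same shared model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, illustrative dimensions (not Meta's real architecture):
EMBED_DIM = 8   # size of a person's "identity" embedding
EXPR_DIM = 4    # expression-code input to the person-specific network
OUT_DIM = 3     # output size of the person-specific network
TARGET_PARAMS = EXPR_DIM * OUT_DIM + OUT_DIM  # weights + biases to generate

# The hypernetwork here is a single linear layer for brevity;
# a real system would be far deeper and trained on many subjects.
H = rng.normal(0.0, 0.1, size=(TARGET_PARAMS, EMBED_DIM))
b = np.zeros(TARGET_PARAMS)

def hypernetwork(identity_embedding):
    """Generate the person-specific network's parameters from an embedding."""
    flat = H @ identity_embedding + b
    W = flat[: EXPR_DIM * OUT_DIM].reshape(OUT_DIM, EXPR_DIM)
    bias = flat[EXPR_DIM * OUT_DIM :]
    return W, bias

def person_specific_net(expression_code, params):
    """The generated network: maps an expression code to an output."""
    W, bias = params
    return np.tanh(W @ expression_code + bias)

# Two identities produce two different generated networks, which give
# different outputs for the same expression input.
alice = rng.normal(size=EMBED_DIM)
bob = rng.normal(size=EMBED_DIM)
expr = rng.normal(size=EXPR_DIM)

out_alice = person_specific_net(expr, hypernetwork(alice))
out_bob = person_specific_net(expr, hypernetwork(bob))
```

In the real system, the phone scan would play the role of the identity input: it conditions the shared prior model, which then produces the weights of that one person's avatar network, so no per-person capture rig is needed.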

Smartphone scanning for avatar generation has already been demonstrated by researchers elsewhere, but Meta says its technique produces avatars of superior quality.

Meta’s Codec Avatar generation still has obstacles to overcome. For instance, it cannot yet handle long hair or glasses, and generation is limited to the head rather than the full body.

Meta’s avatar research still has some ground to cover before this level of fidelity can reach its consumer products. Moreover, Meta’s current avatars retain a cartoony art style, and their realism has actually decreased over time. That style remains more practical for rendering large groups in complex environments in apps such as Meta Horizon Worlds on the Quest 2’s mobile processor.

In fact, Codec Avatars may eventually ship as a separate option rather than as a direct upgrade to the platform’s current cartoon avatars.

In an interview with Lex Fridman, Mark Zuckerberg described a future in which we might use “expressionist” avatars in casual games while deploying realistic avatars in serious work meetings.

In April this year, Codec Avatars team lead Yaser Sheikh said it is still impossible to predict when Codec Avatars might ship, describing the project as “five miracles away” from shipping, compared with “ten miracles away” at its start.
