Virtual Reality

Holodeck watch: Real-time volumetric video takes a giant leap

Complex scenes, like the motion of this dancer's hair, can now be rendered in incredible detail in real time using standard hardware
A new real-time rendering technology vastly speeds up the way complex volumetric scenes can be presented using standard hardware

Imagine a next-gen VR experience that lets you speak realistic scenes, intelligent characters and complex situations into being, then interact with them in real time. It's coming, due to a convergence of tech like this advance in real-time 3D video.

I've been wonking on about this for months now; between AI video generators like Sora, AI character and narrative building tools, AI music and sound effects creation tools, and projects like Google Genie, dedicated to live-creating entire interactive games and experiences in real time, most of the major ingredients are already here – in embryonic form.

Sure, there's no proper hologram generator as yet, but if you're willing to accept a VR headset on your noggin, speed, latency and convergence strike me as the only barriers between where we are now and a fully functioning Holodeck experience, in which you simply say where you want to be, who else is there and what should be happening, and then have a version of that appear before your eyes as a fully interactive experience.

Every now and then, in the flurry of progress across the AI field, something catches my eye that seems to bring this kind of experience another step closer. And today's is a research paper titled Representing Long Volumetric Video with Temporal Gaussian Hierarchy.

Volumetric video is by its nature more complex than regular video. Instead of a 2D array of square pixels changing over time, volumetric video represents a scene as cubic 'voxels' (or similar 3D primitives) in a 3D space, a much more useful representation if you want to be able to walk around the scene and change your perspective. When you're playing a video game that represents the world in 3D, you're looking at much the same idea: a scene described in three dimensions and rendered from whatever viewpoint you choose.
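
To get a feel for why that's so demanding, here's a rough back-of-the-envelope sketch, using my own illustrative numbers rather than anything from the paper, comparing the raw size of a single uncompressed 1080p frame with a single naive, dense voxel grid:

```python
# Illustrative numbers only – not figures from the paper.
# Compare one uncompressed 1080p colour frame with one naive dense voxel grid.

def megabytes(num_bytes):
    return num_bytes / (1024 ** 2)

# 2D video frame: width x height x 3 colour channels, 1 byte per channel
pixel_frame_bytes = 1920 * 1080 * 3
print(f"1080p frame:       {megabytes(pixel_frame_bytes):.1f} MB")         # ~5.9 MB

# Naive dense voxel grid: 1024^3 cells x 4 bytes (colour + opacity)
voxel_frame_bytes = 1024 ** 3 * 4
print(f"1024^3 voxel grid: {megabytes(voxel_frame_bytes) / 1024:.1f} GB")  # 4.0 GB
```

That's exactly why nobody stores volumetric video as a raw voxel grid per frame; the trick is finding a far more compact way to represent the scene.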

This paper details an advance in volumetric video presentation that radically reduces the video RAM and data storage needed to render photo-realistic video from 3D video assets. It can render highly detailed scenes in 1080p resolution, at 450 frames per second, for a full 10 minutes or more, using a standard Nvidia RTX 4090 GPU – and it can do it in real time, allowing interactive camera movement and whatnot.

The technique involved – Temporal Gaussian Hierarchy – essentially analyzes the scene, works out which regions are changing quickly and which are moving slowly or not at all, and builds a hierarchy of representations so it can dedicate more processing to the complex, fast-moving bits and less to the slow or static bits.
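
As a very loose illustration of that idea (my own toy sketch, not the paper's actual algorithm or data structures), imagine tagging each chunk of a scene with how quickly it changes, then only re-rendering the slow chunks every so often:

```python
import numpy as np

# Toy sketch of a temporal hierarchy (not the paper's actual method):
# fast-changing regions are refreshed every frame, slower regions only
# every 2nd, 4th or 8th frame, with cached results reused in between.

rng = np.random.default_rng(0)
num_regions = 8
motion = rng.uniform(0.0, 1.0, num_regions)  # pretend per-region "rate of change"

def temporal_level(score, num_levels=4):
    """High motion -> level 0 (update every frame); near-static -> deep level (update rarely)."""
    return int(np.clip((1.0 - score) * num_levels, 0, num_levels - 1))

levels = [temporal_level(m) for m in motion]

for frame in range(8):
    # A region at level k is re-rendered only every 2**k frames
    updated = [i for i, lvl in enumerate(levels) if frame % (2 ** lvl) == 0]
    print(f"frame {frame}: re-render regions {updated}")
```

The real method organizes Gaussian primitives into such a hierarchy rather than crude 'regions', but the budgeting principle, spending the compute where the motion is, is the same.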

Boy does it do a good job, too. The researchers, a multinational team from Zhejiang University, Stanford University and the Hong Kong University of Science and Technology, say the technique generated 18,000 frames of video using just 17.2 GB of VRAM and 2.2 GB of storage – a 30X and 26X reduction, respectively, compared to the previous state-of-the-art 4K4D method.
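
To put those multipliers in perspective, here's a quick back-of-the-envelope calculation (mine, not the paper's; the exact baseline figures are in the paper itself) of what the same sequence would imply for the older 4K4D approach:

```python
# Rough arithmetic on the claimed savings (back-of-the-envelope only).
vram_gb, vram_reduction = 17.2, 30
storage_gb, storage_reduction = 2.2, 26

print(f"Implied 4K4D VRAM:    ~{vram_gb * vram_reduction:.0f} GB")        # ~516 GB
print(f"Implied 4K4D storage: ~{storage_gb * storage_reduction:.0f} GB")  # ~57 GB
```

Roughly half a terabyte of VRAM is not something any consumer GPU offers, which is why this kind of compression matters for playing long volumetric clips on a single card.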

Check out a more detailed explanation in the video below, if you've got the noggin for this kind of thing!

[SIGGRAPH Asia 2024 (TOG)] Representing Long Volumetric Video with Temporal Gaussian Hierarchy

Whatever the witchcraft behind it, the results are extraordinary, as you'll have seen in the videos embedded throughout this piece. Just the way the hair is rendered blows my tiny mind. Again, that's in real time on a standard, if high-end, consumer-grade video card.

This kind of efficient, instantaneous rendering of complex 3D worlds could well become a crucial part of that Holodeck VR experience; if you can generate 450 frames of volumetric video per second, well, you can generate 225 frames of stereo 1080p vision per second for a VR headset, as shown below with an Apple Vision Pro.
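
The stereo arithmetic is simple enough to sanity-check (assuming typical headset refresh rates of around 90 to 120 Hz):

```python
# Quick sanity check on the stereo frame-rate claim.
mono_fps = 450
stereo_fps = mono_fps / 2  # one image per eye

for refresh_hz in (90, 120):
    headroom = stereo_fps / refresh_hz
    print(f"{stereo_fps:.0f} fps per eye vs a {refresh_hz} Hz display: {headroom:.1f}x headroom")
```

In other words, even split across two eyes, there's comfortable headroom over what current headsets can actually display.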

It's pretty crazy stuff, and yet another reminder of the wild acceleration we're seeing across multiple fields in 2024. Very neat!

Source: GitHub via Min Choi

5 comments
Towerman
One step closer to the Holodeck !
veryken
We definitely want to see more Sora Aoi.
guzmanchinky
I bought a used Apple Vision Pro, and I can tell you the screen quality is insane compared to my Quest 3. I use it mainly to watch travel videos in 180 3D HDR VR.
Smokey_Bear
This is awesome. I remember back in 2016-ish, I had an HTC Vive and watched a few volumetric videos, and it was awesome. It makes your standard 3D VR video look like shit. When you move and see things at a different angle/perspective... it's a game changer. It kinda fizzled out, cause it was too complex & expensive. I'm happy to see it coming back with a vengeance. guzmanchinky - You got more money than me; even used, you likely shelled out 2500. My Quest 3 is good enough for now. That said... I'm very interested in the Samsung/Google partnership, and look forward to seeing what Samsung launches next year. It should be awesome, but hopefully doesn't break the bank like the Apple Vision Pro does.
Christian
very soon the mantra will be, "don't trust what you see on a screen" and if magicians have taught us anything, it's that screens can be hidden all sorts of places.
"I'll believe it when I see it!" will become, "I don't believe BECAUSE I see it"