Gaussian Point Splatting

(momentsingraphics.de)

167 points | by ibobev 11 hours ago

61 comments

keyle 10 hours ago
It will be interesting to see the first AAA game that uses these methods instead of rendering a 3D world. Even if made from CGI worlds, it would be a very interesting approach, with somewhat predictable performance.
Reminds me of Ecstatica [1], a 1994 game that had intense visuals with a very odd/different rendering engine made of 3D ellipsoids; in a way really crude splats in gouraud shading.
[1] https://ecstatica.fandom.com/wiki/Ecstatica
- dagmx 17 minutes ago
  I know this comes up a lot on HN because it's not primarily a graphics community but:
  1. Gaussian Splats are very expensive to render. They capture a lot of detail which makes them seem cheaper than an equivalent raster render of that quality, but they wouldn't meet real time AAA game performance requirements
  2. Gaussian Splats don't have a concrete surface. Want to cast shadows or do physics? It's doable but very tricky. Want to relight them? Also tricky. What is the exact surface point that you want to affect or sample for any particular operation? Deformations also become very difficult to do well.
  3. Gaussian Splats are not sharp. You can get sharper with different kernel types or higher density of points, but your costs go up as well.
  4. Gaussian splats are awful for any kind of path tracing. You can do it but you go back to the issues above. So mixing and matching traditional content with splats becomes a performance bottleneck.
  I don't think you'll see a AAA game use splats for more than something like cinematics in the near term.
- sqrt_1 8 hours ago
  There was this FPS demo recently https://playcanv.as/p/qxGSuzYq/
  People have also converted some small sections of Unreal 5 demos into splats https://superspl.at/scene/692c4f91
  Or perhaps use a real world scan - it was suggested this one would make an ideal setting for zombies https://superspl.at/scene/6359774f
  - monkpit 3 hours ago
    Can someone explain the unreal demo linked here - is the reflection in the street also using splatting, or is it something else?
    - speps 3 hours ago
      Yes splats do reflections
- cyber_kinetist 9 hours ago
  Note that the first published work of rendering Gaussian Volumes was in this 1991 paper (https://articles.tomasparks.name/publications/Westover1991.p...) - so 3DGS is really a rehash of an old method from the 90s!
  The contributions of 3DGS lie in how fast you can make them in modern GPU hardware (tiling + sorting with threads), and how to make the pipeline differentiable so that you can fit the Gaussian splats with photogrammetry data. Similar to the history of deep learning, it became technically feasible once the GPU hardware was powerful enough.
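For readers wondering why sorting matters at all, here is a minimal sketch of the per-pixel front-to-back blend that sorted splatting performs. It assumes each splat has already been reduced to a depth, an opacity at this pixel, and a color; that tuple layout and the function name are illustrative, not taken from any particular implementation.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]
# (depth, alpha at this pixel, color); this layout is an assumption for illustration.
Splat = Tuple[float, float, Color]


def composite_front_to_back(splats: Sequence[Splat]) -> Color:
    """Blend the splat contributions for one pixel, nearest first."""
    ordered = sorted(splats, key=lambda s: s[0])  # the sort that dominates the cost
    transmittance = 1.0  # fraction of light not yet blocked by nearer splats
    out = [0.0, 0.0, 0.0]
    for _depth, alpha, color in ordered:
        weight = alpha * transmittance
        for channel in range(3):
            out[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    return (out[0], out[1], out[2])


pixel = composite_front_to_back([
    (2.0, 0.5, (0.0, 0.0, 1.0)),  # farther, blue
    (1.0, 0.5, (1.0, 0.0, 0.0)),  # nearer, red
])
print(pixel)
```

The nearer red splat contributes with weight 0.5 and the blue one behind it with 0.5 * 0.5 = 0.25, which is why the result depends on the order.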
- grumbel 9 hours ago
  Many years ago there was a game called Casebook[1], a small little detective game where you investigated rooms for clues. But unlike similar FMV games where you jumped from point to point, it had photorealistic environments that you could smoothly walk around in, much like later lightfield or gaussian splatting experiments.
  [1] https://www.youtube.com/watch?v=o-VAaC5BgVE
  - cubefox 7 hours ago
    Any idea on how they achieved this?
    - Cieric 7 hours ago
      I can't say I know how they actually did it, but taking a look at the trailer I can point out that it looks like the spaces are confined and your character is on rails. I'm mainly going off of the instant direction changes that don't appear to be 45 degrees off from the camera direction. Once it's constrained down to a single line/path you could do some wild things like cube mapping a video, where the position in the video is tied to the character's position. I can't say I know how they would take that video though; my best guess there is the scenes are constructed in 3D software, but it was too expensive for real time rendering.
      - cubefox 6 hours ago
        Cube mapping a video sounds plausible, this is commonly known as 360° video. Putting the camera on rails (though I don't really notice rails in this case) and tying the video playback speed to the speed of the rail movement has also been done in the past in some pre-rendered PlayStation games, though without cube mapping. But I think it's not pre-rendered in this case. It looks far too realistic for a game that is at least 17 years old. My best guess: they captured the 360 degree videos with a real camera (stabilized in some way) and edited the equipment out frame by frame.
    - grumbel 2 hours ago
      The image capture was done with a robotic camera rig from what I understand; they photographed 360° images of the room from all possible positions. They restricted the camera movement to a plane, which is why the player height is fixed. I don't know what they did on the software side with all the images.
      - cubefox 43 minutes ago
        Oh cool, so the camera wasn't just on 1D "rails", it was on a 2D plane. I never before heard of a game (pre-rendered or photographed) which did that. Impressive.
- accrual 3 hours ago
  It's not gaussian splatting, but Outcast (1999) has an interesting voxel-like rendering for the world surface. It has a pretty distinct feeling when walking around in the early areas, and a somewhat clunky but usable UI.
  > The game does not actually model three-dimensional volumes of voxels. Instead, it models the ground as a surface, which may be seen as being made up of voxels. The ground is decorated with objects that are modeled using texture-mapped polygons. When Outcast was developed, the term "voxel engine", when applied to video games, commonly referred to a ray casting engine (for example the Voxel Space engine). On the engine technology page of the game's website, the landscape engine is also referred to as the "Voxels engine". The engine is purely software based; it does not rely on hardware-acceleration via a 3D graphics card.
  https://en.wikipedia.org/wiki/Outcast_(video_game)
- avaer 10 hours ago
  This is "rendering a 3D world". It's basically the exact same techniques that traditional rendering uses, just with a different primitive that's not triangles. Everything else pretty much carries over.
  If you mean the technique of splatting specifically, Dreams for PS4 [1] is prior art.
  If you mean pre-rendering, there's Myst and games like the original FF7 for PS1.
  [1] https://en.wikipedia.org/wiki/Dreams_(video_game)
- modeless 7 hours ago
  Dreams for PS4 used point splatting and has a very unique look as a result. The splats were created from distance fields instead of being scanned, so they don't look like modern gaussian splats. They have a painterly look instead. https://youtu.be/2ltgkcoQzow
- jayd16 5 hours ago
  Bladerunner: Revelations used a similar technique to bake down large CGI worlds with expensive lighting into something that ran on a Pixel 1 at VR specs.
  It's honestly really very hard to work with this stuff because you ultimately need to be able to place meshes inside these scenes' triangle seas, and you need to do it in a way that plausibly fits in the world. You can't have unlit characters walking around a baked lit scene and have them fit in. That's just from a visual design perspective.
  You also always want to have bounce light from your dynamic things onto the baked scene and depending on the tech, you might not even be able to spatially place a dynamic thing and have it properly occlude what splats it needs to occlude.
  As is, it's a niche technology for games. That might change one day.
  https://github.com/googlevr/seurat https://www.youtube.com/watch?v=Pf5Q3bvXj8E
- jamwise 3 hours ago
  I think it's inevitable it goes there. Right now the level of detail and quality of games is limited by the console/PC hardware you're playing on. But with the splats they can render the whole game's world in a massive server farm at Hollywood Movie quality. I imagine there might be some balance of splat and traditional rendering technology since not all objects will lend themselves well, but this might be truly transformative.
  - dagmx 21 minutes ago
    Why would you limit one to your local hardware and one to a cloud infrastructure?
    Both can be done locally or in the cloud. The comparison point becomes moot if you change the parameters that drastically.
boppo1 6 hours ago
I really want to get into splatting and I have the tools: good camera, v comfy in blender, comfy with graphics programming ideas, 4080. But I haven't found a good 'all in one intro' to it yet. Possibly because I'm foss-biased and have dismissed proprietary options. But does anyone know of a good 'vertical tutorial' on this stuff?
- Yen 4 hours ago
  I recently got into splatting. I looked for some good all-in-one tutorials, but didn't find any, and mostly muddled through with trial and error and LLM assistance. I present this workflow as a straight-line pipeline, though in practice it took a lot of iteration and backtracking and rework to get the final result. Here's what worked for me:
  I captured a video on a smartphone camera, using the OpenCamera app. Specifically, this video was captured with exposure locked, framerate locked, focus locked, fairly high framerate and resolution. I walked slowly and carefully around an outdoor scene, trying to get fairly good coverage from multiple angles. I took roughly 20 minutes of video, weighing 19GB.
  This video was sampled into individual image frames at about 5fps using ffmpeg. There's room for experimentation and improvement here, an adaptive, coverage-aware sampling strategy would be better. But fixed 5fps was Good Enough (tm). This resulted in roughly 8,000 images at 4k. This was a pretty hefty dataset for my limited 1080, but I made it work.
  I then generated masks for these images, to ignore transient objects during the splat training (i.e. to cut out people who transiently walked through the scene). For this I used Cutie (https://github.com/hkchengrex/Cutie). For outdoor scenes, it can also make sense to mask out low-parallax areas like faraway mountains or especially the sky, as these are difficult to train correctly. If masks are generated for some images, you'll need at least placeholder masks for all of them. In the end I've got about 8,000 PNGs that are monochrome black/white masks.
  Then the images are handed to COLMAP (https://github.com/colmap/colmap), using the 'global mapper' option. This registers the camera positions in 3D space, and generates a crude point cloud that's good for sanity-checking. This step required a fair bit of iteration to get right. The full reconstructed output from COLMAP is not necessary, only the pose-estimate .bin files. The output directory here was about 500MB for this step for me.
  With COLMAP registration done, the next step is the actual training. I found two useful pieces of software for this, with different tradeoffs.
  Brush (https://github.com/ArthurBrussee/brush) was very straightforward to install and use, requiring very little in external dependencies and setup. It was also pretty speedy on training, and gave good results. Minor modifications to the training process were possible by editing source, though I didn't get too wild here. Brush takes the *.bin files from COLMAP, plus the original images directory, and the masks directory if it exists. Run on its own, this could produce gaussian splat .ply files, 500-800MB in size, containing 1-10M splats. More than that and my poor little 8GB of VRAM OOM'd.
  nerfstudio (https://github.com/nerfstudio-project/nerfstudio) was also useful, as many research papers get implemented in its framework. In particular, for this outdoor scene, I used wild-gaussians (https://github.com/jkulhanek/wild-gaussians/) to generate just a sky sphere (to help seed low-parallax areas in my particular dataset), stopped training, and used this as an init.ply to pass to Brush.
  I then set up a very simple viewer website, using SuperSplat (https://github.com/playcanvas/supersplat). I used supersplat's editor to align the splat's coordinate system with the rotation and scaling that I wanted, and then exported an optimized .sog file, roughly 1/10th the size. .sog is nominally open-standards, though I'm not aware of any other projects using the format. This gave fairly good framerates and adequate controls across a variety of platforms.
  As a little bit extra, supersplat's splat-transform CLI tool was used to generate a crude collision mesh for the scene, enabling a walking mode that respected object boundaries.
  If there's interest I can post my results, I got a bit sidetracked with other projects and other splats, and this particular one I got fiddling with some more cleanup. I can get it up with a few more hours work. But hopefully that's a good start, all of these are fully FOSS, and resulted in a good-looking splat.
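The fixed-rate frame sampling step in the workflow above can be sketched roughly as follows. The file names, output pattern, and helper name are placeholders rather than the commenter's actual setup; the only ffmpeg feature relied on is the standard `fps` video filter. The sketch only builds the command line and does not run it.

```python
import shlex
from pathlib import Path
from typing import List


def ffmpeg_sample_command(video: Path, out_dir: Path, fps: float = 5.0) -> List[str]:
    """Build an ffmpeg command that writes one PNG per sampled frame.

    The paths and the frame_%06d.png pattern are illustrative assumptions.
    """
    pattern = out_dir / "frame_%06d.png"
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps:g}", str(pattern)]


cmd = ffmpeg_sample_command(Path("capture.mp4"), Path("frames"))
print(shlex.join(cmd))
```

To actually extract frames, create the output directory first and pass the list to `subprocess.run(cmd, check=True)`. A coverage-aware sampler, as suggested above, would replace the fixed `fps` value with a per-segment decision.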
  - boppo1 an hour ago
    Awesome, thank you! This is a good starting point!
  - ireadmevs 4 hours ago
    Thank you for sharing!
- dimitri-vs 5 hours ago
  Maybe not exactly the kind of tutorial you're looking for but very enjoyable none the less: https://youtu.be/eekCQQYwlgA
HexDecOctBin 10 hours ago
Can someone point to a resource/tutorial for learning point splatting (the 90s rendering technique)? Gaussian Splatting has completely overtaken the search results, and the original technique is now near impossible to find.
- jasonjmcghee 9 hours ago
  Westover’s thesis https://www.cs.unc.edu/techreports/91-029.pdf
- cubefox 10 hours ago
  It's going to be even more impossible to find now because the present paper introduces "Gaussian point splatting".
andybak 4 hours ago
People are rendering huge splat scenes on mobile devices using LOD. This (currently) requires CUDA and an NVidia GPU to work. I would have been much more impressed to see a demo where it was running on low end mobile hardware faster than current splat renderers can.
I'm probably being a bit of a grinch about it but the abstract doesn't address performance or hardware constraints either so I guess I'm going to have to read the damn paper.
Epitaque 7 hours ago
Did not read the paper (sorry) but I wonder how this compares to mesh splatting (https://meshsplatting.github.io/). I feel like mesh splatting can produce higher quality results because triangles are very good at representing sharp features, and gaussians aren't.
- dpark 7 hours ago
  But only in the same sense that triangles are bad at representing curves, right? It seems that’s a wash.
phrotoma 10 hours ago
I love this site design. It uses the entire width of the monitor rather than a slender column of pixels down the middle with large blocks of unused space on either side, with a font for my old man eyes.
<3
- zokier 10 hours ago
  > It uses the entire width of the monitor rather than a slender column of pixels down the middle with large blocks of unused space on either side
  Umm on my machine it has 560px margin on both sides with the content being only 474px sliver in the middle?
- simonklitj 9 hours ago
  Imo they need to pad it just a bit. My scrollbar overlaps.
- docheinestages 9 hours ago
  Maybe use Tampermonkey?
sorenjan 6 hours ago
When looking at their linked interactive viewer it looks like they need 128 spp for the image quality to equal 3dgs. Maybe you can reduce that with some temporal tricks and noise reduction filtering, but that's still a lot of samples.
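As a rough guide to why sample counts climb so quickly, here is a small sketch of how Monte Carlo noise scales with samples per pixel. It assumes independent samples, so it says nothing about how temporal reuse or a denoiser would change the picture; the function name is illustrative.

```python
import math


def relative_noise(spp: int) -> float:
    """Standard error of a Monte Carlo mean, relative to 1 sample per pixel.

    Assumes independent, identically distributed samples.
    """
    return 1.0 / math.sqrt(spp)


for spp in (1, 16, 128, 512):
    print(f"{spp:4d} spp -> noise x{relative_noise(spp):.3f}")
```

Under that assumption, halving the noise costs four times the samples, which is why variance reduction tends to matter more than raw sample throughput.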
djmips 9 hours ago
Could this be a new direction for Google Streetview perhaps?
samch 5 hours ago
It seems like there are fairly regular posts on HN about splatting, and most appear to be fairly technical or proof-of-concept level. While the outputs look nice, I’m not sure that I could distinguish them from a nice ray-traced scene. What I think I’m missing is the “why?” of splatting. What are the material benefits of this area of research?
- jerf 5 hours ago
  At the moment, combining your statement "I’m not sure that I could distinguish them from a nice ray-traced scene" and adding "your graphics card can move through them in real time so cheaply that it can easily be used as a component in other tech even at high frame rates" covers it pretty nicely. There's some research into how to make them move or do other things they don't do very well, but the fact that you can swoop through them in real time on cell-phone level of power means they fit a lot of niches. Plus the fact you can "record" them from a real-world physical environment without ever having to "model" it opens up a lot of utility too.
  Personally I suspect they are getting a bit more attention than they "deserve"; people aren't talking about their weaknesses very much and I think that's resulting in some overexcitement. Some of the "we can replace everything with splats!" reminds me of the people who still don't understand why "if GPUs are thousands of times faster than CPUs why don't we run everything on GPUs?" is basically not even a sensible question. I don't see them as ever being the foundation of a graphics stack, but they definitely have a place as part of a well-rounded menu of techniques that can be brought to bear on a wide range of problems.
  - zokier 4 hours ago
    > Plus the fact you can "record" them from a real-world physical environment without ever having to "model" it opens up a lot of utility too.
    This is the big thing imho. Sure, you can do traditional photogrammetry to capture meshes and textures but getting the shaders exactly right is afaik non-trivial etc, and if you want real-time rendering then you likely need some further post-processing of the assets. With 3dgs you can pretty much bypass all that complexity and the whole pipeline from photos to rendered frame is much more straightforward.
cyber_kinetist 10 hours ago
Really nice idea for 3DGS rendering - though the main problem is the noise (an unfortunate issue for all Monte-Carlo based methods).
I think future papers would probably continue improving on this method and focus on how to sample the points more efficiently while being unbiased (similar to how ray-tracing solved their performance issues). Or maybe... we can just add a deep-learning based denoiser and call it a day!
lucamark 10 hours ago
This feels like Monte Carlo rendering applied to rasterization. I'm wondering if it's a brand-new or a well established methodology
- pixelesque 10 hours ago
  It's not new - that was sort of my point with my other comment.
  At least if it's progressive (so refines and resolves over time), this has been done with pointclouds in the VFX industry in GPU shaders for years in terms of stochastically drawing different points so eventually the whole point set gets rasterised to a fidelity threshold.
  - lucamark 10 hours ago
    Okay, thanks for the clarification! So, the interesting part here seems to be the 3DGS-specific opacity correction and GPU workload mapping. Am I wrong?
    - pixelesque 10 hours ago
      Possibly yeah.
      Or the per-pixel coord atomic I guess?
      - lucamark 10 hours ago
        Right, that part seems to be based on Schütz et al. 2021 https://arxiv.org/abs/2104.07526
- avaer 10 hours ago
  Monte Carlo in 3dgs is established enough that Spark [1] has been doing it for a while in the browser.
  [1] https://github.com/sparkjsdev/spark
  - cyber_kinetist 9 hours ago
    Cannot find anything related to Monte Carlo methods in the source code. I thought Spark implemented a conventional 3DGS pipeline with LoD optimizations (And it seems they do the sorting on the CPU using Rust/WebAssembly because of WebGL limitations)
- convolvatron 5 hours ago
  that goes all the way back to the Kajiya rendering equation https://en.wikipedia.org/wiki/Rendering_equation
MattCruikshank 7 hours ago
My dumb idea... do outdoor scans, and then convert the contents into 1m^2 blocks... And then, just dumbly stitch them together.
Kind of like Minecraft... but with user-generated gaussian-splat blocks.
- jamilton 3 hours ago
  1m^3, right? I can picture what you mean, but I'm not sure it works technically, since I think the splats for a given region are not actually bound to the region they represent. Like, for example, reflections work by having the reflection being physically behind the reflective surface. And they're all transparent, so it'd blend together.
  - MattCruikshank 2 hours ago
    Sure, you could think in terms of 1m^3.
    Yes, you're right that composing the best picture for an eye point could (and does) use splats from all over the scene.
    But I think if you limit to splats that are (entirely, mostly, partially?) inside the 1m^3 block, you'll do pretty well. And you're absolutely right that reflective surfaces would probably be the first to suffer.
    Well, it's worse than that. Because if you make a 1m^3 pond cube, and then I go putting trees around it, a naive rendering would still show YOUR reflections in the pond, rather than rendering from that pond's point of view, etc, like traditional rendering.
    One of Gaussian Splats' strengths, that it doesn't care... becomes a problem for me.
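A minimal sketch of the block idea discussed in this sub-thread: assign each splat to a grid cell by its center. The names and the 1 m default are illustrative. Bucketing by center ignores each Gaussian's extent, so splats that straddle a boundary or sit "behind" a reflective surface in a neighbouring cell are exactly the cases the replies above worry about.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Vec3 = Tuple[float, float, float]
BlockKey = Tuple[int, int, int]


def bucket_by_block(centers: Iterable[Vec3], block_size: float = 1.0) -> Dict[BlockKey, List[int]]:
    """Map each grid cell to the indices of the splats whose centers fall inside it."""
    blocks: Dict[BlockKey, List[int]] = defaultdict(list)
    for index, (x, y, z) in enumerate(centers):
        key = (
            math.floor(x / block_size),
            math.floor(y / block_size),
            math.floor(z / block_size),
        )
        blocks[key].append(index)
    return dict(blocks)


centers = [(0.2, 0.5, 0.9), (0.8, 0.1, 0.3), (1.4, 0.5, 0.2), (-0.1, 0.0, 0.0)]
blocks = bucket_by_block(centers)
print(blocks)
```

Using `math.floor` rather than `int()` keeps negative coordinates in the correct cell, since `int()` truncates toward zero.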
praveen9920 10 hours ago
Sorting the gaussians is the compute-heavy part in gaussian splatting. So, I'm guessing this will give only a marginal improvement in terms of rendering speed.
- xyzsparetimexyz 10 hours ago
  I'm not sure it does a sort. Each group of threads only handles a select number of gaussians
  - zokier 10 hours ago
    Yea, I think avoiding sorting is kinda the whole point here
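To illustrate how a renderer can skip the sort, here is a sketch of generic stochastic transparency: each splat is kept with probability equal to its opacity, the nearest survivor wins a plain depth test, and the average over many samples converges to the sorted blend. This is not claimed to be the paper's exact sampling scheme; the tuple layout, grey-value simplification, and function names are assumptions for illustration.

```python
import random
from typing import Sequence, Tuple

# (depth, alpha, grey value); this layout is an assumption for illustration.
Splat = Tuple[float, float, float]


def sorted_blend(splats: Sequence[Splat]) -> float:
    """Reference result: front-to-back alpha blending over a black background."""
    transmittance, value = 1.0, 0.0
    for _depth, alpha, grey in sorted(splats, key=lambda s: s[0]):
        value += alpha * transmittance * grey
        transmittance *= 1.0 - alpha
    return value


def stochastic_blend(splats: Sequence[Splat], samples: int, rng: random.Random) -> float:
    """Order-independent estimate: random survival plus a nearest-depth test."""
    total = 0.0
    for _ in range(samples):
        nearest_depth, nearest_grey = float("inf"), 0.0  # black background
        for depth, alpha, grey in splats:  # any order works, no sorting needed
            if rng.random() < alpha and depth < nearest_depth:
                nearest_depth, nearest_grey = depth, grey
        total += nearest_grey
    return total / samples


splats = [(3.0, 0.8, 0.2), (1.0, 0.3, 1.0), (2.0, 0.5, 0.6)]
exact = sorted_blend(splats)
estimate = stochastic_blend(splats, samples=20000, rng=random.Random(0))
print(exact, estimate)
```

On a GPU the depth test would typically be an atomic min per pixel, which is presumably where the per-pixel atomic mentioned elsewhere in the thread comes in. The price of dropping the sort is the sampling noise other commenters point out.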
cubefox 10 hours ago
Their point splatting method is orthogonal to level-of-detail rendering (they reference a few papers which try to do this), so both point splatting and LoD could be combined in the future for an even greater performance gain during rendering. They already implement occlusion and frustum culling.
Point splatting does introduce a lot of noise though, and their denoiser introduces ghosting, but they say a more sophisticated denoiser would give considerably better quality.
pixelesque 10 hours ago
> millions of threads
Really?! What OSs can handle that many native threads?
Also, this seems quite similar to stochastic progressive drawing of pointclouds for realtime that has been done for > 15 years in the VFX industry with GPU shaders in a tiled/bucketed fashion, unless this isn't progressive maybe? (The fact it's been accepted for Siggraph likely indicates it's slightly different).
- Calavar 10 hours ago
  I believe they mean GPU threads. Plenty of cuda files in their repository.
  - pixelesque 10 hours ago
    Fair enough, but that's then only absolutely max 1024 threads per SM, which wouldn't get anywhere near 1 million, given 5090 only has 192 SMs...
    Future proofing I guess...
    - cyber_kinetist 10 hours ago
      You can launch many more logical threads than the available physical threads. The GPU scheduler will automatically dispatch the work to the SMs.
    - ks6g10 6 hours ago
      Just like 2 threads can execute on the same core at the "same" time, i.e. no synchronization, the same is true for GPU threads/thread groups.
    - zipy124 9 hours ago
      I guess they never say that they execute at the same time technically haha
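The logical-versus-resident distinction in this sub-thread comes down to simple arithmetic, sketched below. The per-SM and SM-count figures are the ones quoted in the comment above, not values checked against a datasheet, and the block size of 256 is an arbitrary but common choice.

```python
import math
from typing import Tuple


def launch_config(total_threads: int, threads_per_block: int = 256) -> Tuple[int, int]:
    """Number of blocks needed to cover total_threads logical threads."""
    blocks = math.ceil(total_threads / threads_per_block)
    return blocks, threads_per_block


def scheduling_waves(total_threads: int, resident_per_sm: int, sm_count: int) -> int:
    """Rough count of 'waves' if only resident_per_sm * sm_count threads run at once."""
    resident = resident_per_sm * sm_count
    return math.ceil(total_threads / resident)


blocks, block_size = launch_config(1_000_000)
waves = scheduling_waves(1_000_000, resident_per_sm=1024, sm_count=192)
print(blocks, block_size, waves)
```

With those quoted figures, a million logical threads become 3,907 blocks that the scheduler works through in about six rounds, so "millions of threads" describes the size of the launch rather than simultaneous execution.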
DamnInteresting 7 hours ago
Video overview of the technology: https://www.youtube.com/watch?v=X8yRlA7jqEQ
Ordinarily I don't prefer video, but the visuals are helpful here.
Also, an online interactive, but it seems to only work in Chrome: https://superspl.at/scene/ff1d0393