Performance is crucial for WebXR apps. If you do not know why your app does not meet its target frame rate, the first step is to profile.
This article covers methods for profiling WebXR and WebGL apps (independent of the underlying
framework). Profiling is the first step to optimizing your application.
Optimizing without Profiling is wasted effort, as described in Bottlenecks.
Note that the post does not address how to fix any of the found performance
issues. There are many blog posts and talks targeting specific bottlenecks,
but once you know what the problem is, you will find resources for it.
and application load time. The goal is to provide the most comprehensive collection
of information on profiling for free. And no: “poly count” is not a metric we care about!
Without finding the bottleneck of your app, your efforts to optimize will not
Imagine you are rendering a large scene with many objects. Your app is barely
reaching 60 fps. You can reduce the vertex count of every single object and
still remain at 60 fps.
The CPU is responsible for sending the GPU the command to render each object
(“Draw Call”). In this case, it is way more likely that the CPU is overloaded
with the work to send the draw calls, while the GPU is just twiddling its thumbs
waiting for the CPU to send more work!
But we don’t know… until we measure. That is what Profiling is for.
Any part of an application can cause a bottleneck. Here are some examples
of common bottlenecks in order of frequency:
CPU Draw Calls: Too many calls to the driver cause too much overhead.
Garbage: The application is performing fine, but produces garbage that
causes hitches at regular or irregular intervals.
CPU Logic: Your application is unable to produce work for the GPU fast
enough. Often caused by heavy physics simulations or ray casts.
GPU Fragment Shading: The per-fragment performance cost is too high.
GPU Vertex Processing: Per-vertex processing requirement is too
GPU Resolve: Caused by post-processing on mobile GPUs.
GPU Vertex Fetch: Memory for the vertices is not read fast enough.
Each could have sub-bottlenecks.
The metrics we will be profiling for are:
Work for any XR application is split between CPU and GPU. The CPU is responsible
for preparing the graphics work for the GPU to render and runs the app’s logic.
simulate physics, render audio and compute scene graph transformations.
The GPU is responsible for graphics heavy lifting. While the CPU will send a draw call
along the lines of “draw mesh X with shader Y with texture Z and material parameters W”,
the GPU will perform the actual rasterization and per-vertex transformations to produce
the pixels on the screen.
The GPU is also responsible for sending the final image to the screen, waiting for
“V-Sync” which synchronizes the screen’s refresh rate with
that of the application.
Frame Rate and V-Sync
We do not use frame rate (frames per second = fps) to profile our application.
This metric is only listed to explain the complexity of understanding how performance
interacts with V-Sync. It is too coarse to make useful judgments with.
Consider V-Sync as fixed deadlines: 60, 72, 90 or more deadlines per second (the frame rate).
We will refer to “making V-Sync” as having rendered and submitted the frame on time for
making the deadline.
When you miss the deadline, all your work is discarded and you need to try to catch up to
make the next deadline instead.
In flexible-frame rate environments, this can mean that your drivers will drop you to half
the frame rate.
If your application is just barely too slow, this means you would be seeing e.g. 30 fps
instead of 59 fps, telling a completely different story. Maybe your application is
performing decently with 3ms CPU and 3ms GPU time, but because the work starts late, you miss
V-Sync for your target frame rate–which cuts your effective frame rate in half.
No vertex count, draw call, or any classic optimization will help you here.
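The halving effect can be illustrated with a small calculation. This is a minimal sketch that assumes a simple model in which a frame that misses a deadline is shown at the next one, with no reprojection; the function name `effectiveFrameRate` and its parameters are made up for this example, and `frameTimeMs` is assumed to include any delay before the work starts.

```typescript
// Effective frame rate when a late frame has to wait for the next V-Sync deadline.
// frameTimeMs: time from the start of the V-Sync interval to frame submission
//              (hypothetical measurement, includes any late start).
// refreshRateHz: display refresh rate, e.g. 60, 72 or 90.
function effectiveFrameRate(frameTimeMs: number, refreshRateHz: number): number {
  // Time between two V-Sync deadlines, e.g. ~16.7 ms at 60 Hz.
  const intervalMs = 1000 / refreshRateHz;
  // Number of V-Sync intervals one frame occupies (at least one).
  const intervals = Math.max(1, Math.ceil(frameTimeMs / intervalMs));
  return refreshRateHz / intervals;
}

// A frame that takes 16 ms makes the 60 Hz deadline, one that takes 17 ms does not.
const onTime = effectiveFrameRate(16, 60);
const justTooSlow = effectiveFrameRate(17, 60);
```

Under this model a single extra millisecond moves the result from 60 fps to 30 fps, which is why frame times in milliseconds are more informative than the frame rate itself.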
Latency, the time between input (e.g. head movement) and the finished frame, is especially
important for VR. WebXR implementations will usually take care of this for us and might
schedule the frame callbacks a bit later, if we don’t use our frame budget, to reduce
latency. We have limited control over this from WebXR and will therefore not cover it in
this blog post.
without thought. Hence it is common to do so.
Because you do not manage memory yourself, you also give up control over when it is
cleaned up. This cleanup is called “Garbage Collection” and can occur at any point
in your application life cycle, e.g. when you would have made V-Sync and were just
about to submit your frame.
Garbage collection might take 0.1 - 10 ms. Considering your usual VR frame budget is about 11 ms
(at 90 Hz), you absolutely don’t want this to happen at random moments in time.
So how do we avoid it? The only way is to avoid any garbage that would need cleanup.
The less you produce, the smaller and rarer the garbage collection hitches will be.
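One common way to produce less garbage is to reuse memory that was allocated once instead of creating new objects every frame. The following is a minimal sketch of that idea; the names `addAllocating`, `addInto` and `update` are hypothetical and not taken from any particular framework.

```typescript
type Vec3 = Float32Array;

// Allocating version: creates a new array on every call.
// Called once per object per frame, this produces garbage continuously.
function addAllocating(a: Vec3, b: Vec3): Vec3 {
  return new Float32Array([a[0] + b[0], a[1] + b[1], a[2] + b[2]]);
}

// Allocation-free version: the caller provides the storage for the result.
function addInto(out: Vec3, a: Vec3, b: Vec3): Vec3 {
  out[0] = a[0] + b[0];
  out[1] = a[1] + b[1];
  out[2] = a[2] + b[2];
  return out;
}

// Per-frame update that writes the result back into `position`,
// so no temporary objects are created.
function update(position: Vec3, velocity: Vec3): void {
  addInto(position, position, velocity);
}
```

Both functions compute the same result; only the second one can run every frame without giving the garbage collector new work.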
Application Loading Time
Just as with website loading, the user’s commitment to using your app decreases with every
second spent waiting for it. Since WebXR applications are fairly large, there might be a
bit more goodwill here compared to a website, but there is no need to rely on it.
By starting the application early and loading resources that aren’t immediately needed later,
we can reduce the perceived load time.
By optimizing assets and ensuring our server settings are optimal, we can reduce the loading
time in general.
And by using formats that need less parsing, we can reduce the amount of work the CPU needs
to do after downloading the resources.
time, Garbage Collection, and very roughly your GPU frame time.
You can find the Performance tab by navigating to any website and pressing Ctrl + Shift + C
(Command + Shift + C on macOS). Find “Performance” there.
To record a profile session, hit the record button on the top left (Ctrl/Command + E).
3-5 seconds is usually sufficient, since we are mostly interested in single frames.
enable it in Chrome from the Chrome Profiler described above:
The following is a profile on Meta Quest 2 from Elysian,
which is based on Three.js:
You can see that the “JS Heap” fluctuates between 14.1 MB - 23.3 MB.
Wherever the memory drops suddenly, we find Garbage Collection occurring.
In this case, the GC is so large that it delays the next frame start, causing a dropped frame:
As a result, we have to hope that the dropped frame is reprojected from the last frame.
Otherwise, significant stutter will occur.
Since animations cannot be reprojected, there will be some stutter either way.
Any browser’s network tab is a great tool for profiling your application loading time.
You can find the Network tab by navigating to any website and pressing Ctrl + Shift + C
(Command + Shift + C on macOS). Find “Network” there.
To record the networking activity, reload the page while having the tab open.
If a resource is not needed immediately at application start, it should be loaded later, since
browsers only run a limited number of parallel requests
at a time (e.g. 6 for Chrome).
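Splitting resources by when they are needed could be sketched as follows. The `Asset` shape, the `neededAtStart` flag and the function `planLoading` are assumptions made for this example; how the resulting lists are actually requested depends on your framework.

```typescript
// Hypothetical asset description; `neededAtStart` is an assumption of this sketch.
interface Asset {
  url: string;
  neededAtStart: boolean;
}

interface LoadPlan {
  startup: string[];  // request before the first frame
  deferred: string[]; // request once the application is running
}

// Splits assets into two queues while preserving their original order,
// so deferred assets do not compete with startup assets for request slots.
function planLoading(assets: Asset[]): LoadPlan {
  const plan: LoadPlan = { startup: [], deferred: [] };
  for (let i = 0; i < assets.length; i++) {
    const asset = assets[i];
    (asset.neededAtStart ? plan.startup : plan.deferred).push(asset.url);
  }
  return plan;
}
```

The application would request everything in `startup` first and only begin requesting `deferred` once the first frame has been shown.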
Meta Quest Performance HUD
When profiling VR rendering on the Meta Quest, you can use the Performance HUD.
This is especially useful for debugging view-dependent performance problems, since you can
use the headset while getting continuous feedback on performance.
In the example below, Ayushman Johri shows the excellent performance of his Wonderland Engine-based
“Vintage Study Room” using the Meta Quest Performance HUD:
Here's a performance test with the
FPS Graph! I will mention that I observed more stale frames than I
would get without system recording~ Also for artistic choice, I decided
to use uncompressed textures for some extra detailed assets ✨ Not
perfect but pretty close!
You will often read that reading and writing memory are usually the most expensive operations
in a shader program. Whether this is true depends on how cache-friendly your memory operations
L1 (“Level 1”) cache memory is incredibly fast. To ensure you are using it, make sure you obey the
principle of locality: access memory that is
close to the memory you previously accessed.
is a browser extension for Chrome and Firefox that allows capturing WebGL
frame traces. The tool can show the full list of commands and summarize
stats like vertex counts and draw calls.
First enable the tool via the add-ons button at the top, then record a frame
by clicking the red record button.
You can see that Wonderland Engine is drawing many many objects in a total
of 11 draw calls here. For any other framework, this number will be 10-100x
Disjoint Timer Query
is a WebGL extension that allows measuring the GPU time of a set of
WebGL commands. It is well supported only on Chrome [August 2023], and only if the
“WebGL Debug Extensions” Chrome flag is enabled.
Since all commands run asynchronously on the GPU, the time measurements
need to be scheduled asynchronously as well.
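A minimal sketch of such an asynchronous measurement, assuming a WebGL 2 context and the `EXT_disjoint_timer_query_webgl2` extension, could look like this. The `GpuTimer` class and the `GlLike` interface are made up for this example; `GlLike` only lists the context methods the sketch uses and is intended to be satisfied by a real `WebGL2RenderingContext`.

```typescript
// Subset of the extension object and WebGL 2 context used by this sketch.
interface TimerExt { TIME_ELAPSED_EXT: number; GPU_DISJOINT_EXT: number; }
interface GlLike {
  QUERY_RESULT: number;
  QUERY_RESULT_AVAILABLE: number;
  getExtension(name: string): any;
  createQuery(): any;
  beginQuery(target: number, query: any): void;
  endQuery(target: number): void;
  getQueryParameter(query: any, pname: number): any;
  getParameter(pname: number): any;
  deleteQuery(query: any): void;
}

class GpuTimer {
  private gl: GlLike;
  private ext: TimerExt | null;
  private pending: any[] = [];

  constructor(gl: GlLike) {
    this.gl = gl;
    // Returns null if the extension is unavailable; timing is then skipped.
    this.ext = gl.getExtension("EXT_disjoint_timer_query_webgl2");
  }

  // Wraps the WebGL commands issued by `draw` in a timer query.
  measure(draw: () => void): void {
    if (!this.ext) { draw(); return; }
    const query = this.gl.createQuery();
    this.gl.beginQuery(this.ext.TIME_ELAPSED_EXT, query);
    draw();
    this.gl.endQuery(this.ext.TIME_ELAPSED_EXT);
    this.pending.push(query);
  }

  // Call once per frame. Returns GPU times in milliseconds for all queries
  // whose results have arrived, usually from one or more frames earlier.
  poll(): number[] {
    const results: number[] = [];
    if (!this.ext) return results;
    // If the GPU timer was disjoint, the measured values are unreliable.
    const disjoint = this.gl.getParameter(this.ext.GPU_DISJOINT_EXT);
    while (this.pending.length > 0) {
      const query = this.pending[0];
      if (!this.gl.getQueryParameter(query, this.gl.QUERY_RESULT_AVAILABLE)) break;
      const nanoseconds = this.gl.getQueryParameter(query, this.gl.QUERY_RESULT);
      if (!disjoint) results.push(nanoseconds / 1e6);
      this.gl.deleteQuery(query);
      this.pending.shift();
    }
    return results;
  }
}
```

In a render loop, you would call `measure` around the draw commands of interest and `poll` at the start of each subsequent frame. Results are not expected to be available in the same frame in which the query was issued.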
Wonderland Editor Profiler
Wonderland Editor comes with a built-in profiling tool that helps you understand where the
performance of your WebXR app might be suffering.
Every framework has its particular performance characteristics. Most are draw call bound before
anything else, and then fragment bound–no need to go low poly if not at 90 fps!
We designed Wonderland Engine from the ground up to avoid most of the above bottlenecks–
it’s free up to 120k USD revenue per year.
Reach out here for Enterprise Licenses and Support.