WebGL Performance
WebGL has a bad reputation: developers assume WebGL is slow and unable to render complex 3D graphics. Many examples show this is not true, yet developers' own experience reinforces the idea over and over again.
The hardware running WebGL code is not throttled for browsers. So why do developers believe that it is impossible to render fast 3D graphics on the web?
This article will guide you through WebGL performance and how you, too, can achieve fast 3D rendering on the web.
CPU vs GPU
In a 3D application, some code runs on the CPU, and some code on accelerated hardware, the GPU.
WebGL is designed to let graphics applications on the web use the GPU for acceleration. WebGL can therefore be understood as a set of functions that send work to the graphics hardware and retrieve the result.
The code that creates the work for the GPU runs on the CPU, and for WebGL we control it via JavaScript.
Browser Overhead
Because a goal of the browser is to keep the user safe from potentially malicious websites, it checks the safety of every WebGL call the website makes through JavaScript. It also needs to isolate the process running the website’s code, and any calls to and from this process need to be converted into a sendable format, which is called “marshalling”.
Both marshalling and checking of the website’s WebGL calls incur work that adds a performance cost, compared to the same call in a native environment.
Avoid WebGL Calls
To achieve good WebGL performance, we should therefore avoid calls that have a high overhead. Profiling your WebGL application will show which calls are especially costly.
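As a starting point before reaching for the browser's profiler, it can help to count how often each WebGL function is called per frame. The following is a minimal sketch, not a full profiler: `countCalls` is a hypothetical helper name, and it only counts calls, it does not measure how long each one takes.

```javascript
// Wraps a WebGL context so that every method call is counted by name.
// `countCalls` is a hypothetical helper, not part of the WebGL API.
function countCalls(gl) {
  const counts = new Map();
  const proxy = new Proxy(gl, {
    get(target, prop) {
      const value = target[prop];
      // Constants such as gl.TRIANGLES are passed through unchanged.
      if (typeof value !== "function") return value;
      return (...args) => {
        counts.set(prop, (counts.get(prop) || 0) + 1);
        // Call the original method with the real context as `this`.
        return value.apply(target, args);
      };
    },
  });
  return { gl: proxy, counts };
}
```

Render one frame through the returned `gl`, then inspect `counts` to see which functions are called most often. The wrapper itself adds overhead, so it should be used only while investigating.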
The following are some of the calls that are counter-intuitively expensive:
An excellent way to avoid these calls is to do WebGL state tracking and reduce error checking in production builds of your WebGL application.
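State tracking can be sketched as a thin wrapper that remembers the last value it set and skips the WebGL call when nothing changes. The class name `GLStateCache` and its methods are assumptions made for illustration. The cache is only correct if every state change for the tracked state goes through the wrapper.

```javascript
// Hypothetical state cache: skips redundant calls and only checks errors
// when `debug` is enabled (for example in development builds).
class GLStateCache {
  constructor(gl, { debug = false } = {}) {
    this.gl = gl;
    this.debug = debug;
    this.program = null;            // last program passed to useProgram
    this.capabilities = new Map();  // capability enum -> enabled flag
  }

  useProgram(program) {
    if (this.program === program) return; // already active, skip the call
    this.gl.useProgram(program);
    this.program = program;
  }

  setCapability(capability, enabled) {
    if (this.capabilities.get(capability) === enabled) return;
    if (enabled) this.gl.enable(capability);
    else this.gl.disable(capability);
    this.capabilities.set(capability, enabled);
  }

  // getError() is only called in debug mode, never in production.
  checkError(label) {
    if (!this.debug) return;
    const error = this.gl.getError();
    if (error !== this.gl.NO_ERROR) {
      throw new Error(`WebGL error 0x${error.toString(16)} after ${label}`);
    }
  }
}
```

With this wrapper, calling `state.useProgram(program)` for every object in a loop only reaches WebGL when the program actually changes.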
Avoid Draw Calls
The secret to excellent WebGL performance, though, is to have as few draw calls as possible. A draw call is any call to one of the following WebGL functions:
- drawArrays()
- drawElements()
- drawArraysInstanced()
- drawElementsInstanced()
- drawRangeElements()
There are many ways to reduce draw calls. For example, use one function call that draws many objects rather than calling a function once per object. You can use instancing to render many copies of the same mesh, or the WEBGL_multi_draw extension to render many different objects with the same shader.
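To illustrate the difference instancing makes, here is a minimal sketch that assumes a WebGL 2 context, an already bound program and vertex array, and non-indexed triangle geometry. The function names, `offsetUniform`, and `offsetAttribute` are hypothetical: they stand for a `vec3` per-object offset that the shader would read either as a uniform or as a per-instance attribute.

```javascript
// Naive approach: one uniform upload and one draw call per object.
function drawOneByOne(gl, offsetUniform, offsets, vertexCount) {
  for (let i = 0; i < offsets.length; i += 3) {
    gl.uniform3f(offsetUniform, offsets[i], offsets[i + 1], offsets[i + 2]);
    gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
  }
}

// Instanced approach, setup: upload all offsets once as a per-instance attribute.
// `offsets` is a Float32Array with three floats (x, y, z) per instance.
function createInstanceOffsets(gl, offsetAttribute, offsets) {
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, offsets, gl.STATIC_DRAW);
  gl.enableVertexAttribArray(offsetAttribute);
  gl.vertexAttribPointer(offsetAttribute, 3, gl.FLOAT, false, 0, 0);
  // A divisor of 1 advances the attribute once per instance, not per vertex.
  gl.vertexAttribDivisor(offsetAttribute, 1);
  return buffer;
}

// Instanced approach, per frame: a single draw call for all instances.
function drawInstanced(gl, vertexCount, instanceCount) {
  gl.drawArraysInstanced(gl.TRIANGLES, 0, vertexCount, instanceCount);
}
```

For 1,000 objects, `drawOneByOne` issues 2,000 WebGL calls per frame, while `drawInstanced` issues one.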
3D engines often have a feature called “batching”, which merges many calls into a single WebGL call. Wonderland Engine takes this to an extreme level, where scenes with tens of thousands of dynamic objects are rendered in less than ten draw calls automatically.
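One simple form of batching is to merge static meshes into a single vertex and index buffer on the CPU, so they can be drawn with one `drawElements()` call. The sketch below shows only this generic idea and is not how Wonderland Engine implements batching. `mergeMeshes` and the mesh object shape (`positions`, `indices`, `offset`) are assumptions made for illustration.

```javascript
// Merges indexed meshes into one position array and one index array.
// Each mesh: { positions: Float32Array (x, y, z per vertex),
//              indices: Uint16Array or Uint32Array,
//              offset: [x, y, z] world-space translation baked into the vertices }
function mergeMeshes(meshes) {
  let vertexCount = 0;
  let indexCount = 0;
  for (const mesh of meshes) {
    vertexCount += mesh.positions.length / 3;
    indexCount += mesh.indices.length;
  }

  const positions = new Float32Array(vertexCount * 3);
  // 32-bit indices need WebGL 2 or the OES_element_index_uint extension.
  const indices = new Uint32Array(indexCount);

  let vertexBase = 0; // vertices written so far
  let indexBase = 0;  // indices written so far
  for (const mesh of meshes) {
    const [ox, oy, oz] = mesh.offset;
    const count = mesh.positions.length / 3;
    for (let k = 0; k < count; k++) {
      const src = k * 3;
      const dst = (vertexBase + k) * 3;
      positions[dst] = mesh.positions[src] + ox;
      positions[dst + 1] = mesh.positions[src + 1] + oy;
      positions[dst + 2] = mesh.positions[src + 2] + oz;
    }
    for (let k = 0; k < mesh.indices.length; k++) {
      // Shift indices so they point at this mesh's vertices in the merged array.
      indices[indexBase + k] = mesh.indices[k] + vertexBase;
    }
    vertexBase += count;
    indexBase += mesh.indices.length;
  }
  return { positions, indices };
}
```

The trade-off of this approach is that merged objects can no longer move independently without rewriting the vertex data, which is why it suits static geometry best.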
WebGL performance on mobile is especially affected by draw calls.
WebGL on Safari
On Safari (both iOS and macOS), there are additional WebGL calls that come with surprisingly large amounts of overhead, for example:
Pay special attention to your use of Uniform Buffers when optimizing for Safari.
JavaScript
The language used for interfacing with Browser APIs, such as WebGL, is JavaScript. The main performance culprit with JavaScript is Garbage Collection, which will automatically find and remove memory that is no longer needed.
The Garbage Collection process can make WebGL application performance unpredictable and unreliable, as you have no control over when it will occur. It will often run at a moment when it causes stutter in the rendering, giving the appearance of bad performance even though the code is otherwise well optimized.
To produce stable framerates, take a strict approach to writing garbage-free JavaScript.
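A common pattern for garbage-free code is to allocate temporary objects once and write results into caller-provided outputs instead of returning new arrays. The following is a minimal sketch. The function names and the `scratch` buffer are hypothetical and only illustrate the pattern.

```javascript
// Allocating version: creates a new array on every call, which becomes
// garbage as soon as the frame is over.
function addAllocating(a, b) {
  return [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
}

// Garbage-free version: the caller passes in the output vector.
function add(out, a, b) {
  out[0] = a[0] + b[0];
  out[1] = a[1] + b[1];
  out[2] = a[2] + b[2];
  return out;
}

// Scratch memory allocated once at startup and reused every frame.
const scratch = new Float32Array(3);

// Moves `position` by `velocity * dt` in place without allocating.
function integrate(position, velocity, dt) {
  scratch[0] = velocity[0] * dt;
  scratch[1] = velocity[1] * dt;
  scratch[2] = velocity[2] * dt;
  return add(position, position, scratch);
}
```

Calling `integrate()` for every object in every frame allocates nothing, whereas the same loop written with `addAllocating()` creates one short-lived array per call. Shared scratch memory must not be held across calls that also use it.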
iOS Safari and Memory
WebGL apps that run well on Safari are rare, because Safari brings performance challenges of its own. We wrote an entire blog post about how to optimize for Safari.
Memory Limits
On iOS you get another challenge: browser tabs have very limited memory, especially on older iPhone hardware. If you exceed the memory limit, the tab may reload or freeze. Because iPhones use unified memory, the RAM for CPU and GPU is shared, and texture and buffer data, as well as JavaScript or WebAssembly memory, all count towards the limit.
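To stay within a memory budget, it helps to estimate how much memory your assets need before uploading them. The following sketch computes a rough size for uncompressed textures and sums it with buffer and heap sizes. The function names and the resource object shape are hypothetical, and the result is only an estimate: real usage depends on the driver, padding, and texture compression. The actual limit is not stated here, so any budget you compare against is your own assumption.

```javascript
// Estimated bytes for an uncompressed 2D texture, optionally with a full mip chain.
function textureBytes(width, height, bytesPerPixel = 4, mipmaps = true) {
  let total = width * height * bytesPerPixel;
  if (!mipmaps) return total;
  let w = width;
  let h = height;
  // Each mip level halves both dimensions (rounded down, minimum 1).
  while (w > 1 || h > 1) {
    w = Math.max(1, w >> 1);
    h = Math.max(1, h >> 1);
    total += w * h * bytesPerPixel;
  }
  return total;
}

// Sums textures, GPU buffers, and JavaScript/WebAssembly heap into one figure,
// since all of them share the same memory on unified-memory devices.
function estimateTotalBytes({ textures = [], bufferBytes = 0, heapBytes = 0 }) {
  let total = bufferBytes + heapBytes;
  for (const t of textures) {
    total += textureBytes(t.width, t.height, t.bytesPerPixel, t.mipmaps);
  }
  return total;
}
```

For example, a single 2048 by 2048 RGBA texture takes about 16 MiB without mipmaps and roughly a third more with them, so a handful of large textures can dominate the total.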