Blurs are the basic building block for many video game post-processing effects and essential for sleek and modern GUIs. Video game Depth of Field and Bloom or frosted panels in modern user interfaces - used subtly or obviously - they’re everywhere. Even your browser can do it, just tap this sentence!
Effect of "Bloom", one of many use-cases for blur algorithms
Conceptually, “Make thing go blurry” is easy, boiling down to some form of “average colors in radius”. Doing so in realtime however, took many a graphics programmer through decades upon decades of research and experimentation, across computer science and maths. In this article, we’ll follow their footsteps.
A graphics programming time travel, if you will.
Using the GPU in the device you are reading this article on, and the WebGL capability of your browser, we’ll implement realtime blurring techniques and retrace the trade-offs graphics programmers had to make in order to marry two, sometimes opposing, worlds: Mathematical theory and Technological reality.
This is my submission to this year's Summer of Math Exposition
With many interactive visualizations to guide us, we’ll journey through a bunch of blurs, make a detour through frequency space manipulations, and torture your graphics processor to measure performance, before finally arriving at an algorithm with years' worth of cumulative graphics programmer sweat - The ✨ Dual Kawase Blur 🌟
Setup - No blur yet #
In the context of video game post-processing, a 3D scene is drawn, also called rendering, and saved to an intermediary image - a framebuffer. In turn, this framebuffer is processed to achieve various effects. Since this processing happens after a 3D scene is rendered, it’s called post-processing. All that, many times a second.
This is where we jump in: with a framebuffer in hand, after the 3D scene was drawn. We’ll use a scene from a mod called NEOTOKYO°. Each time we implement a blur, there will be a box, a canvas driven by WebGL 1.0, rendering at the native resolution of your device. Each box has controls and relevant parts of its code below.
No coding or graphics programming knowledge required to follow along. But also no curtains! You can always see how we talk with your GPU. Terms and meanings will be explained once they're relevant.
[Interactive WebGL demo: no blur yet. Controls: Scene / Lights / Bloom mode, Animate, lightBrightness. Readouts: FPS / frametime, Resolution.]
Blur Fragment Shader noBlurYet.fs

    precision highp float;
    varying vec2 uv;

    uniform float lightBrightness;
    uniform sampler2D texture;

    void main() {
      gl_FragColor = texture2D(texture, uv) * lightBrightness;
    }

WebGL Javascript simple.js

    import * as util from '../utility.js'

    export async function setupSimple() {
      const WebGLBox = document.getElementById('WebGLBox-Simple');
      const canvas = WebGLBox.querySelector('canvas');

      const radius = 0.12;

      const gl = canvas.getContext('webgl', {
        preserveDrawingBuffer: false,
        antialias: false,
        alpha: false,
      });

      const ctx = {
        mode: "scene",
        flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
        tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
        fb: { scene: null, final: null },
        shd: {
          scene: { handle: null, uniforms: { offset: null, radius: null } },
          blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, lightBrightness: null } },
          bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
        }
      };

      const ui = {
        display: {
          spinner: canvas.parentElement.querySelector('svg', canvas.parentElement),
          contextLoss: canvas.parentElement.querySelector('div', canvas.parentElement),
          fps: WebGLBox.querySelector('#fps'),
          ms: WebGLBox.querySelector('#ms'),
          width: WebGLBox.querySelector('#width'),
          height: WebGLBox.querySelector('#height'),
        },
        rendering: {
          animate: WebGLBox.querySelector('#animateCheck'),
          modes: WebGLBox.querySelectorAll('input[type="radio"]'),
          lightBrightness: WebGLBox.querySelector('#lightBrightness'),
          lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
        }
      };

      const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
      const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
      const bloomVert = await util.fetchShader("shader/bloom.vs");
      const bloomFrag = await util.fetchShader("shader/bloom.fs");
      const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
      const noBlurYetFrag = await util.fetchShader("shader/noBlurYet.fs");

      ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

      ui.rendering.animate.addEventListener("change", () => {
        if (ui.rendering.animate.checked)
          startRendering();
        else {
          ui.display.fps.value = "-";
          ui.display.ms.value = "-";
          ctx.flags.isRendering = false;
          redraw()
        }
      });

      canvas.addEventListener("webglcontextlost", () => {
        ui.display.contextLoss.style.display = "block";
      });

      ui.rendering.modes.forEach(radio => {
        if (radio.value === "scene")
          radio.checked = true;
        radio.addEventListener('change', (event) => {
          ctx.mode = event.target.value;
          ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
          ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
          if (!ui.rendering.animate.checked) redraw();
        });
      });

      ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);
      ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

      function reCompileBlurShader() {
        ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, noBlurYetFrag, ["lightBrightness"]);
      }

      reCompileBlurShader()

      util.bindUnitQuad(gl);

      async function setupTextureBuffers() {
        ui.display.spinner.style.display = "block";
        ctx.flags.buffersInitialized = true;
        ctx.flags.initComplete = false;

        gl.deleteFramebuffer(ctx.fb.scene);
        gl.deleteFramebuffer(ctx.fb.final);
        [ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
        [ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

        let [base, selfIllum] = await Promise.all([
          fetch("/dual-kawase/img/SDR_No_Sprite.png"),
          fetch("/dual-kawase/img/Selfillumination.png")
        ]);
        let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
        let [baseBitmap, selfIllumBitmap] = await Promise.all([
          createImageBitmap(baseBlob, {
            colorSpaceConversion: 'none',
            resizeWidth: canvas.width * 1.12,
            resizeHeight: canvas.height * 1.12,
            resizeQuality: "high"
          }),
          createImageBitmap(selfIllumBlob, {
            colorSpaceConversion: 'none',
            resizeWidth: canvas.width * 1.12,
            resizeHeight: canvas.height * 1.12,
            resizeQuality: "high"
          })
        ]);

        ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
        ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

        baseBitmap.close();
        selfIllumBitmap.close();

        ctx.flags.initComplete = true;
        ui.display.spinner.style.display = "none";
      }

      let prevNow = performance.now();
      let lastStatsUpdate = prevNow;
      let fpsEMA = 60;
      let msEMA = 16;

      async function redraw() {
        if (!ctx.flags.buffersInitialized)
          await setupTextureBuffers();
        if (!ctx.flags.initComplete)
          return;

        ui.display.width.value = canvas.width;
        ui.display.height.value = canvas.height;

        let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
        let speed = (performance.now() / 10000) % Math.PI * 2;
        const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
        gl.useProgram(ctx.shd.scene.handle);
        const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
        gl.activeTexture(gl.TEXTURE0);
        gl.bindTexture(gl.TEXTURE_2D, texture);
        gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
        gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

        gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
        gl.viewport(0, 0, canvas.width, canvas.height);
        gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

        gl.useProgram(ctx.shd.blur.handle);
        const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
        gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
        gl.viewport(0, 0, canvas.width, canvas.height);
        gl.uniform1f(ctx.shd.blur.uniforms.lightBrightness, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);

        gl.activeTexture(gl.TEXTURE0);
        gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
        gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

        if (ctx.mode == "bloom") {
          gl.bindFramebuffer(gl.FRAMEBUFFER, null);
          gl.useProgram(ctx.shd.bloom.handle);

          gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
          gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

          gl.activeTexture(gl.TEXTURE0);
          gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
          gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

          gl.activeTexture(gl.TEXTURE1);
          gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
          gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

          gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
        }

        gl.finish();

        const now = performance.now();
        let dt = now - prevNow;

        if (dt > 0) {
          const instFPS = 1000 / dt;
          const ALPHA = 0.05;
          fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
          msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
        }
        prevNow = now;

        if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
          ui.display.fps.value = fpsEMA.toFixed(0);
          ui.display.ms.value = msEMA.toFixed(2);
          lastStatsUpdate = now;
        }
      }

      let animationFrameId;

      function nativeResize() {
        const [width, height] = util.getNativeSize(canvas);

        if (width && canvas.width !== width || height && canvas.height !== height) {
          canvas.width = width;
          canvas.height = height;

          if (!ctx.flags.benchMode) {
            stopRendering();
            startRendering();
          }
          if (!ui.rendering.animate.checked)
            redraw();
        }
      }

      nativeResize();

      let resizePending = false;
      window.addEventListener('resize', () => {
        if (!resizePending) {
          resizePending = true;
          requestAnimationFrame(() => {
            resizePending = false;
            nativeResize();
          });
        }
      });

      function renderLoop() {
        if (ctx.flags.isRendering && ui.rendering.animate.checked) {
          redraw();
          animationFrameId = requestAnimationFrame(renderLoop);
        }
      }

      function startRendering() {
        ctx.flags.isRendering = true;
        renderLoop();
      }

      function stopRendering() {
        ctx.flags.isRendering = false;
        cancelAnimationFrame(animationFrameId);
        gl.finish();

        gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
        gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
        gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
        gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
        gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
        gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
        ctx.flags.buffersInitialized = false;
        ctx.flags.initComplete = false;
        ui.display.fps.value = "-";
        ui.display.ms.value = "-";
      }

      function handleIntersection(entries) {
        entries.forEach(entry => {
          if (entry.isIntersecting) {
            if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
          } else {
            stopRendering();
          }
        });
      }

      let observer = new IntersectionObserver(handleIntersection);
      observer.observe(canvas);
    }
We don’t have a blur implemented yet, so not much is happening. Above the box you have an Animate button, which will move the scene around to tease out problems of upcoming algorithms. Movement happens before our blur is applied, akin to the player character moving.
Different blur algorithms behave differently based on use-case. Some are very performance efficient, but break under movement. Some reveal their flaws with small, high contrast regions like far-away lights.
To see our blur in different use-cases, there are 3 modes:
In Scene mode the blur will be applied across the whole image.
In Lights mode we see and blur just the Emission parts of the scene, sometimes called “Self-Illumination”. This also unlocks the lightBrightness slider, where you can boost the energy output of the lights.
In Bloom mode, we use the original scene and add the blurred lights from the previous mode on top to create a moody scene. This implements the effect of Bloom, an important use-case for blurs in real-time 3D graphics.
Adding the blurred emission pass as we do in this article, or thresholding the scene and blurring that, is not actually how modern video games do bloom. We'll get into that a bit later.
Finally, you see the Resolution of the canvas and Frames per Second / time taken per frame, aka “frametime”. A very important piece of the puzzle is performance, which will become more and more important as the article continues and is the mother of invention behind our story.
Frame-rate will be capped at your screen's refresh rate, most likely 60 fps / 16.6 ms. We'll get into proper benchmarking as our hero descends into blurry madness.
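The FPS and frametime readouts don't show raw per-frame values. The demo code smooths them with an exponential moving average using ALPHA = 0.05 and starting values of 60 fps / 16 ms. Here is a minimal sketch of just that smoothing, pulled out of the render loop. The wrapper name makeFrameStats is hypothetical and only exists for this illustration.

```javascript
// Exponential moving average for the FPS / frametime readouts.
// alpha = 0.05 and the 60 fps / 16 ms starting values mirror the demo code.
// makeFrameStats is a hypothetical helper, not part of the article's source.
function makeFrameStats(alpha = 0.05, initialFps = 60, initialMs = 16) {
  let fpsEMA = initialFps;
  let msEMA = initialMs;
  return {
    // dt: time since the previous frame, in milliseconds
    update(dt) {
      if (dt > 0) {
        const instFPS = 1000 / dt;
        fpsEMA = alpha * instFPS + (1 - alpha) * fpsEMA;
        msEMA = alpha * dt + (1 - alpha) * msEMA;
      }
      return { fps: fpsEMA, ms: msEMA };
    },
  };
}

// Usage: feed a steady 20 ms frametime, the readout drifts towards 50 fps.
const stats = makeFrameStats();
let readout;
for (let i = 0; i < 300; i++) readout = stats.update(20);
console.log(readout.fps.toFixed(0), readout.ms.toFixed(2));
```

A small alpha means a single slow frame barely moves the number, at the cost of the readout lagging behind sudden changes.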
Technical breakdown #
Understanding the GPU code is not necessary to follow this article, but if you do choose to peek behind the curtain, here is what you need to know.
We’ll implement our blurs as a fragment shader written in GLSL. In a nutshell, a fragment shader is code that runs on the GPU for every output pixel, in parallel. Image inputs in shaders are called Textures. These textures have coordinates, often called UV coordinates - these are the numbers we care about.
Technically, fragment shaders run per fragment, which aren't necessarily pixel sized, and there are other ways to read framebuffers, but none of that matters in the context of this article.
Texture coordinates, also called "UV" Coordinates or "UVs" for short
Note the squished appearance of the image
UV coordinates specify the position we read in the image, with the bottom left being 0,0 and the top right being 1,1. Neither UV coordinates, nor shaders themselves have any concept of image resolution, screen resolution or aspect ratio. If we want to address individual pixels, it’s on us to express that in terms of UV coordinates.
The framebuffer is passed into the fragment shader as a texture, in the line uniform sampler2D texture. Using the blur shader, we draw a “Full Screen Quad”, a rectangle covering the entire canvas. Its UV coordinates, passed in as varying vec2 uv, match the canvas with 0,0 in the bottom-left and 1,1 in the top-right, and are what we use to read from the texture.
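To make "expressing pixels in terms of UV coordinates" concrete, here is a small sketch of the conversion in both directions. The function names are hypothetical, and it assumes pixel (0, 0) is the bottom-left pixel and that we want the UV of each pixel's center, hence the half-pixel shift.

```javascript
// Pixel index -> UV coordinate of that pixel's center.
// Assumption: pixel (0, 0) is the bottom-left pixel, matching the UV origin.
function pixelToUV(px, py, width, height) {
  return [(px + 0.5) / width, (py + 0.5) / height];
}

// UV coordinate -> index of the pixel that contains it.
function uvToPixel(u, v, width, height) {
  return [Math.floor(u * width), Math.floor(v * height)];
}

// A tiny 4x2 image: the top-right pixel is (3, 1).
const uvCenter = pixelToUV(3, 1, 4, 2); // [0.875, 0.75]
const backToPixel = uvToPixel(uvCenter[0], uvCenter[1], 4, 2); // [3, 1]
console.log(uvCenter, backToPixel);
```

Note how the same one-pixel step is 1/4 in U but 1/2 in V for this image. The UV square has no aspect ratio of its own, the image resolution decides what a "pixel" means.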
The texture’s aspect-ratio and resolution are the same as the output canvas’s aspect-ratio and resolution, thus there is a 1:1 pixel mapping between the texture we will process and our output canvas. The graphics pipeline steps and vertex shader responsible for this are not important for this article.
The blur fragment shader accesses the color of the texture with texture2D(texture, uv), at the matching output pixel’s position. In the following examples, we’ll read from neighboring pixels, for which we’ll need to calculate a UV coordinate offset, a decimal fraction corresponding to one pixel step, calculated with 1 / canvasResolution.
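As a sketch of that pixel step: the demos pass 1.0 / canvas.width and 1.0 / canvas.height to the shader as the uniform frameSizeRCP. The two helpers below are hypothetical CPU-side stand-ins that show how a step counted in whole pixels turns into a UV offset.

```javascript
// One pixel step expressed in UV space, per axis (the reciprocal of the resolution).
// makeFrameSizeRCP and neighborUV are hypothetical helper names.
function makeFrameSizeRCP(width, height) {
  return [1 / width, 1 / height];
}

// UV coordinate of the pixel that is (dx, dy) pixels away from uv.
function neighborUV(uv, dx, dy, frameSizeRCP) {
  return [uv[0] + dx * frameSizeRCP[0], uv[1] + dy * frameSizeRCP[1]];
}

// On a 4x2 canvas, one pixel to the right of UV (0.375, 0.25):
const rcp = makeFrameSizeRCP(4, 2); // [0.25, 0.5]
const right = neighborUV([0.375, 0.25], 1, 0, rcp); // [0.625, 0.25]
console.log(rcp, right);
```

Because width and height usually differ, the offset has to be computed per axis, which is why frameSizeRCP is a vec2 and not a single number.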
One way to think of fragment shader code is "What are the instructions to construct this output pixel?"
Graphics programming is uniquely challenging in the beginning, because of how many rules and limitations the hardware, graphics APIs and the rendering pipeline impose. But it also unlocks incredible potential, as other limitations dissolve. Let’s find out how graphics programmers have leveraged that potential.
Box Blur #
From a programmer’s perspective, the most straightforward way is to average the neighbors of a pixel using a for-loop. What the fragment shader is expressing is: “look Y pixels up & down, X pixels left & right and average the colors”. The more we want to blur, the more we have to increase kernelSize, the bounds of our for-loop.
    for (int y = -kernel_size; y <= kernel_size; ++y) {
      for (int x = -kernel_size; x <= kernel_size; ++x) {
        vec2 offset = vec2(x, y) * samplePosMult * frameSizeRCP;
        sum += texture2D(texture, uv + offset);
      }
    }
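If you'd rather poke at the same loop without a GPU, here is a rough CPU sketch for a grayscale image, with the outer two loops playing the role of "run for every output pixel". The function name and the flat-array image layout are choices made for this illustration, and clamping reads at the image border is an assumption, not necessarily what the demo's textures do.

```javascript
// CPU reference of the box blur loop for a grayscale image.
// image: flat array of width * height values, stored row by row.
// Assumption: reads outside the image clamp to the nearest edge pixel.
function boxBlur(image, width, height, kernelSize) {
  const size = 2 * kernelSize + 1;
  const totalSamples = size * size;
  const out = new Array(width * height);
  for (let py = 0; py < height; py++) {
    for (let px = 0; px < width; px++) {
      // Everything inside here is what the fragment shader does per output pixel.
      let sum = 0;
      for (let y = -kernelSize; y <= kernelSize; y++) {
        for (let x = -kernelSize; x <= kernelSize; x++) {
          const sx = Math.min(width - 1, Math.max(0, px + x));
          const sy = Math.min(height - 1, Math.max(0, py + y));
          sum += image[sy * width + sx];
        }
      }
      out[py * width + px] = sum / totalSamples;
    }
  }
  return out;
}

// A single bright pixel in the middle of a 5x5 image.
const img = new Array(25).fill(0);
img[12] = 9;
const blurred = boxBlur(img, 5, 5, 1);
console.log(blurred);
```

With kernelSize 1, the one bright pixel is smeared into a flat 3x3 block of equal values, while the total brightness stays the same.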
The bigger the for-loop, the more texture reads we perform per output pixel. Each texture read is often called a “texture tap” and the total number of those “taps” per frame will now also be displayed. New controls, new samplePosMultiplier, new terms - Play around with them, get a feel for them, with a constant eye on FPS.
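The tap count is simple arithmetic: one tap per loop iteration, for every output pixel. The sketch below mirrors the formula the demo uses for its "Texture Taps" readout. The function names are hypothetical.

```javascript
// Texture taps per frame for the box blur:
// every output pixel reads a (2 * kernelSize + 1) squared block of texels.
function textureTaps(width, height, kernelSize) {
  const side = 2 * kernelSize + 1;
  return width * height * side * side;
}

// Same formatting as the readout: millions with one decimal.
function tapsLabel(taps) {
  return (taps / 1000000).toFixed(1) + " Million";
}

// The benchmark resolution of 1600x1200 with a 7x7 kernel (kernelSize = 3).
const taps = textureTaps(1600, 1200, 3);
console.log(taps, tapsLabel(taps));
```

Doubling kernelSize roughly quadruples the work, since the kernel grows in both directions at once.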
[Interactive WebGL demo: Box Blur. Controls: Scene / Lights / Bloom mode, Animate, kernelSize (shown at 7x7 px), samplePosMultiplier, lightBrightness, Benchmark with 10 to 100000 iterations. Readouts: FPS / frametime, Resolution, Texture Taps.]
Detailed Benchmark Results ・Resolution: 1600x1200 ・Execution Time: ~ ? / iteration ・Texture taps: ? Million / iteration ・GPU info: ?

Blur Fragment Shader boxBlur.fs

    precision highp float;
    varying vec2 uv;

    uniform vec2 frameSizeRCP;
    uniform float samplePosMult;
    uniform float bloomStrength;
    uniform sampler2D texture;

    const int kernel_size = KERNEL_SIZE;

    void main() {
      vec4 sum = vec4(0.0);
      const int size = 2 * kernel_size + 1;
      const float totalSamples = float(size * size);

      for (int y = -kernel_size; y <= kernel_size; ++y) {
        for (int x = -kernel_size; x <= kernel_size; ++x) {
          vec2 offset = vec2(x, y) * samplePosMult * frameSizeRCP;
          sum += texture2D(texture, uv + offset);
        }
      }

      gl_FragColor = (sum / totalSamples) * bloomStrength;
    }

WebGL Javascript boxBlur.js

    import * as util from '../utility.js'

    export async function setupBoxBlur() {
      const WebGLBox = document.getElementById('WebGLBox-BoxBlur');
      const WebGLBoxDetail = document.getElementById('WebGLBox-BoxBlurDetail');
      const canvas = WebGLBox.querySelector('canvas');

      const radius = 0.12;

      const gl = canvas.getContext('webgl', {
        preserveDrawingBuffer: false,
        antialias: false,
        alpha: false,
      });

      const ctx = {
        mode: "scene",
        flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
        tex: { sdr: null, selfIllum: null, frame: null, frameFinal: null },
        fb: { scene: null, final: null },
        shd: {
          scene: { handle: null, uniforms: { offset: null, radius: null } },
          blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, bloomStrength: null } },
          bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
        }
      };

      const ui = {
        display: {
          spinner: canvas.parentElement.querySelector('svg', canvas.parentElement),
          contextLoss: canvas.parentElement.querySelector('div', canvas.parentElement),
          fps: WebGLBox.querySelector('#fps'),
          ms: WebGLBox.querySelector('#ms'),
          width: WebGLBox.querySelector('#width'),
          height: WebGLBox.querySelector('#height'),
          tapsCount: WebGLBox.querySelector('#taps'),
        },
        blur: {
          kernelSize: WebGLBox.querySelector('#sizeRange'),
          samplePos: WebGLBox.querySelector('#samplePosRange'),
          samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
        },
        rendering: {
          animate: WebGLBox.querySelector('#animateCheck'),
          modes: WebGLBox.querySelectorAll('input[type="radio"]'),
          lightBrightness: WebGLBox.querySelector('#lightBrightness'),
          lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
        },
        benchmark: {
          button: WebGLBox.querySelector('#benchmark'),
          label: WebGLBox.querySelector('#benchmarkLabel'),
          iterOut: WebGLBox.querySelector('#iterOut'),
          renderer: WebGLBoxDetail.querySelector('#renderer'),
          iterTime: WebGLBoxDetail.querySelector('#iterTime'),
          tapsCount: WebGLBoxDetail.querySelector('#tapsCountBench'),
          iterations: WebGLBox.querySelector('#iterations')
        }
      };

      const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
      const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
      const bloomVert = await util.fetchShader("shader/bloom.vs");
      const bloomFrag = await util.fetchShader("shader/bloom.fs");
      const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
      const boxBlurFrag = await util.fetchShader("shader/boxBlur.fs");

      ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
      ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
      ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

      ui.rendering.animate.addEventListener("change", () => {
        if (ui.rendering.animate.checked)
          startRendering();
        else {
          ui.display.fps.value = "-";
          ui.display.ms.value = "-";
          ctx.flags.isRendering = false;
          redraw()
        }
      });

      canvas.addEventListener("webglcontextlost", () => {
        ui.display.contextLoss.style.display = "block";
      });

      ui.blur.kernelSize.addEventListener('input', () => {
        reCompileBlurShader(ui.blur.kernelSize.value);
        ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
        ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
      });

      ui.rendering.modes.forEach(radio => {
        if (radio.value === "scene")
          radio.checked = true;
        radio.addEventListener('change', (event) => {
          ctx.mode = event.target.value;
          ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
          ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
          if (!ui.rendering.animate.checked) redraw();
        });
      });

      ui.benchmark.button.addEventListener("click", () => {
        ctx.flags.benchMode = true;
        stopRendering();
        ui.display.spinner.style.display = "block";
        ui.benchmark.button.disabled = true;

        const worker = new Worker("./js/benchmark/boxBlurBenchmark.js", { type: "module" });

        worker.postMessage({
          iterations: ui.benchmark.iterOut.value,
          blurShaderSrc: boxBlurFrag,
          kernelSize: ui.blur.kernelSize.value,
          samplePos: ui.blur.samplePos.value
        });

        worker.addEventListener("message", (event) => {
          if (event.data.type !== "done") return;

          ui.benchmark.label.textContent = event.data.benchText;
          ui.benchmark.tapsCount.textContent = event.data.tapsCount;
          ui.benchmark.iterTime.textContent = event.data.iterationText;
          ui.benchmark.renderer.textContent = event.data.renderer;

          worker.terminate();
          ui.benchmark.button.disabled = false;
          ctx.flags.benchMode = false;
          if (ui.rendering.animate.checked)
            startRendering();
          else
            redraw();
        });
      });

      ui.benchmark.iterations.addEventListener("change", (event) => {
        ui.benchmark.iterOut.value = event.target.value;
        ui.benchmark.label.textContent = "Benchmark";
      });

      ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);
      ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

      function reCompileBlurShader(blurSize) {
        ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, boxBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength"], "#define KERNEL_SIZE " + blurSize + '\n');
      }

      reCompileBlurShader(ui.blur.kernelSize.value)

      util.bindUnitQuad(gl);

      async function setupTextureBuffers() {
        ui.display.spinner.style.display = "block";
        ctx.flags.buffersInitialized = true;
        ctx.flags.initComplete = false;

        gl.deleteFramebuffer(ctx.fb.scene);
        gl.deleteFramebuffer(ctx.fb.final);
        [ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
        [ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

        let [base, selfIllum] = await Promise.all([
          fetch("/dual-kawase/img/SDR_No_Sprite.png"),
          fetch("/dual-kawase/img/Selfillumination.png")
        ]);
        let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
        let [baseBitmap, selfIllumBitmap] = await Promise.all([
          createImageBitmap(baseBlob, {
            colorSpaceConversion: 'none',
            resizeWidth: canvas.width * 1.12,
            resizeHeight: canvas.height * 1.12,
            resizeQuality: "high"
          }),
          createImageBitmap(selfIllumBlob, {
            colorSpaceConversion: 'none',
            resizeWidth: canvas.width * 1.12,
            resizeHeight: canvas.height * 1.12,
            resizeQuality: "high"
          })
        ]);

        ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
        ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

        baseBitmap.close();
        selfIllumBitmap.close();

        ctx.flags.initComplete = true;
        ui.display.spinner.style.display = "none";
      }

      let prevNow = performance.now();
      let lastStatsUpdate = prevNow;
      let fpsEMA = 60;
      let msEMA = 16;

      async function redraw() {
        if (!ctx.flags.buffersInitialized)
          await setupTextureBuffers();
        if (!ctx.flags.initComplete)
          return;

        const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
        const tapsNewText = (canvas.width * canvas.height * KernelSizeSide * KernelSizeSide / 1000000).toFixed(1) + " Million";
        ui.display.tapsCount.value = tapsNewText;
        ui.display.width.value = canvas.width;
        ui.display.height.value = canvas.height;

        let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
        let speed = (performance.now() / 10000) % Math.PI * 2;
        const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
        gl.useProgram(ctx.shd.scene.handle);
        const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
        gl.activeTexture(gl.TEXTURE0);
        gl.bindTexture(gl.TEXTURE_2D, texture);
        gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
        gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

        gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
        gl.viewport(0, 0, canvas.width, canvas.height);
        gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

        gl.useProgram(ctx.shd.blur.handle);
        const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
        gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
        gl.viewport(0, 0, canvas.width, canvas.height);
        gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);

        gl.activeTexture(gl.TEXTURE0);
        gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
        gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
        gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
        gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

        if (ctx.mode == "bloom") {
          gl.bindFramebuffer(gl.FRAMEBUFFER, null);
          gl.useProgram(ctx.shd.bloom.handle);

          gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
          gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

          gl.activeTexture(gl.TEXTURE0);
          gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
          gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

          gl.activeTexture(gl.TEXTURE1);
          gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
          gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

          gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
        }

        gl.finish();

        const now = performance.now();
        let dt = now - prevNow;

        if (dt > 0) {
          const instFPS = 1000 / dt;
          const ALPHA = 0.05;
          fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
          msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
        }
        prevNow = now;

        if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
          ui.display.fps.value = fpsEMA.toFixed(0);
          ui.display.ms.value = msEMA.toFixed(2);
          lastStatsUpdate = now;
        }
      }

      let animationFrameId;

      function nativeResize() {
        const [width, height] = util.getNativeSize(canvas);

        if (width && canvas.width !== width || height && canvas.height !== height) {
          canvas.width = width;
          canvas.height = height;

          if (!ctx.flags.benchMode) {
            stopRendering();
            startRendering();
          }
          if (!ui.rendering.animate.checked)
            redraw();
        }
      }

      nativeResize();

      let resizePending = false;
      window.addEventListener('resize', () => {
        if (!resizePending) {
          resizePending = true;
          requestAnimationFrame(() => {
            resizePending = false;
            nativeResize();
          });
        }
      });

      function renderLoop() {
        if (ctx.flags.isRendering && ui.rendering.animate.checked) {
          redraw();
          animationFrameId = requestAnimationFrame(renderLoop);
        }
      }

      function startRendering() {
        ctx.flags.isRendering = true;
        renderLoop();
      }

      function stopRendering() {
        ctx.flags.isRendering = false;
        cancelAnimationFrame(animationFrameId);
        gl.finish();

        gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
        gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
        gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
        gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
        gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
        gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
        ctx.flags.buffersInitialized = false;
        ctx.flags.initComplete = false;
        ui.display.fps.value = "-";
        ui.display.ms.value = "-";
      }

      function handleIntersection(entries) {
        entries.forEach(entry => {
          if (entry.isIntersecting) {
            if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
          } else {
            stopRendering();
          }
        });
      }

      let observer = new IntersectionObserver(handleIntersection);
      observer.observe(canvas);
    }
Visually, the result doesn’t look very pleasing. The stronger the blur, the more “boxy” the features of the image become. This is due to us reading and averaging the texture in a square shape. Especially in bloom mode, with strong lightBrightness and big kernelSize, lights literally become squares.
Performance is also really bad. With bigger kernelSizes, our Texture Taps count skyrockets and performance drops. Mobile devices will slow to a crawl. Even the world's fastest PC graphics cards will fall below screen refresh-rate if you crank kernelSize and zoom the article on PC, thus raising canvas resolution.
We kinda failed on all fronts. It looks bad and runs bad.
Then, there’s this samplePosMultiplier. It also seems to increase blur strength, without increasing textureTaps or lowering performance (or lowering it just a little on certain devices). But if we crank it too much, we get artifacts in the form of repeating patterns. Let’s play with a schematic example:
The white center square represents the output pixel
Grey squares are the pixels we would read, with the current kernelSize , with samplePosMult untouched
The black dots are our actual texture reads per output pixel, our “sample” positions
One can say that an image is a “continuous 2D signal”. When we texture tap at a specific coordinate, we are sampling the “image signal” at that coordinate. As previously mentioned, we use UV coordinates and are not bound by concepts like “pixel positions”. Where we place our samples is completely up to us.
A fundamental blur algorithm option is increasing the sample distance away from the center, thus increasing the amount of image we cover with our samples - more bang for your sample buck. This works by multiplying the offset distance. That is what samplePosMult does and is something you will have access to going forward.
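The offset multiplication can be sketched on the CPU. This is a minimal illustration, assuming the same offset expression the article's blur shaders use (vec2(x, y) * samplePosMult * frameSizeRCP); the helper name sampleOffsets is hypothetical and not part of the article's code.

```javascript
// Hypothetical helper: lists the UV offsets a blur would sample for one output pixel.
// kernelSize is the kernel "radius": 1 -> 3x3 samples, 2 -> 5x5 samples, and so on.
function sampleOffsets(kernelSize, samplePosMult, width, height) {
  // Size of one pixel in UV units, the same role frameSizeRCP plays in the shader.
  const frameSizeRCP = [1 / width, 1 / height];
  const offsets = [];
  for (let y = -kernelSize; y <= kernelSize; y++) {
    for (let x = -kernelSize; x <= kernelSize; x++) {
      // Multiplying the offset spreads the samples further apart
      // without changing how many samples are taken.
      offsets.push([
        x * samplePosMult * frameSizeRCP[0],
        y * samplePosMult * frameSizeRCP[1],
      ]);
    }
  }
  return offsets;
}
```

With samplePosMult at 2, a 3×3 kernel still takes 9 samples, but they now span a 5×5 pixel area and skip every other pixel in between.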
Pushing samplePosMult too far brings ugly repeating patterns. This of course leaves some fundamental questions, like where these artifacts come from and what it even means to read between two pixels. And on top of that we have to address performance and the boxiness of our blur! But first…
What even is a kernel? #
What we have created with our for-loop is a convolution. Very simplified, in the context of image processing, a convolution constructs each output pixel by gathering and weighting the pixels covered by a square of numbers. That square is called a kernel and is the thing we visualized previously.
For blurs, the kernel weights must sum up to 1. If that were not the case, we would either brighten or darken the image. Ensuring that is the normalization step. In the box blur above, this happens by dividing the summed pixel color by totalSamples, the total number of samples taken. A basic “calculate the average” expression.
The same can be expressed as weights of a kernel, a number multiplied with each sample at that position. Since the box blur weights all samples the same regardless of position, all weights are the same. This is visualized next. The bigger the kernel size, the smaller the weights.
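As a small illustration of "dividing by totalSamples" being the same thing as a kernel of equal weights, here is a sketch in plain JavaScript. The function names boxKernel and kernelSum are made up for this example, and kernelSize is assumed to be the kernel radius (1 gives 3×3).

```javascript
// Hypothetical helper: builds the box blur kernel as a 2D array of weights.
function boxKernel(kernelSize) {
  const side = 2 * kernelSize + 1;   // 1 -> 3x3, 2 -> 5x5, ...
  const totalSamples = side * side;
  const weight = 1 / totalSamples;   // every position gets the same weight
  return Array.from({ length: side }, () => new Array(side).fill(weight));
}

// Sums all weights; for a blur kernel this should come out as 1.
function kernelSum(kernel) {
  return kernel.flat().reduce((sum, w) => sum + w, 0);
}
```

A 3×3 box kernel holds nine weights of 1/9, a 5×5 one holds twenty-five weights of 1/25, and both sum to 1, which keeps the overall image brightness unchanged.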
Kernels applied at the edges of our image will read from areas “outside” the image, with UV coordinates smaller than 0,0 and bigger than 1,1 . Luckily, the GPU handles this for us and we are free to decide what happens to those outside samples, by setting the Texture Wrapping mode.
Texture Wrapping Modes and results on blurring (Note the color black bleeding-in)
Top: Framebuffer, zoomed out. Bottom: Framebuffer normal, with strong blur applied
Among other options, we can define a solid color to be used for those outside samples, or “clamp” to the color of the nearest edge. If we choose a solid color, then we will get color bleeding at the edges. Thus for almost all post-processing use-cases, edge color clamping is used, as it prevents weird things happening at the edges. This article does too.
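The difference between the two behaviours can be emulated on the CPU with a one-dimensional box blur over a single row of values. This is only a sketch of the idea: the function blurRow and its mode names are hypothetical, and on the GPU the wrapping mode is a texture setting rather than shader logic.

```javascript
// Hypothetical helper: box-blurs one row of grayscale values.
// mode "clamp" repeats the nearest edge value for out-of-range reads,
// any other mode returns a solid borderColor instead.
function blurRow(row, kernelSize, mode, borderColor = 0) {
  const side = 2 * kernelSize + 1;
  return row.map((_, i) => {
    let sum = 0;
    for (let k = -kernelSize; k <= kernelSize; k++) {
      const j = i + k;
      if (j >= 0 && j < row.length) {
        sum += row[j];                                             // inside the image
      } else if (mode === "clamp") {
        sum += row[Math.min(Math.max(j, 0), row.length - 1)];      // nearest edge value
      } else {
        sum += borderColor;                                        // solid color outside
      }
    }
    return sum / side;
  });
}
```

For a fully white row, clamping leaves every output value white, while a black border color darkens the outermost values, which is the bleeding described above.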
You may have noticed a black "blob" streaking with stronger blur levels along the bottom. Specifically here, it happens because the lines between the floor tiles align with the bottom edge, extending black color to infinity
Convolution as a mathematical concept is surprisingly deep and 3Blue1Brown has an excellent video on it that even covers the image processing topic. Theoretically, we won’t depart from convolutions. We can dissect our code and express it as weights and kernels. With the for-loop box blur, that was quite easy!
But what is a convolution?
YouTube Video by 3Blue1Brown
On a practical level though, understanding where the convolution is, how many there are and what kernels are at play will become more and more difficult, once we leave the realm of classical blurs and consider the wider implications of reading between pixel bounds. But for now, we stay with the classics:
Gaussian Blur #
The most famous of blur algorithms is the Gaussian Blur. It uses the normal distribution, also known as the bell curve, to weight the samples inside the kernel, with a new variable sigma σ to control the flatness of the curve. Other than generating the kernel weights, the algorithm is identical to the box blur algorithm.
Gaussian blur weights formula for point (x,y) (Source)
To calculate the weights for point (x,y), the above formula is used. The gaussian formula has a normalization multiplier in front of the exponential: 1/(2πσ²) in the 2D case, which is the 1D factor 1/√(2πσ²) applied once per axis. In the code, there is no such thing though. The formula expresses the gaussian curve as a continuous function going to infinity. But our code and its for-loop are different - discrete and finite.
// Unnormalized gaussian weight for the sample at offset (x, y).
// The caller divides by the sum of all weights instead.
float gaussianWeight(float x, float y, float sigma) {
    return exp(-(x * x + y * y) / (2.0 * sigma * sigma));
}
For clarity, the kernel is generated in the fragment shader. Normally, that should be avoided. Fragment shaders run per-output-pixel, but the kernel weights stay the same, making this inefficient.
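To illustrate the alternative hinted at above, the weights could be computed once on the CPU. The following is a minimal sketch, not the article's implementation: the function name gaussianKernel is hypothetical, kernelSize is assumed to be the kernel radius, and how the result reaches the shader (for example as a uniform array) is left open.

```javascript
// Hypothetical helper: precomputes normalized 2D gaussian weights, row by row.
function gaussianKernel(kernelSize, sigma) {
  const weights = [];
  let weightSum = 0;
  for (let y = -kernelSize; y <= kernelSize; y++) {
    for (let x = -kernelSize; x <= kernelSize; x++) {
      // Same expression as gaussianWeight() in the fragment shader.
      const w = Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
      weights.push(w);
      weightSum += w;
    }
  }
  // Normalize once here, so the weights already sum to 1.
  return weights.map(w => w / weightSum);
}
```

The weights only depend on kernelSize and sigma, so they would need recomputing when a slider moves, not for every output pixel.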
Just like with the box blur, weights are summed up and divided at the end, instead of relying on the formula's normalization term to pre-scale the weights. sigma controls the sharpness of the curve and thus the blur strength, but wasn’t that the job of kernelSize? Play around with all the values below and get a feel for how the various values behave.
Detailed Benchmark Results ・Resolution: 1600x1200 ・Execution Time: ~ ? / iteration ・Texture taps: ? Million / iteration ・GPU info: Blur Fragment Shader gaussianBlur.fs precision highp float ; varying vec2 uv ; uniform vec2 frameSizeRCP ; uniform float samplePosMult ; uniform float sigma ; uniform float bloomStrength ; uniform sampler2D texture ; const int kernel_size = KERNEL_SIZE ; float gaussianWeight ( float x , float y , float sigma ) { return exp ( - ( x * x + y * y ) / ( 2.0 * sigma * sigma ) ) ; } void main ( ) { vec4 sum = vec4 ( 0.0 ) ; float weightSum = 0.0 ; const int size = 2 * kernel_size + 1 ; const float totalSamples = float ( size * size ) ; for ( int y = - kernel_size ; y <= kernel_size ; ++ y ) { for ( int x = - kernel_size ; x <= kernel_size ; ++ x ) { float w = gaussianWeight ( float ( x ) , float ( y ) , sigma ) ; vec2 offset = vec2 ( x , y ) * samplePosMult * frameSizeRCP ; sum += texture2D ( texture , uv + offset ) * w ; weightSum += w ; } } gl_FragColor = ( sum / weightSum ) * bloomStrength ; } WebGL Javascript gaussianBlur.js import * as util from '../utility.js' export async function setupGaussianBlur ( ) { const WebGLBox = document . getElementById ( 'WebGLBox-GaussianBlur' ) ; const WebGLBoxDetail = document . getElementById ( 'WebGLBox-GaussianBlurDetail' ) ; const canvas = WebGLBox . querySelector ( 'canvas' ) ; const radius = 0.12 ; const gl = canvas . 
getContext ( 'webgl' , { preserveDrawingBuffer : false , antialias : false , alpha : false , } ) ; const ctx = { mode : "scene" , flags : { isRendering : false , buffersInitialized : false , initComplete : false , benchMode : false } , tex : { sdr : null , selfIllum : null , frame : null , frameFinal : null } , fb : { scene : null , final : null } , shd : { scene : { handle : null , uniforms : { offset : null , radius : null } } , blur : { handle : null , uniforms : { frameSizeRCP : null , samplePosMult : null , sigma : null , bloomStrength : null } } , bloom : { handle : null , uniforms : { offset : null , radius : null , texture : null , textureAdd : null } } } } ; const ui = { display : { spinner : canvas . parentElement . querySelector ( 'svg' , canvas . parentElement ) , contextLoss : canvas . parentElement . querySelector ( 'div' , canvas . parentElement ) , fps : WebGLBox . querySelector ( '#fps' ) , ms : WebGLBox . querySelector ( '#ms' ) , width : WebGLBox . querySelector ( '#width' ) , height : WebGLBox . querySelector ( '#height' ) , tapsCount : WebGLBox . querySelector ( '#taps' ) , } , blur : { kernelSize : WebGLBox . querySelector ( '#sizeRange' ) , sigma : WebGLBox . querySelector ( '#sigmaRange' ) , samplePos : WebGLBox . querySelector ( '#samplePosRange' ) , samplePosReset : WebGLBox . querySelector ( '#samplePosRangeReset' ) , } , rendering : { animate : WebGLBox . querySelector ( '#animateCheck' ) , modes : WebGLBox . querySelectorAll ( 'input[type="radio"]' ) , lightBrightness : WebGLBox . querySelector ( '#lightBrightness' ) , lightBrightnessReset : WebGLBox . querySelector ( '#lightBrightnessReset' ) , } , benchmark : { button : WebGLBox . querySelector ( '#benchmark' ) , label : WebGLBox . querySelector ( '#benchmarkLabel' ) , iterOut : WebGLBox . querySelector ( '#iterOut' ) , renderer : WebGLBoxDetail . querySelector ( '#renderer' ) , iterTime : WebGLBoxDetail . querySelector ( '#iterTime' ) , tapsCount : WebGLBoxDetail . 
querySelector ( '#tapsCountBench' ) , iterations : WebGLBox . querySelector ( '#iterations' ) } } ; const circleAnimation = await util . fetchShader ( "shader/circleAnimation.vs" ) ; const simpleTexture = await util . fetchShader ( "shader/simpleTexture.fs" ) ; const bloomVert = await util . fetchShader ( "shader/bloom.vs" ) ; const bloomFrag = await util . fetchShader ( "shader/bloom.fs" ) ; const simpleQuad = await util . fetchShader ( "shader/simpleQuad.vs" ) ; const gaussianBlurFrag = await util . fetchShader ( "shader/gaussianBlur.fs" ) ; ui . blur . kernelSize . addEventListener ( 'input' , ( ) => { if ( ! ui . rendering . animate . checked ) redraw ( ) } ) ; ui . blur . sigma . addEventListener ( 'input' , ( ) => { if ( ! ui . rendering . animate . checked ) redraw ( ) } ) ; ui . blur . samplePos . addEventListener ( 'input' , ( ) => { if ( ! ui . rendering . animate . checked ) redraw ( ) } ) ; ui . rendering . lightBrightness . addEventListener ( 'input' , ( ) => { if ( ! ui . rendering . animate . checked ) redraw ( ) } ) ; ui . rendering . animate . addEventListener ( "change" , ( ) => { if ( ui . rendering . animate . checked ) startRendering ( ) ; else { ui . display . fps . value = "-" ; ui . display . ms . value = "-" ; ctx . flags . isRendering = false ; redraw ( ) } } ) ; canvas . addEventListener ( "webglcontextlost" , ( ) => { ui . display . contextLoss . style . display = "block" ; } ) ; ui . blur . kernelSize . addEventListener ( 'input' , ( ) => { reCompileBlurShader ( ui . blur . kernelSize . value ) ; ui . blur . samplePos . disabled = ui . blur . kernelSize . value == 0 ; ui . blur . samplePosReset . disabled = ui . blur . kernelSize . value == 0 ; } ) ; ui . rendering . modes . forEach ( radio => { if ( radio . value === "scene" ) radio . checked = true ; radio . addEventListener ( 'change' , ( event ) => { ctx . mode = event . target . value ; ui . rendering . lightBrightness . disabled = ctx . mode === "scene" ; ui . rendering . 
lightBrightnessReset . disabled = ctx . mode === "scene" ; if ( ! ui . rendering . animate . checked ) redraw ( ) ; } ) ; } ) ; ui . benchmark . button . addEventListener ( "click" , ( ) => { ctx . flags . benchMode = true ; stopRendering ( ) ; ui . display . spinner . style . display = "block" ; ui . benchmark . button . disabled = true ; const worker = new Worker ( "./js/benchmark/gaussianBlurBenchmark.js" , { type : "module" } ) ; worker . postMessage ( { iterations : ui . benchmark . iterOut . value , blurShaderSrc : gaussianBlurFrag , kernelSize : ui . blur . kernelSize . value , samplePos : ui . blur . samplePos . value , sigma : ui . blur . sigma . value } ) ; worker . addEventListener ( "message" , ( event ) => { if ( event . data . type !== "done" ) return ; ui . benchmark . label . textContent = event . data . benchText ; ui . benchmark . tapsCount . textContent = event . data . tapsCount ; ui . benchmark . iterTime . textContent = event . data . iterationText ; ui . benchmark . renderer . textContent = event . data . renderer ; worker . terminate ( ) ; ui . benchmark . button . disabled = false ; ctx . flags . benchMode = false ; if ( ui . rendering . animate . checked ) startRendering ( ) ; else redraw ( ) ; } ) ; } ) ; ui . benchmark . iterations . addEventListener ( "change" , ( event ) => { ui . benchmark . iterOut . value = event . target . value ; ui . benchmark . label . textContent = "Benchmark" ; } ) ; ctx . shd . scene = util . compileAndLinkShader ( gl , circleAnimation , simpleTexture , [ "offset" , "radius" ] ) ; ctx . shd . bloom = util . compileAndLinkShader ( gl , bloomVert , bloomFrag , [ "texture" , "textureAdd" , "offset" , "radius" ] ) ; function reCompileBlurShader ( blurSize ) { ctx . shd . blur = util . compileAndLinkShader ( gl , simpleQuad , gaussianBlurFrag , [ "frameSizeRCP" , "samplePosMult" , "bloomStrength" , "sigma" ] , "#define KERNEL_SIZE " + blurSize + '
' ) ; } reCompileBlurShader ( ui . blur . kernelSize . value ) util . bindUnitQuad ( gl ) ; async function setupTextureBuffers ( ) { ui . display . spinner . style . display = "block" ; ctx . flags . buffersInitialized = true ; ctx . flags . initComplete = false ; gl . deleteFramebuffer ( ctx . fb . scene ) ; gl . deleteFramebuffer ( ctx . fb . final ) ; [ ctx . fb . scene , ctx . tex . frame ] = util . setupFramebuffer ( gl , canvas . width , canvas . height ) ; [ ctx . fb . final , ctx . tex . frameFinal ] = util . setupFramebuffer ( gl , canvas . width , canvas . height ) ; let [ base , selfIllum ] = await Promise . all ( [ fetch ( "/dual-kawase/img/SDR_No_Sprite.png" ) , fetch ( "/dual-kawase/img/Selfillumination.png" ) ] ) ; let [ baseBlob , selfIllumBlob ] = await Promise . all ( [ base . blob ( ) , selfIllum . blob ( ) ] ) ; let [ baseBitmap , selfIllumBitmap ] = await Promise . all ( [ createImageBitmap ( baseBlob , { colorSpaceConversion : 'none' , resizeWidth : canvas . width * 1.12 , resizeHeight : canvas . height * 1.12 , resizeQuality : "high" } ) , createImageBitmap ( selfIllumBlob , { colorSpaceConversion : 'none' , resizeWidth : canvas . width * 1.12 , resizeHeight : canvas . height * 1.12 , resizeQuality : "high" } ) ] ) ; ctx . tex . sdr = util . setupTexture ( gl , null , null , ctx . tex . sdr , gl . LINEAR , baseBitmap ) ; ctx . tex . selfIllum = util . setupTexture ( gl , null , null , ctx . tex . selfIllum , gl . LINEAR , selfIllumBitmap ) ; baseBitmap . close ( ) ; selfIllumBitmap . close ( ) ; ctx . flags . initComplete = true ; ui . display . spinner . style . display = "none" ; } let prevNow = performance . now ( ) ; let lastStatsUpdate = prevNow ; let fpsEMA = 60 ; let msEMA = 16 ; async function redraw ( ) { if ( ! ctx . flags . buffersInitialized ) await setupTextureBuffers ( ) ; if ( ! ctx . flags . initComplete ) return ; const KernelSizeSide = ui . blur . kernelSize . value * 2 + 1 ; const tapsNewText = ( canvas . width * canvas . 
height * KernelSizeSide * KernelSizeSide / 1000000 ) . toFixed ( 1 ) + " Million" ; ui . display . tapsCount . value = tapsNewText ; ui . display . width . value = canvas . width ; ui . display . height . value = canvas . height ; let radiusSwitch = ui . rendering . animate . checked ? radius : 0.0 ; let speed = ( performance . now ( ) / 10000 ) % Math . PI * 2 ; const offset = [ radiusSwitch * Math . cos ( speed ) , radiusSwitch * Math . sin ( speed ) ] ; gl . useProgram ( ctx . shd . scene . handle ) ; const texture = ctx . mode == "scene" ? ctx . tex . sdr : ctx . tex . selfIllum ; gl . activeTexture ( gl . TEXTURE0 ) ; gl . bindTexture ( gl . TEXTURE_2D , texture ) ; gl . uniform2fv ( ctx . shd . scene . uniforms . offset , offset ) ; gl . uniform1f ( ctx . shd . scene . uniforms . radius , radiusSwitch ) ; gl . bindFramebuffer ( gl . FRAMEBUFFER , ctx . fb . scene ) ; gl . viewport ( 0 , 0 , canvas . width , canvas . height ) ; gl . drawArrays ( gl . TRIANGLE_FAN , 0 , 4 ) ; gl . useProgram ( ctx . shd . blur . handle ) ; const finalFB = ctx . mode == "bloom" ? ctx . fb . final : null ; gl . bindFramebuffer ( gl . FRAMEBUFFER , finalFB ) ; gl . viewport ( 0 , 0 , canvas . width , canvas . height ) ; gl . uniform1f ( ctx . shd . blur . uniforms . bloomStrength , ctx . mode == "scene" ? 1.0 : ui . rendering . lightBrightness . value ) ; gl . activeTexture ( gl . TEXTURE0 ) ; gl . bindTexture ( gl . TEXTURE_2D , ctx . tex . frame ) ; gl . uniform2f ( ctx . shd . blur . uniforms . frameSizeRCP , 1.0 / canvas . width , 1.0 / canvas . height ) ; gl . uniform1f ( ctx . shd . blur . uniforms . samplePosMult , ui . blur . samplePos . value ) ; gl . uniform1f ( ctx . shd . blur . uniforms . sigma , Math . max ( ui . blur . kernelSize . value / ui . blur . sigma . value , 0.001 ) ) ; gl . drawArrays ( gl . TRIANGLE_FAN , 0 , 4 ) ; if ( ctx . mode == "bloom" ) { gl . bindFramebuffer ( gl . FRAMEBUFFER , null ) ; gl . useProgram ( ctx . shd . bloom . handle ) ; gl . 
uniform2fv ( ctx . shd . bloom . uniforms . offset , offset ) ; gl . uniform1f ( ctx . shd . bloom . uniforms . radius , radiusSwitch ) ; gl . activeTexture ( gl . TEXTURE0 ) ; gl . bindTexture ( gl . TEXTURE_2D , ctx . tex . sdr ) ; gl . uniform1i ( ctx . shd . bloom . uniforms . texture , 0 ) ; gl . activeTexture ( gl . TEXTURE1 ) ; gl . bindTexture ( gl . TEXTURE_2D , ctx . tex . frameFinal ) ; gl . uniform1i ( ctx . shd . bloom . uniforms . textureAdd , 1 ) ; gl . drawArrays ( gl . TRIANGLE_FAN , 0 , 4 ) ; } gl . finish ( ) ; const now = performance . now ( ) ; let dt = now - prevNow ; if ( dt > 0 ) { const instFPS = 1000 / dt ; const ALPHA = 0.05 ; fpsEMA = ALPHA * instFPS + ( 1 - ALPHA ) * fpsEMA ; msEMA = ALPHA * dt + ( 1 - ALPHA ) * msEMA ; } prevNow = now ; if ( ui . rendering . animate . checked && now - lastStatsUpdate >= 1000 ) { ui . display . fps . value = fpsEMA . toFixed ( 0 ) ; ui . display . ms . value = msEMA . toFixed ( 2 ) ; lastStatsUpdate = now ; } } let animationFrameId ; function nativeResize ( ) { const [ width , height ] = util . getNativeSize ( canvas ) ; if ( width && canvas . width !== width || height && canvas . height !== height ) { canvas . width = width ; canvas . height = height ; if ( ! ctx . flags . benchMode ) { stopRendering ( ) ; startRendering ( ) ; } if ( ! ui . rendering . animate . checked ) redraw ( ) ; } } nativeResize ( ) ; let resizePending = false ; window . addEventListener ( 'resize' , ( ) => { if ( ! resizePending ) { resizePending = true ; requestAnimationFrame ( ( ) => { resizePending = false ; nativeResize ( ) ; } ) ; } } ) ; function renderLoop ( ) { if ( ctx . flags . isRendering && ui . rendering . animate . checked ) { redraw ( ) ; animationFrameId = requestAnimationFrame ( renderLoop ) ; } } function startRendering ( ) { ctx . flags . isRendering = true ; renderLoop ( ) ; } function stopRendering ( ) { ctx . flags . isRendering = false ; cancelAnimationFrame ( animationFrameId ) ; gl . finish ( ) ; gl . 
deleteTexture ( ctx . tex . sdr ) ; ctx . tex . sdr = null ; gl . deleteTexture ( ctx . tex . selfIllum ) ; ctx . tex . selfIllum = null ; gl . deleteTexture ( ctx . tex . frame ) ; ctx . tex . frame = null ; gl . deleteTexture ( ctx . tex . frameFinal ) ; ctx . tex . frameFinal = null ; gl . deleteFramebuffer ( ctx . fb . scene ) ; ctx . fb . scene = null ; gl . deleteFramebuffer ( ctx . fb . final ) ; ctx . fb . final = null ; ctx . flags . buffersInitialized = false ; ctx . flags . initComplete = false ; ui . display . fps . value = "-" ; ui . display . ms . value = "-" ; } function handleIntersection ( entries ) { entries . forEach ( entry => { if ( entry . isIntersecting ) { if ( ! ctx . flags . isRendering && ! ctx . flags . benchMode ) startRendering ( ) ; } else { stopRendering ( ) ; } } ) ; } let observer = new IntersectionObserver ( handleIntersection ) ; observer . observe ( canvas ) ; }
The blur looks way smoother than our previous box blur, with things generally taking on a “rounder” appearance, due to the bell curve’s smooth signal response. That is, unless you move the sigma slider down. If you move sigma too low, you will get our previous box-blur-like artifacts again.
Let’s clear up what the values actually represent and how they interact. The following visualization shows the kernel with its weights expressed as height in an Isometric Dimetric perspective projection. There are two different interaction modes with sigma when changing kernelSize and two ways to express sigma .
sigma describes the flatness of our mathematical curve, a curve going to infinity. But our algorithm has a limited kernelSize. Where the kernel stops, no more pixel contributions occur, leading to box-blur-like artifacts due to the cut-off. In the context of image processing, there are two ways to set up a gaussian blur…
A small sigma in the sliders' relative units, thus a flat bell curve, paired with a small kernel size is effectively a box blur, with the weights making the kernel box-shaped.
… way 1, Absolute Sigma: sigma is an absolute value in pixels, independent of kernelSize, with kernelSize acting as a “window into the curve”. Way 2, Relative Sigma: sigma is expressed relative to the current kernelSize. For practical reasons (finicky sliders), the relative mode is used everywhere in this article.
Either way, the infinite gaussian curve will have a cut-off somewhere. sigma too small? - We get box-blur-like artifacts. sigma too big? - We waste blur efficiency, as the same perceived blur strength requires bigger kernels, thus bigger for-loops with lower performance. An artistic trade-off every piece of software has to make.
An optimal kernel would be one where the outer weights are almost zero. Thus, if we increased kernelSize in Absolute Sigma mode by one, it would make close to no visual difference.
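How close to zero the outer weights are can be checked with a few lines of JavaScript. This sketch assumes the relative-to-absolute conversion visible in the article's JavaScript (kernelSize divided by the slider value, with a small lower bound); the function names are hypothetical, and the "edge" is measured along one axis rather than at the kernel corner.

```javascript
// Converts the relative sigma slider value into a sigma in pixels,
// mirroring the Math.max(kernelSize / sigma, 0.001) expression from the article's code.
function relativeToAbsoluteSigma(kernelSize, relativeSigma) {
  return Math.max(kernelSize / relativeSigma, 0.001);
}

// Ratio between the weight at the kernel edge (x = kernelSize, y = 0)
// and the weight at the center (which is exp(0) = 1).
function edgeToCenterRatio(kernelSize, sigmaPx) {
  return Math.exp(-(kernelSize * kernelSize) / (2 * sigmaPx * sigmaPx));
}
```

With a relative sigma of ± 3 σ, the edge weight is about 1.1 % of the center weight, so the cut-off is barely visible. At ± 1 σ it is still about 61 %, and the kernel starts to behave like a box.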
There are other ways of creating blur kernels, with other properties. One way is to follow Pascal’s triangle to get a set of predefined kernel sizes and weights. These are called Binomial Filters and lock us into specific “kernel presets”, but solve the infinity vs cut-off dilemma, by moving weights to zero within the sampling window.
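For the curious, a row of Pascal’s triangle can be turned into a 1D binomial kernel in a few lines. This is an illustrative sketch with a hypothetical function name, not code used anywhere in this article.

```javascript
// Hypothetical helper: returns row n of Pascal's triangle, normalized to sum to 1.
// Row n has n + 1 entries, so n = 2 gives a 3-tap kernel and n = 4 a 5-tap kernel.
function binomialKernel1D(n) {
  let row = [1];
  for (let i = 0; i < n; i++) {
    const next = [1];
    for (let j = 1; j < row.length; j++) {
      next.push(row[j - 1] + row[j]); // each entry is the sum of the two above it
    }
    next.push(1);
    row = next;
  }
  const total = 2 ** n;               // the entries of row n sum to 2^n
  return row.map(v => v / total);
}
```

Row 4 gives 1, 4, 6, 4, 1 divided by 16. The outermost weights are already small, so there is no arbitrary cut-off decision left to make.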
Binomial Kernels are also Gaussian-like in their frequency response. We won’t expand on these further, just know that we can choose kernels by different mathematical criteria, chasing different signal response characteristics. But speaking of which, what even is Gaussian-like? Why do we care?
What is Gaussian-like? #
In Post-Processing Blur algorithms you generally find two categories: Bokeh Blurs and Gaussian-Like Blurs. The gaussian is chosen for its natural appearance, its ability to smooth colors without “standout features”. Gaussian Blurs are generally used as an ingredient in an overarching visual effect, be it frosted glass Interfaces or Bloom.
Bokeh Blur and Gaussian Blur compared.
In contrast, when emulating lenses or creating Depth of Field, “Bokeh Blur” is used, also known as “Lens Blur” or “Cinematic Blur”. Here, the blur itself is the target visual effect. The challenges and approaches are very much related, but the algorithms used differ.
Algorithms get really creative in this space, all with different trade-offs and visuals. Some sample using a Poisson disk distribution and some show cool out-of-the-box thinking: Computerphile covered a complex-numbers-based approach to creating Bokeh Blurs, a fascinating number theory cross-over.
Video Game & Complex Bokeh Blurs
YouTube Video by Computerphile
This article though doesn’t care about these stylistic approaches. We are here to chase a basic building block of graphics programming and realtime visual effects, a “Gaussian-like” blur with good performance. Speaking of which!
The main motivator of our journey here is the chase for realtime performance. Everything we do must happen within a few milliseconds. The expected performance of an algorithm and the practical cost once placed in the graphics pipeline are sometimes surprisingly different numbers though. Gotta measure!
This chapter is about a very technical motivation. If you don't care about how fast a GPU does what it does, feel free to skip this section.
With performance being such a driving motivator, it would be a shame if we couldn’t measure it in this article. Each WebGL Box has a benchmark function, which blurs random noise at a fixed resolution of 1600x1200 with the respective blur settings you chose and a fixed iteration count workload, a feature hidden so far.
Realtime graphics programming is sometimes more about measuring than programming.
Benchmarking is best done by measuring shader execution time. This can be done in the browser reliably, but only on some platforms. No way exists to do so across all platforms. Luckily, there is the classic method of “stalling the graphics pipeline”, forcing a wait until all commands finish, a moment in time we can measure.
Across all platforms, a stall is guaranteed to occur on the command gl.readPixels(). Interestingly, the standards-conformant command for this, gl.finish(), is simply ignored by mobile Apple devices.
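The stalling approach can be sketched as follows. This is a simplified illustration and not the benchmark worker used by this article: the function name timeWithStall and the drawOnce callback are hypothetical, and gl is assumed to be a WebGL 1.0 rendering context with the blur already set up.

```javascript
// Hypothetical helper: times `iterations` draw calls by forcing a pipeline stall.
function timeWithStall(gl, drawOnce, iterations) {
  const pixel = new Uint8Array(4);
  // Read back a single pixel first, so previously queued work does not leak into the timing.
  gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel);

  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    drawOnce(); // issues the blur draw call(s), returns immediately
  }
  // Reading back a pixel forces a wait until all issued commands have finished.
  gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel);
  const totalMs = performance.now() - start;

  return { totalMs, msPerIteration: totalMs / iterations };
}
```

The measured time includes the cost of the read-back itself and any driver overhead, so it is an upper bound on shader execution time that becomes more meaningful at higher iteration counts.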
Below is a button, that unlocks this benchmarking feature, unhiding a benchmark button and Detailed Benchmark Results section under each blur. This allows you to start a benchmark with a preset workload, on a separate Browser Worker. There is only one issue: Browsers get very angry if you full-load the GPU this way.
If the graphics pipeline is doing work without reporting back (called “yielding”) to the browser for too long, browsers will simply kill all GPU access for the whole page, until tab reload. If we yield back, then the measured results are useless and from inside WebGL, we can’t stop the GPU, once its commands are issued.
⚠️ Especially on mobile: please increase kernelSize and iterations slowly. The previous algorithms have bad kernelSize performance scaling on purpose, be especially careful with them.
Stay below 2 seconds of execution time, or the browser will lock GPU access for the page, disabling all blur examples, until a browser restart is performed. On iOS Safari this requires a trip to the App Switcher, a page reload won't be enough.
Unlock Benchmarks
iOS and iPadOS are especially strict and will keep GPU access disabled, even on Tab Reload. You will have to go to the App Switcher (Double Tap Home Button), Swipe Safari Up to close it and relaunch it from scratch.
What are we optimizing for? #
With the above Box Blur and Gaussian Blur, you will measure performance scaling very badly with kernelSize. Expressed in Big O notation, both scale as O(pixelCount * kernelSize²): the required texture taps grow quadratically with kernelSize. We need to tackle this going forward.
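The quadratic growth is easy to put into numbers. The sketch below mirrors the tap count expression from this article's JavaScript (width × height × side × side); the function names are hypothetical and kernelSize is the kernel radius.

```javascript
// Texture taps needed to blur one full frame with a 2D kernel.
function textureTaps2D(width, height, kernelSize) {
  const side = 2 * kernelSize + 1; // kernel side length in samples
  return width * height * side * side;
}

// Formats the count the way the "Texture Taps" readout does.
function formatTaps(taps) {
  return (taps / 1000000).toFixed(1) + " Million";
}
```

At the benchmark resolution of 1600x1200, a 3×3 kernel needs 17.3 million taps per frame, while a 7×7 kernel already needs 94.1 million.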
Especially dedicated Laptop GPUs are slow to get out of their lower power states. Pressing the benchmark button multiple times in a row may result in the performance numbers getting better.
Despite the gaussian blur calculating the kernel completely from scratch on every single pixel in our implementation, the performance of the box blur and gaussian blur are very close to each other at higher iteration counts. In fact, by precalculating those kernels we could make both perform the same.
But isn't gaussian blur a more complicated algorithm?
As opposed to chips from decades ago, modern graphics cards have very fast arithmetic, but comparatively slow memory access times. With workloads like these, the slowest thing becomes the memory access, in our case the texture taps. The more taps, the slower the algorithm.
Our blurs perform a dependent texture read, a graphics programming sin. This is when texture coordinates are determined during shader execution, which opts out of many automated shader optimizations.
Especially on personal computers, you may also have noticed that increasing samplePosMultiplier will negatively impact performance (up to a point), even though the required texture taps stay the same.
This is due to hardware texture caches accelerating texture reads which are spatially close together, and not being able to do so effectively if the texture reads are all too far apart. Platform-dependent tools like NVIDIA Nsight can measure GPU cache utilization. The browser cannot.
These are key numbers graphics programmers chase when writing fragment shaders: Texture Taps and Cache Utilization. There is another one, we will get into in a moment. Clearly, our Blurs are slow. Time for a speed up!
Separable Gaussian Blur #
We have not yet left the classics of blur algorithms. One fundamental concept left on the table is “convolution separability”. Certain Convolutions like our Box Blur, our Gaussian Blur and the Binomial filtering mentioned in passing previously can all be performed in two separate passes, by two separate 1D Kernels.
Gaussian blur weights formula, separated
Not all convolutions are separable. In the context of graphics programming: If you can express the kernel weights as a formula with axes X, Y and factor-out both X and Y into two separate formulas, then you have gained separability of a 2D kernel and can perform the convolution in two passes, massively saving on texture taps.
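For the gaussian, the factoring works because exp(-(x² + y²) / (2σ²)) equals exp(-x² / (2σ²)) · exp(-y² / (2σ²)). The following sketch checks that numerically and compares the taps per output pixel; the function names are hypothetical, and kernelSize is again the kernel radius.

```javascript
// 1D gaussian weight, as used by each pass of a separable blur.
function gaussianWeight1D(x, sigma) {
  return Math.exp(-(x * x) / (2 * sigma * sigma));
}

// 2D gaussian weight, as used by the single-pass blur.
function gaussianWeight2D(x, y, sigma) {
  return Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
}

// Taps per output pixel: one 2D pass versus two 1D passes.
function tapsPerPixel(kernelSize) {
  const side = 2 * kernelSize + 1;
  return { singlePass2D: side * side, twoPass1D: 2 * side };
}
```

The product of the two 1D weights matches the 2D weight at every kernel position, so a horizontal pass followed by a vertical pass produces the same kernel. For a 7×7 kernel that means 14 taps per pixel instead of 49.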
Some big budget video games have used effects with kernels that are not separable, but did it anyway in two passes + 1D Kernel for the performance gain, with the resulting artifacts being deemed not too bad.
Computerphile covered the concept of separability in the context of 2D image processing really well, if you are interested in a more formal explanation.
Separable Filters and a Bauble
YouTube Video by Computerphile
Here is our Gaussian Blur, but expressed as a separable Version. You can see just Pass 1 and Pass 2 in isolation or see the final result. Same visual quality as our Gaussian Blur, same dials, but massively faster, with no more quadratic scaling of required texture taps.
Detailed Benchmark Results ・Resolution: 1600x1200

Blur Fragment Shader gaussianBlurSeparable.fs

    precision highp float;
    varying vec2 uv;

    uniform vec2 frameSizeRCP;
    uniform float samplePosMult;
    uniform float sigma;
    uniform vec2 direction;
    uniform float bloomStrength;
    uniform sampler2D texture;

    const int kernel_size = KERNEL_SIZE;

    float gaussianWeight(float x, float sigma)
    {
        return exp(-(x * x) / (2.0 * sigma * sigma));
    }

    void main() {
        vec4 sum = vec4(0.0);
        float weightSum = 0.0;
        const int size = 2 * kernel_size + 1;

        /* 1D loop: "direction" selects the horizontal or vertical pass */
        for (int i = -kernel_size; i <= kernel_size; ++i) {
            float w = gaussianWeight(float(i), sigma);
            vec2 offset = vec2(i) * direction * samplePosMult * frameSizeRCP;
            sum += texture2D(texture, uv + offset) * w;
            weightSum += w;
        }

        /* Normalize by the accumulated weight */
        gl_FragColor = (sum / weightSum) * bloomStrength;
    }

WebGL Javascript gaussianSeparableBlur.js

    import * as util from '../utility.js'

    export async function setupGaussianSeparableBlur() {
        const WebGLBox = document.getElementById('WebGLBox-GaussianSeparableBlur');
        const canvas = WebGLBox.querySelector('canvas');
        const radius = 0.12;

        const gl = canvas.getContext('webgl', {
            preserveDrawingBuffer: false,
            antialias: false,
            alpha: false,
        });

        /* State and object handles */
        const ctx = {
            mode: "scene",
            passMode: "pass1",
            flags: { isRendering: false, buffersInitialized: false, initComplete: false, benchMode: false },
            tex: { sdr: null, selfIllum: null, frame: null, frameIntermediate: null, frameFinal: null },
            fb: { scene: null, intermediate: null, final: null },
            shd: {
                scene: { handle: null, uniforms: { offset: null, radius: null } },
                blur: { handle: null, uniforms: { frameSizeRCP: null, samplePosMult: null, sigma: null, bloomStrength: null, direction: null } },
                bloom: { handle: null, uniforms: { offset: null, radius: null, texture: null, textureAdd: null } }
            }
        };

        /* UI elements */
        const ui = {
            display: {
                spinner: canvas.parentElement.querySelector('svg', canvas.parentElement),
                contextLoss: canvas.parentElement.querySelector('div', canvas.parentElement),
                fps: WebGLBox.querySelector('#fps'),
                ms: WebGLBox.querySelector('#ms'),
                width: WebGLBox.querySelector('#width'),
                height: WebGLBox.querySelector('#height'),
                tapsCount: WebGLBox.querySelector('#taps'),
            },
            blur: {
                kernelSize: WebGLBox.querySelector('#sizeRange'),
                sigma: WebGLBox.querySelector('#sigmaRange'),
                samplePos: WebGLBox.querySelector('#samplePosRange'),
                samplePosReset: WebGLBox.querySelector('#samplePosRangeReset'),
            },
            rendering: {
                animate: WebGLBox.querySelector('#animateCheck'),
                modes: WebGLBox.querySelectorAll('input[name="modeGaussSep"]'),
                passModes: WebGLBox.querySelectorAll('input[name="passMode"]'),
                lightBrightness: WebGLBox.querySelector('#lightBrightness'),
                lightBrightnessReset: WebGLBox.querySelector('#lightBrightnessReset'),
            },
            benchmark: {
                button: WebGLBox.querySelector('#benchmark'),
                label: WebGLBox.querySelector('#benchmarkLabel'),
                iterOut: WebGLBox.querySelector('#iterOut'),
                renderer: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#renderer'),
                passMode: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#passMode'),
                iterTime: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#iterTime'),
                tapsCount: document.getElementById('WebGLBox-GaussianSeparableBlurDetail').querySelector('#tapsCountBench'),
                iterations: WebGLBox.querySelector('#iterations')
            }
        };

        /* Shaders */
        const circleAnimation = await util.fetchShader("shader/circleAnimation.vs");
        const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
        const bloomVert = await util.fetchShader("shader/bloom.vs");
        const bloomFrag = await util.fetchShader("shader/bloom.fs");
        const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");
        const gaussianBlurFrag = await util.fetchShader("shader/gaussianBlurSeparable.fs");

        /* Events */
        ui.blur.kernelSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
        ui.blur.sigma.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
        ui.blur.samplePos.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });
        ui.rendering.lightBrightness.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

        ui.rendering.animate.addEventListener("change", () => {
            if (ui.rendering.animate.checked)
                startRendering();
            else {
                ui.display.fps.value = "-";
                ui.display.ms.value = "-";
                ctx.flags.isRendering = false;
                redraw()
            }
        });

        canvas.addEventListener("webglcontextlost", () => {
            ui.display.contextLoss.style.display = "block";
        });

        ui.blur.kernelSize.addEventListener('input', () => {
            reCompileBlurShader(ui.blur.kernelSize.value);
            ui.blur.samplePos.disabled = ui.blur.kernelSize.value == 0;
            ui.blur.samplePosReset.disabled = ui.blur.kernelSize.value == 0;
        });

        ui.rendering.modes.forEach(radio => {
            if (radio.value === "scene")
                radio.checked = true;
            radio.addEventListener('change', (event) => {
                ctx.mode = event.target.value;
                ui.rendering.lightBrightness.disabled = ctx.mode === "scene";
                ui.rendering.lightBrightnessReset.disabled = ctx.mode === "scene";
                if (!ui.rendering.animate.checked) redraw();
            });
        });

        ui.rendering.passModes.forEach(radio => {
            if (radio.value === "pass1")
                radio.checked = true;
            radio.addEventListener('change', (event) => {
                ctx.passMode = event.target.value;
                if (!ui.rendering.animate.checked) redraw();
            });
        });

        ui.benchmark.button.addEventListener("click", () => {
            ctx.flags.benchMode = true;
            stopRendering();
            ui.display.spinner.style.display = "block";
            ui.benchmark.button.disabled = true;

            /* The benchmark runs in a worker */
            const worker = new Worker("./js/benchmark/gaussianSeparableBlurBenchmark.js", { type: "module" });

            worker.postMessage({
                iterations: ui.benchmark.iterOut.value,
                blurShaderSrc: gaussianBlurFrag,
                kernelSize: ui.blur.kernelSize.value,
                samplePos: ui.blur.samplePos.value,
                sigma: ui.blur.sigma.value,
                passMode: ctx.passMode
            });

            worker.addEventListener("message", (event) => {
                if (event.data.type !== "done") return;

                ui.benchmark.label.textContent = event.data.benchText;
                ui.benchmark.tapsCount.textContent = event.data.tapsCount;
                ui.benchmark.iterTime.textContent = event.data.iterationText;
                ui.benchmark.renderer.textContent = event.data.renderer;
                ui.benchmark.passMode.textContent = event.data.passMode;

                worker.terminate();
                ui.benchmark.button.disabled = false;
                ctx.flags.benchMode = false;
                if (ui.rendering.animate.checked)
                    startRendering();
                else
                    redraw();
            });
        });

        ui.benchmark.iterations.addEventListener("change", (event) => {
            ui.benchmark.iterOut.value = event.target.value;
            ui.benchmark.label.textContent = "Benchmark";
        });

        ctx.shd.scene = util.compileAndLinkShader(gl, circleAnimation, simpleTexture, ["offset", "radius"]);
        ctx.shd.bloom = util.compileAndLinkShader(gl, bloomVert, bloomFrag, ["texture", "textureAdd", "offset", "radius"]);

        /* The kernel size is a compile-time constant, so the shader is rebuilt when it changes */
        function reCompileBlurShader(blurSize) {
            ctx.shd.blur = util.compileAndLinkShader(gl, simpleQuad, gaussianBlurFrag, ["frameSizeRCP", "samplePosMult", "bloomStrength", "sigma", "direction"], "#define KERNEL_SIZE " + blurSize + '\n');
        }

        reCompileBlurShader(ui.blur.kernelSize.value)

        util.bindUnitQuad(gl);

        async function setupTextureBuffers() {
            ui.display.spinner.style.display = "block";
            ctx.flags.buffersInitialized = true;
            ctx.flags.initComplete = false;

            gl.deleteFramebuffer(ctx.fb.scene);
            gl.deleteFramebuffer(ctx.fb.intermediate);
            gl.deleteFramebuffer(ctx.fb.final);
            [ctx.fb.scene, ctx.tex.frame] = util.setupFramebuffer(gl, canvas.width, canvas.height);
            [ctx.fb.intermediate, ctx.tex.frameIntermediate] = util.setupFramebuffer(gl, canvas.width, canvas.height);
            [ctx.fb.final, ctx.tex.frameFinal] = util.setupFramebuffer(gl, canvas.width, canvas.height);

            gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate);
            gl.clearColor(0.0, 0.0, 0.0, 1.0);
            gl.clear(gl.COLOR_BUFFER_BIT);

            let [base, selfIllum] = await Promise.all([
                fetch("/dual-kawase/img/SDR_No_Sprite.png"),
                fetch("/dual-kawase/img/Selfillumination.png")
            ]);
            let [baseBlob, selfIllumBlob] = await Promise.all([base.blob(), selfIllum.blob()]);
            let [baseBitmap, selfIllumBitmap] = await Promise.all([
                createImageBitmap(baseBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" }),
                createImageBitmap(selfIllumBlob, { colorSpaceConversion: 'none', resizeWidth: canvas.width * 1.12, resizeHeight: canvas.height * 1.12, resizeQuality: "high" })
            ]);

            ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.LINEAR, baseBitmap);
            ctx.tex.selfIllum = util.setupTexture(gl, null, null, ctx.tex.selfIllum, gl.LINEAR, selfIllumBitmap);

            baseBitmap.close();
            selfIllumBitmap.close();

            ctx.flags.initComplete = true;
            ui.display.spinner.style.display = "none";
        }

        let prevNow = performance.now();
        let lastStatsUpdate = prevNow;
        let fpsEMA = 60;
        let msEMA = 16;

        async function redraw() {
            if (!ctx.flags.buffersInitialized)
                await setupTextureBuffers();
            if (!ctx.flags.initComplete)
                return;

            /* UI stats: one pass taps one kernel side, both passes tap two */
            const KernelSizeSide = ui.blur.kernelSize.value * 2 + 1;
            const samplesPerPixel = ctx.passMode == "combined" ? KernelSizeSide * 2 : KernelSizeSide;
            const tapsNewText = (canvas.width * canvas.height * samplesPerPixel / 1000000).toFixed(1) + " Million";
            ui.display.tapsCount.value = tapsNewText;
            ui.display.width.value = canvas.width;
            ui.display.height.value = canvas.height;

            /* Circle motion */
            let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
            let speed = (performance.now() / 10000) % Math.PI * 2;
            const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];

            /* Draw the scene into its framebuffer */
            gl.useProgram(ctx.shd.scene.handle);
            const texture = ctx.mode == "scene" ? ctx.tex.sdr : ctx.tex.selfIllum;
            gl.activeTexture(gl.TEXTURE0);
            gl.bindTexture(gl.TEXTURE_2D, texture);
            gl.uniform2fv(ctx.shd.scene.uniforms.offset, offset);
            gl.uniform1f(ctx.shd.scene.uniforms.radius, radiusSwitch);

            gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.scene);
            gl.viewport(0, 0, canvas.width, canvas.height);
            gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

            gl.useProgram(ctx.shd.blur.handle);

            if (ctx.passMode == "pass1") {
                /* Horizontal pass only */
                const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
                gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
                gl.viewport(0, 0, canvas.width, canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
                gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0);
                gl.activeTexture(gl.TEXTURE0);
                gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
                gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
                gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
                gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
            } else if (ctx.passMode == "pass2") {
                /* Vertical pass only */
                const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
                gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
                gl.viewport(0, 0, canvas.width, canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
                gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0);
                gl.activeTexture(gl.TEXTURE0);
                gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
                gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
                gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
                gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
            } else {
                /* Both: horizontal pass into the intermediate framebuffer */
                gl.bindFramebuffer(gl.FRAMEBUFFER, ctx.fb.intermediate);
                gl.viewport(0, 0, canvas.width, canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
                gl.uniform2f(ctx.shd.blur.uniforms.direction, 1.0, 0.0);
                gl.activeTexture(gl.TEXTURE0);
                gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frame);
                gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
                gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
                gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

                /* Then the vertical pass, reading the intermediate result */
                const finalFB = ctx.mode == "bloom" ? ctx.fb.final : null;
                gl.bindFramebuffer(gl.FRAMEBUFFER, finalFB);
                gl.viewport(0, 0, canvas.width, canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.bloomStrength, ctx.mode == "scene" ? 1.0 : ui.rendering.lightBrightness.value);
                gl.uniform2f(ctx.shd.blur.uniforms.direction, 0.0, 1.0);
                gl.activeTexture(gl.TEXTURE0);
                gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameIntermediate);
                gl.uniform2f(ctx.shd.blur.uniforms.frameSizeRCP, 1.0 / canvas.width, 1.0 / canvas.height);
                gl.uniform1f(ctx.shd.blur.uniforms.samplePosMult, ui.blur.samplePos.value);
                gl.uniform1f(ctx.shd.blur.uniforms.sigma, Math.max(ui.blur.kernelSize.value / ui.blur.sigma.value, 0.001));
                gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
            }

            if (ctx.mode == "bloom") {
                /* Composite the blurred lights on top of the scene */
                gl.bindFramebuffer(gl.FRAMEBUFFER, null);
                gl.useProgram(ctx.shd.bloom.handle);

                gl.uniform2fv(ctx.shd.bloom.uniforms.offset, offset);
                gl.uniform1f(ctx.shd.bloom.uniforms.radius, radiusSwitch);

                gl.activeTexture(gl.TEXTURE0);
                gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
                gl.uniform1i(ctx.shd.bloom.uniforms.texture, 0);

                gl.activeTexture(gl.TEXTURE1);
                gl.bindTexture(gl.TEXTURE_2D, ctx.tex.frameFinal);
                gl.uniform1i(ctx.shd.bloom.uniforms.textureAdd, 1);

                gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
            }

            /* Wait for the GPU, so the frame time includes the actual work */
            gl.finish();

            /* Exponential moving average of FPS and frame time */
            const now = performance.now();
            let dt = now - prevNow;

            if (dt > 0) {
                const instFPS = 1000 / dt;
                const ALPHA = 0.05;
                fpsEMA = ALPHA * instFPS + (1 - ALPHA) * fpsEMA;
                msEMA = ALPHA * dt + (1 - ALPHA) * msEMA;
            }

            prevNow = now;

            if (ui.rendering.animate.checked && now - lastStatsUpdate >= 1000) {
                ui.display.fps.value = fpsEMA.toFixed(0);
                ui.display.ms.value = msEMA.toFixed(2);
                lastStatsUpdate = now;
            }
        }

        let animationFrameId;

        function nativeResize() {
            const [width, height] = util.getNativeSize(canvas);

            if (width && canvas.width !== width || height && canvas.height !== height) {
                canvas.width = width;
                canvas.height = height;

                if (!ctx.flags.benchMode) {
                    stopRendering();
                    startRendering();
                }
                if (!ui.rendering.animate.checked)
                    redraw();
            }
        }

        nativeResize();

        let resizePending = false;
        window.addEventListener('resize', () => {
            if (!resizePending) {
                resizePending = true;
                requestAnimationFrame(() => {
                    resizePending = false;
                    nativeResize();
                });
            }
        });

        function renderLoop() {
            if (ctx.flags.isRendering && ui.rendering.animate.checked) {
                redraw();
                animationFrameId = requestAnimationFrame(renderLoop);
            }
        }

        function startRendering() {
            ctx.flags.isRendering = true;
            renderLoop();
        }

        /* Stop rendering and free all GPU resources */
        function stopRendering() {
            ctx.flags.isRendering = false;
            cancelAnimationFrame(animationFrameId);
            gl.finish();

            gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
            gl.deleteTexture(ctx.tex.selfIllum); ctx.tex.selfIllum = null;
            gl.deleteTexture(ctx.tex.frame); ctx.tex.frame = null;
            gl.deleteTexture(ctx.tex.frameIntermediate); ctx.tex.frameIntermediate = null;
            gl.deleteTexture(ctx.tex.frameFinal); ctx.tex.frameFinal = null;
            gl.deleteFramebuffer(ctx.fb.scene); ctx.fb.scene = null;
            gl.deleteFramebuffer(ctx.fb.intermediate); ctx.fb.intermediate = null;
            gl.deleteFramebuffer(ctx.fb.final); ctx.fb.final = null;
            ctx.flags.buffersInitialized = false;
            ctx.flags.initComplete = false;
            ui.display.fps.value = "-";
            ui.display.ms.value = "-";
        }

        /* Only render while the canvas is visible */
        function handleIntersection(entries) {
            entries.forEach(entry => {
                if (entry.isIntersecting) {
                    if (!ctx.flags.isRendering && !ctx.flags.benchMode) startRendering();
                } else {
                    stopRendering();
                }
            });
        }

        let observer = new IntersectionObserver(handleIntersection);
        observer.observe(canvas);
    }
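To make the "same visual quality" claim concrete, here is a minimal CPU-side sketch in plain JavaScript. It is not the article's WebGL code: the tiny test image, the clamp-to-edge reads and the helper names (`blurPass`, `blurFull2D`) are assumptions for illustration. It blurs once with a full 2D kernel and once with a horizontal pass followed by a vertical pass, then compares the results.

```javascript
// Unnormalized Gaussian weight, same formula as gaussianWeight() in the shader.
function gaussianWeight(x, sigma) {
    return Math.exp(-(x * x) / (2.0 * sigma * sigma));
}

// Normalized 1D weights for the offsets -kernelSize..kernelSize.
function gaussianWeights1D(kernelSize, sigma) {
    const weights = [];
    for (let i = -kernelSize; i <= kernelSize; i++) {
        weights.push(gaussianWeight(i, sigma));
    }
    const sum = weights.reduce((a, b) => a + b, 0);
    return weights.map((w) => w / sum);
}

// Pixel read with clamp-to-edge addressing (an assumption of this sketch).
function readClamped(img, width, height, x, y) {
    const cx = Math.min(Math.max(x, 0), width - 1);
    const cy = Math.min(Math.max(y, 0), height - 1);
    return img[cy * width + cx];
}

// One 1D pass along (dx, dy): (1, 0) is horizontal, (0, 1) is vertical.
function blurPass(img, width, height, weights, dx, dy) {
    const k = (weights.length - 1) / 2;
    const out = new Float64Array(width * height);
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            let sum = 0;
            for (let i = -k; i <= k; i++) {
                sum += weights[i + k] * readClamped(img, width, height, x + i * dx, y + i * dy);
            }
            out[y * width + x] = sum;
        }
    }
    return out;
}

// Reference: the full 2D kernel, whose weights are products of the 1D weights.
function blurFull2D(img, width, height, weights) {
    const k = (weights.length - 1) / 2;
    const out = new Float64Array(width * height);
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            let sum = 0;
            for (let j = -k; j <= k; j++) {
                for (let i = -k; i <= k; i++) {
                    sum += weights[i + k] * weights[j + k] * readClamped(img, width, height, x + i, y + j);
                }
            }
            out[y * width + x] = sum;
        }
    }
    return out;
}

// Small deterministic test image (hypothetical data).
const width = 8, height = 6;
const img = Float64Array.from({ length: width * height }, (_, i) => ((i * 37) % 101) / 100);
const weights = gaussianWeights1D(3, 1.5); // 7 taps per axis

const pass1 = blurPass(img, width, height, weights, 1, 0);
const separable = blurPass(pass1, width, height, weights, 0, 1);
const full = blurFull2D(img, width, height, weights);

let maxDiff = 0;
for (let i = 0; i < full.length; i++) {
    maxDiff = Math.max(maxDiff, Math.abs(full[i] - separable[i]));
}
console.log("max difference:", maxDiff); // floating point rounding only
console.log("taps per pixel:", weights.length ** 2, "vs", 2 * weights.length); // 49 vs 14
```

The two results agree up to floating point rounding, because each 2D Gaussian weight is the product of two 1D weights.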
If you benchmark the performance, you will see a massive performance uplift compared to our Gaussian Blur! But there is a trade-off that’s not quite obvious: in order to have two passes, we write the result of the first pass out to an intermediate framebuffer, which the second pass then reads back. Remember the “modern chips are fast but memory access in relation is not” thing?
With a modern high-res 4K video game, multi-pass anything implies writing out roughly 8.3 million pixels (3840 × 2160) to memory, just to read them back in. With smaller kernels on high-res displays, a separable kernel may not always be faster. But with bigger kernels, it almost always is. With a massive speed-up gained, how much faster can we go?
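The trade-off above is plain arithmetic, sketched below. The helper name `tapsPerPixel` and the 4 bytes per pixel (RGBA8) for the intermediate framebuffer are assumptions for illustration; the demo's actual framebuffer format may differ.

```javascript
// Taps per output pixel for a kernel spanning -kernelSize..kernelSize.
function tapsPerPixel(kernelSize) {
    const side = 2 * kernelSize + 1;
    return { full2D: side * side, separable: 2 * side };
}

const uhdPixels = 3840 * 2160; // 8294400 pixels
// Assumption: 4 bytes per pixel (RGBA8) for the intermediate framebuffer.
const intermediateBytes = uhdPixels * 4; // 33177600 bytes, written and read back

for (const kernelSize of [1, 3, 8]) {
    const side = 2 * kernelSize + 1;
    const { full2D, separable } = tapsPerPixel(kernelSize);
    console.log(`${side}x${side}: ${full2D} taps (2D) vs ${separable} taps (separable)`);
}
console.log(`intermediate framebuffer: ${(intermediateBytes / 1e6).toFixed(1)} MB`);
```

For a 3 × 3 kernel the separable version saves only 3 taps per pixel (6 instead of 9) while paying for the extra framebuffer round trip, which is why it may not win there. For a 17 × 17 kernel it is 34 taps instead of 289.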
The magic of frequency space #
…how about blurs that happen so fast, that they are considered free! We are doing a bit of a detour into Frequency Space image manipulation.
Any 2D image can be converted and edited in frequency space, which unlocks a whole new sort of image manipulation. To blur an image in this paradigm, we perform an image Fast Fourier Transform, then mask high frequency areas to perform the blur and finally do the inverse transformation.
A Fourier Transform decomposes a signal into its underlying sine frequencies. The outputs of an image Fast Fourier Transform are “Magnitude” and “Phase” component images. These images can be combined back together with the inverse image FFT to produce the original image again…
Input image for the following interactive FFT example
The green stripes are not an error, they are baked into the image on purpose.
…but before doing so, we can manipulate the frequency representation of the image in various ways. Less reading, more interaction! In the following interactive visualization you have the magnitude image, brightness boosted into a human visible representation on the left and the reconstructed image on the right.
For now, play around with removing energy. You can paint on the magnitude image with your fingers or with the mouse. The output image will be reconstructed accordingly. Also, play around with the circular mask and the feathering sliders. Try to build intuition for what’s happening.
[Interactive FFT demo: magnitude image and reconstructed image. Controls: Upload Image, Remove Frequency Energy, Add Frequency Energy, Reset Magnitude, frequencyCutRadius, feather.]
The magnitude image represents the frequency make-up of the image, with the lowest frequencies in the middle and higher ones towards the edges. Horizontal frequencies (vertical features in the image) follow the X axis and vertical frequencies (horizontal features in the image) follow the Y axis, with in-betweens being the diagonals.
Repeating patterns in the image light up as bright points in the magnitude representation. Or rather, their frequencies have high energy: e.g. the green grid I added. Removing it in Photoshop wouldn’t be easy! But in frequency space it is easy! Just paint over the 3 blueish diagonal streaks.
Removing repeating features by finger-painting black over frequencies still blows me away.
As you may have noticed, the Magnitude representation holds mirrored information. This is due to the FFT being a complex number analysis and our image having only “real” component pixels, leaving redundant information. The underlying math was covered in great detail by 3Blue1Brown:
But what is the Fourier Transform? A visual introduction.
YouTube Video by 3Blue1Brown
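The mirroring can be checked on a tiny 1D signal. The sketch below uses a naive O(N²) discrete Fourier transform rather than an FFT, and the sample values are made up. For real-valued input, bin N - k comes out as the complex conjugate of bin k, so both have the same magnitude.

```javascript
// Naive O(N^2) discrete Fourier transform of a real-valued signal.
function dft(signal) {
    const N = signal.length;
    const re = new Array(N).fill(0);
    const im = new Array(N).fill(0);
    for (let k = 0; k < N; k++) {
        for (let n = 0; n < N; n++) {
            const angle = (-2 * Math.PI * k * n) / N;
            re[k] += signal[n] * Math.cos(angle);
            im[k] += signal[n] * Math.sin(angle);
        }
    }
    return { re, im };
}

const signal = [3, 1, 4, 1, 5, 9, 2, 6]; // hypothetical real-valued samples
const { re, im } = dft(signal);
const magnitude = re.map((r, k) => Math.hypot(r, im[k]));

// Bin N - k mirrors bin k: same real part, negated imaginary part.
for (let k = 1; k < signal.length / 2; k++) {
    const mirror = signal.length - k;
    console.log(`bin ${k}: ${magnitude[k].toFixed(4)}  bin ${mirror}: ${magnitude[mirror].toFixed(4)}`);
}
```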
The underlying code this time is not written by me, but is from @turbomaze’s repo JS-Fourier-Image-Analysis. There is no standard on how you are supposed to plot the magnitude information and how the quadrants are laid out. I changed the implementation by @turbomaze to follow the convention used by ImageMagick.
We can blur the image by painting the frequency energy black in a radius around the center, thus eliminating higher frequencies and blurring the image. If we do so with a pixel perfect circle, then we get ringing artifacts - The Gibbs phenomenon. By feathering the circle, we lessen this ringing and the blur cleans up.
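The same idea can be sketched in 1D. This is not the code behind the interactive demo: it uses a naive DFT instead of an FFT, and the smoothstep-shaped feather in `lowPassMask` is an assumption about how a soft mask edge might look. Low-passing a step edge with a hard cutoff overshoots next to the edge, which is the ringing described above.

```javascript
// Naive O(N^2) DFT and inverse DFT (real part only), for illustration.
function dft(signal) {
    const N = signal.length;
    const re = new Array(N).fill(0);
    const im = new Array(N).fill(0);
    for (let k = 0; k < N; k++) {
        for (let n = 0; n < N; n++) {
            const angle = (-2 * Math.PI * k * n) / N;
            re[k] += signal[n] * Math.cos(angle);
            im[k] += signal[n] * Math.sin(angle);
        }
    }
    return { re, im };
}

function idft(re, im) {
    const N = re.length;
    const out = new Array(N).fill(0);
    for (let n = 0; n < N; n++) {
        for (let k = 0; k < N; k++) {
            const angle = (2 * Math.PI * k * n) / N;
            out[n] += re[k] * Math.cos(angle) - im[k] * Math.sin(angle);
        }
        out[n] /= N;
    }
    return out;
}

// 1 inside the cutoff, 0 beyond cutoff + feather, smoothstep in between.
function lowPassMask(freq, cutoff, feather) {
    if (feather <= 0) return freq <= cutoff ? 1 : 0;
    const t = Math.min(Math.max((freq - cutoff) / feather, 0), 1);
    return 1 - t * t * (3 - 2 * t);
}

function lowPass(signal, cutoff, feather) {
    const N = signal.length;
    const { re, im } = dft(signal);
    for (let k = 0; k < N; k++) {
        const freq = Math.min(k, N - k); // mirrored bins share one frequency
        const m = lowPassMask(freq, cutoff, feather);
        re[k] *= m;
        im[k] *= m;
    }
    return idft(re, im);
}

// A step edge: 16 dark samples followed by 16 bright ones.
const edge = Array.from({ length: 32 }, (_, n) => (n < 16 ? 0 : 1));

const hardCut = lowPass(edge, 4, 0);   // pixel perfect cutoff
const feathered = lowPass(edge, 1, 4); // soft transition between frequency 1 and 5

console.log("hard cutoff max:", Math.max(...hardCut).toFixed(3)); // overshoots above 1.0
console.log("feathered:", feathered.map((v) => v.toFixed(2)).join(" "));
```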
Drawing a circle like this? That's essentially free on the GPU! We get the equivalent of super big kernels for free!
But all that glitters is not gold. First of all, performance. Yes, the “blur” in frequency space is essentially free, but the trip to frequency space is anything but. The main issue is that in a Fourier transform every input pixel contributes to every output frequency, so each input pixel has to be written to a huge number of output pixels, a performance killer.
And then there's still the inverse conversion!
But our shaders work the other way around, expressing the “instructions to construct an output pixel”. There are fragment shader based GPU implementations, but they rely on many passes for calculation, a lot of memory access back and forth. Furthermore, non-power of two images require a slower algorithm.
This article is in the realm of fragment shaders and the graphics pipeline a GPU is part of, but there are also GPGPU and compute shader implementations with no fragment shader specific limitations. Unfortunately the situation remains: Conversion of high-res images to frequency space is too costly in the context of realtime graphics.
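As a rough feel for the "many passes" problem, here is a back-of-the-envelope sketch. It assumes a textbook radix-2 FFT that spends one full-screen pass per butterfly stage, rows first and then columns; `fftPassCount` is a hypothetical helper, and real implementations differ in how they batch stages.

```javascript
// Assumption: radix-2 FFT, one full-screen pass per butterfly stage,
// rows first, then columns. Real implementations vary.
function fftPassCount(width, height) {
    const isPow2 = (n) => Number.isInteger(n) && n > 0 && (n & (n - 1)) === 0;
    if (!isPow2(width) || !isPow2(height)) {
        throw new Error("radix-2 needs power-of-two dimensions");
    }
    return Math.round(Math.log2(width)) + Math.round(Math.log2(height));
}

const forward = fftPassCount(2048, 1024); // 11 + 10 = 21 passes
const roundTrip = 2 * forward;            // 42 passes including the inverse transform
console.log(`forward: ${forward} passes, round trip: ${roundTrip} passes`);
```

Under these assumptions each pass reads and writes the whole image, compared to the two passes of the separable Gaussian blur.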
Deleting the frequencies of that grid is magical, but leaves artifacts. In reality it's worse, as my example is idealized. Click Upload Image, take a photo of a repeating pattern and see how cleanly you can get rid of it.
Then there are the artifacts I have glossed over. The FFT treats the image as an infinitely repeating 2D signal. By blurring, we bleed color in from the neighboring copies across the image edges. And that’s not to mention the various ringing artifacts. None of this is unsolvable! But there is a more fundamental issue…
What is a Low-Pass filter? #
It's a filter that removes high frequencies and leaves the low ones, easy!
Try the FFT Example again and decrease the frequencyCutRadius to blur. At some point the green lines disappear, right? It is a low pass filter, one where high frequencies are literally annihilated. Small bright lights in the distance? Also annihilated…
[Interactive FFT demo: magnitude image and reconstructed image. Controls: Upload Image, Remove Frequency Energy, Add Frequency Energy, Reset Magnitude, frequencyCutRadius, feather.]
If we were to use this to build an effect like bloom, it would remove small lights that are meant to bloom as well! Our gaussian blur on the other hand, also a low-pass filter, samples and weights every pixel. In a way it “takes the high frequency energy and spreads it into low frequency energy”.
So Low-Pass Filter ≠ Low-Pass Filter. What the term means depends on context, which is why this article avoided it until now. Frequency space energy attenuation is simply not the correct tool for our goal of a “basic graphics programming building block” for visual effects.
This is a deep misunderstanding I held for years, wondering why video games didn't use such a powerful tool.
There are other frequency space image representations, not just FFT Magnitude + Phase. Another famous one is the Discrete Cosine Transform. Again, Computerphile covered it in great detail in a video. As for realtime use on high-res images, no: DCT conversion is multiple orders of magnitude slower. Feel free to dive deeper into frequency space…
JPEG DCT, Discrete Cosine Transform (JPEG Pt2)
YouTube Video by Computerphile
…as for this article, it’s the end of our frequency space detour. We talked so much about what’s slow on the GPU. Let’s talk about something that’s not just fast, but free:
Bilinear Interpolation #
Reading from textures comes with a freebie. When reading between pixels, the closest four pixels are interpolated bilinearly to create the final read, unless you switch to Nearest Neighbor mode. Below you can drag the color sample with a finger or the mouse. Take note of how and when the color changes in the respective modes.
[Interactive demo: draggable color sample. Modes: Nearest Neighbor, Bilinear. Option: Animate.]
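What the texture unit does for a single read can be sketched on the CPU. The functions below are illustrative stand-ins, not WebGL API: they work on a grayscale array, use pixel units with pixel centers at integer + 0.5, and assume clamp-to-edge addressing.

```javascript
const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);

// Nearest Neighbor: snap to the pixel that contains (x, y).
function sampleNearest(img, width, height, x, y) {
    const px = clamp(Math.floor(x), 0, width - 1);
    const py = clamp(Math.floor(y), 0, height - 1);
    return img[py * width + px];
}

// Bilinear: mix the closest four pixels by distance to their centers.
function sampleBilinear(img, width, height, x, y) {
    // Shift so that pixel centers land on integer coordinates.
    const fx = x - 0.5;
    const fy = y - 0.5;
    const x0 = Math.floor(fx);
    const y0 = Math.floor(fy);
    const tx = fx - x0;
    const ty = fy - y0;
    const read = (ix, iy) => img[clamp(iy, 0, height - 1) * width + clamp(ix, 0, width - 1)];
    const top = read(x0, y0) * (1 - tx) + read(x0 + 1, y0) * tx;
    const bottom = read(x0, y0 + 1) * (1 - tx) + read(x0 + 1, y0 + 1) * tx;
    return top * (1 - ty) + bottom * ty;
}

// A 2 x 2 grayscale image with the values 0, 1 in the first row and 2, 3 in the second.
const img = [0, 1, 2, 3];

console.log(sampleBilinear(img, 2, 2, 0.5, 0.5)); // 0, exactly on the first pixel center
console.log(sampleBilinear(img, 2, 2, 1.0, 0.5)); // 0.5, halfway between the top two pixels
console.log(sampleBilinear(img, 2, 2, 1.0, 1.0)); // 1.5, the average of all four pixels
console.log(sampleNearest(img, 2, 2, 1.2, 0.5));  // 1, snaps to the top right pixel
```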
Since reading between pixels gets a linear mix of pixel neighbors, we can linearly interpolate part of our gaussian kernel, sometimes called a Linear Gaussian. By tweaking gaussian weights and reducing the number of samples, we can fit a 7 × 7 gaussian kernel's worth of information into only a 4 × 4 kernel, as shown in the linked article.
Though mathematically not the same, visually the result is very close. There are a lot of hand-crafted variations on this, different mixes of kernel sizes and interpolation amounts.
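A minimal 1D sketch of the idea follows. It shows the common pair-merging variant, which is not necessarily the exact layout used in the linked article: two neighboring taps with weights w1 and w2 become one tap with weight w1 + w2, placed between them in proportion to their weights. `sampleLinear` is a stand-in for a bilinear texture read, with pixel centers at integers for simplicity.

```javascript
// Normalized 1D Gaussian weights for the offsets -kernelSize..kernelSize.
function gaussianWeights1D(kernelSize, sigma) {
    const weights = [];
    for (let i = -kernelSize; i <= kernelSize; i++) {
        weights.push(Math.exp(-(i * i) / (2.0 * sigma * sigma)));
    }
    const sum = weights.reduce((a, b) => a + b, 0);
    return weights.map((w) => w / sum);
}

// Merge neighboring taps (i, i + 1) on each side of the center into one tap.
function mergeTaps(weights) {
    const k = (weights.length - 1) / 2;
    const taps = [{ offset: 0, weight: weights[k] }];
    for (let i = 1; i <= k; i += 2) {
        const w1 = weights[k + i];
        const w2 = i + 1 <= k ? weights[k + i + 1] : 0;
        const weight = w1 + w2;
        const offset = (i * w1 + (i + 1) * w2) / weight;
        taps.push({ offset, weight }, { offset: -offset, weight });
    }
    return taps;
}

// 1D stand-in for a bilinear texture read, with clamp-to-edge addressing.
function sampleLinear(signal, x) {
    const read = (i) => signal[Math.min(Math.max(i, 0), signal.length - 1)];
    const x0 = Math.floor(x);
    const t = x - x0;
    return read(x0) * (1 - t) + read(x0 + 1) * t;
}

function blurDiscrete(signal, weights, center) {
    const k = (weights.length - 1) / 2;
    let sum = 0;
    for (let i = -k; i <= k; i++) sum += weights[i + k] * sampleLinear(signal, center + i);
    return sum;
}

function blurMerged(signal, taps, center) {
    return taps.reduce((sum, tap) => sum + tap.weight * sampleLinear(signal, center + tap.offset), 0);
}

const weights = gaussianWeights1D(4, 2.0); // 9 taps
const taps = mergeTaps(weights);           // 5 taps
const signal = [0, 0, 1, 0, 0, 5, 2, 2, 8, 0, 0, 3]; // hypothetical pixel row

let maxDiff = 0;
for (let c = 0; c < signal.length; c++) {
    maxDiff = Math.max(maxDiff, Math.abs(blurDiscrete(signal, weights, c) - blurMerged(signal, taps, c)));
}
console.log(`${weights.length} taps vs ${taps.length} taps, max difference: ${maxDiff}`);
```

In this idealized sketch with exact linear interpolation, the merged taps reproduce the discrete sum up to rounding. On real hardware the filtering weights typically have limited precision, and hand-tuned variants deviate further.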
Bilinear interpolation allows us to resize an image by reading from it at lower resolution. In a way, it’s a free bilinear resize built into every graphics chip, zero performance impact. But there is a limit - the bilinear interpolation is limited to a 2 × 2 sample square. Try to resize the kiwi below in different modes.
To make this more obvious, the following canvas renders at 25% of native resolution
[Interactive WebGL demo: kiwi resize. Modes: Nearest Neighbor, Bilinear. Option: Animate. Control: kiwiSize.]
WebGL Vertex Shader circleAnimationSize.vs

    attribute vec2 vtx;
    varying vec2 uv;
    uniform vec2 offset;
    uniform float kiwiSize;

    void main()
    {
        /* Map the quad from -1..1 to 0..1 texture coordinates, flipping Y */
        uv = vtx * vec2(0.5, -0.5) + 0.5;
        gl_Position = vec4(vtx * kiwiSize + offset, 0.0, 1.0);
    }

WebGL Fragment Shader simpleTexture.fs

    precision highp float;
    varying vec2 uv;
    uniform sampler2D texture;

    void main() {
        gl_FragColor = texture2D(texture, uv);
    }

WebGL Javascript bilinear.js

    import * as util from './utility.js'

    export async function setupBilinear() {
        const WebGLBox = document.getElementById('WebGLBox-Bilinear');
        const canvas = WebGLBox.querySelector('canvas');
        const radius = 0.12;
        /* Render at 1/4 of native resolution */
        const resDiv = 4;
        let renderFramebuffer, renderTexture;
        let buffersInitialized = false;

        const gl = canvas.getContext('webgl', {
            preserveDrawingBuffer: false,
            antialias: false,
            alpha: true,
        });

        /* State and object handles */
        const ctx = {
            mode: "nearest",
            flags: { isRendering: false, initComplete: false },
            tex: { sdr: null },
            shd: {
                kiwi: { handle: null, uniforms: { offset: null, kiwiSize: null } },
                blit: { handle: null, uniforms: { texture: null } }
            }
        };

        /* UI elements */
        const ui = {
            display: {
                spinner: canvas.parentElement.querySelector('svg'),
                contextLoss: canvas.parentElement.querySelector('div'),
            },
            rendering: {
                modes: WebGLBox.querySelectorAll('input[type="radio"]'),
                animate: WebGLBox.querySelector('#animateCheck'),
                kiwiSize: WebGLBox.querySelector('#kiwiSize'),
            }
        };

        ui.rendering.modes.forEach(radio => {
            if (radio.value === "nearest")
                radio.checked = true;
            radio.addEventListener('change', (event) => {
                ctx.mode = event.target.value;
                if (!ui.rendering.animate.checked) redraw();
            });
        });

        /* Shaders */
        const circleAnimationSize = await util.fetchShader("shader/circleAnimationSize.vs");
        const simpleTexture = await util.fetchShader("shader/simpleTexture.fs");
        const simpleQuad = await util.fetchShader("shader/simpleQuad.vs");

        ui.rendering.kiwiSize.addEventListener('input', () => { if (!ui.rendering.animate.checked) redraw() });

        ui.rendering.animate.addEventListener("change", () => {
            if (ui.rendering.animate.checked)
                startRendering();
            else {
                ctx.flags.isRendering = false;
                redraw()
            }
        });

        canvas.addEventListener("webglcontextlost", () => {
            ui.display.contextLoss.style.display = "block";
        });

        ctx.shd.kiwi = util.compileAndLinkShader(gl, circleAnimationSize, simpleTexture, ["offset", "kiwiSize"]);
        ctx.shd.blit = util.compileAndLinkShader(gl, simpleQuad, simpleTexture, ["texture"]);

        gl.useProgram(ctx.shd.kiwi.handle);

        util.bindUnitQuad(gl);

        function loadSVGAsImage(blob) {
            return new Promise((resolve) => {
                const img = new Image();
                const url = URL.createObjectURL(blob);

                img.onload = () => {
                    URL.revokeObjectURL(url);
                    resolve(img);
                };

                img.src = url;
            });
        }

        async function setupTextureBuffers() {
            ui.display.spinner.style.display = "block";
            buffersInitialized = true;
            ctx.flags.initComplete = false;

            /* Low-resolution render target */
            gl.deleteFramebuffer(renderFramebuffer);
            renderFramebuffer = gl.createFramebuffer();
            gl.bindFramebuffer(gl.FRAMEBUFFER, renderFramebuffer);

            gl.deleteTexture(renderTexture);
            renderTexture = gl.createTexture();
            gl.bindTexture(gl.TEXTURE_2D, renderTexture);
            gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
            gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
            gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
            gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

            gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, canvas.width / resDiv, canvas.height / resDiv, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
            gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, renderTexture, 0);
            buffersInitialized = true;

            let base = await fetch("img/kiwi4by3.svg");
            let baseBlob = await base.blob();
            let baseImage = await loadSVGAsImage(baseBlob);
            let baseBitmap = await createImageBitmap(baseImage, { resizeWidth: canvas.width / resDiv, resizeHeight: canvas.height / resDiv, colorSpaceConversion: 'none', resizeQuality: "high" });

            ctx.tex.sdr = util.setupTexture(gl, null, null, ctx.tex.sdr, gl.NEAREST, baseBitmap, 4);
            baseBitmap.close();

            ctx.flags.initComplete = true;
            ui.display.spinner.style.display = "none";
        }

        async function redraw() {
            if (!buffersInitialized)
                await setupTextureBuffers();
            if (!ctx.flags.initComplete)
                return;

            gl.viewport(0, 0, canvas.width / resDiv, canvas.height / resDiv);

            /* Draw the kiwi into the low-resolution framebuffer */
            if (!renderFramebuffer) return;
            gl.bindFramebuffer(gl.FRAMEBUFFER, renderFramebuffer);
            gl.clear(gl.COLOR_BUFFER_BIT);
            gl.useProgram(ctx.shd.kiwi.handle);

            /* Switch the texture filter according to the selected mode */
            gl.bindTexture(gl.TEXTURE_2D, ctx.tex.sdr);
            gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, ctx.mode == "nearest" ? gl.NEAREST : gl.LINEAR);
            gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, ctx.mode == "nearest" ? gl.NEAREST : gl.LINEAR);

            let radiusSwitch = ui.rendering.animate.checked ? radius : 0.0;
            let speed = (performance.now() / 10000) % Math.PI * 2;
            const offset = [radiusSwitch * Math.cos(speed), radiusSwitch * Math.sin(speed)];
            gl.uniform2fv(ctx.shd.kiwi.uniforms.offset, offset);
            gl.uniform1f(ctx.shd.kiwi.uniforms.kiwiSize, ui.rendering.kiwiSize.value);

            gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);

            /* Blit the low-resolution result to the canvas */
            gl.viewport(0, 0, canvas.width, canvas.height);
            gl.bindFramebuffer(gl.FRAMEBUFFER, null);
            gl.useProgram(ctx.shd.blit.handle);
            if (!renderTexture) return;
            gl.bindTexture(gl.TEXTURE_2D, renderTexture);

            gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
        }

        let animationFrameId;

        function nativeResize() {
            const [width, height] = util.getNativeSize(canvas);

            if (width && canvas.width !== width || height && canvas.height !== height) {
                canvas.width = width;
                canvas.height = height;
                gl.viewport(0, 0, canvas.width, canvas.height);

                stopRendering();
                startRendering();
                if (!ui.rendering.animate.checked)
                    redraw();
            }
        }

        nativeResize();

        let resizePending = false;
        window.addEventListener('resize', () => {
            if (!resizePending) {
                resizePending = true;
                requestAnimationFrame(() => {
                    resizePending = false;
                    nativeResize();
                });
            }
        });

        function renderLoop() {
            if (ctx.flags.isRendering && ui.rendering.animate.checked) {
                redraw();
                animationFrameId = requestAnimationFrame(renderLoop);
            }
        }

        function startRendering() {
            ctx.flags.isRendering = true;
            renderLoop();
        }

        /* Stop rendering and free all GPU resources */
        function stopRendering() {
            ctx.flags.isRendering = false;
            cancelAnimationFrame(animationFrameId);
            gl.finish();

            gl.deleteTexture(ctx.tex.sdr); ctx.tex.sdr = null;
            gl.deleteTexture(renderTexture); renderTexture = null;
            gl.deleteFramebuffer(renderFramebuffer); renderFramebuffer = null;
            buffersInitialized = false;
            ctx.flags.initComplete = false;
        }

        /* Only render while the canvas is visible */
        function handleIntersection(entries) {
            entries.forEach(entry => {
                if (entry.isIntersecting) {
                    if (!ctx.flags.isRendering) startRendering();
                } else {
                    stopRendering();
                }
            });
        }

        let observer = new IntersectionObserver(handleIntersection);
        observer.observe(canvas);
    }
Nearest Neighbor looks pixelated whenever the size is not 100%, which is equivalent to 1:1 pixel mapping. At 100% it moves “jittery”, as it “snaps” to the nearest neighbor. Bilinear keeps things smooth, but going below 50%, and especially below 25%, we get exactly the same kind of aliasing as we would get from nearest neighbor!
You may have noticed similar aliasing when playing YouTube videos at a very high, manually selected resolution, but in a small window. Same thing!
With 2 × 2 samples, we start skipping over color information once the source pixels shrink to less than half the size of an output pixel. Below 50% size, our bilinear interpolation starts to act like nearest neighbor interpolation. As a result, we can shrink an image in steps of 50% without “skipping over information” and creating aliasing. Let’s use that!
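To make the 50% rule concrete, here is a small CPU-side sketch of what a single bilinear tap does. It is not the article's shader code: `sampleBilinear` and `shrink` are hypothetical helpers, the image is a tiny grayscale grid, and texel centers are assumed to sit at `i + 0.5`, as they do on the GPU. A 4 × 4 image with one bright corner pixel is shrunk once by 25% and, for comparison, twice by 50%.

```javascript
// Bilinear sample of a grayscale image (array of rows) at texel-space
// coordinates (x, y). Texel centers are at i + 0.5; edges are clamped.
function sampleBilinear(img, x, y) {
  const h = img.length, w = img[0].length;
  const clamp = (v, max) => Math.min(Math.max(v, 0), max);
  const fx = x - 0.5, fy = y - 0.5;
  const x0 = Math.floor(fx), y0 = Math.floor(fy);
  const tx = fx - x0, ty = fy - y0;
  const at = (ix, iy) => img[clamp(iy, h - 1)][clamp(ix, w - 1)];
  const top = at(x0, y0) * (1 - tx) + at(x0 + 1, y0) * tx;
  const bottom = at(x0, y0 + 1) * (1 - tx) + at(x0 + 1, y0 + 1) * tx;
  return top * (1 - ty) + bottom * ty;
}

// Shrink by an integer factor using ONE bilinear tap per output pixel,
// placed at the center of the area that the output pixel covers.
function shrink(img, factor) {
  const out = [];
  for (let y = 0; y < img.length / factor; y++) {
    const row = [];
    for (let x = 0; x < img[0].length / factor; x++) {
      row.push(sampleBilinear(img, (x + 0.5) * factor, (y + 0.5) * factor));
    }
    out.push(row);
  }
  return out;
}

// A single bright pixel in the corner of a 4 x 4 image.
const image = [
  [1, 0, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 0],
];

const direct = shrink(image, 4);             // one 25% step
const stepped = shrink(shrink(image, 2), 2); // two 50% steps

console.log(direct[0][0]);  // 0: the tap only touched the middle 2 x 2 texels
console.log(stepped[0][0]); // 0.0625: the true average, 1/16
```

In the direct 25% step the bright pixel vanishes entirely, because the one tap only reads the four texels around the image center. With two 50% steps, every tap lands exactly between four texels and averages them, so no texel is skipped.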
One fundamental thing you can do in post-processing is to shrink (“downsample”) first, perform the processing at a lower resolution, and upsample again, the idea being that you won’t notice the lowered resolution. Below is the Separable Gaussian Blur again, with a variable downsample / upsample chain.
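As a rough sketch of why this pays off, the snippet below lists the framebuffer sizes of such a chain and how many pixels the blur has to touch at the smallest level. `downsampleChain` is a hypothetical helper, not part of the demo's code, and the rounding (floor, with a minimum of 1 pixel) is an assumption; the actual demo may size its framebuffers differently.

```javascript
// List the resolution of every level in a chain of 50% downsample steps.
// Level 0 is the full-resolution framebuffer.
function downsampleChain(width, height, steps) {
  const levels = [{ width, height }];
  for (let i = 0; i < steps; i++) {
    const prev = levels[levels.length - 1];
    levels.push({
      // Assumption: round down, but never below a single pixel.
      width: Math.max(1, Math.floor(prev.width / 2)),
      height: Math.max(1, Math.floor(prev.height / 2)),
    });
  }
  return levels;
}

const chain = downsampleChain(1920, 1080, 3);
const smallest = chain[chain.length - 1];

// Fraction of the original pixel count that the blur has to process.
const pixelRatio = (smallest.width * smallest.height) / (1920 * 1080);

console.log(chain.map(l => `${l.width}x${l.height}`).join(" -> "));
// 1920x1080 -> 960x540 -> 480x270 -> 240x135
console.log(pixelRatio); // 0.015625, i.e. 1/64
```

Each 50% step quarters the pixel count, so after three steps the blur runs on 1/64 of the pixels. A blur kernel of a given radius at that level also covers eight times that radius when measured in full-resolution pixels.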
Each increase of downSample adds a 50% scale step. Let’s visualize the framebuffers in play, as it gets quite complex. Here is an examp