Visual Computing Lab – Computer Graphics, Computer Vision and High Performance Computing
https://viscomp.alexandra.dk

Ray Casting in WebAssembly Part I (1 Apr 2019)
https://viscomp.alexandra.dk/?p=4085

The computer graphics programmer's "hello world": software ray casting.

A new tutorial series covering both WebAssembly and ray casting is now available, both on Medium and here!

WebGL Virtual Texturing (18 Nov 2016)
https://viscomp.alexandra.dk/?p=4014

We visualize Denmark’s Digital Elevation Model in real time, directly in the browser using WebGL. A virtual texturing technique is applied which enables us to handle a virtual raster size of 1048576 x 1048576 pixels (2^20 pixels at 40 cm per pixel is roughly 419 km per side). Hence, it is possible to cover the whole country, except Bornholm, at 40 cm horizontal resolution.

Online demo:

http://denmark3d.alexandra.dk

 

References:

Making Digital Elevation Models Accessible, Comprehensible, and Engaging through Real-Time Visualization

Sparse Solid Mesh Voxelization Tutorial (9 Feb 2016)
https://viscomp.alexandra.dk/?p=3836

“Welcome, computer-graphics-interested stranger!”

This is the first installment of my little tutorial series on mesh-to-sparse-voxel conversion. Conversions between different representations are well described in the literature and could be considered trivial, but there are always challenges involved when actually implementing these kinds of things, and reference implementations often require a lot of dependencies. So I have tried to compile something useful into a small, self-contained chunk of code.

MeshVoxelization

Voxelization gone wrong 🙂

The first step will be on the topic of converting a triangle mesh into a solid voxel model.

Next we will look at how to implement fast marching to produce high-quality distance fields. Distance fields require a lot of memory to be accurate, so the third tutorial will be about creating adaptive distance fields, which are super cool. In the end I will show some code from the other guys on our team, where we come full circle and produce triangle and tetrahedral meshes from the distance fields. It is quite a bit of work, but it has proven extremely useful and it is fun to code. And if you are like me, you also need to see the actual code in order to comprehend all the important details.

VoxelOctree

In the following I will assume that you have a bit of experience in computer graphics. The first step is to load the mesh and build the acceleration structure we use to decide where the voxels are. For this I decided to use a good ol' octree, but with some additional information stored in each node. If you are unsure what an octree is, you can read more here: https://en.wikipedia.org/wiki/Octree

There is a fair chance that you have used an octree before, for culling or ray tracing. And guess what, we are going to do exactly that in this tutorial.

BunnySubdivided

Bunny hand subdivided, but unharmed!

Octree voxelization pseudocode:

  1. Input your favorite mesh, flattened into a vertex list ( (v0,v1,v2), (v0,v1,v2), ... ), and the voxel size, which determines the number of subdivisions of the octree.
  2. Calculate the enclosing power-of-two bounding box, because we always subdivide each axis into two equally sized parts.
  3. Each octree node is subdivided into 8 children, as seen in the 2D bunny example above.
  4. For each of these children we check whether there is any intersection between the child's bounding box and any mesh triangle. This is done using Thomas Akenine-Möller's triangle/box test [2].
  5. If there is an intersection in step 4, we either subdivide this INTERNAL node further or, if the bounding box is voxel sized, mark it as a LEAF node.
  6. If there is no intersection, we flag the node as EMPTY_LEAF.

So again, pretty basic stuff. But to avoid too much triangle/AABB intersection testing, each node builds a triangle index list containing only its intersecting triangles and passes it on to the recursive subdivision of its children, roughly as sketched below.
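To make the recursion concrete, here is a minimal sketch of what it could look like. This is not the exact code from the download: TriangleBoxOverlap (the Akenine-Möller test [2]), GetChildOctant, GetExtent and NT_EMPTY_LEAF are assumed helper/enum names standing in for the real ones.

void Subdivide(RFVoxelOctreeNode* pkNode,
               const std::vector<RFVector3f>& kMeshVertices,
               const std::vector<uint>& kParentTriangleIndices,
               float fVoxelSize)
{
  // Keep only the triangles that actually touch this node's bounding box.
  std::vector<uint> kNodeTriangleIndices;
  for(uint v = 0; v < kParentTriangleIndices.size(); v++)
  {
    uint uiTriIndex = kParentTriangleIndices[v];
    if(TriangleBoxOverlap(&kMeshVertices[uiTriIndex], pkNode->m_kWorldAabb))
      kNodeTriangleIndices.push_back(uiTriIndex);
  }

  if(kNodeTriangleIndices.empty())
  {
    pkNode->m_eNodeType = NT_EMPTY_LEAF; // inside/outside decided later by ray casting
    return;
  }

  if(pkNode->m_kWorldAabb.GetExtent() <= fVoxelSize)
  {
    pkNode->m_eNodeType = NT_LEAF;       // voxel sized, stop subdividing
    return;
  }

  pkNode->m_eNodeType = NT_INTERNAL;
  for(uint i = 0; i < 8; i++)
  {
    pkNode->m_pkChildren[i] = new RFVoxelOctreeNode();
    pkNode->m_pkChildren[i]->m_kWorldAabb = pkNode->m_kWorldAabb.GetChildOctant(i);
    Subdivide(pkNode->m_pkChildren[i], kMeshVertices, kNodeTriangleIndices, fVoxelSize);
  }
}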

To make the octree more useful later on, each LEAF is updated with the minimum distance to the mesh surface. This will prove very handy when we create a signed distance field, because the voxels then contain sub-voxel distance values.

// For each triangle intersecting this leaf, find the closest point on the triangle
// to the leaf center and keep the (signed) minimum distance.
float fMinDist = numeric_limits<float>::max();
for(uint v = 0; v < kNodeTriangleIndices.size(); v++)
{
  RFVector3f kTriangleVerts[3];
  uint uiTriIndex   = kNodeTriangleIndices[v];
  kTriangleVerts[0] = kMeshVertices[uiTriIndex];
  kTriangleVerts[1] = kMeshVertices[uiTriIndex+1];
  kTriangleVerts[2] = kMeshVertices[uiTriIndex+2];

  // Unnormalized triangle normal; only its direction is used, for the sign.
  RFVector3f kNormal = VectorCross(kTriangleVerts[2]-kTriangleVerts[0],kTriangleVerts[1]-kTriangleVerts[0]);

  // Vector from the leaf center to the closest point on the triangle.
  RFVector3f kClosestPoint = ClosestPointOnTriangle(kTriangleVerts,pkLeafNode->m_kWorldAabb.GetCenter());
  RFVector3f kDelta  = kClosestPoint - pkLeafNode->m_kWorldAabb.GetCenter();

  // The dot product tells us which side of the triangle the leaf center is on.
  float fSign = (VectorDot(kDelta,kNormal) < 0.0f)? -1.0f : 1.0f;
  float fDist = VectorLength(kDelta);
  if(fDist < fabsf(fMinDist)) fMinDist = fDist*fSign;
}
pkLeafNode->m_fDistanceToSurface = fMinDist;

Above is the code that finds the minimum distance between the intersecting triangles and the LEAF bounding box center.

To explain what is going on: the function ClosestPointOnTriangle(Triangle, pos) [4] returns the closest point on the triangle. Using this we create the delta vector, and a dot product with the triangle normal determines on which side of the triangle the LEAF's center lies. Again, stuff needed for the signed-distance fun.

This concludes the subdivision, not so hard right?

But to make a solid voxel model we now need to know whether the EMPTY_LEAFs are inside or outside the mesh. My solution is, as usual, to use ray tracing 🙂

So basically, for each EMPTY_LEAF we shoot a bunch of rays in all directions and determine a coverage value, which tells you how occluded the EMPTY_LEAF voxel is. Given a user-specified coverage threshold, we flag an EMPTY_LEAF as inside if the calculated occlusion is above that threshold. This is quite convenient because an EMPTY_LEAF can cover many voxels, so we can decide for all of them in one go.

What remains to be described is the ray tracing of an octree, which fortunately is extremely simple to do.

// Recursively tests a ray against the octree. Returns true on the first
// LEAF whose bounding box is hit by the ray.
bool RayOctreeInterSection(const RFRay3f &kRay, const RFVoxelOctreeNode *pkNode)
{
   if(pkNode == nullptr) return false;

   if(pkNode->m_eNodeType == NT_LEAF)
   {
     return RayBoxIntersection(kRay, pkNode->m_kWorldAabb);
   }

   if(pkNode->m_eNodeType == NT_INTERNAL)
   {
      // Only recurse into the children if the ray hits this node at all.
      if(RayBoxIntersection(kRay, pkNode->m_kWorldAabb))
      {
         for(uint i = 0; i < 8; i++)
         {
           if(RayOctreeInterSection(kRay, pkNode->m_pkChildren[i]))
              return true;
         }
      }
   }
   return false;
}

Hey, that is it! Just some recursive traversal and a bunch of ray/box intersections, and then we know whether something has been hit.

To conclude the coverage test, we use a Hammersley uniform spherical sample generator [6]. In the example code, 200 rays are shot from each EMPTY_LEAF node to determine coverage; a sketch of how such directions can be generated is shown first, followed by the actual coverage loop.
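Here is a small sketch of how such sample directions could be generated, following the construction from [6]. This is not the exact code from the download; it assumes the uint typedef and an RFVector3f(x, y, z) constructor from the RFMath lib.

// Van der Corput radical inverse in base 2 (bit reversal), used as the second Hammersley coordinate.
float RadicalInverseVdC(uint uiBits)
{
  uiBits = (uiBits << 16u) | (uiBits >> 16u);
  uiBits = ((uiBits & 0x55555555u) << 1u) | ((uiBits & 0xAAAAAAAAu) >> 1u);
  uiBits = ((uiBits & 0x33333333u) << 2u) | ((uiBits & 0xCCCCCCCCu) >> 2u);
  uiBits = ((uiBits & 0x0F0F0F0Fu) << 4u) | ((uiBits & 0xF0F0F0F0u) >> 4u);
  uiBits = ((uiBits & 0x00FF00FFu) << 8u) | ((uiBits & 0xFF00FF00u) >> 8u);
  return float(uiBits) * 2.3283064365386963e-10f; // divide by 2^32
}

// Fills kDirections (e.g. m_kSampleDirections) with uiCount roughly uniform directions on the unit sphere.
void GenerateSampleDirections(uint uiCount, std::vector<RFVector3f>& kDirections)
{
  kDirections.clear();
  for(uint i = 0; i < uiCount; i++)
  {
    float fU1 = (i + 0.5f) / float(uiCount);  // evenly spaced in [0,1)
    float fU2 = RadicalInverseVdC(i);         // low-discrepancy partner coordinate
    float fCosTheta = 1.0f - 2.0f * fU1;      // uniform in [-1,1] gives uniform directions on the sphere
    float fSinTheta = sqrtf(fmaxf(0.0f, 1.0f - fCosTheta * fCosTheta));
    float fPhi = 2.0f * 3.14159265f * fU2;
    kDirections.push_back(RFVector3f(fSinTheta * cosf(fPhi), fSinTheta * sinf(fPhi), fCosTheta));
  }
}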

// Shoot a ray in each precomputed sample direction and count how many hit the mesh.
uint uiCoverage  = 0;
uint uiMisses    = 0;
for(uint j = 0; j < m_kSampleDirections.size(); j++)
{
  RFRay3f kRay(pkNode->m_kWorldAabb.GetCenter(), m_kSampleDirections[j]);
  if(RayOctreeInterSection(kRay, pkOctreeRoot))
  {
    uiCoverage++;
  }
  else
  {
    uiMisses++;
  }
}

// The fraction of rays that hit the mesh decides inside vs. outside.
float fCoverage = static_cast<float>(uiCoverage)/static_cast<float>(m_kSampleDirections.size());
if(fCoverage >= fMinSolidCoverage)
{
   pkNode->m_fDistanceToSurface = -1.0f; // Inside == Negative
}
else
{
   pkNode->m_fDistanceToSurface = 1.0f;  // Outside == Positive
}

Below is the result of an armadillo voxelization. The green voxels are the leafs, and the red and yellow are the empty nodes, where the red ones are larger than the voxel size.

ArmadilloVoxelized

The remaining step is to subdivide the red voxels into voxel-sized ones and output the complete voxel data in your favorite format.

That is it for now. Below is a link to a zipped folder containing the necessary code; please feel free to comment on the article and the code.

Roger over for now Peter!

Click here to download the zipped code.

(Download link updated 12/10 2019) Please contact me if it does not work!

To build the code, use CMake. There are no dependencies besides our small RFMath lib, which is included; otherwise it is plain C++ code. I decided not to include a visualization, which would clutter the code, so this is left to the reader 🙂

The demo code loads the armadillo and creates a voxelization based on the parameters mentioned above. It outputs a PLY file with the voxelized mesh as colored vertices, but you could of course easily change this to output some voxel format of your choice. I have tried to keep it really simple, and to view the result you can download the cool and very handy MeshLab: http://meshlab.sourceforge.net/ [5]

VoxelizationInMeshlab

The sparse voxelization inside MeshLab: the black-to-white vertices are the leaf nodes with their distance values, the blue ones are voxel-sized empty leafs, and the green ones are the sparse, big empty-leaf voxels.

References:

  1. http://www.realtimerendering.com/intersections.html  (Mother of all intersection collections)
  2. http://cs.lth.se/tomas_akenine-moller (Father of a lot of intersection tests)
  3. https://en.wikipedia.org/wiki/Octree
  4. http://www.gamedev.net/topic/552906-closest-point-on-triangle/ (Link to the closest point code origin)
  5. http://meshlab.sourceforge.net/
  6. http://holger.dammertz.org/stuff/notes_HammersleyOnHemisphere.html
Boat-tracking at “Sejerøbugten” (13 Feb 2015)
https://viscomp.alexandra.dk/?p=3822

As part of our collaboration with the Institute of Bioscience, we have been asked to create a small program that can track and register the number of boats passing through part of Sejerøbugten. A radar placed on Nekselø generates pictures like the one below every 3 minutes.

radar-image

As you might notice, there is a lot of noise in the image, so a basic mask is created to subtract the ground. To help identify moving objects, a trail is added between signals (if you look at the center of the radar and to the left, you should see a red dot with a blue “tail” – this is a boat). By filtering out the noise and masking the image correctly, in this case using OpenCV, we can track overlaps of these “tails” and in that way reconstruct the path that the boat traveled. A rough sketch of this kind of mask-and-threshold step is shown below.
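To give an idea of what this looks like in code, here is a rough OpenCV (C++) sketch of the masking and filtering step. The file names, threshold and blob-size values are purely illustrative, not the ones used in the actual program.

#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat frame    = cv::imread("radar.png",     cv::IMREAD_GRAYSCALE);
    cv::Mat landMask = cv::imread("land_mask.png", cv::IMREAD_GRAYSCALE);

    // Remove ground returns by keeping only pixels outside the land mask.
    cv::Mat sea;
    cv::bitwise_and(frame, ~landMask, sea);

    // Suppress speckle noise and keep only strong echoes (boats and their trails).
    cv::medianBlur(sea, sea, 3);
    cv::Mat echoes;
    cv::threshold(sea, echoes, 80, 255, cv::THRESH_BINARY);

    // Each remaining connected blob is a candidate boat/trail; overlapping blobs
    // across consecutive frames are then linked into tracks.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(echoes, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& c : contours)
        if (cv::contourArea(c) > 10.0)  // ignore tiny noise blobs
            cv::rectangle(frame, cv::boundingRect(c), cv::Scalar(255), 1);

    cv::imwrite("detections.png", frame);
    return 0;
}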

 

tracking

 

While we wait – Approaching Zero Driver Overhead (11 Jan 2015)
https://viscomp.alexandra.dk/?p=3778

As I (ed.: Jesper Børlum, former employee) was looking through the presentations from SIGGRAPH Asia 2014, one presentation in particular caught my eye: Tristan Lorach's presentation on Nvidia's upcoming manual command-list OpenGL extension. With all the focus this last year on reducing CPU-side driver overhead in the current graphics APIs, and the upcoming new rendering APIs (AMD Mantle, Microsoft DirectX 12, Apple Metal), I decided to make an overview of the current recommendations for scene rendering using core OpenGL and take a poke at Nvidia's new extension. This first article looks at the core OpenGL recommendations; the next article will cover Nvidia's new extension. I am writing this article because I wanted to get a better grasp of the implementation details in the excellent GTC / SIGGRAPH performance presentations found here:
http://on-demand.gputechconf.com/gtc/2013/presentations/S3032-Advanced-Scenegraph-Rendering-Pipeline.pdf
http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering-techniques.pdf
http://www.slideshare.net/tlorach/opengl-nvidia-commandlistapproaching-zerodriveroverhead

For performance results and shader code please refer to the Nvidia presentations.

Disclaimer – This post is a simplification of a complex topic. If you feel I have left out important details, please add them to the comments at the end or write me.


Modern GPUs are absolute beasts. It never ceases to amaze me how much raw processing power they deliver, even on standard gaming hardware. However, scene requirements are getting increasingly complex: more geometry, more types of materials, and new and complex render effects. The GPU driver often ends up being a serious performance bottleneck handling this complexity, which means that no matter how much GPU power you throw at the rendering, the overall performance is not going to increase.
A lot of things eat up CPU performance: scenegraph traversal, animation, renderlist generation, sorting by state, and all the driver interactions.
Current driver performance culprits are:

  • Frequent GPU state changes (shader, parameters, textures, framebuffer etc.).
  • Draw commands.
  •  Geometry stream changes.
  •  Data transfers (uploads / read-backs).

All of these boil down to the driver eating up your precious CPU clock cycles.
Using the techniques below, most of this CPU driver overhead can be reduced to almost zero. In the following sections, I will look at several methods for reducing the overhead. Most achieve this simply by calling the driver less. Seems simple enough, but handling material changes, texture changes, buffer changes and state changes between the draw calls can get tricky. Also, note that most of these methods require a newer version of OpenGL; some of the functions only just made it into the core specification (OpenGL 4.4 / 4.5).

A scene, in the context of this post, is a collection of objects, each consisting of sub-objects. A sub-object is a material and a draw command. Objects are logical collections of sub-objects each with their own world transform matrix. A material is a collection consisting of a shader program, parameters for the shader program and an OpenGL render state collection.

I have provided two naïve approaches to scene rendering and uploading of shader parameters – The two areas we will be focusing on.


Naïve scene rendering
This will act as the baseline for performance, and is what each improvement will try to improve on.

foreach(object in scene)
{
    foreach(subobject in object)
    {
        // Attaches the vertex and index buffers to the pipeline.
        SetupGeometry(subobject.geometry);

        // Updates active shaders if changed.
        // Uploads the material parameters.
        // Uploads the world transform parameter.
        SetMaterialIfChanged(subobject.material, object.transform);

        // Dispatch the draw call.
        Draw();
    }
}

This method imposes a large number of driver interactions:

  • Geometry streams are changed per sub-object.
  • Shaders are changed per sub-object, if different from current.
  • Shader parameters are uploaded per draw.
  • A draw call per sub-object.

Naïve parameter update
Uploading parameters, also known as uniform parameters, to shaders can impose a significant number of driver calls – especially if uploaded “the old fashioned way” where each parameter upload is a separate call to glUniform. This will act as the baseline for performance, and is what each improvement will try to improve on.

foreach(object in scene)
{
    ...
    foreach(batch in object.materialBatches)
    {
        if (batch.material != currentMaterial)
        {
            // Apply the active program to the pipeline.
            glUseProgram(batch.material.program);

            // Uniforms are program object state which needs to be updated for each program!
            glUniform(transformLoc, object.tranform);
            glUniform(diffuseColorLoc, batch.material.diffuseColor);
            glUniform(...);
            ...
        }

        // Dispatch draw.
    }
}

This technique has several weaknesses. It makes many separate driver calls, which the driver cannot predict. To make it even worse, we need to re-upload all the parameters each time we change the shader program, because shader program objects contain the parameter values – not the general OpenGL state. In the past, I have solved this by maintaining a CPU-side parameter state cache per shader program; the cache proxy is then responsible for re-uploading a uniform only when it becomes dirty (sketched below). This is a workable solution if you cannot use buffer objects, which trivialize the sharing of parameter data across shader programs, as seen later in this post.
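As an illustration, such a per-program cache could look something like the minimal sketch below. This is my own example, not production code; it assumes the usual GL headers and uses the GL 4.1 glProgramUniform* entry point so no program bind is needed.

#include <array>
#include <cstring>
#include <unordered_map>

// Caches the last uploaded value per uniform location and only talks to the
// driver when the value actually changed.
class UniformCache
{
public:
    explicit UniformCache(GLuint program) : m_program(program) {}

    void SetMatrix4(GLint location, const float* matrix)
    {
        std::array<float, 16>& cached = m_matrices[location]; // zero-initialized on first use
        if (std::memcmp(cached.data(), matrix, sizeof(float) * 16) != 0)
        {
            std::memcpy(cached.data(), matrix, sizeof(float) * 16);
            glProgramUniformMatrix4fv(m_program, location, 1, GL_FALSE, matrix); // dirty -> upload
        }
    }

private:
    GLuint m_program;
    std::unordered_map<GLint, std::array<float, 16>> m_matrices;
};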


Improvement 1 – Single buffer per object
The obvious improvement to the naïve scene rendering is to move the buffers from the sub-objects into a collection of collapsed buffers in the containing object. This allows us to move the buffer bind call from the inner loop to the outer loop, which dramatically lowers the number of geometry driver calls in a scene where each object contains many sub-objects. Each sub-object now needs to know the correct stream offset into the collapsed buffers to be able to draw correctly. When loading geometry you will need to collapse all sub-object buffers and offset the vertex indices to reflect the new positions in the collapsed buffer.

foreach(object in scene)
{
    // Attaches the vertex and index buffers to the pipeline.
    SetupGeometry(object.geometry);

    foreach(subobject in object)
    {
        // Updates active shaders if changed.
        // Uploads the material parameters.
        // Uploads the world transform parameter.
        SetMaterialIfChanged(subobject.material, object.transform);

        // Dispatch the draw call.
        Draw();
    }
}


Improvement 2 – Sort sub-objects by material
Sorting by complete materials (same shaders, render state and material parameters – for now) achieves two things: we can now draw several sub-objects at a time, and we avoid costly shader changes.
The main difference to the render loop is that instead of looping over each sub-object, we now loop over a material batch. A material batch contains the material information, along with information about which parts of the geometry are to be rendered using that material setup.
During geometry load, you will need to sort by materials so that each batch contains enough information to render all sub-objects it contains.
You can opt to rearrange the vertex buffer data so that the draw command ranges can be “grown” to draw several sub-objects in a single command.
When drawing you can choose between two different ways:

  • Using a loop over each of the sub-object buffer ranges in the batch, drawing each with glDrawElements.
  • Submitting all draw calls in one call using the slightly improved glMultiDrawElements.

The second multi-draw approach will execute the loop for you inside the driver – hence only a slight improvement. A sketch of that variant is shown after the loop version below.

foreach(object in scene)
{
    // Attaches the vertex and index buffers to the pipeline.
    SetupGeometry(object.geometry);

    foreach(batch in object.materialBatches)
    {
        // Updates active shaders if changed.
        // Uploads the material parameters.
        // Uploads the world transform parameter.
        SetMaterialIfChanged(batch.material, object.transform);

        // Dispatch the draw call.
        foreach(range in batch.ranges)
            glDrawElements(GL_TRIANGLES, range.count, GL_UNSIGNED_INT, range.offset);
    }
}
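For reference, the multi-draw variant of the inner draw could look like the sketch below, assuming the batch keeps its per-range index counts and byte offsets in two parallel arrays (batch.counts / batch.offsets are illustrative names, not from the presentations).

        // One driver call instead of a loop of glDrawElements calls.
        glMultiDrawElements(GL_TRIANGLES,
                            batch.counts.data(),                      // GLsizei index count per range
                            GL_UNSIGNED_INT,
                            (const void* const*)batch.offsets.data(), // byte offset into the index buffer per range
                            (GLsizei)batch.counts.size());            // number of ranges in the batch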


Improvement 3 – Buffers for uniforms
Instead of uploading each uniform separately as shown in the naïve parameter update, OpenGL allows you to store uniforms in buffer objects, so-called Uniform Buffer Objects (UBOs). Instead of a glUniform call per parameter, you can upload a chunk of uniforms with a single buffer upload like glBufferData or glBufferSubData. It is important to group uniforms according to their frequency of change when uploading data into buffers. A practical grouping of uniforms could look something like the following:

  • Scene globals – camera etc.
  • Active lights.
  • Material parameters.
  • Object specifics – transform etc.

Grouping parameters allows you to leave infrequently changed data on the GPU, while only the dynamic data is re-uploaded. A key UBO feature is that, unlike glUniform, they allow parameter sharing across shader programs. I am not going to write a full usage guide on UBOs – one can be found here.
There are different ways to use Uniform Buffer Objects. The recommended way depends on whether the data you are using is fairly static or dynamic. Below are examples of both, preceded by a short sketch of the one-time UBO setup. Note – you can mix the methods as best fits your use case.
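Both variants assume the UBOs have been created and the shader blocks wired to the binding slots up front. A minimal sketch of that one-time setup, with an illustrative material struct that would have to match the shader's std140 block layout, could look like this:

// CPU-side mirror of the shader's "Material" uniform block (illustrative layout).
struct MaterialParamsStd140
{
    float diffuseColor[4];
    float specularColor[4];
};

// Allocate room for all materials; the contents are filled later with glBufferSubData.
GLuint uboMaterials = 0;
glGenBuffers(1, &uboMaterials);
glBindBuffer(GL_UNIFORM_BUFFER, uboMaterials);
glBufferData(GL_UNIFORM_BUFFER, numMaterials * sizeof(MaterialParamsStd140), nullptr, GL_DYNAMIC_DRAW);
glBindBuffer(GL_UNIFORM_BUFFER, 0);

// Connect the shader's uniform block to the binding slot used in the render loops below.
GLuint blockIndex = glGetUniformBlockIndex(program, "Material");
glUniformBlockBinding(program, blockIndex, UBO_MAT_SLOT);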

Static buffer data:
If the data changes infrequently, upload parameters for all the sub-objects in one go into a large UBO. Then target the correct parameters by using the glBindBufferRange calls as shown below:

#define UBO_GLOBAL_SLOT 0
#define UBO_TRANS_SLOT 1
#define UBO_MAT_SLOT 2

// Update combined uniform buffers for all objects.
UpdateUniformBuffers();

// Bind global uniform buffers.
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_GLOBAL_SLOT, uboGlobal);

foreach(object in scene)
{
    ...
    // Bind object uniform buffer.
    glBindBufferRange(GL_UNIFORM_BUFFER, UBO_TRANS_SLOT, uboTransforms, object.transformOffset, matrixSize);

    foreach(batch in object.materialBatches)
    {
        // Bind material uniform buffer.
        glBindBufferRange(GL_UNIFORM_BUFFER, UBO_MAT_SLOT, uboMaterials, batch.materialOffset, mtlSize);

        if (batch.material.program != currentProgram)
        {
            // Apply the active program to the pipeline.
            glUseProgram(batch.material.program);
        }

        // Draw.
    }
}

Dynamic buffer data:
If the data changes frequently, upload the parameters into a small UBO for each material batch. The example below takes advantage of the new direct state access (DSA) functions introduced in OpenGL 4.5, and shows how such a render loop could look.

#define UBO_GLOBAL_SLOT 0
#define UBO_TRANS_SLOT 1
#define UBO_MAT_SLOT 2

// Bind buffers to their respective slots.
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_GLOBAL_SLOT, uboGlobal);
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_TRANS_SLOT, uboTransforms);
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_MAT_SLOT, uboMaterials);

foreach(object in scene)
{
    ...
    // Upload object transform.
    glNamedBufferSubData(uboTransforms, 0, matrixSize, object.transform);

    foreach(batch in object.materialBatches)
    {
        // Upload batch material.
        glNamedBufferSubData(uboMaterials, 0, mtlSize, batch.material);

        if (batch.material.program != currentProgram)
        {
            // Apply the active program to the pipeline.
            glUseProgram(batch.material.program);
        }

        // Draw.
    }
}


Note – Upload of scattered data changes to static buffer using compute + SSBO
Nvidia mentioned a cute way to scatter data into a buffer. Normally you need to upload using a series of smaller glBufferSubData calls if the changes are non-contiguous in memory; alternatively, you could re-upload the entire buffer from scratch. Both could degrade performance significantly. They suggest placing all changes in an SSBO and performing the scatter-write using a compute shader. A shader storage buffer object (SSBO) is just a user-defined OpenGL buffer object that can be read/written from compute shaders. I have yet to try this technique out, so I cannot comment on whether the performance makes it feasible, but I really like the idea.
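Sketched out, the idea could look roughly like this (my own guess at an implementation, not Nvidia's code): the queued changes live in an SSBO and a tiny compute shader writes each entry to its destination index in the target buffer.

// GLSL 4.3 compute shader performing the scatter-write.
const char* pcScatterSrc = R"(
    #version 430
    layout(local_size_x = 64) in;
    struct Update { uint dstIndex; vec4 value; };
    layout(std430, binding = 0) readonly buffer Updates { Update updates[]; };
    layout(std430, binding = 1) writeonly buffer Target { vec4 target[]; };
    uniform uint numUpdates;
    void main()
    {
        uint i = gl_GlobalInvocationID.x;
        if (i >= numUpdates) return;
        target[updates[i].dstIndex] = updates[i].value;
    }
)";

// ...compile pcScatterSrc into scatterProgram as usual, then per frame:
glUseProgram(scatterProgram);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssboUpdates);   // the queued changes
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, uboMaterials);  // the large buffer being patched
glUniform1ui(glGetUniformLocation(scatterProgram, "numUpdates"), numUpdates);
glDispatchCompute((numUpdates + 63) / 64, 1, 1);
glMemoryBarrier(GL_UNIFORM_BUFFER_BARRIER_BIT);               // make the writes visible to later UBO reads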


Improvement 4 – Shader-based material / transform lookup
Improvement 3 introduces the notion of using UBOs to improve the uniform communication performance. Unfortunately, there are still many glBindBufferRange operations. It is possible to remove those binds by binding the entire buffer and then having the shader index into the information. Communication of the index is done through a generic vertex attribute, as shown below.

#define UBO_GLOBAL_SLOT 0
#define UBO_TRANS_SLOT 1
#define UBO_MAT_SLOT 2

// Update combined uniform buffers for all objects.
UpdateUniformBuffers();

// Bind buffers to their respective slots.
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_GLOBAL_SLOT, uboGlobal);
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_TRANS_SLOT, uboTransforms);
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_MAT_SLOT, uboMaterials);

foreach(object in scene)
{
    ...

    foreach(batch in object.materialBatches)
    {
        if (batch.material.program != currentProgram)
        {
            // Apply the active program to the pipeline.
            glUseProgram(batch.material.program);
        }

        // Set buffer indices - shader program specific location!
        glVertexAttribI2i(indexAttribLoc, object.transformLoc, batch.materialLoc);

        // Draw.
    }
}

From inside the shader, you use a generic vertex attribute just like any other vertex attribute; a sketch of what the shader side could look like follows below.
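This is my own illustration, not code from the presentations; the attribute location and the transform array size are arbitrary choices.

// Vertex shader reading the two indices from the generic attribute set by glVertexAttribI2i.
const char* pcVertexSrc = R"(
    #version 440
    layout(location = 0) in vec3 position;
    layout(location = 7) in ivec2 drawIndices;    // x = transform index, y = material index

    layout(std140, binding = 0) uniform Globals    { mat4 viewProjMatrix; };
    layout(std140, binding = 1) uniform Transforms { mat4 worldMatrices[1024]; };

    flat out int materialIndex;                   // handed on to the fragment shader

    void main()
    {
        materialIndex = drawIndices.y;
        gl_Position   = viewProjMatrix * worldMatrices[drawIndices.x] * vec4(position, 1.0);
    }
)";

// C++ side: the location passed to glVertexAttribI2i in the render loop above.
GLint indexAttribLoc = 7; // matches layout(location = 7) in the shader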


Improvement 5 – Bindless resources
Changing texture state has until recently been a major headache when it comes to batching efficiently. Sure, it is possible to store several textures inside an array texture and then index into the different layers, but there are several limitations and it is generally a pain to work with. OpenGL requires the application to bind textures to texture slots prior to dispatching the draw calls. Textures are merely CPU-side handles, like all other OpenGL objects, but the new extension ARB_bindless_texture allows the application to retrieve a unique 64-bit GPU handle that the shader can use to look up texture data without binding first. Unlike the CPU-side handles, these new GPU handles can be stored in uniform buffers. GPU handles can be set like any other uniform using glUniformHandleui64, but it is strongly recommended to use UBOs (or similar – see Improvement 3). It is the application's responsibility to make sure textures are resident before dispatching the draw call; a small sketch follows below. More information regarding this can be found in the extension spec here.
Nvidia has an extension that allows bindless buffers as well – More information can be found here. This is something we will have a look at when looking at the new Nvidia commandlist extension in the next article.
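For illustration, retrieving and using a bindless texture handle could look like this minimal sketch; it is my own example, and the field and variable names are made up.

// One-time per texture: get a 64-bit GPU handle and make the texture resident.
GLuint64 handle = glGetTextureHandleARB(texture);
glMakeTextureHandleResidentARB(handle);              // must be resident before any draw that samples it

// Store the handle next to the other material parameters in the material UBO.
// In GLSL (with #extension GL_ARB_bindless_texture : require) the block member
// can be declared as a sampler2D and sampled without any glBindTexture call.
materialData.diffuseTextureHandle = handle;          // illustrative field name
glNamedBufferSubData(uboMaterials, materialOffset, sizeof(materialData), &materialData);

// When no in-flight draw needs the texture anymore:
glMakeTextureHandleNonResidentARB(handle);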


Improvement 6 – The indirect draw commands
A new addition to the numerous ways to draw in OpenGL are the indirect draw commands. Rather than submitting each draw call from the CPU, it is now possible to store all the draw information inside a buffer, which the GPU then loops through when drawing. The buffer contains an array of predefined structures, which in the case of glMultiDrawElementsIndirect looks like this:

typedef struct
{
    uint count;
    uint instanceCount;
    uint firstIndex;
    uint baseVertex;
    uint baseInstance;
} DrawElementsIndirectCommand;

Using an indirect draw command works much like the glMultiDrawElements described in Improvement 2. An added benefit is that you can create your GPU worklist directly on the GPU; you can, for example, use this to cull your scene from a compute shader rather than on the CPU.

There is a special bind target for indirect buffers called GL_DRAW_INDIRECT_BUFFER; the driver reads the draw data from the buffer bound there. It is illegal to submit an indirect draw call using client memory.
Using indirect draws you no longer need a separate draw command for each sub-object in a material batch as described in Improvement 2. To draw efficiently you only have to fill a buffer with the structs that describe the ranges of the objects you wish to draw using the active shader. This can be a huge draw-command improvement. I have yet to test whether you get improved performance by growing the draw ranges through physically rearranging the vertex buffers.
Which material parameters and matrix to use when drawing each of the sub-objects can be handled much like in Improvement 4, through a matrix / material array index. However, the method is a bit different, as we are no longer able to set a generic vertex attribute between each drawn sub-object. The indirect struct contains a lot of information, not all of which we need: the baseInstance member, for example. By using it, we can communicate both the material and the matrix index, so the shader program can get the data it needs. How you choose to split the bits comes down to how much you need to draw; a sketch of filling the indirect buffer this way is shown next.
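The sketch below fills the indirect buffer with one command per sub-object range, packing the matrix index into the low 16 bits of baseInstance and the material index into the high 16 bits. The 16/16 split and the container names are illustrative choices on my part, not from the presentations.

std::vector<DrawElementsIndirectCommand> commands;
for (const auto& range : batch.ranges)
{
    DrawElementsIndirectCommand cmd = {};
    cmd.count         = range.count;          // number of indices in this sub-object range
    cmd.instanceCount = 1;                    // drawn once
    cmd.firstIndex    = range.firstIndex;     // offset into the collapsed index buffer
    cmd.baseVertex    = range.baseVertex;     // offset into the collapsed vertex buffer
    cmd.baseInstance  = (range.materialIndex << 16) | range.transformIndex; // packed indices
    commands.push_back(cmd);
}

// Upload once (or generate directly on the GPU, e.g. from a culling compute shader).
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, scene.indirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER,
             commands.size() * sizeof(DrawElementsIndirectCommand),
             commands.data(), GL_STATIC_DRAW);

// The vertex shader can read the packed value back via gl_BaseInstanceARB
// (ARB_shader_draw_parameters) or via an instanced vertex attribute sourced
// from baseInstance, and split it into the two indices again.

With the buffer filled, the render loop itself looks much like before: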

#define UBO_GLOBAL_SLOT 0
#define UBO_TRANS_SLOT 1
#define UBO_MAT_SLOT 2

// Update combined uniform buffers for all objects.
UpdateUniformBuffers();

// Bind buffers to their respective slots.
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_GLOBAL_SLOT, uboGlobal);
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_TRANS_SLOT, uboTransforms);
glBindBufferBase(GL_UNIFORM_BUFFER, UBO_MAT_SLOT, uboMaterials);

// Bind indirect buffer for entire scene.
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, scene.indirectBuffer);

foreach(object in scene)
{
    ...
    
    foreach(batch in object.materialBatches)
    {
        if (batch.material.program != currentProgram)
        {
            // Apply the active program to the pipeline.
            glUseProgram(batch.material.program);
        }

        // Draw batch.
        glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, object->indirectOffset, object->numIndirects, 0);
    }
}

Unfortunately, it is not yet possible to change state (renderstate and shaders) using the indirect draw commands. This is something I am going to look at in the next article on the Nvidia CommandList extension.


This post turned out to be bigger than I had first anticipated, but efficient drawing is tricky. If you made it this far – good for you! I hope to get time to write the follow-up article as soon as real life allows.

Classifying birds from above (4 Dec 2014)
https://viscomp.alexandra.dk/?p=3763

We are about to start the second stage of a collaboration with the Institute of Bioscience, aiming at classifying flying and sitting birds in aerial survey images.

In the first stage we focused on prototyping segmentation and classification methods; in the second stage we will improve the methods and help integrate the classification into their existing pipeline.

Papers published in the 2014 IEEE International Ultrasonics Symposium proceedings (27 Nov 2014)
https://viscomp.alexandra.dk/?p=3631

In our Advanced Technology Foundation project “FutureSonic”, we recently presented two papers at the 2014 IEEE International Ultrasonics Symposium together with our research partners at the Technical University of Denmark.

multicore_beamforming
The first paper [1] presents how ultrasound images can be computed efficiently on GPUs and on multicore CPUs that support Single Instruction Multiple Data (SIMD) extensions. We were able to accelerate a reference implementation in C from around 700 ms/frame to 5.4 ms/frame using the same multicore CPU. The speedup was achieved primarily by optimizing the memory access patterns and by utilizing AVX instructions. On a high-end GPU the fastest computation time was less than 0.5 ms/frame.

sasb_handheld_benchmark

The results obtained above were utilized in the second paper [2], where the GPU implementation was ported to mobile devices. We showed that modern mobile GPUs provide enough computing power to produce ultrasound images in real time. Furthermore, we showed that the WiFi throughput is sufficient for real-time reception of raw data from a wireless ultrasound transducer.

References
[1] Synthetic Aperture Sequential Beamforming implemented on multi-core platforms,
IEEE International Ultrasonics Symposium (IUS), pp. 2181–2184 (2014)

[2] Implementation of synthetic aperture imaging on a hand-held device,
IEEE International Ultrasonics Symposium (IUS), pp. 2177–2180 (2014)

Communicating through visual effects (26 Nov 2014)
https://viscomp.alexandra.dk/?p=3671

Communication is hard – especially communicating an artistic vision for visual effects. The Shareplay Foundation sponsored a project investigating new software-based approaches to aiding artists in such communication – and this post explains our findings.

Together with our project partners Ja Film, Sunday Studio, Redeye Film and WilFilm, we have come up with ways to improve the communication of animated volumetric effects like smoke, fumes, explosions, fluids, etc. The production of shots for film, commercials and TV goes through several phases – each of these increases the visual quality while the artistic choices get locked down. Very early, the camera movement, animation timing and effect timing are locked down. This typically involves very crude assets which are later replaced by more detailed assets. This is fairly straightforward for normal “surface assets” like a typical character or scene prop. It is much more difficult when it involves simulation-based volumetric effects like smoke, fumes, explosions or fire. These types of effects are very expensive to produce due to computationally intensive simulations and long rendering times – as well as the high level of artist experience required to make them look good. Because of this, the early pre-visualization often looks very crude – and poorly communicates the artistic vision of the creative artist. Take for example an explosion – to save time, the general timing of the explosion is often done with a rapidly expanding phong-shaded sphere. This is great for communicating the animation timing, but says nothing about the visual aesthetics of the explosion (rolling smoke, balls of fire, pressure wave, contrast between fire and smoke, etc.).

We have identified two themes for our experiments:

  • Pushing the visual decisions much further down the shot production pipeline.
  • Procedural animation of volumetric effects.

We go into each of these themes under the next two headings.

Pushing the visual decisions much further down the shot production pipeline.

Last year, we did a project on how fast procedural volumetric effects could empower the artist and re-envision the way artists work with volumetric effects. This project was also funded by the Shareplay Foundation with participation from our Computer Graphics Lab and Sunday Studio. More information on this project here. In the current project on visual communication we chose to build on the experiments and knowledge from that project.

Volumetric effects generated directly within Adobe AfterEffects

One of the major issues when it comes to using volumetric effects in production is the workflow. The effects are generally made in isolation, pre-rendered and then later integrated into the final shot. We wanted to keep the flexibility all the way into the compositing programs, without locking down the special effect. Our post-processing program of choice was Adobe AfterEffects. We realized a prototype implementation in AfterEffects, and found that it makes perfect sense to be able to make procedural volumetric effects as late as post-processing. A perfect example is a cloud backdrop, which would usually be composited from photographs of clouds – whereas our approach allows for custom generation of this basic content within the compositing program itself. Some challenges also arose that could be a source of future research. First of all, it turns out to be very difficult to impose a 3D workflow on a predominantly 2D application. All our previous experiments had been done in Autodesk Maya, where all tools are meant to work in 3D – and this was naturally not the case for AfterEffects. This difference is especially obvious in the handling of the camera and the construction of base geometry for the special effect. At a later point we would like to investigate efficient user interfaces for handling 3D navigation in a post-processing program – and for efficiently generating base geometry within AfterEffects.

Procedural animation of volumetric effects.

Special effects such as smoke or fluids often evolve over time – splashing water, drifting clouds or expanding explosions – and timing plays a major role in the communication. In a traditional workflow the timing is decided by the input parameterization of the underlying simulations. Experimenting with those parameters is a very time-consuming process marred by trial and error. We wanted to build a procedural animation system for quick-and-dirty volumetric animations without any wait. Specifically, we wanted a system where the artist is able to:

  • Key-frame the shape of effects with exact timing.
  • Control the shape of the effect through geometry
  • Change every key frame at any time without waiting for a simulation.
  • Create the rolling motions of smoke.

All of these requirements are designed to make it easy for the artists to get the results they intend – quickly.

We needed a way to calculate the intermediate frames in between the artist-defined key frames. Our first approach was to do a full flow-based registration between shapes. This had to be done each time the geometries were updated – but only once for each pair of key frames. Unfortunately, this didn't turn out as well as we had hoped; the calculations needed for a good-quality registration were too time consuming. The vector field between two shapes, computed using the Demons framework for flow registration, can be seen below.

Instead, we came up with the idea of converting the geometry from each keyframe into a signed distance field and doing a standard linear interpolation (or morphing) between those distance fields. In order to have control of the behavior, we defined two positions in each keyframe – an “entry” and an “exit”. The entry and exit of two consecutive keyframes overlap in the interpolation itself – but moved by the true offset in world space. A small sketch of the per-sample interpolation follows below, and the results of the interpolation scheme for volumetric effects can be seen in the animation further down.
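Here is a small sketch of the per-sample interpolation (my own illustration of the idea, not production code). Vec3, SdfGrid and SampleSdf are assumed names; SampleSdf() trilinearly samples a keyframe's signed distance grid, and the two offsets shift the fields so the entry/exit positions line up during the blend.

float MorphedDistance(const Vec3& worldPos, float t,
                      const SdfGrid& keyframeA, const Vec3& offsetA,
                      const SdfGrid& keyframeB, const Vec3& offsetB)
{
  // Sample each keyframe's field at the position shifted by its alignment offset.
  float distA = SampleSdf(keyframeA, worldPos - offsetA);
  float distB = SampleSdf(keyframeB, worldPos - offsetB);

  // Standard linear interpolation between the two signed distances; the morphed
  // shape at time t is wherever the result is <= 0.
  return (1.0f - t) * distA + t * distB;
}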

Next we approached the problem of rendering believable animated rolling smoke without any underlying simulation. The basic idea is to distribute points inside a volume and, for each point, splat a volumetric particle into a common field. Instead of using fields, we decided to do this implicitly, using a variant of world-space Worley noise to distribute the points. For each ray step we evaluate the distance to each point and use that to evaluate the particle splat data. The rolling motion of the particles is achieved by rotating the particle noise splat lookup according to a vector field from the interpolation routine – in our test case below it was simply rotating away from the middle axis. This gives a believable rolling motion. It is also crucial to note that the particles are stationary in world space, and the notion of movement is created by moving the particle noise field in the opposite direction of the motion. The effect can be seen in the following video:

This scheme does indeed give a believable smoke motion which allows the artists to control the animation precisely as intended. Unfortunately, it doesn't perform as well as we had hoped: the implicit evaluation of the closest particles, along with the evaluation of the particle noise, made it very slow. Future work would be pre-calculating the particle noise to alleviate the slow runtime performance.

Conclusion

The project ended in November 2014 and has pointed out some important trends in the search for quality improvements with smaller budgets within special effects. Through experiments and prototypes we have found that current commercial software is viable as a framework for special effects design late in the production pipeline – and that there is reason to believe we can replace at least part of the simulation-based tools with purely procedural tools for animating visual effects. Furthermore, we have established a good model for project collaboration between creative animation companies and R&D institutes such as the Alexandra Institute.


Xcelgo case – Custom real-time rendering optimization (25 Nov 2014)
https://viscomp.alexandra.dk/?p=3635

In this project we helped Xcelgo build a brand new custom DirectX 11 renderer as a replacement for their existing fixed-function DirectX 9.0 renderer.

Xcelgo provides virtual automation software for 3D modelling along the life cycle of automated material handling systems – like airport baggage handling or large warehouse storage systems. The purpose of their product is to eliminate the risks involved in building these large and very expensive systems by allowing simulation and modelling of the system up front.

Experior, the 3D modelling system by Xcelgo, is built around a fixed-function DirectX 9.0 pipeline programmed in C# through wrapper code. DirectX 9.0 is characterized by a lack of scalability because of the driver overhead imposed by the dated rendering paradigm. The 3D simulation is built from a large number of user-generated primitives which are able to move freely around the scene. Each of these is rendered individually, which forces the GPU and CPU to run in lockstep.

The fixed-function rendering pipeline supports only very limited lighting techniques, hence limiting the visual appeal of the presented scenes. Even in engineering-type visualizations, visual quality attracts attention and opens the door to an expanded customer base.

Xcelgo wanted to prepare for future scenarios with larger models and a more easily maintained rendering framework – and decided to update the rendering pipeline to a modern shader-based DirectX 11 pipeline. In close collaboration we have designed and implemented a completely new rendering pipeline.


The new pipeline supports a lot of features which will help Xcelgo further push the limits of virtual automation:

* DirectX 11 rendering pipeline written from the bottom up, based on Xcelgo's domain knowledge about their customers' wishes.
* Intelligent optimization of scene rendering, so no expert rendering knowledge is needed when designing the scene geometry.
* Threaded rendering, freeing the rest of the workstation for simulation.
* Massive increase in the number of dynamic objects the system can handle: from hundreds of primitives to tens of thousands of skinned and textured models.
* Support for instanced rendering of skinned robots.
* Support for fully detailed CAD line renderings in full resolution to better guide modelling engineers when building systems.
* Modern cascaded shadow mapping solution which fully envelops the scene in crisp shadows.
* Rasterization-based, pixel-perfect picking of objects in the scene, vastly improving runtime performance when selecting objects.
* Modern surface shading, greatly improving the visual aesthetics of the scene.

The project is now completed, and Xcelgo is hard at work finishing the integration of the new renderer, which should be complete in time for Experior 6.0.

Denmark’s new elevation model visualized in WebGL (18 Nov 2014)
https://viscomp.alexandra.dk/?p=3636

Recently, the Danish Geodata Agency released a new high-resolution LiDAR pointcloud dataset of parts of Denmark.
We have developed a real-time terrain visualization that runs entirely in a web-browser using WebGL. The terrain model was generated from the pointcloud (17.6 billion points) to a raster map with 40 cm lateral and longitudinal resolution and 1 cm height resolution. In the final visualization, we have added overlays from satellite photos and from OpenStreetMap as shown in the screenshots.

dhm_hindsgavl  dhm_kalundborg

A demo video is available here.

Unfortunately, the interaction is not perfectly smooth at the moment when the camera is moved around and the map updates. We expect to address this issue in the near future.
