
Intel Labs

www.intel.com/software/siggraph

Embree: Photo-Realistic Ray Tracing Kernels

Manfred Ernst Intel Labs

Intel Corporation August 10, 2011


Monte Carlo Ray Tracing

[Figure: light transport diagram; labels: Light, Pixel]


Model courtesy of Martin Lubich www.loramel.net


Progressive Monte Carlo Ray Tracing

• Computes preview images at interactive frame rates

• Progressively refines the quality until convergence

[Figure: the same 1000 x 1200 pixel image after 72 milliseconds, 1 second, and 1 minute of rendering on four Intel® Xeon® Processor E7-4860. Model courtesy of Martin Lubich, www.loramel.net]
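Progressive refinement can be sketched as a running average per pixel: each pass adds one Monte Carlo sample, and the displayed value is the mean of all samples so far. This is a minimal C++ sketch under that assumption, not Embree's code; `ProgressiveImage`, `addPass`, and the `sample` callback are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical accumulation buffer for progressive Monte Carlo rendering.
// Each pass adds one radiance sample per pixel; the preview is the mean.
class ProgressiveImage {
public:
    explicit ProgressiveImage(std::size_t pixelCount)
        : sum_(pixelCount, 0.0), passes_(0) {}

    // Run one pass. 'sample(pixel, pass)' stands in for tracing one random
    // light path through the given pixel and returning its radiance.
    void addPass(const std::function<double(std::size_t, std::size_t)>& sample) {
        for (std::size_t p = 0; p < sum_.size(); ++p)
            sum_[p] += sample(p, passes_);
        ++passes_;
    }

    // Current estimate for a pixel: the average of all samples so far.
    double estimate(std::size_t pixel) const {
        assert(passes_ > 0 && "call addPass at least once");
        return sum_[pixel] / static_cast<double>(passes_);
    }

    std::size_t passes() const { return passes_; }

private:
    std::vector<double> sum_;   // per-pixel sum of samples
    std::size_t passes_;        // number of completed passes
};
```

A viewer would call `addPass` in a loop and redraw from `estimate` after every pass, so the first image appears after one sample per pixel and the noise shrinks as passes accumulate.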


Two Kinds of Ray Distributions


Incoherent Rays (typical for Monte Carlo)

Coherent Rays


Implementing a Fast Ray Tracer is Difficult

• Requires deep knowledge about hardware architecture

• Parallelization is easy; efficient vectorization is hard

• Many developers do not want to make this effort

• We decided to do that for you!


What is Embree?

• A collection of high-performance ray tracing kernels, designed for Monte Carlo ray tracing on the latest Intel® CPUs

• An example photo-realistic rendering engine

• Embree is not a complete rendering solution for end users

Professional Graphics Application: CAD, DCC, visualization, movie production, …

Rendering Engine: distributed ray tracing, path tracing, photon mapping, …

Ray Tracing Kernel: fast acceleration structure build and traversal

Embree


Multiple Usage Scenarios

Usage Scenarios

• Integrate Embree ray tracing kernels into existing renderer

• Improve existing renderer with concepts and ideas from Embree

• Use Embree as a benchmark

• Use Embree as a starting point to implement a new renderer

• Jump start rendering research projects

Licensing

• Published as open source (Apache license) on the Intel Software Network (ISN) web site: http://software.intel.com/en-us/articles/embree-photo-realistic-ray-tracing-kernels/


Architecture of a Monte Carlo Ray Tracer

[Figure: architecture diagram with the components Renderer, Integrator, Sampler, Camera, Image, Material, Light, and Acceleration Structure]


BVH Acceleration Structure


Solution Space for Vectorized Ray Tracing

[Figure: traversal methods arranged along two axes, Single Ray vs. Multi Ray and Single Box vs. Multi Box: Scalar Traversal, Single Ray SIMD Traversal, Packet Traversal, Independent Ray Traversal]


We have tried many algorithms …

• Binary BVH with scalar traversal

• Binary BVH with 4-wide and 8-wide packet traversal

• Binary BVH traversing 4 independent rays


• Binary BVH with single ray 4-wide SIMD traversal

• 4-wide BVH with single ray 4-wide SIMD traversal


• 4-wide BVH traversing two independent rays

• 4-wide BVH with stream traversal


• Kd-tree with scalar traversal


... and put the fastest kernels into Embree

• Binary BVH with scalar traversal

• Binary BVH with 4-wide and 8-wide packet traversal



• Binary BVH with single ray 4-wide SIMD traversal



• 4-wide BVH traversing two independent rays

• 4-wide BVH with stream traversal


• Kd-tree with scalar traversal


Acceleration Structure Builders

Object Split Builder

• Top-down builder with SAH binning

• Three stage parallelization

Spatial Split Builder

• Tests spatial splits in the center of each dimension

• Build is about 5x slower, but render performance can be 2x better
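The SAH binning step of the object split builder can be sketched as follows. This is a minimal single-threaded illustration, assuming 16 bins, binning by primitive centroid, and a cost of (surface area × primitive count) per side. The names `Box`, `Split`, `makeBox`, and `findBestSplit` are hypothetical and not taken from Embree; the three stage parallelization and the spatial split test are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Box {
    float lo[3] = { 1e30f,  1e30f,  1e30f};   // starts empty
    float hi[3] = {-1e30f, -1e30f, -1e30f};
    void extend(const Box& b) {
        for (int d = 0; d < 3; ++d) {
            lo[d] = std::min(lo[d], b.lo[d]);
            hi[d] = std::max(hi[d], b.hi[d]);
        }
    }
    // Half the surface area; the constant factor does not change the argmin.
    float halfArea() const {
        float x = hi[0] - lo[0], y = hi[1] - lo[1], z = hi[2] - lo[2];
        return x * y + y * z + z * x;
    }
    float center(int d) const { return 0.5f * (lo[d] + hi[d]); }
};

inline Box makeBox(float x0, float y0, float z0, float x1, float y1, float z1) {
    Box b;
    b.lo[0] = x0; b.lo[1] = y0; b.lo[2] = z0;
    b.hi[0] = x1; b.hi[1] = y1; b.hi[2] = z1;
    return b;
}

struct Split {
    int axis = -1;          // -1 means no valid split was found
    float position = 0.0f;  // coordinate of the split plane
    float cost = std::numeric_limits<float>::infinity();
};

inline Split findBestSplit(const std::vector<Box>& prims, int numBins = 16) {
    assert(numBins >= 2);
    Split best;
    Box centroids;  // bounds of all primitive centroids
    for (const Box& p : prims)
        for (int d = 0; d < 3; ++d) {
            centroids.lo[d] = std::min(centroids.lo[d], p.center(d));
            centroids.hi[d] = std::max(centroids.hi[d], p.center(d));
        }
    for (int axis = 0; axis < 3; ++axis) {
        const float lo = centroids.lo[axis];
        const float extent = centroids.hi[axis] - lo;
        if (!(extent > 0.0f)) continue;  // centroids coincide on this axis

        // Pass 1: drop every primitive into a bin chosen by its centroid.
        std::vector<Box> binBox(numBins);
        std::vector<int> binCount(numBins, 0);
        for (const Box& p : prims) {
            int b = static_cast<int>(numBins * (p.center(axis) - lo) / extent);
            b = std::min(b, numBins - 1);
            binBox[b].extend(p);
            ++binCount[b];
        }
        // Pass 2: suffix sweep; entry i covers bins [i, numBins).
        std::vector<Box> rightBox(numBins);
        std::vector<int> rightCount(numBins, 0);
        Box acc;
        int n = 0;
        for (int i = numBins - 1; i > 0; --i) {
            acc.extend(binBox[i]);
            n += binCount[i];
            rightBox[i] = acc;
            rightCount[i] = n;
        }
        // Pass 3: prefix sweep; evaluate the SAH cost at each bin boundary.
        Box left;
        int leftCount = 0;
        for (int i = 1; i < numBins; ++i) {
            left.extend(binBox[i - 1]);
            leftCount += binCount[i - 1];
            if (leftCount == 0 || rightCount[i] == 0) continue;
            const float cost = left.halfArea() * leftCount +
                               rightBox[i].halfArea() * rightCount[i];
            if (cost < best.cost) {
                best.axis = axis;
                best.position = lo + extent * i / numBins;
                best.cost = cost;
            }
        }
    }
    return best;
}
```

A top-down builder would call `findBestSplit` on the primitives of a node, partition them by centroid against `position` on `axis`, and recurse on both halves until a leaf criterion is met.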


BVH2 Memory Layout

[Figure: BVH2 Layout compared with Traditional BVH Layout]

• Store pairs of boxes

• For each dimension: store min and max values of both boxes next to each other
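The two bullets above can be illustrated with a node type that stores one row of four floats per dimension. This is a sketch only: the lane order within a row, the `child` field, and the names `Aabb`, `Bvh2Node`, and `makeNode` are assumptions for illustration, not Embree's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Traditional layout: one box stored as min corner followed by max corner.
struct Aabb {
    float min[3];
    float max[3];
};

// Pair layout (assumed lane order). For every dimension d:
//   planes[d] = { left.min[d], right.min[d], left.max[d], right.max[d] }
// so the four planes of one dimension fit one 4-wide SIMD register.
struct alignas(16) Bvh2Node {
    float planes[3][4];
    std::int32_t child[2];   // hypothetical references to the two children
};

// 48 bytes of planes + 8 bytes of children, padded to 64 by the alignment.
static_assert(sizeof(Bvh2Node) == 64, "unexpected node size");

inline Bvh2Node makeNode(const Aabb& left, const Aabb& right,
                         std::int32_t leftChild, std::int32_t rightChild) {
    Bvh2Node node;
    for (int d = 0; d < 3; ++d) {
        assert(left.min[d] <= left.max[d] && right.min[d] <= right.max[d]);
        node.planes[d][0] = left.min[d];
        node.planes[d][1] = right.min[d];
        node.planes[d][2] = left.max[d];
        node.planes[d][3] = right.max[d];
    }
    node.child[0] = leftChild;
    node.child[1] = rightChild;
    return node;
}
```

Compared with two separate `Aabb` records, this arrangement needs no transposition at traversal time: each row can be loaded directly and processed for both boxes at once.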


BVH2 Traversal

For each dimension:

• “Sort” planes along ray direction with PSHUFB (1 cycle)

• Compute intersection with near and far plane of 2 boxes in SIMD

• Clip near and far parameter values using min(a,b) = -max(-a,-b)

[Figure: near and far intersection parameters of the left and right boxes: nearL, farL, nearR, farR]
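The three traversal steps above can be emulated in portable C++ with a 4-element array standing in for one SIMD register. This is a sketch under assumptions, not Embree's kernel: the lane order `{minL, minR, maxL, maxR}` and the names `Ray`, `Bvh2Node`, `Hits`, and `intersectPair` are hypothetical, and the index permutation plays the role that a single PSHUFB would play in an SSSE3 implementation. The far lanes are kept negated so that one 4-wide max clips near and far values together, which is where min(a,b) = -max(-a,-b) comes in.

```cpp
#include <algorithm>
#include <cassert>

struct Ray {
    float org[3];
    float dir[3];      // nonzero components assumed in this sketch
    float tMin, tMax;
};

// Assumed lane order per dimension: { minL, minR, maxL, maxR }.
struct Bvh2Node {
    float planes[3][4];
};

struct Hits {
    bool left, right;      // which boxes the ray hits
    float tLeft, tRight;   // entry distances (valid only where hit)
};

inline Hits intersectPair(const Bvh2Node& node, const Ray& ray) {
    // Lanes 0,1: entry distances of L and R.
    // Lanes 2,3: NEGATED exit distances of L and R.
    float t[4] = { ray.tMin, ray.tMin, -ray.tMax, -ray.tMax };

    static const int identity[4] = {0, 1, 2, 3};
    static const int swapped[4]  = {2, 3, 0, 1};

    for (int d = 0; d < 3; ++d) {
        assert(ray.dir[d] != 0.0f);
        const float rdir = 1.0f / ray.dir[d];
        // "Sort" the planes along the ray: for a negative direction the
        // max planes are the near ones, so swap the two lane pairs.
        const int* perm = ray.dir[d] >= 0.0f ? identity : swapped;
        for (int lane = 0; lane < 4; ++lane) {
            float dist = (node.planes[d][perm[lane]] - ray.org[d]) * rdir;
            if (lane >= 2) dist = -dist;         // negate the far lanes
            t[lane] = std::max(t[lane], dist);   // same operation in all lanes
        }
    }

    Hits h;
    h.tLeft  = t[0];
    h.tRight = t[1];
    h.left   = t[0] <= -t[2];   // entry <= exit
    h.right  = t[1] <= -t[3];
    return h;
}
```

When both boxes are hit, comparing `tLeft` and `tRight` tells the traversal loop which child to descend into first and which one to push on the stack.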


BVH4 Memory Layout

[Figure: BVH4 Layout compared with Traditional BVH Layout]


BVH4 Traversal

For each dimension:

• Intersect ray with near plane of each box in SIMD

• Intersect ray with far plane of each box in SIMD

• Clip the near and the far parameters

[Figure: near intersection parameters near1, near2, near3, near4 of the four child boxes]
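The per-dimension steps above can be sketched in portable C++, with the inner loop over four children standing in for one 4-wide SIMD operation. The structure-of-arrays node and the names `Ray`, `Bvh4Node`, and `intersect4` are assumptions for illustration; the source does not specify the exact BVH4 layout.

```cpp
#include <algorithm>
#include <cassert>

struct Ray {
    float org[3];
    float dir[3];      // nonzero components assumed in this sketch
    float tMin, tMax;
};

// Assumed structure-of-arrays layout: lo[d][i] and hi[d][i] are the lower
// and upper bounds of child i in dimension d.
struct Bvh4Node {
    float lo[3][4];
    float hi[3][4];
};

// Returns a 4-bit mask; bit i is set when the ray hits child i.
// tNear[i] receives the entry distance of child i.
inline unsigned intersect4(const Bvh4Node& node, const Ray& ray, float tNear[4]) {
    float tFar[4];
    for (int i = 0; i < 4; ++i) {
        tNear[i] = ray.tMin;
        tFar[i]  = ray.tMax;
    }
    for (int d = 0; d < 3; ++d) {
        assert(ray.dir[d] != 0.0f);
        const float rdir = 1.0f / ray.dir[d];
        // The sign of the direction selects which plane row is "near".
        const float* nearPlane = ray.dir[d] >= 0.0f ? node.lo[d] : node.hi[d];
        const float* farPlane  = ray.dir[d] >= 0.0f ? node.hi[d] : node.lo[d];
        for (int i = 0; i < 4; ++i) {   // one SIMD operation per line
            tNear[i] = std::max(tNear[i], (nearPlane[i] - ray.org[d]) * rdir);
            tFar[i]  = std::min(tFar[i],  (farPlane[i]  - ray.org[d]) * rdir);
        }
    }
    unsigned mask = 0;
    for (int i = 0; i < 4; ++i)
        if (tNear[i] <= tFar[i]) mask |= 1u << i;
    return mask;
}
```

The returned mask and the `tNear` values are exactly what the traversal loop needs to decide which children to visit and in which order.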


BVH4 Traversal Optimizations

Observations

• Probability of hitting N children is very non-uniform:

  0 hits: 20%, 1 hit: 50%, 2 hits: 20%, 3 hits: 8%, 4 hits: 2%

• Probability of hitting a specific child is very uniform

Optimization

• Use bit count and bit scan to determine N and the hit children

• Makes branches easier to predict

• Specialized implementation for all values of N
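A sketch of how the hit mask might be dispatched on the number of hit children. It assumes a 4-bit mask with one bit per child and an array of entry distances; `bitCount`, `bitScan`, and `chooseNext` are hypothetical helpers written in portable C++, where a real kernel would more likely use the POPCNT and BSF instructions. For brevity, the 3-hit and 4-hit cases share one generic path here, whereas the slide describes a specialized implementation for every value of N.

```cpp
#include <bitset>
#include <cassert>
#include <utility>
#include <vector>

// Number of set bits in a 4-bit mask (what POPCNT would return).
inline int bitCount(unsigned mask) {
    return static_cast<int>(std::bitset<4>(mask).count());
}

// Index of the lowest set bit (what BSF would return); mask must be nonzero.
inline int bitScan(unsigned mask) {
    assert(mask != 0);
    int i = 0;
    while (!(mask & 1u)) { mask >>= 1; ++i; }
    return i;
}

// Returns the child to visit next, or -1 if no child was hit. The other
// hit children are pushed so that the nearest one is popped first.
inline int chooseNext(unsigned mask, const float tNear[4], std::vector<int>& stack) {
    assert(mask < 16u);
    switch (bitCount(mask)) {
    case 0:
        return -1;                 // nothing hit: caller pops the stack
    case 1:
        return bitScan(mask);      // most common case: no sort, no push
    case 2: {
        int a = bitScan(mask);
        int b = bitScan(mask & (mask - 1));   // clear lowest bit, scan again
        if (tNear[b] < tNear[a]) std::swap(a, b);
        stack.push_back(b);        // farther child waits on the stack
        return a;
    }
    default: {
        // 3 or 4 hits are rare, so a small insertion sort is acceptable.
        int idx[4];
        int n = 0;
        for (unsigned m = mask; m != 0; m &= m - 1) idx[n++] = bitScan(m);
        for (int i = 1; i < n; ++i)
            for (int j = i; j > 0 && tNear[idx[j]] < tNear[idx[j - 1]]; --j)
                std::swap(idx[j], idx[j - 1]);
        for (int i = n - 1; i > 0; --i) stack.push_back(idx[i]);  // farthest first
        return idx[0];
    }
    }
}
```

With the hit distribution quoted above, the expected number of children hit per node is 0.5 + 2(0.2) + 3(0.08) + 4(0.02) = 1.22, and 70% of node visits take the 0-hit or 1-hit path, which needs no sorting at all.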


Performance Analysis

2-bounce path tracing in a triangulated sphere (1 thread)

[Figure: plot of cycles per ray (vertical axis) against triangles (horizontal axis)]


Performance Comparisons


Comparing Different Architectures

• Do your scenes fit into memory?

• How does the architecture perform with a full-featured renderer and real world data sets?

• How easy is it to develop large scale software, not just a kernel?

• Power matters: Rays per Joule!


Further Information

Download Embree http://software.intel.com/en-us/articles/embree-photo-realistic-ray-tracing-kernels/

Support

[email protected]

Contact

[email protected]


Intel Sessions – Wednesday, August 10

2:00-3:00pm Increase your FPS with CPU Onload

3:15-4:15pm Optimization Strategies for Intel HD Graphics

4:30-5:30pm Visual Computing Performance Optimization: Tools and Strategies

Please turn in your evaluation forms


Legal Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.

Intel may make changes to specifications, product descriptions, and plans at any time, without notice.

The Intel processor and/or chipset products referenced in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands may be claimed as the property of others.

Copyright © 2010, Intel Corporation. All rights reserved.


Optimization Notice

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. 
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101
