Benchmarks: NVIDIA Kepler vs. Fermi

The Xcelerit platform ensures future-proof applications: applications implemented with it scale to new processor architectures and devices without any source code changes and even without recompilation. Now, with the exciting release of NVIDIA’s new flagship GPU, the Tesla K20 (Kepler architecture), we put this to the test. This post compares the performance achieved with the Xcelerit platform on the Tesla M2050 (Fermi architecture) with that on the Tesla K20 (Kepler architecture). As an example application, we use the Monte-Carlo LIBOR swaption portfolio pricer, a real-world computational finance algorithm that we have already used in other benchmark blog posts.

NVIDIA Tesla K20 GPU Accelerator (Kepler Architecture)

Monte-Carlo LIBOR Swaption Portfolio Pricing

Details of this algorithm have been described here. For convenience, we briefly summarize them below.

A Monte-Carlo simulation is used to price a portfolio of LIBOR swaptions. Thousands of possible future development paths for the LIBOR interest rate are simulated using normally-distributed random numbers. For each of these Monte-Carlo paths, the value of the swaption portfolio is computed by applying a portfolio payoff function. The equations for computing the LIBOR rates and payoff are given here. Furthermore, the sensitivity of the portfolio value with respect to changes in the underlying interest rate is computed using an adjoint method. This sensitivity is a Greek, called λ, as detailed here. Both the final portfolio value and the λ value are obtained by computing the mean of all per-path values.
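The structure described above (simulate paths from normal random numbers, apply a payoff per path, average over all paths) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the Xcelerit implementation and not a real LIBOR market model: it evolves a single toy forward rate with lognormal steps, uses a made-up call-style payoff with three hypothetical strikes, and estimates the sensitivity with a simple finite-difference bump rather than the adjoint method used in the benchmark. All function names and parameter values are invented for the example.

```python
import math
import random


def simulate_path(rate0, vol, dt, n_steps, rng):
    """Evolve one toy forward rate over n_steps lognormal steps (assumed model)."""
    rate = rate0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # normally distributed random number
        rate *= math.exp(-0.5 * vol * vol * dt + vol * math.sqrt(dt) * z)
    return rate


def portfolio_payoff(rate, strikes):
    """Hypothetical portfolio payoff: one call-style payoff per strike."""
    return sum(max(rate - k, 0.0) for k in strikes)


def price_portfolio(n_paths, rate0=0.05, vol=0.2, dt=0.25, n_steps=80,
                    strikes=(0.04, 0.05, 0.06), seed=42):
    """Mean of the per-path payoffs (the reduction step)."""
    rng = random.Random(seed)  # fixed seed makes runs reproducible
    total = 0.0
    for _ in range(n_paths):
        final_rate = simulate_path(rate0, vol, dt, n_steps, rng)
        total += portfolio_payoff(final_rate, strikes)
    return total / n_paths


def bump_sensitivity(n_paths, rate0=0.05, bump=1e-4, **kwargs):
    """Central finite difference in the initial rate.

    Both runs reuse the same seed, so they see the same random numbers.
    This stands in for the adjoint method, which it does not implement.
    """
    up = price_portfolio(n_paths, rate0=rate0 + bump, **kwargs)
    down = price_portfolio(n_paths, rate0=rate0 - bump, **kwargs)
    return (up - down) / (2.0 * bump)
```

Calling `price_portfolio(16 * 1024)` and `bump_sensitivity(16 * 1024)` mirrors the smallest path count in the benchmark. Every path is independent until the final averaging, which is why this kind of algorithm maps well onto a GPU.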

The figure below illustrates the algorithm:

LIBOR Swaption Portfolio Pricer algorithm

Benchmark Setup

We compare the same application, implemented using the Xcelerit SDK 2.0.3, on two different systems. Their configurations are given in the following table:

| | Fermi System | Kepler System |
|---|---|---|
| CPU | 2x Intel Xeon E5620 | 2x Intel Xeon E5-2670 |
| GPU | 2x NVIDIA Tesla M2050 | 2x NVIDIA Tesla K20Xm |
| OS | RHEL 5.4 (64bit) | RHEL 6.2 (64bit) |
| RAM | 24GB | 64GB |
| GPU driver | 304.22 | 304.47.06 |
| CUDA Toolkit | 4.2 | 4.2 |
| Host Compiler | GCC 4.4 | GCC 4.4 |

Note that we are comparing only the GPU performance, so the difference between the CPUs has no significant effect on the outcome.

Performance

We measured the computation times of the Monte-Carlo LIBOR swaption portfolio pricer on one GPU of each system, pricing a portfolio of 15 swaptions over 80 time steps with varying numbers of Monte-Carlo paths. The graph below shows the run time of the full algorithm (random number generation, data transfers, core computation, and reduction) for single and double precision. All of these computation steps run on the GPU, so the difference between the CPUs does not affect the benchmark results.

Kepler vs. Fermi for LIBOR Swaption Portfolio Pricer

As we can see, the new K20Xm GPU delivers a significant speedup (up to 1.9x). For easier comparison, the table below lists the speedup factors of the Kepler GPU over the Fermi GPU for different numbers of paths:

| Paths | Speedup (single) | Speedup (double) |
|---|---|---|
| 16K | 1.34x | 1.61x |
| 64K | 1.51x | 1.76x |
| 256K | 1.78x | 1.86x |
| 1024K | 1.86x | 1.89x |

It is apparent that NVIDIA’s new Tesla K20Xm GPU gives a substantial performance improvement for real-world applications: up to 1.9x in this example (very close to the theoretical peak of 2x). The improvement is also larger for double precision than for single precision at every path count tested, something financial institutions will be pleased to hear.
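To make the comparison with the quoted 2x peak concrete, the short Python sketch below expresses each measured speedup as a fraction of that peak. The speedup values are copied from the table in this post. The helper names are invented for the example, and the `speedup` function assumes the usual definition (Fermi run time divided by Kepler run time), which the post does not spell out.

```python
# Speedup factors from the table in this post: paths -> (single, double)
SPEEDUPS = {
    "16K": (1.34, 1.61),
    "64K": (1.51, 1.76),
    "256K": (1.78, 1.86),
    "1024K": (1.86, 1.89),
}

QUOTED_PEAK = 2.0  # theoretical peak speedup as quoted in the post


def speedup(time_fermi, time_kepler):
    """Assumed definition: ratio of Fermi run time to Kepler run time."""
    if time_kepler <= 0:
        raise ValueError("time_kepler must be positive")
    return time_fermi / time_kepler


def fraction_of_peak(factor, peak=QUOTED_PEAK):
    """Share of the quoted peak speedup that a measured factor reaches."""
    return factor / peak


def report():
    """One formatted line per path count, for single and double precision."""
    lines = []
    for paths, (single, double) in SPEEDUPS.items():
        lines.append(
            f"{paths:>6}: single {fraction_of_peak(single):.1%}, "
            f"double {fraction_of_peak(double):.1%} of quoted peak"
        )
    return lines
```

Printing the result of `report()` shows the share of the quoted peak rising with the number of paths in both precisions, with double precision ahead at every path count.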

More details can be found in the Monte-Carlo methods white paper available for download here.

Jörg Lotze

Technical Lead and Co-Founder at Xcelerit
