The Xcelerit platform ensures future-proof applications – that is, applications implemented with the Xcelerit platform scale to new processor architectures and devices without any source code changes – in fact, without even recompiling. Now, with the exciting release of NVIDIA’s new flagship GPU, the Tesla K20 (Kepler architecture), we put this to the test. This post compares the performance achieved with the Xcelerit platform on the Tesla M2050 (Fermi architecture) and the Tesla K20 (Kepler architecture). As an example application, we use the Monte-Carlo LIBOR swaption portfolio pricer, a real-world computational finance algorithm that we have already used in other benchmark blog posts.
Monte-Carlo LIBOR Swaption Portfolio Pricing
Details of this algorithm have been described here; for convenience, we briefly summarize it below.
A Monte-Carlo simulation is used to price a portfolio of LIBOR swaptions. Thousands of possible future paths of the LIBOR interest rate are simulated using normally-distributed random numbers. For each of these Monte-Carlo paths, the value of the swaption portfolio is computed by applying a portfolio payoff function. The equations for computing the LIBOR rates and the payoff are given here. Furthermore, the sensitivity of the portfolio value with respect to changes in the underlying interest rate is computed using Adjoint Algorithmic Differentiation (AAD). This sensitivity is a Greek, called λ, as detailed here. Both the final portfolio value and the λ value are obtained by averaging the per-path values.
The figure below illustrates the algorithm:
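To make the structure of the algorithm concrete, here is a minimal sketch of the Monte-Carlo loop: simulate each path with normally-distributed increments, apply a payoff per path, and average. This is a toy model for illustration only – the initial rate, step size, and payoff are hypothetical, and the real pricer applies the full LIBOR market model equations and the AAD sensitivity computation at each step.

```python
import random

def price_portfolio_mc(num_paths, num_steps, payoff, seed=42):
    """Toy Monte-Carlo pricer: average a payoff over simulated paths.

    Each path evolves a rate with normally-distributed increments as a
    stand-in for the simulated LIBOR rate; the actual algorithm uses the
    full LIBOR rate equations referenced in the text.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_paths):
        rate = 0.05  # hypothetical initial rate
        for _ in range(num_steps):
            rate += 0.001 * rng.gauss(0.0, 1.0)  # simplified rate step
        total += payoff(rate)  # per-path portfolio value
    return total / num_paths  # mean over all per-path values

# Hypothetical payoff for illustration: option-like value at the final rate.
value = price_portfolio_mc(10000, 80, payoff=lambda r: max(r - 0.04, 0.0))
```

On the GPU, the per-path loop is what gets parallelized: each path is independent, and only the final mean requires a reduction across paths.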
Benchmark Setup
We compare the same application, implemented using the Xcelerit SDK 2.0.3, on two different systems. Their configuration is given in the following table:
| | Fermi System | Kepler System |
|---|---|---|
| CPU | 2× Intel Xeon E5620 | 2× Intel Xeon E5-2670 |
| GPU | 2× NVIDIA Tesla M2050 | 2× NVIDIA Tesla K20Xm |
| OS | RHEL 5.4 (64-bit) | RHEL 6.2 (64-bit) |
| RAM | 24 GB | 64 GB |
| GPU driver | 304.22 | 304.47.06 |
| CUDA Toolkit | 4.2 | 4.2 |
| Host compiler | GCC 4.4 | GCC 4.4 |
Note that we compare GPU performance only, so the difference in the CPUs has no significant effect on the outcome.
Performance
We measured the computation times for the Monte-Carlo LIBOR swaption portfolio pricer on one GPU of each system, pricing a portfolio of 15 swaptions over 80 time steps with varying numbers of Monte-Carlo paths. The run time of the full algorithm – including random number generation, data transfers, core computation, and reduction – is visualized for single and double precision in the graph below. All of these computation steps run on the GPU.
As we can see, there is a significant speedup with the new K20Xm GPU (up to 1.9x). The table below lists the speedup factors of the Kepler GPU over the Fermi GPU for different numbers of paths:
| Paths | Speedup (single) | Speedup (double) |
|---|---|---|
| 16K | 1.34x | 1.61x |
| 64K | 1.51x | 1.76x |
| 256K | 1.78x | 1.86x |
| 1024K | 1.86x | 1.89x |
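For reference, the speedup factors above are simply the ratio of the Fermi run time to the Kepler run time for the same workload. A trivial sketch (the timings used here are illustrative, not the measured benchmark values):

```python
def speedup(t_fermi, t_kepler):
    """Speedup of the Kepler run relative to the Fermi baseline."""
    return t_fermi / t_kepler

# Illustrative example: a 1.86 s Fermi run vs. a 1.0 s Kepler run
print(f"{speedup(1.86, 1.0):.2f}x")  # → 1.86x
```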
It is apparent that NVIDIA’s new Tesla K20Xm GPU delivers a substantial performance improvement for real-world applications – up to 1.9x in this example, very close to the theoretical peak of 2x. It can also be seen that the improvement in double precision is greater than in single precision – something financial institutions will be pleased to hear.
More details can be found in the Monte-Carlo methods white paper available for download here.
Jörg Lotze