NVIDIA has just released its new flagship Tesla GPU, the Tesla K40. This GPU features more memory, higher clock rates, and more CUDA cores than the previous top-end card, the K20X. But what performance improvements can we expect for financial applications? We’ve put the new card to the test and compared it to the K20X using a Monte-Carlo LIBOR swaption portfolio pricer, a real-world financial algorithm that we’ve already used in other benchmarks.
Archive for Jörg Lotze
Accelerators battle for compute-intensive analytics in Finance
At Xcelerit, people often ask us: “Which is better, Intel Xeon Phi or NVIDIA Kepler?” The general answer has to be “it depends,” as this is heavily application-dependent. But what if we zoom in on real-world problems in computational finance, the kinds of problems that quants in investment banks and the financial industry deal with every day? Let’s analyse two different financial applications and see how they perform on each platform. To cover different types of algorithms often found in finance, we chose an embarrassingly parallel Monte-Carlo algorithm (with fully independent paths) for the first test application, and a Monte-Carlo algorithm with cross-path dependencies and iterative time-stepping for the second.
[Update 1-Oct-2013: (American Monte-Carlo application only)]
- Algorithm update and avoiding temporary storage: affects GPU and Xeon Phi heavily, updated the performance numbers
- Updated performance figures for Ivy-Bridge CPU (Xeon E5-2697 v2) and Xeon Phi Processor (Xeon Phi 7120P)
- Replaced absolute times with speed-ups vs. sequential for better readability
Although 5 of the top 10 systems in the TOP500 Supercomputing Sites list (November 2012) are based on IBM POWER processors, this platform is rarely used in the financial industry for compute-intensive analytics. We believe that with increasing computation demands and ever more data to be processed, e.g. in risk computations, the POWER platform might gain popularity in financial applications.
IBM’s POWER7+ Processor
There is a large number of high performance processors available these days, each with its own characteristics, and the landscape is quickly changing with new processors being released. There are CPUs, GPUs, FPGAs, Xeon Phi, DSPs – to name just a few. How should one decide which of these processors to use for a particular task? Or should even a combination of these processors be used jointly to get the best performance? And then, how to manage the complexity of handling these devices? In the following, we’ll attempt to answer these questions, in particular for users and applications in the financial services domain.
Large financial services companies have vast compute resources available, organised into computing grids (i.e., federations of compute resources to reach a common goal). But, as Murex’s Pierre Spatz explains, they don’t use them as supercomputers. “They use them as grids of small computers. It means that most of their code is in fact mono-threaded.” In contrast, the world’s top supercomputing sites often use clusters of machines in a very different – and more efficient – way. In the following, we will explain and demonstrate why – and illustrate how financial services firms can improve the efficiency of their existing compute grids. We only look at compute-intensive workloads, such as those found in risk management and derivatives pricing.
There are many studies and publications claiming large speedups on High Performance Computing (HPC) hardware. The reality is that pretty much all kinds of speedups can be reported depending on the baseline used… What does this mean in practice? How should one interpret these numbers? And – more importantly – what should be the baseline to compare to? To help clear the confusion we will try to shed some light on this. We will also give our recommendations on how to choose a “fair baseline” and how to report benchmark results.
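To see how strongly the baseline drives the headline number, here is a small worked example in Python. All timings are hypothetical, invented purely for illustration (they are not measurements from any benchmark), and the `speedup` helper is our own name.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Speedup = baseline run time divided by accelerated run time."""
    return baseline_seconds / accelerated_seconds


# Hypothetical run times (seconds) of the same algorithm on the same CPU.
# These numbers are made up for illustration only.
baseline_timings = {
    "sequential, unoptimised": 100.0,
    "sequential, optimised": 40.0,
    "multi-threaded, optimised": 5.0,
}

# Hypothetical run time of the accelerated version.
accelerated_seconds = 1.0

reported = {
    name: speedup(seconds, accelerated_seconds)
    for name, seconds in baseline_timings.items()
}

for name, factor in reported.items():
    print(f"vs. {name}: {factor:.0f}x")
```

With these made-up numbers, the same accelerated run can be reported as a 100x, 40x, or 5x speedup, depending only on which baseline is chosen.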
The Xcelerit platform ensures future-proof applications – that is, applications implemented using the Xcelerit platform scale to new processor architectures and devices without the need to change any source code – in fact, even without recompilation. Now, with the exciting release of NVIDIA’s new flagship GPU, the Tesla K20 (Kepler architecture), we put this to the test. This post compares the performance achieved with the Xcelerit platform on the Tesla M2050 (Fermi architecture) to the Tesla K20 (Kepler architecture). As an example application, we use the Monte-Carlo LIBOR swaption portfolio pricer algorithm, a real-world computational finance algorithm that we’ve already used in other benchmark blog posts.
Having looked at the GPU performance of Xcelerit-enabled applications, this post provides benchmarks and comparisons on multi-core CPUs. As a baseline, we will use OpenMP and compare to an Xcelerit implementation of the same algorithm. The application used for benchmarking is the same as in the CUDA vs. Xcelerit SDK blogpost: pricing a LIBOR swaption portfolio.
LIBOR Swaption Portfolio Pricing
Details of this algorithm have been described before. For convenience, we will briefly summarize it here.
A Monte-Carlo simulation is used to price a portfolio of LIBOR swaptions. Thousands of possible future development paths for the LIBOR interest rate are simulated using normally-distributed random numbers. For each of these Monte-Carlo paths, the value of the swaption portfolio is computed by applying a portfolio payoff function. The equations for computing the LIBOR rates and payoff are given here. Furthermore, the sensitivity of the portfolio value with respect to changes in the underlying asset price is computed using an adjoint method. This sensitivity is a Greek, called λ, as detailed here. Both the final portfolio value and the λ value are obtained by computing the mean of all per-path values.
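The overall structure described above (simulate paths from normally distributed random numbers, apply a payoff per path, average over all paths) can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in and not the LIBOR swaption pricer itself: it evolves a single rate with driftless lognormal steps, uses a caplet-style payoff without discounting, and omits the adjoint sensitivity computation. All function names and parameter values are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_terminal_rate(rng, r0, sigma, maturity, n_steps):
    """Evolve one path of a toy lognormal rate (NOT the LIBOR market model)."""
    dt = maturity / n_steps
    rate = r0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # normally distributed random number
        rate *= math.exp(-0.5 * sigma * sigma * dt + sigma * math.sqrt(dt) * z)
    return rate


def payoff(rate, strike):
    """Toy caplet-style payoff, standing in for the portfolio payoff function."""
    return max(rate - strike, 0.0)


def monte_carlo_price(n_paths, r0=0.05, strike=0.05, sigma=0.2,
                      maturity=1.0, n_steps=16, seed=42):
    """Return the mean per-path value and its standard error."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    values = [
        payoff(simulate_terminal_rate(rng, r0, sigma, maturity, n_steps), strike)
        for _ in range(n_paths)
    ]
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n_paths)
    return mean, std_err


price, std_err = monte_carlo_price(2000)
print(f"estimate: {price:.5f} +/- {std_err:.5f}")
```

Each path is independent of all others, which is what makes this kind of algorithm straightforward to parallelise across CPU cores or GPU threads; only the final mean requires a reduction over all paths.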
The figure below illustrates the algorithm:
Having previously discussed the core Xcelerit SDK, this series of posts will look into an addon targeted at a specific application domain: Quantitative Finance. It is aimed at helping quantitative analysts and financial engineers with their day-to-day work. The Xcelerit Quant addon consists of a set of provided statistics functions and extensions for commonly used software packages:
- Statistics Extension (random number generators, statistical reductions, common statistical functions),
- MATLAB Extension (for interfacing to MATLAB), and
- Excel Extension (for interfacing to Microsoft Excel)
We will look into each of these extensions, starting with the Statistics Extension.
In computational finance, almost all models and algorithms have a statistical component. For example, Monte-Carlo simulations require random numbers and statistical reductions (such as mean and variance computation), and other algorithms use statistical distribution functions (cumulative distribution or probability density functions, quantiles, etc.). The Xcelerit Statistics Extension provides built-in optimised functions for the convenience of users. Note that developers can also choose to implement their own components.
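To make these building blocks concrete, the sketch below shows the same kinds of operations (random number generation, mean and variance reductions, and distribution functions) using only Python's standard library. It does not use or represent the Xcelerit Statistics Extension API; it only illustrates the categories of functions mentioned above, and the variable names are our own.

```python
import random
from statistics import NormalDist, fmean, variance

# Random number generation: standard normal samples from a seeded generator.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Statistical reductions over the samples.
sample_mean = fmean(samples)
sample_var = variance(samples)  # unbiased sample variance

# Distribution functions of the standard normal distribution.
std_normal = NormalDist(mu=0.0, sigma=1.0)
cdf_at_zero = std_normal.cdf(0.0)            # cumulative distribution: exactly one half
pdf_at_zero = std_normal.pdf(0.0)            # probability density: 1 / sqrt(2 * pi)
quantile_975 = std_normal.inv_cdf(0.975)     # 97.5% quantile, about 1.96

print(f"mean={sample_mean:.4f} variance={sample_var:.4f}")
print(f"cdf(0)={cdf_at_zero:.4f} pdf(0)={pdf_at_zero:.4f} q(0.975)={quantile_975:.4f}")
```

For large sample counts the sample mean and variance should approach the distribution's true values (0 and 1 here), which is a quick sanity check for any random number generator and reduction pair.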
As an example application, the following figure shows an Xcelerit dataflow graph for a general Monte-Carlo simulation, as it is often found in financial algorithms:
Having discussed the programming and performance aspects of the Xcelerit SDK, this post will focus on the day-to-day development experience. This aspect is equally important as the programming interface and performance.
The typical development process involves the following steps:
- Set up the project and build process in an IDE,
- Implement the application using the Xcelerit SDK API and provided helpers,
- Debug the application to ensure correct behaviour, and
- Profile to find bottlenecks and tune the performance.
This post covers the above steps for Xcelerit SDK projects.
The Xcelerit SDK workflow integrates well into existing IDEs. Both Windows and Linux development environments are supported, and we’ll look into each operating system in more detail in the following.
In Windows, the de-facto standard is the Microsoft Visual Studio IDE. It provides support for easy graphical configuration of the complete build process, compiler and linker flags, properties, target platforms, etc. The Xcelerit SDK provides a plugin for this IDE which integrates seamlessly and calls the required tools under the hood. As soon as a file containing code for Xcelerit dataflow actors (file extension .xapp) is added to the project, a new property group for its compilation appears in the project. The screenshot below shows this: