In our previous blog post, we saw that Haswell (Xeon E5 v3 series) achieves a significant performance boost over the previous-generation Ivy Bridge processor (Xeon E5 v2 series). But how does its performance compare to Intel’s flagship accelerator processor, the Xeon Phi, for a popular application in computational finance?
Archive for Jörg Lotze
Intel has just released its new Haswell server processor line (Xeon E5 v3 series), promising significant performance gains over previous-generation Ivy Bridge processors. In this post, we will compare the two generations for a popular financial application – a Monte-Carlo LIBOR swaption portfolio pricer.
We’ve just published a new white paper which covers how to cope with the computational complexity involved in calculating various valuation adjustments, such as Credit Valuation Adjustment (CVA), Debit Valuation Adjustment (DVA), and Funding Valuation Adjustment (FVA).
These adjustments are commonly referred to as xVA. This white paper gives an overview of the different xVA adjustments, shows how they are typically computed, and outlines where the computational complexities lie. We give recommendations on how to achieve high performance, portability, and scalability for centralised in-house xVA implementations. We show how, by careful software design, we can easily harness not only the power of multi-core CPUs, but also accelerator co-processors such as graphics processing units (GPUs) and the Intel Xeon Phi.
You can download the paper here.
NVIDIA has just released its new flagship Tesla GPU, the Tesla K40. This GPU features more memory, higher clock rates, and more CUDA cores than the previous top-end card, the K20X. But what performance improvements can we expect for financial applications? We’ve put the new card to the test and compared it to the K20X using a Monte-Carlo LIBOR swaption portfolio pricer, a real-world financial algorithm that we’ve already used in other benchmarks.
Accelerators battle for compute-intensive analytics in Finance
At Xcelerit, people often ask us: “Which is better, Intel Xeon Phi or NVIDIA Kepler?” The general answer has to be “it depends,” as this is heavily application-dependent. But what if we zoom in on real-world problems in computational finance, the kinds of problems that quants in investment banks and the financial industry deal with every day? Let’s analyse two different financial applications and see how they perform on each platform. To cover different types of algorithms often found in finance, we chose an embarrassingly parallel Monte-Carlo algorithm (with fully independent paths) for the first test application, and a Monte-Carlo algorithm with cross-path dependencies and iterative time-stepping for the second.
[Update 1-Oct-2013: (American Monte-Carlo application only)]
- Updated the algorithm and avoided temporary storage: this heavily affects the GPU and Xeon Phi, so the performance numbers have been updated
- Updated performance figures for Ivy-Bridge CPU (Xeon E5-2697 v2) and Xeon Phi Processor (Xeon Phi 7120P)
- Replaced absolute times with speed-ups vs. sequential for better readability
Although in the TOP500 Supercomputing Sites list (November 2012) 5 out of the top 10 systems are based on IBM POWER processors, this platform is rarely used in the financial industry for compute-intensive analytics. We believe that with increasing computation demands and more and more data to be processed, e.g. in risk computations, the POWER platform might gain popularity in financial applications.
IBM’s POWER7+ Processor
There is a large number of high-performance processors available these days, each with its own characteristics, and the landscape is changing quickly as new processors are released. There are CPUs, GPUs, FPGAs, Xeon Phi, DSPs – to name just a few. How should one decide which of these processors to use for a particular task? Should a combination of these processors even be used jointly to get the best performance? And how can one manage the complexity of handling these devices? In the following, we’ll attempt to answer these questions, in particular for users and applications in the financial services domain.
Large financial services companies have vast compute resources available, organised into computing grids (i.e., federations of compute resources to reach a common goal). But, as Murex’s Pierre Spatz explains, they don’t use them as supercomputers. “They use them as grids of small computers. It means that most of their code is in fact mono-threaded.” In contrast, the world’s top supercomputing sites often use clusters of machines in a very different – and more efficient – way. In the following, we will explain and demonstrate why – and illustrate how financial services firms can improve the efficiency of their existing compute grids. We only look at compute-intensive workloads, such as those found in risk management and derivatives pricing.
There are many studies and publications claiming large speedups on High Performance Computing (HPC) hardware. The reality is that pretty much any speedup can be reported, depending on the baseline used. What does this mean in practice? How should one interpret these numbers? And – more importantly – what should be the baseline to compare to? To help clear up the confusion, we will try to shed some light on this. We will also give our recommendations on how to choose a “fair baseline” and how to report benchmark results.
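To make the baseline effect concrete, here is a minimal sketch in Python. All timings are hypothetical values chosen for illustration, not measurements from any benchmark, and the baseline names and the `speedups` helper are our own assumptions. It shows how the same accelerated run time yields very different reported speedups depending on what it is compared against.

```python
# Hypothetical wall-clock times in seconds (illustrative only, not measured).
baseline_times = {
    "sequential, unoptimised": 120.0,
    "sequential, optimised": 40.0,
    "multi-threaded, optimised": 5.0,
}

# Hypothetical run time of the same workload on an accelerator.
accelerated_time = 2.0


def speedups(baselines, accelerated):
    """Return the speedup of the accelerated run relative to each baseline."""
    return {name: t / accelerated for name, t in baselines.items()}


results = speedups(baseline_times, accelerated_time)
for name, factor in results.items():
    print(f"{factor:5.1f}x vs. {name}")
```

With these made-up numbers, the same accelerated result can be quoted as anything from 2.5x to 60x, which is why the choice of baseline should always be stated alongside the speedup.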
The Xcelerit platform ensures future-proof applications – that is, applications implemented using the Xcelerit platform scale to new processor architectures and devices without the need to change any source code – in fact, even without recompilation. Now, with the exciting release of NVIDIA’s new flagship GPU, the Tesla K20 (Kepler architecture), we put this to the test. This post compares the performance achieved with the Xcelerit platform on the Tesla M2050 (Fermi architecture) to the Tesla K20 (Kepler architecture). As an example application, we use the Monte-Carlo LIBOR swaption portfolio pricer algorithm, a real-world computational finance algorithm that we’ve already used in other benchmark blog posts.