Intel just released its new Haswell server processor line (the Xeon E5 v3 series), promising significant performance gains over the previous-generation Ivy Bridge processors. In this post, we compare the two generations on a popular financial application – a Monte-Carlo LIBOR swaption portfolio pricer.

## Archive for Jörg Lotze

## White Paper: xVA – Coping with the Tsunami of Compute Load

We’ve just published a new white paper which covers how to cope with the computational complexity involved in calculating various valuation adjustments, such as Credit Valuation Adjustment (CVA), Debit Valuation Adjustment (DVA), and Funding Valuation Adjustment (FVA).

These adjustments are commonly referred to as xVA. This white paper gives an overview of the different xVA adjustments, shows how they are typically computed, and outlines where the computational complexities lie. We give recommendations on how to achieve high performance, portability, and scalability for centralised in-house xVA implementations. We show how, through careful software design, we can easily harness not only the power of multi-core CPUs but also accelerator co-processors such as graphics processing units (GPUs) and the Intel Xeon Phi.

You can download the paper here.

## Benchmarks: NVIDIA Tesla K40 vs. K20X GPU

NVIDIA freshly released their new flagship Tesla GPU, the Tesla K40. This GPU features more memory, higher clock rates, and more CUDA cores than the previous top-end card, the K20X. But what performance improvements can we expect for financial applications? We’ve put the new card to the test and compared it to the K20X using a Monte-Carlo LIBOR swaption portfolio pricer, a real-world financial algorithm that we’ve already used in other benchmarks.

## Benchmarks: Intel Xeon Phi vs. NVIDIA Tesla GPU

*Accelerators battle for compute-intensive analytics in Finance*

At Xcelerit, people often ask us: “Which is better, Intel Xeon Phi or NVIDIA Kepler?” The general answer has to be “it depends,” as this is heavily application-dependent. But what if we zoom in on real-world problems in computational finance – the kinds of problems that quants in investment banks and the wider financial industry deal with every day? Let’s analyse two different financial applications and see how they perform on each platform. To cover the different types of algorithms often found in finance, we chose an embarrassingly parallel Monte-Carlo algorithm (with fully independent paths) as the first test application, and a Monte-Carlo algorithm with cross-path dependencies and iterative time-stepping as the second.

**Update 1-Oct-2013** (American Monte-Carlo application only):

- Algorithm updated to avoid temporary storage: heavily affects the GPU and Xeon Phi results; performance numbers updated
- Updated performance figures for Ivy-Bridge CPU (Xeon E5-2697 v2) and Xeon Phi Processor (Xeon Phi 7120P)
- Replaced absolute times with speed-ups vs. sequential for better readability
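The distinction between the two algorithm classes above can be sketched in a few lines. This is a minimal, illustrative toy (a plain random walk standing in for the path dynamics); the real pricers are far more involved, and the `-0.1 * (x - mean)` mean-reversion term is purely a placeholder to create a cross-path dependency.

```python
import random

def independent_paths(n_paths, n_steps, rng):
    """Embarrassingly parallel: each path evolves on its own and could
    be computed concurrently with no communication between paths."""
    results = []
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, 1.0)
        results.append(x)
    return results

def cross_path_dependent(n_paths, n_steps, rng):
    """Iterative time-stepping with a cross-path dependency: every step
    needs a statistic over *all* paths (here, their mean), so the paths
    must advance in lock-step."""
    xs = [0.0] * n_paths
    for _ in range(n_steps):
        mean = sum(xs) / n_paths  # synchronisation point across all paths
        xs = [x + rng.gauss(0.0, 1.0) - 0.1 * (x - mean) for x in xs]
    return xs
```

The first kind maps trivially onto massively parallel hardware; the second forces a synchronisation at every time step, which is exactly what makes it a more interesting stress test for accelerators.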

## Financial Applications on IBM’s POWER7+

Although 5 out of the top 10 systems in the TOP500 Supercomputing Sites list (November 2012) are based on IBM POWER processors, this platform is rarely used in the financial industry for compute-intensive analytics. We believe that with increasing computational demands and ever more data to process, e.g. in risk computations, the POWER platform might gain popularity in financial applications.

### IBM’s POWER7+ Processor

## The Future is Hybrid – Trends in HPC

A large number of high-performance processors are available these days, each with its own characteristics, and the landscape changes quickly as new processors are released. There are CPUs, GPUs, FPGAs, the Xeon Phi, DSPs – to name just a few. How should one decide which of these processors to use for a particular task? Should a combination of them even be used jointly to get the best performance? And how should the complexity of handling these devices be managed? In the following, we attempt to answer these questions, in particular for users and applications in the financial services domain.

## Efficiently Using Computing Grids in the Financial Industry

Large financial services companies have vast compute resources available, organised into computing grids (i.e., federations of compute resources working towards a common goal). But, as Murex’ Pierre Spatz explains, they don’t use them as supercomputers. “They use them as grids of small computers. It means that most of their code is in fact mono-threaded.” In contrast, the world’s top supercomputing sites often use clusters of machines in a very different – and more efficient – way. In the following, we will explain and demonstrate why, and illustrate how financial services firms can improve the efficiency of their existing compute grids. We only look at compute-intensive workloads, such as those found in risk management and derivatives pricing.

## HPC Benchmarks: What is a Fair Baseline?

There are many studies and publications claiming large speedups on High Performance Computing (HPC) hardware. The reality is that almost any speedup can be reported, depending on the baseline used… What does this mean in practice? How should one interpret these numbers? And – more importantly – what baseline should one compare to? To clear up the confusion, we will try to shed some light on this, and give our recommendations on how to choose a “fair baseline” and how to report benchmark results.
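The effect of baseline choice is easy to see with a small worked example. The timings below are made-up illustrative numbers, not measurements from any of our benchmarks: the same accelerator result yields wildly different headline speedups depending on whether the reference is naive or tuned single-threaded code.

```python
# Hypothetical timings (seconds) for one and the same workload.
naive_single_thread = 120.0      # unoptimised, single-threaded reference
optimised_single_thread = 30.0   # tuned (e.g. vectorised) single-threaded reference
accelerator = 1.5                # the same workload on an accelerator

# The reported "speedup" depends entirely on which baseline is chosen.
speedup_vs_naive = naive_single_thread / accelerator          # 80x
speedup_vs_optimised = optimised_single_thread / accelerator  # 20x
```

Both numbers are arithmetically correct; only the second reflects what a well-written CPU implementation would actually achieve, which is why the baseline must always be stated alongside the speedup.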

## Benchmarks: NVIDIA Kepler vs. Fermi

The Xcelerit platform ensures future-proof applications – that is, applications implemented using the Xcelerit platform scale to new processor architectures and devices without the need to change any source code – in fact, even without recompilation. Now, with the exciting release of NVIDIA’s new flagship GPU, the Tesla K20 (Kepler architecture), we put this to the test. This post compares the performance achieved with the Xcelerit platform on the Tesla M2050 (Fermi architecture) and the Tesla K20 (Kepler architecture). As the example application, we use the Monte-Carlo LIBOR swaption portfolio pricer, a real-world computational finance algorithm that we’ve already used in other benchmark blog posts.

## Benchmarks: Xcelerit SDK vs. OpenMP

Having looked at the GPU performance of Xcelerit-enabled applications, this post provides benchmarks and comparisons on multi-core CPUs. As a baseline, we use an OpenMP implementation and compare it to an Xcelerit implementation of the same algorithm. The application used for benchmarking is the same as in the CUDA vs. Xcelerit SDK blog post: pricing a LIBOR swaption portfolio.

### LIBOR Swaption Portfolio Pricing

Details of this algorithm have been described before. For convenience, we will briefly summarize it here.

A Monte-Carlo simulation is used to price a portfolio of LIBOR swaptions. Thousands of possible future paths of the LIBOR interest rate are simulated using normally-distributed random numbers. For each of these Monte-Carlo paths, the value of the swaption portfolio is computed by applying a portfolio payoff function. The equations for computing the LIBOR rates and the payoff are given here. Furthermore, the sensitivity of the portfolio value with respect to changes in the underlying asset price is computed using an adjoint method; this sensitivity is a Greek, called λ, as detailed here. Both the final portfolio value and the λ value are obtained by averaging the per-path values.
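The overall structure of the summary above – simulate paths from normally-distributed draws, apply a payoff per path, average – can be sketched as follows. This is a minimal illustrative version only: the single lognormal rate, the parameter values, and the caplet-style payoff are placeholder assumptions, not the actual benchmark code, which evolves a full set of forward LIBOR rates and adds an adjoint pass for λ.

```python
import math
import random

def simulate_path(rng, n_steps, l0=0.05, vol=0.2, dt=0.25):
    """Evolve one forward rate under simple lognormal dynamics.
    Each step consumes one normally-distributed random number."""
    rate = l0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        rate *= math.exp(-0.5 * vol * vol * dt + vol * math.sqrt(dt) * z)
    return rate

def price_portfolio(n_paths=50_000, n_steps=40, strike=0.05, notional=1.0, seed=42):
    """Apply the payoff to each path and average over all paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        rate = simulate_path(rng, n_steps)
        total += notional * max(rate - strike, 0.0)  # placeholder caplet-style payoff
    return total / n_paths

value = price_portfolio()
```

Since the paths are fully independent, the per-path loop is exactly the part that parallelises across CPU cores or GPU threads in the benchmarked implementations.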

The figure below illustrates the algorithm: