Archive for Jörg Lotze

Benchmarker Beware!

Gigantic speedups are often reported when existing CPU applications are ported to GPUs and other accelerators (e.g. Intel Xeon Phi, FPGAs, etc.), but very often little thought is given to the baseline of the comparison. Developers operating in the real world have to deal with large legacy code bases, which are often hurriedly designed and poorly maintained. The performance of these code bases is often compared directly against code produced by specialists who are experts at squeezing the very last drop of performance from their chosen accelerator device. The resulting comparison, while factually based, may be misleading.
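To make the baseline effect concrete, here is a minimal Python sketch of the arithmetic. The `speedup` helper and all three timings are hypothetical, chosen only for illustration; they are not measurements from any benchmark.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / accelerated_seconds


# Hypothetical timings in seconds (assumptions, not measured results).
legacy_cpu = 120.0  # unoptimised, single-threaded legacy code
tuned_cpu = 8.0     # same algorithm, vectorised and multi-threaded
tuned_gpu = 2.0     # expert-tuned accelerator port

vs_legacy = speedup(legacy_cpu, tuned_gpu)
vs_tuned = speedup(tuned_cpu, tuned_gpu)

print(f"GPU vs. legacy CPU code: {vs_legacy:.0f}x")
print(f"GPU vs. tuned CPU code:  {vs_tuned:.0f}x")
```

With these made-up numbers, the same accelerator result reads as a 60x or a 4x speedup depending only on which CPU baseline is quoted.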

Read more

Benchmarks: NVIDIA Tesla K80 vs. K40 GPU

Today NVIDIA announced the release of their fastest ever accelerator for scientific computing – the Tesla K80. It has more raw compute power than any other GPU card available on the market, featuring two of the new GK210 GPUs. But what performance improvements can we expect for typical financial applications? Xcelerit have been given early access to the card and put it to the test, comparing it to their previous flagship, the Tesla K40.

NVIDIA Tesla K80 GPU Accelerator Card

Read more

Benchmarks: Intel Haswell vs. Xeon Phi

In our previous blog post, we saw that Haswell (Xeon E5 v3 series) achieves a significant performance boost over the previous-generation Ivy Bridge processor (Xeon E5 v2 series). But how does its performance compare to Intel’s flagship accelerator processor, the Xeon Phi, for a popular application in computational finance?

Intel Xeon Phi 5110P PCIe Card

Intel Xeon Phi 7120P

Intel Xeon E5 v3 (aka Haswell)

Read more

Benchmarks: Haswell vs. Ivy Bridge for Financial Analytics

Intel has just released its new Haswell server processor line (Xeon E5 v3 series), promising significant performance gains over the previous-generation Ivy Bridge processors. In this post, we compare the two generations for a popular financial application: a Monte-Carlo LIBOR swaption portfolio pricer.

Haswell Processor Die

Read more

White Paper: xVA – Coping with the Tsunami of Compute Load

We’ve just published a new white paper which covers how to cope with the computational complexity involved in calculating various valuation adjustments, such as Credit Valuation Adjustment (CVA), Debit Valuation Adjustment (DVA), and Funding Valuation Adjustment (FVA).

These adjustments are commonly referred to as xVA. This white paper gives an overview of the different adjustments, shows how they are typically computed, and outlines where the computational complexities lie. We give recommendations on how to achieve high performance, portability, and scalability for centralised in-house xVA implementations. We show how careful software design makes it easy to harness not only the power of multi-core CPUs, but also accelerator co-processors such as graphics processing units (GPUs) and the Intel Xeon Phi.

You can download the paper here.

Benchmarks: NVIDIA Tesla K40 vs. K20X GPU

NVIDIA has just released its new flagship Tesla GPU, the Tesla K40. This GPU features more memory, higher clock rates, and more CUDA cores than the previous top-end card, the K20X. But what performance improvements can we expect for financial applications? We’ve put the new card to the test and compared it to the K20X using a Monte-Carlo LIBOR swaption portfolio pricer, a real-world financial algorithm that we’ve already used in other benchmarks.

NVIDIA Tesla K40 GPU Accelerator

Read more

Benchmarks: Intel Xeon Phi vs. NVIDIA Tesla GPU

Accelerators battle for compute-intensive analytics in Finance

At Xcelerit, people often ask us: “Which is better, Intel Xeon Phi or NVIDIA Kepler?” The general answer has to be “it depends,” as this is heavily application-dependent. But what if we zoom in on real-world problems in computational finance, the kinds of problems that quants in investment banks and across the financial industry deal with every day? Let’s analyse two different financial applications and see how they perform on each platform. To cover different types of algorithms often found in finance, we chose an embarrassingly parallel Monte-Carlo algorithm (with fully independent paths) for the first test application, and a Monte-Carlo algorithm with cross-path dependencies and iterative time-stepping for the second.
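As a rough illustration of what "fully independent paths" means, the following Python sketch prices a European call option by Monte-Carlo simulation. The choice of product, the geometric Brownian motion model, the function names, and all parameter values are assumptions made for brevity; they are not the applications benchmarked in the post.

```python
import math
import random


def path_payoff(rng, spot, strike, rate, vol, maturity):
    """Simulate one terminal price under geometric Brownian motion
    and return the call payoff for that single path."""
    z = rng.gauss(0.0, 1.0)
    drift = (rate - 0.5 * vol ** 2) * maturity
    terminal = spot * math.exp(drift + vol * math.sqrt(maturity) * z)
    return max(terminal - strike, 0.0)


def price_european_call(n_paths, seed=42, spot=100.0, strike=100.0,
                        rate=0.05, vol=0.2, maturity=1.0):
    """Average the discounted payoffs over n_paths simulated paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        # Each payoff depends only on its own random draw, so paths could
        # be spread across cores or devices and combined at the very end.
        total += path_payoff(rng, spot, strike, rate, vol, maturity)
    return math.exp(-rate * maturity) * total / n_paths


price = price_european_call(100_000)
print(f"Estimated call price: {price:.2f}")
```

The only step that touches more than one path is the final average. An algorithm with cross-path dependencies, by contrast, needs information from all paths at each time step before any path can advance, which constrains how the work can be distributed.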

[Update 1-Oct-2013 (American Monte-Carlo application only):]

  • Updated the algorithm and removed temporary storage; this affects the GPU and Xeon Phi results, and the performance numbers have been updated accordingly
  • Updated performance figures for the Ivy Bridge CPU (Xeon E5-2697 v2) and the Xeon Phi processor (Xeon Phi 7120P)
  • Replaced absolute times with speed-ups vs. sequential for better readability

Intel Xeon Phi 5110P PCIe Card

NVIDIA Tesla K20 GPU Accelerator


Read more

Financial Applications on IBM’s POWER7+

Although 5 of the top 10 systems in the TOP500 Supercomputing Sites list (November 2012) are based on IBM POWER processors, this platform is rarely used in the financial industry for compute-intensive analytics. We believe that with increasing computational demands and ever more data to be processed, e.g. in risk computations, the POWER platform might gain popularity in financial applications.

IBM’s POWER7+ Processor

POWER7 Layout

Read more

The Future is Hybrid – Trends in HPC

A large number of high-performance processors are available these days, each with its own characteristics, and the landscape is changing quickly as new processors are released. There are CPUs, GPUs, FPGAs, Xeon Phi, DSPs, to name just a few. How should one decide which of these processors to use for a particular task? Should a combination of them be used jointly to get the best performance? And how can the complexity of handling these devices be managed? In the following, we’ll attempt to answer these questions, in particular for users and applications in the financial services domain.

Read more

Efficiently Using Computing Grids in the Financial Industry

Large financial services companies have vast compute resources available, organised into computing grids (i.e., federations of compute resources working towards a common goal). But, as Murex’s Pierre Spatz explains, they don’t use them as supercomputers. “They use them as grids of small computers. It means that most of their code is in fact mono-threaded.” In contrast, the world’s top supercomputing sites often use clusters of machines in a very different, and more efficient, way. In the following, we will explain and demonstrate why, and illustrate how financial services firms can improve the efficiency of their existing compute grids. We only look at compute-intensive workloads, such as those found in risk management and derivatives pricing.

Read more