Although five of the top ten systems in the TOP500 Supercomputing Sites list (November 2012) are based on IBM POWER processors, this platform is rarely used in the financial industry for compute-intensive analytics. We believe that with increasing computational demands and ever-growing volumes of data to be processed, e.g. in risk computations, the POWER platform may gain popularity in financial applications.
IBM’s POWER7+ Processor
There is a wide range of high-performance processors available these days, each with its own characteristics, and the landscape is changing quickly as new processors are released. There are CPUs, GPUs, FPGAs, the Xeon Phi, DSPs – to name just a few. How should one decide which of these processors to use for a particular task? Or should a combination of these processors be used jointly to get the best performance? And then, how should the complexity of handling these devices be managed? In the following, we’ll attempt to answer these questions, in particular for users and applications in the financial services domain.
Xcelerit, in association with the Wilmott Forum, recently held a seminar in the City of London on the topic of accelerating CVA computations. The event was over-subscribed, and even though we maintained a waiting list, many people were still left disappointed. We are happy to announce that a video of the event is now available on our website. If you missed it, click here to access the material.
There are three main talks, each between 15 and 30 minutes long.
- The renowned Justin Clarke from Edu-Risk International gives a tutorial introduction to CVA algorithms, their structure, and how banks are using them today.
- John Ashley, an NVIDIA Solutions Architect specializing in finance, describes how GPUs can economically provide the compute power needed to run these calculations.
- Finally, Hicham Lahlou from Xcelerit covers the software dimension, taking you through worked examples of CVA calculations using the streamlined techniques available through the Xcelerit SDK.
We hope you enjoy the videos as much as our live audience did. Don’t hesitate to follow up with questions or to discuss your CVA performance challenges.
Traders at HSBC need to monitor their level of counterparty risk exposure on an ongoing basis. Increasing scrutiny from regulators under the Basel III framework and other regulations means that so-called Credit Value Adjustment (CVA) analysis must be carried out regularly over their entire portfolio to compute their risk exposure and, consequently, their regulatory capital requirements.
Traditionally, this kind of calculation takes many hours to run on a bank’s grid facility. Eurico Covas, Head of QRVG Development and Hedge Accounting Systems – Quantitative Risk and Valuation Group (QRVG) – wanted to see whether it was possible to run this calculation on an intra-day rather than an overnight basis.
He put together an experimental facility involving a couple of GPUs from NVIDIA. He turned to Xcelerit for the software part of the picture. “We already had extensive in-house libraries to carry out CVA and other calculations, and we had heard that Xcelerit offered an easy way to get our existing code to drive GPUs at their maximum speeds”, he said.
FPGAs are programmable hardware devices, traditionally used in the signal-processing domain for real-time number-crunching where high performance and low power consumption are paramount. For financial services, their deterministic high performance and low latency make FPGAs a perfect fit for high-frequency trading – and that’s where FPGAs are typically used in banks and hedge funds. However, the complexity of developing in VHDL or Verilog has been a major barrier to adoption in derivatives pricing and risk management applications, i.e. compute-intensive analytics. In this blog post, we will look into using FPGAs for this class of algorithms, using the OpenCL-enabled PCIe-385N FPGA board from Nallatech that we’ve just received. It features the powerful Altera Stratix V A7 FPGA. We’ve put it to the test using the example of a complex derivative pricing algorithm.
[Update 29-Apr-2012: The application that follows uses a prototype version of the Xcelerit SDK for FPGA support. The SDK's OpenCL backend is used to generate OpenCL code from a high-level source, which is then translated into FPGA logic using Altera's SDK for OpenCL.]
This board comes in a PCI Express form factor and can be plugged into workstations easily. It is small (low-profile, half-length PCIe) and low-power (tens of watts). It can be configured with 8 or 16GB of memory, leaving enough headroom for most financial applications. The board supports Altera’s SDK for OpenCL – allowing it to be programmed using higher-level software tools. This SDK automatically compiles and synthesizes the OpenCL kernel code into FPGA logic, creating deep parallel pipelines and adding the interfacing logic to control the execution from the host CPU.
Nallatech PCIe-385N FPGA Card with Altera Stratix V FPGA
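To give a flavour of what the toolchain consumes, here is a minimal sketch of an OpenCL C kernel (a C dialect) of the kind the Altera SDK would synthesize into a hardware pipeline. The kernel is hypothetical – a single geometric Brownian motion step per work-item – and not the actual pricing code used in our test:

```c
/* Hypothetical OpenCL C kernel (illustrative only): advance one
 * geometric Brownian motion path per work-item. The Altera SDK for
 * OpenCL synthesizes the body into a deep hardware pipeline. */
__kernel void gbm_step(__global float* spot,          /* current path values  */
                       __global const float* normal,  /* pre-generated N(0,1) */
                       const float drift,             /* (r - sigma^2/2) * dt */
                       const float vol_sqrt_dt,       /* sigma * sqrt(dt)     */
                       const int n_paths)
{
    int i = get_global_id(0);
    if (i < n_paths)
        spot[i] *= exp(drift + vol_sqrt_dt * normal[i]);
}
```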
Large financial services companies have vast compute resources available, organised into computing grids (i.e., federations of compute resources working towards a common goal). But, as Murex’s Pierre Spatz explains, they don’t use them as supercomputers. “They use them as grids of small computers. It means that most of their code is in fact mono-threaded.” In contrast, the world’s top supercomputing sites often use clusters of machines in a very different – and more efficient – way. In the following, we will explain and demonstrate why, and illustrate how financial services firms can improve the efficiency of their existing compute grids. We only consider compute-intensive workloads, such as those found in risk management and derivatives pricing.
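To make the distinction concrete, here is a minimal sketch (our own illustrative example, not production code) of a single pricing task that exploits all cores of a grid node via OpenMP, rather than running one mono-threaded task per core. The model – a European call under geometric Brownian motion – and the per-path random numbers are deliberately simplistic:

```c
#include <math.h>

/* Illustrative only: price a European call by Monte-Carlo, using all
 * cores of a grid node via OpenMP instead of running mono-threaded.
 * Compile with e.g.: gcc -O2 -fopenmp mc.c -lm */
double mc_call_price(double s0, double k, double r, double sigma,
                     double t, int n_paths)
{
    const double two_pi = 6.283185307179586;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n_paths; i++) {
        /* crude per-path uniforms via an integer hash (illustrative;
         * a real implementation would use a proper parallel RNG) */
        unsigned int h = 2654435761u * (unsigned int)(i + 1);
        double u1 = ((h & 0xffff) + 0.5) / 65536.0;
        double u2 = (((h >> 16) & 0xffff) + 0.5) / 65536.0;
        /* Box-Muller transform: two uniforms -> one standard normal */
        double z  = sqrt(-2.0 * log(u1)) * cos(two_pi * u2);
        /* terminal spot under geometric Brownian motion */
        double st = s0 * exp((r - 0.5 * sigma * sigma) * t
                             + sigma * sqrt(t) * z);
        sum += (st > k) ? (st - k) : 0.0;   /* call payoff */
    }
    return exp(-r * t) * sum / n_paths;     /* discounted average */
}
```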
As a consequence of the global financial crisis of 2007/2008, regulators worldwide put in place a set of regulations aimed at overhauling the financial industry and preventing a similar crisis in the future. The third instalment of the Basel accords (Basel III), Dodd-Frank in the US, and EMIR in Europe all have similar goals. These regulatory measures will considerably increase the capital that banks must hold, impose major short-term costs, and change the way banks manage themselves. Let’s look at the impact these increased regulatory requirements have on risk management and the associated computational complexity.
Let’s look into Intel’s brand-new high-performance computing co-processor – the Xeon Phi. We will compare its performance on a financial application with that of the latest Xeon “Sandy Bridge” server processors.
The Intel Xeon Phi Co-processor
The Xeon Phi 5110P is an x86-architecture many-core processor with 60 cores and 4-way hyper-threading, i.e., 240 logical cores. It comes as a PCIe x16 extension card, runs at 1.053GHz, and is equipped with 8GB of high-bandwidth memory (320GB/s). The extension card runs its own Linux operating system and can be accessed by the host either as a separate system or by offloading sections of the code to the co-processor. Intel claims it delivers up to 1 teraflop (double precision). More information about the Xeon Phi can be found on the Intel website.
Intel Xeon Phi 5110P PCIe Co-processor Card
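As a minimal sketch of the offload model, here is what a trivially parallel step might look like using the Intel compiler’s offload pragmas (the function and data are our own illustrative example, not code from our benchmark):

```c
/* Illustrative offload example: discount an array of payoffs on the
 * Xeon Phi. The "#pragma offload" extension of the Intel compiler
 * copies the inputs to the card, runs the block there, and copies
 * the results back to the host. */
void discount_payoffs(const double* payoff, double* pv,
                      double df, int n)
{
    #pragma offload target(mic) in(payoff : length(n)) \
                                out(pv : length(n))
    {
        /* runs on the co-processor, across its 240 logical cores */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            pv[i] = df * payoff[i];
    }
}
```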
Excelian, the well-known financial services technology specialist, decided this week to put the Xcelerit SDK through its paces. The company, headquartered in the UK, offers services around turnkey financial software as well as advice on grid and high-performance computing. Their consultants maintain a blog on new technologies, and this week they tried out the Xcelerit SDK on a sample Monte-Carlo application.
Overall they gave our SDK a thumbs-up: “Is it easy to use the Xcelerit SDK? Yes it is…”. They managed to port an existing application within a few hours, which is great feedback coming from the developers themselves. Their team started with a sequential Monte-Carlo application and, using its execution time as a baseline, achieved a 9x speedup with CPUs alone (8 cores) and a blistering 92x speedup with the help of 2 GPUs. Once again, their rating on performance was very positive: “Does it improve performance? Yes it does especially when GPU kicks in!”
Being interested in a cluster environment, they also tried integrating the Xcelerit SDK with IBM Platform Symphony, and we were pleased to see that they found the process “relatively straightforward”. Everyone here at Xcelerit was delighted to hear about their positive experience, and we hope we have gained another important supporter in this influential organization.
We’ve just published a white paper entitled “Accelerating CVA Computation the Easy Way”:
The risks associated with over-the-counter (OTC) derivatives were the key contributing factor to the 2007/2008 global financial crisis. Therefore, financial institutions worldwide have drastically shifted the focus of their risk management towards counterparty credit risk (CCR), i.e., the risk that a counterparty defaults before the end of an OTC contract.
Several CCR measures are used in practice, e.g., credit value adjustment (CVA), and international regulatory frameworks such as Basel III have introduced further measures and increased the computational complexity. In addition, banks are striving to respond quickly to market and regulatory changes through flexible in-house software. This makes fast and maintainable software implementations essential.
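For reference, the standard unilateral CVA approximation that such computations evaluate, discretised over an exposure time grid (here $R$ is the recovery rate, $\mathrm{EE}(t_i)$ the discounted expected exposure at time $t_i$, and $\mathrm{PD}(t)$ the counterparty’s cumulative default probability):

```latex
\mathrm{CVA} \;\approx\; (1 - R) \sum_{i=1}^{n}
    \mathrm{EE}(t_i)\,\bigl[\mathrm{PD}(t_i) - \mathrm{PD}(t_{i-1})\bigr]
```

The expected exposures $\mathrm{EE}(t_i)$ are typically estimated by Monte-Carlo simulation over the whole portfolio, which is what drives the computational cost.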