Let’s look at Intel’s brand-new high-performance computing co-processor, the Xeon Phi. We compare its performance on a financial application against the latest Xeon “Sandy Bridge” server processors.

### The Intel Xeon Phi Co-processor

The Xeon Phi 5110P is an x86-architecture many-core processor with 60 cores and 4-way hyperthreading, i.e., 240 logical cores. It comes as a PCIe x16 extension card, runs at 1.053 GHz, and is equipped with 8 GB of high-bandwidth memory (320 GB/s). The extension card runs its own Linux operating system and can be used by the host system either as a separate system or by offloading sections of the code to the co-processor. Intel claims it delivers up to 1 Teraflops (double precision). More information about the Xeon Phi can be found on the Intel website.
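Intel's figure of roughly 1 Teraflops can be sanity-checked from the core count and clock speed above. The estimate below is only a back-of-the-envelope sketch, and it assumes two details that are not stated in this post: 512-bit vector units (8 double-precision lanes per core) and one fused multiply-add per lane per cycle (counted as 2 floating-point operations).

$$
60 \ \text{cores} \times 1.053 \ \text{GHz} \times 8 \ \text{lanes} \times 2 \ \tfrac{\text{flops}}{\text{lane} \cdot \text{cycle}} = 1010.88 \ \text{GFLOPS} \approx 1.01 \ \text{TFLOPS}
$$

This is a theoretical peak; a real application reaches it only if every core keeps its vector unit fully busy.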

### Test Application: Monte-Carlo LIBOR Swaption Portfolio Pricing

Details of this algorithm have been previously described. For convenience, we briefly summarise it below:

A Monte-Carlo simulation is used to price a portfolio of LIBOR swaptions. Thousands of possible future development paths for the LIBOR interest rate are simulated using normally-distributed random numbers. For each of these Monte-Carlo paths, the value of the swaption portfolio is computed by applying a portfolio payoff function. The equations for computing the LIBOR rates and payoff are given in Prof. Mike Giles’ notes. Furthermore, the sensitivity of the portfolio value with respect to changes in the underlying interest rate is computed using Adjoint Algorithmic Differentiation (AD). This sensitivity is a Greek, called λ, and its computation is detailed in the paper Monte Carlo evaluation of sensitivities in computational finance. Both the final portfolio value and the λ value are obtained by computing the mean of all per-path values.
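The overall structure (simulate a path from normal random numbers, apply a payoff, then average the per-path values and per-path sensitivities) can be sketched as follows. This is a deliberately simplified illustration and not the LIBOR swaption model used in the benchmark: the single lognormal step, the call-style payoff, the function name `price_and_delta`, and all parameter values are hypothetical stand-ins. The sensitivity is computed as a hand-derived pathwise derivative rather than with adjoint AD.

```python
import math
import random


def price_and_delta(n_paths, s0=100.0, strike=100.0, rate=0.05,
                    vol=0.2, maturity=1.0, seed=42):
    """Toy Monte-Carlo estimate of a value and one sensitivity.

    Hypothetical stand-in model (NOT the LIBOR market model): a single
    lognormal step for the underlying and a call-style payoff.
    Returns (mean discounted payoff, mean pathwise derivative w.r.t. s0).
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)

    value_sum = 0.0
    delta_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                     # normally distributed draw
        s_t = s0 * math.exp(drift + diffusion * z)  # simulated end-of-path value
        if s_t > strike:
            # Per-path payoff and its derivative with respect to s0.
            value_sum += discount * (s_t - strike)
            delta_sum += discount * s_t / s0

    # Both results are plain means over all per-path values.
    return value_sum / n_paths, delta_sum / n_paths
```

Calling `price_and_delta(100_000)` returns the two estimates as a tuple. Each path is independent of the others, which is what makes this kind of workload a natural fit for many-core and vector hardware.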

### Benchmark Setup

We run the same application on both the Intel Xeon Phi processor and an Intel “Sandy Bridge” server CPU and compare their performance. The Xeon Phi version uses offload mode, i.e., the main executable runs on the host CPU and the Monte-Carlo computation is offloaded to the Phi processor. The test system has the following configuration:

- **CPU:** 2 Intel Xeon E5-2670 processors, 8 cores each, hyperthreading disabled
- **Co-Processor:** Intel Xeon Phi 5110P
- **OS:** RedHat Enterprise Linux 6.2 (64bit)
- **RAM:** 64GB
- **Compiler:** Intel Composer XE 2013
- **Server:** HP ProLiant SL250s Gen8

**Note: we are comparing two “Sandy Bridge” processors to a single Xeon Phi.**

### Performance

We measured the computation times of an optimized parallel version of the Monte-Carlo LIBOR swaption portfolio pricer. It was run once on the two “Sandy Bridge” host CPUs (multi-threaded) and once on the Xeon Phi co-processor in offload mode. The execution time of the full application is measured, including data transfers, random number generation, and reduction. **All of these steps run on the target processor.** Below is a plot of the speedups achieved on the Xeon Phi co-processor vs. the “Sandy Bridge” computation, for single and double precision.

As we can see, from about 100K paths onwards, the Intel Xeon Phi becomes faster than the 2 “Sandy Bridge” processors, reaching nearly 3x at 1M paths. With lower numbers of paths, the “Sandy Bridge” processors outperform the Phi. This can be explained by the added data transfers and the comparatively low level of parallelism available for a low number of paths (considering both vectorization and multi-threading). The setup time for the random number generator also becomes more dominant on the Xeon Phi when relatively little computation is performed.
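One way to make this reasoning concrete is a simple fixed-overhead model. It is only an illustrative sketch: the symbols are hypothetical, nothing here was fitted to the measurements, and it ignores the reduced parallelism at low path counts. Suppose each device needs a fixed setup time $a$ (data transfers, random number generator setup) and then processes paths at a constant rate $r$, so that $T(N) = a + N/r$ for $N$ paths. The speedup of the Phi over the CPUs is then

$$
S(N) = \frac{T_{\text{cpu}}(N)}{T_{\text{phi}}(N)} = \frac{a_{\text{cpu}} + N / r_{\text{cpu}}}{a_{\text{phi}} + N / r_{\text{phi}}}
$$

For small $N$ the ratio approaches $a_{\text{cpu}} / a_{\text{phi}}$, which is below 1 when the Phi has the larger fixed overhead. For large $N$ it approaches $r_{\text{phi}} / r_{\text{cpu}}$. Setting $S(N) = 1$ gives the break-even point, valid when $a_{\text{phi}} > a_{\text{cpu}}$ and $r_{\text{phi}} > r_{\text{cpu}}$:

$$
N^{*} = \frac{a_{\text{phi}} - a_{\text{cpu}}}{1 / r_{\text{cpu}} - 1 / r_{\text{phi}}}
$$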

It is interesting to see that the speedup over “Sandy Bridge” is larger in double precision than in single precision at every path count tested.

The table below shows the speedups vs. “Sandy Bridge” for different numbers of paths, for easier comparison:

| Paths | Speedup (single) | Speedup (double) |
|-------|------------------|------------------|
| 32K | 0.33x | 0.52x |
| 128K | 1.05x | 1.24x |
| 1024K | 2.6x | 2.92x |
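The break-even point of roughly 100K paths can be cross-checked against the table values. The sketch below assumes that the speedup varies linearly in log2(paths) between two measured points; that interpolation rule and the helper name `break_even_paths` are assumptions made for illustration, not part of the benchmark.

```python
import math

# Speedups copied from the table above: paths (in thousands) -> speedup.
SPEEDUP_SINGLE = {32: 0.33, 128: 1.05, 1024: 2.6}
SPEEDUP_DOUBLE = {32: 0.52, 128: 1.24, 1024: 2.92}


def break_even_paths(speedups):
    """Estimate the path count (in thousands) at which the speedup hits 1.0x.

    Assumption: speedup is linear in log2(paths) between adjacent
    measurements. Returns None if no adjacent pair brackets 1.0x.
    """
    points = sorted(speedups.items())
    for (n_lo, s_lo), (n_hi, s_hi) in zip(points, points[1:]):
        if s_lo < 1.0 <= s_hi:
            # Fraction of the way from the lower to the upper measurement.
            frac = (1.0 - s_lo) / (s_hi - s_lo)
            log_n = math.log2(n_lo) + frac * (math.log2(n_hi) - math.log2(n_lo))
            return 2.0 ** log_n
    return None


if __name__ == "__main__":
    for label, data in (("single", SPEEDUP_SINGLE), ("double", SPEEDUP_DOUBLE)):
        print(f"{label}: break-even near {break_even_paths(data):.0f}K paths")
```

Under this assumption the two estimates fall on either side of 100K paths, with double precision breaking even earlier than single precision.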

Note that the Xeon Phi co-processor is currently supported only on the Linux operating system. Given that most financial institutions use Windows for software development and production grids, we believe that this is a major limitation to its adoption.

#### Paul Sutton
