I’m working through some basic performance testing and comparisons for cloud compute, and this post describes the test platforms I’m using for those tests. To be clear, the platforms below are not designed to be perfect-world hardware platforms; they are proof-of-concept setups suitable for order-of-magnitude testing.

General Code Base

For this aspect of testing I have been working with the Asian Options Pricing WCF sample provided by Microsoft on the HPC resource kit site (http://resourcekit.windowshpc.net/). This is a simple VSTO-enabled worksheet that calculates the price of an Asian option, that is, an option whose payoff depends on the average price of the underlying instrument over the life of the option rather than on its final price. I’m by no means a market analyst, nor can I vouch for the accuracy of the calculation, but it does provide a CPU-intensive operation that can be parallelized easily, making it a good candidate for scale-out compute. The sample is designed to illustrate submitting WCF jobs (‘micro jobs’) to a Windows HPC cluster and comes with source and instructions for getting it set up and running.

In the default configuration, the worksheet performs 100 iterations of Monte Carlo pricing runs using a function called PriceAsianOptions (code listed at the bottom of this post). The basic theory may be understood better by reading this article. The function takes the following input parameters:

  • Up – the specific factor by which the underlying instrument may move up per step of the binomial price tree
  • Down – the specific factor by which the underlying instrument may move down per step of the binomial price tree.
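  • Interest – the risk-free interest rate. Note that the code at the bottom of this post applies it as a per-period growth factor: it appears in (interest - down) / (up - down) and is raised to the power of periods to discount the result. This number doesn’t affect the complexity of the calculation so I left it constant.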
  • Initial Price – the starting or current price of the stock. I left this constant as it doesn’t affect the complexity of the calculation
  • Periods – the number of periods (trading days) before the option expires. The default was 20 and I left this constant. Technically it could increase the complexity of the calculation but changing the value of the Runs parameter does a suitable job of this.
  • Exercise – the price at which the call should be exercised
  • Runs – the number of simulated price paths generated within a single call; their payoffs are averaged before the result is returned. This defaulted to 1,000,000 and is the main value that I altered during the runs to increase the duration of the calculation.
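
Read against the code listed at the bottom of this post, the parameters combine roughly as follows. This is only a restatement of what that listing computes, not an independent derivation, and the symbols are shorthand introduced here: u and d for Up and Down, R for Interest (as the code uses it), S_0 for Initial Price, N for Periods, K for Exercise and M for Runs.

```latex
p_{\text{up}} = \frac{R - d}{u - d}, \qquad p_{\text{down}} = 1 - p_{\text{up}}

S_0^{(m)} = S_0, \qquad
S_i^{(m)} = S_{i-1}^{(m)} \cdot
\begin{cases}
  u & \text{with probability } p_{\text{up}} \\
  d & \text{with probability } p_{\text{down}}
\end{cases}
\qquad (i = 1, \dots, N)

\bar{S}^{(m)} = \frac{1}{N + 1} \sum_{i=0}^{N} S_i^{(m)}

\text{price} \approx \frac{1}{R^{N}} \cdot \frac{1}{M} \sum_{m=1}^{M} \max\left(\bar{S}^{(m)} - K,\ 0\right)
```

Only M (Runs) and N (Periods) change the amount of work per call, which is why the other parameters could stay constant during the tests.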

Once the worksheet has finished calculating a batch (by default 100 such calls to PriceAsianOptions), it calculates and displays the Average of the Monte Carlo runs, the Min, Max, Standard Deviation, Standard Error, and Execution Time in seconds.
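
The batch summary can be sketched in a few lines of C#. This is a minimal illustration rather than the worksheet's actual formulas: the names BatchSummary and BatchStatistics.Summarize are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Summary statistics for one batch of Monte Carlo prices.
public sealed class BatchSummary
{
    public double Average { get; set; }
    public double Min { get; set; }
    public double Max { get; set; }
    public double StandardDeviation { get; set; }
    public double StandardError { get; set; }
}

public static class BatchStatistics
{
    // Hypothetical helper: the worksheet's own calculation may differ.
    public static BatchSummary Summarize(IReadOnlyList<double> prices)
    {
        if (prices == null || prices.Count < 2)
            throw new ArgumentException("At least two prices are required.", nameof(prices));

        int n = prices.Count;
        double average = prices.Average();

        // Assumption: sample variance, i.e. n - 1 in the denominator.
        double sumSquaredDeviations = prices.Sum(p => (p - average) * (p - average));
        double standardDeviation = Math.Sqrt(sumSquaredDeviations / (n - 1));

        return new BatchSummary
        {
            Average = average,
            Min = prices.Min(),
            Max = prices.Max(),
            StandardDeviation = standardDeviation,
            // Standard error of the mean: s / sqrt(n).
            StandardError = standardDeviation / Math.Sqrt(n)
        };
    }
}
```

Passing the 100 prices returned by one batch to Summarize would give the five values shown on the worksheet; the execution time would be measured separately around the batch.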

NOTE: this test platform is much less about the specific problem being solved and more about taking a CPU/Compute intensive problem and expressing it on different platforms.

Windows HPC Server Environment

Hardware: My test environment consists of a small cluster with one head node and two compute nodes. The head node is a single-proc box with 2 GB of RAM and 2 NICs. Besides the cluster Head Node role, it is also running DNS, DHCP, AD, SQL and has the WCF broker role (it does *not* have the compute role). Each compute node is a dual-core Intel box with 4 GB RAM running Windows HPC Server 2008. The cluster is configured such that the head node has one leg on the “enterprise network” (in my corner of the universe this is simply my lab network) and another leg on the private network shared with the compute nodes. The individual compute nodes are not accessible from outside this private network.

Software: While the instructions and download would lead you to believe that the code is ready to run out-of-the-box (OOTB), that is not quite true. Beyond the changes detailed in the instructions, I had to change the name of the cluster’s head node in additional locations, as well as the path to the DLL as seen from the compute nodes. Beyond those changes, however, the code used in the tests is identical to what is available on the resource kit site. The main logic is a loop that submits n jobs, with each request having an asynchronous event handler to process the results.
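
The shape of that loop can be sketched as below. This is not the sample's code and it does not use the Windows HPC client API: Task.Run stands in for the cluster, and AsyncSubmitSketch, priceRequest and onResult are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncSubmitSketch
{
    // Submits requestCount independent requests and handles each result
    // as soon as that request completes. Task.Run is a stand-in for the
    // cluster; this is not the Windows HPC client API.
    public static double[] SubmitAll(
        int requestCount,
        Func<int, double> priceRequest,
        Action<int, double> onResult)
    {
        var results = new double[requestCount];

        Task[] pending = Enumerable.Range(0, requestCount)
            .Select(id => Task.Run(() => priceRequest(id))
                .ContinueWith(completed =>
                {
                    // Runs once per request, on a thread-pool thread.
                    results[id] = completed.Result;
                    onResult(id, completed.Result);
                }))
            .ToArray();

        // Block until every request and its handler have finished.
        Task.WaitAll(pending);
        return results;
    }
}
```

Because the handlers run concurrently, anything they touch (a running total, a UI update) has to be thread-safe or marshalled back to a single thread.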

Local Compute Environment

Hardware: The platform I’m using for the “local machine” compute tests is a Windows Server 2008 Standard box (64 bit) with 4 GB of RAM and a single 2.4 GHz processor. This certainly isn’t anything fancy but should provide a middle-of-the-road hardware spec for comparison to the other platforms.

Software: I started by taking the worksheet used for the HPC environment and adding another button for the “local” runs. Since the hardware I’m running on only has a single processor, I adjusted the logic loop to run sequentially rather than in parallel (additional threads would simply harm performance). The work is performed on a background thread, and as soon as an individual computation is completed the UI is notified/updated.
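
A minimal sketch of that arrangement, assuming a plain Thread and a callback; LocalRunSketch, priceOnce and reportResult are hypothetical names, and the marshalling of the update onto the worksheet's UI thread is left out.

```csharp
using System;
using System.Threading;

public static class LocalRunSketch
{
    // Starts one background thread that runs the calculations one after
    // another and reports each result as soon as it is available.
    public static Thread RunSequentially(
        int iterations,
        Func<double> priceOnce,
        Action<int, double> reportResult)
    {
        var worker = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                double price = priceOnce();
                // In a worksheet add-in this callback would have to move
                // the update onto the UI thread; that part is omitted.
                reportResult(i, price);
            }
        });
        worker.IsBackground = true;
        worker.Start();
        return worker;
    }
}
```

The caller gets the Thread back, so it can Join it when it needs to know the whole batch has finished (for example, to stop a timer).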

Azure Environment

Hardware: The configuration for the test bed as hosted in Azure is split over two projects. The first project is the data project (used for queues and tables); it is not part of any affinity group and is configured for a geographic location of USA-Anywhere. The second project is the compute project and is configured with one web role and two worker roles. It is also not part of an affinity group and is configured with a geographic location of USA-Anywhere. NOTE: I deliberately did not select an affinity group or a specific geographic location, to provide a sort of worst-case scenario; the assumption is that selecting either of those would, if anything, only improve the performance.

Software: There was a good bit more coding to do here as I attempted to mimic (in theoretical approach at least) the general implementation of WCF services on Windows HPC. The web role is host to a WCF service that allows a client (in this case the Excel worksheet) to submit a single pricing request (providing the parameters explained above). The pricing request and its parameters are serialized and placed on the Azure queue to be picked up by one of the running worker roles. (Incidentally, one nice side effect of this approach is that from a coding standpoint it makes no difference how many worker roles exist; if more are needed they can simply be configured and items will be processed off the queue more quickly.) Once a worker role picks up the pricing request, it processes the request and places the resulting price (and the request identifier) into a row in an Azure table. At this point it checks the queue again for the next pricing request. An additional method exists in the WCF service that accepts a request ID and then looks in the Azure table for a result matching that request ID. If one is found, the result is returned and the data is removed from the Azure table.
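
The flow of a request through the queue and the table can be sketched with in-memory stand-ins. To be clear about the assumptions: no Azure SDK calls are used here, ConcurrentQueue plays the Azure queue, ConcurrentDictionary plays the Azure table, and all type and method names (PricingRequest, PricingPipelineSketch, Submit, ProcessNext, TryTakeResult) are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// One pricing request as it would sit on the queue (only Runs is shown).
public sealed class PricingRequest
{
    public Guid RequestId { get; set; }
    public int Runs { get; set; }
}

// In-memory stand-ins: ConcurrentQueue plays the Azure queue and
// ConcurrentDictionary plays the Azure table.
public sealed class PricingPipelineSketch
{
    private readonly ConcurrentQueue<PricingRequest> queue =
        new ConcurrentQueue<PricingRequest>();
    private readonly ConcurrentDictionary<Guid, double> resultTable =
        new ConcurrentDictionary<Guid, double>();

    // Web role, first method: accept a request and place it on the queue.
    public Guid Submit(int runs)
    {
        var request = new PricingRequest { RequestId = Guid.NewGuid(), Runs = runs };
        queue.Enqueue(request);
        return request.RequestId;
    }

    // Worker role: take one request, price it, store the result.
    // Returns false when the queue is empty.
    public bool ProcessNext(Func<PricingRequest, double> price)
    {
        PricingRequest request;
        if (!queue.TryDequeue(out request))
            return false;

        resultTable[request.RequestId] = price(request);
        return true;
    }

    // Web role, second method: return the result if present and remove it.
    public bool TryTakeResult(Guid requestId, out double result)
    {
        return resultTable.TryRemove(requestId, out result);
    }
}
```

Any number of workers can call ProcessNext against the same instance, which mirrors the point above: the submitting code does not need to know how many worker roles are draining the queue.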

On the Excel worksheet side, I modified the same worksheet as before with yet another button for submitting the work to the Azure-hosted WCF service. I took a similar loop to the other two scenarios but made the adjustment that it will first submit all of the requests and only then will it begin asking for the results. The idea is to avoid having the requests for results clogging the network and slowing down the submission of jobs – I wanted to keep the worker roles as busy as possible until the work was done. I initially implemented this using a similar approach to the HPC code (async anonymous methods handling the results) but this failed as the tests grew bigger because of the default web server timeouts (i.e. I’d have 100 requests queued and it was likely that more than 90 seconds would pass before the results were finished).
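
The submit-everything-then-ask-for-results loop might look like the sketch below. It assumes short polling calls in place of long-held asynchronous requests; the delegates submit and tryGetResult are hypothetical stand-ins for the two WCF service methods, and the poll delay is an arbitrary parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the service call that looks up a result by ID.
public delegate bool TryGetResult(Guid requestId, out double result);

public static class SubmitThenPollSketch
{
    public static Dictionary<Guid, double> Run(
        int requestCount,
        Func<Guid> submit,
        TryGetResult tryGetResult,
        TimeSpan pollDelay)
    {
        // Phase 1: submit everything before asking for any result.
        var pending = new List<Guid>(requestCount);
        for (int i = 0; i < requestCount; i++)
            pending.Add(submit());

        // Phase 2: poll with short calls until every result has arrived.
        // Note: there is no overall deadline, so this loops for as long
        // as any result is missing.
        var results = new Dictionary<Guid, double>();
        while (pending.Count > 0)
        {
            for (int i = pending.Count - 1; i >= 0; i--)
            {
                double price;
                if (tryGetResult(pending[i], out price))
                {
                    results[pending[i]] = price;
                    pending.RemoveAt(i);
                }
            }
            if (pending.Count > 0)
                Thread.Sleep(pollDelay);
        }
        return results;
    }
}
```

Each call returns quickly whether or not a result is ready, so no single request has to outlive the web server timeout; a real client would also want an overall deadline.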

Sample Code

private double PriceAsianOptions(double initial, 
    double exercise, double up, double down, double interest, 
    int periods, int runs)
{
    double[] pricePath = new double[periods + 1];

    // Risk-neutral probabilities of an up move and a down move
    double piup = (interest - down) / (up - down);
    double pidown = 1 - piup;

    // Running total of the call payoffs over all simulated paths
    double temp = 0.0;

    Random rand = new Random();
    double priceAverage = 0.0;
    double callPayOff = 0.0;

    // Every path starts at the initial price
    pricePath[0] = initial;

    for (int index = 0; index < runs; index++)
    {
        // Generate Path
        double sumPricePath = initial;

        for (int i = 1; i <= periods; i++)
        {
            double rn = rand.NextDouble();

            // Move up with probability piup, otherwise move down
            if (rn > pidown)
            {
                pricePath[i] = pricePath[i - 1] * up;
            }
            else
            {
                pricePath[i] = pricePath[i - 1] * down;
            }
            sumPricePath += pricePath[i];
        }

        // Asian call payoff: average price along the path minus the
        // exercise price, floored at zero
        priceAverage = sumPricePath / (periods + 1);
        callPayOff = Math.Max(priceAverage - exercise, 0);

        temp += callPayOff;
    }

    // Discount over the life of the option and average across all runs
    return (temp / Math.Pow(interest, periods)) / runs;
}
