While working on Project Euler problem 205 in F#, I was thrilled to find an opportunity to use GPGPU. The correct way to solve this problem is a statistical method, but I wanted to simulate the process and see what I would get. Again, I do not intend to solve the problem with this approach; the goal is just to learn GPGPU.
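
For reference, the exact probability can be computed by counting the ways each total can be rolled. The following is only a minimal sketch and not part of the program in this post; the names sumCounts, pyramidWays and cubeWays are made up for illustration.

```fsharp
// Number of ways to roll each total with `count` fair dice of `sides` faces.
// Index s of the returned array holds the number of ways to reach total s.
let sumCounts (count: int) (sides: int) : int64[] =
    let mutable ways = [| 1L |]   // zero dice: exactly one way to total 0
    for _ in 1 .. count do
        let next = Array.zeroCreate<int64> (ways.Length + sides)
        for total in 0 .. ways.Length - 1 do
            for face in 1 .. sides do
                next.[total + face] <- next.[total + face] + ways.[total]
        ways <- next
    ways

let pyramidWays = sumCounts 9 4   // nine four-sided dice
let cubeWays = sumCounts 6 6      // six six-sided dice

// Count the outcome pairs in which the pyramid total is strictly higher.
let wins =
    seq {
        for p in 0 .. pyramidWays.Length - 1 do
            for c in 0 .. min (p - 1) (cubeWays.Length - 1) do
                yield pyramidWays.[p] * cubeWays.[c]
    }
    |> Seq.sum

let totalOutcomes = pown 4L 9 * pown 6L 6
let probability = float wins / float totalOutcomes
printfn "P(pyramid total wins) = %.7f" probability
```

A simulation such as the one below should converge toward this value, which makes it a handy check on the GPU result.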

Simulating this process is simple: I make a 9 x 1000 matrix A filled with random numbers from 1 to 4. Similarly, matrix B is 6 x 1000 with elements ranging from 1 to 6. Each column is one game, so summing the columns of A and B and counting how often A's total is higher gives the number of wins. The Accelerator v2 from Microsoft Research provides a good managed platform for GPGPU. The following is the code:

    // FPA and PA are assumed to be aliases for FloatParallelArray and ParallelArrays.
    // random, gridSize, tempResult, totalTest, file and appendToFile are defined
    // elsewhere in the program and are not shown here.
    let dxTarget = new Microsoft.ParallelArrays.DX9Target()

    let getWin index (gridSize:int) =
        let shape = [| gridSize; gridSize; |]
        let resultShape = [|gridSize|]
        let zero, one = new FPA(0.0f, resultShape), new FPA(1.0f, resultShape)
        // Random.Next excludes its upper bound, so use 5 and 7 to get 1..4 and 1..6
        let generatePyramid i j = float32(random.Next(1,5))
        let generateDice i j = float32(random.Next(1,7))
        // one column per game: 9 four-sided dice against 6 six-sided dice
        let px = new FPA(Array2D.init 9 gridSize generatePyramid)
        let py = new FPA(Array2D.init 6 gridSize generateDice)
        // sum along dimension 0 to get the two totals of every game
        let pxSum = PA.Sum(px, 0)
        let pySum = PA.Sum(py, 0)
        // 1.0f where the pyramid total wins, 0.0f otherwise, then count the wins
        let cond = FPA.op_GreaterThan(pxSum, pySum)
        let gpu = PA.Sum(PA.Cond(cond, one, zero), 0)
        let a = dxTarget.ToArray1D(gpu)
        let result = Seq.head a
        tempResult <- tempResult + int64(result)
        // report progress every 500 batches
        if (index%500=0) then
            let total = int64(index) * int64(gridSize) + totalTest
            printfn "Time = %A; Result = %A" index (float(tempResult) / float(total))
            appendToFile file (System.String.Format("{0}\t{1}", tempResult, total))
        result

    let compute n = Seq.init n (fun i->getWin i gridSize) |> Seq.sum

    let n = 250000;
    let mutable start = System.DateTime.Now;
    printfn "%A" ((compute n) / float32(gridSize*n))
    printfn "%A" (System.DateTime.Now - start);
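
For comparison, the same simulation can be sketched on the CPU without Accelerator. This is only an illustrative sketch and not the CPU version that was timed for this post; rollSum, estimateWinRate, the fixed seed and the trial count are all assumptions.

```fsharp
open System

// Fixed seed (an assumption) so that repeated runs give the same estimate.
let rng = Random(42)

// Sum of `count` rolls of a fair die with `sides` faces.
// Random.Next excludes its upper bound, hence `sides + 1`.
let rollSum (count: int) (sides: int) =
    let mutable total = 0
    for _ in 1 .. count do
        total <- total + rng.Next(1, sides + 1)
    total

// Fraction of `trials` games in which nine 4-sided dice beat six 6-sided dice.
let estimateWinRate (trials: int) =
    let mutable wins = 0
    for _ in 1 .. trials do
        if rollSum 9 4 > rollSum 6 6 then wins <- wins + 1
    float wins / float trials

// Stopwatch is used for timing instead of subtracting DateTime.Now values.
let timer = System.Diagnostics.Stopwatch.StartNew()
let estimate = estimateWinRate 1000000
printfn "estimate = %.4f, elapsed = %A" estimate timer.Elapsed
```

Running both versions with the same number of games gives a like-for-like timing, although the GPU figure also includes the cost of generating the random matrices on the CPU and copying them to the device.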

The result is interesting:

- On my powerful desktop, the GPU version is slower than the CPU version.
- On my laptop, the GPU version is 2 times faster than the CPU version, and the CPU fan stays quiet during the computation.