the algorithm first try to fill the result set whose size is resultSize. Once successfully get resultSize elements, the rest element is selected randomly based on a uniform distribution from 0 to current. If the generated random number is in the range of [0, resultSize), replace the result set's element with the new element. So the new element has the probability resultSize/currentIndex.
let rand = System.Random() let (|InReserviorRange|_|) i resultSize = if i < resultSize then Some() else None type ResultList<'a> = System.Collections.Generic.List<'a> let result = ResultList<_>() let reserviorSampling data resultSize = let (|InReserviorRange|_|) i = (|InReserviorRange|_|) i resultSize data |> Seq.iteri (fun index n -> match index with | InReserviorRange-> result.Add(n) | _ -> let newIndex = rand.Next(index) match newIndex with | InReserviorRange -> result.[newIndex] <- n | _ -> ()) let seq = seq { for i=0 to 100 do yield i } reserviorSampling seq 5 result |> Seq.iter (printfn "%A")
Note: For a real world application, you have to rewrite the random number generator for number bigger than int32's max value.
In this code, I also want to show the power of active pattern. The InReserviorRange is an active pattern. It make sure the i
I had had made some bugs because ignore the ELSE scenario or think ELSE scenario is not necessary. The match + active pattern can prevent this kind of error. The active pattern just another way to define a function, no much extra cost than define a function. And it gives extra check with match and improves the code readability.
The active pattern can also use the higher order function idea, look at the second InReserviorRange pattern, that is the syntax how to define second pattern based on current pattern.
No comments:
Post a Comment