Sunday, September 25, 2011

Sample in Detail - How to write a regex type provider

How to write your own type provider
The regular expression type provider is a good sample for showing how to write a type provider from scratch. The sample code is published and can be viewed here.

Some feedback reported that the sample cannot be compiled from the .fs file. I could not reproduce the problem. Here is the link to a new project file with the .fs file inside.

Setup Environment
First, set up the development environment:
1. Create an F# solution with two projects: an F# console project and an F# library project.
   a. The console project is used for testing.
   b. The library project is the type provider project.
2. Set the console project as the startup project.
3. In the console project, add a reference to the F# library project.
Because Visual Studio loads and locks the referenced library, close all files from the test project and restart the solution before you compile a change to the library project. Other documents suggest opening two Visual Studio instances instead. The choice is yours.
The following code shows how this type provider is going to be invoked.
Figure 1 Regular expression type provider
type T = RegExProviderType<"(?<AreaCode>\d{3})-(?<PhoneNumber>\d{7}$)">
let reg = T()
let result = T.IsMatch("425-1232345")
let r = reg.Match("425-1232345").Group_AreaCode.Value //r = 425
· RegExProviderType is the type provider name.
· The string between the angle brackets is the input to the type provider. This string is used to compute the properties and methods.
· IsMatch is a static method of RegExProviderType.
· The Match method on the last line takes an input string and returns another generated type, which contains properties generated from the pattern string.
· If the pattern is invalid, the type provider should be smart enough to report it.
The methods “IsMatch” and “Match” and the property “Group_AreaCode” are generated from the input string.
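For comparison, here is a minimal sketch of what Figure 1 corresponds to when written directly against System.Text.RegularExpressions, without the type provider. The helper name patternError is made up for this illustration. It only shows one possible way to detect an invalid pattern, under the assumption that the Regex constructor is used for validation.

```fsharp
open System
open System.Text.RegularExpressions

// Same pattern as in Figure 1, written as a verbatim string.
let pattern = @"(?<AreaCode>\d{3})-(?<PhoneNumber>\d{7}$)"

// Static call, the counterpart of T.IsMatch.
let isMatch = Regex.IsMatch("425-1232345", pattern)

// Instance call, the counterpart of reg.Match.
let m = Regex(pattern).Match("425-1232345")

// String-keyed lookups that the generated Group_ properties replace.
let areaCode = m.Groups.["AreaCode"].Value
let phoneNumber = m.Groups.["PhoneNumber"].Value

// Hypothetical helper: returns Some message when the pattern cannot be parsed.
let patternError (candidate: string) : string option =
    try
        Regex(candidate) |> ignore
        None
    with :? ArgumentException as ex ->
        Some ex.Message
```

With the type provider, a misspelled group name becomes a compile-time error, while the string lookup above only fails at run time.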
Type Provider Template and Add Method and Property
The following code snippet is a template for writing a type provider.
Figure 2 Template for Type Provider

open System
open System.Linq.Expressions
open System.Reflection
open Microsoft.FSharp.Core.CompilerServices
open Microsoft.FSharp.Collections
open System.Collections.Generic
open Microsoft.FSharp.TypeProvider.Emit
open System.Text.RegularExpressions

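type HelperClass() = class end   // body not shown in this listing

type DebugType() =   // (1)
    inherit System.Object()

type public RegExTypeProvider() as this =
    inherit TypeProviderForNamespaces()
    let thisAssembly = Assembly.GetExecutingAssembly()
    let rootNamespace = "…"   // the namespace string is cut off in this listing
    let baseType = typeof<DebugType>

    let regexType =
        let t = ProvidedTypeDefinition(thisAssembly, rootNamespace, "RegExProviderType", Some(baseType))   // (2)
        let staticParameters = [ ProvidedStaticParameter("pattern", typeof<string>) ]   // (3)
        do t.DefineStaticParameters(
                staticParameters,
                fun theTypeNameOfThisInstantiation args ->
                    match args with
                    | [| :? string as template |] ->   // (4)
                        // add a method or property here
                    | _ -> failwith "need generic definition")
        t

    do this.AddNamespace(rootNamespace, [regexType])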

1. A DebugType is defined as the base type to help identify problems. If the base type is System.Object during development, an error such as “expected type A but given System.Object” is confusing, because you cannot tell where type A comes from. Before the code is released, be sure to change the base type to System.Object or another suitable type.
2. Define the type provider type. This type is bound to “t”.
3. Define the input string. Its type is string and its name is “pattern”.
4. Take the value of the input string as the variable “template”. This is where we can access the pattern.
The type defined above is of no use without meaningful methods or properties. Here is how to add a method:
Figure 3 Add Method
//add isMatch function
let parameters = [ ProvidedParameter("data", typeof<string>) ]
let mi = ProvidedMethod("IsMatch", parameters, typeof<bool>)
mi.AddXmlDoc("Test if the input string matches the pattern")
mi.IsStaticMethod <- true
mi.InvokeCode <- fun args -> <@@ (Regex.IsMatch((%%args.[0]:string), template)) @@>
t.AddMember mi

The code above is straightforward. A method named “IsMatch” is defined. It takes a string as input and returns a bool.
The only tricky part is the %%args notation. For a static method, args is the parameter list, so the first element of args is the first parameter. For a non-static method, the first element is the “this” pointer, so args is “this” followed by the parameter list. Make sure the return type of the InvokeCode expression agrees with the type passed to ProvidedMethod.
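The splice can be tried outside a type provider. The sketch below is only an illustration: the function names staticBody and instanceBody are hypothetical, and a plain Expr list stands in for the args that the provider infrastructure would pass to InvokeCode.

```fsharp
open Microsoft.FSharp.Quotations
open System.Text.RegularExpressions

// Static method: args.[0] is the first parameter.
let staticBody (template: string) (args: Expr list) : Expr =
    <@@ Regex.IsMatch((%%args.[0] : string), template) @@>

// Instance method: args.[0] is "this", so the first parameter is args.[1].
let instanceBody (template: string) (args: Expr list) : Expr =
    <@@ Regex(template).Match((%%args.[1] : string)) @@>

// Stand-ins for the expressions a caller would supply.
let data : Expr = Expr.Value "425-1232345"
let thisPlaceholder : Expr = <@@ obj() @@>

let isMatchExpr = staticBody @"\d{3}-\d{7}$" [ data ]
let matchExpr = instanceBody @"\d{3}-\d{7}$" [ thisPlaceholder; data ]

// The type of each quotation is what has to agree with the
// return type given to ProvidedMethod.
let isMatchType = isMatchExpr.Type
let matchResultType = matchExpr.Type
```

Here isMatchType is bool and matchResultType is Match, which is why the IsMatch method is declared with typeof<bool> as its return type.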
To add a property, use the ProvidedProperty type.
Figure 4 Add Property
//add RawRegex property
let pi = ProvidedProperty("RawRegex", typeof<Regex>)
pi.AddXmlDoc("Return raw regex")
pi.IsStatic <- true
pi.GetterCode <- (fun args -> <@@ Regex(template) @@>)
t.AddMember pi

The code above defines a property of type Regex called “RawRegex”. It returns a Regex instance initialized from the variable “template”.
Because the regular expression type provider gets its pattern string from outside, it is important to remember note 4 in Figure 2 Template for Type Provider. The variable called “template” holds the input string; it is the critical channel through which a type provider gets information from the outside world.
Embedded Type
At this point we know enough to write a simple type provider. We can easily support 3 ½ of the lines of code in Figure 1 Regular expression type provider. The tricky part is how to get Group_AreaCode from the Match call.
Because Group_AreaCode is a property generated from the input string, it must be a property of a generated type, just like RegExProviderType. At this point, your intuition might already tell you that another ProvidedTypeDefinition is needed. Your intuition is correct.
Figure 5 MatchEx type
    //generate new type from pattern
    let matchType =
        let t = ProvidedTypeDefinition("MatchEx", Some(typeof<Match>))
        t

The new type is named “MatchEx” and is based on Match. We define this type so that we can add the group names to it as properties. The only place where we can access the input string is inside the function passed to “DefineStaticParameters”, so it makes sense to add the properties there.
Figure 6 Add property to the MatchEx type
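//add properties for matchEx
let reg = Regex(template)
for name in reg.GetGroupNames() do
    // "0" is the group for the whole match, so it gets no property
    if name <> "0" then
        let propertyName = sprintf "Group_%s" name
        let pi = ProvidedProperty(propertyName, typeof<Group>)
        pi.GetterCode <- fun args -> <@@ ( (%%args.[0]:Match).Groups.[name] ) @@>
        // attach the property to the MatchEx type
        matchType.AddMember pi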
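The property names can be previewed with plain .NET code before they are wired into the provider. This is a small sketch that assumes the pattern from Figure 1; the binding name propertyNames is only for illustration.

```fsharp
open System.Text.RegularExpressions

let template = @"(?<AreaCode>\d{3})-(?<PhoneNumber>\d{7}$)"

// GetGroupNames returns "0" for the whole match, followed by the named groups.
let propertyNames =
    Regex(template).GetGroupNames()
    |> Array.filter (fun name -> name <> "0")
    |> Array.map (sprintf "Group_%s")
    |> Array.toList
```

For this pattern the list holds Group_AreaCode and Group_PhoneNumber, which are the properties that show up on MatchEx.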

The code in Figure 6 Add property to the MatchEx type adds the properties to the MatchEx type. Now we need to hook MatchEx to its container type:
1. Put Figure 5 MatchEx type inside the DefineStaticParameters function.
2. Add “t.AddMember(matchType)” at the end of the “fun theTypeNameOfThisInstantiation args” function. This line links the embedded type.
Almost there! The next step is to change the Match function’s return type to the newly created type “MatchEx”.
Figure 7 MatchEx type with Match function
//add match function
let parameters = [ ProvidedParameter("data", typeof<string>) ]
let mi = ProvidedMethod("Match", parameters, matchType)
mi.AddXmlDoc("Match function")
mi.InvokeCode <- fun args -> <@@ (Regex(template).Match((%%args.[1]:string))) @@>
t.AddMember mi

Passing matchType as the third argument of ProvidedMethod is how the return type is changed to “MatchEx”. The matchType variable is defined in Figure 5 MatchEx type.

If you are thinking about using Activator.CreateInstance, you can use the technique above to make a function return your embedded type.
