Saturday, December 8, 2018

Automation with NodeJS-Puppeteer & Visual Studio Code

I have seen many attempts to automate this kind of processing with commercial software. However, I do not see that as necessary when the task is not huge. As more and more applications move to the web, automating the Chrome browser covers most scenarios. Therefore, I estimate 90% of those cases can get the job done without paying for a tool.

First, we need to set up the IDE. Three files need to be updated and placed in the right locations. Please make sure you put launch.json and tasks.json under the .vscode folder.

  • launch.json: launches the application with predefined tasks such as the TypeScript build.
  • tsconfig.json: configures how the TypeScript application that uses Puppeteer is compiled.
  • tasks.json: defines the TypeScript build task.

The contents of these three files are:
  • launch.json

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Launch Program",
            "sourceMaps": true,
            "preLaunchTask": "TSC",
            "program": "${workspaceFolder}/app.ts",
            "outFiles": [
                "${workspaceFolder}/**/*.js"
            ]
        }
    ]
}


  • tasks.json

{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "2.0.0",
    "tasks": [
        {
            "label": "TSC",
            "type": "typescript",
            "tsconfig": "tsconfig.json",
            "problemMatcher": [
                "$tsc"
            ],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}


  • tsconfig.json


{
    "compilerOptions": {
        "target": "es5",
        "module": "commonjs",        
        "lib": ["es2015", "dom"],
        "sourceMap": true
    },
    "include": [
        "src",
        "node_modules/@types/puppeteer/index.d.ts"
      ]
}


The node_modules folder is created by the npm command provided by NodeJS. The commands to run are listed below. The first line installs the TypeScript compiler, which is not installed by default with VS Code. The second line installs Puppeteer. The last line installs the type definitions that let TypeScript work with Puppeteer.


npm i -g typescript
npm i --save puppeteer
npm i --save-dev @types/puppeteer
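After running these commands, npm records the dependencies in package.json. For reference, a minimal package.json consistent with the commands above might look like the following; the version numbers here are illustrative, not prescriptive:

```json
{
  "name": "puppeteer-automation",
  "version": "1.0.0",
  "dependencies": {
    "puppeteer": "^1.11.0"
  },
  "devDependencies": {
    "@types/puppeteer": "^1.11.0"
  }
}
```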

The last file is the Puppeteer file:


import * as puppeteer from 'puppeteer'
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://www.google.com');
  await page.screenshot({path: 'example.png'});
 
  await browser.close();
})();


Sunday, November 11, 2018

Move NLP IKVM to F# .NET-Friendly

Since the last post, I have read Sergey's code. I then decided to refactor it to store the data in .NET and F# structures. Stanford NLP does provide a server, but I still want to make it .NET friendly and also get myself familiar with the NLP core.

I prefer the project-based solution over the interactive one because it provides Visual Studio's watch window, immediate window, and debug visualizers. Once I have the information in a comfortable environment, it is easy to move forward.

The 200 lines of code build up a structure like the following:

I call multiple sentences a story. A story contains (1) sentences and (2) cross-references. The sentence structure holds the NLP information about one sentence: a token list, a parse tree, and a dependency graph. The cross-references maintain the relationships among elements from different sentences. The first file is the main file; it shows how to invoke the underlying functions and print the structures.


// Learn more about F# at http://fsharp.org
// See the 'F# Tutorial' project for more help.

open System
open System.IO
open java.util
open java.io
open edu.stanford.nlp.pipeline
open edu.stanford.nlp.ling
open Utils.NLPUtils
open Utils.NLPExtensions
open edu.stanford.nlp.util
open edu.stanford.nlp.trees
open edu.stanford.nlp.semgraph
open Utils.NLPStructures
open edu.stanford.nlp.coref

[<EntryPoint>]
let main argv = 
    let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply email.";

    // Annotation pipeline configuration
    let props = Properties()
    props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
    props.setProperty("ner.useSUTime","0") |> ignore

    let pipeline = StanfordCoreNLP(props)

    // Annotation
    let annotation = Annotation(text)
    pipeline.annotate(annotation)

    //get annotation info
    let keys = annotation.GetToken<HashMap>(typeof<CorefCoreAnnotations.CorefChainAnnotation>)
    let mentions = keys |> Seq.exactlyOne |> getMentions        

    let sentences = 
        [
            let sentences = annotation.GetToken<CoreMap>(typeof<CoreAnnotations.SentencesAnnotation>)

            for s in sentences do
                let tokens = s.GetToken<CoreLabel>(typeof<CoreAnnotations.TokensAnnotation>)
                let words = getWords tokens

                let t = s.GetToken<Tree>(typeof<TreeCoreAnnotations.TreeAnnotation>)  
                let tree = t |> Seq.exactlyOne |> buildTree words
       
                let deps = s.GetToken<SemanticGraph>(typeof<SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation>)
                let relationships = deps |> Seq.exactlyOne |> getDependencyGraph words
                let sentence = { Words = words; Dependency = relationships; Tree = tree; }
                yield sentence
            ]

    let story = 
        {
            CrossLinks = mentions;
            Sentences = sentences;
        }

    printfn "%O" story

    0 // return an integer exit code


The second file is the library file. I do not expect the structure to stay the same after two weeks; I might decide to add more fields. But the foundation is there.



namespace Utils

module NLPStructures = 

    type Word = string
    type Ner = string
    type POS = string
    type Index = int
    type Relationship = string
    type Span = int * int
    type Head = int * string
    type SentenceIndex = int
    
    type WordType = 
        {
          Word : Word
          Ner: Ner
          POS: POS
          Index: Index
        }

        member this.IsSame word = 
            match this with
            | { Word = w; } -> w = word

        member this.IsSame index = 
            match this with
            | { Index = i } -> i = index

    type MentionEntity =
            {
                Index: Index
                Relationship: Relationship
                Span : Span
                Head : Head
                SentenceIndex : SentenceIndex
            }
    
    type RepresentiveMention = MentionEntity

    type CrossLinkType =
        {
            Index: Index
            RepresentiveMention : RepresentiveMention
            Mentions : MentionEntity list
        }
    
    type DependencyGraph = 
        | Link of Relationship * WordType * WordType
        | CrossLink of CrossLinkType
        
    type TreeNode = 
        | Node of string * WordType
        | SubNodes of POS * TreeNode list
    
    type SentenceType =
        {
            Words: WordType list
            Dependency: DependencyGraph list
            Tree: TreeNode
        }

    type StoryType = 
        {
            CrossLinks : CrossLinkType list
            Sentences : SentenceType list
        }

module NLPUtils =
    open edu.stanford.nlp.trees
    open edu.stanford.nlp.ling
    open edu.stanford.nlp.semgraph

    open NLPStructures

    let toEnumerable<'T> (obj:obj) = 
        let l = obj :?> java.util.ArrayList
        l |> Seq.cast<'T>    
            
    let toJavaClass (t:System.Type) = java.lang.Class.op_Implicit(t)

    let findWord (words:WordType list) (word:Word) = 
        words |> Seq.find (fun n -> n.IsSame(word))
    let findIndex (words:WordType list) (i:Index) = 
        words |> Seq.find (fun n -> n.IsSame(i))

    let inline getObjFromMap (x:^T) t = 
        let key = t |> toJavaClass
        (^T : (member get : java.lang.Class -> obj) (x, key) )

    let getWords (tokens:seq<CoreLabel>) = 
        [ 
            for token in tokens do
                let word = typeof<CoreAnnotations.TextAnnotation> |> getObjFromMap token :?> Word
                let pos  = typeof<CoreAnnotations.PartOfSpeechAnnotation> |> getObjFromMap token :?> POS
                let ner  = typeof<CoreAnnotations.NamedEntityTagAnnotation> |> getObjFromMap token :?> Ner
                let index = token.index()
                let word = { Word = word; Ner = ner; POS = pos; Index = index }
                yield word
        ]

    let getDependencyGraph words (deps:SemanticGraph)  = 
        [
            for edge in deps.edgeListSorted().toArray() |> Seq.cast<SemanticGraphEdge> do
                let gov = edge.getGovernor()
                let dep = edge.getDependent()

                let govEntity = findIndex words (gov.index())
                let depEntity = findIndex words (dep.index())

                let e = Link(edge.getRelation().getLongName(), govEntity, depEntity)
                yield e
        ]
    
    let rec buildTree words (tree:Tree)  = 
        let label = tree.value()
        let children = tree.children()
        if children.Length = 0 then
            let x = tree.label() :?> CoreLabel
            let i = x.index()

            let entity = findIndex words i
            Node(label, entity)
        else
            let nodes = children |> Seq.map (fun tree -> buildTree words tree) |> Seq.toList
            SubNodes(label, nodes)

    let getMention (mention:edu.stanford.nlp.coref.data.CorefChain.CorefMention) = 
        let mentionId = mention.mentionID
        let span = (mention.startIndex, mention.endIndex)
        let relation = mention.animacy.name()
        let head = (mention.headIndex, mention.mentionSpan)
        let sentenceIndex = mention.sentNum
        let m = { Index = mentionId; Relationship = relation; Span = span; Head = head; SentenceIndex = sentenceIndex; }
        m

    let getMentions (keys:java.util.HashMap) = 
        [
            for key in keys.keySet().toArray() do
                let v = keys.get(key) :?> edu.stanford.nlp.coref.data.CorefChain
                let representiveMention = v.getRepresentativeMention()
                let m = getMention(representiveMention)

                let index = v.getChainID()
                let mentions = v.getMentionsInTextualOrder().toArray()
                let ms = mentions 
                         |> Seq.cast<edu.stanford.nlp.coref.data.CorefChain.CorefMention> 
                         |> Seq.map getMention
                         |> Seq.toList
                let r = { Index = index; RepresentiveMention = m; Mentions = ms; }
                yield r
        ]
    
    let returnSeq<'T> (x:obj) = 
        if x :? java.util.ArrayList then
            toEnumerable<'T> x
        else
            Seq.singleton (x :?> 'T)
    
module NLPExtensions = 
    open NLPUtils
    open edu.stanford.nlp.util

    type CoreMap with
        member this.GetToken<'T> (t:System.Type) = 
            t |> getObjFromMap this |> returnSeq<'T> 

The execution result is shown below:




Saturday, November 3, 2018

SelfNote: Stanford NLP

I created this page as the master page for using F# with Stanford NLP.

Monday, October 29, 2018

F# Stanford NLP is running

After some configuration, I can successfully run my first NLP project with F#. Special thanks to Sergey's post, which is very informative. His solution is based on F# Interactive, while I prefer a project-based solution.

Sergey points out that one of the most common setup problems is the path problem. His claim is true; I was stuck on this problem for days. Here is the process I followed.


  • Open Visual Studio 2017 and create an F# console application. 
    • I tried a .NET Core app; it does not work because IKVM depends on the .NET Framework.
  • Compile the F# console application and note the debug folder location.
  • Open NuGet and install Stanford NLP CoreNLP. The current package version is 3.9.1.
    • The current Stanford NLP release is 3.9.2; I suggest downloading the 3.9.1 version to match.
  • Download the Stanford NLP 3.9.1 zip file.
  • Unzip the 3.9.1 file into the F# console app's debug folder.
  • Go to the unzipped folder and find the models JAR file.

  • Download WinRAR to unzip the JAR file to a folder; this folder should contain a folder called "edu".
  • Copy the "edu" folder up to the debug folder, so the structure in the debug folder looks like the following.
  

The F# file I was using is listed below. Placing the "edu" folder in the debug folder saves you from setting the CurrentDirectory. 


// Learn more about F# at http://fsharp.org
// See the 'F# Tutorial' project for more help.

open System
open System.IO
open java.util
open java.io
open edu.stanford.nlp.pipeline

[<EntryPoint>]
let main argv = 
    let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

    // Annotation pipeline configuration
    let props = Properties()
    props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
    props.setProperty("ner.useSUTime","0") |> ignore

    let pipeline = StanfordCoreNLP(props)

    // Annotation
    let annotation = Annotation(text)
    pipeline.annotate(annotation)

    // Result - Pretty Print
    let stream = new ByteArrayOutputStream()
    pipeline.prettyPrint(annotation, new PrintWriter(stream))
    printfn "%O" <| stream.toString()
    stream.close()

    printfn "%A" argv
    0 // return an integer exit code

Executing the NLP program seems to take a lot of memory. My program uses 2 GB of memory and takes a while to show the result. Hopefully, your computer is fast enough. :)

Thursday, October 25, 2018

F# Enum usage II

I want to take a quick note on F# enum usage again. Support for spaces and unusual characters in F# identifiers is a great feature that can make your development work much more comfortable. I am working on the Web API front these days.

One of the requirements is to provide options to end users. If an application only accepts "excellent choice", "good option", "ok choice", and "bad and never go there" as options, I'd like to offload the value check to the type system instead of handling the error in my code.

If I can declare the enum like the following

type MyEnum =
    | ``excellent choice`` = 0
    | ``good option`` = 1
    ....

After adding the [JsonConverter(typeof(StringEnumConverter))] attribute to serialize the enum as a string, output and input validation are solved in one shot.
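The same idea can be sketched in TypeScript for comparison: a string enum carries its display text directly, so no converter attribute is needed, and a type guard offloads the input check. The names below are illustrative, not from the F# sample:

```typescript
// String enum: the member's value is the human-readable string,
// so JSON.stringify emits the display text with no extra converter.
enum Choice {
  Excellent = "excellent choice",
  Good = "good option",
  Ok = "ok choice",
  Bad = "bad and never go there",
}

// Input validation offloaded to a type guard built from the enum's values.
function isChoice(s: string): s is Choice {
  return (Object.values(Choice) as string[]).includes(s);
}

console.log(JSON.stringify({ rating: Choice.Good })); // {"rating":"good option"}
console.log(isChoice("good option")); // true
console.log(isChoice("terrible")); // false
```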

Monday, September 24, 2018

F# Enum's Usage

I have a web service project, and I found my team constantly needs to convert enums to strings. The only conversion is adding spaces. It wastes a lot of time and involves reflection, which slows down runtime performance.

The enum type is one of my favorite types in F#. C#'s enum type cannot define a value with a space. Having an enum value with a space is very important because this feature saves me a lot of time when outputting an enum value as a string.

For example, if the enum value can be "Post Release", the ToString function outputs a nice string, and there is no need to use attributes and reflection to do the job. C# can technically have spaces in enum names, but only by generating the type with reflection emit. If you can use F#, the problem is easily solved. Please check the following F# code:

namespace ClassLibrary1

type public EnumEng =
    Registration = 0
    | ``Under Review``=1
    | Approval = 2
    | Release = 3
    | ``Post Release`` = 4

type public EnumChn =
    注册 = 0
    | 审批 = 1
    | 批准 = 2
    | 发布 = 3
    | 发布后 = 4

type public EnumIndex =
    Reg = 0
    | Review = 1
    | Approval = 2
    | Release = 3
    | PostRelase = 4


module Test =
    let a = EnumEng.``Under Review``
    let b = EnumChn.审批

In the C# side, the intellisense won't display the enum value if it contains space. However, it will be displayed in debug mode. You can execute the following code and stop at the end of the function.

        static void Main(string[] args)
        {
            // get all strings from EnumEng
            var strs = Enumerable.Range(0, 5)
                                 .Select(n => (ClassLibrary1.EnumEng)n)
                                 .Select(n => n.ToString())
                                 .ToList();

            // get all strings from EnumChn
            var i = Enumerable.Range(0, 5)
                              .Select(n => (ClassLibrary1.EnumChn)n)
                              .Select(n => n.ToString())
                              .ToList();

            // parse string to enum
            var v = strs.Select(n => Enum.Parse(typeof(ClassLibrary1.EnumEng), n))
                        .ToList();

            var x = ClassLibrary1.EnumIndex.PostRelase;

            var str = (ClassLibrary1.EnumChn)x;
        }

Both F# and C# support non-English variable names, which provides a way to localize the output as well. From the sample above, EnumIndex is the type used in C#/F# code. Once a value needs to be output as a string, it can be converted to EnumEng (the English strings) or EnumChn (the Chinese strings).



Is that convenient?
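For comparison, the same index-plus-localized-names pattern can be sketched in TypeScript; the type and table names below are my own illustration, with the values taken from the F# sample above:

```typescript
// One index enum used throughout the code...
enum Stage { Reg, Review, Approval, Release, PostRelease }

// ...plus lookup tables that localize each index to a display string.
const stageEng: Record<Stage, string> = {
  [Stage.Reg]: "Registration",
  [Stage.Review]: "Under Review",
  [Stage.Approval]: "Approval",
  [Stage.Release]: "Release",
  [Stage.PostRelease]: "Post Release",
};

const stageChn: Record<Stage, string> = {
  [Stage.Reg]: "注册",
  [Stage.Review]: "审批",
  [Stage.Approval]: "批准",
  [Stage.Release]: "发布",
  [Stage.PostRelease]: "发布后",
};

console.log(stageEng[Stage.PostRelease]); // Post Release
console.log(stageChn[Stage.Review]); // 审批
```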

Thursday, September 13, 2018

SelfNote: WebAssembly With Blazor & .NET Core Upgrade

After setting up the HTML5/TypeScript roadmap for my group, I started to move my interest to other UI/visualization technologies. With Blazor in its starting phase, I feel this is an opportunity to get real-time web rendering and .NET Core in one shot. The Get Started part of Blazor is very helpful. There is only one small bug in project creation if you upgrade .NET Core.

From a PowerShell window, you can find the .NET Core version by running "dotnet --info". The Blazor template generates a global.json file, which sits beside the solution file. This file pins the .NET Core SDK version to load. The default value in the file is 2.1.300. 

{
  "sdk": {
    "version": "2.1.300"
  }
}
My computer is new, and I had directly installed version 2.1.402. So I got an error complaining that it "cannot import package". After changing that version to 2.1.402, I can now manually add the created projects (Blazor server and client) to the solution.