Monday, January 6, 2020

Stanford NLP Quick Setup on Win10 with WSL

After setting up Stanford NLP on my PC, I struggled to get it to run faster. Then I came across the Windows Subsystem for Linux (WSL). I realized that modifying Stanford NLP itself was not an option for me, but I could use WSL to quickly start an NLP web service and get on with my work.

I provisioned a more powerful VM on Azure and used PowerShell to set up the NLP environment:

  1. Enable WSL

    Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

    Run this in an elevated (Administrator) PowerShell session. If WSL is not already enabled, it prompts you to reboot to finish.
  2. Add the Ubuntu distro to WSL

    # download Ubuntu 18.04 and save it as Ubuntu.appx in the current directory
    Invoke-WebRequest -Uri https://aka.ms/wsl-ubuntu-1804 -OutFile Ubuntu.appx -UseBasicParsing

    # add Ubuntu.appx to WSL, then launch Ubuntu once from the Start menu to create a Linux user
    Add-AppxPackage .\Ubuntu.appx
  3. Download the Stanford NLP zip file and unzip it into the current folder

    # download the Stanford NLP and save the zip file locally as "corenlp.zip"
    Invoke-WebRequest -uri http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip -outfile corenlp.zip -UseBasicParsing

    # unzip the corenlp.zip
    Expand-Archive corenlp.zip -DestinationPath .\CoreNlp\
  4. Install Java in WSL. Stanford NLP does not require Oracle Java, so I use OpenJDK (Ubuntu's default-jdk package) to keep the commands short

    wsl sudo apt-get update
    wsl sudo apt-get install default-jdk
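Stanford CoreNLP 3.9.x needs Java 8 or later, and Ubuntu 18.04's default-jdk should install a newer OpenJDK than that. To double-check from PowerShell, here is a minimal sketch. Get-JavaMajorVersion is a hypothetical helper name, not a built-in cmdlet. It assumes the first line of "java -version" output contains the version in double quotes, which is how both OpenJDK and Oracle Java print it.

```powershell
# Hypothetical helper: extract the Java major version from the first line of
# "java -version" output, e.g. 'openjdk version "11.0.5" 2019-10-15' -> 11.
function Get-JavaMajorVersion {
    param([Parameter(Mandatory)][string]$VersionText)

    # The version number is the first double-quoted token on the line.
    if (-not ($VersionText -match '"(\d+)(?:\.(\d+))?')) {
        throw "Could not find a version number in: $VersionText"
    }
    $first = [int]$Matches[1]

    # Java 8 and earlier report themselves as "1.x" (e.g. "1.8.0_231").
    if ($first -eq 1 -and $Matches[2]) {
        return [int]$Matches[2]
    }
    return $first
}
```

Running wsl java -version 2>&1 | Select-Object -First 1 should give you the banner line, because Java writes it to stderr. Pass that line to Get-JavaMajorVersion and check that the result is 8 or higher.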
Next, I enter WSL from PowerShell and launch Stanford NLP there:

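  • enter WSL from PowerShell by typing "wsl"
  • go to the folder that holds the files unzipped in step 3 (Windows drives are mounted under /mnt, so C:\ becomes /mnt/c; the files are in the stanford-corenlp-full-2018-10-05 folder inside CoreNlp)
  • run java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer (-mx4g gives Java up to 4 GB of memory; the server listens on port 9000 by default)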
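Finding the unzipped folder from inside WSL mostly comes down to translating the Windows path. Here is a minimal sketch, assuming WSL's default automount root of /mnt. ConvertTo-WslPath is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: translate a Windows path such as C:\Users\me\CoreNlp
# into the form WSL sees it, /mnt/c/Users/me/CoreNlp.
# Assumes the default WSL automount root (/mnt/).
function ConvertTo-WslPath {
    param([Parameter(Mandatory)][string]$WindowsPath)

    # Expect an absolute path that starts with a drive letter, e.g. "C:\..."
    if (-not ($WindowsPath -match '^([A-Za-z]):[\\/]?(.*)$')) {
        throw "Expected an absolute path with a drive letter: $WindowsPath"
    }
    # WSL mounts drives under lowercase letters: C: -> /mnt/c
    $drive = $Matches[1].ToLowerInvariant()
    # Windows separators become Linux separators
    $rest = $Matches[2] -replace '\\', '/'

    # Drop a trailing slash so 'C:\' maps to '/mnt/c' rather than '/mnt/c/'.
    return ("/mnt/$drive/$rest").TrimEnd('/')
}
```

For example, ConvertTo-WslPath (Resolve-Path .\CoreNlp).Path returns the path to cd into after typing "wsl". Inside WSL, the wslpath utility performs the same translation.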
Open Edge and go to http://localhost:9000/; it shows a UI similar to the one at http://corenlp.run.

The PowerShell script below sends text to the server on localhost port 9000:

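$data = "The quick brown fox jumped over the lazy dog."   # text to annotate
# the "properties" query parameter is a JSON object; URL-encode it before appending
$props = '{"annotators":"tokenize,ssplit,pos,lemma,ner,entitymentions,depparse,parse,relation,openie,dcoref,kbp","outputFormat":"json"}'
$url = 'http://localhost:9000/?properties=' + [uri]::EscapeDataString($props)
# POST the raw text; Invoke-RestMethod parses the JSON response into objects in $r
$r = Invoke-RestMethod -Uri $url -Method Post -Body $data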
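Invoke-RestMethod turns the JSON response into PowerShell objects. As a result, $r.sentences holds one entry per sentence, and each sentence's tokens list carries the per-word annotations. Here is a minimal sketch for flattening that into a table. ConvertFrom-CoreNlpResponse is a hypothetical name. It assumes the json output format with the tokenize, ssplit, pos, lemma and ner annotators enabled; fields from annotators you leave out simply come back empty.

```powershell
# Hypothetical helper: flatten the parsed CoreNLP JSON response (the $r object
# returned by Invoke-RestMethod) into one row per token.
function ConvertFrom-CoreNlpResponse {
    param([Parameter(Mandatory)]$Response)

    foreach ($sentence in $Response.sentences) {
        foreach ($token in $sentence.tokens) {
            # Emit one ordered object per token with the common annotations.
            [pscustomobject]@{
                Sentence = $sentence.index   # 0-based sentence number
                Index    = $token.index      # 1-based token position in the sentence
                Word     = $token.word
                Lemma    = $token.lemma
                POS      = $token.pos
                NER      = $token.ner
            }
        }
    }
}
```

With the server running, ConvertFrom-CoreNlpResponse $r | Format-Table should print one line per token.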

The full list of annotators is in the Stanford CoreNLP documentation, in case you need it.
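If you only need some of the annotators, building the properties JSON in code avoids hand-editing the URL. Here is a minimal sketch. New-CoreNlpUrl is a hypothetical helper, and its default server address assumes the local setup from this post.

```powershell
# Hypothetical helper: build the CoreNLP server URL for a chosen set of annotators.
function New-CoreNlpUrl {
    param(
        [Parameter(Mandatory)][string[]]$Annotators,
        [string]$Server = 'http://localhost:9000/'
    )

    # [ordered] keeps the keys in this order when serialized to JSON.
    $props = [ordered]@{
        annotators   = $Annotators -join ','
        outputFormat = 'json'
    } | ConvertTo-Json -Compress

    # URL-encode the JSON so quotes, braces and commas survive the query string.
    return $Server + '?properties=' + [uri]::EscapeDataString($props)
}
```

For example, Invoke-RestMethod -Uri (New-CoreNlpUrl -Annotators tokenize,ssplit,pos,ner) -Method Post -Body $data requests only tokenization, sentence splitting, POS tags and named entities.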

