Sunday, January 19, 2020

SelfNote: Set Up the spaCy Module in Visual Studio

I guess not many people use Visual Studio for NLP work in Python. I had been searching for a way to add the spaCy module in Visual Studio. I kept getting a "cannot find en_core_web_sm module" error, and the only solution I could find was to run python -m spacy download en_core_web_sm. I struggled to find where in the UI to run that Python command. Finally, I found the solution.

Did you notice the "Admin" icon? Yes, that is the key to solving the problem. With admin permission, you can open PowerShell and run python -m spacy download en_core_web_sm.
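A related pitfall is downloading the model into a different Python environment from the one the project runs in. The sketch below is one way to avoid that, assuming you can run a small script inside the project's environment. It checks whether the model package can be imported and, if it can't, runs the same python -m spacy download command through sys.executable, which is the interpreter running the script. If that environment is installed for all users, this still needs admin rights. The helper names model_installed, download_command and ensure_model are hypothetical.

```python
import importlib.util
import subprocess
import sys


def model_installed(name: str) -> bool:
    """Return True if a package called `name` is importable from this interpreter."""
    return importlib.util.find_spec(name) is not None


def download_command(name: str) -> list:
    """Build `python -m spacy download <name>` for the interpreter running this script."""
    return [sys.executable, "-m", "spacy", "download", name]


def ensure_model(name: str = "en_core_web_sm") -> None:
    """Download the spaCy model into the current environment if it is missing."""
    if model_installed(name):
        print(f"{name} is already installed")
        return
    # check_call raises CalledProcessError if the download fails
    subprocess.check_call(download_command(name))
```

Calling ensure_model("en_core_web_sm") from the project's environment should have the same effect as typing the download command in the admin PowerShell window.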

I installed en_core_web_sm, but it showed a warning message. Installing the en_core_web_lg model instead eliminated the warning.
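Since the larger model got rid of the warning, a script can try en_core_web_lg first and fall back to en_core_web_sm only when the large model is missing. This is a small sketch. The preference order and the names pick_model and load_nlp are my own choices. spaCy is imported lazily so the selection logic runs even where spaCy is not installed.

```python
import importlib.util

# Preference order (an assumption): the large model first, the small one as a fallback
PREFERRED_MODELS = ("en_core_web_lg", "en_core_web_sm")


def pick_model(candidates=PREFERRED_MODELS):
    """Return the first candidate package that is installed, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_nlp(candidates=PREFERRED_MODELS):
    """Load the preferred installed spaCy model."""
    import spacy  # imported here so pick_model works without spaCy installed

    name = pick_model(candidates)
    if name is None:
        raise OSError(
            "No spaCy model found; run: python -m spacy download en_core_web_lg"
        )
    return spacy.load(name)
```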




Monday, January 6, 2020

Stanford NLP Quick Setup on Win10 with WSL

After setting up Stanford NLP on my PC, I was struggling to find a faster way to get it running. Then I came across the Windows Subsystem for Linux (WSL). Modifying Stanford NLP itself was not an option for me, but with WSL I could quickly start an NLP web service and get on with my work.

I provisioned a more powerful VM on Azure and used PowerShell to set up the NLP environment.

  1. Enable WSL

    Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

    Run this in an elevated (Administrator) PowerShell window. It prompts for a reboot if WSL is not already enabled.
  2. Add the Ubuntu distro to WSL

    # download Ubuntu 18.04 and save it as Ubuntu.appx in the local directory
    Invoke-WebRequest -Uri https://aka.ms/wsl-ubuntu-1804 -OutFile Ubuntu.appx -UseBasicParsing

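    # install Ubuntu.appx as a WSL distro
    Add-AppxPackage .\Ubuntu.appx
    # launch Ubuntu once (e.g. from the Start menu) to finish setup and create a Linux user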
  3. Download the Stanford CoreNLP zip file and unzip it into the current folder

    # download the Stanford NLP and save the zip file locally as "corenlp.zip"
    Invoke-WebRequest -uri http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip -outfile corenlp.zip -UseBasicParsing

    # unzip corenlp.zip; the files land in .\CoreNlp\stanford-corenlp-full-2018-10-05\
    Expand-Archive corenlp.zip -DestinationPath .\CoreNlp\
  4. Install Java in WSL. Stanford NLP does not require Oracle Java, so I use OpenJDK (the default-jdk package) to keep the command short.

    wsl sudo apt-get update
    wsl sudo apt-get install default-jdk
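Step 3 can also be scripted in Python using only the standard library. This sketch complements the PowerShell commands rather than replacing them. It assumes the release zip contains a single top-level folder, which is the folder the server has to be started from. The helper names download and extract_corenlp are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL from step 3 above
CORENLP_URL = "http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip"


def download(url: str, dest: Path) -> Path:
    """Save the file at `url` to `dest` and return the path."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_corenlp(zip_path: Path, dest_dir: Path) -> Path:
    """Unzip the archive and return the folder that holds the CoreNLP jars.

    If every entry sits under one top-level folder (assumed for the CoreNLP
    release zip), that folder is returned; otherwise dest_dir itself.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # creates dest_dir if needed
        names = archive.namelist()
    roots = {name.split("/", 1)[0] for name in names}
    if len(roots) == 1 and all("/" in name for name in names):
        return dest_dir / roots.pop()
    return dest_dir
```

For example, extract_corenlp(download(CORENLP_URL, Path("corenlp.zip")), Path("CoreNlp")) mirrors the two PowerShell commands in step 3.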
Then I enter WSL from PowerShell and launch the Stanford NLP server there:

  • enter WSL from PowerShell by running "wsl"
  • go to the folder that holds the unzipped Stanford NLP files from step 3 (Windows drives are mounted under /mnt, e.g. /mnt/c)
  • run java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
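The launch can also be driven from a Python script running inside WSL. This is a sketch. The memory size comes from the command above, and port 9000 is also the server's default. to_wsl_path assumes WSL's default /mnt/<drive letter> mount point. to_wsl_path, server_command and start_server are all hypothetical helpers. No shell is involved, so the "*" reaches java unexpanded, and Java's own classpath wildcard picks up every jar in the working directory.

```python
import subprocess
from pathlib import PureWindowsPath


def to_wsl_path(windows_path: str) -> str:
    """Map a Windows path such as C:\\Users\\me\\CoreNlp to /mnt/c/Users/me/CoreNlp."""
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":").lower()  # "C:" -> "c"
    return "/".join(["/mnt", drive, *path.parts[1:]])


def server_command(memory: str = "4g", port: int = 9000) -> list:
    """The java command from the bullet list above, as an argument list."""
    return [
        "java", f"-mx{memory}", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
    ]


def start_server(corenlp_dir: str, memory: str = "4g", port: int = 9000) -> subprocess.Popen:
    """Start the server in the background with the CoreNLP folder as working directory."""
    return subprocess.Popen(server_command(memory, port), cwd=corenlp_dir)
```

WSL also ships a wslpath command that performs the same path conversion, which is handy when you are working in the shell rather than in Python.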
Open Edge and go to http://localhost:9000/. It shows a UI similar to the one at http://corenlp.run.
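Opening the page in Edge is a manual check. When the server is started from a script, the same check can be automated by polling the URL until something answers. This is a standard-library sketch. The URL is the one above, while the function names and timeout values are arbitrary choices. Proxies are bypassed on the assumption that the server is local.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: the server runs on this machine
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def server_is_up(url: str = "http://localhost:9000/", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a 2xx status
    except OSError:
        # connection refused, timeout, etc. (URLError is a subclass of OSError)
        return False


def wait_until_up(url: str = "http://localhost:9000/",
                  deadline_s: float = 120.0, poll_s: float = 2.0) -> bool:
    """Poll `url` until it answers or `deadline_s` seconds have passed."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if server_is_up(url):
            return True
        time.sleep(poll_s)
    return False
```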

Here is the PowerShell script I use to send text to the server on localhost port 9000:

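$data = "The quick brown fox jumped over the lazy dog."
# CoreNLP reads its settings from the "properties" URL parameter, a JSON string
$props = '{"annotators":"tokenize,ssplit,pos,lemma,ner,entitymentions,depparse,parse,relation,openie,dcoref,kbp","outputFormat":"json"}'
$url2 = "http://localhost:9000/?properties=" + [uri]::EscapeDataString($props)
# POST the raw text as the body; the JSON reply is parsed into PowerShell objects
$r = Invoke-RestMethod -Uri $url2 -Method Post -Body $data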
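The same request can be sent from Python, for example from a Python project in Visual Studio. This is a standard-library sketch that assumes the server from the steps above is listening on localhost:9000. build_url, annotate and tokens_with_tags are hypothetical names. tokens_with_tags reads only the sentences and tokens fields of the JSON output.

```python
import json
import urllib.parse
import urllib.request

# Same annotator list as the PowerShell example
ANNOTATORS = ("tokenize,ssplit,pos,lemma,ner,entitymentions,depparse,parse,"
              "relation,openie,dcoref,kbp")


def build_url(base: str = "http://localhost:9000/", annotators: str = ANNOTATORS) -> str:
    """Encode the CoreNLP properties as JSON in the `properties` URL parameter."""
    props = {"annotators": annotators, "outputFormat": "json"}
    return base + "?properties=" + urllib.parse.quote(json.dumps(props, separators=(",", ":")))


def annotate(text: str, base: str = "http://localhost:9000/",
             annotators: str = ANNOTATORS) -> dict:
    """POST the raw text as the request body and parse the JSON reply."""
    request = urllib.request.Request(
        build_url(base, annotators), data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tokens_with_tags(result: dict) -> list:
    """Flatten the reply into (word, POS tag, NER tag) tuples."""
    return [
        (token["word"], token.get("pos"), token.get("ner"))
        for sentence in result["sentences"]
        for token in sentence["tokens"]
    ]
```

Usage, with the server running: tokens_with_tags(annotate("The quick brown fox jumped over the lazy dog.")).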

The annotators are described in the CoreNLP documentation in case you need them.