Monday, May 2, 2016

Word2Vec with Apache Spark

I tried out Word2Vec with Apache Spark, using the first Harry Potter book as the corpus.
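For anyone who wants to try something similar, here is a minimal sketch using PySpark's RDD-based `pyspark.mllib.feature.Word2Vec`. It is not the exact code behind the numbers in this post. The function name `similar_words`, the whitespace tokenization, the vector size, the minimum count and the seed are all assumptions made for illustration.

```python
def similar_words(corpus_path, query, top_n=3):
    """Train Word2Vec on a plain-text file and return the words closest to `query`."""
    # Imported lazily so this file still loads where PySpark is not installed.
    from pyspark import SparkContext
    from pyspark.mllib.feature import Word2Vec

    sc = SparkContext.getOrCreate()

    # One list of tokens per line of text. Splitting on whitespace (an assumption)
    # leaves punctuation attached to words, which would presumably explain
    # tokens such as '"and' appearing among the neighbours.
    sentences = sc.textFile(corpus_path).map(lambda line: line.split())

    # Hyperparameters here are illustrative, not the ones used for this post.
    model = (Word2Vec()
             .setVectorSize(100)
             .setMinCount(5)
             .setSeed(42)
             .fit(sentences))

    # findSynonyms yields (word, similarity) pairs, most similar first.
    return list(model.findSynonyms(query, top_n))
```

Calling it would look like `similar_words("harry_potter_1.txt", "Ron")`, where the file name is a placeholder for wherever the corpus is stored. Since Word2Vec training is randomized, the scores will generally differ between runs and settings.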

Some interesting results:

Similarities for "Ron"
Hermione 0.8892348408699036
watch 0.8258942365646362
"and 0.7972607016563416

Similarities for "Hermione"
Ron 0.9096277952194214
"and 0.8301450610160828
Hooch 0.829563319683075

Ron and Hermione end up getting married in the last book.  

Similarities for "Voldemort"
Quirrell's 0.9311719536781311
laughing 0.9307642579078674

Similarities for "Harry"
Hermione 0.7319877743721008
George 0.7252205014228821
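The scores above are similarity values between word vectors, presumably cosine similarity, which is the usual measure for word2vec neighbours. A minimal sketch of that calculation in plain Python follows. The three-dimensional vectors are made up for illustration; real word vectors from the model would have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors, NOT taken from the trained model.
ron = [0.9, 0.1, 0.3]
hermione = [0.8, 0.2, 0.4]

print(cosine_similarity(ron, hermione))
```

A value close to 1 means the two vectors point in nearly the same direction, so the model saw the two words in similar contexts. A value near 0 means the words are unrelated as far as the model can tell.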



Harry Potter Corpus

API Reference


So what is word2vec? It is a shallow neural network model. In short, it tries to predict a word from the words surrounding it (the CBOW variant), or the other way around, the surrounding words from a given word (the skip-gram variant).
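To make the two prediction directions concrete, here is a small sketch of how training examples can be cut out of a sentence with a sliding window. It only builds the examples and does not train a network. The function names and the window size are assumptions chosen for illustration, not part of Spark's API.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: predict each nearby word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context words, target) examples: predict the word from its surroundings."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:  # skip words that have no neighbours at all
            examples.append((context, target))
    return examples

sentence = "Ron and Hermione looked at Harry".split()
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

Words that keep turning up in the same windows end up with similar vectors, which is why names that appear together throughout the book come out as close neighbours.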