Tuesday, April 29, 2014

Crawling - Scrapy

What is Scrapy?

Scrapy is a fast, high-level screen-scraping and web-crawling framework used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.


OS: Ubuntu

Install dependencies
sudo apt-get install build-essential libssl-dev libffi-dev python-dev

Install Scrapy
sudo pip install Scrapy

The commands above install Scrapy.

Monday, April 28, 2014


I am taking an algorithms course on Coursera.

Wednesday, April 23, 2014

Python Week 4

Week 4 of the Coursera course.
I have been busy with odd jobs and haven't been able to finish the assignments and be done with them.
I hope to complete all the video lectures and assignments today.

Sunday, April 20, 2014

Julia Meetup

I had an amazing experience organizing the first Julia meetup at InMobi with Abhijit and Kiran.
I gave my first formal open-source talk, and it felt great.
Link to my slides - http://www.slideshare.net/KrishnaKalyan3/julia-meetup-bangalore

Friday, April 18, 2014

Distributed Cache - Pig

I had been trying to use the Distributed Cache in Pig.
After a lot of trial and error, behold: SUCCESS!
Let's get to the meat.

Let's go through the steps.
a) Create an Eval UDF.
b) Initialize the Distributed Cache using getCacheFiles().
c) Initialize the data structure from the cached file(s) of step b.
d) Finally, apply your logic to the data.
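Here is a rough sketch of steps a) to d). In Pig, the UDF itself would be a Java class extending org.apache.pig.EvalFunc that overrides getCacheFiles() and exec(). To keep the sketch runnable without Pig or Hadoop, it mimics the same lazy-loading pattern in plain Python. The class name LookupUDF, the HDFS path, the symlink name and the tab-separated key/value file format are all hypothetical and are there only for illustration.

```python
import os
import tempfile


class LookupUDF:
    """Plain-Python stand-in for a Pig Eval UDF that reads a Distributed Cache file.

    Hypothetical: in Pig this would be a Java class extending EvalFunc.
    """

    def __init__(self, hdfs_path, local_path):
        self.hdfs_path = hdfs_path    # hypothetical HDFS location of the lookup file
        self.local_path = local_path  # where the task would see the cached copy
        self.lookup = None            # step c: built lazily on the first call
        self.loads = 0                # counts how often the file was read

    def get_cache_files(self):
        # Step b: mirrors EvalFunc.getCacheFiles(), whose entries take the
        # form "hdfs_path#symlink_name".
        symlink = os.path.basename(self.local_path)
        return [f"{self.hdfs_path}#{symlink}"]

    def _init_lookup(self):
        # Step c: read the cached file once and build an in-memory dict.
        # Assumed format: one "key<TAB>value" pair per line.
        self.lookup = {}
        self.loads += 1
        with open(self.local_path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                key, value = line.split("\t", 1)
                self.lookup[key] = value

    def exec(self, key):
        # Step d: apply the logic to each input (here, a single key lookup).
        if self.lookup is None:
            self._init_lookup()
        return self.lookup.get(key)


def demo():
    """Write a tiny lookup file, then query it twice through the UDF."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lookup")
        with open(path, "w", encoding="utf-8") as f:
            f.write("IN\tIndia\nUS\tUnited States\n")
        udf = LookupUDF("/user/me/lookup.txt", path)
        return udf.get_cache_files(), udf.exec("IN"), udf.exec("FR"), udf.loads


if __name__ == "__main__":
    print(demo())  # (['/user/me/lookup.txt#lookup'], 'India', None, 1)
```

The lookup table is built on the first exec() call rather than in the constructor. As far as I understand, Pig also instantiates UDFs on the client side while planning the job, where the cached file is not yet available in the task's working directory.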

Saturday, April 12, 2014

Python Week 3

Week 3 was easy. I also managed to score a whopping 92% on the test.
I am enjoying the mini assignments. I hope to complete everything.


Tuesday, April 8, 2014

Python Week 2

I completed the mini project; however, I forgot to take my weekly quiz :( . I was mad at myself for this. After a lot of research, I found that I would be losing around 2% of my final score.