Thursday, June 11, 2015

Data Mining with Weka

My Rating 4/5
The course was easy enough, and the concepts were explained well. What I really liked about this course was the pace: not too dull and not too fast or difficult. I would recommend this course to anyone who wants to get started with AI/ML.

It's a good overview; however, the mathematical depth is missing.


I am looking forward to the advanced version of this course.

Sunday, May 17, 2015

Installing Apache Zeppelin (Local Mode)

Hi Folks,
Playing around with Apache Zeppelin today.

Installation Steps Involved
a) Install Maven
b) Install Git
c) Clone the Zeppelin Repository (https://github.com/apache/incubator-zeppelin)
d) mvn install -DskipTests

And voilà, you have Apache Zeppelin installed.
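Before cloning and building, it can help to confirm that the prerequisites from steps a) and b) are actually on the PATH. A minimal pre-flight sketch (the `have` helper is my own, not part of Zeppelin; it assumes the build needs git and mvn available):

```shell
# Tiny pre-flight check before building Zeppelin from source.
# Assumes the build needs git and mvn on the PATH.
have() { command -v "$1" >/dev/null 2>&1; }

for tool in git mvn; do
  if have "$tool"; then
    echo "$tool: ok"
  else
    echo "$tool: missing -- install it first"
  fi
done
```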


Start/Stop Zeppelin
bin/zeppelin-daemon.sh start
bin/zeppelin-daemon.sh stop

Wednesday, May 6, 2015

Installing Hadoop 2.x on Mac (Yosemite)

After breaking my head over several videos and blogs, I finally got it right.
To install Hadoop 2.x on a Mac, I would recommend you have:
a) Java installed (if not, install the Java JDK)
b) Password-less SSH
(check by typing the command below)

ssh localhost


If password-less SSH is not enabled:
Ensure Remote Login under System Preferences -> Sharing is checked to enable SSH.


ssh-keygen -t rsa -P ""


cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


ssh localhost

(Make sure you can SSH without a password, and leave the passphrase blank when generating the keys.)
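As a side note, the key-append step above can be made idempotent, so re-running the setup never duplicates the entry in authorized_keys. A small sketch, using a throwaway temp directory in place of $HOME/.ssh (DEMO_DIR and the placeholder key are mine, for illustration only):

```shell
# Sketch: idempotent version of the authorized_keys append.
# Swap DEMO_DIR for $HOME/.ssh in real use.
DEMO_DIR="$(mktemp -d)"
echo "ssh-rsa AAAAB3...demo user@mac" > "$DEMO_DIR/id_rsa.pub"  # placeholder key

append_key() {
  touch "$DEMO_DIR/authorized_keys"
  # Only append if the exact key line is not already present.
  grep -qxF "$(cat "$DEMO_DIR/id_rsa.pub")" "$DEMO_DIR/authorized_keys" \
    || cat "$DEMO_DIR/id_rsa.pub" >> "$DEMO_DIR/authorized_keys"
}

append_key
append_key  # second run is a no-op
```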
(For this tutorial we will configure Hadoop in pseudo-distributed mode -- http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation)

Download Hadoop
Extract Hadoop

tar -xvzf ~/Downloads/hadoop-2.7.0.tar.gz


And edit the following in the configuration files

edit: etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
edit: etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

format namenode:
bin/hdfs namenode -format
start dfs:
sbin/start-dfs.sh
Create user directories
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
Testing MapReduce
bin/hdfs dfs -put etc/hadoop input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
Examine Output
bin/hdfs dfs -cat output/*
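For the curious, here is a quick local illustration of what the example job's regex 'dfs[a-z.]+' actually matches; the sample lines below simply stand in for the config files uploaded as input:

```shell
# The example grep job counts tokens that start with "dfs"
# followed by lowercase letters or dots. Sample input stands in
# for the Hadoop config files copied into HDFS.
printf 'dfs.replication\nfs.defaultFS\ndfs.namenode.name.dir\n' \
  | grep -oE 'dfs[a-z.]+'
# prints:
# dfs.replication
# dfs.namenode.name.dir
```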
Namenode UI: http://localhost:50070/
RM UI: http://localhost:8088


And voilà, you have a petit Hadoop cluster all to yourself.


Saturday, March 21, 2015

Building Pig from Source

Also refer to https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
git clone -b spark https://github.com/apache/pig
ant -Dhadoopversion=23 jar
ant -Dhadoopversion=23 eclipse-files