Launching an Amazon EC2 Server – The easy and fast way – AWS!


This is a very quick tutorial on how to set up your first Amazon EC2 Linux instance without much hassle. Please note that this is a very INSECURE way of doing it, but that is fine as long as you understand how it is insecure. For example, if an inbound rule's source allows any IP address, every IPv4 address on the internet can reach your instance over the corresponding protocol.
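To make that risk concrete: security group rules use CIDR notation, where a source of 0.0.0.0/0 matches every IPv4 address and a /32 matches exactly one. The sketch below uses Python's standard ipaddress module to check whether a rule's source would admit a given client. The helper name rule_allows is made up for illustration, and the sample addresses come from the reserved documentation ranges, not from a real setup.

```python
import ipaddress

def rule_allows(source_cidr: str, client_ip: str) -> bool:
    """Return True if an inbound rule with this source CIDR admits client_ip."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(source_cidr)

# 0.0.0.0/0 covers the whole IPv4 space: any client is admitted.
print(rule_allows("0.0.0.0/0", "198.51.100.7"))          # True
# A /32 covers exactly one address, e.g. your own public IP (hypothetical here).
print(rule_allows("203.0.113.10/32", "198.51.100.7"))    # False
print(rule_allows("203.0.113.10/32", "203.0.113.10"))    # True
print(ipaddress.ip_network("0.0.0.0/0").num_addresses)   # 4294967296
```

Restricting the source to your own address (a /32) is the usual way to make this setup less insecure.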
Important Note: To create an AWS account you have to give Amazon your credit or debit card details, but as long as you stay within its free tier it won't charge you at all. If I am not wrong, the free tier lasts for a year, but please check Amazon's policies before proceeding. For what it's worth, I have some credit in my AWS account and have never been charged yet. So, nothing to worry about. Just choose the right FREE TIER option for your first or trial usage of an AWS EC2 instance.


This tutorial assumes that you have an AWS account ready to use!

1. Log in to your AWS account

2. Choose EC2 from the list of services

3. Create a key pair by clicking Key Pairs in the left column

4. You will see the following screen. Follow the steps shown on it.

[Screenshots]

This creates the key pair, and the private key file is downloaded to your Downloads folder. It is a PEM format file. If you are using Mac OS, the browser may save it as a .txt file, for example FunnyName.pem.txt; you can simply rename it to FunnyName.pem. On Windows, PuTTY cannot use a PEM file directly, so you have to convert it to PPK format with PuTTYgen or something similar. You can google for more details.

5. Create an instance with the following steps:

  • Click Launch Instance.
  • Choose the first option, Amazon Linux.
  • Choose the first option, and then click the blue button.
  • Change nothing and click the blue button.
  • Change nothing and click the blue button.
  • You can add rules for traffic here.
  • Click Launch.
  • Select the key pair you created and click Launch Instance.

6. Log in using the tutorial for Mac users. Windows users log in in a slightly different way, but it is not very tough if you google it: use PuTTY or WinSCP for access after converting your PEM key to a PPK key with PuTTYgen.




In this article I would like to introduce using Hadoop on your local machine. In a future post on a similar topic I will also give a hint about how to use it on a cluster, provided you have access to one. First, you have to log in to a Unix or Linux machine. If you have one, good and great. Otherwise, you can use Amazon's Linux server for free if you choose its free-tier machine. I will write another tutorial on using and accessing an AWS Linux server in a future post. The following are the steps to SET UP your system.

  • Install the latest Java Development Kit (Java 8) by first downloading the tar file
  • tar zxvf jdk-8uXX-linux-x64.tar.gz (or the name of YOUR tar file)
  • wget
  • tar zxvf hadoop-2.6.0.tar.gz
  • cd ~
  • vi .bashrc
  • Paste the following into the .bashrc file (point JAVA_HOME at wherever your JDK is installed; HADOOP_USER_NAME can be any name)
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0
    export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
    export HADOOP_INSTALL=
    export PATH=$PATH:$HADOOP_INSTALL/bin
    export HADOOP_USER_NAME=YOUR_NAME
  • Open a new terminal window (or run source ~/.bashrc) and type hadoop
  • If you don't get any error, your installation is good!!! Congrats!
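If typing hadoop fails, the usual culprit is a variable from .bashrc that the shell never picked up or that was left empty. The Python sketch below reports which of the variables listed above are unset or empty and whether a hadoop executable can be found on the PATH. The names REQUIRED and missing_vars are made up for illustration; the check itself only reads the environment.

```python
import os
import shutil

# The variables exported in .bashrc above.
REQUIRED = ("JAVA_HOME", "HADOOP_CLASSPATH", "HADOOP_INSTALL", "HADOOP_USER_NAME")

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    print("missing:", missing_vars() or "none")
    # shutil.which looks the command up on PATH, much like the shell does.
    print("hadoop on PATH:", shutil.which("hadoop") or "not found")
```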

Local Usage

When you use Hadoop with a language other than Java, a good approach is Streaming mode. Your Map and Reduce programs read their input from standard input and write their output to standard output, and Hadoop connects those streams for you. What that means exactly will be explained soon. Just keep in mind that Hadoop Streaming uses two distinct programs containing your Map and Reduce functions. Unlike mincemeat's Map function, the Map task here is performed by a dedicated program. You can check mincemeat's MapReduce implementation here. Similarly, a dedicated program performs the Reduce task.
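The standard input/output contract described above can be imitated in a single Python process: the map program turns input lines into tab-separated key/value lines, Hadoop sorts those lines by key (the text before the first tab), and the reduce program reads the sorted lines. This is only a sketch of the data flow under that assumption, not how Hadoop is implemented (no input splitting, no partitioning across machines). simulate_streaming, number_mapper and line_counter are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List

# A streaming "program" here is any function from input lines to output lines.
LineProgram = Callable[[Iterable[str]], Iterator[str]]

def simulate_streaming(mapper: LineProgram, reducer: LineProgram,
                       lines: Iterable[str]) -> List[str]:
    """Run mapper, sort its output by key, then run reducer on the sorted lines."""
    mapped = list(mapper(lines))
    # Hadoop sorts map output by key (text before the first tab) before reducing.
    mapped.sort(key=lambda line: line.split("\t", 1)[0])
    return list(reducer(mapped))

def number_mapper(lines):
    # Emit "1<TAB><number>" for each non-blank input line.
    for line in lines:
        if line.strip():
            yield "1\t" + line.strip()

def line_counter(lines):
    # A deliberately tiny reducer: report how many key/value lines it received.
    yield "count\t%d" % sum(1 for _ in lines)

print(simulate_streaming(number_mapper, line_counter, ["3\n", "5\n", "\n", "3\n"]))
# ['count\t3']
```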

Please note that in the real world, where clusters of multiple machines perform your task, you can also use one Map program together with more than one Reduce program.

So, now your Hadoop is ready. What next? Yup, you have to write your Map program and your Reduce program as well. In this case we will assume that we need only one Reduce program, and the problem to solve is to print the sum of a list of numbers, given one number per line, and also print how many numbers there are. Let's break the solution into easy verbal steps as follows:

  1. Write a Map function (program) that prints "1<TAB><number>" for every number it encounters on each line, covering every occurrence, not just distinct numbers.
  2. Write a Reduce function (program) that reads every line of the Map program's output in the form "Key<TAB>Value", where the Key is 1 and the Value is the number.
  3. As the next step, the Reduce program groups identical Values and adds up their Keys (the 1s), which gives the number of occurrences of each distinct Value.
  4. Once the Keys are totaled for every distinct Value, all you need to do is display the SUM of the Keys and the SUM of the (Value * Key) products.

The former gives you the count of all the numbers, and the latter gives you the total of all the values. You can also skip grouping into distinct Values and simply add up every Key and every Value as the lines arrive, which gives both the count and the sum in a single pass; the Reduce program below does exactly that. Simple, right? So, let's get our hands dirty with the code.
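Steps 3 and 4 can be tried on a small list before involving Hadoop at all. The sketch below is plain Python (count_and_sum is a made-up helper, not part of Hadoop): it groups the (Key, Value) pairs by Value with collections.Counter, totals the Keys per Value, and then computes the SUM of the Keys and the SUM of Value * Key. The sample numbers are invented.

```python
from collections import Counter
from typing import Iterable, Tuple

def count_and_sum(pairs: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Steps 3-4: group (key, value) pairs by value, then sum keys and value*key."""
    key_totals = Counter()
    for key, value in pairs:
        key_totals[value] += key  # step 3: total of the 1s for each distinct value
    count = sum(key_totals.values())                                  # SUM of the Keys
    total = sum(value * keys for value, keys in key_totals.items())   # SUM of Value * Key
    return count, total

numbers = [4, 7, 4, 10]
pairs = [(1, n) for n in numbers]  # step 1: the mapper emits key 1 for every number
print(count_and_sum(pairs))        # (4, 25)
```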


Map Program:

#!/usr/bin/env python
# Map program: print "1<TAB><number>" for every number read from standard input
import sys

for line in sys.stdin:
    number = line.strip()
    if number:  # skip blank lines
        print("1\t%s" % number)

Reduce Program:

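#!/usr/bin/env python
# Reduce program: read "key<TAB>value" lines from the Map program, print count and sum
import sys

count = 0
sumNum = 0
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    key, val = line.split("\t", 1)
    if int(key) == 1:
        count += int(key)   # every Map line carries key 1, so this counts the numbers
        sumNum += int(val)  # accumulate the number itself
print("count\t\t\t%s\nsum\t\t\t%s" % (count, sumNum))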

The programming part is over. What next? Running the programs! How? Not directly with console input and console output, but THROUGH HADOOP. You can do this by typing a small command in bash, or by writing a script for it instead. Let's see what it is!

  • hadoop jar $HADOOP_INSTALL/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
    -input YOUR_INPUT_FILE/DIRECTORY -output YOUR_OUTPUT_DIRECTORY -mapper YOUR_MAP_PROGRAM -reducer YOUR_REDUCE_PROGRAM
  • Copy the command (Ctrl+C)

Paste it into a file with the .sh extension, make the file executable, and run it with the following commands:

  • touch YOUR_SCRIPT.sh
  • vi YOUR_SCRIPT.sh, press i, paste the command, press Esc, then type :wq and press Return
  • chmod 755 YOUR_SCRIPT.sh
  • ./YOUR_SCRIPT.sh

Your program should run properly, and its output (typically a file named part-00000 inside YOUR_OUTPUT_DIRECTORY) should look like:
count    YOUR_COUNT
sum      YOUR_SUM

Wow!!!! You are a Hadoop Rookie now 😀

Tips:

To keep testing your program during the development phase you can check for the correctness by

  • You ALSO need to remove YOUR_OUTPUT_DIRECTORY before every execution, or use a new one instead. Otherwise Hadoop stops with an error saying that the output directory already exists!