So you wanna Git?

Why Git?


Every time you’re asked to submit that crucial programming assignment, what comes to mind once the program is done? Uh, not a movie. It’s Git, I know! That’s exactly how you store your lovely little program forever, so that you can hug your code every time you miss your ex, right?

Git is a terrific way to store not just code but almost any kind of file you can think of. It takes an altogether different approach to controlling versions of your files. While similar tools like SVN are Centralized Version Control Systems (CVCS), Git is a Distributed Version Control System (DVCS). Every person with access to your repository can clone it, keep a local copy of exactly the same data (full history included) as the remote copy on a host such as GitHub, and make changes locally with full control. So in some catastrophic situation, god forbid, a client in Peru who cloned your code can help you recover your data!

There can be nothing better than Git’s own website, but I am here to help you skip some of the finer details and dive right into basic usage of Git.


Basic Commands

        • git init – Initializes a local Git repository in your current local directory
        • git clone <remote_repository_URL> – Clones a remote repository into a new local directory
        • git add <file_name1> <file_name2> ... or git add * or git add . – Stages your marked files, or all files, for commit (think of the staging area as a local buffer holding the files that are to be committed)
        • git commit -m "<message>" – Commits the staged files with ‘message’ as the commit message
        • git push <remote_name> <branch_name> – Pushes your commits to the given branch on the given remote, e.g. git push origin master pushes to the branch ‘master’ on the remote named ‘origin’
        • git remote add <remote_name> <remote_repository_URL> – Adds a remote repository of given name and given URL
        • git remote -v – Displays all remotes with URLs
        • git checkout -b feature_x – Creates a new branch ‘feature_x’ and switches to the branch
        • git branch or git branch -a or git branch -r – Shows branches: local only by default, ‘-a’ for all, ‘-r’ for remote-tracking only
        • git checkout -- <file_name> – Discards unstaged changes to the file in the working directory
        • git log – Gives commit history
        • git status – Gives status of staging area and working directory
        • git checkout <branch_name> – Switches to the specified branch (moves HEAD and updates the working directory)
        • git pull – Fetches from the remote branch that your current branch tracks (by default origin/master) and merges it into the current local branch
        • git fetch origin – Fetches new commits from the remote ‘origin’ into your remote-tracking branches without merging them into your local branch. Gives you a chance to check the code before merging
        • git merge <branch_name> – Merges the specified branch (for example, freshly fetched code in origin/master) into the current branch

Note: I will soon be covering basic branching and merging in Git, including merge conflicts.


Don’t play around with the commands below!

      • git reset --soft HEAD~ – Moves HEAD to the previous commit; the staging area and working directory stay the same
      • git reset --mixed HEAD~ – Moves HEAD to the previous commit and resets the staging area to match it; the working directory is unaffected (this is the default mode)
      • git reset --hard HEAD~ – Moves HEAD to the previous commit and resets both the staging area and the working directory to match it; uncommitted changes are lost. DANGEROUS!
      • git reset HEAD <file_name> – Unstages the specified file

A Simple MD5 Password Cracking Program using Python


While working toward completing my program in Information Systems at the University of Cincinnati, I stumbled upon a very interesting assignment in the Cloud Computing course offered by the computer science department. It was a simple password breaker for strings of 4 or fewer characters: it attacks a given hex string of 32 or fewer characters and returns the strings that fall in its VALUE BUCKET, that is, the strings whose MD5 hash begins with the given hex string. For example, here is a sample execution:

Attacking d077f…

{‘found’: [‘cat’, ‘gkf9’]}

In the example above, we are attacking only the first 5 characters of a 32-character hex hash, so several values collide on that prefix. Collisions are another topic of interest that I will discuss later.
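To make the idea of matching on a hash prefix concrete, here is a minimal sketch in Python 3 (the program further down is Python 2). The helper name md5_prefix is my own for illustration and is not part of the assignment code.

```python
import hashlib

def md5_prefix(word, length):
    """Return the first `length` hex characters of the MD5 digest of `word`."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()[:length]

full_digest = hashlib.md5(b"cat").hexdigest()
print(len(full_digest))       # a full MD5 hex digest has 32 characters
print(md5_prefix("cat", 5))   # d077f, so 'cat' lands in the bucket for d077f
```

The shorter the prefix, the more candidate strings share it, which is why a 5-character attack can return more than one value.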

The program uses the mincemeat.py module from https://github.com/bigsandy/mincemeatpy. It is a Python 2.7.x MapReduce library that divides map and reduce tasks among distributed clients to speed up the work. In upcoming posts I will write about MapReduce and Hadoop.

Logic

  • Generate all possible strings of size 1 to 4 using (0-9) and (a-z)
  • This can be done in various ways, such as using pre-built libraries or with some fresh logic: first generate all two-character strings, then loop over them and append every two-character string to each one, which gives all four-character strings. Next, pick the series that starts with any one value from 0 to z, say 0000 to 0zzz, and take the last three characters of each string; these are the three-character strings to add to the main list. Finally, add the two-character strings and the one-character strings. The main list now holds every possible string of 1 to 4 characters over {0-9, a-z}.

  • Build grains using modulus technique and send to map function.
  • Take the length of the full list of candidate strings (deadlist2 in the program below) and find all its factors. Pick a factor that suits the number of clients expected to execute the map function, and split the list into that many equal chunks keyed by index, say {0: list-chunk1, 1: list-chunk2, …}. In the program, the 1,727,604 strings are split into 333 chunks of 5,188 strings each, collected in the list ‘bigdata’, which is then turned into the datasource dictionary that the mincemeat server distributes.

  • Since the map function and the reduce function cannot use global variables from the parent program, we have to pass the input hex string inside the datasource itself, using the simple technique of a dictionary within a dictionary. So instead of {0: list-chunk1, 1: list-chunk2, …} the datasource looks like {0: {‘d077f’: list-chunk1}, 1: {‘d077f’: list-chunk2}, …}, where every key-value pair is sent to a separate map call, possibly on a different client. The map function unwraps it to obtain the hex string ‘d077f’ (example) and the list of strings, and hashes each string to check whether the first five characters of its hash match ‘d077f’.

  • Send output from map to reduce function
  • If a match occurs, emit the query hex string (‘d077f’ in the example) together with each value whose hash starts with it, so that they reach the reduce function.

  • Send output from reduce function to the parent program
  • If map sends matches, capture them and aggregate all such results into a single list, for example {‘d077f’: [‘cat’, ‘wtf’]}, and send it to the parent program.

  • Capture the reduce function’s output
  • Once the parent program receives data from the reduce function, the data can be displayed.
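The first steps above (generating the candidates and wrapping the chunks in a datasource) can be sketched as follows. This is a Python 3 illustration that uses itertools.product instead of the hand-rolled loops in my program; the function names candidates and build_datasource are hypothetical and chosen only for this sketch.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase  # the 36 symbols 0-9 and a-z

def candidates(max_len=4):
    """Yield every string of length 1..max_len over ALPHABET."""
    for n in range(1, max_len + 1):
        for combo in itertools.product(ALPHABET, repeat=n):
            yield "".join(combo)

def build_datasource(target, words, chunk_size):
    """Split words into chunks and wrap each one as {target: chunk}.

    The target hex string travels inside every value because the map
    function cannot see the parent program's global variables.
    """
    datasource = {}
    for start in range(0, len(words), chunk_size):
        datasource[start // chunk_size] = {target: words[start:start + chunk_size]}
    return datasource

# Small demo with strings of length 1 and 2 only: 36 + 36**2 = 1332 strings.
words = list(candidates(2))
datasource = build_datasource("d077f", words, 111)
print(len(words), len(datasource))  # 1332 12

# For lengths 1 to 4 the count is 1,727,604, which factors as 333 * 5188.
print(sum(36 ** n for n in range(1, 5)) == 333 * 5188)  # True
```

Choosing a chunk size that divides the list length evenly, as the factor search does, keeps every client's share the same size.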

Program : 

import hashlib
import string
import itertools
import sys
import mincemeat

inputx = sys.argv[1]
deadlist=[]
deadlist1=[]
deadlist2=[] #Final List
deadlist1string = []
deadlist2string = []
print "Attacking %s..."%sys.argv[1]
m = range(0,10)
for num in m:
	deadlist1string.append(str(num))		
for char in list(string.lowercase):
	deadlist1string.append(char)
for char in deadlist1string:
	for inchar in deadlist1string:
		#First two chars
		deadlist1.append(char)
		deadlist1.append(inchar)
		deadlist.append(''.join(deadlist1))
		deadlist1=[]

#second two chars
for stringx in deadlist:
	deadlist2string.append(stringx)
	for stringx2 in deadlist:
		deadlist2.append(stringx+stringx2)

# The first 36**3 four-character strings are 0000 to 0zzz
length3char = len(deadlist2) // 36
listFor3Digits = deadlist2[:length3char]

# Three-character strings: the last three characters of 0000 to 0zzz
for stringx in listFor3Digits:
	deadlist2.append(stringx[-3:])
# Add the two-character and one-character strings
deadlist2+=deadlist2string+deadlist1string

'''listx = []
haha = len(deadlist2)
for i in xrange(1,haha+1):
	if (haha%i == 0):
		listx.append(i)'''

bigdata = []
# Split the 1,727,604 strings into 333 chunks of 5188 strings each.
# Each chunk is wrapped in a dictionary keyed by the input hex string,
# because mapfn cannot read this program's global variables.
for i in xrange(0,333):
	loldict = {}
	loldict[inputx] = deadlist2[(5188*i):(5188*(i+1))]
	bigdata.append(loldict)

datasource = dict(enumerate(bigdata)) # 333 key-value pairs; each value holds a list of 5188 strings



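def mapfn(k, v):
	# v is {hex_prefix: chunk_of_strings}; emit every string whose
	# MD5 hex digest starts with the prefix
	for key in v.keys():
		for w in v[key]:
			digest = hashlib.md5(w).hexdigest()
			if digest[:len(key)] == key:
				yield key, w

def reducefn(k, vs):
	# vs is already the list of all strings that matched prefix k
	return vs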

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
# results maps the input hex string to the list of matching strings;
# it is empty when nothing matched
if inputx in results:
	print {"found": results[inputx]}
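
If you want to check the map and reduce logic without starting a mincemeat server and clients, the flow can be imitated on one machine. The sketch below is Python 3 and only an approximation under my own assumptions: run_local is a hypothetical stand-in for the grouping that the library does between map and reduce, not mincemeat's actual API, and the tiny datasource is made up for the demo.

```python
import hashlib
from collections import defaultdict

def mapfn(k, v):
    """v is {hex_prefix: chunk}; emit (prefix, word) for every matching word."""
    for prefix, words in v.items():
        for w in words:
            digest = hashlib.md5(w.encode("utf-8")).hexdigest()
            if digest.startswith(prefix):
                yield prefix, w

def reducefn(k, vs):
    """All values for key k arrive as one list; return it unchanged."""
    return vs

def run_local(datasource):
    """Hypothetical single-process stand-in for the server and its clients."""
    grouped = defaultdict(list)
    for k, v in datasource.items():
        for out_key, out_value in mapfn(k, v):
            grouped[out_key].append(out_value)
    return {k: reducefn(k, vs) for k, vs in grouped.items()}

# Two made-up chunks, each wrapped with the prefix being attacked.
datasource = {
    0: {"d077f": ["dog", "cat"]},
    1: {"d077f": ["bird", "fish"]},
}
results = run_local(datasource)
print({"found": results.get("d077f", [])})
```

Using results.get with an empty list as the default means a prefix with no matches prints an empty result instead of raising a KeyError.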