Friday 13 July 2012

Hadoop - What is it and why do I care? Or, how to explain Hadoop to a girl in the pub.

Pretty Girl: What do you do?
Cameron: I work with large computer integration problems. I’m currently looking at Hadoop.
Pretty Girl: What’s that?
Cameron: Hadoop is the open source version of Google’s Map/Reduce algorithm.
Pretty Girl: My brain hurts.
Cameron: Well, imagine if you had to download the internet every night, index it and make it searchable. That would be pretty hard to do, right?
Pretty Girl: Well, I guess. Why would I want to do that?
Cameron: Well, that’s how Google makes the internet searchable every time you want to find something, like Britney Spears’ new song or where to find those really cool Black Milk stockings. Google does the indexing for you beforehand, so you get your answers back really quickly. Don’t you reckon that’s cool?
Pretty Girl: Well yeh, I wonder how they do that?
Cameron: Good question. Well, Google could write a check with lots of zeros on it and give it to Oracle, but Oracle could never make a database big enough and fast enough to solve this problem. So Google decided to solve the problem in a different way. They split the job up into hundreds and thousands of little jobs, sent a piece of the problem to each of lots of cheap little computer boxes, and waited for them to work out their little piece of the problem. Once each box had done its bit, it would send it back to the answer box, which would collate all of the bits of data and give you the answer.
Pretty Girl: That’s clever.
Cameron: Yeh, really clever, and really simple (in hindsight). The first bit is called “Map”, where they send the data off to the little boxes and each box works on its piece. The second bit is called “Reduce”, where they bring the answers back together and give the result to you. This is what they call Map/Reduce.
Pretty Girl: Oh.
Cameron: Google published how this clever trick works, and an open source project has now made it available to all of us. That is called Hadoop.
We can use it to solve problems that we couldn’t solve before.
Pretty Girl: Like what?
Cameron: Well, for example, the Beijing Genomics Institute is mapping the DNA of millions of people. They are developing “personal medicine”, which will target the DNA of an individual person so that the medicine works just for them. Currently a lot of cancer medicine is cytotoxic, which means it kills cells: cancer cells and healthy cells. Yucky stuff. With personal medicine they won’t have to do that. Or what about seeing what people are saying about your brand on Twitter (remember the Qantas Twitter fail?), Facebook and LinkedIn? Or detecting credit card fraud, or … This is only possible with the massive computing power available in the cloud. This is called Big Data. We are seeing the impossible become possible. This is why we should all be aware of what Hadoop, Big Data and Cloud are all about.
Pretty Girl: What’s the cloud?
Cameron: That’s a talk for another day.
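
For anyone who wants to see the Map and Reduce bits from the conversation in code, here is a minimal sketch in plain Python. It is not Hadoop and it is not how Google does it; it runs on one machine and only imitates the idea of splitting the indexing job into little pieces and collating the answers. The page names and text in PAGES are made up for illustration, and the function names (map_page, shuffle, reduce_word, build_index) are hypothetical helpers, not part of any library.

```python
from collections import defaultdict

# Made-up "web pages" standing in for the internet (illustrative data only).
PAGES = {
    "page1.example": "britney spears new song",
    "page2.example": "black milk stockings",
    "page3.example": "britney spears tour",
}


def map_page(url, text):
    """Map step: one little box turns its page into (word, url) pairs."""
    return [(word, url) for word in text.split()]


def shuffle(pairs):
    """Group the pairs by word so all urls for a word end up together."""
    grouped = defaultdict(list)
    for word, url in pairs:
        grouped[word].append(url)
    return grouped


def reduce_word(word, urls):
    """Reduce step: collate one word's urls into a sorted list without repeats."""
    return word, sorted(set(urls))


def build_index(pages):
    # On a real cluster each map_page call could run on a different machine;
    # here they simply run one after another in a loop.
    pairs = []
    for url, text in pages.items():
        pairs.extend(map_page(url, text))
    return dict(reduce_word(word, urls) for word, urls in shuffle(pairs).items())


if __name__ == "__main__":
    index = build_index(PAGES)
    print(index["britney"])  # ['page1.example', 'page3.example']
```

Looking up a word in the finished index is quick because the hard work was done beforehand, which is the point Cameron makes about Google doing the indexing ahead of time.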