Monday 26 December 2016

Brief History of Hadoop

Hadoop was created by Doug Cutting while he and his colleague Mike Cafarella were developing Nutch (an open source web search engine). Before Hadoop was invented, Google published a paper on the Google File System (GFS) in 2003 and a paper on MapReduce in 2004. Doug and Mike made good use of these papers, which guided them in building their own Nutch Distributed File System (NDFS) and MapReduce implementation. Later, in 2006, they moved this work out of Nutch and formed an independent subproject of Lucene (the parent project that housed subprojects such as Nutch) called Hadoop. The name Hadoop doesn't mean anything; it was simply the name of a toy (a yellow stuffed elephant) that Doug's son used to play with. Hadoop is maintained by the Apache Software Foundation.


At around the same time, Doug joined Yahoo!, which provided a dedicated team and the resources to turn Hadoop into a system that ran at web scale. This was demonstrated in February 2008, when Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster.
As of 2016, Yahoo! still runs the biggest Hadoop cluster, with 43,000 nodes.
Hadoop currently plays a major role as a general-purpose storage and analysis platform for big data at many other companies besides Yahoo!, such as Microsoft, Last.fm, Facebook, and The New York Times.
Commercial Hadoop support is available from large, established enterprise vendors, including EMC, IBM, Microsoft, and Oracle, as well as from specialist Hadoop companies such as Cloudera, Hortonworks, and MapR.
Doug Cutting currently works at Cloudera.

Timeline of how things progressed:
  • 2004—Initial versions of what are now the Hadoop Distributed Filesystem and MapReduce implemented by Doug Cutting and Mike Cafarella.
  • December 2005—Nutch ported to the new framework. Hadoop runs reliably on 20 nodes.
  • January 2006—Doug Cutting joins Yahoo!.
  • February 2006—Apache Hadoop project officially started to support the standalone development of MapReduce and HDFS.
  • February 2006—Adoption of Hadoop by Yahoo! Grid team.
  • April 2006—Sort benchmark (10 GB/node) run on 188 nodes in 47.9 hours.
  • May 2006—Yahoo! set up a Hadoop research cluster—300 nodes.
  • May 2006—Sort benchmark run on 500 nodes in 42 hours (better hardware than April benchmark).
  • October 2006—Research cluster reaches 600 nodes.
  • December 2006—Sort benchmark run on 20 nodes in 1.8 hours, 100 nodes in 3.3 hours, 500 nodes in 5.2 hours, 900 nodes in 7.8 hours.
  • January 2007—Research cluster reaches 900 nodes.
  • April 2007—Research clusters—2 clusters of 1000 nodes.
  • April 2008—Won the 1 terabyte sort benchmark in 209 seconds on 900 nodes.
  • October 2008—Loading 10 terabytes of data per day on to research clusters.
  • March 2009—17 clusters with a total of 24,000 nodes.
  • April 2009—Won the minute sort by sorting 500 GB in 59 seconds (on 1,400 nodes) and the 100 terabyte sort in 173 minutes (on 3,400 nodes).
For more info on the history of Hadoop, click here.
