Saturday, June 11, 2016

Hadoop


Hadoop is a programming framework used to process Big Data over a network. First we need to understand big data.The definition of Big data is as follows:

extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.

Let us classify Big data into 3 types:
  • Structured Data
  • Semi Structured Data
  • Unstructured Data

Structured data is like the data stored in the SQL tables of Relational database.

Semi Structured data is like the data stored in the XMLs.

Unstructured Data is the data stored in word document or pdf .

To process big data over a network google has developed an algorithm known as MapReduce which Hadoop has made use of.

Let's try a Simple Example to understand

References

No comments:

Post a Comment

ec2-user@ec2 Permission denied