Is there a methodology for collecting and analyzing big data?

Jahangir Mohammed provided the most detailed response to our question on Quora – September 2, 2012. We wanted to find out whether there is a systematic process or method for collecting, analyzing and presenting big data. Here is Jahangir’s answer:

“I am inclined to say the approach is definitely systematic, but there are lots of options, and one needs to figure out which implementation is best for the specific use case.

Data collection:

There are various distributed data collection and aggregation frameworks, like Flume[1], Chukwa[2] and Scribe[3], which can be leveraged to collect and aggregate data in real time from many servers.

If the data already sits in an RDBMS, one can use Sqoop[4] to transfer it between the RDBMS and a big-data framework like Hadoop[5] (meaning HDFS).

Data analysis:

Hadoop[5] is a well-known framework that allows distributed processing and analysis of big data. There are a couple of other frameworks, such as Cascalog[6], Storm[7] (stream processing), some MPI frameworks, some BSP frameworks (like Apache Hama[8]) and an open-source implementation of Dremel (currently being worked on), all of which were created to crunch big data. From a cloud perspective there are also Amazon’s EMR[9] and Google’s BigQuery[10], but to be explicit, nothing stops one from running any of the open-source implementations in the cloud.

Presentation/data visualization:

This can range from a home-grown solution to a commercial product. Some of the offerings out there, like Datameer[11] and BigQuery[10], offer visualizations, dashboards, Excel capabilities and so forth.”

[1]. http://www.cloudera.com/blog/201…
[2]. http://incubator.apache.org/chukwa/
[3]. https://github.com/facebook/scribe
[4]. http://sqoop.apache.org/
[5]. http://hadoop.apache.org/
[6]. https://github.com/nathanmarz/ca…
[7]. https://github.com/nathanmarz/storm
[8]. http://hama.apache.org/
[9]. http://aws.amazon.com/elasticmap…
[10]. https://developers.google.com/bi…
[11]. http://www.datameer.com/
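
Editorial illustration (not part of Jahangir’s answer): to make the collect, analyze, present flow described above a little more concrete, here is a minimal sketch in plain Python. It does not use Flume, Hadoop or any of the other tools named in the answer; it only mimics the map/reduce pattern on a few in-memory log lines. The server names, log lines and function names are all invented for this example.

```python
from collections import Counter
from itertools import chain

# Hypothetical log lines from two servers, standing in for what a collector
# such as Flume or Scribe would ship to central storage.
SERVER_LOGS = {
    "web-1": ["GET /home 200", "GET /cart 500", "GET /home 200"],
    "web-2": ["GET /home 200", "POST /login 401"],
}


def collect(logs_by_server):
    """Collection: merge the per-server streams into one stream of records."""
    return chain.from_iterable(logs_by_server.values())


def map_phase(line):
    """Map: emit a (status_code, 1) pair for a single log line."""
    status = line.split()[-1]
    return status, 1


def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def present(totals):
    """Presentation: render a plain-text bar chart, largest count first."""
    rows = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{status} {'#' * count} ({count})" for status, count in rows)


def run_pipeline(logs_by_server):
    """Chain the collection, map and reduce steps over all servers."""
    return reduce_phase(map_phase(line) for line in collect(logs_by_server))


if __name__ == "__main__":
    print(present(run_pipeline(SERVER_LOGS)))
```

In a real deployment each stage would be handled by a distributed system rather than a single process, but the division of labor between the stages stays the same.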

Feel free to add your views in the comments section.

Special thanks to Jahangir Mohammed and Vijay Kamath who both took time out to provide answers to our question.

One Response to Is there a methodology for collecting and analyzing big data?

  1. Great information. Lucky me I came across your
    blog by chance (stumbleupon). I have book marked it for later!