Top 5 Posts in 2012: Balcostics’ Year in Review

2012 came with its fair share of challenges and triumphs. We take this opportunity to thank everyone for their encouragement and support. Below are some stats on our website and on what you, our viewers, found most interesting in 2012.

The following are the ‘Top Five Posts’ that got the most views in 2012:

  1. Top 10 Dancehall/Reggae Artistes on Social Media
  2. Winning Schools: Champs stats from 1910 – 2011
  3. Food for Thought: The Research behind eating Jamaican
  4. Exploiting Jamaica’s Hydroelectric Potential
  5. Is there a methodology for collecting Big Data?

Facebook, Reddit, Twitter and LinkedIn were among the top referral sources to balcostics.com.

Visitors to balcostics.com came from 122 countries across the world. The majority were from Jamaica, the United States and Canada.


Is there a methodology for collecting and analyzing big data?

Jahangir Mohammed provided the most detailed response to the question we posted on Quora on September 2, 2012. We wanted to find out whether there is a systematic process or method for collecting, analyzing and presenting big data. Here is Jahangir’s answer:

“I am inclined to say the approach is definitely systematic, but there are lots of options, and one needs to figure out the best implementation for their specific use case.

Data collection:

There are various distributed data collection and aggregation frameworks, like Flume[1], Chukwa[2] and Scribe[3], which can be leveraged to collect and aggregate data in real time from lots of servers.

If one has the data sitting in some form in an RDBMS, they can use Sqoop[4] to transfer it between the RDBMS and a big-data framework like Hadoop[5] (specifically HDFS).

Data analysis:

Hadoop[5] is a well-known framework that allows distributed processing and analysis of big data. There are a couple of other frameworks, like Cascalog[6], Storm[7] (stream processing), some MPI frameworks, some BSP frameworks (like Apache Hama[8]) and an open-source implementation of Dremel (currently being worked on), all of which are built to crunch big data. From a cloud perspective there are also Amazon’s EMR[9] and Google’s BigQuery[10], but to be explicit, there is nothing stopping one from running any open-source implementation in the cloud.

Presentation/data visualization:

This can range from a home-grown solution to a commercial product. Some of the offerings out there, like Datameer[11] and BigQuery[10], do offer some visualizations, dashboards, Excel capabilities and so forth.”

[1] Flume: http://www.cloudera.com/blog/201…
[2] Chukwa: http://incubator.apache.org/chukwa/
[3] Scribe: https://github.com/facebook/scribe
[4] Sqoop: http://sqoop.apache.org/
[5] Hadoop: http://hadoop.apache.org/
[6] Cascalog: https://github.com/nathanmarz/ca…
[7] Storm: https://github.com/nathanmarz/storm
[8] Apache Hama: http://hama.apache.org/
[9] Amazon EMR: http://aws.amazon.com/elasticmap…
[10] Google BigQuery: https://developers.google.com/bi…
[11] Datameer: http://www.datameer.com/
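To make the data-collection step in Jahangir’s answer more concrete, here is a toy, in-process sketch of the agent/collector pattern that tools like Flume, Chukwa and Scribe are built around: an agent on each server forwards its events to a central collector, which aggregates them by category. This is only an illustration under stated assumptions: threads stand in for separate machines, a `queue.Queue` stands in for the network, and the names `agent`, `collector` and `collect` are hypothetical, not the API of any of those frameworks.

```python
import queue
import threading
from collections import defaultdict

DONE = None  # sentinel an agent sends once it has no more events


def agent(server, events, channel):
    """Hypothetical per-server agent: forward each (category, message) event."""
    for category, message in events:
        channel.put((server, category, message))
    channel.put(DONE)


def collector(channel, num_agents):
    """Drain the channel until every agent is done; group messages by category."""
    sinks = defaultdict(list)
    finished = 0
    while finished < num_agents:
        item = channel.get()
        if item is DONE:
            finished += 1
            continue
        server, category, message = item
        sinks[category].append(f"{server}: {message}")
    return dict(sinks)


def collect(logs_by_server):
    """Start one agent thread per server and aggregate all of their events."""
    channel = queue.Queue()
    threads = [
        threading.Thread(target=agent, args=(server, events, channel))
        for server, events in logs_by_server.items()
    ]
    for t in threads:
        t.start()
    sinks = collector(channel, len(threads))
    for t in threads:
        t.join()
    return sinks
```

For example, `collect({"web1": [("access", "GET /")]})` returns `{"access": ["web1: GET /"]}`. The per-agent sentinel is what lets this finite sketch terminate; a real collection pipeline runs continuously and writes to durable storage such as HDFS instead of returning a dictionary.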
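The RDBMS-to-Hadoop transfer that Jahangir attributes to Sqoop can be approximated with Python’s built-in `sqlite3` module. This hypothetical `export_table` function splits a table into contiguous ranges of an integer key and writes each range to its own comma-delimited `part-NNNNN` file, loosely similar to how Sqoop divides an import among parallel tasks. It is a sketch, not Sqoop’s interface: a local directory stands in for HDFS and the ranges are read one after another rather than in parallel.

```python
import csv
import os
import sqlite3
import tempfile


def export_table(conn, table, key, out_dir, num_splits=4):
    """Copy `table` into `num_splits` comma-delimited files, partitioned on `key`.

    `table` and `key` are interpolated into SQL, so they must be trusted
    identifiers; `key` is assumed to be an integer column.
    """
    lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to export
        return []
    span = hi - lo + 1
    paths = []
    for i in range(num_splits):
        # Integer arithmetic keeps the ranges contiguous and non-overlapping.
        start = lo + i * span // num_splits
        end = lo + (i + 1) * span // num_splits  # exclusive upper bound
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} >= ? AND {key} < ? ORDER BY {key}",
            (start, end),
        ).fetchall()
        path = os.path.join(out_dir, f"part-{i:05d}")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(i, f"post {i}") for i in range(1, 11)],
    )
    with tempfile.TemporaryDirectory() as out_dir:
        for path in export_table(conn, "posts", "id", out_dir):
            with open(path, newline="") as f:
                print(os.path.basename(path), len(f.readlines()), "rows")
```

Splitting on a key range lets each split be read independently, which is what makes the transfer easy to parallelize when the key values are spread evenly.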
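Hadoop’s processing model, MapReduce, is usually introduced with a word count. This pure-Python sketch runs the three stages (map, shuffle/group-by-key, reduce) in a single process purely to show how data flows between them; on a Hadoop cluster the framework distributes the mappers and reducers across machines and performs the shuffle itself. The function names are illustrative and are not Hadoop’s Java API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

For example, `word_count(["Big data big", "data tools"])` returns `{"big": 2, "data": 2, "tools": 1}`. Because each mapper sees only its own line and each reducer sees only one key’s values, both stages can run on many machines at once.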
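On the presentation side, a home-grown option can be as small as a function that turns aggregated counts into a chart. This hypothetical `text_bar_chart` helper renders a plain-text horizontal bar chart from a `{label: value}` mapping of non-negative numbers; a real dashboard would more likely use a plotting library or one of the commercial tools mentioned in the answer.

```python
def text_bar_chart(counts, width=20):
    """Render {label: value} as horizontal '#' bars scaled so the peak fills `width`."""
    if not counts:
        return ""
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    # Largest values first, like a typical "top N" chart.
    for label, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / peak) if peak else ""
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)
```

For example, `text_bar_chart({"a": 10, "bb": 5}, width=10)` returns `"a  | ########## 10\nbb | ##### 5"`.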

Feel free to add your views in the comments section.

Special thanks to Jahangir Mohammed and Vijay Kamath, who both took the time to answer our question.