Monthly Archives: January 2016

Why Learn Big Data and Hadoop?

Category : Bigdata

In my experience, people who build their careers around work they are excited about and passionate about go farther and faster, carried by that self-motivation, than people who do something they dislike because they feel obligated to do it for other reasons. You have already taken a great initiative in your career by doing your research, including visiting my blog.

This current wave of “big data” holds tremendous opportunity, and the deluge of big data is likely to persist. Tools to handle big data will eventually become mainstream and commonplace, at which point almost everyone will be working with big data. However, enterprising folks can still get ahead of the mainstream today by investing in skills and career development. I realize this may sound like hyperbole, but this is the historical pattern we have seen in how technology gets adopted and in the resulting shifts in the workforce (e.g. printing press, radio, television, computers, internet, etc.).

BigData! A Worldwide Problem:
Big data is commonly defined as “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” In simpler terms, Big Data is the term given to the large volumes of data that organizations store and process. It is becoming very difficult for companies to store, retrieve, and process this ever-increasing data. Any company that manages its data well has a real shot at becoming the next BIG success!

The problem lies in using traditional systems to store enormous amounts of data. Though these systems worked well a few years ago, with the increasing volume and complexity of data they are quickly becoming obsolete. The good news is Hadoop: for companies working with BigData across a variety of applications, it has become nothing less than a panacea, an integral part of storing, handling, evaluating, and retrieving hundreds of terabytes or even petabytes of data.

Apache Hadoop! A Solution for Big Data:
Hadoop is an open-source software framework that supports data-intensive distributed applications. It is licensed under the Apache v2 license and is therefore generally known as Apache Hadoop. Hadoop was developed by Doug Cutting and Michael J. Cafarella, based on a paper originally published by Google describing its MapReduce system, and it applies concepts of functional programming. Hadoop is written in the Java programming language and is a top-level Apache project built and used by a global community of contributors. And don’t overlook the charming yellow elephant in the logo, which is named after Doug’s son’s toy elephant!
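The MapReduce idea that Hadoop borrows from functional programming can be sketched in a few lines of plain Python: a map step emits a (word, 1) pair for each word, and a reduce step sums the counts per key. This is only an illustration of the concept, not Hadoop code; in a real cluster, the framework distributes and shuffles these steps across many machines.

```python
def map_phase(lines):
    # Map step: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce step: group pairs by word and sum the counts,
    # as a Hadoop reducer would after the shuffle.
    counts = {}
    for word, count in pairs:
        counts[word] = counts.get(word, 0) + count
    return counts

lines = ["hadoop stores big data", "big data needs hadoop"]
print(reduce_phase(map_phase(lines)))
```

On a cluster, the same two functions would run in parallel over splits of a huge input file, which is exactly what makes the model scale.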

Some of the top companies using Hadoop:
The importance of Hadoop is evident from the fact that many global MNCs use it and consider it an integral part of their operations, including Yahoo! and Facebook. On February 19, 2008, Yahoo! Inc. launched what was then the world’s largest Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster with over 10,000 cores and generates data that is used in every Yahoo! Web search query.

Facebook, a $5.1 billion company, had over 1 billion active users in 2012, according to Wikipedia. Storing and managing data of such magnitude could have been a problem even for a company like Facebook. But thanks to Apache Hadoop, Facebook can keep track of every profile it hosts, as well as all the related data such as images, posts, comments, and videos.

Opportunities for Hadoopers:
Opportunities for Hadoopers are endless: Hadoop Admin, Hadoop Developer, Hadoop Tester, Hadoop Architect, and more. If cracking and managing BigData is your passion in life, then think no more: join the EconITService Hadoop course and carve a niche for yourself! Happy Hadooping!

How to Start Learning Hadoop

The easiest way to get started with Hadoop is a Sandbox running in VMware Player or VirtualBox. It is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials. The Sandbox includes many of the most exciting developments from the latest CDH/HDP distributions, packaged up in a virtual environment, so you can be working in a Hadoop environment within 10 minutes.

Hadoop Sandbox provides:

  1. A virtual machine with Hadoop preconfigured.
  2. A set of hands-on tutorials to get you started with Hadoop.
  3. An environment to help you explore related projects in the Hadoop ecosystem like Apache Pig, Apache Hive, Apache HCatalog and Apache HBase.

Let’s start downloading and installing Hadoop on our Windows machine in 10 minutes.

System requirements to run VMware Player:

  • RAM: at least 8 GB (for a 2-node virtual cluster)
  • Processor: i3 or above
  • Disk: at least 20 GB free space

Step 1. Download and install VMware Player from the VMware website, or download and install VirtualBox from the Oracle website.

Step 2. Now download the Sandbox from the Hortonworks or Cloudera website. I will explain the Hortonworks Sandbox process here:

The Sandbox download is available for both VirtualBox and VMware Fusion/Player environments. Just follow the instructions to import the Sandbox into your environment.

1. Open the Oracle VM VirtualBox Manager.
You can do so by double-clicking the VirtualBox icon.


2. Open the Preferences dialog window.
Select File > Preferences… within the Oracle VM VirtualBox Manager.


3. Uncheck Auto-Capture Keyboard within the Preferences dialog window.
Select the Input icon from the left-hand pane of the window first.

Click the OK button once done. This closes the Preferences window.

4. Open the Import Appliance window.
Select File > Import Appliance… within the Oracle VM VirtualBox Manager. A separate dialog window appears in front of the VM VirtualBox Manager.


5. Click on the folder icon to open a file dialog window. Select the virtual appliance file that you downloaded as a prerequisite. After selecting the file, click the Open button.

NOTE: The name of the file you have downloaded depends on the version of the Hortonworks Sandbox you have chosen to download. The screenshots here reference Sandbox HDP version 2.2.

On Windows, after you select the virtual appliance file you are brought back to the Import Appliance window. After clicking Next, the Appliance Settings are displayed.

6. Modify Appliance Settings as needed.
Within the Appliance Settings section you may wish to allocate more RAM to the virtual appliance. Setting 8 GB of RAM for the Hortonworks Sandbox virtual appliance will improve performance; make sure you have enough physical RAM on the host machine to make this change. To make a change, click on the specific value and edit it. Once finished configuring, click Import.

The progress of the import is displayed.


7. Once the import finishes, you are brought back to the main Oracle VM VirtualBox Manager screen. From the left-hand pane, select the appliance you just imported and click the green Start arrow.

A console window opens, displaying the boot-up information.


Once the virtual machine fully boots up, the console displays the login instructions.


8. In one of the supported browsers mentioned in the prerequisites section of this document, on your host machine, enter the URL displayed in the console.
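Once the Sandbox has booted, you can sanity-check from the host that its web interface is reachable. The snippet below is a minimal sketch; the host and port are assumptions (older Hortonworks Sandbox releases served a splash page on 127.0.0.1:8888 by default), so substitute whatever URL your console actually displays.

```python
import urllib.request
import urllib.error

def sandbox_url(host="127.0.0.1", port=8888):
    # Build the URL for the Sandbox web UI.
    # NOTE: these defaults are assumptions; use the URL your console shows.
    return "http://%s:%d/" % (host, port)

def is_reachable(url, timeout=5):
    # Return True if the Sandbox web UI answers with a non-error HTTP response.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    url = sandbox_url()
    print(url, "reachable:", is_reachable(url))
```

If the check fails, the VM may still be booting, or the Sandbox may be serving on a different port; the console login screen is the authoritative source for the address.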
