Real time use cases of Hadoop
As data continues to grow, businesses now have access to (or generate) more data than ever before, much of which goes unused. How can you turn this data into a competitive advantage? In this article, we explore different ways businesses are capitalizing on data.
We keep hearing statistics about the growth of data. For instance:
- Data volume in the enterprise is going to grow 50x year-over-year between now and 2020.
- The volume of business data worldwide, across all companies, doubles every 1.2 years.
- Back in 2010, Eric Schmidt famously stated that every 2 days, we create as much information as we did from the dawn of civilization up until 2003.
The big questions: Where is this data? How can you use it to your advantage?
If you want to capitalize on this data, you must first begin storing it somewhere. But how can you store and process massive data sets without spending a fortune on storage? That’s where Hadoop comes into play.
Hadoop is an open-source software framework for storing and processing large data sets. It stores data in a distributed fashion on clusters of commodity hardware, and is designed to scale up easily as needed. Hadoop helps businesses store and process massive amounts of data without purchasing expensive hardware.
The great advantage of Hadoop: It lets you collect data now and ask questions later. You don’t need to know every question you want answered before you start using Hadoop.
Once you begin storing data in Hadoop, the possibilities are endless. Companies across the globe are using this data to solve big problems, answer pressing questions, improve revenue, and more. How? Here are some real-life examples of ways other companies are using Hadoop to their advantage.
1. Analyze life-threatening risks: Suppose you’re a doctor in a busy hospital. How can you quickly identify patients with the biggest risks? How can you ensure that you’re treating those with life-threatening issues, before spending your time on minor problems? Here’s a great example of one hospital using big data to determine risk–and make sure they’re treating the right patients.
“Patients in a New York hospital with suspicion of heart attack were submitted to series of tests, and the results were analyzed with use of big data – history of previous patients,” says Agnieszka Idzik, Senior Product Manager at SALESmanago. “Whether a patient was admitted or sent home depended on the algorithm, which was more efficient than human doctors.”
2. Identify warning signs of security breaches: What if you could stop security breaches before they happened? What if you could identify suspicious employee activity before they took action? The solution lies in data.
As explained below, security breaches usually come with early warning signs. Storing and analyzing data in Hadoop is a great way to identify these problems before they happen.
“Data breaches like we saw with Target, Sony, and Anthem never just happen; there are typically early warning signs – unusual server pings, even suspicious emails, IMs or other forms of communication that could suggest internal collusion,” according to Kon Leong, CEO, ZL Technologies. “Fortunately, with the ability to now mine and correlate people, business, and machine-generated data all in one seamless analytics environment, we can get a far more complete picture of who is doing what and when, including the detection of collusion, bribery, or an Ed Snowden in progress even before he has left the building.”
3. Prevent hardware failure: Machines generate a wealth of information, much of which goes unused. Once you start collecting that data with Hadoop, you’ll learn just how useful it can be.
For instance, a recent webinar on “Practical Uses of Hadoop” explores one great example: capturing data from HVAC systems helps a business identify potential problems with products and locations.
Here’s another great example: One power company combined sensor data from the smart grid with a map of the network to predict which generators in the grid were likely to fail, and how that failure would affect the network as a whole. Using this information, they could react to problems before they happened.
4. Understand what people think about your company: Do you ever wonder what customers and prospects say about your company? Is it good or bad? Just imagine how useful that data could be if you captured it.
With Hadoop, you can mine social media conversations and figure out what people think of you and your competition. You can then analyze this data and make real-time decisions to improve user perception.
For instance, this article explains how one company used Hadoop to track user sentiment online. It gave their marketing teams the ability to assess external perception of the company (positive, neutral, or negative), and make adjustments based on that data.
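To make the positive, neutral, or negative labeling concrete, here is a minimal single-machine sketch in Python. It is not the method used by the company mentioned above. The word lists, the sample posts, and the `classify` helper are all hypothetical, and a production system would more likely use a trained model running over data stored in the cluster.

```python
from collections import Counter

# Tiny hypothetical word lists; a real system would use a full lexicon or a trained model.
POSITIVE = {"love", "great", "fast", "helpful"}
NEGATIVE = {"hate", "slow", "broken", "rude"}


def classify(post):
    """Label a post by counting positive words minus negative words."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical social media posts mentioning the company.
posts = [
    "Love the new app, support was great!",
    "Checkout is slow and the search is broken.",
    "Just ordered from them today.",
]

# Tally how many posts fall under each label.
summary = Counter(classify(p) for p in posts)
print(summary)
```

A marketing team could track a tally like `summary` over time to see whether perception is improving or getting worse.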
5. Understand when to sell certain products:
“Done well, data can help companies uncover, quantitatively, both pain points and areas of opportunity,” says Mark Schwarz, VP of Data Science at Square Root. “For example, tracking auto sales across dealerships may highlight that red cars are selling and blue cars are not. Knowing this, the company could adjust inventory to avoid the cost of blue cars sitting on the lot and increase revenue from having more red cars. It’s a data-driven way to understand what’s working and what’s not in a business and helps eliminate ‘gut reaction’ decision making.”
Of course, this can go far beyond determining which product is selling best. Using Hadoop, you can analyze sales data against any number of factors.
For instance, if you analyzed sales data against weather data, you could determine which products sell best on hot days, cold days, or rainy days.
Or, what if you analyzed sales data by time and day? Do certain products sell better in specific weeks, on specific days, or at specific hours?
Those are just a couple of examples, but I’m sure you get the point. If you know when products are likely to sell, you can better promote those products.
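As a rough illustration of analyzing sales against weather, the sketch below joins sales records to a daily weather label and totals units per condition. It runs on one machine in plain Python. The records, the field layout, and the function names are assumptions made for the example. On a real cluster the same join and aggregation would typically be written as a MapReduce job or a Hive query.

```python
from collections import defaultdict

# Hypothetical sample records: (date, product, units_sold)
sales = [
    ("2015-07-01", "ice cream", 40),
    ("2015-07-01", "umbrella", 2),
    ("2015-07-02", "ice cream", 5),
    ("2015-07-02", "umbrella", 30),
    ("2015-07-03", "ice cream", 35),
    ("2015-07-03", "umbrella", 1),
]

# Hypothetical daily weather labels keyed by date.
weather = {
    "2015-07-01": "hot",
    "2015-07-02": "rainy",
    "2015-07-03": "hot",
}


def units_by_weather(sales, weather):
    """Join each sale to that day's weather and total units per (weather, product)."""
    totals = defaultdict(int)
    for date, product, units in sales:
        condition = weather.get(date)
        if condition is None:
            continue  # skip sales on days with no weather record
        totals[(condition, product)] += units
    return dict(totals)


def best_seller(totals, condition):
    """Return the product with the most units sold under the given condition."""
    candidates = {p: u for (c, p), u in totals.items() if c == condition}
    return max(candidates, key=candidates.get) if candidates else None


totals = units_by_weather(sales, weather)
print(best_seller(totals, "hot"))
print(best_seller(totals, "rainy"))
```

Swapping the weather label for the hour of day or the day of week gives the time-based variant of the same analysis.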
6. Find your ideal prospects: Chances are, you know what makes a good customer. But, do you know exactly where they are? What if you could use freely available data to identify and target your best prospects?
There’s a great example in this article. It explains how one company compared their customer data with freely available census data. They identified the location of their best prospects, and ran targeted ads at them. The results: Increased conversions and sales.
7. Gain insight from your log files: Just like your hardware, your software generates lots of useful data. One of the most common examples: Server log files. Server logs are computer-generated log files that capture network and server operations data. How can this data help? Here are a couple of examples:
Security: What happens if you suspect a security breach? The server log data can help you identify and repair the vulnerability.
Usage statistics: As demonstrated in this webinar, server log data provides valuable insight into usage statistics. You can instantly see which applications are most popular, and which users are most active.
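The usage-statistics idea maps naturally onto the map and reduce steps that Hadoop runs at scale. The sketch below imitates those two steps in plain Python on one machine. The log format (`timestamp user application status`), the sample lines, and the helper names are assumptions for illustration, not a real server's format.

```python
from collections import Counter

# Hypothetical log format: "<timestamp> <user> <application> <status>"
log_lines = [
    "2015-07-01T09:00:01 alice crm 200",
    "2015-07-01T09:00:05 bob crm 200",
    "2015-07-01T09:01:10 alice billing 200",
    "2015-07-01T09:02:00 alice crm 500",
    "malformed line",
]


def map_line(line):
    """Map step: emit (key, 1) pairs for the user and the application on one line."""
    parts = line.split()
    if len(parts) != 4:
        return []  # ignore lines that do not match the assumed format
    _timestamp, user, app, _status = parts
    return [(("user", user), 1), (("app", app), 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def top(counts, kind):
    """Return the most frequent name of the given kind ('user' or 'app')."""
    names = {name: n for (k, name), n in counts.items() if k == kind}
    return max(names, key=names.get) if names else None


pairs = [pair for line in log_lines for pair in map_line(line)]
usage = reduce_counts(pairs)
print(top(usage, "user"), top(usage, "app"))
```

Because each line is mapped independently, the map step can be spread across many machines, which is what makes this pattern practical on very large log files.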
8. Threat Analysis: How can companies detect threats and fraudulent activity?
Businesses have struggled with theft, fraud, and abuse since long before computers existed. Computers and online systems create new opportunities for criminals to act swiftly, efficiently, and anonymously. Online businesses use Hadoop to monitor and combat criminal behavior.
Challenge: Online criminals write viruses and malware to take over individual computers and steal valuable data. They buy and sell using fraudulent identities and use scams to steal money or goods. They lure victims into scams by sending email or other spam over networks. In “pay-per-click” systems like online advertising, they use networks of compromised computers to automate fraudulent activity, bilking money from advertisers or ad networks. Online businesses must capture, store, and analyze both the content and the pattern of messages that flow through the network to tell the difference between a legitimate transaction and fraudulent activity by criminals.
Solution: One of the largest users of Hadoop, and in particular of HBase, is a global developer of software and services to protect against computer viruses. Many detection systems compute a “signature” for a virus or other malware, and use that signature to spot instances of the virus in the wild. Over the decades, the company has built up an enormous library of malware indexed by signatures. HBase provides an inexpensive and high-performance storage system for this data. The vendor uses MapReduce to compare instances of malware to one another, and to build higher-level models of the threats that the different pieces of malware pose. The ability to examine all the data comprehensively allows the company to build much more robust tools for detecting known and emerging threats.
A large online email provider has a Hadoop cluster that provides a similar service. Instead of detecting viruses, though, the system recognizes spam messages. Email flowing through the system is examined automatically. New spam messages are properly flagged, and the system detects and reacts to new attacks as criminals create them.
Sites that sell goods and services over the internet are particularly vulnerable to fraud and theft. Many use web logs to monitor user behavior on the site. By tracking that activity, tracking IP addresses, and using knowledge of the location of individual visitors, these sites are able to recognize and prevent fraudulent activity. The same techniques work for online advertisers battling click fraud. Recognizing patterns of activity by individuals permits the ad networks to detect and reject fraudulent activity.
Hadoop is a powerful platform for dealing with fraudulent and criminal activity like this. It is flexible enough to store all of the data that matters: message content, relationships among people and computers, and patterns of activity. It is powerful enough to run sophisticated detection and prevention algorithms and to create complex models from historical data to monitor real-time activity.
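One simple pattern of activity worth checking for click fraud is an IP address that clicks far more often than a person plausibly would. The sketch below flags such addresses with a sliding time window. It is an illustrative stand-in, not the detection logic of any company described here. The window length, the click threshold, and the sample data are hypothetical values.

```python
from collections import defaultdict, deque


def flag_suspicious_ips(clicks, window_seconds=60, max_clicks=3):
    """Flag IPs that exceed max_clicks within any window_seconds span.

    clicks: iterable of (timestamp_seconds, ip), sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in clicks:
        window = recent[ip]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_clicks:
            flagged.add(ip)
    return flagged


# Hypothetical click stream: (seconds since start, source IP)
clicks = [
    (0, "10.0.0.1"), (5, "10.0.0.2"), (10, "10.0.0.1"),
    (12, "10.0.0.1"), (15, "10.0.0.1"), (200, "10.0.0.2"),
]

suspicious = flag_suspicious_ips(clicks)
print(suspicious)
```

In practice a rule like this would be one signal among many, combined with visitor location and models built from historical data.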
9. Ad Targeting: How can companies increase campaign efficiency?
Two leading advertising networks use Hadoop to choose the best ad to show to any given user.
Challenge: Advertisement targeting is a special kind of recommendation engine. It selects ads best suited to a particular visitor. There is, though, an additional twist: each advertiser is willing to pay a certain amount to have its ad seen. Advertising networks auction ad space, and advertisers want their ads shown to the people most likely to buy their products. This creates a complex optimization challenge. Ad targeting systems must understand user preferences and behavior, estimate how interested a given user will be in the different ads available for display, and choose the one that maximizes revenue to both the advertiser and the advertising network.
The data managed by these systems is simple and structured. The ad exchanges, however, provide services to a large number of advertisers, deliver advertisements on a wide variety of Web properties and must scale to millions of end users browsing the web and loading pages that must include advertising. The data volume is enormous.
Optimization requires examining both the relevance of a given advertisement to a particular user, and the collection of bids by different advertisers who want to reach that visitor. The analytics required to make the correct choice are complex, and running them on the large dataset requires a large-scale, parallel system.
Solution: One advertising exchange uses Hadoop to collect the stream of user activity coming off of its servers. The system captures that data on the cluster, and runs analyses continually to determine how successful the system has been at displaying ads that appealed to users. Business analysts at the exchange are able to see reports on the performance of individual ads, and to adjust the system to improve relevance and increase revenues immediately.
A second exchange builds sophisticated models of user behavior in order to choose the right ad for a given visitor in real time. The model uses large amounts of historical data on user behavior to cluster ads and users, and to deduce preferences. By steadily refining those models, Hadoop helps the exchange deliver much better-targeted advertisements.
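The trade-off between bids and relevance can be sketched in a few lines of Python. This example assumes a simple scoring rule, bid multiplied by the estimated chance that the user clicks, which is one common way to frame the problem and not necessarily what either exchange does. The ad names, bid amounts, and relevance estimates are hypothetical. In a real system the relevance numbers would come from models trained on historical behavior.

```python
def choose_ad(bids, relevance):
    """Pick the ad with the highest expected value (bid * estimated relevance).

    bids: ad_id -> amount the advertiser will pay per click
    relevance: ad_id -> estimated probability that this user clicks (0.0 to 1.0)
    """
    # Ads with no relevance estimate are scored as 0.
    scored = {ad: bids[ad] * relevance.get(ad, 0.0) for ad in bids}
    return max(scored, key=scored.get) if scored else None


# Hypothetical bids and per-user relevance estimates.
bids = {"sports_shoes": 0.50, "luxury_watch": 2.00, "pizza": 0.30}
relevance = {"sports_shoes": 0.08, "luxury_watch": 0.01, "pizza": 0.05}

print(choose_ad(bids, relevance))
```

Here the highest bidder does not win: the watch ad pays the most per click, but the shoe ad is far more likely to be clicked by this user, so its expected value is higher.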
Article credit: Joe Stangarone and Cloudera