Unearthing previously unimaginable insights from massive data sets is the premise of all the big data hype. Over the past few years, as more and more stories have come out about how companies are finding competitive advantages in their data, big data has moved beyond the buzz. Enterprises are deploying big data projects at a faster rate every year, and even more plan to do so within the next two years.
How far a company can take big data analysis is determined by the resources and infrastructure it has available. The good news is that the barriers to entry have been lowered, making it possible for more organizations to transform their operations with insights gained from big data. Here are three approaches that companies of any size can take, based on their particular situation.
One thing to note is that these are underlying infrastructure approaches, and that you'll still need an analytic engine like arcplan on top in order to interact with, visualize and distribute your insights.
Abundant resources and infrastructure
Before big data was "big data," Teradata was the only game in town. It has been at this for so long and its functionality is so robust that some of its capabilities are second to none. Now other vendors like SAP (with HANA) and Kognitio have their own massively parallel analytic databases. These systems offer robust processing and querying power across multiple machines simultaneously, enable near real-time MDX (Multidimensional Expressions, for OLAP querying) and SQL (Structured Query Language, the standard way to ask a database a question) queries, and, in the case of SAP HANA and Kognitio, run fully in-memory. Not surprisingly, Teradata and SAP HANA come at a high price, but for that price the insights you achieve can arrive very near the speed of thought.
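To make the idea of an analytic SQL query concrete, here is a minimal sketch of the kind of aggregation these engines parallelize across many nodes. Python's built-in sqlite3 stands in for the database purely so the example is self-contained; the table name and data are invented for illustration.

```python
import sqlite3

# sqlite3 is a single-node stand-in here; an MPP analytic database
# (Teradata, SAP HANA, Kognitio) would run this same SQL in parallel
# across many machines, each scanning its own slice of the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 200.0)],
)

# A typical analytic question: total revenue by region
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 200.0), ('EMEA', 200.0)]
```

The query itself is ordinary SQL; what the high-end platforms sell is the ability to run it over billions of rows at interactive speed.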
Limited resources and some infrastructure
Many organizations fall into this category. There are two big data options available to this group. The first centers around the open source technology Hadoop. Its two main components – the Hadoop Distributed File System (HDFS) and Hadoop MapReduce – form an operating system for distributed parallel processing of huge amounts of data. Its immense processing power comes at a low cost: since it's open source, there are no licensing fees, and it only requires several small servers, each consisting of "something with an Intel processor, a networking card, and usually four or so hard drives in it," according to Hadoop co-creator Doug Cutting (at a cost of around $4k per server). MapReduce divides requests across the servers and then reassembles the partial results to produce the query result.
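The divide-and-reassemble flow of MapReduce can be sketched in a few lines. The following is a toy single-process word count in plain Python – the canonical MapReduce example – not Hadoop itself; on a real cluster, the map and reduce functions would run on different servers and the framework would handle the shuffle between them.

```python
from collections import defaultdict

# Toy sketch of the MapReduce model Hadoop implements:
# map emits (key, value) pairs, the framework groups them by key
# (the "shuffle"), and reduce aggregates each key's values.

def map_phase(line):
    """Emit (word, 1) for every word in one line of input."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Combine all the counts emitted for a single word."""
    return word, sum(counts)

documents = ["big data big insight", "data at scale"]

# Shuffle: group mapped pairs by key, as Hadoop does between phases
grouped = defaultdict(list)
for line in documents:
    for word, count in map_phase(line):
        grouped[word].append(count)

result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)  # {'big': 2, 'data': 2, 'insight': 1, 'at': 1, 'scale': 1}
```

Because map and reduce only see their own inputs, the same logic scales from one laptop to thousands of commodity servers.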
What makes Hadoop so popular is that for a reasonable price, you can chain together commodity hardware into a large Hadoop cluster. You can even rent a cluster from Rackspace, Amazon, or other Infrastructure-as-a-Service (IaaS) providers like Sqrrl. In addition, it processes, stores and analyzes not only structured data but also unstructured data (like social media content), log files, audio, pictures, and more – and handles such data better than analytic databases do. Hadoop has mostly been used for batch-processing historical data, though it has moved closer to real-time with some newer technologies that came out in 2013.
The second option is NoSQL ("Not Only SQL") databases, an alternative to relational databases. NoSQL databases serve up large volumes of data for web applications and are better suited to real-time analytics on operational data. Like Hadoop, they are distributed across commodity server clusters. Unlike traditional relational databases, NoSQL databases are schemaless, making it easier to incorporate new types of data as a company's business model evolves. They are also often open source, which makes them especially attractive for pilot projects where a company doesn't want to make a large investment in database licensing. You may have heard of Cassandra (created at Facebook) or MongoDB, two examples of NoSQL databases.
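To illustrate what "schemaless" buys you, here is a toy sketch using plain Python dictionaries as stand-in documents. A real deployment would use a driver such as pymongo against a running MongoDB server; the collection name and fields below are invented for illustration.

```python
# Toy illustration of the schemaless property of document stores like
# MongoDB: records in one collection need not share a fixed schema.
customers = []

# Early records capture only name and email...
customers.append({"name": "Acme Corp", "email": "hello@acme.example"})

# ...later records add fields as the business evolves – no ALTER TABLE
# migration required, unlike a relational database.
customers.append({
    "name": "Globex",
    "email": "info@globex.example",
    "twitter_handle": "@globex",
    "support_tier": "gold",
})

fields = [sorted(doc.keys()) for doc in customers]
print(fields)
```

In a relational database, adding `twitter_handle` would mean altering the table (and every existing row); in a document store, new and old documents simply coexist.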
No resources and no infrastructure
The final approach to big data is for organizations with limited resources and little or no infrastructure. Yes, organizations in this position can still have a big data problem, and fortunately for this group, Google released BigQuery to address the situation. Google BigQuery grew out of the same line of Google research that, roughly a decade ago, also inspired Hadoop. With this solution you can use Google's own infrastructure to quickly analyze very large data sets. Adding to the appeal of BigQuery is its pricing: at 8 cents per gigabyte per month, storage costs less than the data plan on your cell phone, and the first 100 GB of data processed per month is free.
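The arithmetic behind that claim is easy to check. The sketch below uses the article's 2013 figures (8 cents per GB stored per month, first 100 GB of query processing free); the per-GB query rate is an assumption for illustration, not a quoted price.

```python
# Back-of-envelope BigQuery cost sketch using the article's figures.
STORAGE_PER_GB = 0.08   # dollars per GB stored per month (from the article)
FREE_QUERY_GB = 100     # GB of query processing free each month (from the article)

def monthly_cost(stored_gb, queried_gb, query_rate_per_gb=0.035):
    """Estimate a monthly bill; query_rate_per_gb is a hypothetical rate."""
    storage = stored_gb * STORAGE_PER_GB
    billable_queries = max(0, queried_gb - FREE_QUERY_GB)
    return storage + billable_queries * query_rate_per_gb

# 500 GB stored, 80 GB queried: the queries stay inside the free tier,
# so the whole bill is storage.
print(monthly_cost(500, 80))  # 40.0
```

At that scale the bill really is in cell-phone-plan territory, which is the point: you pay for usage, not for a cluster.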
It has its limitations, like import restrictions and a limited SQL dialect, but for organizations that want to get their feet wet and try big data queries without investing in hardware and software, Google BigQuery is a good place to start.
Of course, none of these big data approaches are mutually exclusive, and in fact they should co-exist. See this Wikibon article for how to use them in conjunction with each other.
Big data analytics isn't reserved for big enterprises. New technology innovations have leveled the playing field, enabling organizations of any size to leverage big data insights. Whether your organization's infrastructure and resources are abundant, non-existent or somewhere in between, there is a solution out there to fit your needs.
Consider arcplan as your big data front-end. It makes big data available for real-time self-service analysis as well as summary reports and dashboards for decision-makers at all levels, ensuring that the reach goes well beyond a group of data scientists and into the hands of a broader audience who can make impactful decisions related to customer interactions, marketing campaigns, R&D and more. Learn more >>