Gaining thorough insight into your data and understanding the needs and buying patterns of your customers are increasingly important for businesses striving to increase operational efficiency and gain a competitive advantage. Throughout 2011, I noticed heightened interest in 'big data' and 'big data analytics' and in their implications for businesses. In August, Gartner placed big data and extreme information processing on the initial rising slope of its Hype Cycle for Emerging Technologies, which suggests we're just at the beginning of the big data trend. A recent TDWI survey reports that 34% of organizations are tapping into large data sets using advanced analytics tools with the goal of providing better business insight. The promise of big data analytics is that harnessing the wealth (and volume) of information within your business can significantly boost efficiency and improve your bottom line.
'Big data' is an all-inclusive term for vast amounts of information. In contrast to traditional data, which is typically stored in a relational database, big data differs in volume, frequency, variety and value. Big data is characteristically generated in large volumes – on the order of terabytes to exabytes per individual data set (one exabyte is 10^18 bytes: a 1 followed by 18 zeros). Big data is also generated at high frequency, meaning that information is collected at frequent intervals. Additionally, big data is usually not nicely packaged in a spreadsheet or even a multidimensional database, and it often includes unstructured, qualitative information as well.
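To put those volumes in perspective, here is a rough sketch of the arithmetic in Python. It assumes the decimal (SI) definitions of the units rather than the binary ones (such as the exbibyte), and the petabyte entry and the helper name `zeros_after_one` are my own additions for illustration.

```python
# Decimal (SI) storage units in bytes. This is an assumption:
# binary units (tebibyte, exbibyte) are slightly larger.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,  # added for scale; not mentioned in the text
    "exabyte": 10**18,
}


def zeros_after_one(size_in_bytes: int) -> int:
    """Count the zeros that follow the leading 1 in a power of ten."""
    return len(str(size_in_bytes)) - 1


# How many terabytes fit in a single exabyte?
terabytes_per_exabyte = UNITS["exabyte"] // UNITS["terabyte"]

for name, size in UNITS.items():
    print(f"one {name} = 1 followed by {zeros_after_one(size)} zeros ({size:,} bytes)")

print(f"one exabyte = {terabytes_per_exabyte:,} terabytes")
```

In other words, a data set at the exabyte end of the range is a million times larger than one at the terabyte end, which is why "big" covers such a wide span.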
The thing about big data is that it's unwieldy. It presents a storage problem, sometimes requiring thousands of servers to hold the information. It's often difficult to analyze with traditional BI tools that weren't designed with these massive data sets in mind. This is changing, though, as BI and data warehousing vendors get better at real-time or near-real-time information delivery, allowing analysts to quickly spot trends and avoid business problems.
So where does all this data come from? Everywhere, really – transactional records, log files, and posts to social media sites, just to name a few. But we should make a distinction between "big data" and regular old "large data." A financial director with thousands of client invoices and statements on file might classify this information as big data, but it's probably just large data. Log files from social media sites such as LinkedIn, Facebook and Foursquare are definitely considered big data. So what's the difference? One of the most important distinctions between big data and large data is the speed at which the data must be captured and made available for analysis.
An article on Forbes.com gives a great example:
"When you walk through the airport and they take pictures of everybody in the security line to match every face through facial recognition, they have to do that almost in real-time. That becomes a big data problem. If I am a bank and looking at a vast number of credit scores and histories, and I don’t need to provide an answer in five seconds but can do it next day, then that is not a big data problem."
The good news is that large data can be handled by the traditional BI, reporting and analysis tools you're likely already using. But the discussion around big data is important because it reminds us that useful information resides in more places than your traditional relational databases. There are many more sources of information available throughout a company that we sometimes forget – sources from which you can glean customer behavior, purchasing trends and new sales opportunities. Consider bringing your CRM data, while not necessarily big data, into your BI mix to benefit your sales, marketing, and finance teams' analysis. Your marketing team is probably already analyzing a lot of social media data that can be used throughout the company to spot trends. Even though it might not be on the "big data level" yet, it's still useful information.