Why It's a Bad Idea to Build a Business Intelligence Platform From Scratch
A friend of mine is a Python developer for a billion-dollar corporation. His team is building a custom call center reporting app that connects to the company's cloud data storage via APIs. I've seen some of the application and while it's impressive for a custom system, it's mostly tables of numbers with the occasional pie chart. This is after 8 months of work, and the only people accessing the system are a select few big data scientists.
Believe it or not, a number of companies are doing this kind of in-house development of analytical platforms. All the hype surrounding big data has them convinced that they're missing out on the action. Consequently, companies large and small are devoting huge amounts of time, money and human resources to developing custom business intelligence systems for reporting on big data (Google BigQuery, Hadoop, etc.) rather than simply choosing a platform that already exists and is proven to work in a similar environment.
At the heart of this trend is a desire for big data to have a greater impact in the organization. Since it's usually small teams of data scientists who are dealing with big data, their impact and effectiveness are equivalent to a small drop in a much larger body of water – their ripple effect throughout the organization is often minimal and short-lived. To extend the reach of big data in the company and get important insights out to a greater number of decision makers, a BI platform is a necessary next step – one that presents big data insights in easily digestible executive reports and dashboards.
Some companies are going down the road of custom BI platform development, but their efforts are no match for solutions like arcplan that are already available. Below is a list of what you'd need to do to build a BI platform from scratch. You'll quickly see why the effort and expense aren't worthwhile.
Step 1: Hire a team
To build a BI platform, you'll need a range of talent with a combination of IT and business acumen – business experts, data scientists, statisticians who didn't skip class in college, and developers with solid programming skills.
Step 2: Add security
As the number of people accessing your BI system grows, you'll need to implement a security layer to protect sensitive information and manage permissions. If you're building directly on Google BigQuery or Hadoop deployments, don't expect a ready-made security concept for your reporting application. So your next plan of action should be to build LDAP integration and single sign-on, and don't forget Kerberos (tech squad required). And here's the kicker: with an added security layer, there's a high probability you'll make your application significantly slower. Good luck with this one.
Step 3: Build an analytical engine
Systems like Hadoop are very efficient for querying big data sets; you ask them questions and they provide responses. However, they are not analytical platforms, meaning they return raw information that needs to be interpreted or analyzed somewhere else. To run simulations, regressions and multistep analytical models on your data, you'll need to build this functionality yourself through programming.
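To give a sense of what "building it yourself" means, here is a minimal sketch of just one analytical building block, a least-squares trend line, written in plain Python. The `rows` data and the `fit_line` helper are made up for illustration; they stand in for whatever your query engine returns and are not part of any real platform.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical raw rows as a query engine might return them: (week, calls_handled)
rows = [(1, 120), (2, 135), (3, 150), (4, 165)]
weeks = [week for week, _ in rows]
calls = [handled for _, handled in rows]

slope, intercept = fit_line(weeks, calls)
forecast_week_5 = slope * 5 + intercept  # extrapolate the trend one week ahead
```

That covers a single straight-line fit. Simulation, multivariate models and chained analytical steps each need their own code, tests and maintenance.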
Step 4: Use connection pooling
A connection pool is a cache of open database connections that can be reused. Each connection is what your application uses to execute commands against the database. Since opening and maintaining a separate database connection for each user is inefficient and a waste of resources, connection pooling keeps connections in a pool and hands them out when requested. With numerous people accessing your reporting system, be prepared to write code for timeout values and for closing connections, and to pre-fill the connection pool with readily available database connections to keep system performance up.
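For illustration, a bare-bones pool might look something like the sketch below. It assumes SQLite as a stand-in database and a hypothetical `ConnectionPool` class; a production pool would also need health checks, reconnection logic and metrics.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool; SQLite stands in for the real database."""

    def __init__(self, database, size=4, timeout=5.0):
        self._timeout = timeout
        self._idle = queue.Queue(maxsize=size)
        # Pre-fill the pool so requests never pay the connection-setup cost.
        for _ in range(size):
            # check_same_thread=False lets a connection be handed to another thread.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self):
        """Borrow a connection, waiting at most `timeout` seconds for one."""
        try:
            conn = self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("no database connection became available") from None
        try:
            yield conn
        finally:
            conn.rollback()       # discard any uncommitted work before reuse
            self._idle.put(conn)  # return the connection instead of closing it

    def close(self):
        """Close every idle connection, e.g. at application shutdown."""
        while True:
            try:
                self._idle.get_nowait().close()
            except queue.Empty:
                break


pool = ConnectionPool(":memory:", size=2, timeout=0.1)
with pool.connection() as conn:
    answer = conn.execute("SELECT 40 + 2").fetchone()[0]
```

Even this toy version has to decide what happens when the pool is exhausted (here, a `TimeoutError`) and how to reset a connection before the next user gets it.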
Step 5: Take out the trash
With users frequently logging on, logging off, processing queries and even abandoning sessions, you need to know how to "take out the trash" in order to maintain efficiency throughout the system. Your program should include "garbage collection," a routine that reclaims memory occupied by objects the program no longer uses. If you fail to collect the garbage, the question isn't whether your system will become unstable and crash – it's when.
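Python's runtime reclaims unreachable objects on its own, but application-level state such as abandoned sessions stays in memory for as long as your code holds a reference to it. Below is a rough sketch of the kind of cleanup you would have to write, assuming a hypothetical in-memory `SessionStore` with an idle timeout; the names and the 60-second limit are illustrative only.

```python
import time


class SessionStore:
    """Hypothetical in-memory session registry with idle-timeout cleanup."""

    def __init__(self, ttl_seconds=1800.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the sweep can be tested
        self._sessions = {}      # session_id -> (last_seen, data)

    def touch(self, session_id, data=None):
        """Record activity for a session, creating it if necessary."""
        _, existing = self._sessions.get(session_id, (None, {}))
        self._sessions[session_id] = (
            self._clock(),
            data if data is not None else existing,
        )

    def sweep(self):
        """Drop sessions idle for longer than the TTL; return how many were removed."""
        cutoff = self._clock() - self._ttl
        stale = [sid for sid, (seen, _) in self._sessions.items() if seen < cutoff]
        for sid in stale:
            del self._sessions[sid]  # releasing the reference lets memory be reclaimed
        return len(stale)

    def __len__(self):
        return len(self._sessions)


# Simulated clock so the example does not have to wait in real time.
now = [0.0]
store = SessionStore(ttl_seconds=60.0, clock=lambda: now[0])
store.touch("alice")
now[0] = 45.0
store.touch("bob")
now[0] = 90.0
removed = store.sweep()  # "alice" has been idle for 90 seconds, "bob" for 45
```

In a real system the sweep would run on a schedule and would also have to release whatever else the session holds, such as cached query results or borrowed database connections.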
Step 6: Add visualization
Though you often see big data represented with fancy visualizations, big data platforms don't come with visualization engines out of the box. Most likely you'll need to integrate a graphing engine to display your data. Built-in visualization is a big part of what made BI platforms so popular within the industry. There are plenty of rainbow-makers out there on the market, touting their ability to create the best visualization for your data and sparing you the trouble of building your own graphing engine. But if you think you can create a better stacked bar graph than the BI vendor next door, kudos to you.
Once you have completed all this work, give yourself a pat on the back for reinventing the wheel. The simple fact is that all this has been done before. Modern BI platforms are scalable, flexible, and secure, and allow you to tap into analytic techniques perfected over many years. Big data needs BI, but you don't have to build your own solution from scratch. With all of the time and expense that would go into the project, you could have been up and running with arcplan in less than 4 months and for much less than you'd pay a team of Python programmers.