In my last entry on this subject, I discussed some of the impediments to the smooth extraction of transactional data in the cloud for the purposes of analytical processing. Today, I'm discussing why automation of data extraction is the way to go.
Imagine a 3-layer stack made up of a Source, a Target, and the Connection between the two. In this example, we have a Source (the transactional data) that is less accessible than a standard DBMS. The accessibility issue has two causes:
- the data is not in-house
- the data is exposed as an abstraction (rather than as individual tables)
So instead of joining the SQL Server tables that make up your General Ledger system to find specific transactions, you now have to log into a cloud application through a browser and look for Assets. Because the underlying tables have been hidden from view, you can no longer interact with them directly. When you need to extract the Assets data from the source, you have to find or write a query that creates the Assets data extract.
Creating the Target data model requires a basic understanding of denormalization and/or dimensional modeling. This is necessary because transactional data isn't reporting-friendly and doesn't lend itself to historical analysis. By aggregating data and loosening some of the normalization rules, you allow business intelligence tools to slice and dice the data more efficiently. By the way, this is nothing new; it has been the basis for most successful BI implementations. A BI dashboard is nothing more than a pretty front end to this aggregated data. More on this later.
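To make the idea of aggregating and denormalizing a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample rows, the account names, and the `build_monthly_summary` helper are hypothetical and do not come from any particular General Ledger system or BI tool.

```python
from collections import defaultdict

# Hypothetical normalized transactional rows:
# (transaction_id, account_id, posting_date, amount)
transactions = [
    (1, "A100", "2023-01-15", 250.0),
    (2, "A100", "2023-01-20", 100.0),
    (3, "B200", "2023-01-22", 75.5),
    (4, "A100", "2023-02-03", 40.0),
]

# Hypothetical lookup table that a normalized schema would keep separately.
accounts = {"A100": "Fixed Assets", "B200": "Office Supplies"}


def build_monthly_summary(transactions, accounts):
    """Aggregate transactions by month and account name.

    The account name is copied into every summary row (denormalized),
    so a reporting tool can read it without joining back to `accounts`.
    """
    totals = defaultdict(float)
    for _txn_id, account_id, posting_date, amount in transactions:
        month = posting_date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(month, accounts[account_id])] += amount
    return [
        {"month": month, "account_name": name, "total_amount": total}
        for (month, name), total in sorted(totals.items())
    ]


summary = build_monthly_summary(transactions, accounts)
for row in summary:
    print(row)
```

The summary rows trade some redundancy (the repeated account name) for the convenience of a single flat structure, which is the kind of trade-off denormalization makes on behalf of reporting.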
Now comes the hard part: connecting the extraction queries we mentioned above to the denormalized data model, particularly when these queries live in a cloud. Clearly, the most expedient way is to have a person play the part of the connector. Believe it or not, plenty of shops out there still create their reports this way. As I said in my previous entry, although this is one way of doing it, the more desirable method is to automate the process. Automation is good because it's repeatable and dependable, it can be scheduled or triggered, and it doesn't call in sick or go on vacation! However, if you buy into the idea that automation is the way to go, then you have to find vendors who'll make it possible and easy to accomplish.
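As a rough illustration of what an automated connector does, the sketch below wires a Source extract to a Target table in Python. It rests on several assumptions: `extract_assets` is a hypothetical stand-in that returns canned rows instead of calling a real cloud service, the `asset_summary` table layout is invented for the example, and an in-memory SQLite database plays the part of the Target.

```python
import sqlite3


def extract_assets():
    """Hypothetical stand-in for running a saved Assets extract query
    against the cloud Source. A real connector would call the vendor's
    export or reporting interface here."""
    return [
        {"asset_id": "AS-1", "category": "Vehicles", "cost": 30000.0},
        {"asset_id": "AS-2", "category": "Vehicles", "cost": 22000.0},
        {"asset_id": "AS-3", "category": "Equipment", "cost": 5000.0},
    ]


def load_assets(conn, rows):
    """Aggregate the extracted rows by category and write them to the Target."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS asset_summary ("
        "category TEXT PRIMARY KEY, asset_count INTEGER, total_cost REAL)"
    )
    totals = {}
    for row in rows:
        count, cost = totals.get(row["category"], (0, 0.0))
        totals[row["category"]] = (count + 1, cost + row["cost"])
    # INSERT OR REPLACE keeps the load repeatable: rerunning it
    # overwrites each category row instead of duplicating it.
    conn.executemany(
        "INSERT OR REPLACE INTO asset_summary VALUES (?, ?, ?)",
        [(category, n, total) for category, (n, total) in totals.items()],
    )
    conn.commit()


def run_pipeline(conn, extract=extract_assets):
    """One unattended run: extract from the Source, load into the Target."""
    load_assets(conn, extract())


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
run_pipeline(conn)  # a second run leaves the Target in the same state
result = conn.execute(
    "SELECT category, asset_count, total_cost "
    "FROM asset_summary ORDER BY category"
).fetchall()
print(result)
```

Because `run_pipeline` takes no manual input, it could be handed to whatever scheduler or trigger mechanism you already use, which is what makes the process repeatable in practice.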
In my next entry, I will discuss both the 'need to have' and the 'nice to have' attributes that you must consider when you start your BI vendor selection process.