If you’re ready to deploy a business intelligence reporting and analytics solution, then data quality is probably on your mind. Last week, we demystified master data management (MDM) and how it combines technology with consensus-building to improve and maintain data quality. Today let’s review another technology option for data quality initiatives: ETL.
Extract, transform, and load (ETL) tools are widely used to improve data quality. ETL tools first combine data from normally heterogeneous sources to a single repository (the extract phase), transform the data by cleansing and organizing it to optimize reporting and analysis, and then load the cleansed data into a data warehouse or other system, where end users can then rely on vetted information. When just starting out, many organizations simply use ETL to cleanse their data – like resetting dates or adding default values to empty fields. More advanced data cleansing can mean “deduping” duplicate customer records or parsing fields like “Full Name” into separate fields, “First Name” and “Last Name.”
Enterprise ETL tools are expensive, with Oracle’s Data Integrator costing $23,000 per processor, for example. There are open source and low-cost ETL tools out there, but they’re not generally suitable for enterprise-scale data quality initiatives. ETLTools.net has compiled a list of them as well as their limitations.
The advantages of ETL tools vary. Obviously SAP ETL tools offer tighter integration with SAP products; the same with Oracle. Some tools are faster than others to implement and learn, and some offer better support and documentation than others. Beyond advanced data cleansing support, it’s important to carefully consider the following list of items before purchasing any ETL tool:
In my last post on the subject of Cloud Computing, I mentioned two ways to slice and dice data in the cloud — depend on query tools to extract data to a local database, or use Data Warehouses to support the transactional system in the cloud. Today, I’ll delve deeper into these two choices for culling meaningful trends and KPIs from data in the cloud.
Whether or not a transactional system is moved to the cloud, the data collected is still necessary for analytical processing. A transaction processing system is optimized to capture the specific transactions as effectively as possible. On the other hand, analytical processing data has to be optimized to allow detection of trends in Key Performance Indicators. Business Intelligence (BI) systems are usually built on the latter. When the transaction system is in-house, an Extract-Transform and Load (ETL) system can be written to automate the transformation of data from highly normalized transactional to denormalized analytical form.