If you’re ready to deploy a business intelligence reporting and analytics solution, then data quality is probably on your mind. Last week, we demystified master data management (MDM) and how it combines technology with consensus-building to improve and maintain data quality. Today let’s review another technology option for data quality initiatives: ETL.
Extract, transform, and load (ETL) tools are widely used to improve data quality. ETL tools first pull data from typically heterogeneous sources into a single repository (the extract phase), then transform the data by cleansing and organizing it to optimize reporting and analysis, and finally load the cleansed data into a data warehouse or other target system, where end users can rely on vetted information. When just starting out, many organizations simply use ETL to cleanse their data – for example, standardizing dates or adding default values to empty fields. More advanced data cleansing can mean “deduping” duplicate customer records or parsing fields like “Full Name” into separate “First Name” and “Last Name” fields.
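To make the transform phase concrete, here is a minimal sketch of those three cleansing steps in Python. The record fields, default values, and dedupe rule are hypothetical, illustrative stand-ins – real ETL tools apply equivalent logic through their own configuration, not hand-written code.

```python
DEFAULT_COUNTRY = "US"  # assumed default value for empty fields

def split_full_name(record):
    """Parse a 'Full Name' field into 'First Name' and 'Last Name'."""
    first, _, last = record.pop("Full Name", "").partition(" ")
    record["First Name"] = first
    record["Last Name"] = last
    return record

def fill_defaults(record):
    """Add a default value to an empty field."""
    if not record.get("Country"):
        record["Country"] = DEFAULT_COUNTRY
    return record

def dedupe(records):
    """Drop duplicate customers, keyed on a normalized email address."""
    seen = set()
    unique = []
    for record in records:
        key = record.get("Email", "").strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical source data with an empty field and a duplicate customer.
raw = [
    {"Full Name": "Ada Lovelace", "Email": "ada@example.com", "Country": ""},
    {"Full Name": "Ada Lovelace", "Email": "ADA@example.com", "Country": "UK"},
]

cleansed = dedupe([fill_defaults(split_full_name(r)) for r in raw])
```

After these steps, `cleansed` holds one record with the name split into two fields and the empty country filled in – the kind of vetted output an ETL tool would load into the warehouse.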
Enterprise ETL tools are expensive – Oracle’s Data Integrator, for example, costs $23,000 per processor. There are open source and low-cost ETL tools out there, but they’re generally not suitable for enterprise-scale data quality initiatives. ETLTools.net has compiled a list of them along with their limitations.
The advantages of ETL tools vary from vendor to vendor. SAP’s ETL tools naturally offer tighter integration with SAP products, and the same goes for Oracle. Some tools are faster than others to implement and learn, and some offer better support and documentation. Beyond advanced data cleansing support, it’s important to carefully consider the following items before purchasing any ETL tool: