Many people think that if they have tested the database they’ve tested the data warehouse. But that’s a mistake. A BIG mistake. OLTP (on-line transaction processing) testing is database testing. But OLAP (on-line analytical processing) system testing applies to the data warehouse, and more specifically, ETL testing. The difference is this:
- Database testing compares data from source to target tables.
- Data warehouse (or ETL) testing traces the accuracy of information throughout the data warehouse.
Data Warehouse testing assures that information is not just loaded correctly, but that it is appropriately aggregated, catalogued and verified so that it is useful and accurate for analysis and decision-making. The ETL process is particularly vulnerable in this regard.
Looking at Diagram 1 (above), you can see that data has to pass through a variety of processes from source to BI (reporting). The transfer points where data can be lost or transferred incorrectly are in the ETL process:
- From source to staging area
- From staging area to the data warehouse
- From the data warehouse to data marts
It is vital that we test at all of these points. There are seven kinds of tests that need to be performed to assure accuracy, scalability, system integrity, usefulness and long-term quality of the system. These are:
- Data completeness. Verifying that all the expected data is loaded into the target from the source.
- Data transformation. Verifying that data is transformed correctly according to business requirements and Source To Target mapping documents.
- Data quality. Verifying that invalid data has been corrected or eliminated in accordance with requirements.
- Performance and scalability. Verifying that the system is scalable and can sustain further growth, handling new data and subsequent queries up to acceptable performance limits.
- Integration testing. Verifying that the application fits into the overall architecture without violating the integrity of the system.
- User-acceptance testing. Verifying that implementation corresponds to user needs and expectations.
- Regression testing. This testing will be implemented for new releases or application changes.
Sometimes, to save on costs, clients try to eliminate some of these tests. The problem with this is twofold. 1) Bugs or other issues may crop up in the system as a result of not thoroughly testing all of it. 2) You, and your users, will trust that the system is sound, but without complete testing some parts may not fit together properly, resulting in misinformation.
Reasons to test thoroughly
Let’s consider some examples of how incomplete testing can cause problems, beginning with points a, b and c above. Let’s say a client/data warehouse owner is a retailer with three stores. Every store carries the same products and each orders refills as needed. Now imagine that the weather has turned hot and all the stores order more ice cream, but shipments all go to store #2. The total amount of ice cream received is correct, but now customers are disappointed to find stores #1 and 3 are out of stock on ice cream and they go to a competitor. Once there, they find they can get other products as well and begin shopping there instead. Now our client has lost more than just a few dollars on an ice cream sale. They have lost a customer – or maybe even dozens of customers.
Moving on to performance and scalability – let’s look at a successful client who opens new branches in other cities but has not tested the scalability of their data warehouse. As queries to the BI system increase, the system bogs down and response time slows. This causes customers to contact a competitor who is able to handle the increased business.
In another instance, we have a client who has expanded into internet sales with a great new website. But while the new system is functional and ready to go, it hasn’t been tested with the overall architecture. Once up, our client finds that the integration has caused bugs in the original system, creating problems that back up orders from the original business.
Sometimes clients want to drop user acceptance testing because they assume that the requirements were clear – so what could go wrong? The answer is: a lot! There are dozens, if not hundreds, of steps between requirements and implementation. The opportunity for misinterpretation exists in every one of those steps. If you wait until the final release to test the new system with users you may find that what they said, and what you thought, were entirely different. Users need to weigh in early and often on the end product, so you know that it is meeting their needs.
Finally, while it seems obvious it is also worth noting that testing should go hand-in-hand with any application changes or new releases. This is regression testing. In fact, automated regression testing is typically performed weekly or even daily, depending upon users’ or the system’s requirements. A well-oiled machine only stays that way when it is continuously re-oiled. But that is a topic for a future blog …