Transformation tools update and reformat data from one state into another. They can prepare data for a wide variety of use cases, such as cleaning it for analysis, aggregating it for consumption, or restructuring it to load into operational tools like machine learning platforms or marketing automation software.
Most commonly, transformation tools are used to clean and model raw data (typically loaded into a warehouse by an ELT tool) for analysis and consumption. This process makes raw data easier to use in three ways:
- By cleaning data. Raw data often has errors or inconsistencies, like inconsistent phone number formats or field values that mean the same thing but are coded differently (e.g., “monthly” and “Monthly” and “month-to-month” and “m2m”).
- By modeling data. When raw data is created, it's designed not for easy analysis but to comprehensively track whatever event it's recording. For example, if you wanted to build a personal finance dashboard, paychecks and individual purchases are your raw data. To derive your monthly savings rate, you'd have to apply very particular computations to this data, computations that are often referred to as "business logic." Transformation tools let companies both encode this logic and regularly transform and update data so that it reflects that logic. In other words, they allow you to generate a dataset of monthly savings rates from datasets of paychecks and purchases.
- By reducing data to a smaller size. Raw data can be extremely large. Web companies that log every page view and button click can create terabytes of data every day, which can be unwieldy to work with. Transformation tools also reduce datasets to smaller, more manageable sizes. For example, a dataset of every page view could include billions of records, while a dataset of daily visitor counts might only include a few thousand.
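The three kinds of transformation above can be sketched in plain Python. This is a minimal illustration, not how any particular transformation tool works: the alias table, the `(date, amount)` and `(visitor_id, date)` input shapes, and the function names `clean_plan`, `monthly_savings_rate`, and `daily_visitor_counts` are hypothetical. It also assumes "savings rate" means (income minus spending) divided by income for each month.

```python
from collections import Counter
from datetime import date

# Cleaning: hypothetical table mapping raw plan labels to one canonical value.
PLAN_ALIASES = {"monthly": "monthly", "month-to-month": "monthly", "m2m": "monthly"}

def clean_plan(raw: str) -> str:
    """Normalize case and whitespace, then map known aliases to a canonical code."""
    key = raw.strip().lower()
    return PLAN_ALIASES.get(key, key)

def monthly_savings_rate(paychecks, purchases):
    """Modeling: encode the (assumed) business logic for a monthly savings rate.

    paychecks and purchases are lists of (date, amount) tuples.
    Rate per month = (income - spending) / income; months with no income are skipped.
    """
    income, spending = Counter(), Counter()
    for day, amount in paychecks:
        income[(day.year, day.month)] += amount
    for day, amount in purchases:
        spending[(day.year, day.month)] += amount
    return {
        month: (income[month] - spending[month]) / income[month]
        for month in sorted(income)
        if income[month] > 0
    }

def daily_visitor_counts(page_views):
    """Reducing: collapse raw (visitor_id, date) page views to distinct visitors per day."""
    visitors_by_day = {}
    for visitor_id, day in page_views:
        visitors_by_day.setdefault(day, set()).add(visitor_id)
    return {day: len(ids) for day, ids in sorted(visitors_by_day.items())}

print([clean_plan(v) for v in ["monthly", "Monthly", "month-to-month", "m2m"]])
# ['monthly', 'monthly', 'monthly', 'monthly']
paychecks = [(date(2024, 1, 15), 3000.0), (date(2024, 1, 31), 3000.0)]
purchases = [(date(2024, 1, 3), 1200.0), (date(2024, 1, 20), 3600.0)]
print(monthly_savings_rate(paychecks, purchases))
# {(2024, 1): 0.2}
```

In practice this kind of logic is often written in SQL and run inside the warehouse; Python is used here only to keep the sketch self-contained.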
Like ETL tools, transformation tools aren't designed for one-off changes; they're meant to update data continually, automatically, and durably.
As new raw data is added to a data warehouse, it is transformed automatically, typically at regular intervals such as every hour or every day. If a transformation fails, the transformation tool should make sure that any missed changes are applied on a later run.
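The catch-up behavior described above can be sketched with a watermark: the job remembers the last interval it transformed successfully and, on each scheduled run, processes every complete interval since then. This is a minimal illustration under stated assumptions, not how any specific tool implements it: the `DailyTransformJob` class, its daily granularity, and the `transform` callback are hypothetical, and a real scheduler would persist the watermark rather than keep it in memory.

```python
from datetime import date, timedelta

class DailyTransformJob:
    """Hypothetical incremental job that transforms one day of raw data at a time.

    The watermark is the last day transformed successfully. It only advances after
    a day succeeds, so a failed day (and every day after it) is retried next run.
    """

    def __init__(self, transform, start: date):
        self.transform = transform  # callable(day) -> None; may raise on failure
        self.watermark = start - timedelta(days=1)

    def run(self, today: date) -> list:
        """Transform every complete day after the watermark and before `today`."""
        done = []
        day = self.watermark + timedelta(days=1)
        while day < today:
            try:
                self.transform(day)
            except Exception:
                break  # leave the watermark in place so this day is retried later
            self.watermark = day
            done.append(day)
            day += timedelta(days=1)
        return done

# Simulate a transform that fails the first time it processes Jan 2.
attempts = {}
def flaky_transform(day):
    attempts[day] = attempts.get(day, 0) + 1
    if day == date(2024, 1, 2) and attempts[day] == 1:
        raise RuntimeError("warehouse unavailable")

job = DailyTransformJob(flaky_transform, start=date(2024, 1, 1))
print(job.run(today=date(2024, 1, 3)))  # Jan 1 succeeds, Jan 2 fails
# [datetime.date(2024, 1, 1)]
print(job.run(today=date(2024, 1, 4)))  # retries Jan 2, then does Jan 3
# [datetime.date(2024, 1, 2), datetime.date(2024, 1, 3)]
```

Stopping at the first failure, rather than skipping ahead, keeps days in order, which matters when one day's output depends on the previous day's.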
Related terms: Data warehouse, ETL tools