tl;dr If you missed out on PipeRider’s initial release, now is a great time to take it for a spin. Data reliability just got even more reliable with better dbt integration, data assertion recommendations, and reporting enhancements. PipeRider is open-source and easy to get started with.
PipeRider is your open-source data reliability toolkit that connects to your existing data pipelines and provides data profiling, data quality assertions, convenient HTML reports, and integration with popular data warehouses. Read more about the story behind PipeRider’s creation.
Recent updates to PipeRider include the following features:
- Recommended data assertion generation — PipeRider’s intelligently generated assertions give you a head-start on data reliability.
- Improved dbt integration — PipeRider will auto-detect your dbt project and data source settings. You can also run dbt tests with PipeRider and dbt test results are included in your report.
- Improved reporting — Automatically generated reports provide data profiling information and assertion results.
Let’s take a deeper dive into these new features.
The first time you run PipeRider and your data is profiled, PipeRider will offer to generate recommended assertions based on the profile of your data source.
PipeRider analyzes the contents of your data source and makes intelligent suggestions based on the content, such as:
- Asserting which columns require data and should not be null
- Asserting the schema type for columns
- Asserting the acceptable range of minimum and maximum values for numerical columns
Using the recommended assertions gives you a head start, because you don’t have to write every assertion by hand. Instead, you can tweak and add to the recommendations if necessary.
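To make the idea concrete, here is a minimal sketch of how assertions like the three above could be derived from a column profile. It is an illustration only, not PipeRider’s actual implementation: the functions `profile_column` and `recommend_assertions`, and the assertion names `not_null`, `schema_type`, and `value_range`, are hypothetical names invented for this example.

```python
from __future__ import annotations

from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Build a tiny profile for one column (hypothetical helper, not PipeRider's profiler)."""
    non_null = [v for v in values if v is not None]
    type_names = {type(v).__name__ for v in non_null}
    profile: dict[str, Any] = {
        "total": len(values),
        "nulls": len(values) - len(non_null),
        # Only report a type when every non-null value agrees on it.
        "type": next(iter(type_names)) if len(type_names) == 1 else None,
    }
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if is_numeric:
        profile["min"] = min(non_null)
        profile["max"] = max(non_null)
    return profile


def recommend_assertions(profile: dict[str, Any]) -> list[dict[str, Any]]:
    """Suggest assertions from a profile; the assertion names are made up for this sketch."""
    recommendations: list[dict[str, Any]] = []
    if profile["total"] > 0 and profile["nulls"] == 0:
        # No nulls were observed, so suggest that the column stays non-null.
        recommendations.append({"assertion": "not_null"})
    if profile["type"] is not None:
        recommendations.append({"assertion": "schema_type", "expected": profile["type"]})
    if "min" in profile:
        # Use the observed range as a starting point for the acceptable range.
        recommendations.append(
            {"assertion": "value_range", "expected": [profile["min"], profile["max"]]}
        )
    return recommendations


if __name__ == "__main__":
    for rec in recommend_assertions(profile_column([3, 7, 5])):
        print(rec)
```

The observed range is only a starting point. As with PipeRider’s own recommendations, you would normally loosen or tighten it to match what you know about the data.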
If you’d rather not use the recommendations, PipeRider can also generate empty assertion template files, complete with column names, ready for you to customize with built-in assertions, or your own custom assertions.
You can now initialize PipeRider inside your dbt project, which brings PipeRider’s profiling, assertions, and reporting features to your dbt models.
PipeRider will automatically detect your dbt project settings and treat your dbt models as if they were part of your PipeRider project. This includes:
- Profiling dbt models.
- Generating recommended assertions for dbt models.
- Testing dbt model data profiles with PipeRider assertions.
- Including dbt test results in PipeRider reports.
You can also build your dbt models using PipeRider, which means it’s possible to condense your dbt and PipeRider workflow to one command:
piperider run --dbt-build --dbt-test
This one command will:
- Build your dbt models.
- Profile your data with PipeRider’s profiler.
- Run your dbt tests.
- Test your data profile with PipeRider assertions.
- Generate an HTML report.
Full details and other available options can be found in the command reference.
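If you want to trigger that single command from a scheduled job or script, a thin wrapper is enough. The sketch below assumes `piperider` is installed and on your PATH and that the directory you pass in is a dbt project where PipeRider has already been initialized. The helper names `build_piperider_command` and `run_pipeline` are hypothetical; only the command and its two flags come from the text above.

```python
from __future__ import annotations

import subprocess


def build_piperider_command(dbt_build: bool = True, dbt_test: bool = True) -> list[str]:
    """Assemble the argument list for the command shown above (hypothetical helper)."""
    command = ["piperider", "run"]
    if dbt_build:
        command.append("--dbt-build")
    if dbt_test:
        command.append("--dbt-test")
    return command


def run_pipeline(project_dir: str) -> int:
    """Run the command inside project_dir and return its exit code.

    Assumes piperider is on PATH; a missing executable raises FileNotFoundError.
    """
    result = subprocess.run(build_piperider_command(), cwd=project_dir, check=False)
    return result.returncode


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_piperider_command()))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems and keeps the flags easy to toggle.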
Reports are now automatically generated each time you run PipeRider and include the following information:
- All tables included
- Per-table profiling data
- Per-table test results
- Per-table dbt test results
The report overview page shows a quick view of which tables or dbt models have been profiled and tested, along with stats about passed and failed tests.
Click through to each table to view the data profile and detailed test results for that table.
dbt test results
If PipeRider is run on a dbt project, the report will include a tab with details of dbt-specific test results.
Two reports can also be compared, showing the differences between the data profiles of the two runs, together with the expected and actual results for each test.
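Conceptually, comparing two runs means lining up the same metrics from each profile and keeping only what changed. The following is a minimal sketch of that idea using plain dictionaries. It does not read PipeRider’s report files; the function `compare_profiles`, the profile layout, and the sample values are all assumptions made up for illustration.

```python
from __future__ import annotations

from typing import Any

Profile = dict[str, dict[str, Any]]  # column name -> metric name -> value


def compare_profiles(base: Profile, target: Profile) -> dict[str, dict[str, tuple]]:
    """Return, per column, the metrics whose value differs as (before, after) pairs."""
    diff: dict[str, dict[str, tuple]] = {}
    for column in sorted(set(base) | set(target)):
        before = base.get(column, {})
        after = target.get(column, {})
        changed = {
            metric: (before.get(metric), after.get(metric))
            for metric in sorted(set(before) | set(after))
            if before.get(metric) != after.get(metric)
        }
        if changed:
            diff[column] = changed
    return diff


if __name__ == "__main__":
    # Invented sample profiles for two runs of the same table.
    first_run = {"price": {"min": 1, "max": 10, "nulls": 0}, "name": {"nulls": 0}}
    second_run = {"price": {"min": 1, "max": 25, "nulls": 2}, "name": {"nulls": 0}}
    print(compare_profiles(first_run, second_run))
```

A column or metric that appears in only one run shows up with `None` on the other side, which makes added or dropped columns easy to spot.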
No excuse for unreliable data
PipeRider is so easy to use there really is no excuse for not picking up on that data drift. All you need to do is:
- Install PipeRider (a quick install with pip)
- Point it to your data warehouse — if you’re using dbt your settings will be auto-detected
- Run one command to generate a data profile with suggested assertions and output a report
Check out the quick-start guide, which includes a sample dataset, for how to use the main features of PipeRider.
This blog has been republished by AIIA.