tl;dr If you missed PipeRider’s initial release, now is a great time to take it for a spin. Data reliability just got more achievable with better dbt integration, data assertion recommendations, and reporting enhancements. PipeRider is open source and easy to get started with.

PipeRider Recap

PipeRider is your open-source data reliability toolkit that connects to your existing data pipelines and provides data profiling, data quality assertions, convenient HTML reports, and integration with popular data warehouses. Read more about the story behind PipeRider’s creation.

What’s New?

Recent updates to PipeRider include the following features:

Recommended data assertion generation: PipeRider’s automatically generated assertions give you a head start on data reliability.
Improved dbt integration: PipeRider auto-detects your dbt project and data source settings. You can also run dbt tests with PipeRider, and the dbt test results are included in your report.
Improved reporting: Automatically generated reports provide data profiling information and assertion results.

Let’s take a deeper dive into these new features.

Recommended assertions

The first time you run PipeRider and your data is profiled, PipeRider will offer to generate recommended assertions based on the profile of your data source.

PipeRider analyzes the contents of your data source and makes intelligent suggestions based on the content, such as:

Asserting which columns require data and should not be null
Asserting the schema type for columns
Asserting the acceptable range of minimum and maximum values for numerical columns
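
To make the three kinds of suggestion above concrete, here is a minimal Python sketch of how recommendations could be derived from a column profile. It is an illustration of the idea only, not PipeRider's implementation: the profile layout, the `recommend_assertions` function, and the assertion labels (`schema_type`, `not_null`, `value_range`) are all hypothetical and are not PipeRider's built-in assertion names.

```python
from typing import Any, Dict, List


def recommend_assertions(profile: Dict[str, Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Suggest assertions per column from a (hypothetical) profile dictionary."""
    recommendations: Dict[str, List[Dict[str, Any]]] = {}
    for column, stats in profile.items():
        suggested: List[Dict[str, Any]] = []

        # Pin the schema type that was observed during profiling.
        suggested.append({"assert": "schema_type", "type": stats["type"]})

        # A column with no nulls in the profiled data probably requires a value.
        if stats.get("nulls", 0) == 0:
            suggested.append({"assert": "not_null"})

        # Numeric columns get a range based on the observed minimum and maximum.
        if stats["type"] in ("integer", "numeric") and "min" in stats and "max" in stats:
            suggested.append({"assert": "value_range", "range": [stats["min"], stats["max"]]})

        recommendations[column] = suggested
    return recommendations


# Hypothetical profile for a two-column table.
example_profile = {
    "price": {"type": "numeric", "nulls": 0, "min": 0.5, "max": 99.0},
    "note": {"type": "string", "nulls": 12},
}
recommended = recommend_assertions(example_profile)
```

In this sketch the `price` column receives all three suggestions, while `note` only receives a type assertion because it contains nulls and is not numeric. Ranges taken straight from observed data are a starting point, which is why tweaking the recommendations afterwards matters.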

Generate recommended assertions with PipeRider on first run

By using the recommended assertions you give yourself a head start by not having to manually write all of the assertions. Instead, you can tweak and add to the recommendations if necessary.

Example of PipeRider’s recommended assertions for a data source

If you’d rather not use the recommendations, PipeRider can also generate empty assertion template files, complete with column names, ready for you to customize with built-in assertions, or your own custom assertions.

dbt integration

You can now initialize PipeRider inside your dbt project, which brings PipeRider’s profiling, assertions, and reporting features to your dbt models.

PipeRider auto-detects your dbt project settings

PipeRider will automatically detect your dbt project settings and treat your dbt models as if they were part of your PipeRider project. This includes:

Profiling dbt models.
Generating recommended assertions for dbt models.
Testing dbt model data-profiles with PipeRider assertions.
Including dbt test results in PipeRider reports.

PipeRider and dbt tests can be run with one command

You can also build your dbt models using PipeRider, which means it’s possible to condense your dbt and PipeRider workflow to one command: piperider run --dbt-build --dbt-test. This one command will:

Build your dbt models.
Profile your data with PipeRider’s profiler.
Run your dbt tests.
Test your data profile with PipeRider assertions.
Generate an HTML report.
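
If you drive your pipeline from a script, the single command above can be wrapped in a few lines of Python. The sketch below only assembles and launches the command quoted in this post. The `build_run_command` and `run_piperider` helpers are hypothetical names, and the sketch assumes the `piperider` executable is installed and available on your PATH.

```python
import subprocess
from typing import List


def build_run_command(dbt_build: bool = True, dbt_test: bool = True) -> List[str]:
    """Assemble the `piperider run` command with the optional dbt flags."""
    command = ["piperider", "run"]
    if dbt_build:
        command.append("--dbt-build")
    if dbt_test:
        command.append("--dbt-test")
    return command


def run_piperider(dbt_build: bool = True, dbt_test: bool = True) -> int:
    """Run PipeRider in the current directory and return its exit code.

    Assumption: this is called from inside a project where PipeRider
    has already been initialized.
    """
    completed = subprocess.run(build_run_command(dbt_build, dbt_test), check=False)
    return completed.returncode
```

Calling `run_piperider()` from your dbt project directory would launch the same `piperider run --dbt-build --dbt-test` command shown above, and the returned exit code can be passed on to whatever script or scheduler invoked it.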

Full details and other available options can be found in the command reference.

Improved reporting

Reports are now automatically generated each time you run PipeRider and include the following information:

All tables included
Per-table profiling data
Per-table test results
Per-table dbt test results

Report overview

The report overview page shows a quick view of which tables or dbt models have been profiled and tested, along with stats about passed and failed tests.

Per-table reports

Click through to each table to view the data profile and detailed test results for that table.

PipeRider data table report features data profile and test results

dbt test results

If PipeRider is run on a dbt project, the report will include a tab with details of dbt-specific test results.

Compare reports

Two reports can also be compared, showing the differences between the data profiles of the two runs together with the expected and actual results for each test.
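
As a rough illustration of what comparing two runs involves, the Python sketch below diffs two profile dictionaries and reports only the metrics that changed. The profile layout and the `diff_profiles` function are hypothetical and far simpler than PipeRider's actual report format.

```python
from typing import Any, Dict, Tuple

# Hypothetical layout: column name -> metric name -> value
Profile = Dict[str, Dict[str, Any]]


def diff_profiles(base: Profile, target: Profile) -> Dict[str, Dict[str, Tuple[Any, Any]]]:
    """Return {column: {metric: (base_value, target_value)}} for changed metrics."""
    changes: Dict[str, Dict[str, Tuple[Any, Any]]] = {}
    for column in sorted(set(base) | set(target)):
        base_metrics = base.get(column, {})
        target_metrics = target.get(column, {})
        # Keep only metrics whose value differs between the two runs;
        # a metric missing from one run shows up as None on that side.
        changed = {
            metric: (base_metrics.get(metric), target_metrics.get(metric))
            for metric in sorted(set(base_metrics) | set(target_metrics))
            if base_metrics.get(metric) != target_metrics.get(metric)
        }
        if changed:
            changes[column] = changed
    return changes


# Hypothetical profiles from two runs of the same table.
first_run = {"price": {"nulls": 0, "max": 99.0}, "note": {"nulls": 12}}
second_run = {"price": {"nulls": 3, "max": 99.0}, "note": {"nulls": 12}}
profile_changes = diff_profiles(first_run, second_run)
```

Here the only difference between the two runs is that `price` has gained three nulls, which is exactly the kind of drift a side-by-side comparison is meant to surface.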

Compare two reports from different PipeRider runs

No excuse for unreliable data

PipeRider is so easy to use that there really is no excuse for not picking up on data drift. All you need to do is:

Install PipeRider (a quick install with pip)
Point it to your data warehouse (if you’re using dbt, your settings will be auto-detected)
Run one command to generate a data profile with suggested assertions and output a report

Check out the quick-start guide, which includes a sample dataset, for how to use the main features of PipeRider.

This blog has been republished by AIIA. To view the original article, please click HERE.

Test your data quality in minutes with PipeRider
