This article provides information on two tools that help in testing and deploying Looker dashboards:
- The testing framework (Data Tests) recently introduced in Looker.
- An open-source continuous integration tool (Spectacles) that takes advantage of the testing framework and adds further layers of testing, such as SQL validation and linting.
Looker introduced data quality testing with the 6.20 release. Prior to this release, data errors on dashboards could be discovered only after the developed LookML code was promoted to a test or production environment. With the release of the data test feature, LookML developers can now write test cases for data quality before pushing to a test branch or to production, which greatly improves the quality of dashboards. The LookML Validator and Content Validator are also available for developers to check manually for errors during development.
Open-source continuous integration (CI) tools can run SQL validation checks as well as data tests before code is pushed to production. Automatic invocation of the Content Validator and lint checks is expected to be available in the near future.
Data Quality
One of the common complaints heard after a Looker dashboard is deployed to business users is that the data is either incorrect or contains null values. Ensuring data quality is not strictly the responsibility of LookML developers, yet the blame for faulty data tends to fall on them, leaving them with the burden of data quality even though they lack the means to fix it.
However, data quality testing (similar to the unit testing frameworks of compiled languages) allows LookML developers to write test code that checks the quality of the data and reports issues to the application teams that own it. Additionally, LookML deployments can be held back until the data quality tests pass.
test is a Looker model parameter with two sub-parameters, explore_source and assert. It is a structural parameter that is defined in the LookML model file and can test any or all of the Explores defined in that file.
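As a minimal sketch of the structure (the test, Explore, and field names below are placeholders rather than names from the project discussed here):

test: my_field_is_not_null {
  explore_source: my_explore {
    column: my_field {}
  }
  assert: my_field_has_no_nulls {
    # the expression must evaluate to true for every row returned by explore_source
    expression: NOT is_null(${my_explore.my_field}) ;;
  }
}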
The data tests that can be performed in LookML fall into three categories: ensuring that a primary key is unique, verifying a known value, and confirming that there are no null values. The tests in the sections below run against Explores defined in the following model file:
connection: "looker_skp_final"

# include all the views
include: "/views/**/*.view"

datagroup: looker_skp_default_datagroup {
  # sql_trigger: SELECT MAX(id) FROM etl_log ;;
  max_cache_age: "1 hour"
}

week_start_day: sunday
persist_with: looker_skp_default_datagroup

explore: skp_activity_reporting_view {
  label: "Activity Reporting"
}

explore: rco_organization {
  label: "RCO Organization"
}

explore: rco_activity {
  label: "RCO_Activity"
}
Ensuring That a Primary Key Is Unique
This test ensures that the developer is alerted if duplicate district_id values appear in the table or view. The test is needed because BigQuery does not enforce uniqueness constraints, so developers have to make sure the data contains only unique district_id values.
test: district_id_is_unique {
  explore_source: view_rco_organization {
    column: district_id {}
    # assumes the Explore exposes a count measure; sorting by count and keeping
    # one row returns the most frequently occurring district_id
    column: count {}
    sorts: [view_rco_organization.count: desc]
    limit: 1
  }
  assert: district_id_is_unique {
    expression: ${view_rco_organization.count} = 1 ;;
  }
}
Verifying a Known Value
There should be only 8 activity class codes, with the values 'Put Away', 'Inbound', 'Destruction', 'Perm Withdrawal', 'Retrieval', 'Internal Project', 'Image Request', and 'XOD Request'. This test ensures that the developer is alerted if unexpected or null values creep into the data, since either would change the number of distinct codes.
test: activity_class_code_is_accurate {
  explore_source: view_rco_activity {
    # assumes the view defines a count_distinct measure over activity_class_code,
    # shown here as activity_class_code_count
    column: activity_class_code_count { field: view_rco_activity.activity_class_code_count }
  }
  assert: activity_class_code_is_expected_value {
    expression: ${view_rco_activity.activity_class_code_count} = 8 ;;
  }
}
Confirming There Are No Null Values
This next data test checks that there are no null values in the data. The explore_source uses a sort so that any nulls are returned at the top of the query, and the limit keeps only that top row. Employee hours must be tracked against one of the 47 pay codes, so nulls are not acceptable.
test: paycode_name_is_not_null {
  explore_source: view_rco_employee_activity {
    column: paycode_name {}
    sorts: [view_rco_employee_activity.paycode_name: desc]
    limit: 1
  }
  assert: paycode_name_is_not_null {
    expression: NOT is_null(${view_rco_employee_activity.paycode_name}) ;;
  }
}
Spectacles Automated Testing
Spectacles is an open-source, command-line continuous integration (CI) tool for Looker and LookML. One of its goals is to automate testing by running four types of validation (validators) before code is deployed to production. The tool can be configured to run the tests and push to production only after every test case reports success. Two of the four validations are discussed below; content validation and lint validation are still being developed by the Spectacles team.
SQL Validation
If the underlying column names are changed in the database tables, this validation checks whether the SQL in all of the LookML files has been updated to reflect the change. If the tool finds mismatches, it alerts the developer to these errors.
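As a hypothetical illustration (the column and dimension names are made up for this sketch), a dimension whose sql parameter still references a renamed warehouse column is exactly the kind of mismatch the SQL validator surfaces:

# Hypothetical: the warehouse column was renamed to activity_class_cd, but the
# dimension still points at the old name, so the query Spectacles generates for
# this Explore fails against the database and the error is reported.
dimension: activity_class_code {
  type: string
  sql: ${TABLE}.activity_class_code ;;
}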
Data Validation
After all of the data tests in the model file run and return exit codes of 0 (pass), the code can be pushed to production.
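As a rough sketch of how these validators might be wired into a CI job, the commands below run the SQL validator and the LookML data tests against a development branch. The instance URL, project, branch, and credentials are placeholders, and the flag names shown should be confirmed against the current Spectacles documentation:

# Run SQL validation for the project's Explores on a development branch
spectacles sql \
  --base-url https://mycompany.looker.com \
  --client-id "$LOOKER_CLIENT_ID" \
  --client-secret "$LOOKER_CLIENT_SECRET" \
  --project my_lookml_project \
  --branch my_feature_branch

# Run the LookML data tests defined in the model files
spectacles assert \
  --base-url https://mycompany.looker.com \
  --client-id "$LOOKER_CLIENT_ID" \
  --client-secret "$LOOKER_CLIENT_SECRET" \
  --project my_lookml_project \
  --branch my_feature_branch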
Conclusion
As the number of Looks and Dashboards grows over time, it becomes harder for LookML developers to manage changes and to test them effectively. Looker has developed a testing framework to improve the data quality of Looks and Dashboards, and Spectacles, an open-source project, has built a continuous integration tool that leverages this testing framework along with other validations to raise quality even higher.