Query Audit
Query Audits allow Admins to gain insight into query sets outside the Single Origin platform, and to identify consolidation opportunities in a user’s external system. Query Audits follow a similar flow to Bulk Imports, but they do not create drafts or entities within the Single Origin platform.
Navigate to the Query Audit page by selecting Audits from the homepage, or by selecting the gear icon in the top right corner of the screen and clicking Query Audit in the left-hand nav. From here, administrators can create new query audits and view a history of previously run query audits. Click on a query audit to see more details about that audit.
Prerequisites for a new Query Audit
- Queries must come from a valid source that can be imported into Single Origin
- Queries must include:
- A compiled, executed query
- The executor of the query
- A list of tags to associate with the query. Tags can either be a single item, like an Airflow task ID, or multiple items.
Note
Your personal connector must have permission to access all referenced tables.
Valid Sources
Queries may come from 1 of 3 sources:
- A CSV file
- A Compliant Query Table
- A Query History Table
CSV file upload
Each CSV file must follow the below format, with each column separated by a comma. If a header is present, use the column names below (query, user, and tags).
query | user | tags |
---|---|---|
STRING - a SQL query to include in the audit. Include " at the start and end of the query. | STRING - the email of the "owner" of the query or the email of who "ran" the query | STRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items. Include " at the start and end of the list. |
An example CSV looks like this:
query | user | tags |
---|---|---|
select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND | [email protected] | tpch_sf1, customer_orders_by_day |
Compliant Query Table
A Compliant Query Table is similar in structure to a CSV file, but must include a column representing the time the query was executed:
start_time | query | user | tags |
---|---|---|---|
TIMESTAMP - the time the query was executed | STRING - a SQL query to include in the audit | STRING - the email of the "owner" of the query or the email of who "ran" the query | STRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items |
An example Compliant Query Table looks like this:
start_time | query | user | tags |
---|---|---|---|
2022-06-18 20:39:14.668604 UTC | select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND | [email protected] | tpch_sf1, customer_orders_by_day |
Query History Table
Query History Tables are exported from Data Warehouse providers, and provide a history of query executions in a standardized format. Select a user’s email to filter the queries by user, and select a time range. Choose a table and apply any additional filters. Filters include:
- INTERACTIVE (queries run interactively)
- BATCH (queries run as part of a batch process)
- HIGH (queries run at a high priority level).
Results
Uploaded queries are grouped based on their source table, then by common tables and metrics. The results of the query audit include:
- One row per group, where every query in the group references the same common set of tables
- Common formulas in the group & how frequently each formula appears
View the results for a query audit by selecting the desired run in the history table. From here, users can navigate the results in Single Origin, and export the results CSV to explore locally.
Results CSV
The results CSV is returned in pipe form. To view in an app like Google Sheets, use a custom separator on import.
Column | Description |
---|---|
group | INTEGER - each row should have a unique group number, starting at 0 and increasing by 1. |
queriesInGroup | ARRAY OF INTEGERS - when an import is processed, a row number is appended to the import starting at 1. If the audit finds similarities between queries, then queriesInGroup represents the row numbers of similar queries. For example, if you upload a CSV and the first and third rows are the same, then queriesInGroup = [1, 3]. |
commonDatasets | ARRAY OF STRINGS - For the queries in the group, what are common datasets? A dataset must appear in all queries in the group to be included in commonDatasets. |
commonFormulaCount | ARRAY OF STRINGS - For the queries in the group, what are common dimensions and data entities? And how often does the common expression occur? An expression must appear in at least 2 queries in the group to be included in commonFormulaCount. |
similarity | STRING - EXACT or SIMILAR. Are expressions from commonDatasets and commonFormulaCount exactly the same, or not exactly the same but similar? |
tags | ARRAY OF STRINGS - For the group queries, what is the unique list of tags associated with the queries? Note that if your source is a Query History Table, then we use the insertId column from BigQuery as the tag for the query. This way, you can find the compiled SQL queries in your logs using a WHERE clause (insertId = 'tag'). |
Currently, Single Origin outputs groups of queries that are either similar or exact matches, as well as some metadata on the groups.
The time to process a set of queries varies depending on complexity. Expect an audit of 100 SQL queries to take 1 minute.
Batch Import
The Batch Import form is identical to the Query Audit form. To start a Batch Import of SQL queries into Single Origin, click the "Import Queries" button at the top right corner of the Query Audit details page.

Issues
- Why did my query fail to import?
- There are several potential reasons:
- The SQL query is invalid.
- The SQL query is valid, but it references a table in a different project/warehouse than your Single Origin Connector. In this case, we do not have schema metadata for the table.
- Your Single Origin connector has not finished syncing. If there is a sync in process, then we may only have partial schema metadata.
- Why does a query in my source not show up in the results?
- In the results table, we only display queries that are similar or exact matches to other queries in the set. If a query is unique (e.g., references a table that no other queries reference), it will not appear in the results table.
- Why do I see "No Results Found" as a result of my audit?
- This can happen for one of two reasons:
- There is no duplication between queries in your set.
- We failed to process some (or all) queries, which led to the appearance of no duplication between queries in your set.
- This can happen for one of two reasons:
- Can I combine the results of multiple query audits?
- At this time, each audit is specific to the source queries provided for that audit. You cannot combine audits from different sets of queries - you would have to provide the entire set of queries into a single audit.
- There are several potential reasons:
Updated 8 days ago