Collections
https://{tenant}.singleorigin.tech/manage
Query Collections are sets of SQL queries imported into the Single Origin app. Collections are a convenient place for you to see the queries you are working with and perform actions like running a query audit or parsing queries into Single Origin's entity model.
Navigate to the Query Collections page by selecting Manage in the navigation bar and then clicking Collections. From here, administrators can create new query collections and view a history of previously created query collections. Click on a query collection to see more details about that collection.
Prerequisites for a new Query Collection
- Queries must come from a valid source that can be imported into Single Origin
- Queries must include:
- A compiled, executed query
- The executor of the query
- A list of tags to associate with the query. Tags can either be a single item, like an Airflow task ID, or multiple items.
Note
Your personal connector must have permission to access all referenced tables.
Valid Sources
Queries may come from 1 of 3 sources:
- A CSV file
- A Compliant Query Table
- A Query History Table
CSV file upload
Each CSV file must follow the below format, with each column separated by a comma. If a header is present, use the column names below (query, user, and tags).
query | user | tags |
---|---|---|
STRING - a SQL query to include in the audit. Include "at the start and end of the query. | STRING - the email of the "owner" of the query or the email of who "ran" the query | STRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items. Include " at the start and end of the list, and (if possible) do not just use numbers as tags. For example, "10" should ideally be "query_10" (for import purposes). |
An example CSV looks like this:
query | user | tags |
---|---|---|
select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND | [email protected] | tpch_sf1, customer_orders_by_day |
Compliant Query Table
A Compliant Query Table is similar in structure to a CSV file but must include a column representing the time the query was executed:
start_time | query | user | tags |
---|---|---|---|
TIMESTAMP - the time the query was executed | STRING - a SQL query to include in the audit | STRING - the email of the "owner" of the query or the email of who "ran" the query | STRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items |
An example Compliant Query Table looks like this:
start_time | query | user | tags |
---|---|---|---|
2022-06-18 20:39:14.668604 UTC | select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND | [email protected] | tpch_sf1, customer_orders_by_day |
Query History Table
Query History Tables are exported from Data Warehouse providers and provide a history of query executions in a standardized format. Fill out the form with the following fields (note that not all fields are available for all warehouses):
Form Input | Options | Required? | Example Use Case |
---|---|---|---|
Name | Name of the collection | Yes | |
Date and Time Range | The queries in the collection were run between this start and end time (in UTC). Time can be specified at the minute level. | Yes | |
Limit | The maximum number of queries to add to the collection. | Yes | |
Min/Max Row Number Count | The queries in the collection resulted in this many rows when run. | No | Bring in queries that generated ten rows of data or more. Queries that output 0 or 1 row may not be broadly reusable. |
Emails | The queries in the collection were run by users associated with this set of emails (can be more than one email). | No | Bring in queries for either you or your team |
Referencing Table | The queries in the collection all reference this table. | No | Create a collection that only contains queries that reference key_table (either by itself or with other tables) |
Advanced Filter | 1. INTERACTIVE (queries run interactively) 2. BATCH (queries run as part of a batch process) 3. HIGH (queries run at a high priority level). | No | |
Only include queries with write operations | The queries in the collection either created or inserted into a table (or not). | No | Bring in queries that are part of your pipelines so that you can audit them. Alternatively, bring in ad hoc queries (that are not part of your pipelines) so that you can audit them. |
Note
Using Query History Table as a source assumes that you are exporting your logs in a standardized way. For example:
- in BigQuery, the standard format is a table called
<project_id>.auditlog_dataset.cloudaudit_googleapis_com_data_access_
using a process like this. Tables with different formats will not appear in the tables list.- in Snowflake, the standard format is a table callsed
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
Details
Once you have created a collection, you can select it from the table on the Manage Queries page. You will then see more details about queries in the collection, including:
- Index: an integer to identify the query in the collection
- Query run by: the user that ran the query
- Runtime: when the query was executed
- Tags
- The full SQL
When you no longer need a collection, then you can delete it.
Deleting a Collection
When you delete a collection, you delete its associated audits and imports.
Query Audit
Once you have created a collection, you can run a Query Audit on the queries in the collection. To start a Query Audit, click the "New Query Audit" button at the top right corner of the Audits tab on the Manage Queries page: https://{tenant}.singleorigin.tech/manage?tab=audits.
Batch Import
Once you have created a collection, you can have the queries in the collection parsed into entities using Single Origin's common definition model. To start a Batch Import of SQL queries in a collection, click the "New Import" button at the top right corner of the Imports tab on the Manage Queries page: https://{tenant}.singleorigin.tech/manage?tab=imports.
FAQ
- When creating a new query collection, are any validations run on the queries provided?
- No. When you run an audit or import of the collection, that is when queries are parsed and validated.
- Is there a max size of a collection?
- The max size for a single collection is 10 million queries.
- Is there a max number of collections I can create?
- Yes, 100.
- What role is needed to create a collection?
- A user must be an admin to create a collection.
- Can I edit queries in a collection?
- Not yet, but this is coming soon!
- Can I search queries in a collection?
- Not yet, but this is coming soon!
Updated over 1 year ago