Collections

https://{tenant}.singleorigin.tech/manage

Query Collections are sets of SQL queries imported into the Single Origin app. Collections are a convenient place for you to see the queries you are working with and perform actions like running a query audit or parsing queries into Single Origin's entity model.

Navigate to the Query Collections page by selecting Manage in the navigation bar and then clicking Collections. From here, administrators can create new query collections and view a history of previously created query collections. Click on a query collection to see more details about that collection.

Prerequisites for a new Query Collection

  • Queries must come from a valid source that can be imported into Single Origin
  • Queries must include:
    • A compiled, executed query
    • The executor of the query
    • A list of tags to associate with the query. Tags can either be a single item, like an Airflow task ID, or multiple items.

πŸ“˜

Note

Your personal connector must have permission to access all referenced tables.

Valid Sources

Queries may come from 1 of 3 sources:

  • A CSV file
  • A Compliant Query Table
  • A Query History Table

CSV file upload

Each CSV file must follow the below format, with each column separated by a comma. If a header is present, use the column names below (query, user, and tags).

queryusertags
STRING - a SQL query to include in the audit. Include "at the start and end of the query.STRING - the email of the "owner" of the query or the email of who "ran" the querySTRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items.

Include " at the start and end of the list, and (if possible) do not just use numbers as tags. For example, "10" should ideally be "query_10" (for import purposes).

An example CSV looks like this:

queryusertags
select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND[email protected]tpch_sf1, customer_orders_by_day

Compliant Query Table

A Compliant Query Table is similar in structure to a CSV file but must include a column representing the time the query was executed:

start_timequeryusertags
TIMESTAMP - the time the query was executedSTRING - a SQL query to include in the auditSTRING - the email of the "owner" of the query or the email of who "ran" the querySTRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items

An example Compliant Query Table looks like this:

start_timequeryusertags
2022-06-18 20:39:14.668604 UTCselect lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND[email protected]tpch_sf1, customer_orders_by_day

Query History Table

Query History Tables are exported from Data Warehouse providers and provide a history of query executions in a standardized format. Fill out the form with the following fields (note that not all fields are available for all warehouses):

Form InputOptionsRequired?Example Use Case
NameName of the collectionYes
Date and Time RangeThe queries in the collection were run between this start and end time (in UTC). Time can be specified at the minute level.Yes
LimitThe maximum number of queries to add to the collection.Yes
Min/Max Row Number CountThe queries in the collection resulted in this many rows when run.NoBring in queries that generated ten rows of data or more. Queries that output 0 or 1 row may not be broadly reusable.
EmailsThe queries in the collection were run by users associated with this set of emails (can be more than one email).NoBring in queries for either you or your team
Referencing TableThe queries in the collection all reference this table.NoCreate a collection that only contains queries that reference key_table (either by itself or with other tables)
Advanced Filter1. INTERACTIVE (queries run interactively)
2. BATCH (queries run as part of a batch process)
3. HIGH (queries run at a high priority level).
No
Only include queries with write operationsThe queries in the collection either created or inserted into a table (or not).NoBring in queries that are part of your pipelines so that you can audit them. Alternatively, bring in ad hoc queries (that are not part of your pipelines) so that you can audit them.

πŸ“˜

Note

Using Query History Table as a source assumes that you are exporting your logs in a standardized way. For example:

  • in BigQuery, the standard format is a table called <project_id>.auditlog_dataset.cloudaudit_googleapis_com_data_access_ using a process like this. Tables with different formats will not appear in the tables list.
  • in Snowflake, the standard format is a table callsed SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY

Details

Once you have created a collection, you can select it from the table on the Manage Queries page. You will then see more details about queries in the collection, including:

  • Index: an integer to identify the query in the collection
  • Query run by: the user that ran the query
  • Runtime: when the query was executed
  • Tags
  • The full SQL

When you no longer need a collection, then you can delete it.

🚧

Deleting a Collection

When you delete a collection, you delete its associated audits and imports.

Query Audit

Once you have created a collection, you can run a Query Audit on the queries in the collection. To start a Query Audit, click the "New Query Audit" button at the top right corner of the Audits tab on the Manage Queries page: https://{tenant}.singleorigin.tech/manage?tab=audits.

Query Audit

Batch Import

Once you have created a collection, you can have the queries in the collection parsed into entities using Single Origin's common definition model. To start a Batch Import of SQL queries in a collection, click the "New Import" button at the top right corner of the Imports tab on the Manage Queries page: https://{tenant}.singleorigin.tech/manage?tab=imports.

Batch Import

FAQ

  • When creating a new query collection, are any validations run on the queries provided?
    • No. When you run an audit or import of the collection, that is when queries are parsed and validated.
  • Is there a max size of a collection?
    • The max size for a single collection is 10 million queries.
  • Is there a max number of collections I can create?
    • Yes, 100.
  • What role is needed to create a collection?
    • A user must be an admin to create a collection.
  • Can I edit queries in a collection?
    • Not yet, but this is coming soon!
  • Can I search queries in a collection?
    • Not yet, but this is coming soon!