Collections

Query Collections are sets of SQL queries imported into the Single Origin app. Collections are a convenient place for you to see the queries you are working with and perform actions like running a query audit or parsing queries into Single Origin's entity model.

Navigate to the Query Collections page by selecting Manage in the navigation bar and then clicking Collections. From here, administrators can create new query collections and view a history of previously created query collections. Click on a query collection to see more details about that collection.

Prerequisites for a new Query Collection

  • Queries must come from a valid source that can be imported into Single Origin
  • Queries must include:
    • A compiled, executed query
    • The executor of the query
    • A list of tags to associate with the query. Tags can either be a single item, like an Airflow task ID, or multiple items.

📘

Note

Your personal connector must have permission to access all referenced tables.

Valid Sources

Queries may come from 1 of 3 sources:

  • A CSV file
  • A Compliant Query Table
  • A Query History Table

CSV file upload

Each CSV file must follow the below format, with each column separated by a comma. If a header is present, use the column names below (query, user, and tags).

queryusertags
STRING - a SQL query to include in the audit. Include "at the start and end of the query.STRING - the email of the "owner" of the query or the email of who "ran" the querySTRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items.

Include " at the start and end of the list, and (if possible) do not just use numbers as tags. For example, "10" should ideally be "query_10" (for import purposes).

An example CSV looks like this:

queryusertags
select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND[email protected]tpch_sf1, customer_orders_by_day

Compliant Query Table

A Compliant Query Table is similar in structure to a CSV file but must include a column representing the time the query was executed:

start_timequeryusertags
TIMESTAMP - the time the query was executedSTRING - a SQL query to include in the auditSTRING - the email of the "owner" of the query or the email of who "ran" the querySTRING (comma separated) - tags of metadata to associate with the query. tags can be a single item (like a task ID or team that owns the pipeline) or multiple items

An example Compliant Query Table looks like this:

start_timequeryusertags
2022-06-18 20:39:14.668604 UTCselect lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from sandbox-demo-db.tpch_sf1.lineitem lines join sandbox-demo-db.tpch_sf1.part parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND[email protected]tpch_sf1, customer_orders_by_day

Query History Table

Query History Tables are exported from Data Warehouse providers and provide a history of query executions in a standardized format. Select a user's email to filter the queries by the user, and select a time range. Choose a table and apply any additional filters. Filters include:

  1. INTERACTIVE (queries run interactively)
  2. BATCH (queries run as part of a batch process)
  3. HIGH (queries run at a high priority level).

Note that this function assumes that you are exporting your logs in a standard format, e.g., with BigQuery using a process like this. Non-standard tables will not appear in the tables list.

Details

Once you have created a collection, you can select it from the table on the Manage Queries page. You will then see more details about queries in the collection, including:

  • Index: an integer to identify the query in the collection
  • Query run by: the user that ran the query
  • Runtime: when the query was executed
  • Tags
  • The full SQL

When you no longer need a collection, then you can delete it.

🚧

Deleting a Collection

When you delete a collection, you delete its associated audits and imports.

Query Audit

Once you have created a collection, you can run a Query Audit on the queries in the collection. To start a Query Audit, click the "Run Query Audit" button at the top right corner of the Audits tab on the Manage Queries page: https://{tenant}.singleorigin.tech/manage?tab=audits.

Batch Import

Once you have created a collection, you can have the queries in the collection parsed into entities using Single Origin's common definition model. To start a Batch Import of SQL queries in a collection, click the "New Import" button at the top right corner of the Imports tab on the Manage Queries page: https://{tenant}.singleorigin.tech/manage?tab=imports.

FAQ

  • When creating a new query collection, are any validations run on the queries provided?
    • No. When you run an audit or import of the collection, that is when queries are parsed and validated.
  • Is there a max size of a collection?
    • The max size for a single collection is 10 million queries.
  • Is there a max number of collections I can create?
    • Yes, 100.
  • What role is needed to create a collection?
    • A user must be an admin to create a collection.
  • Can I edit queries in a collection?
    • Not yet, but this is coming soon!
  • Can I search queries in a collection?
    • Not yet, but this is coming soon!