Bulk Import

Single Origin provides a few different ways to achieve zero-effort migration or bootstrap from other systems. Note that all the following approaches require the user to have the Admin role in Single Origin.

To access batch import from the home screen, follow "Workbench > Create" and then choose "Batch Import" on the left bar. You can currently choose from three types of sources when creating a "New" import:

  1. import via CSV (Processed Queries)
  2. import via Compliant Query Table
  3. import via Query History Table

You can also start a batch import from the results page for a Query Audit by clicking the "Import Queries" button.

Batch importing follows the same process as importing a single SQL statement, except for two differences:

  • If imported view/data entities do not have any validation issues or errors that require a user's action, they will automatically be added to Single Origin.
  • Views and data entities are deduplicated and reused, both across the views/data entities created from this batch import and against those that already exist in Single Origin.

The batch import process happens asynchronously, and many drafts can be created from one import. Summaries of imports can be found by clicking on a historical run on the "Batch Import" page.

📘

Note

  • for BigQuery we require tables to be referenced using project.dataset.table notation, e.g. sandbox-demo-db.thelook_ecommerce.order_items
  • for BigQuery we only support standard SQL (not Legacy SQL)
  • for BigQuery we require that you add a _TABLE_SUFFIX filter to your WHERE clause when querying a table with a wildcard (see the sketch after this list). For more on querying wildcard tables with _TABLE_SUFFIX, please see the BigQuery documentation
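For example, a wildcard query with a _TABLE_SUFFIX filter might look like the sketch below. The date-sharded table prefix and column names here are illustrative assumptions, not part of Single Origin's requirements:

```sql
-- Hypothetical date-sharded tables; substitute your own project, dataset, and table prefix.
SELECT
  order_id,
  created_at
FROM `sandbox-demo-db.thelook_ecommerce.order_items_*`
WHERE _TABLE_SUFFIX BETWEEN '20220101' AND '20220131';
```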

Batch Import via CSV

You can collect a set of SQL statements into a CSV file and import them all at once. The required format for the CSV file upload is:

| query | user | tags |
| --- | --- | --- |
| STRING - a SQL query to include in the import | STRING - the email of the user who should own the draft created from this SQL | STRING (comma separated) - tags can be a single item (like a task ID or team that owns the pipeline) or multiple items. The first tag in the list will be used to name the view that is created from the SQL. |

🚧

CSV Header

Do not include a header row in your CSV files: the first row of the CSV should already contain a query, a user, and tags, not column names.

So an example CSV looks like:

| query | user | tags |
| --- | --- | --- |
| select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from `sandbox-demo-db.tpch_sf1.lineitem` lines join `sandbox-demo-db.tpch_sf1.part` parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND | [email protected] | tpch_sf1, customer_orders_by_day |

The generated view would be named after the first tag, tpch_sf1.
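Laid out as a raw CSV file, the same row might look like the sketch below: there is no header row, and the query and tags fields are double-quoted because they contain commas. This is only an illustration assuming standard CSV quoting, not a prescribed layout:

```
"select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from `sandbox-demo-db.tpch_sf1.lineitem` lines join `sandbox-demo-db.tpch_sf1.part` parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND",[email protected],"tpch_sf1, customer_orders_by_day"
```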

Batch Import via Compliant Query Table

You can also collect a set of SQL statements into a "Compliant Table" to import. The required format for a Compliant Table is similar to CSV, but we assume that there is a column representing the time the query was executed:

| start_time | query | user | tags |
| --- | --- | --- | --- |
| TIMESTAMP - the time the query was executed | STRING - a SQL query to include in the import | STRING - the email of the user who should own the draft created from this SQL | STRING (comma separated) - tags can be a single item (like a task ID or team that owns the pipeline) or multiple items. The first tag in the list will be used to name the view that is created from the SQL. |

So an example Compliant Query Table looks like:

| start_time | query | user | tags |
| --- | --- | --- | --- |
| 2022-06-18 20:39:14.668604 UTC | select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from `sandbox-demo-db.tpch_sf1.lineitem` lines join `sandbox-demo-db.tpch_sf1.part` parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND | [email protected] | tpch_sf1, customer_orders_by_day |
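If you need to create such a table yourself, a minimal BigQuery sketch might look like the following. The project, dataset, and table names are assumptions made for illustration, not a naming requirement:

```sql
-- Hypothetical project, dataset, and table names; adjust for your environment.
CREATE TABLE `sandbox-demo-db.imports.compliant_queries` (
  start_time TIMESTAMP,  -- when the query was executed
  query      STRING,     -- the SQL statement to import
  user       STRING,     -- email of the user who should own the resulting draft
  tags       STRING      -- comma-separated tags; the first tag names the created view
);

-- Example row mirroring the table above (replace the query with a real statement).
INSERT INTO `sandbox-demo-db.imports.compliant_queries` (start_time, query, user, tags)
VALUES (
  TIMESTAMP '2022-06-18 20:39:14.668604 UTC',
  'select 1 as example_column',
  '[email protected]',
  'tpch_sf1, customer_orders_by_day'
);
```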

To import using a compliant table, start a "New" import and choose a source of "Compliant Query Table." Fill in the user email that you would like to filter to in the table (i.e. the user that ran the queries), as well as the time range when the queries were run. These filters apply to the user and start_time columns in your compliant table, and only matching rows are imported. Choose a Single Origin Connector that is able to read tables, and then choose the name of the table you want to use from the table dropdown.
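Conceptually, the rows those filters select look roughly like this sketch (the table name, email, and time range are placeholders):

```sql
-- Placeholder table name and filter values; roughly what the user / time range filters select.
SELECT start_time, query, user, tags
FROM `sandbox-demo-db.imports.compliant_queries`
WHERE user = '[email protected]'
  AND start_time BETWEEN TIMESTAMP '2022-06-01 00:00:00 UTC'
                     AND TIMESTAMP '2022-06-30 23:59:59 UTC';
```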

Batch Import via Query History

In addition to compiling your queries (into either a CSV or a Compliant Table), you can pull directly from your "Query History" logs to import. Note that this functionality assumes that you are exporting your logs in a standard format, e.g. exporting BigQuery query logs to a table.

To import using a query history table, start a "New" import and choose a source of "Query History." Fill in the user email that you would like to filter to in the table (i.e. the user that ran the queries), as well as the time range when the queries were run. Choose a Single Origin Connector that is able to read tables, choose the name of the table you want to use, and apply any final filters (INTERACTIVE = queries that were run interactively, BATCH = queries that were run as part of a batch process, HIGH = queries that were run at a high priority level).
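The exact schema of your query history table depends on how your logs are exported. Purely as an illustration, a similar preview of which queries match a given user, time range, and priority can be run against BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view; the region qualifier and filter values below are assumptions about your environment, and your exported log table may use different column names:

```sql
-- Illustration only: preview queries matching a user / time range / priority filter.
SELECT creation_time, user_email, priority, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND user_email = '[email protected]'
  AND priority = 'INTERACTIVE'   -- or 'BATCH'
  AND creation_time BETWEEN TIMESTAMP '2022-06-01 00:00:00 UTC'
                        AND TIMESTAMP '2022-06-30 23:59:59 UTC';
```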

📘

Naming Entities

If you are running a bulk import based on a CSV or Compliant Query Table, then we will name the entity based on the first tag of the associated SQL. If you are running a bulk import based on a Query History Table, then we will name the entity based on the insertId column of the associated SQL.

Since we require names to be in snake case format, we will remove special characters and numbers from the beginning of the name when necessary; however, in the description for the entity we will keep the original tag (including any special characters and numbers). This way, you can still search for the original tag and find the associated entities.
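As a rough illustration of that kind of normalization (not Single Origin's exact implementation), a tag such as 2022/Q1 Revenue! could be reduced to a snake_case name in BigQuery like this:

```sql
-- Rough sketch only: lowercase, replace special characters, strip leading digits/underscores.
SELECT TRIM(
         REGEXP_REPLACE(
           REGEXP_REPLACE(LOWER('2022/Q1 Revenue!'), r'[^a-z0-9_]+', '_'),
           r'^[0-9_]+', ''),
         '_') AS entity_name;  -- returns 'q1_revenue'
```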

Batch Import from dbt (coming soon)

dbt import is similar to the raw SQL (CSV) import: instead of uploading a CSV file, you upload the exported JSON file from your dbt project. Here are the steps to follow:

  • Run dbt compile in a terminal from your dbt project directory
  • Upload the target/manifest.json file from your dbt project folder to the Single Origin website
