Bulk Import

Single Origin provides several ways to migrate from, or bootstrap on top of, other systems with low effort.

📘

Permissions

Only users with the Admin role can run bulk imports

Batch importing follows the same process as importing a single SQL query, except for three differences:

  • If imported view/data entities do not have validation issues or errors that require a user's action, they will automatically be added to production. No drafts will be created.
  • In a batch import we auto-generate names and descriptions for entities. The methodology for generating this information varies by import source, so please see "Naming Entities" below.
  • Views and data entities are deduplicated across both:
    • the views/data entities in the batch import
    • the views/data entities already in Single Origin.

The batch import process happens asynchronously, and you can find the summary for an import by clicking the run on your "Batch Import" page.

While we try to minimize the number of drafts generated by a batch import, multiple drafts may be created. For example, if your batch import attempts to create an entity with the same name as an entity already in production, then a draft will be created so that you can give the new entity a unique name.

📘

Note

  • for BigQuery we require tables to be referenced using project.dataset.table notation, e.g. sandbox-demo-db.thelook_ecommerce.order_items
  • for BigQuery we only support standard SQL (not Legacy SQL)
  • for BigQuery we require that you add a _TABLE_SUFFIX filter to your WHERE clause when querying a table with a wildcard. For more on querying wildcard tables with _TABLE_SUFFIX, please see the BigQuery documentation; there is also a short sketch after this note.
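As a minimal sketch of that requirement (the dataset, sharded-table prefix, and column names here are hypothetical), a query against a wildcard table should constrain _TABLE_SUFFIX in its WHERE clause:

-- Hypothetical dataset and table prefix, for illustration only.
SELECT event_date, COUNT(*) AS event_count
FROM `sandbox-demo-db.analytics.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20220601' AND '20220630'  -- restricts which shards are read
GROUP BY event_date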

Accessing Batch Import

To access batch import from the home screen, follow "Workbench > Create Entities" and then choose "Batch Import" on the left bar. On this page, you can create a "New" import from one of three sources:

  1. import via CSV (Processed Queries)
  2. import via Compliant Query Table
  3. import via Query History Table

You can also start a batch import from the results page for a Query Audit by clicking the "Import Queries" button.

Batch Import via CSV

You can collect a set of SQL statements into a CSV file and import all of them at once. The required format for CSV file upload is the following, with each column separated by a comma:

  • query (STRING): a SQL query to include in the import. Include " at the start and end of the query.
  • user (STRING): the email of the user who should own the draft created from this SQL.
  • tags (STRING, comma separated): tags can be a single item (like a task ID or team that owns the pipeline) or multiple items. The first tag in the list will be used to name the view created from the SQL. Also, include " at the start and end of the list.

🚧

CSV Header

Please remove the header row from the CSV file: the first row of the CSV should already contain a query, user, and tags.

An example CSV looks like this:

"select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from `sandbox-demo-db.tpch_sf1.lineitem` lines join `sandbox-demo-db.tpch_sf1.part` parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND" [email protected] "tpch_sf1, customer_orders_by_day"

The generated view would be named after the first tag, tpch_sf1.

Batch Import via Compliant Query Table

You can also collect a set of SQL statements into a "Compliant Table" to import. The required format for a Compliant Table is similar to CSV, but we assume that there is a column representing the time the query was executed:

  • start_time (TIMESTAMP): the time the query was executed.
  • query (STRING): a SQL query to include in the import.
  • user (STRING): the email of the user who should own the draft created from this SQL.
  • tags (STRING, comma separated): tags can be a single item (like a task ID or team that owns the pipeline) or multiple items. The first tag in the list will be used to name the view created from the SQL.

An example Compliant Query Table looks like this:

  • start_time: 2022-06-18 20:39:14.668604 UTC
  • query: select lines.L_SHIPDATE as ship_date, parts.P_BRAND as part_brand, AVG(lines.L_DISCOUNT) as avg_discount from `sandbox-demo-db.tpch_sf1.lineitem` lines join `sandbox-demo-db.tpch_sf1.part` parts on lines.L_PARTKEY = parts.P_PARTKEY group by lines.L_SHIPDATE, parts.P_BRAND
  • user: [email protected]
  • tags: tpch_sf1, customer_orders_by_day
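If you do not already have such a table, here is a minimal sketch of a BigQuery DDL that matches the required columns (the project, dataset, and table names are placeholders):

-- Placeholder project/dataset/table names; adjust to your environment.
CREATE TABLE `sandbox-demo-db.imports.compliant_queries` (
  start_time TIMESTAMP,  -- when the query was executed
  query STRING,          -- the SQL text to import
  user STRING,           -- email of the user who should own any resulting draft
  tags STRING            -- comma-separated tags; the first tag names the view
)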

To import using a compliant table, start a "New" import and choose "Compliant Query Table" as the source. Fill in the user email that you would like to filter on (i.e., the user that ran the queries) and the time range when the queries were run. These filters apply to the user and start_time columns in your compliant table, and only matching rows will be imported. Choose the name of the table you want to import from (only valid tables will appear in the list), check the Preview, and then click Start.
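Conceptually, these filters select rows from your compliant table the same way a query like the following would (the table name and filter values are placeholders, shown only to indicate which columns the filters apply to):

SELECT start_time, query, user, tags
FROM `sandbox-demo-db.imports.compliant_queries`
WHERE user = '[email protected]'
  AND start_time BETWEEN TIMESTAMP('2022-06-01') AND TIMESTAMP('2022-07-01')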

Batch Import via Query History

In addition to compiling your queries (into either a CSV or a Compliant Table), you can pull directly from your "Query History" logs to import. Note that this function assumes that you are exporting your logs in a standard format, e.g., with BigQuery using a process like this.

To import using a query history table, start a "New" import and choose "Query History" as the source. Choose (or fill in) the user email that you would like to filter on (i.e., the user that ran the queries) and the time range when the queries were run. Choose the name of the table you want to import from (only valid tables will appear in the list), and apply any final filters (INTERACTIVE = queries that were run interactively, BATCH = queries that were run as part of a batch process, HIGH = queries that were run at a high priority level). Check the Preview, and then click Start.
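If you want to preview which queries such filters would match before importing, and your history lives in BigQuery, one option outside of Single Origin is BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view, which exposes similar fields (the region, email, and time range below are placeholders, and your exported log table may use different column names):

SELECT creation_time, user_email, priority, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND user_email = '[email protected]'
  AND priority = 'INTERACTIVE'  -- or 'BATCH'
  AND creation_time BETWEEN TIMESTAMP('2022-06-01') AND TIMESTAMP('2022-07-01')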

📘

Naming Entities

If you are running a bulk import based on a CSV or Compliant Query Table, then we will name the entity based on the first tag of the associated SQL. If you are running a bulk import based on a Query History Table, then we will name the entity based on the insertId column of the associated SQL. Since we require names to be in snake case format, we will remove special characters and numbers from the beginning of the name when necessary; however, in the description for the entity we will keep the original tag (including any special characters and numbers). This way, you can still search for the original tag and find the associated entities.
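As a purely illustrative sketch of that normalization (not Single Origin's exact implementation), the following BigQuery expression shows how a tag such as 2022-Q3 Revenue could be reduced to a snake case name:

SELECT REGEXP_REPLACE(
         REGEXP_REPLACE(LOWER('2022-Q3 Revenue'), r'[^a-z0-9]+', '_'),  -- snake case the tag
         r'^[0-9_]+', ''                                                -- strip leading numbers
       ) AS entity_name  -- returns 'q3_revenue'; the description keeps the original tag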

Batch Import from dbt (coming soon)

dbt import is similar to Raw SQL import. Instead of uploading a CSV file, you upload the exported JSON file from your dbt project. Here are the steps to follow:

  • Run dbt compile in your terminal under your dbt project path
  • Upload the resulting target/manifest.json file in Single Origin

Finishing a Batch Import

Once you start a batch import from one of the sources above, it will appear on your "Batch Import" history page with a "To Be Confirmed" status. Since batch imports can potentially create hundreds of entities in production at once, you can review the summary statistics of your import before finishing it.

If you want to proceed and finish the import, click "Finish Import." After this, your entities will be added to production!

📘

Duplicate Names

When you click "Finish Import," you may see a modal indicating that duplicate names were detected. This happens when an entity named using the "Naming Entities" methodology described above has the same name as an entity already in production. We require entities to have globally unique names, so on this modal you can either (a) add a suffix to try to make the entity names from your import unique, or (b) proceed without adding a suffix. If you choose option (b), then drafts will be created where you can update the names.

Example Video

Issues

  • There are a few reasons we may fail to process a query during a batch import, including:
    • The SQL query is invalid.
    • The SQL query is valid, but it references a table that is in a different project/warehouse than your Single Origin Connector. In this case, we do not have schema metadata for the table.