Lineage Import is a tool that Admins can use to import a set of queries. Those queries are then used to build field-level lineage between datasets in the project you have connected.
You can find Lineage Import by navigating to "Lineage > Lineage Import." From this page, click "New Lineage Import" and fill out the Configuration form:
- choose your query history table
- select a time range
- apply any additional filters. Filters include:
  - email inclusion: filter to queries run by one or more particular users
  - email exclusion: filter out queries run by one or more particular users
Click "Preview" to see how many queries will be processed, and then click "Import" to start building your lineage from these queries.
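As a rough illustration of what the Configuration filters select, the sketch below applies a time range plus email inclusion and exclusion to a small in-memory query history and counts the matches, much like a preview would. The `QueryRecord` shape, the `preview` function, and the half-open time range are assumptions made for this example; they are not the product's actual schema or implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueryRecord:
    # Hypothetical shape of one row in a query history table.
    user_email: str
    started_at: datetime
    sql: str


def preview(records, start, end, include_emails=None, exclude_emails=None):
    """Return records in [start, end) that pass the email filters.

    Assumption: an empty inclusion list means "all users".
    """
    include = set(include_emails or [])
    exclude = set(exclude_emails or [])
    matched = []
    for record in records:
        if not (start <= record.started_at < end):
            continue  # outside the selected time range
        if include and record.user_email not in include:
            continue  # email inclusion filter
        if record.user_email in exclude:
            continue  # email exclusion filter
        matched.append(record)
    return matched


history = [
    QueryRecord("etl@example.com", datetime(2024, 1, 1, 9), "INSERT INTO b SELECT * FROM a"),
    QueryRecord("analyst@example.com", datetime(2024, 1, 1, 10), "SELECT * FROM b"),
    QueryRecord("etl@example.com", datetime(2024, 1, 3, 9), "INSERT INTO c SELECT * FROM b"),
]

matched = preview(
    history,
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 2),
    exclude_emails=["analyst@example.com"],
)
print(len(matched))  # number of queries that would be processed
```

Here only the first record survives: the second is excluded by email and the third falls outside the time range.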
After the import starts, you can set up a recurring run so your lineage updates as new queries come in. You can process your query history table (with the additional filters you set in the Configuration step) every 1, 2, 3, 4, 6, 8, or 12 hours.
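To make the recurrence options concrete, here is a minimal sketch that checks an interval against the supported values and lists the time windows that consecutive runs might cover. It assumes each run processes one contiguous window as long as the interval; how the product actually computes its windows is not stated here, and `run_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Supported recurrence intervals, in hours, as listed above.
ALLOWED_INTERVAL_HOURS = (1, 2, 3, 4, 6, 8, 12)


def run_windows(first_start, interval_hours, runs):
    """Return (start, end) windows for a number of consecutive runs.

    Assumption: windows are back to back and each spans one interval.
    """
    if interval_hours not in ALLOWED_INTERVAL_HOURS:
        raise ValueError(f"interval must be one of {ALLOWED_INTERVAL_HOURS}")
    step = timedelta(hours=interval_hours)
    return [
        (first_start + i * step, first_start + (i + 1) * step)
        for i in range(runs)
    ]


windows = run_windows(datetime(2024, 1, 1), interval_hours=6, runs=4)
for start, end in windows:
    print(start.isoformat(), "->", end.isoformat())
```

With a 6-hour interval, four runs cover one full day of query history.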
Lineage Import assumes that you are exporting your logs in a standardized way. For example:
- in BigQuery, the standard format is a table called <project_id>.auditlog_dataset.cloudaudit_googleapis_com_data_access_, exported using a process like this. Tables with different formats will not appear in the tables list.
- in Snowflake, the standard format is a table called
Based on the set of queries that match the form, we will build dataset-to-dataset lineage that you can explore in our app on
- the Lineage Explorer page (accessible by navigating to "Lineage > Lineage Explorer"), or
- an entity's details page.
For more, see our Data Lineage page.
If you perform multiple Lineage Imports, we build lineage based on the union of queries across the imports. For example:
- You upload your query history from January 1.
- The next day you upload your query history from January 2.
After the last upload, your lineage will be based on the union of queries from January 1 and January 2.
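The union behavior can be sketched with plain sets. In this example each import is modeled as a set of (upstream, downstream) dataset pairs already derived from its queries. That representation, the dataset names, and the `merge_imports` helper are assumptions for illustration only.

```python
def merge_imports(*imports):
    """Union the (upstream, downstream) edges contributed by each import."""
    edges = set()
    for imported_edges in imports:
        edges |= set(imported_edges)
    return edges


# Hypothetical edges derived from the January 1 and January 2 uploads.
jan_1 = {("raw.orders", "staging.orders")}
jan_2 = {
    ("raw.orders", "staging.orders"),    # seen again on the second day
    ("staging.orders", "mart.revenue"),  # new on the second day
}

lineage = merge_imports(jan_1, jan_2)
for upstream, downstream in sorted(lineage):
    print(upstream, "->", downstream)
```

An edge that appears in both uploads is kept once, and an edge that appears in either upload is part of the resulting lineage.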
You can investigate the details of a lineage import by clicking on its name in the "Lineage Import" table. On the details page, you can see:
- a summary of the import, including its recurrence settings (if applicable) and a history of tables that were added/updated in your lineage graph
- activity for the import, which includes the time window each (recurring) run processed from your query history table
- an actions button to update the recurrence settings
- We can only build dataset-to-dataset lineage based on the set of queries provided. For example, if the set of queries never references a particular dataset in your project, then we will not be able to derive any lineage for that dataset, because we do not know where it comes from or where it is used.