Tables & References
Table & reference tools
Tools for warehouse metadata and the queries hitting a table: look up a table or view by fully-qualified name, resolve a partial or ambiguous reference into a canonical FQN, list the queries that read or produce a table, trace a table's upstream/downstream neighbor tables, inspect per-column schema and cardinality, and find tables that are structurally similar by column set, inputs, or consumers. For column-usage frequency and table storage-overview metrics, see Table & Column Usage.
All table tools accept a 3-part (db.schema.table) or 2-part (schema.table) name; a bare table name is rejected. A 2-part name resolves exactly when a default database is supplied, otherwise it is suffix-matched (multiple hits return as ambiguous with candidate FQNs).
All sample responses below use synthetic data.
table_detail
table_detailReturns physical metadata for a table or view by fully-qualified name. Accepts three-part (db.schema.table) or two-part (schema.table) names.
Sample call
"How big is analytics.orders_fact?"
Sample response (base table)
{
"table_type": "TABLE",
"row_count": 1400000000,
"bytes": 3100000000000,
"cluster_key": "(order_date, region)"
}Not found
{ "error": "No table found for fqn='ANALYTICS_DB.ANALYTICS.DOES_NOT_EXIST'" }resolve_table_reference
resolve_table_referenceCanonicalizes a 1-, 2-, or 3-part name to a fully-qualified table or view, returning physical metadata and (for views) the SQL definition and underlying physical sources.
Resolution behavior
- 3-part FQN → exact match
- 2-part with a known default database → concatenated then exact match
- Otherwise → suffix match; multiple hits return as
ambiguouswith candidate FQNs
Sample call
"What does
ordersactually refer to?"
Sample response — base table
{
"input": "orders_fact",
"fqn": "analytics_db.analytics.orders_fact",
"table_type": "TABLE",
"row_count": 1400000000,
"bytes": 3100000000000,
"cluster_key": "(order_date, region)"
}Sample response — view
{
"input": "analytics.orders_recent",
"fqn": "analytics_db.analytics.orders_recent",
"table_type": "VIEW",
"row_count": null,
"bytes": null,
"cluster_key": null,
"definition": "SELECT * FROM analytics_db.analytics.orders_fact WHERE order_date >= CURRENT_DATE - 30",
"definition_truncated": false,
"definition_full_chars": 84,
"definition_distinct_versions": 1,
"physical_tables": [
{
"fqn": "analytics_db.analytics.orders_fact",
"row_count": 1400000000,
"bytes": 3100000000000,
"cluster_key": "(order_date, region)"
}
]
}View definitions are capped at 16,000 characters; when a definition is longer, definition_truncated is true and definition_full_chars reports the full length.
Sample response — ambiguous
{
"input": "orders",
"ambiguous": true,
"candidates": [
{"fqn": "analytics_db.analytics.orders_fact", "table_type": "TABLE", "row_count": 1400000000, "bytes": 3100000000000, "cluster_key": "(order_date, region)"},
{"fqn": "analytics_db.staging.orders_raw", "table_type": "TABLE", "row_count": 1400000000, "bytes": 4200000000000, "cluster_key": null}
]
}Sample response — not found
{
"input": "does_not_exist",
"error": "No table found for 'does_not_exist'"
}table_query_list
table_query_listReturns the queries that read and/or produce a table within a processing-time date range, ordered by cost (highest first), 20 rows per page. Answers "what is actually running against this table?" — the workloads any clustering, partitioning, or schema change will affect.
Provide at least one of:
fqn— queries that read the table (it appears insource_tables).destination_fqn— queries that produce the table (it is thedestination_table); use this to find how a source table is built.
Pass both to find queries that read fqn and write destination_fqn. start_date and end_date are inclusive YYYY-MM-DD bounds; page is zero-based.
Caveat: the ingest pipeline only captures producer queries above a cost/runtime floor, so an empty
destination_fqnresult does not prove the table has no producer.
Sample call
"Which queries hit analytics.orders_fact between 2026-05-22 and 2026-05-29, and which cost the most?"
Sample response
[
{ "query_id": "01b2f9a4-7c3d-4e21-9a01-2f5b8c0d1e34", "cost": 12.47, "job_id": "job_8f3c1d", "processing_time": "2026-05-28T00:00:00Z", "signature": "a1c9e7f2" },
{ "query_id": "04e7c211-1b88-4f0a-8d22-6a90fb3c7d10", "cost": 3.10, "job_id": "job_2ab990", "processing_time": "2026-05-27T00:00:00Z", "signature": "9f02bd55" }
]Sample response — ambiguous
{
"ambiguous": true,
"candidates": ["analytics_db.analytics.orders_fact", "analytics_db.staging.orders_raw"]
}table_lineage
table_lineageReturns a table's one-hop neighbor tables within a processing-time date range. upstream_tables are the tables that produce it — the source tables of the queries that write it, plus a clone-provenance edge when the table was created via CREATE TABLE ... CLONE. downstream_tables are the tables built from queries that read it. Each neighbor carries n_queries, total_cost (dollars), and via. start_date and end_date are inclusive YYYY-MM-DD bounds; limit caps neighbors per direction (default 50).
via is a stackable list of edge tags — [] for an ordinary query-derived edge, ["clone"] for clone provenance, ["staging"] when the neighbor matches the staging-table naming convention, and ["clone", "staging"] when both apply.
If you pass a view, it is dissolved to its physical source tables: the response gains a top-level resolved_from_view note and each edge carries from_physical (which physical source it was reached through).
Upstream comes from writer queries (destination_table), which are sparse and absent for engines with no captured SQL text — treat an empty upstream_tables as "unknown", not "no producer".
Sample call
"What feeds analytics.orders_daily_agg and what reads from it between 2026-05-22 and 2026-05-29?"
Sample response
{
"fqn": "analytics_db.analytics.orders_daily_agg",
"upstream_tables": [
{ "fqn": "ANALYTICS_DB.ANALYTICS.ORDERS_FACT", "n_queries": 14, "total_cost": 142.5, "via": [] }
],
"downstream_tables": [
{ "fqn": "ANALYTICS_DB.REPORTING.REVENUE_DAILY", "n_queries": 31, "total_cost": 88.0, "via": [] },
{ "fqn": "ANALYTICS_DB.STAGING.ORDERS_DAILY_AGG_TMP", "n_queries": 4, "total_cost": 6.2, "via": ["staging"] }
]
}table_column_detail
table_column_detailReturns per-column metadata, one row per column ordered by position: field_name, data_type, cardinality, part_cluster. cardinality is the schema crawler's distinct-value estimate — read it as a magnitude signal: a column whose cardinality approaches the row count is a poor leading cluster key, while a low-cardinality column used often in filters is a strong candidate. part_cluster flags an existing partition/cluster role.
Sample call
"Show me the columns and cardinalities for analytics.orders_fact."
Sample response
[
{ "field_name": "order_id", "data_type": "NUMBER(38,0)", "cardinality": 1400000000, "part_cluster": null },
{ "field_name": "order_date", "data_type": "DATE", "cardinality": 730, "part_cluster": "CLUSTER" },
{ "field_name": "region", "data_type": "VARCHAR", "cardinality": 6, "part_cluster": "CLUSTER" }
]Finding similar tables
Three tools surface redundant copies, mergeable pipelines, and tables that must move together — each ranks candidates by Jaccard similarity (min_jaccard threshold in [0.5, 1], default 0.8) and paginates with page / page_size. Every candidate carries its own table_type so views can be told apart from base tables.
similar_tables_by_schema
similar_tables_by_schemaCompares column structure. Splits results into two buckets: identical (exactly the same column set — the strongest redundant-copy signal) and similar (column-set Jaccard ≥ min_jaccard, excluding the identical ones). Both buckets report exact totals (identical_count / similar_count).
Sample call
"Are there any tables with the same columns as analytics.orders_fact?"
Sample response
{
"fqn": "analytics_db.analytics.orders_fact",
"table_type": "BASE TABLE",
"identical_count": 1,
"similar_count": 2,
"identical": [
{ "fqn": "analytics_db.staging.orders_fact_bak", "table_type": "BASE TABLE" }
],
"similar": [
{ "fqn": "analytics_db.analytics.orders_fact_v2", "jaccard": 0.92, "shared": 22, "table_type": "BASE TABLE" },
{ "fqn": "analytics_db.reporting.orders_enriched", "jaccard": 0.81, "shared": 19, "table_type": "VIEW" }
],
"page": 0,
"page_size": 20,
"identical_has_more": false,
"similar_has_more": false
}similar_tables_by_upstream
similar_tables_by_upstreamCompares the upstream set (the source tables a table is built FROM). A high Jaccard means two tables are built from the same inputs — likely redundant or mergeable pipelines. A single similar list ranked by Jaccard descending; axis is "upstream". Often empty for source tables, which read nothing.
Sample call
"Which tables are built from the same sources as reporting.revenue_daily?"
Sample response
{
"fqn": "analytics_db.reporting.revenue_daily",
"table_type": "BASE TABLE",
"axis": "upstream",
"similar_count": 1,
"similar": [
{ "fqn": "analytics_db.reporting.revenue_weekly", "jaccard": 0.88, "shared": 7, "table_type": "BASE TABLE" }
],
"page": 0,
"page_size": 20,
"similar_has_more": false
}similar_tables_by_downstream
similar_tables_by_downstreamCompares the downstream set (the tables built FROM a table — its consumers). A high Jaccard means the same downstream depends on both: interchangeable/redundant supply, or tables that must move together in a migration. A single similar list ranked by Jaccard descending; axis is "downstream".
Sample call
"What's interchangeable with analytics.orders_fact downstream?"
Sample response
{
"fqn": "analytics_db.analytics.orders_fact",
"table_type": "BASE TABLE",
"axis": "downstream",
"similar_count": 1,
"similar": [
{ "fqn": "analytics_db.staging.orders_raw", "jaccard": 0.95, "shared": 18, "table_type": "BASE TABLE" }
],
"page": 0,
"page_size": 20,
"similar_has_more": false
}