Tables & References

Table & reference tools

Tools for warehouse metadata and the queries hitting a table: look up a table or view by fully-qualified name, resolve a partial or ambiguous reference into a canonical FQN, list the queries that read or produce a table, trace a table's upstream/downstream neighbor tables, inspect per-column schema and cardinality, and find tables that are structurally similar by column set, inputs, or consumers. For column-usage frequency and table storage-overview metrics, see Table & Column Usage.

All table tools accept a 3-part (db.schema.table) or 2-part (schema.table) name; a bare table name is rejected. A 2-part name resolves exactly when a default database is supplied, otherwise it is suffix-matched (multiple hits return as ambiguous with candidate FQNs).

All sample responses below use synthetic data.

table_detail

Returns physical metadata for a table or view by fully-qualified name. Accepts three-part (db.schema.table) or two-part (schema.table) names.

Sample call

"How big is analytics.orders_fact?"

Sample response (base table)

{
  "table_type": "TABLE",
  "row_count": 1400000000,
  "bytes": 3100000000000,
  "cluster_key": "(order_date, region)"
}

Not found

{ "error": "No table found for fqn='ANALYTICS_DB.ANALYTICS.DOES_NOT_EXIST'" }

resolve_table_reference

Canonicalizes a 1-, 2-, or 3-part name to a fully-qualified table or view, returning physical metadata and (for views) the SQL definition and underlying physical sources.

Resolution behavior

  • 3-part FQN → exact match
  • 2-part with a known default database → concatenated then exact match
  • Otherwise → suffix match; multiple hits return as ambiguous with candidate FQNs

Sample call

"What does orders actually refer to?"

Sample response — base table

{
  "input": "orders_fact",
  "fqn": "analytics_db.analytics.orders_fact",
  "table_type": "TABLE",
  "row_count": 1400000000,
  "bytes": 3100000000000,
  "cluster_key": "(order_date, region)"
}

Sample response — view

{
  "input": "analytics.orders_recent",
  "fqn": "analytics_db.analytics.orders_recent",
  "table_type": "VIEW",
  "row_count": null,
  "bytes": null,
  "cluster_key": null,
  "definition": "SELECT * FROM analytics_db.analytics.orders_fact WHERE order_date >= CURRENT_DATE - 30",
  "definition_truncated": false,
  "definition_full_chars": 84,
  "definition_distinct_versions": 1,
  "physical_tables": [
    {
      "fqn": "analytics_db.analytics.orders_fact",
      "row_count": 1400000000,
      "bytes": 3100000000000,
      "cluster_key": "(order_date, region)"
    }
  ]
}

View definitions are capped at 16,000 characters; when a definition is longer, definition_truncated is true and definition_full_chars reports the full length.

Sample response — ambiguous

{
  "input": "orders",
  "ambiguous": true,
  "candidates": [
    {"fqn": "analytics_db.analytics.orders_fact", "table_type": "TABLE", "row_count": 1400000000, "bytes": 3100000000000, "cluster_key": "(order_date, region)"},
    {"fqn": "analytics_db.staging.orders_raw", "table_type": "TABLE", "row_count": 1400000000, "bytes": 4200000000000, "cluster_key": null}
  ]
}

Sample response — not found

{
  "input": "does_not_exist",
  "error": "No table found for 'does_not_exist'"
}

table_query_list

Returns the queries that read and/or produce a table within a processing-time date range, ordered by cost (highest first), 20 rows per page. Answers "what is actually running against this table?" — the workloads any clustering, partitioning, or schema change will affect.

Provide at least one of:

  • fqn — queries that read the table (it appears in source_tables).
  • destination_fqn — queries that produce the table (it is the destination_table); use this to find how a source table is built.

Pass both to find queries that read fqn and write destination_fqn. start_date and end_date are inclusive YYYY-MM-DD bounds; page is zero-based.

Caveat: the ingest pipeline only captures producer queries above a cost/runtime floor, so an empty destination_fqn result does not prove the table has no producer.

Sample call

"Which queries hit analytics.orders_fact between 2026-05-22 and 2026-05-29, and which cost the most?"

Sample response

[
  { "query_id": "01b2f9a4-7c3d-4e21-9a01-2f5b8c0d1e34", "cost": 12.47, "job_id": "job_8f3c1d", "processing_time": "2026-05-28T00:00:00Z", "signature": "a1c9e7f2" },
  { "query_id": "04e7c211-1b88-4f0a-8d22-6a90fb3c7d10", "cost": 3.10, "job_id": "job_2ab990", "processing_time": "2026-05-27T00:00:00Z", "signature": "9f02bd55" }
]

Sample response — ambiguous

{
  "ambiguous": true,
  "candidates": ["analytics_db.analytics.orders_fact", "analytics_db.staging.orders_raw"]
}

table_lineage

Returns a table's one-hop neighbor tables within a processing-time date range. upstream_tables are the tables that produce it — the source tables of the queries that write it, plus a clone-provenance edge when the table was created via CREATE TABLE ... CLONE. downstream_tables are the tables built from queries that read it. Each neighbor carries n_queries, total_cost (dollars), and via. start_date and end_date are inclusive YYYY-MM-DD bounds; limit caps neighbors per direction (default 50).

via is a stackable list of edge tags[] for an ordinary query-derived edge, ["clone"] for clone provenance, ["staging"] when the neighbor matches the staging-table naming convention, and ["clone", "staging"] when both apply.

If you pass a view, it is dissolved to its physical source tables: the response gains a top-level resolved_from_view note and each edge carries from_physical (which physical source it was reached through).

Upstream comes from writer queries (destination_table), which are sparse and absent for engines with no captured SQL text — treat an empty upstream_tables as "unknown", not "no producer".

Sample call

"What feeds analytics.orders_daily_agg and what reads from it between 2026-05-22 and 2026-05-29?"

Sample response

{
  "fqn": "analytics_db.analytics.orders_daily_agg",
  "upstream_tables": [
    { "fqn": "ANALYTICS_DB.ANALYTICS.ORDERS_FACT", "n_queries": 14, "total_cost": 142.5, "via": [] }
  ],
  "downstream_tables": [
    { "fqn": "ANALYTICS_DB.REPORTING.REVENUE_DAILY", "n_queries": 31, "total_cost": 88.0, "via": [] },
    { "fqn": "ANALYTICS_DB.STAGING.ORDERS_DAILY_AGG_TMP", "n_queries": 4, "total_cost": 6.2, "via": ["staging"] }
  ]
}

table_column_detail

Returns per-column metadata, one row per column ordered by position: field_name, data_type, cardinality, part_cluster. cardinality is the schema crawler's distinct-value estimate — read it as a magnitude signal: a column whose cardinality approaches the row count is a poor leading cluster key, while a low-cardinality column used often in filters is a strong candidate. part_cluster flags an existing partition/cluster role.

Sample call

"Show me the columns and cardinalities for analytics.orders_fact."

Sample response

[
  { "field_name": "order_id", "data_type": "NUMBER(38,0)", "cardinality": 1400000000, "part_cluster": null },
  { "field_name": "order_date", "data_type": "DATE", "cardinality": 730, "part_cluster": "CLUSTER" },
  { "field_name": "region", "data_type": "VARCHAR", "cardinality": 6, "part_cluster": "CLUSTER" }
]

Finding similar tables

Three tools surface redundant copies, mergeable pipelines, and tables that must move together — each ranks candidates by Jaccard similarity (min_jaccard threshold in [0.5, 1], default 0.8) and paginates with page / page_size. Every candidate carries its own table_type so views can be told apart from base tables.

similar_tables_by_schema

Compares column structure. Splits results into two buckets: identical (exactly the same column set — the strongest redundant-copy signal) and similar (column-set Jaccard ≥ min_jaccard, excluding the identical ones). Both buckets report exact totals (identical_count / similar_count).

Sample call

"Are there any tables with the same columns as analytics.orders_fact?"

Sample response

{
  "fqn": "analytics_db.analytics.orders_fact",
  "table_type": "BASE TABLE",
  "identical_count": 1,
  "similar_count": 2,
  "identical": [
    { "fqn": "analytics_db.staging.orders_fact_bak", "table_type": "BASE TABLE" }
  ],
  "similar": [
    { "fqn": "analytics_db.analytics.orders_fact_v2", "jaccard": 0.92, "shared": 22, "table_type": "BASE TABLE" },
    { "fqn": "analytics_db.reporting.orders_enriched", "jaccard": 0.81, "shared": 19, "table_type": "VIEW" }
  ],
  "page": 0,
  "page_size": 20,
  "identical_has_more": false,
  "similar_has_more": false
}

similar_tables_by_upstream

Compares the upstream set (the source tables a table is built FROM). A high Jaccard means two tables are built from the same inputs — likely redundant or mergeable pipelines. A single similar list ranked by Jaccard descending; axis is "upstream". Often empty for source tables, which read nothing.

Sample call

"Which tables are built from the same sources as reporting.revenue_daily?"

Sample response

{
  "fqn": "analytics_db.reporting.revenue_daily",
  "table_type": "BASE TABLE",
  "axis": "upstream",
  "similar_count": 1,
  "similar": [
    { "fqn": "analytics_db.reporting.revenue_weekly", "jaccard": 0.88, "shared": 7, "table_type": "BASE TABLE" }
  ],
  "page": 0,
  "page_size": 20,
  "similar_has_more": false
}

similar_tables_by_downstream

Compares the downstream set (the tables built FROM a table — its consumers). A high Jaccard means the same downstream depends on both: interchangeable/redundant supply, or tables that must move together in a migration. A single similar list ranked by Jaccard descending; axis is "downstream".

Sample call

"What's interchangeable with analytics.orders_fact downstream?"

Sample response

{
  "fqn": "analytics_db.analytics.orders_fact",
  "table_type": "BASE TABLE",
  "axis": "downstream",
  "similar_count": 1,
  "similar": [
    { "fqn": "analytics_db.staging.orders_raw", "jaccard": 0.95, "shared": 18, "table_type": "BASE TABLE" }
  ],
  "page": 0,
  "page_size": 20,
  "similar_has_more": false
}