Entities

Single Origin's Definition Model enables standardization of key parts of your data. To illustrate the concepts in the definition model, we work through a case study below. The case study uses a public dataset for a fictional eCommerce company, theLook eCommerce.

📘

Deeper Dive

For a more technical dive into how Single Origin decomposes SQL, see this blog post!

Case Study

Suppose John is an analyst at theLook eCommerce. One KPI John tracks is the number of returns, broken down by age. To pull this data, he executes the following query:

SELECT
  oi.created_at,
  u.age AS user_age,
  COUNT(DISTINCT oi.order_id) AS count_orders
FROM
  `sandbox-demo-db.thelook_ecommerce.users` AS u
  JOIN `sandbox-demo-db.thelook_ecommerce.order_items` AS oi ON u.id = oi.user_id
WHERE
  oi.status = 'Returned'
  AND oi.created_at >= '2022-01-01'
GROUP BY
  created_at,
  age

We break down this source query into views, dimensions, and data entities so that others can reuse it.

Source

Source queries are what analysts like John are familiar with. A source can be as complex as any SQL statement. Each source can generate many views, dimensions, and data entities.

View

A view is based on a source. Views remove grouping and filtering so that the underlying data is sliceable in many ways. Since many sources can lead to the same view, views unify similar tables in different places.

If John imports "Returns by Age" as a source, then the derived view (called orders) is:

SELECT
  *
FROM
  `sandbox-demo-db.thelook_ecommerce.users` AS u
  INNER JOIN `sandbox-demo-db`.thelook_ecommerce.order_items AS oi ON u.id = oi.user_id

This view appears the following way in a Single Origin draft:

Dimension

A dimension is an attribute of a view used to group or filter data for the view. A view can have many dimensions. Single Origin breaks dimensions into two types: standardized and local.

Standardized Dimension

A standardized dimension is a dimension used by many views. We recommend using standardized dimensions when possible. This creates consistent semantics and enables querying data entities from many views together.

In the source "Returns by Age", the dimension created_at represents the time of the row. We can create a standardized dimension called time to map such columns to:

ViewStandardized Dimension NameExpressionDescriptionData Type
orderstimecreated_atTimestamp of the data entryTIMESTAMP

The standard dimension will appear the following way in a Single Origin draft:

Local Dimension

A local dimension is a dimension used by a single view. Users can only combine local dimensions with data entities from the same view.

For the orders view John is working with, he could create the following local dimension:

ViewLocal Dimension NameExpression
ordersuser_ageage

The local dimension will appear the following way in a Single Origin draft:

Data Entity

A data entity is a SQL expression that performs a calculation on a view. A view can have many data entities, and data entities can be used as both metrics and machine learning features. Single Origin breaks data entities into three types: aggregation, composite, and scalar.

Aggregation

An aggregation data entity uses an aggregation function. For the orders view John is working with, he could create the following aggregation:

ViewData Entity NameDescriptionFormula
orderscount_ordersTotal number of ordersCOUNT(DISTINCT order_id)

The data entity will appear the following way in a Single Origin draft:

Note that we do not specify the group by or time range for the data entity. Data entities are flexible so that many users can specify what to group and filter by. The focus is on the function used to compute the data, not the parameters on which the data is computed.

Composite

A composite data entity uses non-aggregation operators along with existing data entities. Composite data entities are not currently supported, but they will be in the future.

Scalar

A scalar data entity uses non-aggregation operators. Scalar data entities are not currently supported, but they will be in the future.

Putting Everything Together

Recall that we started with the following query, "Returns by Age":


SELECT
  oi.created_at,
  u.age AS user_age,
  COUNT(DISTINCT oi.order_id) AS count_orders
FROM
  `sandbox-demo-db.thelook_ecommerce.users` AS u
  JOIN `sandbox-demo-db.thelook_ecommerce.order_items` AS oi ON u.id = oi.user_id
WHERE
  oi.status = 'Returned'
  AND oi.created_at >= '2022-01-01'
GROUP BY
  created_at,
  age

In Single Origin, we broke this down into:

  • oi.created_at is a standard dimension
  • u.age AS user_age is a local dimension
  • COUNT(DISTINCT oi.order_id) is a data entity named count_orders
  • SELECT * FROM users JOIN order_items is a view

We can combine these to help John compute his KPI: number of returns, broken down by age. Instead of writing SQL though, John interacts with views, dimensions, and data entities:

Choose a metric: count_orders
Choose a time grouping: time
Choose additional dimensions to group by: user_age
Choose a time range: Jan 1, 2022 to June 30, 2022

This input will then generate the equivalent SQL and run it.