Entities
Single Origin's Definition Model enables standardization of key parts of your data. To illustrate the concepts in the definition model, we work through a case study below. The case study uses a public dataset for a fictional eCommerce company, theLook eCommerce.
Deeper Dive
For a more technical dive into how Single Origin decomposes SQL, see this blog post!
Case Study
Suppose John is an analyst at theLook eCommerce. One KPI John tracks is the number of returns, broken down by age. To pull this data, he executes the following query:
SELECT
oi.created_at,
u.age AS user_age,
COUNT(DISTINCT oi.order_id) AS count_orders
FROM
`sandbox-demo-db.thelook_ecommerce.users` AS u
JOIN `sandbox-demo-db.thelook_ecommerce.order_items` AS oi ON u.id = oi.user_id
WHERE
oi.status = 'Returned'
AND oi.created_at >= '2022-01-01'
GROUP BY
created_at,
age
We break down this source query into views, dimensions, and data entities so that others can reuse it.
Source
Source queries are what analysts like John are familiar with. A source can be as complex as any SQL statement. Each source can generate many views, dimensions, and data entities.
View
A view is based on a source. Views remove grouping and filtering so that the underlying data is sliceable in many ways. Since many sources can lead to the same view, views unify similar tables in different places.
If John imports "Returns by Age" as a source, then the derived view (called orders
) is:
SELECT
*
FROM
`sandbox-demo-db.thelook_ecommerce.users` AS u
INNER JOIN `sandbox-demo-db`.thelook_ecommerce.order_items AS oi ON u.id = oi.user_id
This view appears the following way in a Single Origin draft:
Dimension
A dimension is an attribute of a view used to group or filter data for the view. A view can have many dimensions. Single Origin breaks dimensions into two types: standardized and local.
Standardized Dimension
A standardized dimension is a dimension used by many views. We recommend using standardized dimensions when possible. This creates consistent semantics and enables querying data entities from many views together.
In the source "Returns by Age", the dimension created_at
represents the time of the row. We can create a standardized dimension called time
to map such columns to:
View | Standardized Dimension Name | Expression | Description | Data Type |
---|---|---|---|---|
orders | time | created_at | Timestamp of the data entry | TIMESTAMP |
The standard dimension will appear the following way in a Single Origin draft:
Local Dimension
A local dimension is a dimension used by a single view. Users can only combine local dimensions with data entities from the same view.
For the orders
view John is working with, he could create the following local dimension:
View | Local Dimension Name | Expression |
---|---|---|
orders | user_age | age |
The local dimension will appear the following way in a Single Origin draft:
Data Entity
A data entity is a SQL expression that performs a calculation on a view. A view can have many data entities, and data entities can be used as both metrics and machine learning features. Single Origin breaks data entities into three types: aggregation, composite, and scalar.
Aggregation
An aggregation data entity uses an aggregation function. For the orders
view John is working with, he could create the following aggregation:
View | Data Entity Name | Description | Formula |
---|---|---|---|
orders | count_orders | Total number of orders | COUNT(DISTINCT order_id) |
The data entity will appear the following way in a Single Origin draft:
Note that we do not specify the group by or time range for the data entity. Data entities are flexible so that many users can specify what to group and filter by. The focus is on the function used to compute the data, not the parameters on which the data is computed.
Composite
A composite data entity uses non-aggregation operators along with existing data entities. Composite data entities are not currently supported, but they will be in the future.
Scalar
A scalar data entity uses non-aggregation operators. Scalar data entities are not currently supported, but they will be in the future.
Putting Everything Together
Recall that we started with the following query, "Returns by Age":
SELECT
oi.created_at,
u.age AS user_age,
COUNT(DISTINCT oi.order_id) AS count_orders
FROM
`sandbox-demo-db.thelook_ecommerce.users` AS u
JOIN `sandbox-demo-db.thelook_ecommerce.order_items` AS oi ON u.id = oi.user_id
WHERE
oi.status = 'Returned'
AND oi.created_at >= '2022-01-01'
GROUP BY
created_at,
age
In Single Origin, we broke this down into:
- oi.created_at is a standard dimension
- u.age AS user_age is a local dimension
- COUNT(DISTINCT oi.order_id) is a data entity named count_orders
- SELECT * FROM users JOIN order_items is a view
We can combine these to help John compute his KPI: number of returns, broken down by age. Instead of writing SQL though, John interacts with views, dimensions, and data entities:
Choose a metric: | count_orders |
---|---|
Choose a time grouping: | time |
Choose additional dimensions to group by: | user_age |
Choose a time range: | Jan 1, 2022 to June 30, 2022 |
This input will then generate the equivalent SQL and run it.
Updated 4 months ago