For Snowflake
This guide details the process for creating four Parquet files (columns.parquet, tables.parquet, views.parquet, and query-historys.parquet) from Snowflake and uploading them to Amazon S3 for Single Origin.
This guide walks you through three main steps:
1. Creating an S3 bucket.
2. Creating and uploading four Parquet files with the specified schemas.
3. Setting up an IAM user with AmazonS3ReadOnlyAccess and saving the access keys in Single Origin.
Create an S3 Bucket
Prerequisites
AWS Account: Ensure you have an AWS account.
Instructions
- Log in to the AWS Management Console: Go to the AWS Management Console and log in.
- Create an S3 Bucket:
- Navigate to the S3 service.
- Click on Create bucket.
- Enter a unique bucket name (e.g., your-unique-bucket-name).
- Select the AWS region where you want to create the bucket.
- Leave the default settings and click Create bucket.
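If you prefer to script this step instead of using the console, the bucket can also be created with the AWS SDK. Below is a minimal sketch using Python and boto3; the bucket name and region are placeholders, and it assumes your local AWS credentials are already configured.

import boto3

BUCKET_NAME = "your-unique-bucket-name"  # placeholder: use your own unique name
REGION = "us-east-1"                     # placeholder: use your preferred region

s3 = boto3.client("s3", region_name=REGION)

# us-east-1 is the default region and must not be passed as a LocationConstraint.
if REGION == "us-east-1":
    s3.create_bucket(Bucket=BUCKET_NAME)
else:
    s3.create_bucket(
        Bucket=BUCKET_NAME,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )
print(f"Created bucket: {BUCKET_NAME}")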
Create and Upload Four Parquet Files
- Upload the four Parquet files with the schemas defined below to a three-level folder structure named for the current year, month, and day (e.g., s3a://your-unique-bucket-name/2024/06/30, s3a://your-unique-bucket-name/2024/07/01):
tables.parquet
You can run the following SQL in the Snowflake console; a scripted export sketch follows the schema below.
SELECT
table_catalog AS "table_catalog",
table_schema AS "table_schema",
table_name AS "table_name",
table_type AS "table_type",
is_transient AS "is_transient",
is_iceberg AS "is_iceberg",
is_dynamic AS "is_dynamic",
table_owner AS "table_owner",
row_count::BIGINT AS "row_count",
bytes::BIGINT AS "bytes",
auto_clustering_on AS "auto_clustering_on",
clustering_key AS "clustering_key",
comment AS "comment",
created AS "created",
last_altered AS "last_altered"
FROM
snowflake.account_usage.tables;
Corresponding Parquet schema:
message TableItem {
string table_catalog = 1;
string table_schema = 2;
string table_name = 3;
string table_type = 4;
string is_transient = 5;
string is_iceberg = 6;
string is_dynamic = 7;
string table_owner = 8;
int64 row_count = 9;
int64 bytes = 10;
string auto_clustering_on = 11;
string clustering_key = 12;
string comment = 13;
int64 created = 14;
int64 last_altered = 15;
}
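If you prefer to script the export rather than download results from the console, the sketch below shows one possible approach using the snowflake-connector-python, pandas, and pyarrow packages. The connection parameters are placeholders, and treating the int64 created/last_altered fields as epoch milliseconds is an assumption based on the schema above; the same pattern applies to the other three files.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import snowflake.connector

# The tables query from above, without the trailing semicolon.
TABLES_SQL = """
SELECT
  table_catalog AS "table_catalog", table_schema AS "table_schema",
  table_name AS "table_name", table_type AS "table_type",
  is_transient AS "is_transient", is_iceberg AS "is_iceberg",
  is_dynamic AS "is_dynamic", table_owner AS "table_owner",
  row_count::BIGINT AS "row_count", bytes::BIGINT AS "bytes",
  auto_clustering_on AS "auto_clustering_on", clustering_key AS "clustering_key",
  comment AS "comment", created AS "created", last_altered AS "last_altered"
FROM snowflake.account_usage.tables
"""

# Placeholder connection details -- replace with your own account, user, and warehouse.
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
)
try:
    df = conn.cursor().execute(TABLES_SQL).fetch_pandas_all()
finally:
    conn.close()

# Assumption: the int64 created/last_altered fields are epoch milliseconds.
for col in ("created", "last_altered"):
    df[col] = pd.to_datetime(df[col], utc=True).map(
        lambda ts: None if pd.isna(ts) else int(ts.timestamp() * 1000)
    )

pq.write_table(pa.Table.from_pandas(df, preserve_index=False), "tables.parquet")

Alternatively, if you prefer to stay inside Snowflake, the COPY INTO <location> command can unload query results to an external stage with FILE_FORMAT = (TYPE = PARQUET); see the Snowflake documentation for that route.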
columns.parquet
You can run the following SQL in the Snowflake console:
SELECT
table_catalog AS "table_catalog",
table_schema AS "table_schema",
table_name AS "table_name",
column_name AS "column_name",
ordinal_position AS "ordinal_position",
is_nullable AS "is_nullable",
data_type AS "data_type",
numeric_precision AS "numeric_precision",
numeric_scale AS "numeric_scale",
datetime_precision AS "datetime_precision"
FROM snowflake.account_usage.columns;
Corresponding Parquet schema:
message schema {
string table_catalog = 1;
string table_schema = 2;
string table_name = 3;
string column_name = 4;
int32 ordinal_position = 5;
string is_nullable = 6;
string data_type = 7;
bytes numeric_precision = 8;
bytes numeric_scale = 9;
bytes datetime_precision = 10;
}
views.parquet
You can run the following SQL in the Snowflake console:
SELECT
table_catalog AS "table_catalog",
table_schema AS "table_schema",
table_name AS "table_name",
view_definition AS "view_definition",
table_owner AS "table_owner"
FROM
snowflake.account_usage.views;
Corresponding Parquet schema:
message schema {
string table_catalog = 1;
string table_schema = 2;
string table_name = 3;
string view_definition = 4;
string table_owner = 5;
}
query-historys.parquet
You can run the following SQL in the Snowflake console:
SELECT
query_id AS "query_id",
query_text AS "query",
user_name AS "query_user",
'' AS "email",
total_elapsed_time AS "total_elapsed_time",
end_time::TIMESTAMP_NTZ AS "updated_at",
start_time::TIMESTAMP_NTZ AS "start_time",
end_time::TIMESTAMP_NTZ AS "end_time",
bytes_scanned::BIGINT AS "processed_bytes",
'' AS "source_tables",
'' AS "destination_table",
database_name AS "default_database",
schema_name AS "default_schema",
query_type AS "statement_type",
rows_produced::BIGINT AS "produced_rows",
execution_time::BIGINT AS "execution_time",
warehouse_size AS "warehouse_size",
warehouse_id AS "warehouse_id",
warehouse_name AS "warehouse_name",
execution_status AS "execution_status",
query_tag AS "query_tag",
0.0 AS "query_credit_cost", -- replace with the query credit cost from your system
cluster_number::BIGINT AS "cluster_number"
FROM
snowflake.account_usage.query_history;
Corresponding Parquet schema:
message schema {
string query_id = 1;
string query = 2;
string query_user = 3;
string email = 4;
int64 total_elapsed_time = 5;
int64 updated_at = 6;
int64 start_time = 7;
int64 end_time = 8;
int64 processed_bytes = 9;
string source_tables = 10;
string destination_table = 11;
string default_database = 12;
string default_schema = 13;
string statement_type = 14;
int64 produced_rows = 15;
int64 execution_time = 16;
string warehouse_size = 17;
string warehouse_id = 18;
string warehouse_name = 19;
string execution_status = 20;
string query_tag = 21;
double query_credit_cost = 22;
int64 cluster_number = 23;
}
- Verify the Upload: Check your S3 bucket to ensure that views.parquet, columns.parquet, tables.parquet, and query-historys.parquet have been uploaded to the three-level folder named for the current year, month, and day (e.g., s3a://your-unique-bucket-name/2024/06/30/views.parquet, s3a://your-unique-bucket-name/2024/06/30/columns.parquet, s3a://your-unique-bucket-name/2024/06/30/tables.parquet, s3a://your-unique-bucket-name/2024/06/30/query-historys.parquet). A scripted upload-and-verify sketch follows below.
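If you want to script the upload and the check, here is a minimal sketch using boto3. It assumes the four Parquet files already exist in the current directory and that your default AWS credentials can write to the bucket; the bucket name is a placeholder.

from datetime import datetime, timezone

import boto3

BUCKET_NAME = "your-unique-bucket-name"  # placeholder
FILES = ["tables.parquet", "columns.parquet", "views.parquet", "query-historys.parquet"]

# Build the three-level year/month/day prefix, e.g. 2024/06/30.
prefix = datetime.now(timezone.utc).strftime("%Y/%m/%d")

s3 = boto3.client("s3")
for name in FILES:
    s3.upload_file(name, BUCKET_NAME, f"{prefix}/{name}")

# Verify the upload by listing the keys under today's prefix.
resp = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=prefix)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])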
Set Up the S3 access_key, secret_key, and bucket-name for Single Origin
Prerequisites
IAM User: Ensure you have permission to create an IAM user with programmatic access.
Instructions
- Create an IAM User with AmazonS3ReadOnlyAccess:
i. Go to the IAM service in the AWS Management Console.
ii. Click on Users and then Add user.
iii. Enter a user name (e.g., S3ReadOnlyUser) and select Programmatic access.
iv. Click on Next: Permissions.
v. Select Attach policies directly and search for AmazonS3ReadOnlyAccess.
vi. Select the policy and click Next: Tags, then Next: Review, and finally Create user.
vii. Note down the Access Key ID and Secret Access Key.
- Save the Access Keys in Single Origin:
i. Log in to Single Origin and navigate to the settings or configuration page for S3 access.
ii. Enter the Access Key ID and Secret Access Key in the respective fields.
iii. Save the configuration.
- Verify the Configuration: Ensure that Single Origin can access the S3 bucket using the provided access keys (a local check with the keys is sketched below).
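You can also sanity-check the new user's keys yourself before relying on them. The sketch below uses boto3 with the Access Key ID and Secret Access Key created for S3ReadOnlyUser; the key values, bucket name, and date prefix are placeholders.

import boto3

# Placeholders: the keys created for S3ReadOnlyUser and your bucket/date prefix.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

resp = s3.list_objects_v2(Bucket="your-unique-bucket-name", Prefix="2024/06/30/")
for obj in resp.get("Contents", []):
    print(obj["Key"])

If the call prints the four Parquet keys, read-only access is working and the same keys should work in Single Origin.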
Trigger Processing
Single Origin automatically detects the uploaded Parquet files in the S3 bucket once tables.parquet, columns.parquet, views.parquet, and query-historys.parquet have been uploaded to the three-level folder structure for the current year, month, and day. After the files are in place, click the 'Start Import' button on the Single Origin page. A notification will show the number of query logs processed. You can then start the task and proceed with the optimization work. Once the task completes, you can view the data for the corresponding date on the page.