For MongoDB Atlas
This page outlines the process of exporting queries and collection/view metadata from MongoDB Atlas, and delivering them to a Single Origin S3 bucket.
Export Queries
To capture slow queries, we need to enable database profiling. Define a threshold for what counts as "slow"; e.g. we can start with 1s (1000 ms).
Start by checking whether profiling is already enabled. The current level is reported in the was field of the result; if it is 1 or 2, profiling is enabled.
mongosh "{CLUSTER_URI}/{DB_NAME}"
> db.getProfilingStatus()
If profiling is not enabled, enable it in mongosh:
> db.setProfilingLevel(1, { slowms: 1000 })
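If you prefer to do the check and the change in one step, the two commands above can be combined into a small mongosh snippet. This is a minimal sketch that assumes the 1000 ms threshold chosen above:

// Enable slow-operation profiling only if it is currently off (level 0)
const status = db.getProfilingStatus();
if (status.was === 0) {
  db.setProfilingLevel(1, { slowms: 1000 });
}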
Then to dump query profiles to a file:
mongoexport --uri {CLUSTER_URI}/{DB_NAME} --collection system.profile --out /tmp/output.json
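If you only want one day's worth of profile entries (to match the daily upload in the last step), mongoexport's --query flag can filter on the profiler's ts timestamp. A sketch; the dates below are placeholders for the day you are exporting:

mongoexport --uri {CLUSTER_URI}/{DB_NAME} --collection system.profile \
  --query '{"ts": {"$gte": {"$date": "2024-05-01T00:00:00Z"}, "$lt": {"$date": "2024-05-02T00:00:00Z"}}}' \
  --out /tmp/output.json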
Export Collection and Schema Metadata
For each collection, we need:
- Name
- Schema definition
- Number of documents
- Size in bytes
- A list of index definitions
Save this script as dump_collection_metadata.js:
const dbName = db.getName();

// Load all collection metadata (includes schema validators)
const collInfos = db.getCollectionInfos({ type: "collection" });

collInfos.forEach((collInfo) => {
  const name = collInfo.name;
  const coll = db.getCollection(name);

  // Collection stats
  const stats = coll.stats();

  // Index definitions
  const indexes = coll.getIndexes().map((idx) => ({
    name: idx.name,
    key: idx.key,
    unique: idx.unique || false,
    sparse: idx.sparse || false,
    partialFilterExpression: idx.partialFilterExpression || null
  }));

  // JSON Schema validation, if present
  let jsonSchema = null;
  if (
    collInfo.options &&
    collInfo.options.validator &&
    collInfo.options.validator.$jsonSchema
  ) {
    jsonSchema = collInfo.options.validator.$jsonSchema;
  }

  // Compose result (one JSON document per collection, printed one per line)
  const result = {
    db_name: dbName,
    collection_name: name,
    document_count: stats.count,
    total_size_bytes: stats.size,
    indexes: indexes,
    json_schema: jsonSchema
  };

  print(JSON.stringify(result));
});
Run the script, dumping to a file named collection_metadata.json:
mongosh "mongodb+srv://<cluster_address>/<db_name>" --quiet --file dump_collection_metadata.js > collection_metadata.json
Export Views
Save this script as dump_view_definitions.js:
const dbName = db.getName();

// Get all views in the database
const viewInfos = db.getCollectionInfos({ type: "view" });

viewInfos.forEach((view) => {
  const result = {
    db_name: dbName,
    view_name: view.name,
    source: view.options.viewOn,
    pipeline: view.options.pipeline,
    options: view.options.collation ? { collation: view.options.collation } : {}
  };

  print(JSON.stringify(result));
});
Run:
mongosh "mongodb+srv://<cluster_address>/<db_name>" --quiet --file dump_view_definitions.js > views.json
Deliver to Single Origin S3
Set up an IAM role
Prerequisite: an AWS account and permission to create IAM roles
- Log in to the AWS Management Console
- Go to IAM > Roles, click Create role, and create a role for the export (e.g. DataExportToSingleOrigin)
- Share the role ARN with Single Origin, e.g. arn:aws:iam::<client-account-id>:role/DataExportToSingleOrigin
On our end, we will grant the role access to the S3 bucket used for delivering data.
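In addition to the bucket-side grant above, the role will typically also need an identity policy that allows writing objects to the bucket. A minimal sketch using the AWS CLI; the policy name is an assumption, the role and bucket names follow the examples on this page, and the role's trust policy (who may assume it) depends on how you run the upload:

# Inline policy allowing the role to write the export files into the bucket
# (the policy name SingleOriginS3Export is illustrative)
cat > export-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::schema-parquet-{client-name}/*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name DataExportToSingleOrigin \
  --policy-name SingleOriginS3Export \
  --policy-document file://export-policy.json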
Copy Files
Using the role above, it should now be possible to upload the data files for a day. Note that aws s3 cp copies a single file per invocation, so run it once per file:
aws s3 cp query-logs-YYYY-MM-DD.txt s3://schema-parquet-{client-name}/YYYY/MM/DD/
aws s3 cp collection_metadata.json s3://schema-parquet-{client-name}/YYYY/MM/DD/
aws s3 cp views.json s3://schema-parquet-{client-name}/YYYY/MM/DD/
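If the upload runs under credentials other than the export role itself, one way to pick up the role first is aws sts assume-role. A minimal sketch; the session name is arbitrary, and the role ARN is the one shared with Single Origin:

# Assume the export role and expose its temporary credentials to the aws s3 cp calls
CREDS=$(aws sts assume-role \
  --role-arn arn:aws:iam::<client-account-id>:role/DataExportToSingleOrigin \
  --role-session-name single-origin-export \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text)
export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | cut -f1)
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | cut -f2)
export AWS_SESSION_TOKEN=$(echo "$CREDS" | cut -f3)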