For MongoDB Atlas

This page outlines the process of exporting queries and collection/view metadata from MongoDB Atlas and delivering them to a Single Origin S3 bucket.

Export Queries

To capture slow queries, we need to enable the database profiler. Define a threshold for what counts as "slow"; e.g. we can start with 1s (1000ms).

Check whether profiling is already enabled. If the profiling level is 1 or 2, it is already enabled.

mongosh --uri {CLUSTER_URI}/{DB_NAME}
> db.getProfilingStatus()
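
For reference, getProfilingStatus() returns a document roughly like the following (values here are illustrative); was holds the current profiling level:

{ was: 0, slowms: 100, sampleRate: 1, ok: 1 }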

If profiling is not enabled, then enable it in mongosh:

> db.setProfilingLevel(1, { slowms: 1000 })
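
If the profiler captures too much data, the optional sampleRate parameter can be lowered so that only a fraction of slow operations is recorded, e.g. roughly half (illustrative value):

> db.setProfilingLevel(1, { slowms: 1000, sampleRate: 0.5 })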

Then dump the query profiles to a dated file:

mongoexport --uri {CLUSTER_URI}/{DB_NAME} --collection system.profile --out query-logs-YYYY-MM-DD.json
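
To scope the export to a single day, the profiler's ts field can optionally be filtered with --query. The dates below are extended-JSON placeholders; use the start of the day and the start of the next day:

mongoexport --uri {CLUSTER_URI}/{DB_NAME} --collection system.profile \
  --query '{"ts": {"$gte": {"$date": "YYYY-MM-DDT00:00:00Z"}, "$lt": {"$date": "YYYY-MM-DDT00:00:00Z"}}}' \
  --out query-logs-YYYY-MM-DD.json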

Export Collection and Schema Metadata

For each collection, we need:

  • Name
  • Schema definition
  • Number of documents
  • Size in bytes
  • A list of index definitions

Save this script as dump_collection_metadata.js:

const dbName = db.getName();

// Load all collection metadata (includes schema validators)
const collInfos = db.getCollectionInfos({ type: "collection" });

collInfos.forEach((collInfo) => {
  const name = collInfo.name;
  const coll = db.getCollection(name);

  // Collection stats
  const stats = coll.stats();

  // Index definitions
  const indexes = coll.getIndexes().map((idx) => ({
    name: idx.name,
    key: idx.key,
    unique: idx.unique || false,
    sparse: idx.sparse || false,
    partialFilterExpression: idx.partialFilterExpression || null
  }));

  // JSON Schema validation, if present
  let jsonSchema = null;
  if (
    collInfo.options &&
    collInfo.options.validator &&
    collInfo.options.validator.$jsonSchema
  ) {
    jsonSchema = collInfo.options.validator.$jsonSchema;
  }

  // Compose result
  const result = {
    db_name: dbName,
    collection_name: name,
    document_count: stats.count,
    total_size_bytes: stats.size,
    indexes: indexes,
    json_schema: jsonSchema
  };

  print(JSON.stringify(result));
});

Run the script, redirecting its output to collection_metadata.json:

mongosh "mongodb+srv://<cluster_address>/<db_name>" --quiet --file dump_collection_metadata.js > collection_metadata.json

Export Views

Save this script as dump_view_definitions.js:

const dbName = db.getName();

// Get all views in the database
const viewInfos = db.getCollectionInfos({ type: "view" });

viewInfos.forEach((view) => {
  const result = {
    db_name: dbName,
    view_name: view.name,
    source: view.options.viewOn,
    pipeline: view.options.pipeline,
    options: view.options.collation ? { collation: view.options.collation } : {}
  };

  print(JSON.stringify(result));
});

Run:

mongosh "mongodb+srv://<cluster_address>/<db_name>" --quiet --file dump_view_definitions.js > views.json

Deliver to Single Origin S3

Set up IAM role

Prerequisite: an AWS account and permission to create IAM roles

  1. Log in to the AWS Management Console
  2. Go to IAM > Roles, click Create role, and create a role for the export (e.g. DataExportToSingleOrigin)
  3. Share the role ARN with Single Origin, e.g. arn:aws:iam::<client-account-id>:role/DataExportToSingleOrigin

On our end, we will grant the role access to the S3 bucket used for delivering the data.
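
For cross-account uploads, the role generally also needs an identity policy on your side that allows writing to the destination bucket. A minimal sketch (the policy name is illustrative; the bucket name matches the copy command below):

aws iam put-role-policy \
  --role-name DataExportToSingleOrigin \
  --policy-name AllowUploadToSingleOrigin \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::schema-parquet-{client-name}/*"
    }]
  }'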

Copy Files

Using the above role (for example by assuming it, or via an AWS CLI profile configured with its ARN), upload the data files for a given day. Since aws s3 cp takes a single source, each file is copied separately:

aws s3 cp query-logs-YYYY-MM-DD.json s3://schema-parquet-{client-name}/YYYY/MM/DD/
aws s3 cp collection_metadata.json s3://schema-parquet-{client-name}/YYYY/MM/DD/
aws s3 cp views.json s3://schema-parquet-{client-name}/YYYY/MM/DD/
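
If the role is also granted s3:ListBucket on the bucket, the upload can be verified with a listing; otherwise a zero exit code from each cp command is the confirmation:

aws s3 ls s3://schema-parquet-{client-name}/YYYY/MM/DD/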