For MongoDB Atlas

This page outlines the process of exporting queries and collection/view metadata from MongoDB Atlas and delivering them to a Single Origin S3 bucket.

Export Queries

  1. Download mongod logs per these instructions
  2. Extract query logs: zcat mongod.gz | grep -i "command" > query-logs.txt
    1. If query logs aren’t present, enable database profiling (see the sketch after this list)
  3. Using the log timestamps, bucket the query logs by day, creating one file per day, e.g. query-logs-YYYY-MM-DD.txt, as sketched below
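
A rough shell sketch of steps 2.1 and 3, assuming a MongoDB 4.4+ deployment with structured (JSON) logging where every log line carries a "t":{"$date":"..."} timestamp, that jq is installed, and that your user has privileges to change the profiling threshold (Atlas may restrict this depending on cluster tier):

# Step 2.1 (only if needed): lower the slow-operation threshold to 0 ms so
# that all operations are written to the mongod log.
mongosh "mongodb+srv://<cluster_address>/<db_name>" --quiet \
  --eval 'db.setProfilingLevel(0, { slowms: 0 })'

# Step 3: split query-logs.txt into one file per day, keyed on the first ten
# characters (YYYY-MM-DD) of each line's "t" timestamp. Note that jq -c
# re-emits each matching line as compact JSON.
for day in $(jq -r '.t["$date"][0:10]' query-logs.txt | sort -u); do
  jq -c --arg day "$day" 'select(.t["$date"] | startswith($day))' \
    query-logs.txt > "query-logs-${day}.txt"
done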

Export Collection and Schema Metadata

For each collection, we need:

  • Name
  • Schema definition
  • Number of documents
  • Size in bytes
  • A list of index definitions

Save this script as dump_collection_metadata.js:

const dbName = db.getName();

// Load all collection metadata (includes schema validators)
const collInfos = db.getCollectionInfos({ type: "collection" });

collInfos.forEach((collInfo) => {
  const name = collInfo.name;
  const coll = db.getCollection(name);

  // Collection stats
  const stats = coll.stats();

  // Index definitions
  const indexes = coll.getIndexes().map((idx) => ({
    name: idx.name,
    key: idx.key,
    unique: idx.unique || false,
    sparse: idx.sparse || false,
    partialFilterExpression: idx.partialFilterExpression || null
  }));

  // JSON Schema validation, if present
  let jsonSchema = null;
  if (
    collInfo.options &&
    collInfo.options.validator &&
    collInfo.options.validator.$jsonSchema
  ) {
    jsonSchema = collInfo.options.validator.$jsonSchema;
  }

  // Compose result
  const result = {
    db_name: dbName,
    collection_name: name,
    document_count: stats.count,
    total_size_bytes: stats.size,
    indexes: indexes,
    json_schema: jsonSchema
  };

  print(JSON.stringify(result));
});

Run the script, dumping to a file named collection_metadata.json:

mongosh "mongodb+srv://<cluster_address>/<db_name>" --quiet --file dump_collection_metadata.js > collection_metadata.json

Export Views

Save this script as dump_view_definitions.js:

const dbName = db.getName();

// Get all views in the database
const viewInfos = db.getCollectionInfos({ type: "view" });

viewInfos.forEach((view) => {
  const result = {
    db_name: dbName,
    view_name: view.name,
    source: view.options.viewOn,
    pipeline: view.options.pipeline,
    options: view.options.collation ? { collation: view.options.collation } : {}
  };

  print(JSON.stringify(result));
});

Run:

mongosh "mongodb+srv://<cluster_address>/<db_name>" --quiet --file dump_view_definitions.js > views.json

Deliver to Single Origin S3

Setup IAM role

Prerequisite: an AWS account and permission to create IAM roles

  1. Log in to the AWS Management Console
  2. Go to IAM > Roles and click Create role
  3. For the trusted entity type, choose Custom trust policy
    1. Use the trust policy below [1], which allows Single Origin to assume this role
  4. Attach a permissions policy granting write access to the Single Origin bucket [2]
  5. Name and create the role, e.g. DataExportToSingleOrigin
  6. Share the role ARN with Single Origin, e.g. arn:aws:iam::<client-account-id>:role/DataExportToSingleOrigin

[1]

{
  "Version": "2025-07-03",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<single-origin-aws-account-id>:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

[2]

{
  "Version": "2025-07-03",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::schema-parquet-{client-name}",
        "arn:aws:s3:::schema-parquet-{client-name}/*"
      ]
    }
  ]
}
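
If you prefer the AWS CLI to the console, the same role can be created from the two policy documents; a sketch assuming [1] is saved as trust-policy.json, [2] is saved as permissions-policy.json, and SingleOriginBucketAccess is just an illustrative name for the inline policy:

aws iam create-role \
  --role-name DataExportToSingleOrigin \
  --assume-role-policy-document file://trust-policy.json

aws iam put-role-policy \
  --role-name DataExportToSingleOrigin \
  --policy-name SingleOriginBucketAccess \
  --policy-document file://permissions-policy.json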

Copy Files

After assuming the DataExportToSingleOrigin role, you can upload the data files for a given day (aws s3 cp accepts a single source per invocation, so each file is copied separately):

aws s3 cp query-logs-YYYY-MM-DD.txt s3://schema-parquet-{client-name}/YYYY/MM/DD/
aws s3 cp collection_metadata.json s3://schema-parquet-{client-name}/YYYY/MM/DD/
aws s3 cp views.json s3://schema-parquet-{client-name}/YYYY/MM/DD/
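
One way to assume the role from the CLI is a named profile; a sketch assuming your long-lived credentials live in the default profile and that single-origin-export is just an illustrative profile name:

# Add a profile that assumes the export role.
cat >> ~/.aws/config <<'EOF'
[profile single-origin-export]
role_arn = arn:aws:iam::<client-account-id>:role/DataExportToSingleOrigin
source_profile = default
EOF

Pass --profile single-origin-export to each aws s3 cp command above so the CLI assumes the role before uploading.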