Claude Code with MongoDB: Schema, Aggregation, Performance
Using Claude Code on MongoDB projects
MongoDB is the database where Claude Code most needs help. With a relational database, the schema is enforced by the engine. If Claude generates a bad query, Postgres will tell you. With MongoDB, the database accepts almost any document shape you throw at it, indexes are advisory, and a query that looks fine in development can become a collection scan in production once the data grows.
Claude Code understands MongoDB. It knows BSON, the aggregation framework, Mongoose, the native Node.js driver, Atlas, and change streams. What it does not know is your specific data model: which fields you embed and which you reference, how your aggregation pipelines are structured, which compound indexes exist, and what the read patterns of your application actually are.
Without a project-specific CLAUDE.md, Claude generates MongoDB code that mixes embedded and referenced patterns inconsistently, writes aggregations without considering index usage, creates queries that perform N+1 lookups in application code instead of using $lookup, and adds new collections without the indexes those collections need to scale. This guide covers the CLAUDE.md configuration that prevents those failures. If you are new to Claude Code itself, the Claude Code setup guide covers installation and authentication before any of this applies.
The MongoDB CLAUDE.md template
The CLAUDE.md at your project root is read at the start of every Claude Code session. For a MongoDB project, it needs to declare which driver you use, which Node version is in play, what your schema conventions are, where your aggregation pipelines live, and which indexes exist on each collection.
# MongoDB project rules
## Stack
- Node.js: 20.x LTS, TypeScript 5.x strict
- MongoDB: 7.0 (Atlas M30 production, M10 staging)
- Driver: mongoose 8.x (preferred), native mongodb 6.x for aggregation-heavy paths
- Connection pool: 20 production, 5 local dev
- Read concern: majority (writes), local (reads); write concern: { w: "majority", j: true }
## Project structure
- src/models/: Mongoose schemas, one file per collection
- src/repositories/: Typed wrappers around each model
- src/aggregations/: Reusable pipelines, one file each
- src/indexes/: Index definitions registered at boot
- src/migrations/: Numbered, idempotent schema migrations
## Schema rules
- Default to embedded for one-to-few, reference for one-to-many or many-to-many
- Decision matrix in src/models/SCHEMA_RULES.md (read before any new model)
- Documents target under 256KB, hard limit 4MB
- created_at and updated_at on every collection, both indexed
- Soft delete via deleted_at, never hard delete in application code
- ObjectId for all _id fields, never strings
## Driver rules
- Mongoose for CRUD on a single collection
- Native driver for any pipeline with $lookup or $facet
- Always pass projection to limit returned fields
- All queries pass through a repository function
## Hard rules
- Every new collection requires an index file in src/indexes/ before merge
- Every aggregation longer than 3 stages lives in src/aggregations/
- NEVER iterate query results to do per-document lookups (use $lookup)
- NEVER use $where with JavaScript (disables index use)
- Connection string from MONGODB_URI only, never hardcoded
Three rules in this CLAUDE.md prevent the most common Claude Code failures.
The driver split rule matters because Mongoose and the native driver have different mental models. Mongoose is excellent for typed CRUD but its aggregation API forces you to drop down to plain objects anyway. The native driver is cleaner for aggregations but lacks the schema validation that makes Mongoose useful for writes. Without the rule, Claude picks whichever it generated last in the file, producing a codebase where routes use either tool for no clear reason. The split gives Claude a deterministic choice: writes through Mongoose, complex reads through the native driver.
The N+1 prohibition is the single most impactful rule. Without it, Claude generates code that fetches a list of documents and loops through them firing additional queries to populate related data. This is fast with ten test documents and catastrophic with ten thousand. The rule forces Claude to use $lookup in pipelines or populate() with a projection in Mongoose.
The mandatory index file rule keeps performance from rotting silently. Indexes added inline in model definitions drift out of sync with what production needs. A dedicated src/indexes/ directory with registration at boot is reviewable, testable, and visible to Claude when it generates the next collection.
Schema design: embedded vs referenced
The hardest part of MongoDB is also the part Claude needs the most help with. Embedded versus referenced is a real architectural decision that depends on read patterns, write patterns, and document size. Claude generates either pattern competently, but without explicit guidance it defaults to embedding aggressively, producing documents that grow unboundedly and queries that return more data than needed.
Add a schema decision matrix to your CLAUDE.md or a referenced SCHEMA_RULES.md:
## Schema decision matrix
### Embed when
- One-to-few (fewer than ~100 children)
- Children always read with the parent
- Children not queried independently
- Total document stays under 256KB at the upper bound
### Reference when
- One-to-many or many-to-many
- Children queried independently
- Children are large or contain media
- Children referenced from multiple parent types
### Hybrid (denormalised reference) when
- Reference is right but you need fast read access to a few fields
- Store the foreign _id plus a snapshot of read-hot fields
- Update snapshots via change stream or transaction on parent change
### Example: e-commerce order
- Embeds: line_items, shipping_address (snapshot)
- References: customer_id, product_id within each line_item
- Denormalises: customer.email and customer.name on the order
Claude follows this matrix when it is present. Without it, you get aggressive embedding because Mongoose tutorials lean that way, and order documents grow as line items are added until reads return more data than the request needs.
The hybrid pattern deserves attention. A common production failure is over-normalisation, where every relationship is a reference and every list view requires an aggregation. The denormalised reference is the pragmatic compromise: the foreign key is authoritative, but a small snapshot of read-hot fields is cached on the parent for list display. Claude implements this correctly when the pattern is documented, including the change stream logic that keeps snapshots fresh.
Mongoose patterns vs the native driver
The split rule needs concrete patterns. Mongoose for writes, native driver for complex reads is the convention, but Claude needs to see what each looks like.
Mongoose model and repository pattern:
// src/models/order.ts
import { Schema, model, Types } from 'mongoose';
const OrderSchema = new Schema({
customer_id: { type: Schema.Types.ObjectId, ref: 'Customer', required: true, index: true },
customer_email_snapshot: { type: String, required: true },
line_items: [{
product_id: { type: Schema.Types.ObjectId, ref: 'Product', required: true },
product_name_snapshot: { type: String, required: true },
quantity: { type: Number, required: true, min: 1 },
unit_price_pence: { type: Number, required: true, min: 0 }
}],
status: { type: String, enum: ['pending', 'paid', 'shipped', 'cancelled'], default: 'pending', index: true },
total_pence: { type: Number, required: true, min: 0 },
deleted_at: { type: Date, default: null, index: true }
}, { timestamps: { createdAt: 'created_at', updatedAt: 'updated_at' } });
export const Order = model('Order', OrderSchema);
// src/repositories/order.ts
import { Order, IOrder } from '../models/order';
import { Types } from 'mongoose';
export async function findOrdersByCustomer(customerId: string, limit = 20): Promise<IOrder[]> {
return Order.find({ customer_id: new Types.ObjectId(customerId), deleted_at: null })
.select('_id status total_pence created_at')
.sort({ created_at: -1 })
.limit(limit)
.lean<IOrder[]>()
.exec();
}
Native driver for an aggregation that joins data:
// src/aggregations/order_with_customer.ts
import { ObjectId } from 'mongodb';
import { getDb } from '../db/client';
export async function getOrderWithCustomer(orderId: string) {
const cursor = getDb().collection('orders').aggregate([
{ $match: { _id: new ObjectId(orderId), deleted_at: null } },
{ $lookup: {
from: 'customers',
localField: 'customer_id',
foreignField: '_id',
as: 'customer',
pipeline: [{ $project: { email: 1, name: 1, lifetime_value_pence: 1 } }]
} },
{ $unwind: '$customer' },
{ $project: { line_items: 0, customer_email_snapshot: 0, __v: 0 } }
]);
return await cursor.next();
}
Two details matter here. The first is lean() on every Mongoose read. Without it, Mongoose hydrates each result into a full document with change tracking and virtuals, wasteful when reading data to return to a caller. The second is the projection inside the $lookup. The default pulls every field of the joined collection. Adding a pipeline with $project keeps the join narrow and dramatically reduces the working set.
For the broader question of which database to use, the Claude Code with PostgreSQL guide covers the equivalent for relational projects, and the Claude Code with Prisma guide covers the typed ORM layer common across both flavours.
Aggregation pipeline conventions
Aggregation pipelines are where MongoDB application performance is decided. A well-structured pipeline that uses indexes is fast even on hundreds of millions of documents. A poorly structured one is slow on a thousand. Claude needs explicit conventions for stage order, when to use $lookup, and how to structure $group.
Add to CLAUDE.md:
## Aggregation pipeline rules
### Stage order
1. $match early (use indexes, reduce document count first)
2. $project to drop unneeded fields after match
3. $lookup for joins (always with a sub-pipeline that projects narrowly)
4. $unwind only when necessary (multiplies document count)
5. $group after match and lookup, never before
6. $sort and $limit at the end
### Performance rules
- $match must use an indexed field for any pipeline on >10k docs
- $lookup must have an index on foreignField in the joined collection
- Prefer $group with multiple accumulators over multiple pipelines
- $facet for parallel aggregations on the same input set
- explain('executionStats') before merging any new aggregation
### File pattern
- One pipeline per file in src/aggregations/, typed function in/out
- Include the latest explain output in a comment at the top
Real $match, $group, and $lookup example for a sales-by-product report:
// src/aggregations/sales_by_product.js
// Indexes used: orders.status_1_created_at_-1, products._id (default)
// Last explain: IXSCAN, totalDocsExamined 12,400, executionTimeMillis 87ms
export async function salesByProduct({ startDate, endDate, productCategory }) {
return getDb().collection('orders').aggregate([
{ $match: {
status: { $in: ['paid', 'shipped'] },
created_at: { $gte: startDate, $lte: endDate },
deleted_at: null
} },
{ $unwind: '$line_items' },
{ $lookup: {
from: 'products',
localField: 'line_items.product_id',
foreignField: '_id',
as: 'product',
pipeline: [
{ $match: { category: productCategory } },
{ $project: { name: 1, category: 1 } }
]
} },
{ $unwind: '$product' },
{ $group: {
_id: '$product._id',
product_name: { $first: '$product.name' },
units_sold: { $sum: '$line_items.quantity' },
revenue_pence: { $sum: { $multiply: ['$line_items.quantity', '$line_items.unit_price_pence'] } },
order_count: { $addToSet: '$_id' }
} },
{ $project: {
_id: 0,
product_id: '$_id',
product_name: 1,
units_sold: 1,
revenue_pence: 1,
order_count: { $size: '$order_count' }
} },
{ $sort: { revenue_pence: -1 } },
{ $limit: 100 }
]).toArray();
}
Several decisions in this pipeline matter. The $match is first and uses the compound index on (status, created_at). The $lookup filters on category inside its sub-pipeline, so the join only pulls products in the relevant category. The $group uses $addToSet on _id to count distinct orders. The explain comment at the top catches regression: when the pipeline changes and the numbers shift, the diff is reviewable.
Without the convention, Claude tends to write $match after $lookup, which is the single biggest performance mistake in MongoDB aggregation because the lookup runs against the full collection before the filter narrows it.
Indexes and performance gotchas
Indexes are where MongoDB applications either scale or do not. A missing index on a hot query path produces a collection scan that gets linearly slower as the collection grows. Claude needs to know which indexes exist and to add new ones when it writes new query patterns.
The index registration pattern:
// src/indexes/register.ts
import { getDb } from '../db/client';
export async function registerIndexes() {
const db = getDb();
await db.collection('orders').createIndexes([
{ key: { customer_id: 1, created_at: -1 }, name: 'customer_recent_orders' },
{ key: { status: 1, created_at: -1 }, name: 'status_recent' },
{ key: { 'line_items.product_id': 1 }, name: 'line_items_product' },
{ key: { deleted_at: 1 }, name: 'soft_delete', sparse: true }
]);
await db.collection('customers').createIndexes([
{ key: { email: 1 }, name: 'email_unique', unique: true }
]);
await db.collection('products').createIndexes([
{ key: { category: 1, is_active: 1 }, name: 'active_by_category' },
{ key: { sku: 1 }, name: 'sku_unique', unique: true }
]);
}
Performance gotchas to add to CLAUDE.md:
## Index rules
- Compound index field order: equality fields first, sort/range fields last
Example: { customer_id: 1, status: 1, created_at: -1 }
- Prefer covering compound indexes over relying on index intersection
- Sparse indexes for fields that are often null
- TTL indexes for time-bounded data (sessions, audit logs)
- Unique indexes are write-blocking; never add to a populated collection without a backfill plan
- Register indexes at boot only, never at request time
## N+1 prevention
- findById or findOne inside forEach/map is forbidden
- Use $lookup, populate() with projection, or batch-fetch by $in
- Code review rejects any commit that introduces a per-iteration query
The compound index field order rule is the single most misunderstood part of MongoDB indexing. Equality predicates must come first. If your compound index is { created_at: -1, customer_id: 1 } and your query filters by customer_id and sorts by created_at, that index is effectively useless. Reverse the order and the same query becomes a fast index scan. For how CLAUDE.md rules shape generated code across all projects, the CLAUDE.md explained guide covers the file format, precedence, and patterns that work in any codebase.
MongoDB Atlas and operational concerns
If your MongoDB lives in Atlas, there are operational details Claude should know: connection limits, read preferences, change streams, and the Performance Advisor.
Add to CLAUDE.md:
## Atlas operational rules
- Connection string includes replicaSet, retryWrites=true, w=majority
- Read from secondaries only for analytics; readPreference=secondaryPreferred only on aggregations marked @analytics
- Change streams subscribe with resumeAfter, persist resume tokens
- Atlas Search indexes defined separately in src/indexes/atlas_search.ts
- Performance Advisor recommendations reviewed weekly, accepted ones land as PRs
## Environment variables
- MONGODB_URI: full Atlas SRV connection string
- MONGODB_DB_NAME, MONGODB_POOL_SIZE
- ATLAS_PROJECT_ID, ATLAS_CLUSTER_NAME for Performance Advisor API
## Migrations
- Numbered idempotent scripts in src/migrations/, run via npm run migrate
- Shape-changing migrations always batch with a backfill loop
- Each migration logs start, end, and document count touched
The read preference rule prevents a category of latency surprise where someone routes app reads to a secondary, hits replication lag, and shows users data that is two seconds stale. Analytics can tolerate that lag, application reads cannot. For environment variable handling, the Claude Code environment variables guide covers the patterns that prevent the connection string from leaking into commits or logs.
Hard rules and conclusion
The full set of hard rules from a production MongoDB CLAUDE.md:
## Hard rules (consolidated)
- Mongoose for writes, native driver for aggregations with $lookup or $facet
- Every collection registered in src/indexes/register.ts before merge
- Every aggregation longer than 3 stages lives in src/aggregations/
- $match first in any pipeline, never after $lookup
- N+1 queries forbidden: use $lookup, populate() with projection, or $in batch
- Compound index order: equality fields first, then sort/range fields
- All queries pass projection; all Mongoose reads use lean() unless mutation needed
- explain('executionStats') before merging any new aggregation
- No $where with JavaScript, no $regex without an anchor
- Connection string from MONGODB_URI only, all _id fields are ObjectId
- Soft delete via deleted_at, never deleteOne in application code
These rules produce a development environment where Claude generates schemas that fit the application's read patterns, queries that use the indexes you have, aggregations that are fast at scale, and code that respects the operational characteristics of your cluster.
Without the configuration, Claude generates MongoDB code that works in development and falls over in production. Documents grow unboundedly because embedding was the default, queries are slow because indexes do not match predicates, aggregations run collection scans because the pipeline order is wrong, and N+1 patterns leak in. With the configuration, Claude follows your conventions from the first line and surfaces the explain output for every aggregation it writes.
For broader workflow principles, the Claude Code best practices guide covers the habits that apply universally across project types, including the type-safety patterns that make repository functions and aggregation results easy to reason about.
Claudify includes a MongoDB-specific CLAUDE.md template as part of the Claude Code workflow kit, with the schema decision matrix, aggregation conventions, and index registration patterns pre-configured for a typical Node plus Mongoose plus Atlas stack.
More like this
Ready to upgrade your Claude Code setup?
Get Claudify