MongoDB Basics: Document Database Fundamentals

MongoDB is a document-oriented NoSQL database that stores data in flexible, JSON-like documents instead of traditional tables with rows and columns. It is one of the most widely used NoSQL databases, particularly for modern applications that require flexible schemas, horizontal scaling, and high performance. MongoDB's document model maps directly to objects in object-oriented programming, reducing the impedance mismatch between application code and the database.

Unlike relational databases, MongoDB does not require a predefined schema. Documents in the same collection can have different fields, and you can add or remove fields without running expensive migration scripts. This flexibility makes MongoDB a good fit for applications with evolving data models or varied data structures. To understand MongoDB properly, it helps to be familiar with NoSQL databases, the JSON data format, and database sharding.

MongoDB overview:

SQL (Relational)           MongoDB (Document)
─────────────────────────────────────────────────
Database                   Database
Table                      Collection
Row                        Document
Column                     Field
Index                      Index
JOIN                       $lookup (or embedded documents)
Primary Key                _id (ObjectId)

Example document:
{
    "_id": ObjectId("507f1f77bcf86cd799439011"),
    "name": "John Doe",
    "email": "john@example.com",
    "address": {
        "street": "123 Main St",
        "city": "Boston"
    },
    "orders": ["order1", "order2"]
}

What Is MongoDB

MongoDB is a cross-platform, document-oriented NoSQL database that stores data in flexible, JSON-like documents. Internally, documents are encoded as BSON (Binary JSON), a binary format that extends JSON with additional types such as dates and ObjectId. MongoDB was developed by MongoDB Inc. and first released in 2009. It is designed for scalability, high performance, and ease of development, making it a popular choice for modern web applications, mobile apps, and real-time analytics.

Document-Oriented: Data is stored as documents (JSON-like structures) rather than rows and columns.
Schema-Flexible: Documents in the same collection can have different fields.
Scalable: Built-in sharding for horizontal scaling across multiple servers.
High Performance: Optimized for high-throughput read and write operations.
Rich Query Language: Supports filtering, sorting, aggregation, geospatial queries, and text search.
ACID Transactions: Supports multi-document ACID transactions (since version 4.0).

Why MongoDB Matters

MongoDB addresses many of the challenges that developers face when using traditional relational databases, particularly for modern, agile development.

Developer Productivity: The document model maps directly to application objects, reducing the need for complex ORM mapping.
Schema Flexibility: Iterate quickly without running time-consuming schema migrations. Add fields as your application evolves.
Horizontal Scaling: Sharding distributes data across multiple servers, allowing you to scale out rather than up.
High Performance: Embedded documents reduce the need for expensive JOIN operations.
Rich Querying: Powerful aggregation pipeline for complex data processing without MapReduce.
Native JSON Support: Perfect for modern applications that work with JSON data from APIs and frontend frameworks.

Core MongoDB Concepts

Databases

A database is a container for collections. Each database has its own set of files and is isolated from other databases. A single MongoDB server can host multiple databases.

// Show all databases
show dbs

// Switch to a database (it is created when you first write to it)
use myapp

// Show current database
db

Collections

A collection is a group of documents, similar to a table in relational databases. By default, collections do not enforce a schema, meaning documents in the same collection can have different fields. Optional validation rules can be added when you need stricter structure.

// Create a collection implicitly (by inserting a document)
db.users.insertOne({name: "John", email: "john@example.com"})

// Create a collection explicitly
db.createCollection("logs")

// Show collections
show collections

Documents

A document is the basic unit of data in MongoDB. It is a JSON-like structure consisting of field-value pairs, and values can themselves be nested documents or arrays. Each document has a unique _id field that serves as the primary key.

Document example:

{
    "_id": ObjectId("507f1f77bcf86cd799439011"),
    "name": "John Doe",
    "email": "john@example.com",
    "age": 30,
    "isActive": true,
    "address": {
        "street": "123 Main St",
        "city": "Boston",
        "zip": "02101"
    },
    "tags": ["developer", "mongodb"],
    "createdAt": ISODate("2024-01-15T10:30:00Z")
}
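
The _id in the example above is an ObjectId. Its 24 hex characters encode 12 bytes, and the first 4 bytes hold a creation timestamp in seconds since the Unix epoch. The following is a minimal sketch in plain JavaScript that decodes that timestamp by hand; the helper name objectIdToDate is hypothetical, and in mongosh the built-in getTimestamp() method on an ObjectId gives the same information.

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId hex string.
function objectIdToDate(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error("expected a 24-character hex string");
    }
    // The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
    const seconds = parseInt(hex.slice(0, 8), 16);
    return new Date(seconds * 1000);
}

const created = objectIdToDate("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```

Because the timestamp comes first, sorting on _id roughly orders documents by creation time.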

CRUD Operations

Create (Insert)

// Insert a single document
db.users.insertOne({
    name: "John Doe",
    email: "john@example.com",
    age: 30
})

// Insert multiple documents
db.users.insertMany([
    {name: "Jane Smith", email: "jane@example.com", age: 28},
    {name: "Bob Johnson", email: "bob@example.com", age: 35}
])

Read (Query)

// Find all documents
db.users.find()

// Find with filter
db.users.find({name: "John Doe"})

// Find with multiple conditions (implicit AND)
db.users.find({age: {$gt: 25}, isActive: true})

// Find one document
db.users.findOne({email: "john@example.com"})

// Projection (return only name and email, omit _id)
db.users.find({}, {name: 1, email: 1, _id: 0})

// Sorting
db.users.find().sort({age: -1})  // Descending
db.users.find().sort({name: 1})  // Ascending

// Skip and limit (pagination): skip the first 20 documents, return the next 10
db.users.find().sort({_id: 1}).skip(20).limit(10)
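
Skip-based pagination, as in the example above, comes down to one piece of arithmetic: page n (counting from 1) with a page size of s skips (n - 1) × s documents. A small sketch of that calculation follows; pageToSkipLimit is a hypothetical helper name, not part of MongoDB.

```javascript
// Hypothetical helper: turn a 1-based page number into skip/limit values.
function pageToSkipLimit(page, pageSize) {
    if (!Number.isInteger(page) || page < 1) {
        throw new RangeError("page must be an integer >= 1");
    }
    if (!Number.isInteger(pageSize) || pageSize < 1) {
        throw new RangeError("pageSize must be an integer >= 1");
    }
    return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Page 3 with 10 documents per page skips the first 20 documents.
const { skip, limit } = pageToSkipLimit(3, 10);
console.log(skip, limit); // 20 10

// In mongosh the values would be used like this (not executed here):
// db.users.find().sort({_id: 1}).skip(skip).limit(limit)
```

Sorting on a unique field such as _id keeps the page order stable between requests. The server still has to walk past every skipped document, so large offsets get slower as the page number grows.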

// Count documents
db.users.countDocuments({age: {$gt: 25}})

Update

// Update one document
db.users.updateOne(
    {name: "John Doe"},
    {$set: {age: 31, isActive: true}}
)

// Update multiple documents
db.users.updateMany(
    {isActive: false},
    {$set: {isActive: true}}
)

// Replace a document (every field except _id is replaced)
db.users.replaceOne(
    {name: "John Doe"},
    {name: "John Doe", email: "john.new@example.com", age: 31}
)

// Increment a value
db.users.updateOne(
    {name: "John Doe"},
    {$inc: {loginCount: 1}}
)

// Add to array
db.users.updateOne(
    {name: "John Doe"},
    {$push: {tags: "expert"}}
)
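
To make the effect of the update operators above concrete, here is a simplified in-memory model of $set, $inc, and $push in plain JavaScript. It is only a sketch of the semantics, not how the server implements them: applyUpdate is a hypothetical function, and it handles top-level fields only.

```javascript
// Simplified in-memory model of three update operators.
// Only top-level fields are handled; real MongoDB also supports dotted paths.
function applyUpdate(doc, update) {
    const result = { ...doc }; // shallow copy so the input is not mutated
    for (const [field, value] of Object.entries(update.$set ?? {})) {
        result[field] = value; // $set overwrites or creates the field
    }
    for (const [field, amount] of Object.entries(update.$inc ?? {})) {
        result[field] = (result[field] ?? 0) + amount; // a missing field starts from 0
    }
    for (const [field, item] of Object.entries(update.$push ?? {})) {
        result[field] = [...(result[field] ?? []), item]; // a missing field becomes a new array
    }
    return result;
}

const user = { name: "John Doe", age: 30, tags: ["developer"] };
const updated = applyUpdate(user, {
    $set: { age: 31 },
    $inc: { loginCount: 1 },
    $push: { tags: "expert" }
});
console.log(JSON.stringify(updated));
// {"name":"John Doe","age":31,"tags":["developer","expert"],"loginCount":1}
```

Note that fields not named in the update (here, name) are left untouched, which is the key difference from replaceOne.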

Delete

// Delete one document
db.users.deleteOne({name: "John Doe"})

// Delete multiple documents
db.users.deleteMany({isActive: false})

// Delete all documents
db.users.deleteMany({})

Query Operators

Operator	Description	Example
$eq	Equal to	{age: {$eq: 30}}
$gt	Greater than	{age: {$gt: 25}}
$gte	Greater than or equal	{age: {$gte: 18}}
$lt	Less than	{age: {$lt: 65}}
$lte	Less than or equal	{age: {$lte: 100}}
$in	In array	{status: {$in: ["active", "pending"]}}
$nin	Not in array	{status: {$nin: ["deleted", "banned"]}}
$and	Logical AND	{$and: [{age: {$gt: 18}}, {age: {$lt: 65}}]}
$or	Logical OR	{$or: [{status: "active"}, {role: "admin"}]}
$exists	Field exists	{email: {$exists: true}}
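
One way to build intuition for the operators in the table is to evaluate them against plain objects. The sketch below is a deliberately simplified in-memory matcher in plain JavaScript. The names comparators and matches are hypothetical, only top-level fields and the operators listed above are handled, and MongoDB-specific behavior such as matching inside array fields is not modeled.

```javascript
// Simplified in-memory evaluation of a few query operators on top-level fields.
const comparators = {
    $eq: (actual, expected) => actual === expected,
    $gt: (actual, expected) => actual > expected,
    $gte: (actual, expected) => actual >= expected,
    $lt: (actual, expected) => actual < expected,
    $lte: (actual, expected) => actual <= expected,
    $in: (actual, list) => list.includes(actual),
    $nin: (actual, list) => !list.includes(actual),
    $exists: (actual, flag) => (actual !== undefined) === flag
};

function matches(doc, filter) {
    return Object.entries(filter).every(([key, condition]) => {
        if (key === "$and") return condition.every((sub) => matches(doc, sub));
        if (key === "$or") return condition.some((sub) => matches(doc, sub));
        const actual = doc[key];
        // A plain value such as {status: "active"} is shorthand for {status: {$eq: "active"}}.
        if (condition === null || typeof condition !== "object" || Array.isArray(condition)) {
            return actual === condition;
        }
        return Object.entries(condition).every(([op, expected]) => {
            const compare = comparators[op];
            if (!compare) throw new Error(`unsupported operator: ${op}`);
            return compare(actual, expected);
        });
    });
}

// Made-up sample data for illustration.
const users = [
    { name: "Ann", age: 17, status: "pending" },
    { name: "Bob", age: 42, status: "active", email: "bob@example.com" },
    { name: "Cy", age: 70, status: "banned" }
];

const workingAge = users.filter((u) =>
    matches(u, { $and: [{ age: { $gt: 18 } }, { age: { $lt: 65 } }] })
);
console.log(workingAge.map((u) => u.name)); // [ 'Bob' ]
```

Several conditions in one filter object are combined with an implicit AND, which is why explicit $and is mostly needed when the same field or operator has to appear more than once.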

Indexing in MongoDB

Indexes improve query performance by allowing MongoDB to quickly locate documents without scanning every document in a collection.

// Create a single-field index
db.users.createIndex({email: 1})

// Create a compound index
db.users.createIndex({status: 1, created_at: -1})

// Create a unique index
db.users.createIndex({email: 1}, {unique: true})

// Create a text index (for text search)
db.articles.createIndex({content: "text", title: "text"})

// List indexes
db.users.getIndexes()

// Drop an index by name
db.users.dropIndex("email_1")
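
The name "email_1" passed to dropIndex above is the default name MongoDB assigns when you do not supply one: each indexed field and its direction are joined with underscores. The sketch below reproduces that naming rule in plain JavaScript; defaultIndexName is a hypothetical helper for illustration, and getIndexes() remains the authoritative way to see the actual names.

```javascript
// Sketch of how a default index name is derived from its key specification:
// each field name and its direction are joined with underscores.
function defaultIndexName(keys) {
    return Object.entries(keys)
        .map(([field, direction]) => `${field}_${direction}`)
        .join("_");
}

console.log(defaultIndexName({ email: 1 }));                  // email_1
console.log(defaultIndexName({ status: 1, created_at: -1 })); // status_1_created_at_-1
```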

Aggregation Pipeline

The aggregation pipeline is MongoDB's powerful framework for data processing and analysis. It allows you to chain multiple stages to transform, filter, group, and compute data.

Basic aggregation stages:

$match: Filter documents
$group: Group by a field and compute aggregates
$sort: Sort documents
$project: Reshape documents (select fields, add computed fields)
$lookup: Perform a left outer join with another collection
$unwind: Deconstruct arrays into multiple documents
$limit / $skip: Pagination

Aggregation examples:

// Group by status and count
db.orders.aggregate([
    {$group: {_id: "$status", count: {$sum: 1}}}
])

// Filter, group, sort, and keep the top 10
db.orders.aggregate([
    {$match: {status: "completed"}},
    {$group: {_id: "$customer_id", total_spent: {$sum: "$total"}}},
    {$sort: {total_spent: -1}},
    {$limit: 10}
])
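
To see what the filter, group, and sort pipeline above computes, it can help to write the same steps over an in-memory array. The following plain JavaScript sketch mirrors each stage; the sample orders are made up, and this illustrates the result rather than how the server executes a pipeline.

```javascript
// Sample data; field names follow the pipeline above, the values are made up.
const orders = [
    { customer_id: "c1", status: "completed", total: 40 },
    { customer_id: "c2", status: "completed", total: 25 },
    { customer_id: "c1", status: "completed", total: 60 },
    { customer_id: "c3", status: "pending", total: 999 }
];

// $match: keep completed orders only.
const completed = orders.filter((o) => o.status === "completed");

// $group: one bucket per customer_id, summing "total".
const totals = new Map();
for (const o of completed) {
    totals.set(o.customer_id, (totals.get(o.customer_id) ?? 0) + o.total);
}

// $sort (descending by total_spent) followed by $limit.
const top = [...totals]
    .map(([id, total_spent]) => ({ _id: id, total_spent }))
    .sort((a, b) => b.total_spent - a.total_spent)
    .slice(0, 10);

console.log(JSON.stringify(top));
// [{"_id":"c1","total_spent":100},{"_id":"c2","total_spent":25}]
```

Placing $match first matters in the real pipeline as well: it reduces the number of documents later stages have to process and lets the stage use an index.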

// Lookup (join) with another collection
db.orders.aggregate([
    {$lookup: {
        from: "customers",
        localField: "customer_id",
        foreignField: "_id",
        as: "customer"
    }},
    {$unwind: "$customer"}
])
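
A detail worth knowing about the lookup example above: $lookup always produces an array, and a plain $unwind drops documents whose array is empty. The in-memory sketch below, in plain JavaScript with made-up data, shows how an order without a matching customer survives the join but disappears after unwinding.

```javascript
// Made-up sample data: order 101 references a customer that does not exist.
const customers = [{ _id: 1, name: "John Doe" }];
const orders = [
    { _id: 100, customer_id: 1, total: 150 },
    { _id: 101, customer_id: 2, total: 80 }
];

// $lookup: attach an array of matching customers to every order (left outer join).
const looked = orders.map((order) => ({
    ...order,
    customer: customers.filter((c) => c._id === order.customer_id)
}));

// $unwind: emit one document per array element; an empty array emits nothing.
const unwound = looked.flatMap((order) =>
    order.customer.map((c) => ({ ...order, customer: c }))
);

console.log(looked.length, unwound.length); // 2 1
```

If unmatched orders should be kept, the document form of the stage, {$unwind: {path: "$customer", preserveNullAndEmptyArrays: true}}, retains them.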

Embedded Documents vs References

MongoDB offers two ways to model relationships: embedding documents inside parent documents or referencing documents by ID. Choose based on your access patterns.

Embedded documents (use when):

{
    "_id": 1,
    "name": "John Doe",
    "address": {
        "street": "123 Main St",
        "city": "Boston"
    }
}
- Data is always accessed together
- One-to-one or one-to-few relationships
- Data doesn't change independently

References (use when):

// users collection
{
    "_id": 1,
    "name": "John Doe"
}

// orders collection
{
    "_id": 100,
    "user_id": 1,
    "total": 150.00
}
- Many-to-many relationships
- Large arrays that grow unbounded
- Data that is accessed independently

Common MongoDB Mistakes to Avoid

No Indexes: Queries without indexes perform full collection scans, becoming slow as data grows.
Large Documents: MongoDB has a 16MB document size limit. Avoid embedding unbounded arrays.
Deeply Nested Documents: Complex nesting makes queries difficult and slow.
Ignoring _id: MongoDB generates an _id automatically if you do not provide one, and the field is always indexed. Use it for primary key lookups instead of adding a redundant identifier field.
Inefficient Schema Design: Model for your access patterns, not for normalization.
Not Using Projection: Returning all fields when you only need a few wastes bandwidth and memory.
Forgetting to Handle Duplicate Key Errors: Writing a value that violates a unique index fails with a duplicate key error (code 11000); catch it and handle it in application code.
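
For the last point, duplicate key errors can be recognized by their error code, 11000. The sketch below shows one way to separate them from unexpected failures in plain JavaScript. The names isDuplicateKeyError and insertUnique are hypothetical, and the insert is injected as a function so the example runs without a server. With the Node.js driver the call would be asynchronous, and bulk write errors have a different shape.

```javascript
// MongoDB reports unique-index violations with error code 11000 (E11000).
const DUPLICATE_KEY_CODE = 11000;

function isDuplicateKeyError(err) {
    return Boolean(err) && err.code === DUPLICATE_KEY_CODE;
}

// Hypothetical wrapper: insertFn stands in for a call such as
// () => db.users.insertOne(user) in mongosh.
function insertUnique(insertFn) {
    try {
        insertFn();
        return { ok: true };
    } catch (err) {
        if (isDuplicateKeyError(err)) {
            return { ok: false, reason: "duplicate" };
        }
        throw err; // anything else is unexpected and should surface
    }
}

// Simulated failure shaped like a duplicate key error.
const fakeInsert = () => {
    throw Object.assign(new Error("E11000 duplicate key error"), { code: 11000 });
};
console.log(JSON.stringify(insertUnique(fakeInsert))); // {"ok":false,"reason":"duplicate"}
```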

MongoDB Best Practices

Index Your Queries: Create indexes for the fields your frequent queries filter and sort on. Every index adds write and storage overhead, so avoid indexing fields that are never queried.
Use Projection to Limit Fields: Return only the fields your application needs.
Keep Documents Small: Aim for documents under a few hundred KB. Split large documents when possible.
Model for Access Patterns: Design your schema based on how your application reads and writes data.
Use Bulk Writes for Large Operations: Bulk operations send many writes in one request, which cuts network round trips compared with individual writes.
Configure Write Concern and Read Concern: Choose durability and consistency levels that match your use case.
Monitor Slow Queries: Enable the database profiler to find and optimize slow operations.

Bulk write example:

db.users.bulkWrite([
    {insertOne: {document: {name: "John", email: "john@example.com"}}},
    {updateOne: {filter: {name: "Jane"}, update: {$set: {age: 30}}}},
    {deleteOne: {filter: {name: "Bob"}}}
])
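
When the list of operations is very long, one common approach is to send it in fixed-size batches. The sketch below shows the batching step in plain JavaScript; chunk is a hypothetical helper, and the batch size of 1000 is an arbitrary choice for illustration, not a MongoDB recommendation.

```javascript
// Split a long list of write operations into fixed-size batches.
function chunk(operations, batchSize) {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError("batchSize must be an integer >= 1");
    }
    const batches = [];
    for (let i = 0; i < operations.length; i += batchSize) {
        batches.push(operations.slice(i, i + batchSize));
    }
    return batches;
}

// Build 2,500 insertOne operations in the shape bulkWrite expects.
const operations = Array.from({ length: 2500 }, (_, i) => ({
    insertOne: { document: { name: `user${i}` } }
}));
const batches = chunk(operations, 1000);
console.log(batches.map((b) => b.length)); // [ 1000, 1000, 500 ]

// In mongosh each batch would be sent separately (not executed here):
// batches.forEach((batch) => db.users.bulkWrite(batch, { ordered: false }))
```

With ordered: false the server may continue past a failed operation instead of stopping at the first error, which suits independent writes such as these inserts.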

Frequently Asked Questions

What is the difference between MongoDB and MySQL?
MongoDB is a document-based NoSQL database with flexible schemas and built-in horizontal scaling through sharding. MySQL is a relational database with predefined schemas, SQL, and first-class JOIN support. Both support ACID transactions, so choose based on your data model and scalability needs.
Does MongoDB support joins?
Yes, MongoDB supports left outer joins using the $lookup aggregation stage. However, frequent joins may indicate a schema design problem. Consider embedding or denormalizing for better performance.
What is the maximum document size in MongoDB?
The maximum BSON document size is 16MB. For larger data, use GridFS to store files in chunks.
Does MongoDB support ACID transactions?
Yes. MongoDB has supported multi-document ACID transactions on replica sets since version 4.0 and on sharded clusters since version 4.2. Transactions can span multiple collections and databases, but they add performance overhead and are not recommended for every operation.
What is an ObjectId in MongoDB?
ObjectId is a 12-byte BSON type used as the default primary key (_id). It consists of a 4-byte timestamp, a 5-byte random value generated once per process, and a 3-byte incrementing counter, which makes collisions across distributed systems extremely unlikely.
What should I learn next after MongoDB basics?
After mastering MongoDB basics, explore aggregation pipeline in depth, advanced indexing strategies, sharding for horizontal scaling, and replica sets for high availability.
