Claude Code with pytest: Fixtures, Mocks, Coverage
Why pytest needs more CLAUDE.md scaffolding than Claude expects
pytest is the dominant Python test framework in 2026. It is also the framework Claude Code is most likely to get subtly wrong, for a specific reason: the gap between how pytest works and how Claude has learned to think about Python testing.
Claude's default Python test instinct is shaped by the standard library. unittest.TestCase, setUp, tearDown, self.assertEqual. That model is coherent, and it compiles, and it runs. It is also exactly wrong for a pytest project. pytest fixtures are not setUp methods. conftest.py is not a test utilities module. parametrize is not a loop wrapper. The vocabulary overlaps just enough that Claude produces code that looks like pytest and behaves like something else.
Without a CLAUDE.md, Claude will:
- Write class TestSomething(unittest.TestCase) inside a pytest project, losing all fixture injection
- Default to function scope for every fixture, including expensive database setups that run once per suite in your hand-written tests
- Skip parametrize and write explicit test_case_1, test_case_2 functions instead
- Use unittest.mock.patch as a decorator instead of monkeypatch or pytest-mock's mocker fixture
- Leave pytest-asyncio unconfigured and generate async tests that silently pass without running
This guide covers the CLAUDE.md configuration that fixes all five of these patterns before Claude writes a single test. If you are setting up Claude Code for the first time, the Claude Code setup guide covers installation. For the JavaScript testing counterparts, Claude Code with Vitest and Claude Code with Jest cover the same structural approach. The Python backend patterns in Claude Code with FastAPI and Claude Code with Django compose directly with the pytest setup here.
The pytest CLAUDE.md template
The CLAUDE.md at your project root is read at the start of every Claude Code session. For pytest it needs to declare: Python version, installed plugins, project layout, conftest.py hierarchy, fixture scope rules, mocking library preference, async configuration, and hard rules that prevent the five failure modes above.
# pytest project rules
## Stack
- Python: 3.12.x
- pytest: 8.x
- Plugins: pytest-cov 5.x, pytest-asyncio 0.24.x, pytest-mock 3.x, pytest-xdist 3.x
- Mocking: pytest-mock (mocker fixture) for standard mocks, monkeypatch for env vars and paths
- async: pytest-asyncio with asyncio_mode = "auto" in pytest.ini / pyproject.toml
- Coverage: pytest-cov with --cov-fail-under=85
## Project layout
- src/{package}/: application source (src layout, not flat)
- tests/: all tests, mirrors src/ directory structure
- tests/conftest.py: session-scoped fixtures only (DB engine, event loop, shared config)
- tests/{module}/conftest.py: module-specific fixtures at the right scope
- tests/{module}/test_*.py: test files for that module
- pyproject.toml: [tool.pytest.ini_options] section for all pytest config
## conftest.py hierarchy rules
- Root tests/conftest.py: session-scoped fixtures ONLY (database engine, app factory,
  global constants). Nothing that belongs to one module lives here.
- Module-level tests/{module}/conftest.py: fixtures used only within that module.
  db_session, api_client, authenticated_user, sample_data for that module.
- NEVER put a function-scoped fixture in the root conftest.py
- NEVER define a fixture in a test file when it is used by more than one test in that file
  (move it to conftest.py immediately)
## Fixture scope rules
- function (default): use for everything that mutates state or must be isolated between tests
- class: almost never use. Only if tests are genuinely grouped into a class with shared state
- module: use for fixtures that are expensive to create and are read-only within the module
  (e.g. a parsed config file, a static dataset loaded from disk)
- session: use for one-time expensive setup that is safe to share across all tests
  (e.g. database engine creation, app factory instantiation, event loop)
- NEVER use session scope for anything that writes to shared state. It leaks between tests.
- ALWAYS document scope choice in a comment when it is not function scope
## parametrize rules
- Use @pytest.mark.parametrize for any test that runs the same logic against multiple inputs
- NEVER write test_case_1, test_case_2, test_case_3 as separate functions
- Parametrize IDs: always provide ids= argument when the default repr is unreadable
- Indirect parametrize: use when the parameter itself needs fixture setup
- Stack @pytest.mark.parametrize decorators for matrix coverage (explicit beats implicit)
## Mocking rules
- Use mocker.patch() (pytest-mock) for patching objects in the module under test
- Use monkeypatch.setenv() for environment variables, monkeypatch.setattr() for simple attrs
- Use unittest.mock.patch ONLY as a last resort when mocker is unavailable (e.g. conftest)
- NEVER use @unittest.mock.patch as a decorator on a pytest test function (mocks are passed positionally and are easy to misorder alongside fixtures)
- NEVER mock the thing you are testing. Mock its dependencies.
- Patch at the point of use: mocker.patch("myapp.services.email.send_email"), not the origin
- NEVER leave a mock without an assertion. If you patch, assert the patch was called correctly.
## Async rules
- asyncio_mode = "auto" is set in pyproject.toml. All async def test functions run automatically.
- NEVER use @pytest.mark.asyncio manually, asyncio_mode = "auto" makes it redundant
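- NEVER create a new event loop in a test. Rely on the loop pytest-asyncio provides (overriding the event_loop fixture is deprecated in recent pytest-asyncio releases; prefer its loop scope settings).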
- For async database tests: use an async session fixture with rollback-based isolation
- Prefer async def test_ functions for any code path that touches async IO
## Coverage gates
- Minimum: 85% overall (enforced by --cov-fail-under=85 in CI)
- Excluded from coverage: tests/, **/migrations/**, **/__init__.py, **/conftest.py
- Run coverage locally: pytest --cov=src --cov-report=term-missing
## Hard rules
- NEVER use unittest.TestCase in a pytest project. Write plain functions and use fixtures.
- ALWAYS follow Arrange / Act / Assert (AAA). One blank line between each section.
- NEVER assert on implementation details (internal method calls, object attribute names)
- NEVER write a test that depends on execution order or shares mutable state between tests
- NEVER use time.sleep() in a test. Mock time or use freezegun.
- NEVER skip a test with @pytest.mark.skip without a comment explaining when to un-skip.
Five rules in this template prevent the failures Claude generates most often.
The no-unittest.TestCase rule is the most impactful single line. When Claude sees class Test... in a pytest file it is unsure whether to inherit from unittest.TestCase or not, and it defaults to inheriting. That single inheritance decision means no fixture injection, no parametrize, and no monkeypatch. Plain functions with fixtures is the pytest way.
The fixture scope rule prevents two failure patterns at once. Without it, Claude gives every fixture function scope, and an expensive database setup runs once per test function in a suite with 400 tests. With it, Claude comments each non-default scope choice, which also forces it to reason about whether the fixture is actually safe at that scope before using it.
The parametrize rule eliminates test_case_1 / test_case_2 sprawl. The parametrize version is shorter, easier to extend, and produces better failure messages that tell you exactly which input caused the failure.
The mock-at-point-of-use rule fixes the most common mock that silently fails. If the calling module runs from myapp.utils import send_email at import time, patching myapp.utils.send_email replaces the original but leaves the caller's already-bound name untouched. Patching myapp.services.email.send_email patches the name the calling module actually uses. Claude generates the wrong patch target more than half the time without this rule.
The AAA rule keeps Claude's test structure readable. Long tests without the blank-line discipline collapse into walls of setup where it is impossible to find what is actually being asserted.
conftest.py: what goes where
conftest.py is the most misused file in a pytest project. Claude's default behaviour without guidance is to put everything in one root-level conftest.py, which produces two problems: fixtures that are too slow (setup that belongs at session scope rebuilding for every test), and fixtures that leak across modules (module-specific fixtures in the root that every test directory can pick up by accident).
The correct structure mirrors your source layout:
tests/
    conftest.py              <- session-scoped only
    unit/
        conftest.py          <- unit-specific fixtures
        test_user.py
        test_product.py
    integration/
        conftest.py          <- db_session, api_client
        test_auth.py
        test_orders.py
    e2e/
        conftest.py          <- browser, live app URL
        test_checkout.py
Here is what a root tests/conftest.py should look like with this discipline:
# tests/conftest.py
# Session-scoped only. Nothing module-specific lives here.
import pytest
from sqlalchemy.ext.asyncio import create_async_engine

from myapp.main import create_app
from myapp.db import Base


@pytest.fixture(scope="session")
def app():
    """Application factory. Session scope: created once for the entire test run."""
    return create_app(testing=True)


@pytest.fixture(scope="session")
async def engine():
    """
    Async engine against an in-memory SQLite database.
    Session scope: engine creation is expensive, reuse across all tests.
    """
    engine = create_async_engine("sqlite+aiosqlite:///:memory:", echo=False)
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    yield engine
    await engine.dispose()


@pytest.fixture(scope="session")
def anyio_backend():
    # Only consulted by anyio's pytest plugin; pytest-asyncio ignores it.
    return "asyncio"
And the integration-level conftest with function-scoped session isolation:
# tests/integration/conftest.py
import pytest
from sqlalchemy.ext.asyncio import async_sessionmaker
from httpx import AsyncClient, ASGITransport


@pytest.fixture
async def db_session(engine):
    """
    Function scope (default): each test gets a transaction that rolls back.
    Isolation: no test can see another test's writes.
    """
    async_session = async_sessionmaker(engine, expire_on_commit=False)
    async with async_session() as session:
        async with session.begin():
            yield session
            await session.rollback()


@pytest.fixture
async def api_client(app, db_session):
    """
    Function scope: fresh client per test, uses the rolled-back session.
    """
    # Requesting db_session guarantees the transaction exists before any request.
    # Wiring it into the app (for example via a dependency override) is
    # app-specific and not shown here.
    async with AsyncClient(
        transport=ASGITransport(app=app), base_url="http://test"
    ) as client:
        yield client
This structure means Claude always knows where to put a new fixture. If it is slow and read-only, session scope, root conftest. If it touches one module, module conftest. If it mutates state, function scope, module conftest. Once the rule is in CLAUDE.md, Claude generates the right file every time instead of appending everything to the root.
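The rollback isolation that the db_session fixture above relies on can be seen in miniature without SQLAlchemy or pytest. This is a minimal sketch using the standard library's sqlite3 module and a context manager standing in for the fixture; the users table and the helper count_users are illustrative assumptions, not part of the project above.

```python
import sqlite3
from contextlib import contextmanager

# One shared in-memory database, standing in for the session-scoped engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.commit()


@contextmanager
def db_session():
    """Yield the connection, then roll back whatever the 'test' wrote."""
    try:
        yield conn
    finally:
        conn.rollback()


def count_users() -> int:
    """Hypothetical helper: number of rows currently visible."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


with db_session() as session:
    # The INSERT opens an implicit transaction on this connection.
    session.execute("INSERT INTO users VALUES ('alice@example.com')")
    seen_inside = count_users()

seen_after = count_users()
print(seen_inside, seen_after)
```

The write is visible inside the block and gone afterwards, which is the property that lets every test start from the same empty state while sharing one expensive engine.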
Fixture scope in practice: function, module, session
Fixture scope is where most pytest performance problems originate, and it is where Claude makes the most costly mistakes without guidance.
Scope mistakes run in both directions, and both are easy to spot once you know the symptom. A session-scoped fixture that writes to a database corrupts state across tests immediately. A function-scoped fixture that spends 800ms starting a subprocess and tears it down after each of 200 tests adds 160 seconds to your suite. Both are bugs.
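The arithmetic behind that 160-second figure is worth making explicit, because it grows linearly with suite size. A rough sketch follows; the helper name fixture_overhead_seconds is hypothetical, and the model deliberately ignores teardown time and pytest-xdist parallelism.

```python
def fixture_overhead_seconds(setup_seconds: float, test_count: int, scope: str) -> float:
    """Estimate the total setup time one fixture adds to a test run.

    Simplified model: a function-scoped fixture is built once per test,
    a session-scoped fixture once per run. Module and class scopes fall
    somewhere in between and are left out of this sketch.
    """
    if scope == "function":
        return setup_seconds * test_count
    if scope == "session":
        return setup_seconds
    raise ValueError(f"scope not modelled in this sketch: {scope!r}")


per_test = fixture_overhead_seconds(0.8, 200, "function")
per_run = fixture_overhead_seconds(0.8, 200, "session")
print(f"function scope: {per_test:.1f}s, session scope: {per_run:.1f}s")
```

The same 800ms setup costs under a second at session scope, which is why the scope comment required by the CLAUDE.md rule is worth the extra line.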
The decision tree is straightforward:
| Fixture type | Correct scope | Reason |
|---|---|---|
| Database engine / connection pool | session | Expensive to create, stateless itself |
| Database session / transaction | function | Must rollback between tests |
| App factory (stateless) | session | Created once, reused everywhere |
| HTTP client (stateless) | function or session | Depends on whether it holds auth state |
| In-memory data (read-only) | module or session | Safe to share, no mutation |
| Temp directory | function | Must be isolated per test |
| Env var override (monkeypatch) | function | monkeypatch resets automatically |
| External process (server, broker) | session | Startup cost is high, teardown is explicit |
Here is a parametrized test that puts the parametrize rules into practice:
# tests/unit/test_pricing.py
import pytest

from myapp.pricing import calculate_discount


@pytest.mark.parametrize(
    "price, tier, expected",
    [
        (100.00, "standard", 100.00),
        (100.00, "silver", 90.00),
        (100.00, "gold", 80.00),
        (100.00, "platinum", 70.00),
        (0.00, "gold", 0.00),       # edge: zero price
        (99.99, "silver", 89.99),   # edge: fractional (assumes the result is rounded to cents)
    ],
    ids=["standard", "silver", "gold", "platinum", "zero-price", "fractional"],
)
def test_calculate_discount(price, tier, expected):
    # Arrange
    # (inputs provided via parametrize)

    # Act
    result = calculate_discount(price, tier)

    # Assert
    assert result == pytest.approx(expected, rel=1e-6)
Without the ids argument, pytest generates IDs like test_calculate_discount[100.0-standard-100.0], which is readable. For complex objects, the default ID becomes test_calculate_discount[price0-tier0-expected0], which tells you nothing. Adding ids= costs three seconds and saves minutes of debugging on a CI failure.
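pytest.approx is the right tool for floating point assertions. assert result == expected will fail on 89.990000000001. Claude generates plain equality comparisons for floats unless a rule tells it otherwise, and the template above has no such rule yet, so it is worth adding one under Hard rules.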
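The fragility of plain equality can be reproduced without pytest. This sketch uses math.isclose from the standard library as a stand-in for the relative-tolerance comparison that pytest.approx performs; the prices are illustrative and unrelated to calculate_discount.

```python
import math

# Two line-item prices whose exact decimal sum is 0.30.
total = 0.10 + 0.20
expected = 0.30

# Binary floating point cannot represent these values exactly,
# so the sum is 0.30000000000000004 rather than 0.3.
exact_match = total == expected

# A relative tolerance accepts the tiny representation error.
close_match = math.isclose(total, expected, rel_tol=1e-6)

print(exact_match, close_match)  # False True
```

In a pytest suite the second comparison would be written as assert total == pytest.approx(expected, rel=1e-6), which also produces a readable failure message when the values really do differ.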
Mocking: monkeypatch, mocker, and unittest.mock
pytest gives you three mocking tools and Claude will mix them at random without explicit guidance. Here is how they differ and when each is correct:
monkeypatch is a pytest-native fixture for patching attributes, environment variables, and sys.path entries. It resets automatically after the test. No with blocks, no decorators. Use it for simple attribute patches and env vars.
def test_feature_flag_enabled(monkeypatch):
    # Arrange
    monkeypatch.setenv("FEATURE_NEW_PRICING", "true")

    # Act
    result = is_feature_enabled("new_pricing")

    # Assert
    assert result is True
mocker (from pytest-mock) wraps unittest.mock in a pytest fixture. It resets all patches after the test automatically, it composes with other fixtures naturally, and it does not require a with block. Use it for everything that requires MagicMock, AsyncMock, or call assertions.
async def test_send_welcome_email(mocker, db_session):
    # Arrange
    mock_send = mocker.patch("myapp.services.user.send_email")
    user = await create_user(db_session, email="alice@example.com")

    # Act
    await send_welcome_email(user)

    # Assert
    mock_send.assert_called_once_with(
        to="alice@example.com",
        subject="Welcome to MyApp",
        template="welcome",
    )
unittest.mock.patch as a decorator is the legacy form, and it mixes badly with pytest argument injection:
# FRAGILE: do not do this in pytest
@unittest.mock.patch("myapp.services.user.send_email")
def test_send_welcome_email(mock_send, db_session):  # works only because the mock is listed first
    ...
The decorator passes the mock positionally, ahead of any fixtures, which pytest supplies by keyword. The signature above works only because mock_send comes first. Write the parameters as (db_session, mock_send) and pytest treats the first parameter as the mock, then goes looking for a fixture named mock_send and typically fails with a confusing fixture-not-found error. Stack several decorators and the mocks arrive bottom-up, the reverse of reading order. This is one of the failure modes the CLAUDE.md rule NEVER use @unittest.mock.patch as a decorator closes off entirely.
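The ordering behaviour of patch decorators can be checked without pytest. This is a minimal sketch using only the standard library; the function name report_ids is hypothetical, and os.getpid and os.getcwd are patched purely because they are always available.

```python
import os
from unittest import mock


@mock.patch("os.getcwd")  # outermost decorator: its mock arrives last
@mock.patch("os.getpid")  # innermost decorator: its mock arrives first
def report_ids(mock_getpid, mock_getcwd):
    """Hypothetical function showing bottom-up injection of patched mocks."""
    mock_getpid.return_value = 1234
    mock_getcwd.return_value = "/fake/dir"
    return os.getpid(), os.getcwd()


result = report_ids()
print(result)
```

Swapping the two parameter names would silently bind each mock to the wrong target. The mocker fixture avoids the problem because each mock is returned from an explicit mocker.patch() call and assigned to a name you choose.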
One more distinction Claude misses without guidance: patching at the right import location. If myapp/services/user.py does from myapp.utils.email import send_email at the top, the name send_email inside myapp.services.user is the one you need to patch:
# Correct: patch where the name is used
mocker.patch("myapp.services.user.send_email")
# Wrong: patches the origin, not the bound name
mocker.patch("myapp.utils.email.send_email")
The second form replaces the attribute on the origin module, but the code under test already holds its own reference to the original function. The patch has no effect: the real function still runs, and any test without a call assertion passes without ever exercising the mock. Claude generates this bug in roughly half of generated mocks when the rule is not explicit.
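The difference between the two patch targets is easy to demonstrate in isolation. The sketch below builds two throwaway modules at runtime so that it is self-contained: origin_mod plays the role of myapp.utils.email and caller_mod plays the role of myapp.services.user. Both module names and the welcome function are assumptions made for illustration, and it uses unittest.mock.patch directly, which is what mocker.patch wraps.

```python
import sys
import types
from unittest import mock

# Stand-in for the module that defines send_email.
origin_mod = types.ModuleType("origin_mod")
exec("def send_email(to):\n    return f'real email to {to}'", origin_mod.__dict__)
sys.modules["origin_mod"] = origin_mod

# Stand-in for the module under test, which binds its own name at import time.
caller_mod = types.ModuleType("caller_mod")
exec(
    "from origin_mod import send_email\n"
    "def welcome(to):\n"
    "    return send_email(to)\n",
    caller_mod.__dict__,
)
sys.modules["caller_mod"] = caller_mod

# Patching the origin leaves caller_mod's already-bound name untouched.
with mock.patch("origin_mod.send_email", return_value="mocked"):
    via_origin_patch = caller_mod.welcome("alice@example.com")

# Patching the name inside the calling module replaces what welcome() looks up.
with mock.patch("caller_mod.send_email", return_value="mocked"):
    via_use_site_patch = caller_mod.welcome("alice@example.com")

print(via_origin_patch)
print(via_use_site_patch)
```

Only the second patch changes what welcome() returns. With the first, the real function runs, which in a real suite could mean a real email being sent.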
pytest-asyncio: configuration and common failures
pytest-asyncio is the plugin that makes async tests work in pytest, and it has gone through enough configuration changes across versions that Claude's training data includes incompatible patterns side by side. The result without guidance: async tests that appear to pass but never actually ran, or a mix of @pytest.mark.asyncio decorator and asyncio_mode = "auto" that conflicts.
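The mechanism behind an async test that never runs is plain Python, not anything specific to pytest. The sketch below shows it with the standard library only; the test name is illustrative. How a given pytest version reports such a test (pass, skip, or failure) varies, so treat this as an illustration of the underlying cause rather than of pytest's exact output.

```python
import asyncio

executed = []


async def test_total_is_correct():
    executed.append("ran")
    assert 1 + 1 == 3  # deliberately wrong


# A runner that treats the test as an ordinary function just calls it.
# That only creates a coroutine object; the body, including the failing
# assert, never executes.
coro = test_total_is_correct()
ran_when_called = list(executed)
coro.close()  # avoid the "never awaited" RuntimeWarning

# Driving the coroutine on an event loop is what actually runs the body.
try:
    asyncio.run(test_total_is_correct())
    failed_when_awaited = False
except AssertionError:
    failed_when_awaited = True

print(ran_when_called, failed_when_awaited)
```

The deliberately wrong assertion only fails once something awaits the coroutine, which is the job pytest-asyncio takes on when it is configured.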
The single configuration decision that matters is asyncio_mode. Set it once in pyproject.toml and never touch it again:
# pyproject.toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
addopts = "--cov=src --cov-fail-under=85 --strict-markers -q"
testpaths = ["tests"]
markers = [
    "unit: fast, no IO",
    "integration: requires database or external service",
    "e2e: requires a running application instance",
]
With asyncio_mode = "auto", every async def test_* function runs in the event loop automatically. No decorator needed. Claude adds @pytest.mark.asyncio when the setting is missing from its context, but once asyncio_mode = "auto" is configured the decorator is redundant. With both present, some versions of pytest-asyncio raise a deprecation warning. The rule in CLAUDE.md is: configure once, no per-test decorators.
Here is a complete async test that follows all the rules:
# tests/integration/test_user_service.py
import pytest

from myapp.services.user import UserService, DuplicateEmailError


async def test_create_user_success(db_session):
    # Arrange
    service = UserService(db_session)
    email = "alice@example.com"

    # Act
    user = await service.create_user(email=email, name="Alice")

    # Assert
    assert user.id is not None
    assert user.email == email
    assert user.name == "Alice"
    assert user.created_at is not None


async def test_create_user_duplicate_email_raises(db_session):
    # Arrange
    service = UserService(db_session)
    email = "bob@example.com"
    await service.create_user(email=email, name="Bob")

    # Act / Assert
    with pytest.raises(DuplicateEmailError, match="bob@example.com"):
        await service.create_user(email=email, name="Bob 2")


@pytest.mark.parametrize(
    "email",
    ["", "not-an-email", "@nodomain.com", "a" * 256 + "@example.com"],
    ids=["empty", "no-at", "no-local", "too-long"],
)
async def test_create_user_invalid_email_raises(db_session, email):
    # Arrange
    service = UserService(db_session)

    # Act / Assert
    with pytest.raises(ValueError):
        await service.create_user(email=email, name="Test")
Each test has one clear behaviour it proves. The parametrized test covers four email validation cases in four lines rather than four separate functions. The pytest.raises context manager with match= is stricter than a bare raises block and Claude generates it correctly when the pattern is in CLAUDE.md.
Common failure modes Claude generates without context
Five specific patterns appear repeatedly in Claude-generated pytest suites when there is no CLAUDE.md.
Over-mocking. Claude reaches for mocks when it is uncertain about dependencies. The result is a test suite that mocks the database, mocks the service layer, and mocks the utility functions, leaving a test that verifies only that Claude can write mock assertions. Nothing about the real code is tested. The rule NEVER mock the thing you are testing is the first gate. The second gate is the fixture hierarchy: a real in-memory database with rollback isolation is almost always cheaper than a full mock, and it actually exercises your SQL.
Fixture pollution. A session-scoped fixture that writes to a database is a time bomb. The first test that runs against it leaves records that every subsequent test sees. Test order suddenly matters. Failures appear non-deterministically. The fix is the function-scoped db_session fixture that calls session.rollback() after the yield. Claude generates this pattern correctly when it is in CLAUDE.md and generates a shared engine-level session without rollback when it is not.
Missing parametrize. Claude writes three or four near-identical test functions for boundary conditions, edge cases, and happy paths. These are functionally correct but they hide the pattern, duplicate setup code, and produce worse failure messages. The parametrize rule pushes Claude toward the table-driven style that makes adding a fifth case a one-line change.
Decorator-patched mocks. Covered above. The @unittest.mock.patch decorator on a pytest function makes argument order fragile and collides with fixture injection. The CLAUDE.md rule closes this off entirely.
Missing async configuration. Without asyncio_mode = "auto", Claude generates @pytest.mark.asyncio on every async test. That is fine in isolation. But when pytest-asyncio's version changes or the asyncio_mode setting is introduced later, the decorator becomes redundant or conflicting. Configuring asyncio_mode = "auto" once and prohibiting per-test decorators keeps the suite consistent.
Fixtures vs factories: when to use each
A fixture provides a ready-to-use object. A factory is a fixture that returns a callable, so you can create the object with custom parameters inside the test. The distinction matters because Claude defaults to fixtures-for-everything and produces fixtures that accept arguments via indirect=True when a factory would be simpler.
Use a plain fixture when:
- The test always needs the same configuration of the object
- The object is expensive and can be shared (session or module scope)
- The object is simple enough that parametrize covers the variation
Use a factory fixture when:
- The test needs to control the object's attributes
- You need multiple instances of the same type in one test
- The object's construction involves async calls that a parametrize argument cannot trigger
# tests/integration/conftest.py
@pytest.fixture
def make_user(db_session):
    """Factory fixture. Returns a callable, not a user instance."""
    async def _factory(
        email: str = "user@example.com",
        name: str = "Test User",
        role: str = "member",
        verified: bool = True,
    ):
        from myapp.services.user import UserService
        service = UserService(db_session)
        return await service.create_user(email=email, name=name, role=role, verified=verified)

    return _factory
Used in a test:
async def test_admin_can_delete_member(make_user):
    # Arrange
    admin = await make_user(email="admin@example.com", role="admin")
    member = await make_user(email="member@example.com", role="member")

    # Act
    result = await delete_user(actor=admin, target=member)

    # Assert
    assert result.deleted is True
The factory pattern also prevents a common Claude mistake: defining a parametrize that passes raw constructor arguments to a fixture via indirect=True. That pattern works but it obscures intent and makes the test harder to read. A factory is explicit about what varies.
Real database vs in-memory: the decision rule
Claude defaults to mocking the database when there is no guidance. The instinct is defensible: database tests are slower, they require setup, and they fail on infrastructure problems. But for anything beyond the simplest unit test, mocking the database hides real bugs.
The decision rule for CLAUDE.md:
## Database strategy
### Unit tests (tests/unit/)
- No database. Test pure functions, transformations, validations.
- If a function requires a database, it is not a unit test. Move it to integration/.
### Integration tests (tests/integration/)
- Use the real async SQLAlchemy engine against SQLite in-memory.
- SQLite covers 95% of SQL behaviour with zero infrastructure.
- Use Postgres for tests that rely on Postgres-specific features (JSONB queries,
  array types, ON CONFLICT DO UPDATE with RETURNING, advisory locks).
- Each test gets a rolled-back transaction. No state leaks between tests.
### E2E tests (tests/e2e/)
- Use the real database from the test environment (Postgres, running in Docker or CI).
- Seed required data in a session-scoped fixture and clean up on teardown.
### NEVER
- NEVER mock SQLAlchemy internals (Session, Engine, Query).
- NEVER write a test against a fake ORM object that does not exercise SQL.
- NEVER share a database session between tests without rollback isolation.
This decision rule tells Claude which directory controls the database strategy, which means it knows what kind of test to write before it picks a tool.
Coverage gates and what to measure
Coverage thresholds are a signal, not a goal. The goal is tests that catch real regressions. A 90% coverage score with over-mocked tests catches nothing. An 80% score with real integration tests catches most things.
The useful configuration is in two places. First, pyproject.toml:
[tool.coverage.run]
source = ["src"]
omit = [
    "*/tests/*",
    "*/migrations/*",
    "*/__init__.py",
    "*/conftest.py",
    "*/settings*.py",
]

[tool.coverage.report]
fail_under = 85
show_missing = true
skip_covered = false
Second, the coverage command in CI:
pytest --cov=src --cov-report=term-missing --cov-report=xml -q
The --cov-report=xml output is what CI tools like Codecov, SonarQube, and GitHub Actions coverage summaries consume. Adding it to the CLAUDE.md test command means Claude generates the right command when it writes or updates your CI configuration.
One concrete rule worth adding to CLAUDE.md: every test must contain at least one assertion. Claude sometimes generates tests that call the function under test and assert nothing, which shows as covered lines but verifies nothing. The --strict-markers flag in addopts does not catch this (it only rejects markers that are not registered); a linter rule or a custom pytest plugin can. A simpler enforcement: the AAA rule. If there is no Assert section, the test is incomplete.
For broader Claude Code test strategy and how this pytest setup fits into a polyglot project, the Claude Code testing guide covers cross-runner conventions. For understanding how CLAUDE.md is parsed at session start and why the rules section ordering matters, CLAUDE.md explained covers the mechanics. The Python-specific Claude Code with Python guide covers the broader project setup that this pytest configuration builds on.
Building a test suite Claude can extend correctly
The pytest CLAUDE.md in this guide produces a working collaboration model. Claude generates plain test functions with fixtures, not unittest.TestCase subclasses. It places fixtures at the right conftest.py level for the scope they need. It writes parametrize tables instead of duplicate test functions. It patches at the point of use with mocker, not at the origin with a decorator. It configures async tests through asyncio_mode = "auto" and does not scatter @pytest.mark.asyncio through every file. It uses the factory pattern when tests need varied object configurations.
None of this requires Claude to understand your specific domain. It requires Claude to have the pattern in context before it starts writing. That is what CLAUDE.md does. The test suite degrades silently without it, which is the insidious part: the tests run, the coverage number looks acceptable, and the bugs that should have been caught reach production instead.
The pattern for every new test file is now deterministic. Claude reads the CLAUDE.md rules, picks the right conftest.py level for any new fixtures, writes parametrize tables for any multi-case scenario, uses mocker for any external dependency, and structures every test as Arrange / Act / Assert. What starts as explicit rules in a config file becomes the implicit style of the entire test suite.
For the full CLAUDE.md template, the conftest.py hierarchy, async patterns, and the mock decision guide, Claudify includes a pytest-specific configuration pre-built for this collaboration model.