May 8, 2026 · 12 min read

Claude Code with LangChain: Building AI Agents

Using Claude Code to build agents with LangChain

There is a specific kind of recursion in using Claude Code to build LangChain agents. You are using an AI coding tool to write code that defines how AI tools work together. The recursion is productive rather than confusing, but only if the development environment is configured correctly.

Claude Code understands LangChain. It knows chains, agents, tools, memory, retrievers, document loaders, LangGraph state machines, and the full LCEL (LangChain Expression Language) syntax. What it does not know is your project: which LLM provider you are calling, how your tools are structured, what state schema your graph uses, where memory is persisted, and how your agent loop handles errors and retries.

Without a project-specific CLAUDE.md, Claude generates LangChain code that mixes LCEL and legacy chain syntax, invents tool signatures that do not match your schemas, writes agent loops without proper error handling, and generates memory patterns that do not persist across the session boundaries your application requires.

This guide covers the CLAUDE.md configuration and patterns that prevent those failures. If you are new to Claude Code, the Claude Code setup guide covers installation and authentication before any of this applies.

The LangChain CLAUDE.md

Claude Code loads the CLAUDE.md at your project root at the start of every session. For a LangChain project, it needs to answer five questions: which LangChain and Python versions are in use, which LLM providers are active, how chains and agents are structured, how state is managed in LangGraph, and what the hard rules for tool definitions are.

# LangChain project rules

## Stack
- Python: 3.12
- langchain: 0.3.x (LCEL syntax only, no legacy chain classes)
- langchain-core: 0.3.x
- langchain-community: 0.3.x
- langgraph: 0.2.x
- LLM provider: langchain-anthropic (claude-3-5-sonnet-20241022 as default)
- Embeddings: langchain-openai (text-embedding-3-small) or local via Ollama
- Vector store: Chroma (local dev), Pinecone (production)
- Memory persistence: SQLite (local dev) via langgraph.checkpoint.sqlite

## Project structure
- chains/: LCEL chain definitions, one file per chain purpose
- agents/: LangGraph agent definitions, one file per agent
- tools/: Tool definitions, one file per tool or tool group
- state/: TypedDict state schemas for LangGraph graphs
- memory/: Memory and checkpoint configuration
- retrievers/: Retriever setup and configuration
- prompts/: Prompt templates (hub pulls or local PromptTemplate definitions)

## LangChain syntax rules
- Use LCEL pipe syntax (chain = prompt | model | parser), not legacy LLMChain
- Use invoke() for single calls, batch() for parallel, stream() for streaming
- Use ChatPromptTemplate.from_messages(), not PromptTemplate for chat models
- Use RunnableConfig for configurable chains (temperature overrides, callbacks)
- No deprecated classes (LLMChain, ConversationChain, RetrievalQA); use LCEL equivalents

## Running the project
- Dev: `python -m uvicorn app:app --reload`
- Tests: `pytest tests/` (requires OPENAI_API_KEY and ANTHROPIC_API_KEY in .env)
- LangSmith tracing: enabled via LANGCHAIN_TRACING_V2=true in .env.local

## Hard rules
- All tool functions must have a complete docstring (LangChain uses it as the tool description)
- All agent state schemas use TypedDict with explicit field types
- NEVER hardcode API keys; read them from environment variables only
- Tool functions must handle their own exceptions and return structured error messages
- LangGraph nodes must behave like pure functions of their input state: read only from state, return a partial update, no global state mutation

Three rules in this CLAUDE.md prevent the most common Claude Code failures with LangChain.

The LCEL-only rule is the most important. LangChain's codebase contains both the modern LCEL syntax and the legacy chain classes (LLMChain, ConversationChain, RetrievalQA). Claude's training includes significant legacy LangChain code. Without the explicit rule, Claude will mix both syntaxes across a project, creating chains that cannot be composed with the pipe operator and that have different invocation patterns. LCEL-only gives Claude a single mental model to work from.
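
One way to back the LCEL-only rule with something checkable is a small guard that scans source files for legacy imports. The sketch below is an assumption about how you might do that, not part of LangChain or Claude Code: `LEGACY_CLASSES` and `find_legacy_imports` are hypothetical names, and the check only catches `from ... import ...` statements.

```python
import ast

# Legacy classes named in the rules above; extend the set for your project.
LEGACY_CLASSES = {"LLMChain", "ConversationChain", "RetrievalQA"}


def find_legacy_imports(source: str) -> list[str]:
    """Return the legacy LangChain class names imported in `source`."""
    found = []
    # The source is only parsed, never executed, so LangChain need not be installed.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name in LEGACY_CLASSES:
                    found.append(alias.name)
    return sorted(found)


legacy = "from langchain.chains import LLMChain, RetrievalQA\n"
modern = "from langchain_core.prompts import ChatPromptTemplate\n"
print(find_legacy_imports(legacy))
print(find_legacy_imports(modern))
```

Run over every file under chains/ and agents/ in a unit test, a check like this turns the rule into a failing test instead of a review comment.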

The tool docstring rule is critical because LangChain uses the function docstring as the tool description that is sent to the LLM. An empty or vague docstring produces a tool the LLM does not know how to use correctly. Claude will write thorough docstrings when told this is a functional requirement rather than a style preference.
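
Because the docstring is functional, it can be checked like any other requirement. The following is a minimal sketch, assuming a docstring layout of three paragraphs (what the tool does, when to use it, what it returns) plus a mention of how errors surface. `docstring_problems`, `lookup_order`, and `vague_tool` are hypothetical names used only for illustration.

```python
import inspect


def docstring_problems(func) -> list[str]:
    """List ways a tool function's docstring falls short of the conventions."""
    doc = inspect.getdoc(func) or ""
    paragraphs = [p for p in doc.split("\n\n") if p.strip()]
    if not paragraphs:
        return ["missing docstring"]
    problems = []
    if len(paragraphs) < 3:
        problems.append("expected summary, when-to-use, and return-value paragraphs")
    if "error" not in doc.lower():
        problems.append("does not say how errors are surfaced")
    return problems


def lookup_order(order_id: str) -> str:
    """Look up the status of an order by its ID.

    Use this tool when the user asks where an order is or whether it shipped.

    Returns a one-line status string. Returns an error message if the
    order ID is unknown.
    """
    return f"Order {order_id}: shipped"


def vague_tool(x: str) -> str:
    """Does stuff."""
    return x


print(docstring_problems(lookup_order))
print(docstring_problems(vague_tool))
```

The keyword check for "error" is deliberately crude. It is enough to flag a docstring that never mentions failure, which is the case that matters most to the model.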

The pure function rule for LangGraph nodes prevents a category of graph design bug where nodes have side effects that change global state between invocations. Stateless nodes are composable and testable. Stateful nodes cause behavior that changes based on execution history, which is the opposite of what a graph should do.

Chain patterns with LCEL

LCEL is where Claude Code produces its best LangChain output. The pipe syntax is structured and readable, and Claude generates correct chain compositions quickly once the conventions are clear.

Add a chain patterns section to your CLAUDE.md:

## Chain patterns

### Basic chain with output parsing
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel  # langchain 0.3.x uses Pydantic v2 directly

model = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# Shared prompt, reused by both chains below
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}")
])

# String output
chain = prompt | model | StrOutputParser()

# Structured output: the result is an AnalysisResult instance, not a string
class AnalysisResult(BaseModel):
    summary: str
    confidence: float
    tags: list[str]

structured_chain = prompt | model.with_structured_output(AnalysisResult)

### Branching with RunnableBranch
from langchain_core.runnables import RunnableBranch

router = RunnableBranch(
    (lambda x: x["intent"] == "search", search_chain),
    (lambda x: x["intent"] == "summarize", summary_chain),
    fallback_chain  # default
)

### Parallel execution
from langchain_core.runnables import RunnableParallel

parallel_chain = RunnableParallel(
    summary=summary_chain,
    keywords=keyword_chain,
    sentiment=sentiment_chain
)

### Always pass RunnableConfig for tracing
from langchain_core.runnables import RunnableConfig
result = chain.invoke({"question": "..."}, config=RunnableConfig(tags=["production"]))

The with_structured_output pattern is worth highlighting. Instead of asking the model to return JSON and parsing it manually, with_structured_output binds a Pydantic schema to the model and returns the parsed, validated object. Claude uses this pattern consistently when it is in CLAUDE.md, so chain outputs are typed from the start instead of arriving as raw strings or dicts that downstream code has to check field by field.

The RunnableParallel pattern is how you run multiple chains on the same input concurrently. Claude will generate this correctly once you establish it in CLAUDE.md. Without it, Claude tends to chain operations sequentially even when they could be parallelized, which adds unnecessary latency to chains that call the LLM multiple times per invocation.
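
The fan-out idea can be illustrated without LangChain at all. The sketch below is a plain-Python analogue using a thread pool, not LangChain's implementation: `run_parallel` is a hypothetical helper, and the three lambdas stand in for the summary, keyword, and sentiment chains.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Send the same input to every branch at once and collect results by key."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        # Submit all branches before waiting on any of them.
        futures = {name: pool.submit(fn, value) for name, fn in branches.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-ins for real chains; each receives the same input text.
result = run_parallel(
    {
        "summary": lambda text: text[:20],
        "keywords": lambda text: sorted(set(text.lower().split()))[:3],
        "length": lambda text: len(text),
    },
    "LangGraph nodes return partial state updates",
)
print(result)
```

When each branch is an LLM call, total latency is roughly that of the slowest branch instead of the sum of all three, which is the saving the sequential version gives up.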

LangGraph agent design

LangGraph is where the complexity of Claude Code integration increases. A LangGraph graph has state, nodes, edges, conditional routing, and checkpointing. Claude needs to understand your specific graph architecture to write nodes that fit the existing state schema.

Add to CLAUDE.md:

## LangGraph conventions

### State schema (state/agent_state.py)
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    next: str
    context: dict
    error: str | None

### Node pattern (pure functions)
from langgraph.graph import StateGraph, END
from langchain_core.messages import AIMessage

def call_model(state: AgentState) -> AgentState:
    response = model.invoke(state["messages"])
    return {"messages": [response]}

def call_tool(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    # Execute tool, return result as ToolMessage
    ...

### Conditional edge pattern
def should_continue(state: AgentState) -> str:
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END

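### Checkpointing (memory persistence)
# Requires the langgraph-checkpoint-sqlite package
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

# In langgraph 0.2.x, SqliteSaver.from_conn_string() is a context manager,
# so a long-lived app builds the saver from a connection instead
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

### Graph construction (the checkpointer must exist before compile)
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", call_tool)
workflow.add_edge("tools", "agent")
workflow.add_conditional_edges("agent", should_continue)
workflow.set_entry_point("agent")
graph = workflow.compile(checkpointer=checkpointer)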

The Annotated[Sequence[BaseMessage], operator.add] pattern for the messages field is the standard LangGraph approach for an append-only message history. The operator.add annotation tells LangGraph to append new messages to the existing list rather than replacing it. Claude will generate this correctly when it is in your state schema example. Without it, Claude may generate state updates that overwrite the message history entirely.
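
The difference between appending and overwriting is easy to see in plain Python. The sketch below is a simplified stand-in for how a graph runtime might merge a node's partial update, not LangGraph's actual code: `apply_update` is a hypothetical helper, and strings stand in for message objects.

```python
import operator
from typing import Any, Callable


def apply_update(
    state: dict[str, Any],
    update: dict[str, Any],
    reducers: dict[str, Callable[[Any, Any], Any]],
) -> dict[str, Any]:
    """Merge a partial update into state, using a reducer where one is registered."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state[key], value)  # combine old and new
        else:
            merged[key] = value  # no reducer: the last write wins
    return merged


state = {"messages": ["hi"], "next": "agent"}
update = {"messages": ["hello, how can I help?"], "next": "tools"}

with_reducer = apply_update(state, update, {"messages": operator.add})
without_reducer = apply_update(state, update, {})
print(with_reducer["messages"])
print(without_reducer["messages"])
```

With the reducer registered, the history keeps both messages. Without it, the node's one-element list replaces everything that came before.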

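The pure function rule for nodes connects to testability. A node that takes AgentState and returns a partial AgentState update can be unit tested in isolation: build a state dict, call the node, and assert on the returned update. A node that modifies shared state cannot, and a node that reads a module-level model instance (as call_model does in the pattern above) needs that model patched or injected before it can be tested. Claude respects this constraint when it is explicit.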
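
One way to keep a model-calling node testable is to pass the model in instead of reading it from module scope. This is a sketch of that idea under stated assumptions, not a LangGraph requirement: `make_call_model` and `StubModel` are hypothetical names, the state is trimmed to a single field, and strings stand in for message objects.

```python
from typing import Any, Callable, TypedDict


class AgentState(TypedDict):
    messages: list[Any]


def make_call_model(model: Any) -> Callable[[AgentState], dict[str, Any]]:
    """Build a node that closes over `model` instead of reading a global."""
    def call_model(state: AgentState) -> dict[str, Any]:
        response = model.invoke(state["messages"])
        return {"messages": [response]}  # partial update only
    return call_model


class StubModel:
    """Test double exposing only the invoke() method the node uses."""
    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.seen: list[Any] = []

    def invoke(self, messages: list[Any]) -> str:
        self.seen.append(list(messages))  # record what the node sent
        return self.reply


def test_call_model_returns_partial_update() -> None:
    stub = StubModel("42")
    node = make_call_model(stub)
    state: AgentState = {"messages": ["what is 6 * 7?"]}
    assert node(state) == {"messages": ["42"]}
    assert state == {"messages": ["what is 6 * 7?"]}  # input state untouched
    assert stub.seen == [["what is 6 * 7?"]]


test_call_model_returns_partial_update()
```

In the graph you would register `make_call_model(model)` as the node, so production code and tests share the same function and differ only in what they inject.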

Tool definitions that work

Tools are the most common area where Claude Code generates code that looks correct but fails at runtime. LangChain's tool system uses the function signature, type annotations, and docstring to generate the tool description sent to the model. Missing type annotations, vague docstrings, or unhandled exceptions all produce tools that behave unexpectedly.

Add to CLAUDE.md:

## Tool conventions (tools/)

### Standard tool pattern
from langchain_core.tools import tool
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="The search query to execute")
    max_results: int = Field(default=5, description="Maximum number of results to return")

@tool(args_schema=SearchInput)
def search_web(query: str, max_results: int = 5) -> str:
    """
    Search the web for current information on a topic.
    
    Use this tool when you need recent information not in your training data,
    or to verify facts about current events, prices, or rapidly changing topics.
    
    Returns a formatted string with search results including titles, URLs,
    and brief summaries. Returns an error message if the search fails.
    """
    try:
        results = search_client.search(query, max_results=max_results)
        return format_search_results(results)
    except Exception as e:
        return f"Search failed: {str(e)}. Try rephrasing the query."

### Docstring requirements
- First line: single sentence describing what the tool does
- Second paragraph: when to use this tool (the model reads this to decide)
- Third paragraph: what the return value looks like
- Always: how errors are surfaced

### Return types
- Return str for all tools (easier for the model to interpret)
- Format structured data as readable text, not JSON dumps
- Return error descriptions as strings instead of raising exceptions

The Pydantic args_schema pattern is the structured alternative to docstring-only tool definitions. It gives LangChain a typed schema for the tool arguments, which the model uses to generate well-formed tool calls. Claude generates complete args_schema classes when the pattern is in CLAUDE.md, which significantly reduces malformed tool calls at runtime.

The error-as-string rule is important for agent loops. If a tool raises an exception that nothing catches, as can happen in a hand-written tool node like call_tool, the exception propagates out of the node and aborts the graph run, and the model never sees what went wrong. If a tool returns a descriptive error string, the agent can decide whether to retry, use a different tool, or tell the user it failed. Claude will add try/except blocks with informative error returns when the pattern is established.
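
If many tools need the same try/except shape, the pattern can be factored into a decorator. The sketch below is one possible approach in plain Python: `errors_as_strings` and `divide` are hypothetical names, and the error wording is only an example.

```python
import functools
from typing import Callable


def errors_as_strings(func: Callable[..., str]) -> Callable[..., str]:
    """Wrap a tool function so failures come back as text the agent can read."""
    @functools.wraps(func)  # keeps __name__ and __doc__, which describe the tool
    def wrapper(*args, **kwargs) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            return f"{func.__name__} failed: {exc}. Try different arguments."
    return wrapper


@errors_as_strings
def divide(numerator: float, denominator: float) -> str:
    """Divide two numbers and return the quotient as text."""
    return str(numerator / denominator)


print(divide(6, 3))
print(divide(1, 0))
```

Whether a wrapper like this stacks cleanly under @tool depends on how your LangChain version inspects the function signature, so pairing it with an explicit args_schema and checking the generated tool schema is the safer route.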

Memory and persistence

Memory is where LangChain development has the most fragmentation. LangChain supports multiple memory types (conversation buffer, entity, summary, vector store), and the right choice depends on your use case. Without guidance, Claude will generate the simplest memory type for every use case, which is often not the right one.

Add to CLAUDE.md:

## Memory conventions

### Conversation memory: use LangGraph checkpointing (not ConversationBufferMemory)
from langchain_core.messages import HumanMessage

# Thread-based conversation history
config = {"configurable": {"thread_id": user_id}}
result = graph.invoke({"messages": [HumanMessage(content=message)]}, config=config)

### Long-term memory: vector store with user-scoped namespace
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

def get_user_vectorstore(user_id: str) -> Chroma:
    return Chroma(
        collection_name=f"user_{user_id}_memory",
        embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
        persist_directory="./chroma_data"
    )

### Entity memory: store as structured data in state
# Add to AgentState:
entities: dict[str, dict]  # { "entity_name": { "type": ..., "facts": [...] } }

### Summary memory: summarize every N messages, store in state
# When len(messages) > 20: summarize oldest 10 → store as SystemMessage at position 0
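
The summary-memory rule above can be sketched as a small function. This is an illustration under stated assumptions, not a LangChain API: `compact_history` and `fake_summarize` are hypothetical names, messages are simplified to (role, content) tuples, and the summarizer is a stub where a real project would call an LLM.

```python
from typing import Callable

Message = tuple[str, str]  # (role, content); a stand-in for LangChain message objects


def compact_history(
    messages: list[Message],
    summarize: Callable[[list[Message]], str],
    max_messages: int = 20,
    chunk: int = 10,
) -> list[Message]:
    """Fold the oldest `chunk` messages into one system summary once history is too long."""
    if len(messages) <= max_messages:
        return messages
    oldest, rest = messages[:chunk], messages[chunk:]
    # The summary takes position 0, ahead of the messages that were kept.
    return [("system", summarize(oldest))] + rest


def fake_summarize(batch: list[Message]) -> str:
    # Stand-in for an LLM summarization call.
    return f"Summary of {len(batch)} earlier messages"


history = [("human", f"message {i}") for i in range(21)]
compacted = compact_history(history, fake_summarize)
print(len(compacted), compacted[0])
```

One design point to check in your own graph: the state schema shown earlier uses an append-only reducer for messages, so a node cannot shrink the list simply by returning a shorter one. The summary may need its own state field or a reducer that supports replacement.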

The checkpointing-over-ConversationBufferMemory distinction matters for production applications. ConversationBufferMemory keeps messages in process memory only, so they are lost when the process restarts. LangGraph checkpointing persists the full graph state, including message history, to SQLite or a Postgres backend. For any application where conversation continuity matters across sessions, checkpointing is the correct pattern.

The user-scoped vector store namespace is the standard approach for multi-user memory. Each user gets their own Chroma collection, which prevents memory from bleeding between users. Claude will implement this when it is in CLAUDE.md rather than creating a single shared collection.
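
User IDs are often emails or opaque strings, and a vector store may not accept every character in a collection name. The sketch below assumes that is a concern in your project and derives the name from a hash. `user_collection_name` is a hypothetical helper, and you should check your store's actual naming rules.

```python
import hashlib


def user_collection_name(user_id: str) -> str:
    """Derive a stable, per-user collection name from an arbitrary user ID."""
    # Hashing yields a fixed-length name made of lowercase hex characters only.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]
    return f"user_{digest}_memory"


alice = user_collection_name("alice@example.com")
bob = user_collection_name("bob@example.com")
print(alice != bob, len(alice))
```

The trade-off is that the collection name no longer reveals which user it belongs to, so keep the ID-to-name mapping derivable (as here) instead of storing it separately.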

Testing LangChain applications

Testing LangChain code without making real LLM calls for every test is essential for fast iteration. Claude Code will generate tests that call the LLM by default, which makes the test suite slow and expensive.

Add to CLAUDE.md:

## Testing conventions

### Mock the LLM in unit tests
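from unittest.mock import patch
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import AIMessage

def test_chain_invocation():
    mock_response = AIMessage(content="Mocked response")
    # Patch the class, not the instance: chat models are Pydantic objects
    # and reject new attributes set on an instance
    with patch.object(ChatAnthropic, "invoke", return_value=mock_response):
        result = chain.invoke({"question": "test"})
    assert "Mocked response" in result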

### Use FakeListChatModel for deterministic output
from langchain_core.language_models.fake_chat_models import FakeListChatModel

fake_model = FakeListChatModel(responses=["First response", "Second response"])
test_chain = prompt | fake_model | StrOutputParser()

### Integration tests use real LLMs but are tagged
import pytest

@pytest.mark.integration
def test_agent_loop_end_to_end():
    ...  # Runs with real LLM, excluded from standard test run

### pytest.ini
[pytest]
markers =
    integration: marks tests as integration tests (deselect with '-m "not integration"')

### Run unit tests: pytest -m "not integration"
### Run all tests: pytest

The FakeListChatModel approach is the cleanest way to test chain logic without LLM calls. You define the sequence of responses the fake model will return, and the chain processes them as if they came from a real model. Claude generates this pattern when it is in CLAUDE.md, producing a test suite where the chain logic is tested in milliseconds.

Claude Code permission rules for agent projects

LangChain and LangGraph applications often include scripts for testing agent runs, seeding vector stores, and clearing checkpoint databases. These range from safe (running a test agent invocation) to potentially expensive (bulk-processing documents into a vector store) to destructive (clearing all checkpoints).

In .claude/settings.local.json:

{
  "permissions": {
    "allow": [
      "Bash(pytest -m 'not integration'*)",
      "Bash(python -m chains.*)",
      "Bash(python -m tools.*)"
    ],
    "deny": [
      "Bash(pytest -m integration*)",
      "Bash(python -m scripts.seed_vectorstore*)",
      "Bash(python -m scripts.clear_checkpoints*)",
      "Bash(python -m scripts.bulk_process*)"
    ]
  }
}

This configuration lets Claude run fast unit tests and invoke individual chain and tool modules, but gates the expensive and destructive operations: integration tests that call real LLMs, vector store seeding (which incurs embedding API costs), checkpoint clearing, and bulk processing scripts. For how permission rules work across project types, the Claude Code custom agents guide covers agent-specific permission patterns in more depth.

What Claude Code handles well and where to review

Claude Code generates excellent LangChain code in several areas. Chain composition with LCEL is clean and correct. Tool definitions with Pydantic schemas are structured. LangGraph node functions are well-formed. Retrieval chain patterns with vector store integration are accurate.

Two areas warrant manual review. The first is prompt templates. Claude will generate prompts that work but may not be optimal for the specific task or model. Review the system and human message content, especially for agent system prompts where the instruction clarity directly affects tool use accuracy. The second is conditional edge logic. LangGraph routing functions determine which node executes next based on state. Claude generates these correctly for simple cases, but for graphs with more than three conditional branches, review the routing logic to ensure all state combinations are handled.
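
Reviewing routing logic gets easier when a test enumerates every state combination for you. The sketch below is one way to do that in plain Python. The state fields, node names, and `route` function are hypothetical, and `END` is a local stand-in for the constant imported from langgraph.graph so the example has no dependencies.

```python
from itertools import product

END = "__end__"  # local stand-in for langgraph.graph.END
KNOWN_TARGETS = {"tools", "human_review", "retry", END}


def route(state: dict) -> str:
    """Pick the next node from three state flags and a retry counter."""
    if state["error"]:
        return "retry" if state["attempts"] < 3 else END
    if state["needs_approval"]:
        return "human_review"
    if state["has_tool_calls"]:
        return "tools"
    return END


def check_all_combinations() -> int:
    """Call route() on every combination of inputs and return how many were checked."""
    checked = 0
    for error, needs_approval, has_tool_calls, attempts in product(
        [None, "timeout"], [False, True], [False, True], [0, 3]
    ):
        state = {
            "error": error,
            "needs_approval": needs_approval,
            "has_tool_calls": has_tool_calls,
            "attempts": attempts,
        }
        # Every combination must land on a node that exists in the graph.
        assert route(state) in KNOWN_TARGETS, state
        checked += 1
    return checked


print(check_all_combinations())
```

A test like this does not prove the routing is what you intended, but it does catch the branch that falls through to None or names a node that was never added.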

Building on a solid foundation

The LangChain CLAUDE.md configuration in this guide produces an agent development environment where chain composition follows consistent patterns, LangGraph graphs have typed state and pure node functions, tools have complete docstrings and structured schemas, and tests run fast without real LLM calls.

This is how to use Claude Code effectively for AI orchestration work. If you want to go deeper on the agent design side, the Claude Code best practices guide covers the configuration principles and workflow habits that apply across all agentic projects. For MCP server integration, which extends what your LangChain agents can call as tools, the Claude Code MCP servers guide covers the setup and configuration.

The same principle applies here as in any complex framework integration: Claude Code performs at the level of the context you give it. Without CLAUDE.md, a LangChain project gets a Claude that mixes legacy and modern syntax, writes tools without proper docstrings, and builds agent loops without error handling. With the configuration above, it gets a Claude that follows your conventions from the first line, builds composable chains, and writes tests that run without API calls. Claudify includes a LangChain-specific CLAUDE.md template as part of the Claude Code workflow kit, pre-configured for LangGraph state machines, LCEL chains, and Anthropic model conventions.