mcpdoc-context-retriever
An MCP (Model Context Protocol) server that loads a user-specified list of llms.txt manifest files and exposes a tool for retrieving documents from the URLs listed in them. Its main purposes are to supply additional context to Large Language Models (LLMs) and to give users an auditable record of every external fetch made during LLM operations. It integrates with compatible IDEs and developer environments.
Author

langchain-ai
MCP LLMS-TXT Documentation Server
Introduction
An llms.txt file is a website index for LLMs: it provides background information, guidance, and links to detailed documentation in markdown format. IDEs such as Cursor and Windsurf, and applications like Claude Code/Desktop, can use llms.txt files to gather context for tasks. However, each of these applications uses its own built-in tools to read and process such files, so the retrieval process can be opaque, and there is often no way to audit the tool calls or verify the context returned.
MCP (Model Context Protocol) is an open protocol that gives developers full control over the tools these host applications use. This open-source MCP server offers host applications (e.g., Cursor, Windsurf, Claude Code/Desktop) two capabilities: (1) a user-defined list of llms.txt manifests, and (2) a simple fetch_docs tool that reads the URLs listed in those manifests. Because every fetch goes through this tool, users can audit each tool call and the context it returns.
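To make the idea of "URLs embedded within a manifest" concrete, here is a minimal sketch of how markdown links could be pulled out of an llms.txt file. It is not mcpdoc's implementation; the `extract_links` helper, the regular expression, and the sample manifest (including its example.com URLs) are assumptions made only for illustration.

```python
import re

# Matches markdown links of the form [title](https://example.com/page)
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_links(manifest_text: str) -> list[tuple[str, str]]:
    """Return (title, url) pairs for every markdown link in a manifest."""
    return LINK_PATTERN.findall(manifest_text)


# Hypothetical manifest content, used only for illustration.
SAMPLE_MANIFEST = """\
# Example Project

> Hypothetical manifest used only for illustration.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Getting started
- [Concepts](https://example.com/docs/concepts.md): Core ideas
"""

links = extract_links(SAMPLE_MANIFEST)
for title, url in links:
    print(f"{title}: {url}")
```

A host application would typically read the manifest first, then call fetch_docs only on the links relevant to the user's question.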
Manifest Indexing (llms-txt)
Reference URIs for common framework llms.txt files are cataloged below:
| Library | llms.txt URI |
|---|---|
| LangGraph Python | https://langchain-ai.github.io/langgraph/llms.txt |
| LangGraph JS | https://langchain-ai.github.io/langgraphjs/llms.txt |
| LangChain Python | https://python.langchain.com/llms.txt |
| LangChain JS | https://js.langchain.com/llms.txt |
Initial Setup
Installing uv Package Manager
- The command below uses the standalone installer; see the official uv documentation for alternative installation methods.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Selecting Documentation Sources
- Choose the `llms.txt` manifest(s) to use. For example, the LangGraph Python manifest is at https://langchain-ai.github.io/langgraph/llms.txt.
Security Note: Domain Access Governance
mcpdoc enforces strict domain access controls when fetching documents:

- Remote manifests: Specifying a remote URI (e.g., `https://langchain-ai.github.io/langgraph/llms.txt`) automatically whitelists only the originating host (`langchain-ai.github.io`). Document fetching is restricted to this host.
- Local files: When a local file path is given, no domains are whitelisted automatically. You must list the permitted hosts explicitly with the `--allowed-domains` argument.
- Expanding scope: To permit access beyond the automatically included hosts:
  - Supply specific hosts via `--allowed-domains domain1.com domain2.com`.
  - Use `--allowed-domains '*'` to allow all domains (not recommended, for security reasons).

This mechanism ensures that documents are retrieved only from origins the user has approved.
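The allow-list behavior described above can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not mcpdoc's actual code: the function names `build_allowed_domains` and `is_allowed` are hypothetical, and the sketch assumes exact host matching.

```python
from urllib.parse import urlparse


def build_allowed_domains(sources, extra_domains=()):
    """Collect the hosts of remote manifests plus explicitly allowed domains."""
    allowed = set(extra_domains)
    for source in sources:
        parsed = urlparse(source)
        # Remote manifests contribute their own host; local paths contribute nothing.
        if parsed.scheme in ("http", "https") and parsed.hostname:
            allowed.add(parsed.hostname)
    return allowed


def is_allowed(url, allowed):
    """Return True if the URL's host is permitted ('*' permits everything)."""
    if "*" in allowed:
        return True
    host = urlparse(url).hostname
    return host is not None and host in allowed


allowed = build_allowed_domains(
    ["https://langchain-ai.github.io/langgraph/llms.txt", "./local/llms.txt"]
)
print(is_allowed("https://langchain-ai.github.io/langgraph/concepts/", allowed))
print(is_allowed("https://example.com/page", allowed))
```

With only a local file as input, the allowed set stays empty until hosts are supplied explicitly, which mirrors the role of `--allowed-domains`.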
(Optional) Local Testing of the MCP Server
To validate the server functionality with your chosen llms.txt sources locally:
```bash
uvx --from mcpdoc mcpdoc \
    --urls "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt" "LangChain:https://python.langchain.com/llms.txt" \
    --transport sse \
    --port 8082 \
    --host localhost
```

- The service will be accessible at: http://localhost:8082
- Launch the MCP inspector client and connect to the running server:

```bash
npx @modelcontextprotocol/inspector
```

- In the inspector you can invoke the `tool` calls directly and verify their results.
Integration with Cursor
- Navigate to `Cursor Settings` and select the `MCP` tab.
- This opens the configuration file located at `~/.cursor/mcp.json`.

- Insert the following configuration block, using `langgraph-docs-mcp` as the server identifier and pointing to the LangGraph and LangChain manifests. Each source is passed as its own argument, as in the local test command:

```json
{
  "mcpServers": {
    "langgraph-docs-mcp": {
      "command": "uvx",
      "args": [
        "--from",
        "mcpdoc",
        "mcpdoc",
        "--urls",
        "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
        "LangChain:https://python.langchain.com/llms.txt",
        "--transport",
        "stdio"
      ]
    }
  }
}
```
- Verify that the server status is active within the `Cursor Settings/MCP` panel.
- It is highly recommended to update Cursor's Global (User) Rules:

```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```
- Open a chat session using `CMD+L` (macOS) and ensure the `agent` mode is selected.

Test with a query, such as:
what are types of memory in LangGraph?

Integration with Windsurf
- Open the application interface via `CMD+L` (macOS).
- Select `Configure MCP` to open and edit the configuration file `~/.codeium/windsurf/mcp_config.json`.
- Add the `langgraph-docs-mcp` definition as shown in the Cursor section.

- Update `Windsurf Rules/Global rules` accordingly:

```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
```

- Execute the test query; tool invocations should commence automatically.

Integration with Claude Desktop
- Open `Settings/Developer` within the application to modify `~/Library/Application\ Support/Claude/claude_desktop_config.json`.
- Add the `langgraph-docs-mcp` configuration.
- Restart the Claude Desktop application.

[!Note] If Python version conflicts arise when adding the MCPDoc tool to Claude Desktop, you can set the path to your Python executable explicitly in the `uvx` arguments.
Configuration Example

```json
{
  "mcpServers": {
    "langgraph-docs-mcp": {
      "command": "uvx",
      "args": [
        "--python",
        "/path/to/python",
        "--from",
        "mcpdoc",
        "mcpdoc",
        "--urls",
        "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
        "--transport",
        "stdio"
      ]
    }
  }
}
```
[!Note] As of 03/21/25, Claude Desktop appears to lack native support for global rules, so prepend the same rules shown in the Cursor section to your prompt.

- The active tools will be displayed in the lower-right quadrant adjacent to the chat input field.

- Execute the sample query; the system will prompt for tool call authorization during processing.

Integration with Claude Code
- After installing Claude Code, run this command in a shell to register the MCP server within your project:

```bash
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt", "LangChain:https://python.langchain.com/llms.txt"]}' -s local
```

- The `~/.claude.json` file will be updated.
- Verify tool visibility by starting Claude Code and running the `/mcp` command:

```
$ claude
$ /mcp
```

[!Note] Similar to Claude Desktop, Claude Code (as of 3/21/25) may not support global rules, so prepend the same rules shown in the Cursor section to your prompt.
- Tool confirmation prompts will appear during query execution.

Command-Line Interface Reference
The mcpdoc command provides a CLI for launching the documentation server. Documentation sources can be specified in three ways, which can be combined:

- YAML configuration file:
  - Loads documentation sources from `sample_config.yaml` in this repository (e.g., the LangGraph Python docs).

```bash
mcpdoc --yaml sample_config.yaml
```

- JSON configuration file:
  - Loads documentation sources from `sample_config.json` (e.g., the LangGraph Python docs).

```bash
mcpdoc --json sample_config.json
```

- Direct URL specification:
  - URIs can be given on their own or paired with an optional label using the `name:url` syntax.
  - Multiple sources are registered by passing the `--urls` flag repeatedly.
  - This is the method used for the MCP server setup described above.

```bash
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
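As a rough illustration of the `name:url` syntax, the sketch below splits an optional label from a URL. It is an assumption about how such parsing might look, not mcpdoc's own parser; `parse_source` is a hypothetical helper, and the output keys simply reuse the `name` and `llms_txt` field names from the configuration files.

```python
def parse_source(entry: str) -> dict:
    """Split an optional 'name:' label from a URL or file path."""
    # A bare URL starts with its scheme, so its first colon belongs to the URL itself.
    if entry.startswith(("http://", "https://")) or ":" not in entry:
        return {"llms_txt": entry}
    name, location = entry.split(":", 1)
    return {"name": name, "llms_txt": location}


labeled = parse_source("LangGraph:https://langchain-ai.github.io/langgraph/llms.txt")
bare = parse_source("https://python.langchain.com/llms.txt")
print(labeled)
print(bare)
```

Splitting on the first colon only is what keeps the `https://` part of a labeled entry intact.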
These configuration methods can be aggregated for merged source loading:
```bash
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
Supplementary Runtime Parameters
- `--follow-redirects`: Instructs the HTTP client to follow redirects (default: `False`).
- `--timeout SECONDS`: Maximum duration in seconds allowed for an HTTP request (default: `10.0`).
Example employing extra parameters:
```bash
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15
```
This command loads documentation from the YAML file, enables redirect following, and sets the request deadline to 15 seconds.
Configuration Structure
Both JSON and YAML input files must contain a list structure defining the documentation sources. Each entry requires the llms_txt URI and optionally accepts a descriptive name field:
YAML Example (sample_config.yaml)

```yaml
# Configuration schema for the mcp-mcpdoc server
# Each element requires an llms_txt URI and permits an optional name field
- name: LangGraph Python
  llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```
JSON Example (sample_config.json)

```json
[
  {
    "name": "LangGraph Python",
    "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
  }
]
```
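The structure described above (a list of entries, each with a required `llms_txt` field and an optional `name`) can be checked with a short validation routine. This is a sketch under stated assumptions rather than mcpdoc's loader: `load_sources` and its error messages are hypothetical, and only the JSON form is handled.

```python
import json


def load_sources(raw_json: str) -> list[dict]:
    """Parse a JSON config and check each entry for the required field."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        raise ValueError("config must be a list of documentation sources")
    sources = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict) or "llms_txt" not in entry:
            raise ValueError(f"entry {index} is missing the required 'llms_txt' field")
        source = {"llms_txt": entry["llms_txt"]}
        if "name" in entry:  # the name is optional
            source["name"] = entry["name"]
        sources.append(source)
    return sources


SAMPLE_CONFIG = """
[
  {
    "name": "LangGraph Python",
    "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
  }
]
"""

sources = load_sources(SAMPLE_CONFIG)
print(sources)
```

Failing early on a missing `llms_txt` field gives a clearer error than discovering the problem at fetch time.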
Programmatic Interface
```python
from mcpdoc.main import create_server

# Assemble the server instance with defined documentation feeds
server = create_server(
    [
        {
            "name": "LangGraph Python",
            "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt",
        },
        # Additional feeds can be appended here:
        # {
        #     "name": "Another Documentation Set",
        #     "llms_txt": "https://example.com/llms.txt",
        # },
    ],
    follow_redirects=True,
    timeout=15.0,
)

# Initiate server operation using standard I/O transport
server.run(transport="stdio")
```
Reference: XMLHttpRequest (XHR)
XMLHttpRequest (XHR) defines a JavaScript object interface enabling a web browser environment to dispatch HTTP requests to a remote server. This API allows client-side applications to exchange data with the server asynchronously following initial page load. XHR is foundational to the Ajax programming methodology. Prior to its widespread adoption, server communication relied primarily on standard hyperlink navigation or form submissions, actions that typically necessitated a full page refresh.
== Historical Context ==
The concept behind XMLHttpRequest originated with the Microsoft team developing Outlook Web Access for Exchange Server 2000, and it first shipped in Internet Explorer 5 (1999). The initial implementation did not use the standardized XMLHttpRequest identifier; instead, it relied on COM object instantiations such as ActiveXObject("Msxml2.XMLHTTP") or ActiveXObject("Microsoft.XMLHTTP"). By the release of Internet Explorer 7 (2006), all major browsers supported the XMLHttpRequest identifier natively.
This identifier is now the de facto standard across all major browsers, including Mozilla’s Gecko engine (2002), Safari 1.2 (2004), and Opera 8.0 (2005).
=== Standardization Efforts ===
The World Wide Web Consortium (W3C) published its initial Working Draft specification for the XMLHttpRequest object on April 5, 2006. This was succeeded by the Level 2 Working Draft on February 25, 2008, which added progress monitoring, cross-site requests, and handling of raw byte streams. By late 2011, the Level 2 features were integrated back into the primary specification. Development responsibility transitioned to the WHATWG near the close of 2012, where it is currently maintained as a living document defined using Web IDL.
== Operational Flow ==
Executing a request via XMLHttpRequest generally involves the following sequence of programming steps:
- Object Instantiation: Create an XMLHttpRequest instance via its constructor.
- Configuration: Invoke the open method to define the request method (GET, POST, etc.), designate the target resource URI, and specify synchronous or asynchronous execution mode.
- Asynchronous Listener Setup: For asynchronous calls, assign a handler that is notified of state changes throughout the request lifecycle.
- Transmission: Start the request by calling the send method, optionally passing request payload data.
- Response Processing: Respond to the state changes within the event listener. Data delivered by the server is typically accumulated in the responseText attribute. The process concludes when the state transitions to 4 (the 'done' state).
Beyond these fundamentals, XHR offers extensive control over request framing and response parsing. Custom request headers can be injected to guide server behavior. Data can be transmitted to the server via the send payload. Responses can be deserialized from formats like JSON into usable JavaScript objects immediately, or processed incrementally as data streams arrive. Requests are also capable of being preemptively terminated or configured to fail if a predefined time limit is exceeded.
== Cross-Origin Communication ==
During the early architecture of the World Wide Web, security limitations prevented script-initiated requests from one domain accessing resources on a different domain, an issue that required subsequent invention...
