
mcpdoc-context-retriever

An MCP (Model Context Protocol) server that loads a user-specified list of llms.txt files and exposes a tool for fetching documents from the URLs listed in them. It gives Large Language Models (LLMs) additional context and keeps a transparent audit trail of every external fetch made during LLM operations. It works with MCP-compatible IDEs and developer environments.

Author


langchain-ai

MIT License

Quick Info

GitHub Stars 770
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19


MCP LLMS-TXT Documentation Server

Introduction

llms.txt is an index file for LLMs. It provides background information, guidance, and links to detailed documentation in markdown format. IDEs such as Cursor and Windsurf, and applications such as Claude Code/Desktop, can use llms.txt files to gather context for tasks. However, each application uses its own built-in tools to read and process files like llms.txt. As a result, the retrieval process can be opaque, and it is not always possible to audit the tool calls or the context they return.

MCP (Model Context Protocol) gives developers full control over the tools these host applications use. This open-source MCP server provides host applications (e.g., Cursor, Windsurf, Claude Code/Desktop) with two capabilities: (1) a user-defined list of llms.txt files, and (2) a simple fetch_docs tool that reads the URLs listed in any of those files. This lets users audit every tool call and the context it returns.
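To make the fetch_docs idea concrete, here is a minimal sketch of pulling the links out of an llms.txt file. It assumes the common llms.txt layout, in which documentation pages are listed as markdown links (`[Title](url)`). The `extract_doc_links` helper and the sample text are hypothetical and are not part of mcpdoc.

```python
import re
from typing import List, Tuple

# Matches markdown links of the form [Title](url); the URL stops at
# whitespace or at the closing parenthesis.
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_doc_links(llms_txt: str) -> List[Tuple[str, str]]:
    """Return (title, url) pairs for every markdown link in an llms.txt body.

    Hypothetical helper for illustration only; not mcpdoc's implementation.
    """
    return LINK_PATTERN.findall(llms_txt)


# Hypothetical llms.txt excerpt with placeholder example.com URLs.
SAMPLE = """# Example Library

> Background information for LLMs.

## Docs
- [Quickstart](https://example.com/quickstart.md): Build a first app
- [Memory](https://example.com/memory.md)
"""

if __name__ == "__main__":
    for title, url in extract_doc_links(SAMPLE):
        print(f"{title}: {url}")
```

A host application would then call fetch_docs only on the URLs relevant to the question. Each retrieval therefore shows up as a separate, auditable tool call.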


Manifest Indexing (llms-txt)

Reference URIs for common framework llms.txt files are cataloged below:

Library            llms.txt URI
LangGraph Python   https://langchain-ai.github.io/langgraph/llms.txt
LangGraph JS       https://langchain-ai.github.io/langgraphjs/llms.txt
LangChain Python   https://python.langchain.com/llms.txt
LangChain JS       https://js.langchain.com/llms.txt

Initial Setup

Installing uv Package Manager

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Selecting Documentation Sources

  • Choose the llms.txt file(s) you want to use. For example, the LangGraph Python llms.txt file is at https://langchain-ai.github.io/langgraph/llms.txt.

Security Note: Domain Access Governance

mcpdoc enforces stringent domain whitelisting for retrieval security:

  1. Remote Manifests: Specifying a remote URI (e.g., https://langchain-ai.github.io/langgraph/llms.txt) automatically whitelists its host (langchain-ai.github.io). After that, documents can be fetched only from whitelisted hosts.

  2. Local Files: When you reference a local file path, no domains are whitelisted automatically. You must explicitly list the permitted hosts with the --allowed-domains argument.

  3. Expanding Scope: To allow access beyond the automatically included hosts:
     • Supply specific hosts via --allowed-domains domain1.com domain2.com.
     • Use --allowed-domains '*' to allow all domains (not recommended for security reasons).

This ensures that documents are fetched only from domains you have approved.
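The whitelisting rules above can be summarised as an allowlist check. This is a minimal sketch under stated assumptions. It compares exact host names, with no subdomain matching. The `build_allowlist` and `is_fetch_allowed` names are hypothetical, and mcpdoc's actual matching logic may differ.

```python
from typing import Iterable, Optional, Set
from urllib.parse import urlparse


def build_allowlist(sources: Iterable[str], extra: Optional[Iterable[str]] = None) -> Set[str]:
    """Collect allowed hosts: each remote llms.txt host plus any explicit extras."""
    allowed = set(extra or [])
    for source in sources:
        parsed = urlparse(source)
        # Only remote http(s) sources contribute a host; local paths add nothing.
        if parsed.scheme in ("http", "https") and parsed.hostname:
            allowed.add(parsed.hostname)
    return allowed


def is_fetch_allowed(url: str, allowed: Set[str]) -> bool:
    """Allow a fetch only if its host is allowlisted, or if '*' was given."""
    if "*" in allowed:
        return True
    host = urlparse(url).hostname
    return host is not None and host in allowed


if __name__ == "__main__":
    allowed = build_allowlist(["https://langchain-ai.github.io/langgraph/llms.txt", "./local/llms.txt"])
    print(allowed)  # {'langchain-ai.github.io'}
```

The local path contributes nothing, which mirrors rule 2: with only local files, every fetch is refused until hosts are passed in as `extra`, the stand-in for --allowed-domains.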

(Optional) Local Testing of the MCP Server

To validate the server functionality with your chosen llms.txt sources locally:

```bash
uvx --from mcpdoc mcpdoc \
    --urls "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt" "LangChain:https://python.langchain.com/llms.txt" \
    --transport sse \
    --port 8082 \
    --host localhost
```

  • The service will be accessible at: http://localhost:8082


  • Launch the MCP inspector and connect to the running server:

```bash
npx @modelcontextprotocol/inspector
```


  • In the inspector, you can call the tools directly and check their results.

Integration with Cursor

  • Navigate to Cursor Settings and select the MCP tab.
  • This action opens the configuration file located at ~/.cursor/mcp.json.


  • Add the following configuration, using langgraph-docs-mcp as the server name and pointing to the LangGraph and LangChain llms.txt files:

```json
{
  "mcpServers": {
    "langgraph-docs-mcp": {
      "command": "uvx",
      "args": [
        "--from",
        "mcpdoc",
        "mcpdoc",
        "--urls",
        "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
        "LangChain:https://python.langchain.com/llms.txt",
        "--transport",
        "stdio"
      ]
    }
  }
}
```

  • Verify that the server status is active within the Cursor Settings/MCP panel.
  • It is highly recommended to augment Cursor's Global (User) Rules:

for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question

  • Initiate a chat session using CMD+L (macOS) and ensure the agent mode is selected.


Test with a query, such as:

what are types of memory in LangGraph?
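If you manage several MCP servers, you can also generate the mcp.json entry shown in the Cursor configuration above instead of editing it by hand. Windsurf's mcp_config.json uses the same mcpServers shape. The sketch below is hypothetical: the helper names are illustrative, and it assumes the host passes each args element to the process as a single argument, with no shell word-splitting. Reading and writing the actual file is left out.

```python
import json
from typing import Dict


def build_mcpdoc_entry(sources: Dict[str, str], transport: str = "stdio") -> dict:
    """Build an mcpServers entry that launches mcpdoc through uvx.

    Each name:url pair becomes its own element in args, on the assumption
    that the host does not split array elements on spaces.
    """
    args = ["--from", "mcpdoc", "mcpdoc", "--urls"]
    args.extend(f"{name}:{url}" for name, url in sources.items())
    args.extend(["--transport", transport])
    return {"command": "uvx", "args": args}


def add_server(config: dict, server_name: str, entry: dict) -> dict:
    """Return a copy of config with entry stored under mcpServers[server_name].

    Existing servers are kept, and the input dictionary is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[server_name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other-server" is a hypothetical pre-existing entry.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    entry = build_mcpdoc_entry({
        "LangGraph": "https://langchain-ai.github.io/langgraph/llms.txt",
        "LangChain": "https://python.langchain.com/llms.txt",
    })
    print(json.dumps(add_server(existing, "langgraph-docs-mcp", entry), indent=2))
```

Copying the config instead of mutating it makes it easy to diff the old and new versions before saving.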


Integration with Windsurf

  • Open Cascade (Windsurf's chat panel) with CMD+L (macOS).
  • Select Configure MCP to access and edit the configuration file: ~/.codeium/windsurf/mcp_config.json.
  • Incorporate the langgraph-docs-mcp definition as demonstrated previously.


  • Add the following to Windsurf Rules/Global rules:

for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question


  • Run the test query; Windsurf should start calling the tools automatically.


Integration with Claude Desktop

  • Access Settings/Developer within the application to modify ~/Library/Application\ Support/Claude/claude_desktop_config.json.
  • Integrate the langgraph-docs-mcp configuration.
  • A restart of the Claude Desktop application is required.

[!Note] If you run into Python version conflicts when adding the mcpdoc tools in Claude Desktop, you can hardcode the path to a Python executable with uvx's --python argument, as in the example below.

Configuration Example

```json
{
  "mcpServers": {
    "langgraph-docs-mcp": {
      "command": "uvx",
      "args": [
        "--python",
        "/path/to/python",
        "--from",
        "mcpdoc",
        "mcpdoc",
        "--urls",
        "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
        "--transport",
        "stdio"
      ]
    }
  }
}
```

[!Note] As of 03/21/25, Claude Desktop does not appear to support global rules, so prepend the following to your prompt:

for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question


  • The available tools appear at the bottom right of the chat input.


  • Run the sample query; Claude Desktop will ask you to approve each tool call while it processes the request.


Integration with Claude Code

  • After installing Claude Code, run this command in a terminal to add the MCP server to your project:

```bash
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from","mcpdoc","mcpdoc","--urls","langgraph:https://langchain-ai.github.io/langgraph/llms.txt","LangChain:https://python.langchain.com/llms.txt"]}' -s local
```

  • The ~/.claude.json file will be updated.
  • Verify that the tools are visible by starting Claude Code with the claude command and then running /mcp inside the session.


[!Note] Similar to Claude Desktop, Claude Code (as of 3/21/25) may not support global rule application. Prepend the directive to your prompt:

for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question

  • Tool confirmation prompts will appear during query execution.
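Hand-quoting the JSON argument of the claude mcp add-json command above is error-prone. A small script can produce it instead. In this sketch, the `claude_add_json_command` helper is hypothetical. It only builds the command string and does not run it. It assumes a POSIX shell, which is what shlex.quote targets. The -s local scope comes from the command above.

```python
import json
import shlex
from typing import List


def claude_add_json_command(server_name: str, urls: List[str], scope: str = "local") -> str:
    """Build a shell-safe `claude mcp add-json` command line.

    json.dumps guarantees valid JSON, and shlex.quote wraps each argument so
    a POSIX shell passes it through unchanged.
    """
    config = {
        "type": "stdio",
        "command": "uvx",
        "args": ["--from", "mcpdoc", "mcpdoc", "--urls", *urls],
    }
    parts = ["claude", "mcp", "add-json", server_name, json.dumps(config), "-s", scope]
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    print(claude_add_json_command(
        "langgraph-docs",
        [
            "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
            "LangChain:https://python.langchain.com/llms.txt",
        ],
    ))
```

Because the JSON payload is generated rather than typed, stray spaces or missing commas cannot slip in.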


Command-Line Interface Reference

The mcpdoc command provides a CLI for launching the documentation server. Documentation sources can be specified in three ways:

  1. YAML Configuration File: loads documentation sources from sample_config.yaml in this repository (e.g., LangGraph Python docs).

```bash
mcpdoc --yaml sample_config.yaml
```

  2. JSON Configuration File: loads documentation sources from sample_config.json (e.g., LangGraph Python docs).

```bash
mcpdoc --json sample_config.json
```

  3. Direct URL Specification:
     • URLs can be given as-is or with an optional label using the name:url syntax.
     • Multiple sources are registered by repeating the --urls flag.
     • This is the method used in the MCP server setup described above.

```bash
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
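One way to read the name:url syntax is as "split on the first colon, unless the value is already a bare URL". The sketch below implements that reading. It is hypothetical (`parse_url_spec` is not mcpdoc's API), and mcpdoc's own parsing may differ. One limitation is that a local path containing a colon, such as a Windows drive letter, would be misread as name:path.

```python
from typing import Optional, Tuple


def parse_url_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split a --urls value into (name, url).

    A bare http(s) URL or a value with no colon has no name. Otherwise the
    text before the first colon is the name, so in "LangGraph:https://..."
    the colon inside "https://" stays part of the URL.
    """
    if spec.startswith(("http://", "https://")) or ":" not in spec:
        return None, spec
    name, url = spec.split(":", 1)
    return name, url


if __name__ == "__main__":
    for spec in ["LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
                 "https://python.langchain.com/llms.txt"]:
        print(parse_url_spec(spec))
```

For example, parse_url_spec("LangGraph:https://langchain-ai.github.io/langgraph/llms.txt") returns ("LangGraph", "https://langchain-ai.github.io/langgraph/llms.txt").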

These configuration methods can be aggregated for merged source loading:

```bash
mcpdoc --yaml sample_config.yaml --json sample_config.json \
    --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt \
    --urls LangChain:https://python.langchain.com/llms.txt
```

Supplementary Runtime Parameters

  • --follow-redirects: Instructs the HTTP client to traverse redirects (Default: False).
  • --timeout SECONDS: Specifies the maximum allowed duration in seconds for an HTTP request (Default: 10.0).

Example employing extra parameters:

```bash
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15
```

This command loads documentation from the YAML file, enables redirect following, and sets the request deadline to 15 seconds.
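The two runtime options behave differently on the command line. --follow-redirects is an on/off switch that takes no value, while --timeout takes a number of seconds. The hypothetical argparse parser below mirrors only these two options and their documented defaults. It is not mcpdoc's own CLI code, and the program name is made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A stand-in parser for the two runtime flags described above."""
    parser = argparse.ArgumentParser(prog="mcpdoc-sketch")
    # A boolean switch: absent means False, present means True.
    parser.add_argument("--follow-redirects", action="store_true",
                        help="Follow HTTP redirects (default: off)")
    # A numeric option, in seconds, parsed as a float.
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="HTTP request timeout in seconds (default: 10.0)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--follow-redirects", "--timeout", "15"])
    print(args.follow_redirects, args.timeout)
```

Parsing the flags from the example command yields follow_redirects=True and timeout=15.0. Omitting both leaves False and 10.0.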

Configuration Structure

Both JSON and YAML input files must contain a list structure defining the documentation sources. Each entry requires the llms_txt URI and optionally accepts a descriptive name field:

YAML Example (sample_config.yaml)

```yaml
# Configuration schema for the mcp-mcpdoc server
# Each element requires an llms_txt URI and may include an optional name field
- name: LangGraph Python
  llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```

JSON Example (sample_config.json)

```json
[
  {
    "name": "LangGraph Python",
    "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
  }
]
```
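The structure rule above (a list of entries, each with a required llms_txt and an optional name) can be checked before starting the server. This is a minimal, hypothetical validator (`validate_sources` is not part of mcpdoc) that uses only the standard library, so it parses the JSON form. A YAML file could be loaded into the same list with a YAML parser such as PyYAML's yaml.safe_load, which is not shown here.

```python
import json
from typing import Any, Dict, List


def validate_sources(entries: Any) -> List[Dict[str, str]]:
    """Check the list-of-sources shape: llms_txt required, name optional."""
    if not isinstance(entries, list):
        raise ValueError("configuration must be a list of documentation sources")
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} must be a mapping")
        if not isinstance(entry.get("llms_txt"), str):
            raise ValueError(f"entry {index} is missing a string llms_txt field")
        if "name" in entry and not isinstance(entry["name"], str):
            raise ValueError(f"entry {index} has a non-string name")
    return entries


SAMPLE_JSON = """[
  {"name": "LangGraph Python", "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"}
]"""

if __name__ == "__main__":
    sources = validate_sources(json.loads(SAMPLE_JSON))
    print(sources[0]["name"])
```

Failing early with a clear message is easier to debug than a server that starts with a malformed source list.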

Programmatic Interface

```python
from mcpdoc.main import create_server

# Create the server with the documentation sources defined below
server = create_server(
    [
        {
            "name": "LangGraph Python",
            "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt",
        },
        # Additional sources can be appended here:
        # {
        #     "name": "Another Documentation Set",
        #     "llms_txt": "https://example.com/llms.txt",
        # },
    ],
    follow_redirects=True,
    timeout=15.0,
)

# Run the server using the stdio transport
server.run(transport="stdio")
```


Reference: XMLHttpRequest (XHR)

XMLHttpRequest (XHR) defines a JavaScript object interface enabling a web browser environment to dispatch HTTP requests to a remote server. This API allows client-side applications to exchange data with the server asynchronously following initial page load. XHR is foundational to the Ajax programming methodology. Prior to its widespread adoption, server communication relied primarily on standard hyperlink navigation or form submissions, actions that typically necessitated a full page refresh.

== Historical Context ==

The concept behind the XMLHttpRequest object was originally created by the developers of Outlook Web Access for Microsoft Exchange Server 2000, and it was first implemented in Internet Explorer 5 (1999). The initial implementation did not use the XMLHttpRequest identifier. Instead, it relied on COM object instantiation such as ActiveXObject("Msxml2.XMLHTTP") or ActiveXObject("Microsoft.XMLHTTP"). By the release of Internet Explorer 7 (2006), virtually all major browsers supported the XMLHttpRequest identifier natively.

The XMLHttpRequest identifier is now the de facto standard across modern browsers, including Mozilla's Gecko layout engine (2002), Safari 1.2 (2004), and Opera 8.0 (2005).

=== Standardization Efforts ===

The World Wide Web Consortium (W3C) published its first Working Draft specification for the XMLHttpRequest object on April 5, 2006. It was followed on February 25, 2008 by the Level 2 Working Draft, which added progress monitoring, cross-site requests, and handling of byte streams. At the end of 2011, the Level 2 features were merged back into the main specification. At the end of 2012, development passed to the WHATWG, which maintains it as a living document written in Web IDL.

== Operational Flow ==

Executing a request via XMLHttpRequest generally involves the following programming steps:

  1. Object Instantiation: Create an XMLHttpRequest instance via its constructor.
  2. Configuration: Invoke the open method to define the request method (GET, POST, etc.), designate the target resource URI, and specify synchronous or asynchronous execution mode.
  3. Asynchronous Listener Setup: For asynchronous calls, a handler must be assigned to monitor state changes throughout the request lifecycle.
  4. Transmission: Trigger the communication sequence by calling the send method, optionally carrying request payload data.
  5. Response Processing: Handle the state changes in the event listener. Data received from the server is stored by default in the responseText property. When the object has finished processing the response, its state changes to 4 (the 'done' state).

Beyond these basics, XHR offers extensive control over how requests are sent and how responses are handled. Custom request headers can be set to tell the server how to fulfil the request, and data can be uploaded by passing it to send. The response can be parsed from JSON into a ready-to-use JavaScript object, or processed incrementally as it arrives. Requests can also be aborted early or set to fail if they do not complete within a specified time.

== Cross-Origin Communication ==

During the early architecture of the World Wide Web, security limitations prevented script-initiated requests from one domain accessing resources on a different domain, an issue that required subsequent invention...
