mcp-firecrawl-interface
A utility package for scraping web content and extracting structured data through the Firecrawl API, with integrated monitoring for performance and error tracking. It can fetch content in multiple formats and extract data that follows custom schemas.
Author: codyde
MCP Firecrawl Abstraction Layer
This module provides a layer over Firecrawl's APIs for scraping website content and extracting structured data from it.
Setup
1. Install dependencies:
```bash
npm install
```
2. Create a .env file in the project root with the following variables:
```
FIRECRAWL_API_TOKEN=your_secret_key_here
SENTRY_DSN=your_monitoring_endpoint_here
```
- FIRECRAWL_API_TOKEN (required): your Firecrawl API key.
- SENTRY_DSN (optional): your Sentry DSN, used for error tracking and performance monitoring.
3. Start the server:
```bash
npm start
```
Alternatively, set the environment variables inline when starting the server:
```bash
FIRECRAWL_API_TOKEN=your_secret_key_here npm start
```
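To make the configuration rules concrete, here is a minimal TypeScript sketch of reading a .env file and checking the two variables. parseDotEnv, loadConfig, and ServerConfig are hypothetical names, not this package's code. The sketch handles only plain KEY=VALUE lines and assumes inline variables override the file, which is how the dotenv package behaves by default.

```typescript
// Hypothetical sketch, not the package's actual loader.
// Assumption: inline variables win over .env values (dotenv's default behaviour).

interface ServerConfig {
  firecrawlApiToken: string; // required
  sentryDsn?: string; // optional; monitoring stays off when absent
}

// Parse simple KEY=VALUE lines; blank lines and "#" comments are skipped.
// Quoting, escapes, and multi-line values are deliberately not handled.
function parseDotEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" or empty key: ignore the line
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

// Merge file values with inline values and enforce the required token.
function loadConfig(
  fileVars: Record<string, string>,
  inlineVars: Record<string, string | undefined> = {},
): ServerConfig {
  const get = (key: string): string | undefined => inlineVars[key] ?? fileVars[key];
  const token = get("FIRECRAWL_API_TOKEN");
  if (!token) {
    throw new Error("FIRECRAWL_API_TOKEN is required");
  }
  const dsn = get("SENTRY_DSN");
  return dsn ? { firecrawlApiToken: token, sentryDsn: dsn } : { firecrawlApiToken: token };
}
```

In Node this could be called at startup as loadConfig(parseDotEnv(readFileSync(".env", "utf8")), process.env), so a missing token fails fast instead of surfacing later as an authentication error.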
Features
- Website scraping: fetch content from a given URL in multiple output formats.
- Structured data extraction: pull specific fields out of web pages according to a user-defined schema.
- Sentry integration: error tracking and performance monitoring.
Usage
The server exposes two tools through MCP:
1. scrape-website: fetches a page's content in one or more formats.
2. extract-data: extracts structured data from pages, guided by a natural-language prompt and a schema.
Tool: scrape-website
Scrapes a URL and returns its content in the requested formats.
Parameters:
- url (string, required): the URL of the page to scrape.
- formats (array of strings, optional): the output formats to return. Supported values:
  - "markdown" (default)
  - "html"
  - "text"
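The parameter rules above amount to a small normalization step. This TypeScript sketch is illustrative only: normalizeScrapeArgs and its error messages are hypothetical, and treating an empty formats array like an omitted one (falling back to markdown) is an assumption, not documented behaviour.

```typescript
// Hypothetical sketch of scrape-website argument handling, based only on the
// documented parameters; the server's real validation may differ.

const SCRAPE_FORMATS = ["markdown", "html", "text"] as const;
type ScrapeFormat = (typeof SCRAPE_FORMATS)[number];

interface ScrapeArgs {
  url: string;
  formats: ScrapeFormat[];
}

function isScrapeFormat(value: unknown): value is ScrapeFormat {
  return typeof value === "string" && SCRAPE_FORMATS.indexOf(value as ScrapeFormat) !== -1;
}

function normalizeScrapeArgs(input: { url?: unknown; formats?: unknown }): ScrapeArgs {
  if (typeof input.url !== "string" || input.url.length === 0) {
    throw new Error("url is required and must be a non-empty string");
  }
  // Documented default: markdown when formats is omitted.
  if (input.formats === undefined) {
    return { url: input.url, formats: ["markdown"] };
  }
  if (!Array.isArray(input.formats) || !input.formats.every(isScrapeFormat)) {
    throw new Error(`formats must only contain: ${SCRAPE_FORMATS.join(", ")}`);
  }
  // Drop duplicate formats while keeping the caller's order.
  const formats: ScrapeFormat[] = input.formats.filter(
    (f: ScrapeFormat, i: number, arr: ScrapeFormat[]) => arr.indexOf(f) === i,
  );
  // Assumption: an empty array is treated like an omitted one.
  return { url: input.url, formats: formats.length > 0 ? formats : ["markdown"] };
}
```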
Example usage with MCP Inspector:
```bash
# Basic usage (defaults to markdown)
mcp-inspector --tool scrape-website --args '{ "url": "https://example.com" }'

# Request all supported formats
mcp-inspector --tool scrape-website --args '{ "url": "https://example.com", "formats": ["markdown", "html", "text"] }'
```
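Under the hood, an MCP client such as the Inspector invokes a tool by sending a JSON-RPC 2.0 tools/call request. The sketch below builds that message for the second Inspector command above. The envelope fields follow the MCP specification, while buildToolCall and the request id are illustrative choices.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// Envelope (jsonrpc / id / method / params.name / params.arguments) per the MCP spec;
// buildToolCall is a hypothetical helper.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Same tool and arguments as the "all supported formats" Inspector command.
const request = buildToolCall(1, "scrape-website", {
  url: "https://example.com",
  formats: ["markdown", "html", "text"],
});

console.log(JSON.stringify(request));
// prints: {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"scrape-website","arguments":{"url":"https://example.com","formats":["markdown","html","text"]}}}
```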
Tool: extract-data
Extracts structured data from one or more URLs, shaped by a natural-language prompt and a schema.
Parameters:
- urls (array of strings, required): the URLs to extract data from.
- prompt (string, required): a natural-language description of the data to extract.
- schema (object, required): the structure of the output data.
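A client can catch missing or malformed arguments before calling the tool. The following TypeScript sketch checks the three required parameters; checkExtractArgs is a hypothetical helper, and requiring each URL to start with http:// or https:// is an assumption made for illustration.

```typescript
// Hypothetical pre-flight check for extract-data arguments. Illustrative only.
// Returns a list of problems; an empty list means the arguments look usable.
function checkExtractArgs(input: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const { urls, prompt, schema } = input;

  if (!Array.isArray(urls) || urls.length === 0) {
    problems.push("urls must be a non-empty array");
  } else if (!urls.every((u) => typeof u === "string" && /^https?:\/\//.test(u))) {
    // Assumption: only http(s) URLs are accepted.
    problems.push("every entry in urls must be an http(s) URL string");
  }

  if (typeof prompt !== "string" || prompt.trim() === "") {
    problems.push("prompt must be a non-empty string");
  }

  // The schema must be a plain object (arrays and null are rejected).
  if (typeof schema !== "object" || schema === null || Array.isArray(schema)) {
    problems.push("schema must be an object");
  }

  return problems;
}
```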
The schema maps field names to their types. Supported types:
- "string": text values
- "boolean": true/false values
- "number": numeric values
- Arrays: written as [type], where type is any supported type, including an object (for example ["string"] or [{ "name": "string" }])
- Objects: nested structures that map their own field names to types
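The schema notation is small enough to check mechanically. Below is a minimal TypeScript sketch of a recursive validator for it; validateSchema, isSchema, and SchemaType are hypothetical names, and the rule that an array lists exactly one element type is our reading of the [type] notation, not something the server is documented to enforce.

```typescript
// Hypothetical validator for the schema notation. Not the server's own code.
// Assumption: an array type lists exactly one element type, e.g. ["string"].

type SchemaType =
  | "string"
  | "boolean"
  | "number"
  | SchemaType[]
  | { [field: string]: SchemaType };

// Collects every problem found, each prefixed with the path where it occurs.
function validateSchema(node: unknown, path: string = "schema"): string[] {
  if (typeof node === "string") {
    const known = node === "string" || node === "boolean" || node === "number";
    return known ? [] : [`${path}: unknown type "${node}"`];
  }
  if (Array.isArray(node)) {
    if (node.length !== 1) {
      return [`${path}: an array type must list exactly one element type`];
    }
    return validateSchema(node[0], `${path}[0]`);
  }
  if (typeof node === "object" && node !== null) {
    // Nested object: validate each field recursively.
    const fields = node as Record<string, unknown>;
    const problems: string[] = [];
    for (const key of Object.keys(fields)) {
      problems.push(...validateSchema(fields[key], `${path}.${key}`));
    }
    return problems;
  }
  return [`${path}: expected a type name, a one-element array, or an object`];
}

// Type guard built on the validator.
function isSchema(node: unknown): node is SchemaType {
  return validateSchema(node).length === 0;
}
```

Returning every problem with its path, rather than stopping at the first, makes it easier to fix a large nested schema in one pass.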
Example usage with MCP Inspector:
```bash
# Basic extraction: company information
# Note: an apostrophe inside the single-quoted --args string would end the quoting, so the prompt avoids one.
mcp-inspector --tool extract-data --args '{ "urls": ["https://example.com"], "prompt": "Extract the company mission, whether it supports single sign-on (SSO), and whether it is open source.", "schema": { "company_mission": "string", "supports_sso": "boolean", "is_open_source": "boolean" } }'

# Complex extraction: nested product data
mcp-inspector --tool extract-data --args '{ "urls": ["https://example.com/offerings", "https://example.com/pricing_tiers"], "prompt": "Extract every product with its name, price, and features.", "schema": { "products": [{ "name": "string", "price": "number", "features": ["string"] }] } }'
```
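After a call returns, you may want to confirm that the data actually has the shape the schema asked for, as in the product example. This TypeScript sketch is a hypothetical check, not part of the package: it assumes the schema is already valid, treats every schema field as required, and ignores extra fields. The README does not say whether the server performs such a check itself.

```typescript
// Hypothetical helper: does an extract-data result match its schema?
// Assumptions: the schema is valid; every schema field is required; extra fields are allowed.
function matchesSchema(value: unknown, schema: unknown): boolean {
  if (schema === "string" || schema === "boolean" || schema === "number") {
    return typeof value === schema;
  }
  if (Array.isArray(schema)) {
    // Every element must match the single element type.
    return Array.isArray(value) && value.every((item) => matchesSchema(item, schema[0]));
  }
  if (typeof schema === "object" && schema !== null) {
    if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
    const fields = schema as Record<string, unknown>;
    const record = value as Record<string, unknown>;
    return Object.keys(fields).every(
      (key) => key in record && matchesSchema(record[key], fields[key]),
    );
  }
  return false;
}
```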
Both tools return descriptive error messages when they fail and, if SENTRY_DSN is set, automatically report the errors to Sentry.
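The error behaviour described above follows a common wrapper pattern. In this TypeScript sketch, a failing handler produces an MCP tool result with isError set to true, and the error is passed to an optional reporter. runTool and report are hypothetical names; in a real server, report would typically call Sentry.captureException, and it is left undefined when no SENTRY_DSN is configured.

```typescript
// Sketch of the error-handling pattern; runTool and report are hypothetical.
// The result shape ({ content, isError }) follows the MCP tool-result format.

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type Reporter = ((error: unknown) => void) | undefined;

async function runTool(
  name: string,
  handler: () => Promise<string>,
  report: Reporter,
): Promise<ToolResult> {
  try {
    const text = await handler();
    return { content: [{ type: "text", text }] };
  } catch (error) {
    // Report only when monitoring is configured (report is undefined otherwise).
    report?.(error);
    const message = error instanceof Error ? error.message : String(error);
    // Return a descriptive error result instead of letting the server crash.
    return { content: [{ type: "text", text: `${name} failed: ${message}` }], isError: true };
  }
}
```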
Troubleshooting
If something goes wrong:
- Verify that your Firecrawl API token is valid.
- Check that the target URLs are accessible.
- For complex schemas, make sure they follow the supported schema format.
- If Sentry is configured, check the Sentry dashboard for detailed error traces.
