NCBI Model Context Protocol (MCP)

A Python implementation of the Model Context Protocol for interacting with NCBI databases.

Setup

1. Clone this repository.
2. Install dependencies: pip install -r requirements.txt
3. Create a .env file with your NCBI API key and email:

NCBI_API_KEY=your_api_key_here
NCBI_EMAIL=your_email@example.com
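As a quick sanity check on the .env file, the sketch below parses KEY=VALUE text and reports which of the two settings are missing. It is only an illustration: the helper names parse_env and missing_settings are hypothetical, and how ncbi_mcp.py itself loads the file is not described here.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(values, required=("NCBI_API_KEY", "NCBI_EMAIL")):
    """Return the required setting names that are absent or empty."""
    return [name for name in required if not values.get(name)]


example = "NCBI_API_KEY=your_api_key_here\nNCBI_EMAIL=your_email@example.com\n"
settings = parse_env(example)
print(missing_settings(settings))
```

To check a real file, pass the contents of .env (read with open(".env").read()) to parse_env instead of the example string.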

Running the MCP Server

python ncbi_mcp.py

Using with Cursor/Claude

Once the MCP server is running, you can interact with it using natural language in Cursor/Claude.

Using Natural Language Queries

You can use natural language to perform searches and retrieve information:

tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Find research articles about BRCA1"
  }
}

Or more simply, just use the query directly:

@ncbi-mcp Find research articles about BRCA1

Example Natural Language Queries

Here are some example natural language queries you can try:

Gene function information: @ncbi-mcp Please summarize the function of TNF-alpha
Genome size and statistics: @ncbi-mcp How big is the genome for Saccharomyces cerevisiae?
Assembly statistics: @ncbi-mcp What are the reported L50 and N50 statistics for the most recent E. coli genome?
Dataset counts: @ncbi-mcp How many datasets are available in the biosample database for b16f10 mouse melanoma cells?
Search for scientific articles: @ncbi-mcp Find the latest research on COVID-19 vaccines
Get gene information: @ncbi-mcp Tell me about the BRCA1 gene
Fetch genome information: @ncbi-mcp Get genome information for Homo sapiens

Testing

To test the MCP server with various queries, you can use the included test files:

# Test natural language query functionality (default)
.\run_test.bat

# Test all tools
.\run_test.bat all

# Test a specific test file
.\run_test.bat test_all_tools.jsonl

# Test high-level tools
.\run_test.bat test_high_level_tools.jsonl

The test script will:
1. Start the MCP server in the background
2. Send test requests from the specified file
3. Wait for a few seconds to allow processing
4. Terminate the server and display the output
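The same four steps can be sketched in Python for platforms where run_test.bat is not available. This is not the contents of run_test.bat, only an approximation of the steps listed above; the function name run_requests and the echo stand-in command are hypothetical, and the server is assumed to read one request per line on stdin.

```python
import subprocess
import sys
import time

# Stand-in child process that echoes each stdin line; used only for trying
# out the harness without the real server.
ECHO_SERVER_CMD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.strip(), flush=True)",
]


def run_requests(server_cmd, request_lines, wait_seconds=3.0):
    """Start a server, send it requests, wait, then stop it and return output."""
    # 1. Start the server in the background with pipes attached.
    proc = subprocess.Popen(
        server_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # 2. Send each request as a single line.
    for line in request_lines:
        proc.stdin.write(line.rstrip("\n") + "\n")
    proc.stdin.flush()
    # 3. Give the server a few seconds to process the requests.
    time.sleep(wait_seconds)
    # 4. Terminate the server and collect whatever it printed.
    proc.terminate()
    output, _ = proc.communicate(timeout=10)
    return output
```

To target the real server, pass [sys.executable, "ncbi_mcp.py"] as server_cmd and the lines of a test file as request_lines.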

This approach is used because the MCP server is designed to run continuously as a service. For manual testing without automatic termination, you can use:

# Run manually with any test file
type test_nlp_query.jsonl | python ncbi_mcp.py

The test files contain example JSON-RPC requests that simulate how Cursor/Claude would interact with the MCP server.
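A minimal sketch of how such request lines could be generated, assuming each line of a .jsonl test file is one JSON-RPC 2.0 request whose method is tools/call and whose params carry the tool name and arguments shown in this README. The exact layout of the bundled test files is not shown here, so treat the envelope fields as an assumption; a full MCP session may also need an initialize handshake first, which is omitted.

```python
import json


def make_tool_call(request_id, name, arguments):
    """Build one JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def to_jsonl(requests):
    """Serialize requests as one JSON document per line."""
    return "\n".join(json.dumps(request) for request in requests) + "\n"


requests = [
    make_tool_call(1, "nlp-query", {"query": "Find research articles about BRCA1"}),
    make_tool_call(2, "summarize-gene", {"gene_name": "BRCA1"}),
]
print(to_jsonl(requests), end="")
```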

Available Tools

The NCBI MCP provides both high-level tools that understand natural language and low-level tools for direct database interaction.

Tool Usage Guidelines for LLMs

Recommended Workflow Patterns

For most biological queries, start with nlp-query. It handles complex questions and automatically routes them to the appropriate specialized tool.

Common Research Workflows:

Gene Analysis Workflow:
1. Start with nlp-query for general gene questions
2. Use summarize-gene for comprehensive gene information
3. Use get_gene_info for detailed structured data
4. Use ncbi-search + ncbi-fetch for specific database queries

Genome Analysis Workflow:
1. Use genome-stats for organism genome statistics
2. Use get_genome_info for detailed genome metadata
3. Use count-datasets to explore available genome assemblies

Literature Research Workflow:
1. Use nlp-query for natural language literature searches
2. Use ncbi-search with database="pubmed" for precise searches
3. Use ncbi-fetch to get full publication details

Dataset Discovery Workflow:
1. Use count-datasets to assess data availability
2. Use nlp-query to explore datasets with natural language
3. Use ncbi-search for systematic database exploration

E-utilities Workflow (Advanced):
1. Use ncbi-info to discover available databases
2. Use ncbi-global-query to see which databases contain your search term
3. Use ncbi-search to find specific UIDs in target databases
4. Use ncbi-summary to get overview information about records
5. Use ncbi-fetch to retrieve complete records
6. Use ncbi-link to find related records across databases

Cross-Database Analysis Workflow:
1. Use ncbi-search to find genes of interest
2. Use ncbi-link to find related proteins, structures, or literature
3. Use ncbi-summary to get metadata about related records
4. Use ncbi-fetch to retrieve detailed information
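To make the search-then-fetch pattern concrete, here is a rough sketch of the Literature Research Workflow as two chained tool calls. The argument names (database, term, ids, rettype) are the ones used in the examples in this README. The call_tool and extract_ids callables are hypothetical placeholders, because the transport and the shape of the search response are not documented here.

```python
def search_call(database, term):
    """Payload for the ncbi-search tool."""
    return {"name": "ncbi-search", "arguments": {"database": database, "term": term}}


def fetch_call(database, ids, rettype=None):
    """Payload for the ncbi-fetch tool."""
    arguments = {"database": database, "ids": [str(i) for i in ids]}
    if rettype is not None:
        arguments["rettype"] = rettype
    return {"name": "ncbi-fetch", "arguments": arguments}


def literature_workflow(term, call_tool, extract_ids):
    """Search PubMed, then fetch the matching records (None if no hits)."""
    search_result = call_tool(search_call("pubmed", term))
    ids = extract_ids(search_result)
    if not ids:
        return None
    return call_tool(fetch_call("pubmed", ids))


# Demo with a fake transport; the response shape {"ids": [...]} is invented.
def fake_call_tool(payload):
    if payload["name"] == "ncbi-search":
        return {"ids": ["111", "222"]}
    return "fetched " + ",".join(payload["arguments"]["ids"])


print(literature_workflow("BRCA1", fake_call_tool, lambda result: result["ids"]))
```

The other workflows follow the same shape: each step's output supplies the identifiers or database names that the next step takes as arguments.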

Tool Selection Guide

High-Level Tools (Recommended for most users):
- nlp-query: Use for general biological questions, complex queries, and when you're unsure which tool to use
- summarize-gene: Use for comprehensive gene analysis and understanding gene function
- genome-stats: Use for genome size, assembly quality, and organism comparison
- count-datasets: Use for research planning and data availability assessment
- get_gene_info: Use for detailed, structured gene information
- get_genome_info: Use for detailed, structured genome information

Low-Level E-utilities Tools (For advanced users):
- ncbi-search (ESearch): Use for precise database searches with specific filters, Boolean operators, and field qualifiers
- ncbi-fetch (EFetch): Use to retrieve complete records after searching; supports multiple formats (GenBank, FASTA, XML)
- ncbi-summary (ESummary): Use to get document summaries without fetching complete records
- ncbi-link (ELink): Use to find related records across databases (e.g., gene to protein, protein to structure)
- ncbi-info (EInfo): Use to discover available databases and their capabilities
- ncbi-global-query (EGQuery): Use to search across all databases simultaneously
- ncbi-spell (ESpell): Use to get spelling suggestions for search terms
- ncbi-citation-match (ECitMatch): Use to find PMIDs from citation information
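For orientation, each low-level tool corresponds to a public NCBI E-utilities endpoint. The sketch below builds an ESearch URL with the standard db, term, retmode, retmax, api_key, and email parameters. Whether ncbi-search constructs its requests this way internally is an assumption, and the helper name esearch_url is hypothetical.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database, term, api_key=None, email=None, retmax=20):
    """Build an ESearch request URL; no network call is made here."""
    params = {"db": database, "term": term, "retmode": "json", "retmax": retmax}
    # The API key and email are optional and only added when provided.
    if api_key:
        params["api_key"] = api_key
    if email:
        params["email"] = email
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


print(esearch_url("pubmed", "BRCA1 AND Homo sapiens[Organism]"))
```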

Biological Context and Terminology

Understanding NCBI Databases:
- Gene: Contains gene records with symbols, names, functions, and genomic locations
- Protein: Contains protein sequences and annotations
- Nucleotide: Contains DNA/RNA sequences (genes, transcripts, genomic regions)
- PubMed: Contains scientific literature and publications
- BioSample: Contains biological sample metadata (tissues, cell lines, etc.)
- BioProject: Contains research project information
- SRA: Contains raw sequencing data
- Assembly: Contains genome assembly information

Common Biological Terms:
- Gene Symbol: Short abbreviation (e.g., BRCA1, TP53, TNF)
- Gene ID: Unique NCBI identifier (e.g., 672 for BRCA1)
- Accession: Unique sequence identifier (e.g., NM_001126114.3)
- N50/L50: Assembly contiguity metrics (a larger N50 and a smaller L50 indicate a more contiguous assembly)
- Reference Genome: High-quality representative genome for a species
- Organism: Use scientific names (Homo sapiens) or common names (human)
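Since N50 and L50 come up in assembly queries, a small worked example may help. Using the standard definitions, sort the contigs from longest to shortest and add up their lengths until the running total reaches half of the assembly size. N50 is the length of the contig that crosses that point, and L50 is how many contigs it took. The function name n50_l50 is illustrative and not part of this project.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines both metrics.
        if running >= half_total:
            return length, count


# Total length is 290, so the halfway point is 145: 80 + 70 = 150 reaches it.
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```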

Search Strategies:
- Use specific gene symbols for precise results
- Include organism names to avoid ambiguity
- Use Boolean operators (AND, OR, NOT) for complex searches
- Use field qualifiers like [Gene], [Organism], [Protein Name] for targeted searches
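These strategies can be combined programmatically. The sketch below assembles a search term from field-qualified clauses and Boolean operators, suitable for the term argument of ncbi-search. The helper names (tagged, any_of, all_of) are hypothetical, and the qualifiers are the ones listed above.

```python
def tagged(value, qualifier):
    """Attach a field qualifier, e.g. BRCA1[Gene]."""
    return f"{value}[{qualifier}]"


def any_of(*clauses):
    """Join clauses with OR, parenthesized so the group stays together."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with AND."""
    return " AND ".join(clauses)


term = all_of(
    any_of(tagged("BRCA1", "Gene"), tagged("BRCA2", "Gene")),
    tagged("Homo sapiens", "Organism"),
)
print(term)  # (BRCA1[Gene] OR BRCA2[Gene]) AND Homo sapiens[Organism]
```

Parenthesizing the OR group keeps the organism restriction applied to both gene symbols rather than only the last one.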

High-Level Tools

Natural Language Query Processor

tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Please summarize the function of TNF-alpha"
  }
}

Gene Summarizer

tools/call
{
  "name": "summarize-gene",
  "arguments": {
    "gene_name": "BRCA1"
  }
}

Genome Statistics

tools/call
{
  "name": "genome-stats",
  "arguments": {
    "organism": "Escherichia coli"
  }
}

Dataset Counter

tools/call
{
  "name": "count-datasets",
  "arguments": {
    "database": "biosample",
    "query": "mouse melanoma b16f10"
  }
}

Low-Level Tools

Search NCBI Databases

tools/call
{
  "name": "ncbi-search",
  "arguments": {
    "database": "pubmed",
    "term": "BRCA1",
    "filters": {
      "organism": "Homo sapiens",
      "date_range": {
        "start": "2020"
      }
    }
  }
}

Fetch NCBI Records

tools/call
{
  "name": "ncbi-fetch",
  "arguments": {
    "database": "gene",
    "ids": ["70"],
    "rettype": "gb"
  }
}

Get Gene Information

tools/call
{
  "name": "get_gene_info",
  "arguments": {
    "gene_id": "672"
  }
}

Get Genome Information

tools/call
{
  "name": "get_genome_info",
  "arguments": {
    "organism": "Homo sapiens",
    "reference": true
  }
}

License

Apache-2.0

ncbi-mcp

Author

noahzeidenberg
