
ncbi-mcp

Query and analyze NCBI Entrez databases, including PubMed, Gene, and Protein, to retrieve detailed gene information and summaries. Facilitate exploration of gene relationships and integrate with bioinformatics workflows.

Author

noahzeidenberg

Apache License 2.0

Quick Info

GitHub Stars 3
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19

Tags

bioinformatics, ncbi, gene, noahzeidenberg ncbi, analyze ncbi, integrate bioinformatics

NCBI Model Context Protocol (MCP)

A Python implementation of the Model Context Protocol for interacting with NCBI databases.

Setup

  1. Clone this repository
  2. Install dependencies: pip install -r requirements.txt
  3. Create a .env file with your NCBI API key and email:

     NCBI_API_KEY=your_api_key_here
     NCBI_EMAIL=your_email@example.com
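
As a quick sanity check before starting the server, the sketch below parses .env-style text and reports which of the two variables from the setup steps are missing. The helper names (parse_env, missing_settings) are illustrative and are not part of ncbi_mcp.py; how the server itself loads the file is not described in this README.

```python
from pathlib import Path

REQUIRED = ("NCBI_API_KEY", "NCBI_EMAIL")  # variable names from the setup steps


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_settings(settings: dict[str, str]) -> list[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not settings.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing:", missing_settings(parse_env(text)) or "none")
```

Run it from the repository root; an empty "Missing" list means both values are present, though it does not verify that the API key is valid.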

Running the MCP Server

python ncbi_mcp.py

Using with Cursor/Claude

Once the MCP server is running, you can interact with it using natural language in Cursor/Claude.

Using Natural Language Queries

You can use natural language to perform searches and retrieve information:

tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Find research articles about BRCA1"
  }
}

Or more simply, just use the query directly:

@ncbi-mcp Find research articles about BRCA1
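
On the wire, MCP clients send tool calls as JSON-RPC 2.0 messages whose method is tools/call and whose params carry the tool name and arguments. A minimal sketch of building such a message for the nlp-query example above; the helper name make_tool_call and the id value are illustrative, not taken from this project.

```python
import json


def make_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request as a single line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


print(make_tool_call("nlp-query", {"query": "Find research articles about BRCA1"}))
```

Clients such as Cursor or Claude construct this envelope themselves, so it is mainly useful when writing test files or debugging the server by hand.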

Example Natural Language Queries

Here are some example natural language queries you can try:

  1. Gene function information: @ncbi-mcp Please summarize the function of TNF-alpha

  2. Genome size and statistics: @ncbi-mcp How big is the genome for Saccharomyces cerevisiae?

  3. Assembly statistics: @ncbi-mcp What are the reported L50 and N50 statistics for the most recent E. coli genome?

  4. Dataset counts: @ncbi-mcp How many datasets are available in the biosample database for b16f10 mouse melanoma cells?

  5. Search for scientific articles: @ncbi-mcp Find the latest research on COVID-19 vaccines

  6. Get gene information: @ncbi-mcp Tell me about the BRCA1 gene

  7. Fetch genome information: @ncbi-mcp Get genome information for Homo sapiens

Testing

To test the MCP server with various queries, you can use the included test files:

# Test natural language query functionality (default)
.\run_test.bat

# Test all tools
.\run_test.bat all

# Test specific test file
.\run_test.bat test_all_tools.jsonl

# Test high-level tools
.\run_test.bat test_high_level_tools.jsonl

The test script will:

  1. Start the MCP server in the background
  2. Send test requests from the specified file
  3. Wait a few seconds to allow processing
  4. Terminate the server and display the output

This approach is used because the MCP server is designed to run continuously as a service. For manual testing without automatic termination, you can use:

# Run manually with any test file
type test_nlp_query.jsonl | python ncbi_mcp.py

The test files contain example JSON-RPC requests that simulate how Cursor/Claude would interact with the MCP server.
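
The commands above are Windows-specific. A cross-platform alternative can be sketched in Python, assuming the test files hold one JSON-RPC request per line (as the .jsonl extension suggests) and that the server reads requests from stdin. Neither the helper names nor the request id come from this project, and whether the server expects an MCP initialize message first is not stated here.

```python
import json
import subprocess
import sys


def to_jsonl(requests: list[dict]) -> str:
    """Join requests as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(r) + "\n" for r in requests)


def pipe_to_server(payload: str, command: list[str], timeout: float = 30.0) -> str:
    """Feed payload to the command's stdin and return what it wrote to stdout."""
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout


requests = [
    {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "nlp-query",
            "arguments": {"query": "Tell me about the BRCA1 gene"},
        },
    }
]
payload = to_jsonl(requests)

# To run against the real server, uncomment the line below. If the server
# keeps running after stdin closes, subprocess.TimeoutExpired is raised.
# print(pipe_to_server(payload, [sys.executable, "ncbi_mcp.py"]))
```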

Available Tools

The NCBI MCP provides both high-level tools that understand natural language and low-level tools for direct database interaction.

Tool Usage Guidelines for LLMs

For most biological queries, start with nlp-query: it handles complex questions and automatically routes them to the appropriate specialized tools.

Common Research Workflows:

  1. Gene Analysis Workflow:
     - Start with nlp-query for general gene questions
     - Use summarize-gene for comprehensive gene information
     - Use get_gene_info for detailed structured data
     - Use ncbi-search + ncbi-fetch for specific database queries

  2. Genome Analysis Workflow:
     - Use genome-stats for organism genome statistics
     - Use get_genome_info for detailed genome metadata
     - Use count-datasets to explore available genome assemblies

  3. Literature Research Workflow:
     - Use nlp-query for natural language literature searches
     - Use ncbi-search with database="pubmed" for precise searches
     - Use ncbi-fetch to get full publication details

  4. Dataset Discovery Workflow:
     - Use count-datasets to assess data availability
     - Use nlp-query to explore datasets with natural language
     - Use ncbi-search for systematic database exploration

  5. E-utilities Workflow (Advanced):
     - Use ncbi-info to discover available databases
     - Use ncbi-global-query to see which databases contain your search term
     - Use ncbi-search to find specific UIDs in target databases
     - Use ncbi-summary to get overview information about records
     - Use ncbi-fetch to retrieve complete records
     - Use ncbi-link to find related records across databases

  6. Cross-Database Analysis Workflow:
     - Use ncbi-search to find genes of interest
     - Use ncbi-link to find related proteins, structures, or literature
     - Use ncbi-summary to get metadata about related records
     - Use ncbi-fetch to retrieve detailed information
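
As an illustration, the Gene Analysis Workflow can be written out as an ordered sequence of tool calls. This is a sketch only: the tool names and argument keys are taken from the examples in this README, while the values (BRCA1, Gene ID 672) and the helper workflow_requests are illustrative. In practice the ids passed to ncbi-fetch would come from the preceding ncbi-search result.

```python
import json

# Tool names and argument keys follow the examples in this README;
# the concrete values are illustrative.
GENE_ANALYSIS_WORKFLOW = [
    ("nlp-query", {"query": "Tell me about the BRCA1 gene"}),
    ("summarize-gene", {"gene_name": "BRCA1"}),
    ("get_gene_info", {"gene_id": "672"}),
    ("ncbi-search", {"database": "gene", "term": "BRCA1"}),
    ("ncbi-fetch", {"database": "gene", "ids": ["672"]}),
]


def workflow_requests(steps):
    """Yield one JSON-RPC 2.0 tools/call request per step, with ids 1, 2, 3, ..."""
    for request_id, (name, arguments) in enumerate(steps, start=1):
        yield {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }


for request in workflow_requests(GENE_ANALYSIS_WORKFLOW):
    print(json.dumps(request))
```

The other workflows follow the same pattern with a different list of steps.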

Tool Selection Guide

High-Level Tools (Recommended for most users):

  - nlp-query: Use for general biological questions, complex queries, and when you're unsure which tool to use
  - summarize-gene: Use for comprehensive gene analysis and understanding gene function
  - genome-stats: Use for genome size, assembly quality, and organism comparison
  - count-datasets: Use for research planning and data availability assessment
  - get_gene_info: Use for detailed, structured gene information
  - get_genome_info: Use for detailed, structured genome information

Low-Level E-utilities Tools (For advanced users):

  - ncbi-search (ESearch): Use for precise database searches with specific filters, Boolean operators, and field qualifiers
  - ncbi-fetch (EFetch): Use to retrieve complete records after searching; supports multiple formats (GenBank, FASTA, XML)
  - ncbi-summary (ESummary): Use to get document summaries without fetching complete records
  - ncbi-link (ELink): Use to find related records across databases (e.g., gene to protein, protein to structure)
  - ncbi-info (EInfo): Use to discover available databases and their capabilities
  - ncbi-global-query (EGQuery): Use to search across all databases simultaneously
  - ncbi-spell (ESpell): Use to get spelling suggestions for search terms
  - ncbi-citation-match (ECitMatch): Use to find PMIDs from citation information
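
Each low-level tool is named after an NCBI E-utilities script. The sketch below maps the tool names above to the corresponding public endpoints and builds a request URL. The mapping follows the E-utility named in parentheses for each tool; whether ncbi_mcp.py constructs its requests this way internally is an assumption, and eutils_url is a hypothetical helper.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Tool name (from this README) -> E-utilities script it is named after.
TOOL_TO_ENDPOINT = {
    "ncbi-search": "esearch.fcgi",
    "ncbi-fetch": "efetch.fcgi",
    "ncbi-summary": "esummary.fcgi",
    "ncbi-link": "elink.fcgi",
    "ncbi-info": "einfo.fcgi",
    "ncbi-global-query": "egquery.fcgi",
    "ncbi-spell": "espell.fcgi",
    "ncbi-citation-match": "ecitmatch.cgi",
}


def eutils_url(tool: str, **params: str) -> str:
    """Build the E-utilities URL that corresponds to a low-level tool."""
    return EUTILS_BASE + TOOL_TO_ENDPOINT[tool] + "?" + urlencode(params)


print(eutils_url("ncbi-search", db="pubmed", term="BRCA1", retmode="json"))
```

When calling E-utilities directly, the api_key and email values from the .env file can be passed as additional query parameters.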

Biological Context and Terminology

Understanding NCBI Databases:

  - Gene: Contains gene records with symbols, names, functions, and genomic locations
  - Protein: Contains protein sequences and annotations
  - Nucleotide: Contains DNA/RNA sequences (genes, transcripts, genomic regions)
  - PubMed: Contains scientific literature and publications
  - BioSample: Contains biological sample metadata (tissues, cell lines, etc.)
  - BioProject: Contains research project information
  - SRA: Contains raw sequencing data
  - Assembly: Contains genome assembly information

Common Biological Terms:

  - Gene Symbol: Short abbreviation (e.g., BRCA1, TP53, TNF)
  - Gene ID: Unique NCBI identifier (e.g., 672 for BRCA1)
  - Accession: Unique sequence identifier (e.g., NM_001126114.3)
  - N50/L50: Assembly contiguity metrics (a larger N50 and a smaller L50 generally indicate a more contiguous assembly)
  - Reference Genome: High-quality representative genome for a species
  - Organism: Use scientific names (Homo sapiens) or common names (human)
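
To make the N50/L50 entry concrete: sort the contigs from longest to shortest and accumulate their lengths until at least half of the total assembly length is covered. N50 is the length of the last contig added, and L50 is the number of contigs needed. A minimal sketch using the standard definitions (the function name is illustrative and is not part of this project):

```python
def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


# Total length 300, so the threshold is 150: 80 + 70 reaches it.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```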

Search Strategies:

  - Use specific gene symbols for precise results
  - Include organism names to avoid ambiguity
  - Use Boolean operators (AND, OR, NOT) for complex searches
  - Use field qualifiers like [Gene], [Organism], [Protein Name] for targeted searches
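
These strategies can be combined when composing the term passed to ncbi-search. The sketch below assembles an Entrez-style query string from field-qualified clauses and Boolean operators. The helper names are hypothetical, and the qualifiers are the ones listed above; the fields each database accepts vary and can be listed with EInfo.

```python
def field(value: str, qualifier: str) -> str:
    """Attach a field qualifier, quoting multi-word values as phrases."""
    if " " in value:
        value = f'"{value}"'
    return f"{value}[{qualifier}]"


def all_of(*clauses: str) -> str:
    """Combine clauses so that every one must match."""
    return " AND ".join(clauses)


def any_of(*clauses: str) -> str:
    """Combine clauses so that at least one must match (parenthesized)."""
    return "(" + " OR ".join(clauses) + ")"


term = all_of(
    any_of(field("BRCA1", "Gene"), field("TP53", "Gene")),
    field("Homo sapiens", "Organism"),
)
print(term)  # (BRCA1[Gene] OR TP53[Gene]) AND "Homo sapiens"[Organism]
```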

High-Level Tools

Natural Language Query Processor

tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Please summarize the function of TNF-alpha"
  }
}

Gene Summarizer

tools/call
{
  "name": "summarize-gene",
  "arguments": {
    "gene_name": "BRCA1"
  }
}

Genome Statistics

tools/call
{
  "name": "genome-stats",
  "arguments": {
    "organism": "Escherichia coli"
  }
}

Dataset Counter

tools/call
{
  "name": "count-datasets",
  "arguments": {
    "database": "biosample",
    "query": "mouse melanoma b16f10"
  }
}

Low-Level Tools

Search NCBI Databases

tools/call
{
  "name": "ncbi-search",
  "arguments": {
    "database": "pubmed",
    "term": "BRCA1",
    "filters": {
      "organism": "Homo sapiens",
      "date_range": {
        "start": "2020"
      }
    }
  }
}

Fetch NCBI Records

tools/call
{
  "name": "ncbi-fetch",
  "arguments": {
    "database": "gene",
    "ids": ["70"],
    "rettype": "gb"
  }
}

Get Gene Information

tools/call
{
  "name": "get_gene_info",
  "arguments": {
    "gene_id": "672"
  }
}

Get Genome Information

tools/call
{
  "name": "get_genome_info",
  "arguments": {
    "organism": "Homo sapiens",
    "reference": true
  }
}

License

Apache-2.0
