ncbi-mcp
Query and analyze NCBI Entrez databases, including PubMed, Gene, and Protein, to retrieve detailed gene information and summaries. Facilitate exploration of gene relationships and integrate with bioinformatics workflows.
Author

noahzeidenberg
NCBI Model Context Protocol (MCP)
A Python implementation of the Model Context Protocol for interacting with NCBI databases.
Setup
- Clone this repository
- Install dependencies:
  pip install -r requirements.txt
- Create a .env file with your NCBI API key and email:
  NCBI_API_KEY=your_api_key_here
  NCBI_EMAIL=your_email@example.com
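To illustrate what the .env file provides, the sketch below parses KEY=VALUE lines using only the standard library. The helper name `parse_env_text` is hypothetical, and ncbi_mcp.py may load these values differently (for example through a dotenv package).

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Same two settings as in the Setup step above (placeholder values).
sample = "NCBI_API_KEY=your_api_key_here\nNCBI_EMAIL=your_email@example.com\n"
settings = parse_env_text(sample)
```

Both keys end up as plain strings in `settings`, which is the form a server would need to pass them along to NCBI.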
Running the MCP Server
python ncbi_mcp.py
Using with Cursor/Claude
Once the MCP server is running, you can interact with it using natural language in Cursor/Claude.
Using Natural Language Queries
You can use natural language to perform searches and retrieve information:
tools/call
{
"name": "nlp-query",
"arguments": {
"query": "Find research articles about BRCA1"
}
}
Or more simply, just use the query directly:
@ncbi-mcp Find research articles about BRCA1
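For scripted use, the tools/call request shown above can be wrapped in a JSON-RPC 2.0 envelope. The sketch below assumes standard MCP framing (`jsonrpc`, `id`, `method`, `params`); the helper name `make_tool_call` is hypothetical, and the exact requests in this repository's test files are not shown here.

```python
import json


def make_tool_call(name, arguments, request_id=1):
    """Serialize one tools/call request as a single JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


line = make_tool_call("nlp-query", {"query": "Find research articles about BRCA1"})
```

Each request is a single line, so several of them can be written to a file or a pipe one after another.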
Example Natural Language Queries
Here are some example natural language queries you can try:
- Gene function information:
  @ncbi-mcp Please summarize the function of TNF-alpha
- Genome size and statistics:
  @ncbi-mcp How big is the genome for Saccharomyces cerevisiae?
- Assembly statistics:
  @ncbi-mcp What are the reported L50 and N50 statistics for the most recent E. coli genome?
- Dataset counts:
  @ncbi-mcp How many datasets are available in the biosample database for b16f10 mouse melanoma cells?
- Search for scientific articles:
  @ncbi-mcp Find the latest research on COVID-19 vaccines
- Get gene information:
  @ncbi-mcp Tell me about the BRCA1 gene
- Fetch genome information:
  @ncbi-mcp Get genome information for Homo sapiens
Testing
To test the MCP server with various queries, you can use the included test files:
# Test natural language query functionality (default)
.\run_test.bat
# Test all tools
.\run_test.bat all
# Test specific test file
.\run_test.bat test_all_tools.jsonl
# Test high-level tools
.\run_test.bat test_high_level_tools.jsonl
The test script will:
1. Start the MCP server in the background
2. Send test requests from the specified file
3. Wait a few seconds to allow processing
4. Terminate the server and display the output
This approach is used because the MCP server is designed to run continuously as a service. For manual testing without automatic termination, you can use:
# Run manually with any test file
type test_nlp_query.jsonl | python ncbi_mcp.py
The test files contain example JSON-RPC requests that simulate how Cursor/Claude would interact with the MCP server.
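The start, send, wait, terminate cycle described above can be approximated in Python for platforms where the batch file is not available. This is a sketch, not the contents of run_test.bat; the function name `run_requests` and the default timeout are assumptions.

```python
import subprocess


def run_requests(command, request_lines, timeout=10.0):
    """Start `command`, send one request per line on stdin, return its stdout lines."""
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    payload = "".join(line.rstrip("\n") + "\n" for line in request_lines)
    try:
        # communicate() writes the payload, closes stdin, and waits for exit.
        out, _ = proc.communicate(payload, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A long-running server will not exit on its own, so stop it here
        # and collect whatever it printed before being killed.
        proc.kill()
        out, _ = proc.communicate()
    return [line for line in out.splitlines() if line.strip()]
```

Against the real server the call would look like `run_requests(["python", "ncbi_mcp.py"], open("test_nlp_query.jsonl"))`, assuming both files are in the working directory.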
Available Tools
The NCBI MCP provides both high-level tools that understand natural language and low-level tools for direct database interaction.
Tool Usage Guidelines for LLMs
Recommended Workflow Patterns
For most biological queries, start with nlp-query. It handles complex questions and automatically routes them to the appropriate specialized tools.
Common Research Workflows:
- Gene Analysis Workflow:
  - Start with nlp-query for general gene questions
  - Use summarize-gene for comprehensive gene information
  - Use get_gene_info for detailed structured data
  - Use ncbi-search + ncbi-fetch for specific database queries
- Genome Analysis Workflow:
  - Use genome-stats for organism genome statistics
  - Use get_genome_info for detailed genome metadata
  - Use count-datasets to explore available genome assemblies
- Literature Research Workflow:
  - Use nlp-query for natural language literature searches
  - Use ncbi-search with database="pubmed" for precise searches
  - Use ncbi-fetch to get full publication details
- Dataset Discovery Workflow:
  - Use count-datasets to assess data availability
  - Use nlp-query to explore datasets with natural language
  - Use ncbi-search for systematic database exploration
- E-utilities Workflow (Advanced):
  - Use ncbi-info to discover available databases
  - Use ncbi-global-query to see which databases contain your search term
  - Use ncbi-search to find specific UIDs in target databases
  - Use ncbi-summary to get overview information about records
  - Use ncbi-fetch to retrieve complete records
  - Use ncbi-link to find related records across databases
- Cross-Database Analysis Workflow:
  - Use ncbi-search to find genes of interest
  - Use ncbi-link to find related proteins, structures, or literature
  - Use ncbi-summary to get metadata about related records
  - Use ncbi-fetch to retrieve detailed information
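A client that wants to follow these workflows programmatically could encode them as ordered tool lists. The sketch below covers three of the workflows; the dictionary keys and the `next_tool` helper are hypothetical names, while the tool names and their order come from the lists above.

```python
# Ordered tool sequences taken from the workflow lists above.
WORKFLOWS = {
    "gene-analysis": [
        "nlp-query", "summarize-gene", "get_gene_info", "ncbi-search", "ncbi-fetch",
    ],
    "e-utilities": [
        "ncbi-info", "ncbi-global-query", "ncbi-search",
        "ncbi-summary", "ncbi-fetch", "ncbi-link",
    ],
    "cross-database": ["ncbi-search", "ncbi-link", "ncbi-summary", "ncbi-fetch"],
}


def next_tool(workflow, completed):
    """Return the first tool in the workflow that has not been called yet, or None."""
    for tool in WORKFLOWS[workflow]:
        if tool not in completed:
            return tool
    return None
```

For example, `next_tool("cross-database", {"ncbi-search"})` returns `"ncbi-link"`, the step that follows a search in that workflow.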
Tool Selection Guide
High-Level Tools (Recommended for most users):
- nlp-query: Use for general biological questions, complex queries, and when you're unsure which tool to use
- summarize-gene: Use for comprehensive gene analysis and understanding gene function
- genome-stats: Use for genome size, assembly quality, and organism comparison
- count-datasets: Use for research planning and data availability assessment
- get_gene_info: Use for detailed, structured gene information
- get_genome_info: Use for detailed, structured genome information
Low-Level E-utilities Tools (For advanced users):
- ncbi-search (ESearch): Use for precise database searches with specific filters, Boolean operators, and field qualifiers
- ncbi-fetch (EFetch): Use to retrieve complete records after searching, supports multiple formats (GenBank, FASTA, XML)
- ncbi-summary (ESummary): Use to get document summaries without fetching complete records
- ncbi-link (ELink): Use to find related records across databases (e.g., gene to protein, protein to structure)
- ncbi-info (EInfo): Use to discover available databases and their capabilities
- ncbi-global-query (EGQuery): Use to search across all databases simultaneously
- ncbi-spell (ESpell): Use to get spelling suggestions for search terms
- ncbi-citation-match (ECitMatch): Use to find PMIDs from citation information
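Each low-level tool is labelled with an NCBI E-utility, which suggests it wraps the corresponding public endpoint. The sketch below shows how an ESearch request URL can be built with the standard library. Whether ncbi_mcp.py builds its requests this way is an assumption, and `build_esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database, term, api_key=None, email=None, retmax=20):
    """Build an ESearch URL; api_key and email are appended only when provided."""
    params = {"db": database, "term": term, "retmax": retmax, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    if email:
        params["email"] = email
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = build_esearch_url("pubmed", "BRCA1", email="your_email@example.com")
```

The `api_key` and `email` parameters correspond to the NCBI_API_KEY and NCBI_EMAIL values from the Setup section.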
Biological Context and Terminology
Understanding NCBI Databases:
- Gene: Contains gene records with symbols, names, functions, and genomic locations
- Protein: Contains protein sequences and annotations
- Nucleotide: Contains DNA/RNA sequences (genes, transcripts, genomic regions)
- PubMed: Contains scientific literature and publications
- BioSample: Contains biological sample metadata (tissues, cell lines, etc.)
- BioProject: Contains research project information
- SRA: Contains raw sequencing data
- Assembly: Contains genome assembly information
Common Biological Terms:
- Gene Symbol: Short abbreviation (e.g., BRCA1, TP53, TNF)
- Gene ID: Unique NCBI identifier (e.g., 672 for BRCA1)
- Accession: Unique sequence identifier (e.g., NM_001126114.3)
- N50/L50: Assembly contiguity metrics (a larger N50 and a smaller L50 generally indicate a more contiguous assembly)
- Reference Genome: High-quality representative genome for a species
- Organism: Use scientific names (Homo sapiens) or common names (human)
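As a worked example of the N50/L50 metrics, the sketch below computes both from a list of contig lengths using the usual definition: sort contigs from longest to shortest and accumulate lengths until at least half of the total is covered. N50 is the length of the contig that crosses that threshold, and L50 is the number of contigs needed to reach it. The function name and the sample lengths are illustrative only.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Total is 290, so half is 145; 80 + 70 = 150 crosses it at the second contig.
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
```

Here `n50` is 70 and `l50` is 2: two contigs, the shorter of which is 70 long, already cover half of the assembly.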
Search Strategies:
- Use specific gene symbols for precise results
- Include organism names to avoid ambiguity
- Use Boolean operators (AND, OR, NOT) for complex searches
- Use field qualifiers like [Gene], [Organism], [Protein Name] for targeted searches
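These strategies can be combined into a single search term. The sketch below assembles a term from a value, an optional field qualifier, an optional organism, and an optional excluded word. `build_term` is a hypothetical helper, and the string it produces is meant for the `term` argument of ncbi-search.

```python
def build_term(value, field=None, organism=None, exclude=None):
    """Combine a search value with field qualifiers and Boolean operators."""

    def qualify(text, tag):
        # Quote multi-word phrases so they are searched as a unit.
        quoted = f'"{text}"' if " " in text else text
        return f"{quoted}[{tag}]" if tag else quoted

    parts = [qualify(value, field)]
    if organism:
        parts.append(qualify(organism, "Organism"))
    term = " AND ".join(parts)
    if exclude:
        term += f" NOT {qualify(exclude, None)}"
    return term


term = build_term("BRCA1", field="Gene", organism="Homo sapiens")
```

The resulting `term` is `BRCA1[Gene] AND "Homo sapiens"[Organism]`, which restricts matches to the human gene and avoids hits from other species.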
High-Level Tools
Natural Language Query Processor
tools/call
{
"name": "nlp-query",
"arguments": {
"query": "Please summarize the function of TNF-alpha"
}
}
Gene Summarizer
tools/call
{
"name": "summarize-gene",
"arguments": {
"gene_name": "BRCA1"
}
}
Genome Statistics
tools/call
{
"name": "genome-stats",
"arguments": {
"organism": "Escherichia coli"
}
}
Dataset Counter
tools/call
{
"name": "count-datasets",
"arguments": {
"database": "biosample",
"query": "mouse melanoma b16f10"
}
}
Low-Level Tools
Search NCBI Databases
tools/call
{
"name": "ncbi-search",
"arguments": {
"database": "pubmed",
"term": "BRCA1",
"filters": {
"organism": "Homo sapiens",
"date_range": {
"start": "2020"
}
}
}
}
Fetch NCBI Records
tools/call
{
"name": "ncbi-fetch",
"arguments": {
"database": "gene",
"ids": ["70"],
"rettype": "gb"
}
}
Get Gene Information
tools/call
{
"name": "get_gene_info",
"arguments": {
"gene_id": "672"
}
}
Get Genome Information
tools/call
{
"name": "get_genome_info",
"arguments": {
"organism": "Homo sapiens",
"reference": true
}
}
License
Apache-2.0
