mlc-data-fabricator
An MCP server for interacting with the MLC Bakery services. It lets clients discover datasets, retrieve sample records, and validate metadata, supporting data exploration workflows built on the core MLC Bakery API.
Author: jettyio
MLC Data Fabricator Service
A Python service built with FastAPI and SQLAlchemy for managing the lifecycle and lineage of machine learning artifacts. It includes built-in validation of Croissant metadata.
Key Capabilities
- Asset organization, including grouping into collections
- Entity tracking and auditing
- Chronological activity logging
- Dependency and lineage (ancestry) mapping
- Access through a REST API
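The tracking and lineage features can be pictured as a graph in which each entity points to the entities it was derived from. The sketch below uses a hypothetical in-memory model (Entity, ancestors, and the sample catalog are illustrative names, not the service's actual SQLAlchemy schema or API). It shows how an ancestry query can walk that graph breadth-first.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical tracked asset (dataset, model, ...) in a collection."""
    name: str
    collection: str
    # Names of the entities this one was directly derived from.
    derived_from: list[str] = field(default_factory=list)


def ancestors(entities: dict[str, Entity], name: str) -> list[str]:
    """Return all upstream entities of `name`, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(entities[name].derived_from)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # shared ancestors (diamonds) and cycles are visited once
        seen.add(current)
        order.append(current)
        queue.extend(entities[current].derived_from)
    return order


catalog = {
    "raw-images": Entity("raw-images", "vision"),
    "clean-images": Entity("clean-images", "vision", ["raw-images"]),
    "labels": Entity("labels", "vision"),
    "classifier-v1": Entity("classifier-v1", "vision", ["clean-images", "labels"]),
}
print(ancestors(catalog, "classifier-v1"))  # -> ['clean-images', 'labels', 'raw-images']
```

Breadth-first order lists direct parents before grandparents. That is usually the order you want when auditing where an artifact came from.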
Deployment via Containerization
- Configuration setup: copy the template file to create the runtime environment file:
      cp env.example .env
- Service startup: the stack requires PostgreSQL for persistent storage and Typesense for search indexing. The MCP server communicates with the main API service, which in turn talks to the database.
      docker compose up -d
- Schema initialization: create the database, then apply the schema migrations with Alembic inside the api container. (For local runs without Docker, the uv run utility plays the equivalent role of executing commands inside the project environment.)
      docker compose exec db psql -U postgres -c "CREATE DATABASE mlcbakery;"
      docker compose exec api alembic upgrade head
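The configuration step copies env.example to .env, a file of KEY=VALUE lines. The minimal sketch below is a hypothetical, standard-library-only helper; it is not part of the repository. It parses such lines and skips blank lines and # comments. It deliberately ignores features that fuller .env loaders support, such as variable expansion and multi-line values. The sample keys come from this README and the values are placeholders.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from .env-style text (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]  # tolerate shell-style exports
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = """
# Placeholder values for illustration only
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/mlcbakery
ADMIN_AUTH_TOKEN="change-me"
"""
print(parse_env(sample)["ADMIN_AUTH_TOKEN"])  # -> change-me
```

To inspect a real file, pass it the contents of .env, for example parse_env(open(".env").read()).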
Service Access Points
By default, the services are served locally at the following addresses:
- Interactive Documentation (Swagger UI): http://bakery.localhost/docs
- Alternative Documentation (ReDoc): http://bakery.localhost/redoc
- MCP streaming endpoint: http://mcp.localhost/mcp (when testing a local build, you may need to add these hostnames to your hosts file)
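To check whether the *.localhost hostnames already resolve on your machine before editing the hosts file, you can use the small sketch below. resolves_to_loopback is a hypothetical helper, not part of the project. It relies only on socket.getaddrinfo and ipaddress from the standard library.

```python
import ipaddress
import socket


def resolves_to_loopback(host: str) -> bool:
    """True if `host` resolves and every resolved address is a loopback address."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # name does not resolve at all (e.g. missing hosts entry)
    # info[4] is the sockaddr tuple; its first element is the address string.
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(addr.split("%")[0]).is_loopback for addr in addresses
    )
```

If resolves_to_loopback("bakery.localhost") returns False, adding a hosts-file line such as 127.0.0.1 bakery.localhost mcp.localhost is one way to make the addresses above reachable.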
Local Service Execution
Prerequisites for Local Build
- Python 3.12 or newer
- The uv package manager (see the uv installation documentation)
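You can confirm both prerequisites before installing with the short script below. It is a hypothetical helper, not part of the repository. It compares the running interpreter against the 3.12 minimum stated above and uses shutil.which to look for uv on PATH.

```python
import shutil
import sys

MINIMUM_PYTHON = (3, 12)  # minimum version from the prerequisites above


def meets_minimum(version: tuple[int, ...], minimum: tuple[int, int] = MINIMUM_PYTHON) -> bool:
    """Compare the (major, minor) part of a version tuple with the minimum."""
    return tuple(version[:2]) >= minimum


if not meets_minimum(sys.version_info):
    print(f"Python {sys.version.split()[0]} found; 3.12 or newer is required")
if shutil.which("uv") is None:
    print("uv was not found on PATH")
```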
Setup Procedure
- Source checkout:
      git clone git@github.com:jettyio/mlcbakery.git
      cd mlcbakery
- Dependency installation: uv uses the dependency declarations in pyproject.toml and automatically creates an isolated virtual environment if one does not already exist.
      curl -LsSf https://astral.sh/uv/install.sh | sh
      pip install poetry uvicorn
      uv run poetry install --no-interaction --no-ansi --no-root --with mcp
- Server startup: make sure DATABASE_URL is set in your .env file, then start the FastAPI application with uvicorn:
      uv run uvicorn mlcbakery.main:app --reload --host 0.0.0.0 --port 8000
Security Mechanisms
The Bakery supports two authentication methods: JSON Web Tokens (JWTs) and a master admin token. Both are configured through environment variables in the .env file, and both are sent in the HTTP Authorization header using the Bearer scheme (Authorization: Bearer <token>).
- ADMIN_AUTH_TOKEN: a static secret that grants unrestricted access to all service resources.
- JWT_VERIFICATION_STRATEGY: the URL of a trusted authority used to verify JWT signatures (for example, Clerk). A development Clerk instance is available: you can sign up through flows.jetty.io (experimental access) or contact dev@jetty.io for cloud access for your organization.
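Because JWTs and the admin token use the same header format, a client only needs to build one kind of request. The sketch below uses the standard urllib.request module and assumes the default local address from this README. The path /api/v1/collections is a hypothetical placeholder; check the Swagger UI at /docs for the real endpoint list. authorized_request is likewise an illustrative name.

```python
import os
import urllib.request

BASE_URL = "http://bakery.localhost"  # default local address from this README


def authorized_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries `token` with the Bearer scheme."""
    request = urllib.request.Request(BASE_URL + path, method="GET")
    # Identical header format for a JWT and for ADMIN_AUTH_TOKEN.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical path; the token is read from the environment for illustration.
req = authorized_request("/api/v1/collections", os.environ.get("ADMIN_AUTH_TOKEN", ""))
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```

The network call is left commented out so the snippet runs without a live server.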
Running Validation Suites
The test suite connects to the PostgreSQL database given by the DATABASE_URL environment variable. You can reuse the connection settings from your development setup, or point DATABASE_URL in .env at a separate database for isolated testing.
Make sure DATABASE_URL is set in your shell or .env file, then run:
      uv run pytest
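For an isolated test database, one option is to keep the development connection settings and change only the database name. The sketch below is a hypothetical helper using urllib.parse. The name mlcbakery_test is an illustrative choice, not something the project defines. That database would have to be created first, for example with the same psql CREATE DATABASE command used during schema initialization.

```python
from urllib.parse import urlsplit, urlunsplit


def with_database(url: str, database: str) -> str:
    """Return `url` with its database name (the URL path) replaced."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path="/" + database))


dev_url = "postgresql://postgres:postgres@localhost:5432/mlcbakery"  # placeholder value
print(with_database(dev_url, "mlcbakery_test"))
# -> postgresql://postgres:postgres@localhost:5432/mlcbakery_test
```

Setting DATABASE_URL to the printed value before running uv run pytest keeps test data out of the development database.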
