🐉 Orchestrate Digital Journeys via LLMs and Visual Perception 🐉

Skyvern spearheads the automation of web-based procedures leveraging advanced Large Language Models combined with Computer Vision capabilities. It furnishes a streamlined Application Programming Interface (API) endpoint designed to fully govern manual interactions across an extensive catalog of websites, offering a superior replacement for fragile or inconsistent scripting solutions.

Historically, automating browser functions mandated the creation of bespoke scripts tailored to specific sites, frequently relying on brittle Document Object Model (DOM) inspection or error-prone XPath selectors that failed upon any minor interface revision.

In contrast to selector-dependent interaction models, Skyvern prioritizes the utilization of Vision-enabled LLMs to perceive, learn, and execute web operations.

Operational Paradigm

Skyvern's architectural inspiration stems from the Task-Driven autonomous agent frameworks established by projects like BabyAGI and AutoGPT. Its primary differentiator is the integration of robust browser manipulation capabilities via libraries such as Playwright.

Skyvern deploys a coordinated ensemble of specialized agents responsible for interpreting the web environment, formulating action plans, and executing the required steps:

This composite methodology yields significant benefits:

Zero-Shot Generalization: Skyvern can successfully navigate and operate on novel websites, as it maps visual cues to necessary actions without requiring pre-coded scripts or element identifiers.
Resilience to Change: The system inherently resists breakage caused by website layout modifications because it does not depend on static XPaths or pre-defined selector attributes during navigation.
Workflow Portability: A single defined procedure can be applied across numerous distinct websites, as Skyvern reasons dynamically about the necessary interaction sequence.
Advanced Semantic Reasoning: The incorporation of LLMs enables sophisticated contextual decision-making. For instance:
1. When securing an automotive insurance estimate from a provider, it can logically deduce the answer to a sensitive query like "Were you eligible to drive at age 18?" based on contextual data such as the driver having obtained their license at sixteen.
2. In competitive analysis, it can equate a 22 oz Arnold Palmer from one vendor with a 23 oz can from another, recognizing the minor size variance as trivial noise rather than distinct entities.

A comprehensive technical assessment detailing its capabilities is accessible here.

Demonstration

https://github.com/user-attachments/assets/5cab4668-e8e2-4982-8551-aab05ff73a7f

Metrics & Benchmarking

Skyvern achieves State-of-the-Art (SOTA) performance on the WebBench benchmark with a 64.4% success rate. Detailed methodology and evaluation results are available here

Skyvern excels in operations classified as WRITE tasks (data submission, credential entry, artifact downloading), which are highly relevant for Robotic Process Automation (RPA) adjacent applications.

Getting Started

Skyvern Cloud Offering

Skyvern Cloud provides a fully managed execution environment, eliminating infrastructure management concerns. This service supports concurrent execution of multiple Skyvern instances and includes integrated countermeasures against bot detection, a dedicated proxy network, and CAPTCHA resolution services.

To initiate use, please visit app.skyvern.com to establish an account.

Local Installation and Deployment

Prerequisites: - Python version 3.11.x (Compatible with 3.12; 3.13 support pending) - NodeJS and NPM

Additional requirements for Windows environments: - Rust toolchain - VS Code configured with C++ development tools and the Windows SDK

1. Install Skyvern Package

bash pip install skyvern

2. Initial Setup

This command is essential for the first run, handling database initialization and schema migrations.

bash skyvern quickstart

3. Task Execution

Graphical User Interface (Recommended)

Launch the Skyvern service backend alongside the web interface (assuming the database is operational):

bash skyvern run all

Access the interface at http://localhost:8080 to initiate tasks via the UI.

Programmatic Execution

python from skyvern import Skyvern

skyvern_instance = Skyvern() task_result = await skyvern_instance.run_task(prompt="Identify the leading article on Hacker News today") print(task_result)

Skyvern will launch a browser window to execute the task and automatically terminate it upon completion. Historical records of the task are viewable at http://localhost:8080/history

You can also direct tasks toward different endpoints: python from skyvern import Skyvern

Connection to Skyvern Cloud

skyvern_cloud = Skyvern(api_key="YOUR_SKYVERN_API_KEY")

Connection to a local Skyvern deployment

skyvern_local = Skyvern(base_url="http://localhost:8000", api_key="LOCAL_API_KEY")

task_result = await skyvern_local.run_task(prompt="Identify the leading article on Hacker News today") print(task_result)

Advanced Control Mechanisms

Direct Control of a Local Chrome Instance

⚠️ CAUTION: Beginning with Chrome version 136, direct CDP connections utilizing the default user_data_dir are restricted by default. To leverage existing browser profiles, Skyvern copies the default user_data_dir to ./tmp/user_data_dir upon its initial attempt to connect to your local browser instance. ⚠️

Via Python Scripting python from skyvern import Skyvern

Example path for macOS; adjust for your OS.

browser_executable_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" skyvern_custom = Skyvern( base_url="http://localhost:8000", api_key="YOUR_API_KEY", browser_path=browser_executable_path, ) task_result = await skyvern_custom.run_task( prompt="Determine the highest-ranked story on Hacker News today", )

Via Skyvern Service Configuration

Set the following environment variables in your .env file: bash

Specify the path to your Chrome executable (example for Mac).

CHROME_EXECUTABLE_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" BROWSER_TYPE=cdp-connect

After setting these variables, restart the Skyvern service (skyvern run all) and invoke tasks via the UI or code.

Connecting to Any Remote Browser Instance

Supply the established Chrome DevTools Protocol (CDP) connection URL to Skyvern:

python from skyvern import Skyvern

skyvern_remote = Skyvern(cdp_url="your cdp connection url") task_result = await skyvern_remote.run_task( prompt="Find the top post on hackernews today", )

Enforcing Consistent Output Structure

Achieve predictable JSON output by supplying the data_extraction_schema argument: python from skyvern import Skyvern

skyvern_schema = Skyvern() task_result = await skyvern_schema.run_task( prompt="Extract the primary headline from the current webpage", data_extraction_schema={ "type": "object", "properties": { "title": { "type": "string", "description": "The headline text of the primary content" }, "url": { "type": "string", "description": "The permanent link to the content" }, "points": { "type": "integer", "description": "Numerical rating if applicable" } } }

Task execution proceeds with guaranteed schema adherence.

)

Essential Debugging Commands

bash

Initiate the Skyvern Backend Component

skyvern run server

Launch the Skyvern User Interface

skyvern run ui

Query the operational status of the Skyvern service

skyvern status

Halt all Skyvern components

skyvern stop all

Shut down the Skyvern User Interface

skyvern stop ui

Terminate the Skyvern Backend Component

skyvern stop server

Docker Compose Deployment

Ensure Docker Desktop is installed and active.
Verify that no other PostgreSQL instance is occupying port 5432 (use docker ps to check).
Clone the repository and navigate to the root directory.
Execute skyvern init llm to generate the requisite .env configuration file, which will be injected into the Docker image.
Populate the necessary LLM provider credentials within the docker-compose.yml file. If deploying Skyvern on a remote host, ensure the UI container references the correct server IP in docker-compose.yml.
Execute the deployment command: bash docker compose up -d
Access the operational UI at http://localhost:8080.

Critical Note: Only a single PostgreSQL container can bind to port 5432. If transitioning from a CLI-managed Postgres to Docker Compose, you must first eliminate the existing container: bash docker rm -f postgresql-container

If database errors arise during Docker operation, inspect active containers with docker ps to identify any conflicting Postgres instances.

Core Skyvern Capabilities

Task Execution Units (Skyvern Tasks)

Tasks represent the most granular unit of work within Skyvern. Each task initiates a single instruction set, directing Skyvern to navigate the web and achieve a defined objective. Tasks require specification of a url, a descriptive prompt, and optionally accept a structured output data schema or specific error codes to trigger immediate termination under defined conditions.

Sequential Operations (Skyvern Workflows)

Workflows enable the composition of multiple distinct tasks into a singular, coherent process. For example, automating the retrieval of all invoices dated after a specific threshold involves a workflow that navigates to the invoice repository, applies the date filter, extracts the list of relevant items, and then iteratively downloads each one.

Another compelling use case is e-commerce order automation: a workflow might first locate the desired product, add it to the shopping cart, subsequently verify cart contents, and finally proceed through the complete payment and confirmation sequence.

Supported Workflow Primitives: 1. Browser Navigation Task 1. Browser Interaction Command 1. Data Schema Enforcement 1. State Validation Step 1. Iterative Loops (For Each) 1. File Content Interpretation 1. Email Dispatch Module 1. Natural Language Instruction Block 1. External HTTP Communication 1. Custom Code Injection Block 1. Artifact Upload to Object Storage 1. (Upcoming) Conditional Branching Logic

Real-Time Viewport Streaming

Skyvern provides the capability to stream the browser's visual output directly to your local machine, allowing for direct observation of automated actions. This feature is invaluable for debugging, understanding system behavior, and enabling manual overrides during sensitive operational phases.

Intelligent Form Population

Skyvern possesses native competence in reading context and inputting data into web forms. Providing the required information within the navigation_goal allows the system to semantically interpret the data and populate the corresponding form fields accurately.

Structured Data Retrieval

Data extraction is a core function. You can mandate a specific output structure by embedding a data_extraction_schema directly within the primary instruction prompt, formatted in JSONC. Skyvern guarantees that its extracted results adhere strictly to the schema provided.

Artifact Downloading

Files presented for download on web pages are handled automatically. All retrieved artifacts are seamlessly uploaded to configured block storage solutions, making them accessible via the management interface.

Access Control Management

Skyvern supports several mechanisms to manage access to pages protected by login screens. For integration with advanced authentication flows, please initiate contact via email or our Discord server.

🔐 Multi-Factor Authentication (MFA) Support (TOTP)

Skyvern incorporates support for various MFA protocols necessary for automating sensitive workflows: 1. QR-Code based MFA (e.g., Google Authenticator, Authy) 1. Email token validation 1. SMS verification code processing

🔐 Further details on MFA integration are available here.

Credential Manager Integration

Skyvern currently offers compatibility with the following password management utilities: - [x] Bitwarden - [ ] 1Password - [ ] LastPass

Model Context Protocol (MCP) Adherence

Skyvern fully embraces the Model Context Protocol (MCP), allowing users to integrate any LLM that implements this standard for communication.

Refer to the MCP specifications here.

Integration with Workflow Automation Platforms

Skyvern interfaces seamlessly with major low-code/no-code platforms, including Zapier, Make.com, and N8N, for ecosystem connectivity.

🔐 Review MFA specifics here.

Practical Demonstrations of Skyvern Utility

We actively track real-world deployments of Skyvern. Below are examples illustrating how the platform is utilized for workflow automation:

Bulk Invoice Retrieval Across Diverse Portals

Schedule a live demonstration to witness this capability.

Automating Job Application Submissions

▶️ See this process in action

Streamlining Material Procurement for Industrial Operations

▶️ See this process in action

System Registration and Form Completion on Governmental Sites

▶️ See this process in action

Mass Population of 'Contact Us' Web Forms

▶️ See this process in action

Generating Insurance Quotations from Various Providers

▶️ See this process in action

Developer Environment Setup

Please ensure you have the uv package manager installed. 1. Execute the following command to initialize your isolated development environment (.venv): bash uv sync --group dev

Perform the initial service configuration via the CLI: bash uv run skyvern quickstart
Access the interactive web dashboard at http://localhost:8080. The Skyvern Command Line Interface is fully functional across Windows, WSL, macOS, and Linux platforms.

Comprehensive Reference Material

More detailed documentation is hosted on our 📕 official documentation site. Should you encounter any ambiguities or require new information, please report an issue or contact us via email or Discord.

Supported Model Connectors

Vendor	Compatible Models
OpenAI	gpt4-turbo, gpt-4o, gpt-4o-mini
Anthropic	Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
Azure OpenAI	Any GPT variant. Superior multimodal performance with azure/gpt4-o
AWS Bedrock	Anthropic Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
Gemini	Gemini 2.5 Pro and Flash, Gemini 2.0
Ollama	Any model hosted locally via Ollama
OpenRouter	Models accessed via the OpenRouter gateway
OpenAI-compatible	Any endpoint adhering to the OpenAI API specification (via liteLLM)

Environment Variable Definitions

OpenAI Configuration

Variable	Purpose	Type	Example Value
`ENABLE_OPENAI`	Activates OpenAI model integration	Boolean	`true`, `false`
`OPENAI_API_KEY`	Your OpenAI Secret Key	String	`sk-1234567890`
`OPENAI_API_BASE`	Alternate API endpoint (optional)	String	`https://openai.api.base`
`OPENAI_ORGANIZATION`	OpenAI Organization Identifier (optional)	String	`your-org-id`

Recommended Primary Key: OPENAI_GPT4O, OPENAI_GPT4O_MINI, OPENAI_GPT4_1, OPENAI_O4_MINI, OPENAI_O3

Anthropic Configuration

Variable	Purpose	Type	Example Value
`ENABLE_ANTHROPIC`	Activates Anthropic model integration	Boolean	`true`, `false`
`ANTHROPIC_API_KEY`	Your Anthropic Secret Key	String	`sk-1234567890`

Recommended Primary Key: ANTHROPIC_CLAUDE3.5_SONNET, ANTHROPIC_CLAUDE3.7_SONNET, ANTHROPIC_CLAUDE4_OPUS, ANTHROPIC_CLAUDE4_SONNET

Azure OpenAI Configuration

Variable	Purpose	Type	Example Value
`ENABLE_AZURE`	Activates Azure OpenAI integration	Boolean	`true`, `false`
`AZURE_API_KEY`	Azure Deployment API Key	String	`sk-1234567890`
`AZURE_DEPLOYMENT`	Azure OpenAI Resource Deployment Name	String	`skyvern-deployment`
`AZURE_API_BASE`	Azure endpoint base URL	String	`https://skyvern-deployment.openai.azure.com/`
`AZURE_API_VERSION`	API Specification Version	String	`2024-02-01`

Recommended Primary Key: AZURE_OPENAI

AWS Bedrock Configuration

Variable	Purpose	Type	Example Value
`ENABLE_BEDROCK`	Activates AWS Bedrock integration. Ensure your AWS credentials are configured first.	Boolean	`true`, `false`

Recommended Primary Key: BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_OPUS_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE

Gemini Configuration

Variable	Purpose	Type	Example Value
`ENABLE_GEMINI`	Activates Gemini model integration	Boolean	`true`, `false`
`GEMINI_API_KEY`	Your Google Gemini API Key	String	`your_google_gemini_api_key`

Recommended Primary Key: GEMINI_2.5_PRO_PREVIEW, GEMINI_2.5_FLASH_PREVIEW

Ollama Configuration

Variable	Purpose	Type	Example Value
`ENABLE_OLLAMA`	Integrates locally hosted models via Ollama	Boolean	`true`, `false`
`OLLAMA_SERVER_URL`	Address for your Ollama instance	String	`http://host.docker.internal:11434`
`OLLAMA_MODEL`	The specific Ollama model to load	String	`qwen2.5:7b-instruct`

Recommended Primary Key: OLLAMA

Note: Vision capabilities are currently unsupported when using Ollama.

OpenRouter Configuration

Variable	Purpose	Type	Example Value
`ENABLE_OPENROUTER`	Activates OpenRouter model gateway	Boolean	`true`, `false`
`OPENROUTER_API_KEY`	Your OpenRouter API Key	String	`sk-1234567890`
`OPENROUTER_MODEL`	Designated OpenRouter model identifier	String	`mistralai/mistral-small-3.1-24b-instruct`
`OPENROUTER_API_BASE`	OpenRouter API service root	String	`https://api.openrouter.ai/v1`

Recommended Primary Key: OPENROUTER

OpenAI-Compatible Endpoints

Variable	Purpose	Type	Example Value
`ENABLE_OPENAI_COMPATIBLE`	Integrates a third-party API conforming to OpenAI standards	Boolean	`true`, `false`
`OPENAI_COMPATIBLE_MODEL_NAME`	Name of the model accessible via the endpoint	String	`yi-34b`, `gpt-3.5-turbo`, `mistral-large`, etc.
`OPENAI_COMPATIBLE_API_KEY`	Authentication token for the endpoint	String	`sk-1234567890`
`OPENAI_COMPATIBLE_API_BASE`	Base URI for the compatible service	String	`https://api.together.xyz/v1`, `http://localhost:8000/v1`, etc.
`OPENAI_COMPATIBLE_API_VERSION`	API protocol version (Optional)	String	`2023-05-15`
`OPENAI_COMPATIBLE_MAX_TOKENS`	Hard limit on output tokens (Optional)	Integer	`4096`, `8192`, etc.
`OPENAI_COMPATIBLE_TEMPERATURE`	Sampling temperature setting (Optional)	Float	`0.0`, `0.5`, `0.7`, etc.
`OPENAI_COMPATIBLE_SUPPORTS_VISION`	Indicator if the model accepts visual inputs (Optional)	Boolean	`true`, `false`

Supported LLM Key for this configuration: OPENAI_COMPATIBLE

General LLM Configuration Overrides

Variable	Description	Type	Example Value
`LLM_KEY`	The designated primary model identifier to utilize	String	Reference keys listed above
`SECONDARY_LLM_KEY`	The designated model identifier for subsidiary agents	String	Reference keys listed above
`LLM_CONFIG_MAX_TOKENS`	Overrides the default maximum token limit for LLM contexts	Integer	`128000`

Development Trajectory

This outlines planned features for the immediate future. We actively solicit suggestions for feature prioritization via email or Discord.

[x] Open Sourcing - Core platform code released publicly.
[x] Chained Operations - Implementation of multi-step workflow execution.
[x] Contextual Enhancement - Improvement in LLM's understanding of surrounding element labels via prompt injection.
[x] Efficiency Gains - Optimization of the context tree transmission to enhance stability and reduce operational expenditures.
[x] Modern UI Platform - Transition from Streamlit to a production-grade React interface for job initiation.
[x] Visual Workflow Editor - Introduction of a graphical tool for constructing and analyzing workflow dependencies.
[x] Real-Time Viewport Feed - Integration of live browser streaming into the new UI.
[x] Historical Run Visualization - React UI replacement for visualizing past execution logs and outcomes.
[X] Autonomous Workflow Generation ("Observer") - Capability for Skyvern to automatically blueprint workflows during navigation.
[x] Prompt Response Caching - Implementation of a memory layer to cache LLM results, drastically cutting recurring computation costs.
[x] Benchmark Integration - Incorporation of standard evaluation datasets to continuously monitor performance metrics.
[ ] Enhanced Debug Mode - A planning phase where the agent requests user approval before executing steps, aiding debugging and prompt refinement.
[ ] Browser Extension Utility - Development of a Chrome Extension for integrated access (voice commands, task saving, etc.).
[ ] Action Recording Feature - Capability for Skyvern to observe a user completing a task and generate the corresponding workflow automatically.
[ ] Interactive Live Feed - Allowing real-time user intervention into the live browser stream.
[ ] LLM Observability Integration - Incorporating tools for back-testing prompt adjustments against specific datasets and visualizing longitudinal performance trends.
[x] Langchain Compatibility - Provision of a langchain_community integration allowing Skyvern to function as a Tool.

Collaboration Guidelines

We enthusiastically welcome Pull Requests and feature proposals! Engage with us via email or Discord. Consult our contribution guide and review the currently open "Help Wanted" issues to find entry points for contribution.

For an architectural overview of the repository structure, build instructions, and usage resolution assistance, utilize Code Sage.

Usage Telemetry

By default, Skyvern gathers aggregate, anonymous usage statistics to inform development priorities. To opt-out of this data collection, set the environment variable SKYVERN_TELEMETRY to the value false.

Licensing Framework

The core operational logic of Skyvern is made available under the permissive AGPL-3.0 License within this open-source repository. Certain proprietary anti-bot mechanisms are reserved for our managed cloud service only.

For any inquiries or clarifications regarding licensing terms, please reach out to support; we are prepared to assist.

Skyvern Automator

Author

Skyvern-AI

Quick Info

Actions

Tags

Operational Paradigm

Demonstration

Metrics & Benchmarking

Efficacy in WRITE Operations (e.g., Form Population, Login, File Retrieval)

Getting Started

Skyvern Cloud Offering

Local Installation and Deployment

1. Install Skyvern Package

2. Initial Setup

3. Task Execution

Graphical User Interface (Recommended)

Programmatic Execution

Connection to Skyvern Cloud

Connection to a local Skyvern deployment

Advanced Control Mechanisms

Direct Control of a Local Chrome Instance

Example path for macOS; adjust for your OS.

Specify the path to your Chrome executable (example for Mac).

Connecting to Any Remote Browser Instance

Enforcing Consistent Output Structure

Task execution proceeds with guaranteed schema adherence.

Essential Debugging Commands

Initiate the Skyvern Backend Component

Launch the Skyvern User Interface

Query the operational status of the Skyvern service

Halt all Skyvern components

Shut down the Skyvern User Interface

Terminate the Skyvern Backend Component

Docker Compose Deployment

Core Skyvern Capabilities

Task Execution Units (Skyvern Tasks)

Sequential Operations (Skyvern Workflows)

Real-Time Viewport Streaming

Intelligent Form Population

Structured Data Retrieval

Artifact Downloading

Access Control Management

🔐 Multi-Factor Authentication (MFA) Support (TOTP)

Credential Manager Integration

Model Context Protocol (MCP) Adherence

Integration with Workflow Automation Platforms

Practical Demonstrations of Skyvern Utility

Bulk Invoice Retrieval Across Diverse Portals

Automating Job Application Submissions

Streamlining Material Procurement for Industrial Operations

System Registration and Form Completion on Governmental Sites

Mass Population of 'Contact Us' Web Forms

Generating Insurance Quotations from Various Providers

Developer Environment Setup

Comprehensive Reference Material

Supported Model Connectors

Environment Variable Definitions

OpenAI Configuration

Anthropic Configuration

Azure OpenAI Configuration

AWS Bedrock Configuration

Gemini Configuration

Ollama Configuration

OpenRouter Configuration

OpenAI-Compatible Endpoints

General LLM Configuration Overrides

Development Trajectory

Collaboration Guidelines

Usage Telemetry

Licensing Framework

See Also