Skyvern Automator
Facilitates the connection of sophisticated AI applications to web browsers, enabling the execution of complex digital tasks such as data entry into forms, secure retrieval of files, and comprehensive online information synthesis. It offers flexibility via a locally deployable setup leveraging a preferred Large Language Model (LLM) or through a robust cloud-based API service.
Author

Skyvern-AI
🐉 Orchestrate Digital Journeys via LLMs and Visual Perception 🐉
Skyvern automates browser-based workflows using Large Language Models (LLMs) and computer vision. It provides a simple Application Programming Interface (API) endpoint for automating manual workflows across a large number of websites, replacing brittle or unreliable scripted automation.
Historically, automating browser functions mandated the creation of bespoke scripts tailored to specific sites, frequently relying on brittle Document Object Model (DOM) inspection or error-prone XPath selectors that failed upon any minor interface revision.
In contrast to selector-dependent interaction models, Skyvern prioritizes the utilization of Vision-enabled LLMs to perceive, learn, and execute web operations.
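To make the brittleness concrete, here is a small sketch using only Python's standard library. The two page snippets and the selector are made up for illustration; they are not taken from Skyvern or from any real site. The same positional selector finds the button in the first version of the page and silently stops matching once a banner is added above it.

```python
import xml.etree.ElementTree as ET

# Two versions of the same hypothetical page: the second adds a banner <div>.
PAGE_V1 = "<html><body><div>Header</div><div><button>Submit</button></div></body></html>"
PAGE_V2 = (
    "<html><body><div>Banner</div><div>Header</div>"
    "<div><button>Submit</button></div></body></html>"
)

# A positional selector of the kind hand-written scripts often rely on.
SELECTOR = "./body/div[2]/button"


def find_submit(page: str):
    """Return the button text located by SELECTOR, or None if it no longer matches."""
    root = ET.fromstring(page)
    node = root.find(SELECTOR)
    return node.text if node is not None else None


print(find_submit(PAGE_V1))  # Submit
print(find_submit(PAGE_V2))  # None
```

A vision-based approach avoids this failure mode because it locates the button by what it looks like and says, not by where it sits in the tree.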
Operational Paradigm
Skyvern's architectural inspiration stems from the Task-Driven autonomous agent frameworks established by projects like BabyAGI and AutoGPT. Its primary differentiator is the integration of robust browser manipulation capabilities via libraries such as Playwright.
Skyvern uses a coordinated group of specialized agents that interpret the web page, plan the actions to take, and execute the required steps.
This composite methodology yields significant benefits:
- Zero-Shot Generalization: Skyvern can successfully navigate and operate on novel websites, as it maps visual cues to necessary actions without requiring pre-coded scripts or element identifiers.
- Resilience to Change: The system inherently resists breakage caused by website layout modifications because it does not depend on static XPaths or pre-defined selector attributes during navigation.
- Workflow Portability: A single defined procedure can be applied across numerous distinct websites, as Skyvern reasons dynamically about the necessary interaction sequence.
- Advanced Semantic Reasoning: The incorporation of LLMs enables sophisticated contextual decision-making. For instance:
  - When securing an automotive insurance estimate from a provider, it can logically deduce the answer to a sensitive query like "Were you eligible to drive at age 18?" based on contextual data such as the driver having obtained their license at sixteen.
  - In competitive analysis, it can equate a 22 oz Arnold Palmer from one vendor with a 23 oz can from another, recognizing the minor size variance as trivial noise rather than distinct entities.
A comprehensive technical assessment detailing its capabilities is accessible here.
Demonstration
https://github.com/user-attachments/assets/5cab4668-e8e2-4982-8551-aab05ff73a7f
Metrics & Benchmarking
Skyvern achieves State-of-the-Art (SOTA) performance on the WebBench benchmark with a 64.4% success rate. Detailed methodology and evaluation results are available here.
Efficacy in WRITE Operations (e.g., Form Population, Login, File Retrieval)
Skyvern excels in operations classified as WRITE tasks (data submission, credential entry, artifact downloading), which are highly relevant for Robotic Process Automation (RPA) adjacent applications.
Getting Started
Skyvern Cloud Offering
Skyvern Cloud provides a fully managed execution environment, eliminating infrastructure management concerns. This service supports concurrent execution of multiple Skyvern instances and includes integrated countermeasures against bot detection, a dedicated proxy network, and CAPTCHA resolution services.
To initiate use, please visit app.skyvern.com to establish an account.
Local Installation and Deployment
Prerequisites:
- Python version 3.11.x (compatible with 3.12; 3.13 support pending)
- NodeJS and NPM

Additional requirements for Windows environments:
- Rust toolchain
- VS Code configured with C++ development tools and the Windows SDK
1. Install Skyvern Package
```bash
pip install skyvern
```
2. Initial Setup
This command is essential for the first run, handling database initialization and schema migrations.
```bash
skyvern quickstart
```
3. Task Execution
Graphical User Interface (Recommended)
Launch the Skyvern service backend alongside the web interface (assuming the database is operational):
```bash
skyvern run all
```
Access the interface at http://localhost:8080 to initiate tasks via the UI.
Programmatic Execution
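```python
import asyncio
from skyvern import Skyvern

async def main():
    skyvern_instance = Skyvern()
    task_result = await skyvern_instance.run_task(prompt="Identify the leading article on Hacker News today")
    print(task_result)

asyncio.run(main())  # run_task is awaited, so it needs a running event loop
```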
Skyvern will launch a browser window to execute the task and automatically terminate it upon completion. Historical records of the task are viewable at http://localhost:8080/history
You can also direct tasks toward different endpoints:

```python
import asyncio
from skyvern import Skyvern

# Connection to Skyvern Cloud
skyvern_cloud = Skyvern(api_key="YOUR_SKYVERN_API_KEY")

# Connection to a local Skyvern deployment
skyvern_local = Skyvern(base_url="http://localhost:8000", api_key="LOCAL_API_KEY")

async def main():
    task_result = await skyvern_local.run_task(prompt="Identify the leading article on Hacker News today")
    print(task_result)

asyncio.run(main())
```
Advanced Control Mechanisms
Direct Control of a Local Chrome Instance
⚠️ CAUTION: Beginning with Chrome version 136, direct CDP connections utilizing the default user_data_dir are restricted by default. To leverage existing browser profiles, Skyvern copies the default user_data_dir to ./tmp/user_data_dir upon its initial attempt to connect to your local browser instance. ⚠️
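The copy-on-first-connect behaviour described in the caution can be pictured with a short standard-library sketch. This is not Skyvern's implementation; the helper name `copy_profile` and the throwaway directories are assumptions used only to show the idea of copying a profile once and reusing the copy afterwards.

```python
import shutil
import tempfile
from pathlib import Path


def copy_profile(source: Path, target: Path) -> Path:
    """Copy a browser profile directory to target unless a copy already exists."""
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, target)
    return target


# Demo with temporary directories standing in for the real Chrome profile.
with tempfile.TemporaryDirectory() as tmp:
    fake_profile = Path(tmp) / "chrome_profile"
    fake_profile.mkdir()
    (fake_profile / "Preferences").write_text("{}")
    copied = copy_profile(fake_profile, Path(tmp) / "tmp" / "user_data_dir")
    print(sorted(p.name for p in copied.iterdir()))  # ['Preferences']
```

One consequence of this pattern is that changes made to the original profile after the first copy are not picked up unless the copied directory is removed.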
- Via Python Scripting

```python
import asyncio
from skyvern import Skyvern

# Example path for macOS; adjust for your OS.
browser_executable_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
skyvern_custom = Skyvern(
    base_url="http://localhost:8000",
    api_key="YOUR_API_KEY",
    browser_path=browser_executable_path,
)

async def main():
    task_result = await skyvern_custom.run_task(
        prompt="Determine the highest-ranked story on Hacker News today",
    )

asyncio.run(main())
```
- Via Skyvern Service Configuration
Set the following environment variables in your .env file:
```bash
# Specify the path to your Chrome executable (example for Mac).
CHROME_EXECUTABLE_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
BROWSER_TYPE=cdp-connect
```
After setting these variables, restart the Skyvern service (skyvern run all) and invoke tasks via the UI or code.
Connecting to Any Remote Browser Instance
Supply the established Chrome DevTools Protocol (CDP) connection URL to Skyvern:
```python
import asyncio
from skyvern import Skyvern

async def main():
    skyvern_remote = Skyvern(cdp_url="your cdp connection url")
    task_result = await skyvern_remote.run_task(
        prompt="Find the top post on hackernews today",
    )

asyncio.run(main())
```
Enforcing Consistent Output Structure
Achieve predictable JSON output by supplying the data_extraction_schema argument:
```python
import asyncio
from skyvern import Skyvern

async def main():
    skyvern_schema = Skyvern()
    task_result = await skyvern_schema.run_task(
        prompt="Extract the primary headline from the current webpage",
        data_extraction_schema={
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "The headline text of the primary content"},
                "url": {"type": "string", "description": "The permanent link to the content"},
                "points": {"type": "integer", "description": "Numerical rating if applicable"},
            },
        },
    )
    # The extracted output is structured according to the schema above.

asyncio.run(main())
```
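If you want to double-check a returned result against the same schema on your side, a minimal standard-library sketch might look like the following. The helper `schema_errors` is hypothetical and handles only flat object schemas with primitive types; it is not part of Skyvern, and a full JSON Schema validator would cover far more cases.

```python
# Map JSON Schema primitive type names to Python types.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
}


def schema_errors(data: dict, schema: dict) -> list[str]:
    """Return human-readable mismatches between data and a flat object schema."""
    errors = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue  # properties are optional unless listed under "required"
        value = data[name]
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_bool = isinstance(value, bool)
        if (is_bool and spec["type"] != "boolean") or not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    for name in schema.get("required", []):
        if name not in data:
            errors.append(f"{name}: missing required property")
    return errors


print(schema_errors({"title": "Example", "url": "https://example.com", "points": 3}, SCHEMA))  # []
print(schema_errors({"points": "3"}, SCHEMA))  # ['points: expected integer, got str']
```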
Essential Debugging Commands
```bash
# Initiate the Skyvern Backend Component
skyvern run server

# Launch the Skyvern User Interface
skyvern run ui

# Query the operational status of the Skyvern service
skyvern status

# Halt all Skyvern components
skyvern stop all

# Shut down the Skyvern User Interface
skyvern stop ui

# Terminate the Skyvern Backend Component
skyvern stop server
```
Docker Compose Deployment
- Ensure Docker Desktop is installed and active.
- Verify that no other PostgreSQL instance is occupying port 5432 (use docker ps to check).
- Clone the repository and navigate to the root directory.
- Execute skyvern init llm to generate the requisite .env configuration file, which will be injected into the Docker image.
- Populate the necessary LLM provider credentials within the docker-compose.yml file. If deploying Skyvern on a remote host, ensure the UI container references the correct server IP in docker-compose.yml.
- Execute the deployment command:

  ```bash
  docker compose up -d
  ```

- Access the operational UI at http://localhost:8080.
Critical Note: Only a single PostgreSQL container can bind to port 5432. If transitioning from a CLI-managed Postgres to Docker Compose, you must first eliminate the existing container:

```bash
docker rm -f postgresql-container
```
If database errors arise during Docker operation, inspect active containers with docker ps to identify any conflicting Postgres instances.
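As a quick pre-flight check before starting Docker Compose, you could probe port 5432 from Python. This is a generic sketch using the standard `socket` module; the helper name `port_in_use` is made up here, and the check only tells you that something is listening, not that it is Postgres.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if port_in_use(5432):
    print("Port 5432 is taken; stop the other Postgres instance first.")
else:
    print("Port 5432 is free.")
```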
Core Skyvern Capabilities
Task Execution Units (Skyvern Tasks)
Tasks represent the most granular unit of work within Skyvern. Each task initiates a single instruction set, directing Skyvern to navigate the web and achieve a defined objective. Tasks require specification of a url, a descriptive prompt, and optionally accept a structured output data schema or specific error codes to trigger immediate termination under defined conditions.
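The inputs a task takes can be pictured as a small data structure. The sketch below is illustrative only: `TaskSpec`, `to_payload`, and the `error_codes` field name are assumptions made for this example, not Skyvern's actual request format, and the URL is just a sample target.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskSpec:
    """Illustrative container for the task inputs described above."""

    url: str
    prompt: str
    data_extraction_schema: Optional[dict] = None
    # Hypothetical field: maps an error code to the condition that should stop the task.
    error_codes: dict = field(default_factory=dict)

    def to_payload(self) -> dict:
        """Build a plain dict, omitting optional inputs that were not set."""
        payload = {"url": self.url, "prompt": self.prompt}
        if self.data_extraction_schema is not None:
            payload["data_extraction_schema"] = self.data_extraction_schema
        if self.error_codes:
            payload["error_codes"] = self.error_codes
        return payload


spec = TaskSpec(
    url="https://news.ycombinator.com",
    prompt="Find the top post on Hacker News today",
    error_codes={"LOGIN_REQUIRED": "The page asks for a login"},
)
print(spec.to_payload())
```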
Sequential Operations (Skyvern Workflows)
Workflows enable the composition of multiple distinct tasks into a singular, coherent process. For example, automating the retrieval of all invoices dated after a specific threshold involves a workflow that navigates to the invoice repository, applies the date filter, extracts the list of relevant items, and then iteratively downloads each one.
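The invoice example can be sketched as a chain of steps that pass shared state along, ending in a "for each" step. Everything here is a toy model under stated assumptions: the step functions, the `run_workflow` helper, and the in-memory invoice list are hypothetical stand-ins for real browser actions, not Skyvern's workflow API.

```python
from datetime import date

# Stand-in data for an invoice portal; a real run would read this from the site.
PORTAL = [
    ("INV-1", date(2024, 1, 5)),
    ("INV-2", date(2024, 3, 9)),
    ("INV-3", date(2024, 4, 1)),
]


def open_invoice_page(state):
    state["rows"] = list(PORTAL)
    return state


def filter_after(threshold):
    def step(state):
        state["rows"] = [row for row in state["rows"] if row[1] > threshold]
        return state
    return step


def extract_ids(state):
    state["ids"] = [invoice_id for invoice_id, _ in state["rows"]]
    return state


def download_each(state):
    # The "for each" step: one download action per extracted invoice.
    state["downloads"] = [f"{invoice_id}.pdf" for invoice_id in state["ids"]]
    return state


def run_workflow(steps):
    """Run the steps in order, threading one state dict through all of them."""
    state = {}
    for step in steps:
        state = step(state)
    return state


result = run_workflow(
    [open_invoice_page, filter_after(date(2024, 2, 1)), extract_ids, download_each]
)
print(result["downloads"])  # ['INV-2.pdf', 'INV-3.pdf']
```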
Another compelling use case is e-commerce order automation: a workflow might first locate the desired product, add it to the shopping cart, subsequently verify cart contents, and finally proceed through the complete payment and confirmation sequence.
Supported Workflow Primitives:
1. Browser Navigation Task
2. Browser Interaction Command
3. Data Schema Enforcement
4. State Validation Step
5. Iterative Loops (For Each)
6. File Content Interpretation
7. Email Dispatch Module
8. Natural Language Instruction Block
9. External HTTP Communication
10. Custom Code Injection Block
11. Artifact Upload to Object Storage
12. (Upcoming) Conditional Branching Logic
Real-Time Viewport Streaming
Skyvern provides the capability to stream the browser's visual output directly to your local machine, allowing for direct observation of automated actions. This feature is invaluable for debugging, understanding system behavior, and enabling manual overrides during sensitive operational phases.
Intelligent Form Population
Skyvern possesses native competence in reading context and inputting data into web forms. Providing the required information within the navigation_goal allows the system to semantically interpret the data and populate the corresponding form fields accurately.
Structured Data Retrieval
Data extraction is a core function. You can mandate a specific output structure by embedding a data_extraction_schema directly within the primary instruction prompt, formatted in JSONC. Skyvern guarantees that its extracted results adhere strictly to the schema provided.
Artifact Downloading
Files presented for download on web pages are handled automatically. All retrieved artifacts are seamlessly uploaded to configured block storage solutions, making them accessible via the management interface.
Access Control Management
Skyvern supports several mechanisms to manage access to pages protected by login screens. For integration with advanced authentication flows, please initiate contact via email or our Discord server.
🔐 Multi-Factor Authentication (MFA) Support (TOTP)
Skyvern incorporates support for various MFA protocols necessary for automating sensitive workflows:
1. QR-Code based MFA (e.g., Google Authenticator, Authy)
2. Email token validation
3. SMS verification code processing
🔐 Further details on MFA integration are available here.
Credential Manager Integration
Skyvern currently offers compatibility with the following password management utilities:
- [x] Bitwarden
- [ ] 1Password
- [ ] LastPass
Model Context Protocol (MCP) Adherence
Skyvern supports the Model Context Protocol (MCP), so you can use it with any LLM that supports MCP.
Refer to the MCP specifications here.
Integration with Workflow Automation Platforms
Skyvern interfaces seamlessly with major low-code/no-code platforms, including Zapier, Make.com, and N8N, for ecosystem connectivity.
Practical Demonstrations of Skyvern Utility
We actively track real-world deployments of Skyvern. Below are examples illustrating how the platform is utilized for workflow automation:
Bulk Invoice Retrieval Across Diverse Portals
Schedule a live demonstration to witness this capability.
Automating Job Application Submissions
Streamlining Material Procurement for Industrial Operations
System Registration and Form Completion on Governmental Sites
Mass Population of 'Contact Us' Web Forms
Generating Insurance Quotations from Various Providers
Developer Environment Setup
Please ensure you have the uv package manager installed.
1. Execute the following command to initialize your isolated development environment (.venv):

   ```bash
   uv sync --group dev
   ```

2. Perform the initial service configuration via the CLI:

   ```bash
   uv run skyvern quickstart
   ```

3. Access the interactive web dashboard at http://localhost:8080.

The Skyvern Command Line Interface is fully functional across Windows, WSL, macOS, and Linux platforms.
Comprehensive Reference Material
More detailed documentation is hosted on our 📕 official documentation site. Should you encounter any ambiguities or require new information, please report an issue or contact us via email or Discord.
Supported Model Connectors
| Vendor | Compatible Models |
|---|---|
| OpenAI | gpt4-turbo, gpt-4o, gpt-4o-mini |
| Anthropic | Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| Azure OpenAI | Any GPT variant. Superior multimodal performance with azure/gpt4-o |
| AWS Bedrock | Anthropic Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| Gemini | Gemini 2.5 Pro and Flash, Gemini 2.0 |
| Ollama | Any model hosted locally via Ollama |
| OpenRouter | Models accessed via the OpenRouter gateway |
| OpenAI-compatible | Any endpoint adhering to the OpenAI API specification (via liteLLM) |
Environment Variable Definitions
OpenAI Configuration
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_OPENAI | Activates OpenAI model integration | Boolean | true, false |
| OPENAI_API_KEY | Your OpenAI Secret Key | String | sk-1234567890 |
| OPENAI_API_BASE | Alternate API endpoint (optional) | String | https://openai.api.base |
| OPENAI_ORGANIZATION | OpenAI Organization Identifier (optional) | String | your-org-id |

Recommended LLM_KEY: OPENAI_GPT4O, OPENAI_GPT4O_MINI, OPENAI_GPT4_1, OPENAI_O4_MINI, OPENAI_O3
Anthropic Configuration
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_ANTHROPIC | Activates Anthropic model integration | Boolean | true, false |
| ANTHROPIC_API_KEY | Your Anthropic Secret Key | String | sk-1234567890 |

Recommended LLM_KEY: ANTHROPIC_CLAUDE3.5_SONNET, ANTHROPIC_CLAUDE3.7_SONNET, ANTHROPIC_CLAUDE4_OPUS, ANTHROPIC_CLAUDE4_SONNET
Azure OpenAI Configuration
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_AZURE | Activates Azure OpenAI integration | Boolean | true, false |
| AZURE_API_KEY | Azure Deployment API Key | String | sk-1234567890 |
| AZURE_DEPLOYMENT | Azure OpenAI Resource Deployment Name | String | skyvern-deployment |
| AZURE_API_BASE | Azure endpoint base URL | String | https://skyvern-deployment.openai.azure.com/ |
| AZURE_API_VERSION | API Specification Version | String | 2024-02-01 |

Recommended LLM_KEY: AZURE_OPENAI
AWS Bedrock Configuration
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_BEDROCK | Activates AWS Bedrock integration. Ensure your AWS credentials are configured first. | Boolean | true, false |

Recommended LLM_KEY: BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_OPUS_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE
Gemini Configuration
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_GEMINI | Activates Gemini model integration | Boolean | true, false |
| GEMINI_API_KEY | Your Google Gemini API Key | String | your_google_gemini_api_key |

Recommended LLM_KEY: GEMINI_2.5_PRO_PREVIEW, GEMINI_2.5_FLASH_PREVIEW
Ollama Configuration
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_OLLAMA | Integrates locally hosted models via Ollama | Boolean | true, false |
| OLLAMA_SERVER_URL | Address for your Ollama instance | String | http://host.docker.internal:11434 |
| OLLAMA_MODEL | The specific Ollama model to load | String | qwen2.5:7b-instruct |

Recommended LLM_KEY: OLLAMA
Note: Vision capabilities are currently unsupported when using Ollama.
OpenRouter Configuration
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_OPENROUTER | Activates OpenRouter model gateway | Boolean | true, false |
| OPENROUTER_API_KEY | Your OpenRouter API Key | String | sk-1234567890 |
| OPENROUTER_MODEL | Designated OpenRouter model identifier | String | mistralai/mistral-small-3.1-24b-instruct |
| OPENROUTER_API_BASE | OpenRouter API service root | String | https://api.openrouter.ai/v1 |

Recommended LLM_KEY: OPENROUTER
OpenAI-Compatible Endpoints
| Variable | Purpose | Type | Example Value |
|---|---|---|---|
| ENABLE_OPENAI_COMPATIBLE | Integrates a third-party API conforming to OpenAI standards | Boolean | true, false |
| OPENAI_COMPATIBLE_MODEL_NAME | Name of the model accessible via the endpoint | String | yi-34b, gpt-3.5-turbo, mistral-large, etc. |
| OPENAI_COMPATIBLE_API_KEY | Authentication token for the endpoint | String | sk-1234567890 |
| OPENAI_COMPATIBLE_API_BASE | Base URI for the compatible service | String | https://api.together.xyz/v1, http://localhost:8000/v1, etc. |
| OPENAI_COMPATIBLE_API_VERSION | API protocol version (optional) | String | 2023-05-15 |
| OPENAI_COMPATIBLE_MAX_TOKENS | Hard limit on output tokens (optional) | Integer | 4096, 8192, etc. |
| OPENAI_COMPATIBLE_TEMPERATURE | Sampling temperature setting (optional) | Float | 0.0, 0.5, 0.7, etc. |
| OPENAI_COMPATIBLE_SUPPORTS_VISION | Indicator if the model accepts visual inputs (optional) | Boolean | true, false |

Supported LLM_KEY for this configuration: OPENAI_COMPATIBLE
General LLM Configuration Overrides
| Variable | Description | Type | Example Value |
|---|---|---|---|
| LLM_KEY | The designated primary model identifier to utilize | String | Reference keys listed above |
| SECONDARY_LLM_KEY | The designated model identifier for subsidiary agents | String | Reference keys listed above |
| LLM_CONFIG_MAX_TOKENS | Overrides the default maximum token limit for LLM contexts | Integer | 128000 |
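Before starting the service, it can help to sanity-check that every enabled provider has its settings filled in. The sketch below is a hypothetical helper, not part of Skyvern. The variable names come from the tables above, but treating every variable not marked optional as required is an assumption of this example.

```python
# Settings assumed to be required once a provider's ENABLE_* flag is "true".
REQUIRED = {
    "ENABLE_OPENAI": ["OPENAI_API_KEY"],
    "ENABLE_ANTHROPIC": ["ANTHROPIC_API_KEY"],
    "ENABLE_AZURE": ["AZURE_API_KEY", "AZURE_DEPLOYMENT", "AZURE_API_BASE", "AZURE_API_VERSION"],
    "ENABLE_GEMINI": ["GEMINI_API_KEY"],
    "ENABLE_OLLAMA": ["OLLAMA_SERVER_URL", "OLLAMA_MODEL"],
    "ENABLE_OPENROUTER": ["OPENROUTER_API_KEY", "OPENROUTER_MODEL"],
    "ENABLE_OPENAI_COMPATIBLE": [
        "OPENAI_COMPATIBLE_MODEL_NAME",
        "OPENAI_COMPATIBLE_API_KEY",
        "OPENAI_COMPATIBLE_API_BASE",
    ],
}


def missing_settings(env: dict) -> list[str]:
    """List variables that are unset or empty for every enabled provider."""
    missing = []
    for flag, names in REQUIRED.items():
        if env.get(flag, "").lower() == "true":
            missing += [name for name in names if not env.get(name)]
    if not env.get("LLM_KEY"):
        missing.append("LLM_KEY")
    return missing


example_env = {"ENABLE_OLLAMA": "true", "OLLAMA_MODEL": "qwen2.5:7b-instruct", "LLM_KEY": "OLLAMA"}
print(missing_settings(example_env))  # ['OLLAMA_SERVER_URL']
```

To check a live environment instead of a literal dict, pass `dict(os.environ)` after importing `os`.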
Development Trajectory
This outlines planned features for the immediate future. We actively solicit suggestions for feature prioritization via email or Discord.
- [x] Open Sourcing - Core platform code released publicly.
- [x] Chained Operations - Implementation of multi-step workflow execution.
- [x] Contextual Enhancement - Improvement in LLM's understanding of surrounding element labels via prompt injection.
- [x] Efficiency Gains - Optimization of the context tree transmission to enhance stability and reduce operational expenditures.
- [x] Modern UI Platform - Transition from Streamlit to a production-grade React interface for job initiation.
- [x] Visual Workflow Editor - Introduction of a graphical tool for constructing and analyzing workflow dependencies.
- [x] Real-Time Viewport Feed - Integration of live browser streaming into the new UI.
- [x] Historical Run Visualization - React UI replacement for visualizing past execution logs and outcomes.
- [x] Autonomous Workflow Generation ("Observer") - Capability for Skyvern to automatically blueprint workflows during navigation.
- [x] Prompt Response Caching - Implementation of a memory layer to cache LLM results, drastically cutting recurring computation costs.
- [x] Benchmark Integration - Incorporation of standard evaluation datasets to continuously monitor performance metrics.
- [ ] Enhanced Debug Mode - A planning phase where the agent requests user approval before executing steps, aiding debugging and prompt refinement.
- [ ] Browser Extension Utility - Development of a Chrome Extension for integrated access (voice commands, task saving, etc.).
- [ ] Action Recording Feature - Capability for Skyvern to observe a user completing a task and generate the corresponding workflow automatically.
- [ ] Interactive Live Feed - Allowing real-time user intervention into the live browser stream.
- [ ] LLM Observability Integration - Incorporating tools for back-testing prompt adjustments against specific datasets and visualizing longitudinal performance trends.
- [x] Langchain Compatibility - Provision of a langchain_community integration allowing Skyvern to function as a Tool.
Collaboration Guidelines
We enthusiastically welcome Pull Requests and feature proposals! Engage with us via email or Discord. Consult our contribution guide and review the currently open "Help Wanted" issues to find entry points for contribution.
For an architectural overview of the repository structure, build instructions, and usage resolution assistance, utilize Code Sage.
Usage Telemetry
By default, Skyvern gathers aggregate, anonymous usage statistics to inform development priorities. To opt out of this data collection, set the environment variable SKYVERN_TELEMETRY to false.
Licensing Framework
The core operational logic of Skyvern is made available under the AGPL-3.0 License, a copyleft license, within this open-source repository. Certain proprietary anti-bot mechanisms are reserved for our managed cloud service only.
For any inquiries or clarifications regarding licensing terms, please reach out to support; we are prepared to assist.
