
Skyvern Automator

Facilitates the connection of sophisticated AI applications to web browsers, enabling the execution of complex digital tasks such as data entry into forms, secure retrieval of files, and comprehensive online information synthesis. It offers flexibility via a locally deployable setup leveraging a preferred Large Language Model (LLM) or through a robust cloud-based API service.

Author


Skyvern-AI

GNU Affero General Public License v3.0

Quick Info

GitHub Stars 14518
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19

Tags

skyvern, automation, web, skyvern ai, ai skyvern, automation web


🐉 Orchestrate Digital Journeys via LLMs and Visual Perception 🐉


Skyvern automates browser-based workflows by combining Large Language Models with Computer Vision. It exposes a simple Application Programming Interface (API) endpoint that can take over manual interactions on a large number of websites, replacing brittle or unreliable scripted automation.

Historically, automating browser functions mandated the creation of bespoke scripts tailored to specific sites, frequently relying on brittle Document Object Model (DOM) inspection or error-prone XPath selectors that failed upon any minor interface revision.

In contrast to selector-dependent interaction models, Skyvern prioritizes the utilization of Vision-enabled LLMs to perceive, learn, and execute web operations.

Operational Paradigm

Skyvern's architectural inspiration stems from the Task-Driven autonomous agent frameworks established by projects like BabyAGI and AutoGPT. Its primary differentiator is the integration of robust browser manipulation capabilities via libraries such as Playwright.

Skyvern deploys a coordinated ensemble of specialized agents that interpret the web page, plan the necessary actions, and execute the required steps.

This composite methodology yields significant benefits:

  1. Zero-Shot Generalization: Skyvern can successfully navigate and operate on novel websites, as it maps visual cues to necessary actions without requiring pre-coded scripts or element identifiers.
  2. Resilience to Change: The system inherently resists breakage caused by website layout modifications because it does not depend on static XPaths or pre-defined selector attributes during navigation.
  3. Workflow Portability: A single defined procedure can be applied across numerous distinct websites, as Skyvern reasons dynamically about the necessary interaction sequence.
  4. Advanced Semantic Reasoning: The incorporation of LLMs enables sophisticated contextual decision-making. For instance:
    1. When obtaining a car insurance quote, it can answer a common question such as "Were you eligible to drive at 18?" by inferring from context that the driver obtained their license at sixteen.
    2. In competitive analysis, it can recognize that a 22 oz can of Arnold Palmer at one retailer is almost certainly the same product as a 23 oz can at another, treating the small size difference as likely rounding noise rather than evidence of distinct products.

A comprehensive technical assessment detailing its capabilities is accessible here.

Demonstration

https://github.com/user-attachments/assets/5cab4668-e8e2-4982-8551-aab05ff73a7f

Metrics & Benchmarking

Skyvern achieves State-of-the-Art (SOTA) performance on the WebBench benchmark with a 64.4% success rate. Detailed methodology and evaluation results are available here.

Efficacy in WRITE Operations (e.g., Form Population, Login, File Retrieval)

Skyvern excels at operations classified as WRITE tasks (data submission, credential entry, artifact downloading), which are highly relevant for Robotic Process Automation (RPA)-adjacent applications.

Getting Started

Skyvern Cloud Offering

Skyvern Cloud provides a fully managed execution environment, eliminating infrastructure management concerns. This service supports concurrent execution of multiple Skyvern instances and includes integrated countermeasures against bot detection, a dedicated proxy network, and CAPTCHA resolution services.

To initiate use, please visit app.skyvern.com to establish an account.

Local Installation and Deployment

Prerequisites:
- Python 3.11.x (also works with 3.12; 3.13 is not yet supported)
- NodeJS and NPM

Additional requirements for Windows environments:
- Rust toolchain
- VS Code configured with the C++ development tools and the Windows SDK
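As a quick pre-flight check, the sketch below compares the running interpreter with the Python versions listed above and looks for `node` and `npm` on the PATH. It only encodes the prerequisites as stated here; the function names are hypothetical and this is not part of the Skyvern CLI.

```python
import shutil
import sys


def python_support_status(version: tuple[int, ...]) -> str:
    """Classify a (major, minor[, micro]) version against the stated prerequisites."""
    major_minor = tuple(version[:2])
    if major_minor == (3, 11):
        return "supported"
    if major_minor == (3, 12):
        return "compatible"
    if major_minor >= (3, 13):
        return "not yet supported"
    return "below prerequisite"


def missing_node_tools() -> list[str]:
    """Return which of `node` and `npm` cannot be found on PATH."""
    return [tool for tool in ("node", "npm") if shutil.which(tool) is None]


if __name__ == "__main__":
    status = python_support_status(tuple(sys.version_info[:2]))
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
    missing = missing_node_tools()
    print(("Missing on PATH: " + ", ".join(missing)) if missing else "node and npm found")
```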

1. Install Skyvern Package

```bash
pip install skyvern
```

2. Initial Setup

This command is essential for the first run, handling database initialization and schema migrations.

```bash
skyvern quickstart
```

3. Task Execution

Launch the Skyvern service backend alongside the web interface (assuming the database is operational):

```bash
skyvern run all
```

Access the interface at http://localhost:8080 to initiate tasks via the UI.

Programmatic Execution

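```python
import asyncio

from skyvern import Skyvern


async def main() -> None:
    skyvern_instance = Skyvern()
    # run_task is a coroutine, so it must be awaited inside an async function.
    task_result = await skyvern_instance.run_task(prompt="Identify the leading article on Hacker News today")
    print(task_result)


asyncio.run(main())
```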

Skyvern will launch a browser window to execute the task and close it automatically once the task completes. Historical records of the task are viewable at http://localhost:8080/history.

You can also direct tasks toward different endpoints:

```python
import asyncio

from skyvern import Skyvern

# Connection to Skyvern Cloud
skyvern_cloud = Skyvern(api_key="YOUR_SKYVERN_API_KEY")

# Connection to a local Skyvern deployment
skyvern_local = Skyvern(base_url="http://localhost:8000", api_key="LOCAL_API_KEY")


async def main() -> None:
    task_result = await skyvern_local.run_task(prompt="Identify the leading article on Hacker News today")
    print(task_result)


asyncio.run(main())
```
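If the same script needs to run against either Skyvern Cloud or a local deployment, one option is to choose the constructor arguments from the environment. This is a sketch under assumptions: the variable names `SKYVERN_API_KEY` and `SKYVERN_BASE_URL` and the helper `skyvern_client_kwargs` are hypothetical, and only the `api_key` and `base_url` arguments shown above are used.

```python
import os
from collections.abc import Mapping


def skyvern_client_kwargs(env: Mapping[str, str]) -> dict[str, str]:
    """Build keyword arguments for Skyvern(...) from (hypothetically named) env vars.

    Without a base URL only an API key is passed, matching the Skyvern Cloud
    example; with one, the client targets that deployment instead.
    """
    kwargs = {"api_key": env["SKYVERN_API_KEY"]}
    base_url = env.get("SKYVERN_BASE_URL")
    if base_url:
        # A local deployment, e.g. http://localhost:8000
        kwargs["base_url"] = base_url
    return kwargs


if __name__ == "__main__":
    if "SKYVERN_API_KEY" in os.environ:
        # In your own code: Skyvern(**skyvern_client_kwargs(os.environ))
        print(sorted(skyvern_client_kwargs(os.environ)))
    else:
        print("Set SKYVERN_API_KEY (a hypothetical variable name) first.")
```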

Advanced Control Mechanisms

Direct Control of a Local Chrome Instance

⚠️ CAUTION: Beginning with Chrome version 136, direct CDP connections utilizing the default user_data_dir are restricted by default. To leverage existing browser profiles, Skyvern copies the default user_data_dir to ./tmp/user_data_dir upon its initial attempt to connect to your local browser instance. ⚠️

  1. Via Python Scripting

```python
import asyncio

from skyvern import Skyvern

# Example path for macOS; adjust for your OS.
browser_executable_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
skyvern_custom = Skyvern(
    base_url="http://localhost:8000",
    api_key="YOUR_API_KEY",
    browser_path=browser_executable_path,
)


async def main() -> None:
    task_result = await skyvern_custom.run_task(
        prompt="Determine the highest-ranked story on Hacker News today",
    )
    print(task_result)


asyncio.run(main())
```

  2. Via Skyvern Service Configuration

Set the following environment variables in your .env file:

```bash
# Specify the path to your Chrome executable (example for Mac).
CHROME_EXECUTABLE_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
BROWSER_TYPE=cdp-connect
```

After setting these variables, restart the Skyvern service (skyvern run all) and invoke tasks via the UI or code.
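The caution at the start of this section says Skyvern copies the default `user_data_dir` to `./tmp/user_data_dir` on its first connection attempt. The sketch below reproduces that copy-once-then-reuse behavior with the standard library, for readers who want to prepare or inspect the copy themselves. It is an illustration, not Skyvern's own code: the default profile paths are the commonly used Chrome locations, and `ensure_profile_copy` is a hypothetical helper. Closing Chrome before copying is advisable, since files in a live profile may be locked or mid-write.

```python
import os
import shutil
import sys
from pathlib import Path


def default_chrome_user_data_dir() -> Path:
    """Commonly used default Chrome profile locations per OS."""
    home = Path.home()
    if sys.platform == "darwin":
        return home / "Library" / "Application Support" / "Google" / "Chrome"
    if sys.platform.startswith("win"):
        return Path(os.environ.get("LOCALAPPDATA", home)) / "Google" / "Chrome" / "User Data"
    return home / ".config" / "google-chrome"


def ensure_profile_copy(source: Path, dest: Path = Path("./tmp/user_data_dir")) -> Path:
    """Copy `source` to `dest` only if `dest` does not exist yet, then return `dest`.

    Later calls reuse the existing copy, so changes made to the original
    profile afterwards are not picked up.
    """
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, dest)
    return dest
```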

Connecting to Any Remote Browser Instance

Supply the established Chrome DevTools Protocol (CDP) connection URL to Skyvern:

```python
import asyncio

from skyvern import Skyvern

skyvern_remote = Skyvern(cdp_url="your cdp connection url")


async def main() -> None:
    task_result = await skyvern_remote.run_task(
        prompt="Find the top post on hackernews today",
    )
    print(task_result)


asyncio.run(main())
```
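To obtain a CDP connection URL for a Chrome you started yourself, you can ask Chrome's DevTools HTTP endpoint: a browser launched with `--remote-debugging-port=9222` (and, per the Chrome 136 caution earlier, a non-default `--user-data-dir`) serves `GET /json/version`, whose `webSocketDebuggerUrl` field is the browser-level WebSocket URL. Whether Skyvern's `cdp_url` expects this `ws://` URL or the plain `http://host:port` endpoint is an assumption to verify against the Skyvern docs, and the helper names below are hypothetical.

```python
import json
from urllib.request import urlopen


def cdp_url_from_version_payload(payload: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def fetch_cdp_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Ask a running Chrome (with remote debugging enabled) for its CDP WebSocket URL."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as response:
        return cdp_url_from_version_payload(response.read().decode("utf-8"))


# Usage, with Chrome already running with remote debugging on port 9222:
# cdp_url = fetch_cdp_url()
# skyvern_remote = Skyvern(cdp_url=cdp_url)
```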

Enforcing Consistent Output Structure

Achieve predictable JSON output by supplying the data_extraction_schema argument:

```python
import asyncio

from skyvern import Skyvern

skyvern_schema = Skyvern()


async def main() -> None:
    task_result = await skyvern_schema.run_task(
        prompt="Extract the primary headline from the current webpage",
        # The extracted data is structured according to this JSON Schema.
        data_extraction_schema={
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "The headline text of the primary content"},
                "url": {"type": "string", "description": "The permanent link to the content"},
                "points": {"type": "integer", "description": "Numerical rating if applicable"},
            },
        },
    )
    print(task_result)


asyncio.run(main())
```
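To make the expected shape concrete, here is a minimal, stdlib-only sketch that checks whether a result dictionary matches the `data_extraction_schema` used in the example above. It supports only the `object`, `string`, and `integer` types that schema uses (plus `required`); it is not a full JSON Schema validator and not how Skyvern itself checks output. The `result` value is a made-up example, not real Skyvern output.

```python
from typing import Any

SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
}


def conforms(value: Any, schema: dict[str, Any]) -> bool:
    """Check `value` against the small subset of JSON Schema used above."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Properties are optional in JSON Schema unless listed in "required".
        return all(
            conforms(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
            if key in value
        )
    if kind == "string":
        return isinstance(value, str)
    if kind == "integer":
        # bool is a subclass of int in Python, but not an integer in JSON Schema.
        return isinstance(value, int) and not isinstance(value, bool)
    # Types outside this subset are not checked.
    return True


# A made-up example result, not real Skyvern output.
result = {"title": "Example headline", "url": "https://example.com/post", "points": 128}
print(conforms(result, SCHEMA))
```

For anything beyond this subset, a complete validator such as the `jsonschema` package is the safer choice.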

Essential Debugging Commands

```bash
# Start the Skyvern backend
skyvern run server

# Launch the Skyvern UI
skyvern run ui

# Check the status of the Skyvern service
skyvern status

# Stop all Skyvern components
skyvern stop all

# Stop the Skyvern UI
skyvern stop ui

# Stop the Skyvern backend
skyvern stop server
```

Docker Compose Deployment

  1. Ensure Docker Desktop is installed and active.
  2. Verify that no other PostgreSQL instance is occupying port 5432 (use docker ps to check).
  3. Clone the repository and navigate to the root directory.
  4. Execute skyvern init llm to generate the requisite .env configuration file, which will be injected into the Docker image.
  5. Populate the necessary LLM provider credentials within the docker-compose.yml file. If deploying Skyvern on a remote host, ensure the UI container references the correct server IP in docker-compose.yml.
  6. Execute the deployment command:

     ```bash
     docker compose up -d
     ```

  7. Access the operational UI at http://localhost:8080.

Critical Note: Only a single PostgreSQL container can bind to port 5432. If you are switching from a CLI-managed Postgres to Docker Compose, first remove the existing container:

```bash
docker rm -f postgresql-container
```

If database errors arise during Docker operation, inspect active containers with docker ps to identify any conflicting Postgres instances.
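Besides `docker ps`, a quick way to see whether anything is already listening on port 5432 is to attempt a TCP connection. The stdlib sketch below does that. A successful connection only shows that the port is taken, not which process or container holds it, and `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(5432):
        print("Port 5432 is taken; check `docker ps` for another PostgreSQL container.")
    else:
        print("Port 5432 is free.")
```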

Core Skyvern Capabilities

Task Execution Units (Skyvern Tasks)

Tasks represent the most granular unit of work within Skyvern. Each task initiates a single instruction set, directing Skyvern to navigate the web and achieve a defined objective. Tasks require specification of a url, a descriptive prompt, and optionally accept a structured output data schema or specific error codes to trigger immediate termination under defined conditions.

Sequential Operations (Skyvern Workflows)

Workflows enable the composition of multiple distinct tasks into a singular, coherent process. For example, automating the retrieval of all invoices dated after a specific threshold involves a workflow that navigates to the invoice repository, applies the date filter, extracts the list of relevant items, and then iteratively downloads each one.
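To illustrate only the control flow of the invoice example, here is a plain-Python sketch of the filter-then-for-each pattern. In Skyvern these steps would be workflow blocks carried out by the agent in a browser; `Invoice`, `download_invoice`, and the sample data are hypothetical stand-ins, not Skyvern APIs.

```python
from collections.abc import Callable
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invoice:
    """Hypothetical record for one row of the invoice list."""
    number: str
    issued: date


def invoices_after(invoices: list[Invoice], threshold: date) -> list[Invoice]:
    """The 'apply the date filter' step: keep invoices issued strictly after the threshold."""
    return [inv for inv in invoices if inv.issued > threshold]


def run_invoice_workflow(
    invoices: list[Invoice],
    threshold: date,
    download_invoice: Callable[[Invoice], str],
) -> list[str]:
    """Filter the list, then download each remaining invoice in turn (the 'for each' loop)."""
    downloaded = []
    for invoice in invoices_after(invoices, threshold):
        downloaded.append(download_invoice(invoice))
    return downloaded


sample = [
    Invoice("INV-001", date(2024, 1, 15)),
    Invoice("INV-002", date(2024, 3, 2)),
    Invoice("INV-003", date(2024, 6, 30)),
]
# A stub that pretends to save a PDF and returns its file name.
print(run_invoice_workflow(sample, date(2024, 2, 1), lambda inv: f"{inv.number}.pdf"))
```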

Another compelling use case is e-commerce order automation: a workflow might first locate the desired product, add it to the shopping cart, subsequently verify cart contents, and finally proceed through the complete payment and confirmation sequence.

Supported Workflow Primitives:
  1. Browser Navigation Task
  2. Browser Interaction Command
  3. Data Schema Enforcement
  4. State Validation Step
  5. Iterative Loops (For Each)
  6. File Content Interpretation
  7. Email Dispatch Module
  8. Natural Language Instruction Block
  9. External HTTP Communication
  10. Custom Code Injection Block
  11. Artifact Upload to Object Storage
  12. (Upcoming) Conditional Branching Logic

Real-Time Viewport Streaming

Skyvern provides the capability to stream the browser's visual output directly to your local machine, allowing for direct observation of automated actions. This feature is invaluable for debugging, understanding system behavior, and enabling manual overrides during sensitive operational phases.

Intelligent Form Population

Skyvern possesses native competence in reading context and inputting data into web forms. Providing the required information within the navigation_goal allows the system to semantically interpret the data and populate the corresponding form fields accurately.

Structured Data Retrieval

Data extraction is a core function. You can specify the exact output structure you want by embedding a data_extraction_schema, written in JSONC, directly within the main prompt; Skyvern structures its extracted results according to the schema provided.
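JSONC is JSON that also allows `//` and `/* */` comments. As a rough illustration of turning such a schema into plain JSON (not Skyvern's own parser), the sketch below strips comments that appear outside string literals and then parses the result with `json.loads`. It deliberately ignores other JSONC relaxations such as trailing commas.

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // line comments and /* */ block comments outside string literals."""
    out: list[str] = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            newline = text.find("\n", i)
            i = n if newline == -1 else newline  # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


schema_jsonc = """
{
  // Fields Skyvern should extract
  "type": "object",
  "properties": {
    "title": {"type": "string"} /* the headline */
  }
}
"""
print(json.loads(strip_jsonc_comments(schema_jsonc)))
```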

Artifact Downloading

Files presented for download on web pages are handled automatically. All retrieved artifacts are seamlessly uploaded to configured block storage solutions, making them accessible via the management interface.

Access Control Management

Skyvern supports several mechanisms to manage access to pages protected by login screens. For integration with advanced authentication flows, please initiate contact via email or our Discord server.

🔐 Multi-Factor Authentication (MFA) Support (TOTP)

Skyvern incorporates support for various MFA protocols necessary for automating sensitive workflows:
  1. QR-Code based MFA (e.g., Google Authenticator, Authy)
  2. Email token validation
  3. SMS verification code processing

🔐 Further details on MFA integration are available here.

Credential Manager Integration

Password manager integrations (checked items are currently supported):
- [x] Bitwarden
- [ ] 1Password
- [ ] LastPass

Model Context Protocol (MCP) Adherence

Skyvern supports the Model Context Protocol (MCP), allowing you to connect any LLM client that implements this standard.

Refer to the MCP specifications here.

Integration with Workflow Automation Platforms

Skyvern interfaces seamlessly with major low-code/no-code platforms, including Zapier, Make.com, and N8N, for ecosystem connectivity.


Practical Demonstrations of Skyvern Utility

We actively track real-world deployments of Skyvern. Below are examples illustrating how the platform is utilized for workflow automation:

Bulk Invoice Retrieval Across Diverse Portals

Schedule a live demonstration to witness this capability.

Automating Job Application Submissions

▶️ See this process in action

Streamlining Material Procurement for Industrial Operations

▶️ See this process in action

System Registration and Form Completion on Governmental Sites

▶️ See this process in action

Mass Population of 'Contact Us' Web Forms

▶️ See this process in action

Generating Insurance Quotations from Various Providers

▶️ See this process in action

▶️ See this process in action

Developer Environment Setup

Please ensure you have the uv package manager installed.

  1. Execute the following command to initialize your isolated development environment (.venv):

     ```bash
     uv sync --group dev
     ```

  2. Perform the initial service configuration via the CLI:

     ```bash
     uv run skyvern quickstart
     ```

  3. Access the interactive web dashboard at http://localhost:8080. The Skyvern Command Line Interface is fully functional across Windows, WSL, macOS, and Linux platforms.

Comprehensive Reference Material

More detailed documentation is hosted on our 📕 official documentation site. Should you encounter any ambiguities or require new information, please report an issue or contact us via email or Discord.

Supported Model Connectors

| Vendor | Compatible Models |
| --- | --- |
| OpenAI | gpt4-turbo, gpt-4o, gpt-4o-mini |
| Anthropic | Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| Azure OpenAI | Any GPT variant. Superior multimodal performance with azure/gpt4-o |
| AWS Bedrock | Anthropic Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| Gemini | Gemini 2.5 Pro and Flash, Gemini 2.0 |
| Ollama | Any model hosted locally via Ollama |
| OpenRouter | Models accessed via the OpenRouter gateway |
| OpenAI-compatible | Any endpoint adhering to the OpenAI API specification (via liteLLM) |

Environment Variable Definitions

OpenAI Configuration
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_OPENAI | Activates OpenAI model integration | Boolean | true, false |
| OPENAI_API_KEY | Your OpenAI Secret Key | String | sk-1234567890 |
| OPENAI_API_BASE | Alternate API endpoint (optional) | String | https://openai.api.base |
| OPENAI_ORGANIZATION | OpenAI Organization Identifier (optional) | String | your-org-id |

Recommended LLM_KEY: OPENAI_GPT4O, OPENAI_GPT4O_MINI, OPENAI_GPT4_1, OPENAI_O4_MINI, OPENAI_O3

Anthropic Configuration
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_ANTHROPIC | Activates Anthropic model integration | Boolean | true, false |
| ANTHROPIC_API_KEY | Your Anthropic Secret Key | String | sk-1234567890 |

Recommended LLM_KEY: ANTHROPIC_CLAUDE3.5_SONNET, ANTHROPIC_CLAUDE3.7_SONNET, ANTHROPIC_CLAUDE4_OPUS, ANTHROPIC_CLAUDE4_SONNET

Azure OpenAI Configuration
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_AZURE | Activates Azure OpenAI integration | Boolean | true, false |
| AZURE_API_KEY | Azure Deployment API Key | String | sk-1234567890 |
| AZURE_DEPLOYMENT | Azure OpenAI Resource Deployment Name | String | skyvern-deployment |
| AZURE_API_BASE | Azure endpoint base URL | String | https://skyvern-deployment.openai.azure.com/ |
| AZURE_API_VERSION | API Specification Version | String | 2024-02-01 |

Recommended LLM_KEY: AZURE_OPENAI

AWS Bedrock Configuration
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_BEDROCK | Activates AWS Bedrock integration. Ensure your AWS credentials are configured first. | Boolean | true, false |

Recommended LLM_KEY: BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_OPUS_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE

Gemini Configuration
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_GEMINI | Activates Gemini model integration | Boolean | true, false |
| GEMINI_API_KEY | Your Google Gemini API Key | String | your_google_gemini_api_key |

Recommended LLM_KEY: GEMINI_2.5_PRO_PREVIEW, GEMINI_2.5_FLASH_PREVIEW

Ollama Configuration
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_OLLAMA | Integrates locally hosted models via Ollama | Boolean | true, false |
| OLLAMA_SERVER_URL | Address for your Ollama instance | String | http://host.docker.internal:11434 |
| OLLAMA_MODEL | The specific Ollama model to load | String | qwen2.5:7b-instruct |

Recommended LLM_KEY: OLLAMA

Note: Vision capabilities are currently unsupported when using Ollama.
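Before pointing Skyvern at Ollama, it can help to confirm that the model named in OLLAMA_MODEL has actually been pulled. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists local models under `models[].name`. From the host machine the server is typically reached at `http://localhost:11434`; the `host.docker.internal` example value is meant for access from inside a Docker container. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def model_names_from_tags(payload: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [model["name"] for model in json.loads(payload).get("models", [])]


def ollama_has_model(server_url: str, model: str) -> bool:
    """Return True if `model` appears in the server's list of local models."""
    with urlopen(f"{server_url.rstrip('/')}/api/tags", timeout=5) as response:
        names = model_names_from_tags(response.read().decode("utf-8"))
    # Ollama lists models pulled without an explicit tag with a ":latest" suffix.
    return model in names or f"{model}:latest" in names


# Usage, with Ollama running locally:
# print(ollama_has_model("http://localhost:11434", "qwen2.5:7b-instruct"))
```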

OpenRouter Configuration
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_OPENROUTER | Activates OpenRouter model gateway | Boolean | true, false |
| OPENROUTER_API_KEY | Your OpenRouter API Key | String | sk-1234567890 |
| OPENROUTER_MODEL | Designated OpenRouter model identifier | String | mistralai/mistral-small-3.1-24b-instruct |
| OPENROUTER_API_BASE | OpenRouter API service root | String | https://api.openrouter.ai/v1 |

Recommended LLM_KEY: OPENROUTER

OpenAI-Compatible Endpoints
| Variable | Purpose | Type | Example Value |
| --- | --- | --- | --- |
| ENABLE_OPENAI_COMPATIBLE | Integrates a third-party API conforming to OpenAI standards | Boolean | true, false |
| OPENAI_COMPATIBLE_MODEL_NAME | Name of the model accessible via the endpoint | String | yi-34b, gpt-3.5-turbo, mistral-large, etc. |
| OPENAI_COMPATIBLE_API_KEY | Authentication token for the endpoint | String | sk-1234567890 |
| OPENAI_COMPATIBLE_API_BASE | Base URI for the compatible service | String | https://api.together.xyz/v1, http://localhost:8000/v1, etc. |
| OPENAI_COMPATIBLE_API_VERSION | API protocol version (optional) | String | 2023-05-15 |
| OPENAI_COMPATIBLE_MAX_TOKENS | Hard limit on output tokens (optional) | Integer | 4096, 8192, etc. |
| OPENAI_COMPATIBLE_TEMPERATURE | Sampling temperature setting (optional) | Float | 0.0, 0.5, 0.7, etc. |
| OPENAI_COMPATIBLE_SUPPORTS_VISION | Indicates whether the model accepts visual inputs (optional) | Boolean | true, false |

Supported LLM_KEY for this configuration: OPENAI_COMPATIBLE

General LLM Configuration Overrides
| Variable | Description | Type | Example Value |
| --- | --- | --- | --- |
| LLM_KEY | The designated primary model identifier to utilize | String | Reference keys listed above |
| SECONDARY_LLM_KEY | The designated model identifier for subsidiary agents | String | Reference keys listed above |
| LLM_CONFIG_MAX_TOKENS | Overrides the default maximum token limit for LLM contexts | Integer | 128000 |

Development Trajectory

This outlines planned features for the immediate future. We actively solicit suggestions for feature prioritization via email or Discord.

  • [x] Open Sourcing - Core platform code released publicly.
  • [x] Chained Operations - Implementation of multi-step workflow execution.
  • [x] Contextual Enhancement - Improved the LLM's understanding of the labels and text surrounding interactable elements by including that text in the prompt.
  • [x] Efficiency Gains - Optimization of the context tree transmission to enhance stability and reduce operational expenditures.
  • [x] Modern UI Platform - Transition from Streamlit to a production-grade React interface for job initiation.
  • [x] Visual Workflow Editor - Introduction of a graphical tool for constructing and analyzing workflow dependencies.
  • [x] Real-Time Viewport Feed - Integration of live browser streaming into the new UI.
  • [x] Historical Run Visualization - React UI replacement for visualizing past execution logs and outcomes.
  • [x] Autonomous Workflow Generation ("Observer") - Capability for Skyvern to automatically blueprint workflows during navigation.
  • [x] Prompt Response Caching - Implementation of a memory layer to cache LLM results, drastically cutting recurring computation costs.
  • [x] Benchmark Integration - Incorporation of standard evaluation datasets to continuously monitor performance metrics.
  • [ ] Enhanced Debug Mode - A planning phase where the agent requests user approval before executing steps, aiding debugging and prompt refinement.
  • [ ] Browser Extension Utility - Development of a Chrome Extension for integrated access (voice commands, task saving, etc.).
  • [ ] Action Recording Feature - Capability for Skyvern to observe a user completing a task and generate the corresponding workflow automatically.
  • [ ] Interactive Live Feed - Allowing real-time user intervention into the live browser stream.
  • [ ] LLM Observability Integration - Incorporating tools for back-testing prompt adjustments against specific datasets and visualizing longitudinal performance trends.
  • [x] Langchain Compatibility - Provision of a langchain_community integration allowing Skyvern to function as a Tool.

Collaboration Guidelines

We enthusiastically welcome Pull Requests and feature proposals! Engage with us via email or Discord. Consult our contribution guide and review the currently open "Help Wanted" issues to find entry points for contribution.

For a high-level overview of how the repository is structured, how to build on it, and help with usage questions, try Code Sage.

Usage Telemetry

By default, Skyvern gathers aggregate, anonymous usage statistics to inform development priorities. To opt-out of this data collection, set the environment variable SKYVERN_TELEMETRY to the value false.

Licensing Framework

The core operational logic of Skyvern is available in this open-source repository under the AGPL-3.0 License. Certain proprietary anti-bot mechanisms are reserved for our managed cloud service only.

For any inquiries or clarifications regarding licensing terms, please reach out to support; we are prepared to assist.
