mcp-appium-visual is a mobile automation server built on the Model Context Protocol (MCP). It drives Android and iOS devices through the Appium framework and adds vision-based element identification and error recovery.

Core Capabilities

Integration with Appium for device and session management.
AI-driven recognition and recovery for on-screen UI elements.
Compliance with the MCP specification for agent-driven, context-aware test pipelines.
Cross-platform support for Android and iOS.
Designed for use by autonomous AI agents.

Required Dependencies

Runtime Environment: Node.js (minimum version 14).
Java Runtime: Java Development Kit (JDK).
Android Toolchain: Android SDK installation.
iOS Toolchain: Xcode (macOS only; required for iOS automation).
Automation Server: Running instance of Appium Server.
Target Hardware: Accessible Android physical device/emulator or iOS simulator/device.

Environmental Configuration Steps

Prior to script execution, environment variables must be correctly established:

Update your shell initialization file (e.g., .bash_profile, .zshrc) with essential paths:

bash
# Example environment variables for your shell profile
export JAVA_HOME=/path/to/your/java/installation
export ANDROID_HOME=/path/to/your/android/sdk
export PATH=$PATH:$ANDROID_HOME/tools:$ANDROID_HOME/platform-tools

Reload the shell configuration to activate changes:

bash
source ~/.bash_profile # For bash users
# OR
source ~/.zshrc # For zsh users

Advisory: While the driver initialization routine attempts an automatic configuration load, manual sourcing in a fresh terminal session is strongly advised for consistency.
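
To catch a missing variable before a test run, a small pre-flight check can help. The sketch below is illustrative only: `missingEnvVars` is a hypothetical helper, not part of this project, and it only checks that each variable is set and non-empty.

```typescript
// Hypothetical helper (not part of this project): list required variables
// that are unset or empty in the given environment.
type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env, required: string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Hand-written environment for illustration; a real script would pass process.env.
const exampleEnv: Env = {
  JAVA_HOME: "/path/to/your/java/installation",
  ANDROID_HOME: "",
};

const missing = missingEnvVars(exampleEnv, ["JAVA_HOME", "ANDROID_HOME"]);
console.log(
  missing.length === 0
    ? "Environment looks complete"
    : `Missing: ${missing.join(", ")}`
);
```

The function takes the environment as a parameter so it can be exercised without touching the real shell.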

iOS Specific: Xcode Command Line Utilities Setup

Proper configuration of Xcode CLI tools is crucial for iOS automation tasks:

Installation check/initiation:

bash
xcode-select --install

Verification of active path:

bash
xcode-select -p

Path reassignment if necessary (useful with multiple Xcode versions):

bash
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

Acceptance of licensing terms:

bash
sudo xcodebuild -license accept

For physical iOS device integration, ensure your Apple Developer credentials are set within Xcode (Preferences -> Accounts) and necessary provisioning profiles are downloaded.
Define necessary environment settings for iOS development in your profile:

bash
# Append to ~/.bash_profile or ~/.zshrc
export DEVELOPER_DIR="/Applications/Xcode.app/Contents/Developer"
export PATH="$DEVELOPER_DIR/usr/bin:$PATH"

Apply configuration updates:

bash
source ~/.bash_profile # Bash reload
# OR
source ~/.zshrc # Zsh reload

Initializing the System

Install project dependencies:

bash
npm install

Install Appium globally, then start the server:

bash
npm install -g appium
appium

Prepare the Android test environment:
Enable Developer Options on the target hardware.
Activate USB Debugging mode.
Connect the device or launch an emulator.
Validate connectivity using the adb devices command.
For macOS/iOS Automation:
Confirm Xcode CLI tools are installed (xcode-select -p prints the active developer path; run xcode-select --install if it fails).
Provision an iOS simulator or connect a physical device.
Authorize the controlling machine on the iOS device if operating on hardware.
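
The `adb devices` check in the Android steps above can also be scripted. The sketch below parses the command's text output into a list of devices; `parseAdbDevices` is a hypothetical helper, not part of this project, and the sample serials are made up. It assumes the usual output layout: a "List of devices attached" header followed by one serial and state per line.

```typescript
// Hypothetical helper (not part of this project): parse `adb devices` output.
interface AdbDevice {
  serial: string;
  state: string; // e.g. "device", "offline", "unauthorized"
}

function parseAdbDevices(output: string): AdbDevice[] {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Skip blank lines, the header line, and "* daemon ..." notices.
    .filter(
      (line) =>
        line !== "" &&
        line.indexOf("List of devices") !== 0 &&
        line.indexOf("*") !== 0
    )
    .map((line) => {
      const parts = line.split(/\s+/);
      return { serial: parts[0] ?? "", state: parts[1] ?? "unknown" };
    });
}

// Made-up sample output for illustration.
const sampleAdbOutput =
  "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n";

const readyDevices = parseAdbDevices(sampleAdbOutput).filter(
  (d) => d.state === "device"
);
console.log(readyDevices.map((d) => d.serial));
```

A device in the "unauthorized" state usually means the USB debugging prompt on the device has not been accepted yet.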

Execution Flow

Compile project assets:

bash
npm run build

Power up the MCP orchestration service:

bash
npm run dev

In a separate terminal session, initiate the test suite:

bash
npm test

Test Configuration Details

Android Specification Adjustments

The default configuration utilizes the Android Settings application for demonstration. To target custom deployments:

Modify the configuration file: examples/appium-test.ts:
Set deviceName to match your connected hardware identifier.
Specify the file system path for your APK in the app property, OR
Define appPackage and appActivity if the application is pre-installed on the device.
Standardized Capabilities Structure (TypeScript):

typescript
const capabilities: AppiumCapabilities = {
  platformName: "Android",
  deviceName: "YOUR_DEVICE_NAME",
  automationName: "UiAutomator2",
  // Option A: Deploying an APK file:
  app: "./path/to/your/app.apk",
  // Option B: Targeting an installed application:
  appPackage: "your.app.package",
  appActivity: ".MainActivity",
  noReset: true,
};
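
Options A and B are alternatives, so it can help to build the capabilities through a function that accepts exactly one of them. The sketch below is one way to do that. `AndroidCapabilities`, `AndroidTarget`, and `buildAndroidCapabilities` are hypothetical names introduced for illustration; the local interface only stands in for the project's `AppiumCapabilities` type, whose exact shape is not shown here.

```typescript
// Local stand-in for the project's AppiumCapabilities type (shape assumed).
interface AndroidCapabilities {
  platformName: "Android";
  deviceName: string;
  automationName: "UiAutomator2";
  app?: string;
  appPackage?: string;
  appActivity?: string;
  noReset: boolean;
}

// Exactly one of the two targeting options from the text above.
type AndroidTarget =
  | { kind: "apk"; path: string }
  | { kind: "installed"; appPackage: string; appActivity: string };

function buildAndroidCapabilities(
  deviceName: string,
  target: AndroidTarget
): AndroidCapabilities {
  const base = {
    platformName: "Android" as const,
    deviceName,
    automationName: "UiAutomator2" as const,
    noReset: true,
  };
  if (target.kind === "apk") {
    // Option A: install and launch from an APK file.
    return { ...base, app: target.path };
  }
  // Option B: launch an application that is already installed.
  return {
    ...base,
    appPackage: target.appPackage,
    appActivity: target.appActivity,
  };
}

const installedCaps = buildAndroidCapabilities("YOUR_DEVICE_NAME", {
  kind: "installed",
  appPackage: "your.app.package",
  appActivity: ".MainActivity",
});
console.log(JSON.stringify(installedCaps, null, 2));
```

The discriminated union makes it a compile-time error to pass both an APK path and a package/activity pair.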

iOS Specification Adjustments

Leveraging the newly introduced Xcode command line support:

Configuration example within examples/xcode-appium-example.ts:

typescript
const capabilities: AppiumCapabilities = {
  platformName: "iOS",
  deviceName: "iPhone 13", // Specify simulator or device name
  automationName: "XCUITest",
  udid: "DEVICE_UDID", // Obtainable via XcodeCommands.getIosSimulators()
  // Option A: Deploying an application file (.app bundle for simulators, .ipa for physical devices):
  app: "./path/to/your/app.app",
  // Option B: Targeting an installed bundle:
  bundleId: "com.your.app",
  noReset: true,
};
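
If you prefer to look up the `udid` value outside the project's helpers, the JSON output of `xcrun simctl list devices --json` can be searched directly. The sketch below assumes that output is an object whose `devices` field maps runtime identifiers to arrays of devices with `udid`, `name`, `state`, and `isAvailable` fields. `findSimulatorUdid` is a hypothetical helper, not part of this project, and the sample data is made up.

```typescript
// Assumed shape of `xcrun simctl list devices --json` output.
interface SimDevice {
  udid: string;
  name: string;
  state: string;
  isAvailable?: boolean;
}

interface SimctlDeviceList {
  devices: Record<string, SimDevice[]>;
}

// Hypothetical helper: return the UDID of the first available simulator
// with the given name, or undefined if none matches.
function findSimulatorUdid(
  simctlJson: string,
  deviceName: string
): string | undefined {
  const parsed = JSON.parse(simctlJson) as SimctlDeviceList;
  for (const runtime of Object.keys(parsed.devices)) {
    const matches = parsed.devices[runtime].filter(
      (d) => d.name === deviceName && d.isAvailable !== false
    );
    if (matches.length > 0) {
      return matches[0].udid;
    }
  }
  return undefined;
}

// Made-up sample data for illustration.
const sampleSimctlJson = JSON.stringify({
  devices: {
    "com.apple.CoreSimulator.SimRuntime.iOS-17-0": [
      { udid: "SIMULATOR_UDID", name: "iPhone 13", state: "Shutdown", isAvailable: true },
    ],
  },
});

console.log(findSimulatorUdid(sampleSimctlJson, "iPhone 13"));
```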

Executable Operations List

The MCP service exposes a comprehensive set of Appium operations:

Component Interaction:
Element discovery procedures.
Executing taps/clicks utilizing the W3C Actions specification (Refer to "W3C Standard Gestures").
Text input functionality.
Directed scrolling to specific components via W3C Actions.
Initiation of prolonged press events.
Application Lifecycle Management:
Starting or terminating the target application.
Forcing an application state reset.
Retrieval of the currently foreground package name and activity.
Device System Control:
Modification of screen rotation.
Management of the on-screen keyboard visibility.
Locking/unlocking the physical device.
Capturing screen snapshots.
Querying battery status information.
Advanced Functionality Modules:
Switching between Native and WebView contexts.
Remote file system operations.
Interacting with system notifications.
Execution of proprietary gesture sequences.
iOS Toolchain Integration (Exclusive to iOS):
Control over iOS simulators (e.g., power cycle, state management).
Installation and removal of applications on simulators.
Starting and halting application processes.
Snapshot capture utility.
Video recording of interaction sessions.
Creation and disposal of simulated hardware environments.
Retrieval of supported hardware types and runtime versions.

Adherence to W3C Standard Gestures

MCP-Appium now uses the W3C WebDriver Actions API for touch interactions. W3C Actions is the current standard for mobile gesture automation.

W3C Actions for Component Tapping

The tapElement routine employs the W3C Actions API, augmented with intelligent fallback mechanisms:

typescript
// Execution sequence preference:
// 1. WebdriverIO built-in click()
// 2. W3C Actions API execution
// 3. Legacy TouchAction API (for backward compatibility needs)
await appium.tapElement("//android.widget.Button[@text='OK']");
// The click alias serves identically
await appium.click("//android.widget.Button[@text='OK']");

W3C Actions for Scrolling Maneuvers

The scrollToElement method is now powered by the W3C Actions protocol for enhanced reliability:

typescript
// Utilizes W3C Actions for dependable scrolling
await appium.scrollToElement(
  "//android.widget.TextView[@text='About phone']", // Selector string
  "down", // Scroll direction: "up", "down", "left", "right"
  "xpath", // Strategy employed
  10 // Maximum number of scroll attempts
);
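
To make the direction argument concrete, the sketch below shows one plausible way a scroll direction could be turned into swipe coordinates. It is not this project's implementation. `swipeForScroll` and the 20% screen margin are assumptions, as is the convention that scrolling "down" means dragging the finger upward so that lower content comes into view.

```typescript
type ScrollDirection = "up" | "down" | "left" | "right";

interface Point {
  x: number;
  y: number;
}

interface Swipe {
  start: Point;
  end: Point;
}

// Hypothetical helper: map a scroll direction to a swipe across the screen,
// staying `margin` (a fraction of the screen size) away from the edges.
function swipeForScroll(
  direction: ScrollDirection,
  width: number,
  height: number,
  margin: number = 0.2
): Swipe {
  const centerX = Math.round(width / 2);
  const centerY = Math.round(height / 2);
  const top = Math.round(height * margin);
  const bottom = Math.round(height * (1 - margin));
  const left = Math.round(width * margin);
  const right = Math.round(width * (1 - margin));

  switch (direction) {
    case "down": // finger moves up, content below comes into view
      return { start: { x: centerX, y: bottom }, end: { x: centerX, y: top } };
    case "up": // finger moves down
      return { start: { x: centerX, y: top }, end: { x: centerX, y: bottom } };
    case "right": // finger moves left
      return { start: { x: right, y: centerY }, end: { x: left, y: centerY } };
    case "left": // finger moves right
      return { start: { x: left, y: centerY }, end: { x: right, y: centerY } };
  }
}

console.log(swipeForScroll("down", 1080, 1920));
```

The margin keeps the swipe away from screen edges, where system gestures such as the notification shade can intercept the touch.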

Custom W3C Gesture Construction

Customized W3C sequences can be constructed using the executeMobileCommand interface:

typescript
// Definition of a custom W3C Actions sequence.
// startX, startY, endX, endY, and duration (in ms) are assumed to be defined by the caller.
const w3cActions = {
  actions: [
    {
      type: "pointer",
      id: "finger1",
      parameters: { pointerType: "touch" },
      actions: [
        // Initial positioning
        { type: "pointerMove", duration: 0, x: startX, y: startY },
        // Input initiation
        { type: "pointerDown", button: 0 },
        // Displacement over time
        {
          type: "pointerMove",
          duration: duration,
          origin: "viewport",
          x: endX,
          y: endY,
        },
        // Input conclusion
        { type: "pointerUp", button: 0 },
      ],
    },
  ],
};

// Invoking the W3C Actions via the executeMobileCommand method
await appium.executeMobileCommand("performActions", [w3cActions.actions]);

Consult examples/w3c-actions-swipe-demo.ts for additional examples demonstrating W3C standard gesture integration.
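
When several gestures share the same shape, the action sequence can be produced by a small builder instead of being written out each time. The sketch below mirrors the swipe structure shown above. `buildSwipeActions` and its types are hypothetical and not part of this project; rounding coordinates to whole pixels is a conservative choice made here, not a documented requirement of this server.

```typescript
interface PointerAction {
  type: "pointerMove" | "pointerDown" | "pointerUp";
  duration?: number;
  origin?: "viewport" | "pointer";
  x?: number;
  y?: number;
  button?: number;
}

interface PointerSource {
  type: "pointer";
  id: string;
  parameters: { pointerType: "touch" };
  actions: PointerAction[];
}

// Hypothetical builder: a one-finger swipe from (startX, startY) to (endX, endY).
function buildSwipeActions(
  startX: number,
  startY: number,
  endX: number,
  endY: number,
  durationMs: number
): PointerSource[] {
  if (durationMs < 0) {
    throw new Error("durationMs must not be negative");
  }
  return [
    {
      type: "pointer",
      id: "finger1",
      parameters: { pointerType: "touch" },
      actions: [
        // Move to the start point without delay, then touch down.
        { type: "pointerMove", duration: 0, x: Math.round(startX), y: Math.round(startY) },
        { type: "pointerDown", button: 0 },
        // Drag to the end point over the requested duration, then lift.
        {
          type: "pointerMove",
          duration: durationMs,
          origin: "viewport",
          x: Math.round(endX),
          y: Math.round(endY),
        },
        { type: "pointerUp", button: 0 },
      ],
    },
  ];
}

const swipeUpActions = buildSwipeActions(540, 1500, 540, 500, 800);
console.log(JSON.stringify(swipeUpActions, null, 2));

// Passing the result on would follow the call shape shown earlier:
// await appium.executeMobileCommand("performActions", [swipeUpActions]);
```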

Leveraging Xcode Command Line Utilities

The dedicated XcodeCommands class offers advanced administrative control for iOS testing environments:

typescript
import { XcodeCommands } from "../src/lib/xcode/xcodeCommands.js";

// Determine if the necessary CLI tools are present
const isInstalled = await XcodeCommands.isXcodeCliInstalled();

// Retrieve list of accessible iOS simulators
const simulators = await XcodeCommands.getIosSimulators();

// Boot a specific simulator
await XcodeCommands.bootSimulator("SIMULATOR_UDID");

// Deploy application package to a simulator
await XcodeCommands.installApp("SIMULATOR_UDID", "/path/to/app.app");

// Start application process on the simulator
await XcodeCommands.launchApp("SIMULATOR_UDID", "com.example.app");

// Capture screen output to a file location
await XcodeCommands.takeScreenshot("SIMULATOR_UDID", "/path/to/output.png");

// Shut down a simulator
await XcodeCommands.shutdownSimulator("SIMULATOR_UDID");

Simplified Click Invocation

The click() wrapper is a convenient alias for the more explicit tapElement():

typescript
// Using the convenient click method
await appium.click("//android.widget.Button[@text='OK']");

// This resolves internally to the same operation:
await appium.tapElement("//android.widget.Button[@text='OK']");

Diagnostic and Resolution Guide

Device Connectivity Failure:
Review output from adb devices command.
Confirm USB debugging is toggled ON.
Attempt physical cable re-seating.
Application Deployment Failure:
Validate the specified APK file path.
Check available storage space on the target device.
Verify the application artifact is correctly signed for debugging builds.
Component Locatability Errors:
Use Appium Inspector to validate the accuracy of locator strings.
Ensure the targeted element is currently rendered and visible.
Experiment with alternative element location strategies (e.g., ID instead of XPath).
Communication Interruption:
Confirm the Appium service daemon is actively listening.
Investigate potential port conflicts on the host machine.
Scrutinize the accuracy of the configuration capabilities object.
iOS Simulator Instability:
Re-validate Xcode CLI tool presence: xcode-select -p.
Cross-reference the UDID against xcrun simctl list devices output.
Restart the simulator instance if it exhibits erratic behavior.

Community Participation

We welcome contributions! Please submit bug reports or feature proposals via Issues or Pull Requests.

Licensing

This project is released under the MIT License.

appium-mcp-driver

Author

Rahulec08
