
ui-interaction-agent-python-toolkit

Automates Windows by programmatically manipulating user interface components, capturing screenshots, and controlling web browser instances. The library plugs into Python projects and adds AI-assisted routines for locating and acting on interface elements.

Author


fstandhartinger

MIT License

Quick Info

GitHub Stars 1
NPM Weekly Downloads 0
Tools 1
Last Updated 2026-02-19

Tags

automation, browser, browsers, browser automation, automation web, browsers integrates

Reference to the MCP Backend Server: Locate it here.

If you are seeking the Python Client Library, proceed below... ;)

Smooth Operator Agent Tools - Python Client Library

This package is the official Python client library for the Smooth Operator Agent Tools, a toolkit for developers building Computer Interaction Agents on Windows.

Synopsis

The Smooth Operator Agent Tools handle complex interactions with the native Windows automation tree and with Playwright browser control. They also offer AI features such as identifying interface elements from a screenshot or from a textual description.

This dedicated Python module serves as an accessible facade over the Smooth Operator Tools Server API, enabling seamless incorporation of these functionalities into your custom Python applications.

All provided features are fully verifiable and explorable via an intuitive Windows graphical interface prior to their integration into programmatic workflows. Access the interactive environment at Smooth Operator Tools UI.

Installation Procedure

```bash
pip install smooth-operator-agent-tools
```

Dependencies

Google Chrome Requirement

The Smooth Operator Agent Tools library necessitates the presence of Google Chrome (or any equivalent Chromium-based web browser) installed on the host machine to ensure the web navigation control functionalities operate correctly.
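A pre-flight check can confirm that a Chromium-based browser is present before the browser features are used. The sketch below is not part of the library: `candidate_chrome_paths` and `find_chrome` are hypothetical helpers, and the locations they probe are only the usual default install directories for Google Chrome on Windows, so a machine with a custom install or a different Chromium-based browser would need additional entries.

```python
import os
from pathlib import PureWindowsPath
from typing import Mapping, Optional


def candidate_chrome_paths(env: Mapping[str, str]) -> list[str]:
    """Build the default Chrome install paths from the given environment.

    Hypothetical helper: the directories are common defaults, not a list
    taken from the Smooth Operator documentation.
    """
    suffix = PureWindowsPath("Google", "Chrome", "Application", "chrome.exe")
    paths = []
    for var in ("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA"):
        base = env.get(var)
        if base:
            paths.append(str(PureWindowsPath(base) / suffix))
    return paths


def find_chrome(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first candidate path that exists on disk, or None."""
    for path in candidate_chrome_paths(env):
        if os.path.isfile(path):
            return path
    return None
```

Calling `find_chrome()` at application start and showing a clear message when it returns `None` is one way to surface a missing browser early rather than at the first browser call.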

Embedded Server Setup

The client library includes a local server component that must be installed in your application data directory. The server files are bundled with the library package and are unpacked automatically the first time the client is used.

Initial Launch Sequence

Upon the very first execution of the library, the following actions are performed automatically:

  1. Creation of the directory %APPDATA%\SmoothOperator\AgentToolsServer (or the corresponding path for the active operating system).
  2. Extraction of the server binaries from the installed package.
  3. Launch of the background server process.
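To see where the server files should end up on Windows, the directory named in step 1 can be resolved from the `APPDATA` environment variable. This is a minimal sketch: `expected_server_dir` is a hypothetical helper rather than a library function, and it covers only the Windows path stated above, not the "corresponding path" on other operating systems.

```python
import os
from pathlib import PureWindowsPath
from typing import Mapping


def expected_server_dir(env: Mapping[str, str] = os.environ) -> str:
    """Resolve %APPDATA%\\SmoothOperator\\AgentToolsServer for the given env.

    Hypothetical helper; the library performs its own path resolution.
    """
    appdata = env.get("APPDATA")
    if not appdata:
        raise RuntimeError("APPDATA is not set; this sketch only handles Windows")
    return str(PureWindowsPath(appdata, "SmoothOperator", "AgentToolsServer"))
```

An installer or a diagnostic script could use the returned path to check whether the first-run extraction has already taken place.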

Crucially, for the browser automation features to be functional, you must confirm that Node.js and the Playwright runtime environment are installed according to the specifications detailed in the Dependencies section.

Guidance for Software Installers

If you are constructing an installation package that incorporates this library, it is strongly advised to include preparatory steps to install Node.js and Playwright during your application's setup routine to optimize the end-user experience. Refer to the Dependencies section for the necessary installation protocols.
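An installer or startup routine can verify these prerequisites before the browser features are needed. The sketch below assumes that Node.js exposes the `node` and `npx` executables on `PATH`, which is the usual result of a standard Node.js install. `missing_tools` is a hypothetical helper, and the check says nothing about whether Playwright itself is installed.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    names: Iterable[str] = ("node", "npx"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the executables from `names` that cannot be found on PATH.

    `which` is injectable so the check can be tested without touching
    the real system PATH.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("Node.js tooling found on PATH")
```

An empty list means both executables were located; anything else can be turned into an actionable message for the end user.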

Operational Usage

```python
from smooth_operator_agent_tools import SmoothOperatorClient

# Instantiate the client, supplying your API key
# (obtainable for free from https://screengrasp.com/api.html)
client = SmoothOperatorClient(api_key="YOUR_API_KEY")

# Start the backend server - this may require a brief initialization period
client.start_server()

# Capture a screenshot of the current screen
screenshot = client.screenshot.take()

# Retrieve a summary of the current system state
overview = client.system.get_overview()

# Click at absolute screen coordinates
client.mouse.click(500, 300)

# Locate and click an interface element via its textual description
client.mouse.click_by_description("Finalize Transaction button")

# Type text
client.keyboard.type("Greetings, network users!")

# Control the Chrome browser
client.chrome.open_chrome("https://www.example.com")
client.chrome.get_dom()

# Many returned objects can be converted to a JSON string via to_json_string().
# This string format is well suited to feeding into Large Language Models (LLMs)
# for automated reasoning and action planning.
```
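The JSON string produced by `to_json_string()` can be embedded into an LLM prompt for action planning. The following is only a sketch of that idea: `build_planning_prompt`, the prompt wording, and the `max_chars` cutoff are all hypothetical and not part of the library, and the call to the model itself is left out.

```python
def build_planning_prompt(state_json: str, goal: str, max_chars: int = 20000) -> str:
    """Combine a serialized system state and a goal into one prompt string.

    Hypothetical helper: truncates very large state dumps so the prompt
    stays within a rough size budget.
    """
    if len(state_json) > max_chars:
        state_json = state_json[:max_chars] + "\n...[truncated]"
    return (
        "You control a Windows desktop through automation tools.\n"
        f"Goal: {goal}\n"
        "Current system state (JSON):\n"
        f"{state_json}\n"
        "Reply with the single next action to take."
    )
```

With the client from the usage example, and assuming the overview object is one of those that supports `to_json_string()`, the call would look like `build_planning_prompt(overview.to_json_string(), "Open Notepad")`.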

Core Capabilities

  • Visual Capture & Interpretation: Procure screen images and semantically analyze interface controls.
  • Pointer Manipulation: Execute fine-grained mouse movements using absolute coordinates or AI-derived element targeting.
  • Typing Interface: Inject text strings and send complex keystroke sequences.
  • Chrome Control Module: Direct browser navigation, facilitate element interaction, and run arbitrary JavaScript.
  • OS Interaction Layer: Interface with native Windows application controls and system UI elements.
  • System Status Queries: Initiate applications and manage operational parameters of the host machine.

Comprehensive Resources

For exhaustive details regarding the API endpoints, consult the following documentation portals:

  • Operational Manual: In-depth tutorials and explanations covering typical workflows.
  • Sample Project Repository: Downloadable code repository; follow the step-by-step instructions to deploy your initial automation setup within minutes.
  • API Reference: Complete documentation detailing every endpoint exposed by the internal processing server.

Licensing Information

This software package is distributed under the terms of the MIT License (refer to the LICENSE file for specifics).

