nlp-driven-browser-automation-engine
A service that runs complex web navigation workflows from natural-language instructions, using large language models to interpret commands and drive the browser.
Author: jonnyhoff
Automated Web Interaction Framework (AWIF)
This framework implements a Model Context Protocol (MCP) server that offers script-free browser automation through a simple Application Programming Interface (API).
Overview
The server exposes an interface for triggering complex browser automation sequences from plain, human-readable instructions. Its key components are:
- FastMCP: The framework used to build the lightweight, high-performance API service.
- browser-use Module: The core library that carries out the browser automation.
- Generative Pre-trained Transformer (GPT) Models: Used to interpret natural-language input and translate it into executable browser operations.
Prerequisites
To deploy and operate this system, ensure the following are in place:
- A stable installation of Python version 3.11 or newer.
- Poetry, utilized for precise dependency resolution and package management.
- A valid, configured access key for the OpenAI API services.
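The first two prerequisites can be checked before installation with a small preflight script. This is a minimal sketch using only the standard library; the function names are hypothetical and not part of this project.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the prerequisites


def python_is_supported(version=None):
    """Return True when the interpreter version meets the stated minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def poetry_is_installed():
    """Return True when a `poetry` executable is found on PATH."""
    return shutil.which("poetry") is not None


if __name__ == "__main__":
    print("Python 3.11+:", python_is_supported())
    print("Poetry on PATH:", poetry_is_installed())
```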
Initial Setup Procedure
Step 1: Dependency Acquisition
Execute the following command within your terminal to procure all necessary libraries:
poetry install
Step 2: Environment Variable Configuration
Create a file named .env in the project's root directory. This file must contain your OpenAI API key, formatted as follows:
OPENAI_API_KEY=your_openai_api_key_here
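To confirm the key is readable before starting the server, a check along the following lines may help. It is a sketch under stated assumptions: how the project itself loads .env is not shown here, so this uses a hypothetical standard-library parser (`read_env_file`, `resolve_api_key`) that handles only simple KEY=VALUE lines.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(path=".env"):
    """Prefer a key already in the environment, then fall back to the file."""
    return os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")


if __name__ == "__main__":
    print("OPENAI_API_KEY found:", resolve_api_key() is not None)
```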
Execution
Initiate the server process using:
poetry run python main.py
The server starts listening on its default port, typically using the Server-Sent Events (SSE) transport, which streams messages from the server to the client over HTTP.
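Once the server is running, MCP clients talk to it with JSON-RPC 2.0 messages, and a tool invocation uses the `tools/call` method. The sketch below only builds such a message. It assumes the /run_browser_task endpoint described under Functionality Showcase is exposed as an MCP tool named `run_browser_task`, and the argument name `task` is hypothetical.

```python
import json


def build_tool_call(task, request_id=1):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "run_browser_task",   # assumed tool name, taken from the endpoint path
            "arguments": {"task": task},  # "task" is a hypothetical argument name
        },
    }


if __name__ == "__main__":
    message = build_tool_call("Open example.com and report the page title")
    print(json.dumps(message, indent=2))
```

In practice an MCP client library would handle the session setup and send this message for you; the payload is shown only to make the request shape concrete.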
Functionality Showcase
Programmatic Web Control
Call the /run_browser_task endpoint to submit natural-language instructions for web automation. For instance, consider the following description of XMLHttpRequest (XHR):
WIKIPEDIA: XMLHttpRequest (XHR) is an API in the form of a JavaScript object whose methods transmit HTTP requests from a web browser to a web server. The methods allow a browser-based application to send requests to the server after page loading is complete, and receive information back. XMLHttpRequest is a component of Ajax programming. Prior to Ajax, hyperlinks and form submissions were the primary mechanisms for interacting with the server, often replacing the current page with another one.
== History ==
The concept behind XMLHttpRequest was conceived in 2000 by the developers of Microsoft Outlook. The concept was then implemented within the Internet Explorer 5 browser (1999). However, the original syntax did not use the XMLHttpRequest identifier. Instead, the developers used the identifiers ActiveXObject("Msxml2.XMLHTTP") and ActiveXObject("Microsoft.XMLHTTP"). As of Internet Explorer 7 (2006), all browsers support the XMLHttpRequest identifier. The XMLHttpRequest identifier is now the de facto standard in all the major browsers, including Mozilla's Gecko layout engine (2002), Safari 1.2 (2004) and Opera 8.0 (2005).
=== Standards ===
The World Wide Web Consortium (W3C) published a Working Draft specification for the XMLHttpRequest object on April 5, 2006. On February 25, 2008, the W3C published the Working Draft Level 2 specification. Level 2 added methods to monitor event progress, allow cross-site requests, and handle byte streams. At the end of 2011, the Level 2 specification was absorbed into the original specification. At the end of 2012, the WHATWG took over development and maintains a living document using Web IDL.
== Usage ==
Generally, sending a request with XMLHttpRequest has several programming steps.
1. Create an XMLHttpRequest object by calling a constructor.
2. Call the "open" method to specify the request type, identify the relevant resource, and select synchronous or asynchronous operation.
3. For an asynchronous request, set a listener that will be notified when the request's state changes.
4. Initiate the request by calling the "send" method.
5. Respond to state changes in the event listener. If the server sends response data, by default it is captured in the "responseText" property. When the object stops processing the response, it changes to state 4, the "done" state.
Aside from these general steps, XMLHttpRequest has many options to control how the request is sent and how the response is processed. Custom header fields can be added to the request to indicate how the server should fulfill it, and data can be uploaded to the server by providing it in the "send" call. The response can be parsed from the JSON format into a readily usable JavaScript object, or processed gradually as it arrives rather than waiting for the entire text. The request can be aborted prematurely or set to fail if not completed in a specified amount of time.
== Cross-domain requests ==
In the early development of the World Wide Web, it was found possible to brea
