PrestoDistributedSQL

Execute massive-scale, parallelized SQL queries across diverse data repositories, facilitating advanced data insights and big data processing workflows.

Author

a2888409

Apache License 2.0

Quick Info

Last Updated 2026-02-19

Tags

cloud, queries, datasets, distributed sql, big data, queries large

Presto Distributed SQL Engine

Presto is a distributed, horizontally scalable SQL query engine for very large datasets.

Consult the official User Manual for comprehensive setup guides and operational documentation.

Prerequisites

  • Operating System: Mac OS X or Linux environment required.
  • Runtime Environment: Java Development Kit (JDK) version 8 Update 151 or newer (64-bit). Compatibility extends across both Oracle JDK and OpenJDK distributions.
  • Build Tool: Apache Maven version 3.3.9 or higher is necessary for compilation.
  • Launcher Utility: Python version 2.4 or later is needed for executing the provided launcher script.
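
As a quick way to check the JDK requirement above, the following minimal sketch parses a `java.version` string and reports whether it is at least 8 Update 151. The class name `JdkCheck` and its helper are hypothetical and not part of Presto. The sketch assumes that any Java 9+ version string counts as "newer", and it reads the HotSpot-specific `sun.arch.data.model` property to report the bitness, which other JVMs may not set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Presto: checks the JDK prerequisite listed above.
final class JdkCheck {
    // Legacy scheme used up to Java 8, e.g. "1.8.0_151".
    private static final Pattern LEGACY = Pattern.compile("^1\\.(\\d+)\\.\\d+(?:_(\\d+))?.*$");
    // Scheme used from Java 9 onwards, e.g. "11.0.2" or "17".
    private static final Pattern MODERN = Pattern.compile("^(\\d+).*$");

    // Returns true when the version string denotes Java 8u151 or anything newer.
    static boolean meetsMinimum(String version) {
        Matcher legacy = LEGACY.matcher(version);
        if (legacy.matches()) {
            int minor = Integer.parseInt(legacy.group(1));
            int update = legacy.group(2) == null ? 0 : Integer.parseInt(legacy.group(2));
            return minor > 8 || (minor == 8 && update >= 151);
        }
        Matcher modern = MODERN.matcher(version);
        // Assumption: every Java 9+ release is treated as "newer than 8u151".
        return modern.matches() && Integer.parseInt(modern.group(1)) >= 9;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        // HotSpot-specific property; falls back to "unknown" on other JVMs.
        String bits = System.getProperty("sun.arch.data.model", "unknown");
        System.out.println("java.version = " + version + " (" + bits + "-bit)");
        System.out.println("meets 8u151 minimum: " + meetsMinimum(version));
    }
}
```

Running it with the JDK you intend to build with shows whether that JDK satisfies the version bound; it does not check Maven or Python.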

Compiling Presto

Presto is a standard Maven project. From the root directory of the source tree, run:

./mvnw clean install

The initial invocation of Maven will fetch all necessary external dependencies from the central repositories and cache them locally (in ~/.m2/repository). This first compilation cycle might require significant download time; subsequent builds will benefit from the local cache.

Presto includes an extensive suite of unit tests. If time constraints are critical and testing is not immediately required, you can bypass test execution during the build process:

./mvnw clean install -DskipTests

Running within an Integrated Development Environment (IDE)

Overview

After the first successful build, you can import the project into an IDE for development. IntelliJ IDEA is strongly recommended. Because Presto is a standard Maven project, you can import it by opening the root pom.xml file, either from the Quick Start box or via File -> Open.

Upon opening in IntelliJ, verify the Java Software Development Kit configuration:

  • Access the File menu and navigate to Project Structure.
  • In the SDKs configuration panel, confirm that a JDK 1.8 distribution is selected (create one if missing).
  • In the Project settings pane, set the Project language level explicitly to 8.0, as Presto leverages several modern Java 8 language constructs.

Presto ships with sample configuration files that work out of the box for local development. Create a run configuration with the following settings:

  • Main Class: com.facebook.presto.server.PrestoServer
  • VM Options: -ea -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent -Xmx2G -Dconfig=etc/config.properties -Dlog.levels-file=etc/log.properties
  • Working Directory: $MODULE_WORKING_DIR$ or $MODULE_DIR$ (depending on the specific IntelliJ version).
  • Use classpath of module: presto-main

The working directory must be the presto-main subdirectory. In IntelliJ, $MODULE_DIR$ generally resolves to this path automatically.

The Hive connector also needs the location of the Hive metastore Thrift service. Append the following to the VM options, replacing localhost:9083 with your metastore's host and port, or leave it unchanged if you are using a temporary local metastore setup:

-Dhive.metastore.uri=thrift://localhost:9083

Accessing Remote Services via SOCKS Proxy

If your Hive metastore or HDFS cluster is not directly reachable from your workstation, you can tunnel to it with SSH port forwarding. Start a dynamic SOCKS proxy listening on local port 1080:

ssh -v -N -D 1080 remote_server_host

Then, incorporate this proxy setting into your VM options:

-Dhive.metastore.thrift.client.socks-proxy=localhost:1080
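
To make concrete what routing a connection through the tunnel means at the JVM level, here is a minimal sketch using the standard `java.net.Proxy` API. It is not Presto's implementation of the socks-proxy option. The class `SocksProbe` and its methods are hypothetical, and the default addresses simply reuse `localhost:1080` and `localhost:9083` from the text above. The target address is deliberately left unresolved so that, with SOCKS5, the host name is resolved on the far side of the tunnel.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;

// Hypothetical probe, not part of Presto: tests reachability through a SOCKS proxy.
final class SocksProbe {
    // Parses "host:port" without resolving the host name locally.
    static InetSocketAddress parseHostPort(String value) {
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got: " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Builds a SOCKS proxy descriptor for the given "host:port".
    static Proxy socksProxy(String hostPort) {
        return new Proxy(Proxy.Type.SOCKS, parseHostPort(hostPort));
    }

    // Attempts a TCP connection to the target via the proxy; returns false on any I/O failure.
    static boolean canReach(Proxy proxy, InetSocketAddress target, int timeoutMillis) {
        try (Socket socket = new Socket(proxy)) {
            socket.connect(target, timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the values used in the text; pass your own as arguments.
        String proxyAddress = args.length > 0 ? args[0] : "localhost:1080";
        String targetAddress = args.length > 1 ? args[1] : "localhost:9083";
        boolean ok = canReach(socksProxy(proxyAddress), parseHostPort(targetAddress), 5000);
        System.out.println(targetAddress + " via SOCKS " + proxyAddress + ": "
                + (ok ? "reachable" : "not reachable"));
    }
}
```

With the ssh tunnel running, pass the metastore's host and port as the second argument to confirm the tunnel works before starting the server.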

Interacting with the Command Line Interface (CLI)

Start the CLI to connect to the running server and execute SQL queries:

presto-cli/target/presto-cli-*-executable.jar

Execute a diagnostic query to inspect the current cluster topology:

SELECT * FROM system.runtime.nodes;

Given the standard sample configuration, the Hive data source is registered under the catalog name hive. You can list tables within the default Hive database using:

SHOW TABLES FROM hive.default;
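
The same diagnostic query can also be issued programmatically. The sketch below uses plain JDBC and rests on several assumptions not stated in the text above: the Presto JDBC driver jar is on the classpath, the server listens on `localhost:8080`, and the class name `NodesQuery` is hypothetical. Note that the trailing semicolon is a CLI statement terminator and is omitted here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical example, not part of Presto: runs the node query over JDBC.
final class NodesQuery {
    // Builds a Presto JDBC URL of the form jdbc:presto://host:port/catalog/schema.
    static String jdbcUrl(String host, int port, String catalog, String schema) {
        return "jdbc:presto://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    public static void main(String[] args) {
        // Assumption: the development server listens on localhost:8080.
        String url = jdbcUrl("localhost", 8080, "hive", "default");
        Properties properties = new Properties();
        properties.setProperty("user", System.getProperty("user.name", "test"));

        try (Connection connection = DriverManager.getConnection(url, properties);
             Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery("SELECT * FROM system.runtime.nodes")) {
            ResultSetMetaData meta = rows.getMetaData();
            while (rows.next()) {
                // Print every column of the row as name=value pairs.
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    line.append(meta.getColumnName(i)).append('=').append(rows.getString(i)).append(' ');
                }
                System.out.println(line.toString().trim());
            }
        } catch (SQLException e) {
            // Also reached when no driver is on the classpath or the server is down.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```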

Development Style Guidelines

We strongly encourage the use of IntelliJ IDEA for development. The project's specific code style template, alongside general Java programming guidelines, is maintained in the codestyle repository. Adherence to the following specific rules is also mandatory:

  • Alphabetize sections in the documentation source files (both table-of-contents files and regular documentation files). More generally, alphabetize methods, variables, and sections wherever the surrounding code is already ordered that way.
  • Use the Java 8 Stream API where it improves readability. Note, however, that the stream implementation does not perform well, so avoid it in inner loops and other performance-sensitive code.
  • Ensure all thrown exceptions are appropriately categorized. For instance, utilize error codes when instantiating PrestoException, e.g., PrestoException(HIVE_TOO_MANY_OPEN_PARTITIONS). This taxonomy enables better monitoring and report generation for failure frequencies.
  • Validate that every source file includes the necessary license header. This can be generated automatically using the command: mvn license:format.
  • Prefer parameterized string formatting (printf-style using Java's Formatter class) for complex string construction, such as: format("Session property %s is invalid: %s", name, value) (Note: the format() method must always be statically imported). For simple concatenation, the standard + operator is acceptable.
  • Refrain from employing the ternary operator (?:) unless the expression is exceptionally simple and self-explanatory.
  • When writing assertions, utilize a corresponding method from Airlift's Assertions class if available, rather than implementing custom checks. Future iterations may transition towards more expressive assertion libraries like AssertJ.
  • Git commit messages must adhere strictly to the official guidelines.
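
Several of the rules above can be illustrated in one small, self-contained sketch. Everything in it is hypothetical: `CodedException` and `ExampleErrorCode` are stand-ins so the example compiles without Presto on the classpath (in Presto itself you would use PrestoException with a real error code), and the method names are invented for illustration.

```java
import static java.lang.String.format;

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of the style rules above; none of these types exist in Presto.
final class StyleExamples {
    // Stand-in for an error-code enum.
    enum ExampleErrorCode { TOO_MANY_OPEN_PARTITIONS }

    // Stand-in for an exception that carries a category, in the spirit of PrestoException.
    static final class CodedException extends RuntimeException {
        private final ExampleErrorCode code;

        CodedException(ExampleErrorCode code, String message) {
            super(message);
            this.code = code;
        }

        ExampleErrorCode getCode() {
            return code;
        }
    }

    // Printf-style formatting with a statically imported format().
    static String invalidProperty(String name, String value) {
        return format("Session property %s is invalid: %s", name, value);
    }

    // A stream reads well for a one-off transformation outside any hot path.
    static List<String> trimmedNonEmpty(List<String> names) {
        return names.stream()
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList());
    }

    // A plain loop is preferred in performance-sensitive code.
    static long sum(long[] values) {
        long total = 0;
        for (long value : values) {
            total += value;
        }
        return total;
    }

    // A ternary is acceptable only when it is this trivial.
    static String rowLabel(int count) {
        return count == 1 ? "row" : "rows";
    }

    // Failures are categorized with an error code rather than thrown bare.
    static void checkOpenPartitions(int open, int limit) {
        if (open > limit) {
            throw new CodedException(
                    ExampleErrorCode.TOO_MANY_OPEN_PARTITIONS,
                    format("Open partitions %s exceed limit %s", open, limit));
        }
    }

    public static void main(String[] args) {
        System.out.println(invalidProperty("example_property", "-1"));
        System.out.println(trimmedNonEmpty(Arrays.asList(" a ", "", "b")));
        System.out.println(sum(new long[] {1, 2, 3}) + " " + rowLabel(3));
        try {
            checkOpenPartitions(101, 100);
        } catch (CodedException e) {
            System.out.println(e.getCode() + ": " + e.getMessage());
        }
    }
}
```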

Generating Documentation Artifacts

Refer to the docs README for instructions on building the project documentation.

Compiling the Web Interface

The Presto Web UI is built from React components written in JSX and ES6. This source code is compiled into browser-compatible JavaScript, and the compiled output is checked into the Presto source tree (in the dist folder). Building the UI requires both Node.js and Yarn to be installed.

To update the dist folder after changing the UI sources, run:

yarn --cwd presto-main/src/main/resources/webapp/src install

If no JavaScript dependencies have changed (i.e., package.json is untouched), it is faster to run:

yarn --cwd presto-main/src/main/resources/webapp/src run package

For rapid development cycles, utilize the watch mode, which triggers automatic recompilation upon detecting source file modifications:

yarn --cwd presto-main/src/main/resources/webapp/src run watch

To iterate quickly, rebuild the project in IntelliJ after packaging completes. Project resources are typically hot-reloaded, so changes appear on the next browser refresh.

Documentation for Releases

When submitting a Pull Request, the description must include the relevant details for the release notes. Ensure compliance with the Release Notes Guidelines during preparation.
