Apache Presto - Architecture
  • Date: 2024-12-27


The architecture of Presto closely resembles the classic MPP (massively parallel processing) DBMS architecture. The following diagram illustrates the architecture of Presto.

[Figure: Presto Architecture]

The diagram above consists of several components. The following table describes each component in detail.

S.No  Component & Description

1.  Client
    The client (Presto CLI) submits SQL statements to a coordinator and receives the result.

2.  Coordinator
    The coordinator is a master daemon. It first parses the SQL query, then analyzes it and plans its execution. Its scheduler performs pipeline execution, assigns work to the nodes closest to the data, and monitors progress.

3.  Connector
    Storage plugins are called connectors. Connectors exist for Hive, HBase, MySQL, Cassandra, and many other systems; you can also implement a custom one. A connector provides metadata and data for queries. The coordinator uses the connector to get the metadata it needs to build a query plan.

4.  Worker
    The coordinator assigns tasks to worker nodes. The workers fetch the actual data through the connector. Finally, the worker node delivers the result to the client.
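The split between metadata (used for planning) and data (used for execution) can be sketched in a few lines of Python. This is only an illustrative toy model, not Presto's actual connector SPI, which is written in Java; the names `Connector`, `InMemoryConnector`, `plan_projection`, and `run_projection` are hypothetical and chosen for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple


class Connector(ABC):
    """Hypothetical storage plugin: exposes metadata and data."""

    @abstractmethod
    def get_columns(self, table: str) -> List[str]:
        """Metadata, used by the coordinator while planning."""

    @abstractmethod
    def read_rows(self, table: str) -> Iterator[Row]:
        """Data, read by workers while executing."""


class InMemoryConnector(Connector):
    """Toy connector backed by a Python dictionary (an assumption for the demo)."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[Row]]]):
        self._tables = tables

    def get_columns(self, table: str) -> List[str]:
        return list(self._tables[table][0])

    def read_rows(self, table: str) -> Iterator[Row]:
        return iter(self._tables[table][1])


def plan_projection(connector: Connector, table: str, wanted: List[str]) -> List[int]:
    """Coordinator side: resolve column names to positions using metadata only."""
    columns = connector.get_columns(table)
    return [columns.index(name) for name in wanted]


def run_projection(connector: Connector, table: str, positions: List[int]) -> List[Row]:
    """Worker side: read the actual rows and keep only the planned columns."""
    return [tuple(row[i] for i in positions) for row in connector.read_rows(table)]


connector = InMemoryConnector(
    {"users": (["id", "name", "city"], [(1, "Ann", "Oslo"), (2, "Bob", "Rome")])}
)
plan = plan_projection(connector, "users", ["name", "city"])
result = run_projection(connector, "users", plan)
print(plan, result)
```

The planning step never touches rows and the execution step never looks up column names, which mirrors the division of labour between coordinator and worker described in the table.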

Presto − Workflow

Presto is a distributed system that runs on a cluster of nodes. Presto’s distributed query engine is optimized for interactive analysis and supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. The Presto architecture is simple and extensible. The Presto client (CLI) submits SQL statements to the coordinator, which is a master daemon.
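As a rough illustration of what "submitting a SQL statement to the coordinator" can look like on the wire, the sketch below builds (but does not send) an HTTP request. It assumes the coordinator's REST endpoint `/v1/statement` and the `X-Presto-User` header from the Presto client protocol; the host name, user name, and the helper `build_statement_request` are hypothetical, so check the protocol documentation for your Presto version before relying on it.

```python
from urllib.request import Request


def build_statement_request(coordinator_url: str, sql: str, user: str) -> Request:
    """Build a POST request carrying the SQL text as the request body.

    Assumption: the coordinator accepts statements at /v1/statement and
    identifies the caller through the X-Presto-User header.
    """
    return Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


# Hypothetical coordinator address and user; nothing is sent over the network.
request = build_statement_request(
    "http://coordinator.example.com:8080", "SELECT 1", "alice"
)
print(request.get_method(), request.full_url)
```

In practice the Presto CLI or a client library handles this exchange, including polling for further result pages, so hand-built requests are mainly useful for understanding the flow.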

The scheduler wires together the execution pipeline, assigns work to the nodes closest to the data, and monitors progress. The coordinator assigns tasks to multiple worker nodes, and finally the worker node delivers the result back to the client. The client pulls data from the output process. Extensibility is the key design goal. Pluggable connectors such as Hive, HBase, and MySQL provide metadata and data for queries. Presto was designed with a “simple storage abstraction” that makes it easy to provide SQL query capability against these different kinds of data sources.
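The idea of assigning work to the node closest to the data can be sketched as a small locality-aware assignment function. This is a simplified illustration under stated assumptions, not Presto's actual scheduler: the function `assign_splits`, the notion of a "split" as a named chunk of input, and the least-loaded tie-breaking rule are all hypothetical choices made for the example.

```python
from typing import Dict, List


def assign_splits(
    split_hosts: Dict[str, List[str]], workers: List[str]
) -> Dict[str, List[str]]:
    """Assign each split to a worker, preferring workers that hold the data.

    split_hosts maps a split name to the hosts storing that split.
    If no worker is local to a split, any worker may take it.
    """
    assignments: Dict[str, List[str]] = {worker: [] for worker in workers}
    for split, hosts in split_hosts.items():
        local = [worker for worker in workers if worker in hosts]
        candidates = local or workers  # fall back to all workers
        # Pick the least-loaded candidate; ties go to the earliest in the list.
        target = min(candidates, key=lambda worker: len(assignments[worker]))
        assignments[target].append(split)
    return assignments


workers = ["w1", "w2", "w3"]
split_hosts = {
    "s1": ["w1", "w2"],  # stored on two workers
    "s2": ["w1"],
    "s3": ["w9"],        # stored on a host that is not a worker
    "s4": ["w2"],
}
assignments = assign_splits(split_hosts, workers)
print(assignments)
```

Strictly preferring local workers can leave some nodes idle, as `w3` is here; a real scheduler has to balance locality against load, which this sketch does not attempt.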

Execution Model

Presto uses a custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is in memory and pipelined across the network between stages. This avoids unnecessary I/O and the associated latency overhead.
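Pipelined, in-memory processing can be illustrated with Python generators, where each operator pulls rows from the one before it and nothing is written to disk in between. This is a single-process analogy only: Presto pipelines data between stages across the network, and the operator names `scan`, `filter_op`, and `project` as well as the `trace` list are hypothetical, introduced just to make the interleaving visible.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]
trace: List[str] = []  # records the order in which operators touch rows


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source operator: yields rows one at a time."""
    for row in rows:
        trace.append(f"scan {row['id']}")
        yield row


def filter_op(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Passes on only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keeps only the requested columns of each row."""
    for row in rows:
        trace.append(f"emit {row['id']}")
        yield {column: row[column] for column in columns}


orders = [
    {"id": 1, "total": 250, "items": 3},
    {"id": 2, "total": 40, "items": 1},
    {"id": 3, "total": 120, "items": 2},
]

# Roughly: SELECT id, total FROM orders WHERE total > 100
pipeline = project(filter_op(scan(orders), lambda r: r["total"] > 100), ["id", "total"])
result = list(pipeline)
print(result)
print(trace)
```

The trace shows that row 1 is emitted before row 2 is even scanned, so no operator waits for the previous one to finish its whole input.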
