Apache Presto - Overview

Data analytics is the process of analyzing raw data to extract information that supports better decision making, and many organizations rely on it for business decisions. Big data analytics applies the same process to very large volumes of data, which makes it considerably more complex, so companies adopt different strategies to handle it.

For example, Facebook is one of the leading data-driven companies and runs one of the largest data warehouses in the world. Its warehouse data is stored in Hadoop for large-scale computation. When the warehouse grew to petabytes, Facebook decided to develop a new system with low latency. In 2012, Facebook team members designed “Presto” for interactive query analytics that would operate quickly even with petabytes of data.

What is Apache Presto?

Apache Presto is a distributed parallel query execution engine, optimized for low latency and interactive query analysis. Presto runs queries easily and scales without downtime, from gigabytes to petabytes of data.

A single Presto query can process data from multiple sources such as HDFS, MySQL, Cassandra, Hive, and many more. Presto is built in Java and is easy to integrate with other data infrastructure components. Presto is powerful, and leading companies like Airbnb, Dropbox, Groupon, and Netflix are adopting it.
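To make the multi-source idea concrete, here is a minimal Python sketch that assembles one SQL statement joining a table in Hive with a table in MySQL. Presto addresses tables as catalog.schema.table, so a single query can name tables from different sources. The catalog names (hive, mysql), the schemas, the tables, and the columns are all hypothetical; real names depend on how the catalogs are configured in a given deployment. The sketch only builds the query text and does not connect to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that reads from two different data sources."""
    # Hypothetical tables: order rows live in Hive, customer rows in MySQL.
    orders = qualified("hive", "sales", "orders")
    customers = qualified("mysql", "crm", "customers")
    return (
        "SELECT c.name, SUM(o.amount) AS total_spent\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name\n"
        "ORDER BY total_spent DESC\n"
        "LIMIT 10"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The resulting text could be submitted through any Presto client; the engine, not the analyst, is responsible for fetching rows from each source and joining them.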

Presto − Features

Presto contains the following features −

Simple and extensible architecture.

Pluggable connectors - Presto supports pluggable connectors that provide metadata and data for queries.

Pipelined executions - Avoids unnecessary I/O latency overhead.

User-defined functions - Analysts can create custom user-defined functions, which makes it easier to migrate existing workloads.

Vectorized columnar processing.
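The pipelined execution feature listed above can be pictured with a small Python sketch. It is a conceptual illustration only, not Presto's actual implementation: each stage is a generator that hands rows to the next stage as soon as they are produced, so no stage has to write its full intermediate result anywhere before the next one starts. The stage names (scan, filter_rows, project) and the sample rows are assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source stage: yield rows one at a time instead of loading them all."""
    for row in rows:
        yield row


def filter_rows(rows: Iterator[Row], min_amount: float) -> Iterator[Row]:
    """Filter stage: pass along only rows whose amount meets the threshold."""
    for row in rows:
        if row["amount"] >= min_amount:
            yield row


def project(rows: Iterator[Row], column: str) -> Iterator[Any]:
    """Projection stage: keep a single column from each surviving row."""
    for row in rows:
        yield row[column]


def run_pipeline(data: Iterable[Row], min_amount: float) -> List[Any]:
    """Chain the stages; rows stream through without intermediate tables."""
    return list(project(filter_rows(scan(data), min_amount), "customer"))


SAMPLE: List[Row] = [
    {"customer": "a", "amount": 5.0},
    {"customer": "b", "amount": 20.0},
    {"customer": "c", "amount": 12.5},
]

if __name__ == "__main__":
    print(run_pipeline(SAMPLE, 10))
```

Because every stage pulls from the one before it, the first matching row can reach the consumer before the scan has finished, which is the property that avoids unnecessary I/O between stages.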

Presto − Benefits

Here is a list of benefits that Apache Presto offers −

Specialized SQL operations

Easy to install and debug

Simple storage abstraction

Scales quickly to petabytes of data with low latency

Presto − Applications

Presto powers many of today’s leading industrial applications. Let’s take a look at some of the notable ones.

Facebook − Facebook built Presto for its data analytics needs. Presto scales easily to handle large volumes of fast-moving data.

Teradata − Teradata provides end-to-end solutions in Big Data analytics and data warehousing. Teradata's contributions to Presto make it easier for more companies to meet all their analytical needs.

Airbnb − Presto is an integral part of the Airbnb data infrastructure. Hundreds of employees run queries with it each day.

Why Presto?

Presto supports standard ANSI SQL, which makes it very easy for data analysts and developers to use. Though it is built in Java, it avoids typical issues of Java code related to memory allocation and garbage collection. Presto has a connector architecture that is Hadoop friendly and makes it easy to plug in file systems.
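The connector idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Presto's real connector interface, which is written in Java and is considerably richer. The class names (Connector, InMemoryConnector, Engine) and their methods are hypothetical. The point is only that the engine talks to every data source through the same small contract, so adding a source means registering one more connector.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


class Connector(ABC):
    """Hypothetical contract: a connector supplies metadata and data."""

    @abstractmethod
    def list_tables(self) -> List[str]:
        """Metadata: names of the tables this source exposes."""

    @abstractmethod
    def read_rows(self, table: str) -> Iterator[Row]:
        """Data: rows of the named table."""


class InMemoryConnector(Connector):
    """Toy source backed by a dict; stands in for a file system or database."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def list_tables(self) -> List[str]:
        return sorted(self._tables)

    def read_rows(self, table: str) -> Iterator[Row]:
        return iter(self._tables[table])


class Engine:
    """Toy engine that resolves catalog.table names through connectors."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, Connector] = {}

    def register(self, catalog: str, connector: Connector) -> None:
        # Plugging in a new source does not require changing the engine.
        self._catalogs[catalog] = connector

    def list_tables(self, catalog: str) -> List[str]:
        return self._catalogs[catalog].list_tables()

    def scan(self, qualified_name: str) -> List[Row]:
        catalog, table = qualified_name.split(".", 1)
        return list(self._catalogs[catalog].read_rows(table))


if __name__ == "__main__":
    engine = Engine()
    engine.register("memory", InMemoryConnector({"users": [{"id": 1}]}))
    print(engine.scan("memory.users"))
```

Keeping the contract this narrow is a design choice made for the sketch: the engine never needs to know how a source stores its data, only how to ask for table names and rows.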