Spark SQL - Data Sources
The DataFrame interface allows Spark SQL to work with a variety of data sources. A DataFrame can be operated on like a normal RDD, and it can also be registered as a temporary table. Registering a DataFrame as a table allows you to run SQL queries over its data.
In this chapter, we will describe the general methods for loading and saving data using different Spark DataSources. Thereafter, we will discuss in detail the specific options that are available for the built-in data sources.
There are different types of data sources available in Spark SQL, some of which are listed below −
Sr. No | Data Source | Description |
---|---|---|
1 | JSON Datasets | Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. |
2 | Hive Tables | Hive comes bundled with the Spark library as HiveContext, which inherits from SQLContext. |
3 | Parquet Files | Parquet is a columnar format, supported by many data processing systems. |