Teradata Basics
- Teradata - SubQueries
- Teradata - Joins
- Teradata - Primary Index
- Teradata - CASE & COALESCE
- Teradata - Aggregate Functions
- Teradata - Built-in Functions
- Teradata - Date/Time Functions
- Teradata - String Manipulation
- Teradata - SET Operators
- Logical & Conditional Operators
- Teradata - SELECT Statement
- Teradata - Data Manipulation
- Teradata - Tables
- Teradata - Data Types
- Teradata - Relational Concepts
- Teradata - Architecture
- Teradata - Installation
- Teradata - Introduction
Teradata Advanced
- Teradata - BTEQ
- Teradata - FastExport
- Teradata - MultiLoad
- Teradata - FastLoad
- Teradata - Performance Tuning
- Teradata - User Management
- Teradata - Data Protection
- Teradata - OLAP Functions
- Teradata - Partitioned Primary Index
- Teradata - JOIN Strategies
- Teradata - Stored Procedure
- Teradata - Macros
- Teradata - Views
- Teradata - Join Index
- Teradata - Hashing Algorithm
- Teradata - Explain
- Teradata - Compression
- Teradata - Statistics
- Teradata - Secondary Index
- Teradata - Space Concepts
- Teradata - Table Types
Teradata Useful Resources
- Teradata - Discussion
- Teradata - Useful Resources
- Teradata - Quick Guide
- Teradata - Questions & Answers
Selected Reading
- Who is Who
- Computer Glossary
- HR Interview Questions
- Effective Resume Writing
- Questions and Answers
- UPSC IAS Exams Notes
Teradata - Partitioned Primary Index
Partitioned Primary Index (PPI) is an indexing mechanism that is useful in improving the performance of certain queries. When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. When a table is defined with PPI, the rows are sorted by their partition number. Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined.
Advantages
Avoid full table scan for certain queries.
Avoid using secondary index that requires additional physical structure and additional I/O maintenance.
Access a subset of a large table quickly.
Drop the old data quickly and add new data.
Example
Consider the following Orders table with Primary Index on OrderNo.
StoreNo | OrderNo | OrderDate | OrderTotal |
---|---|---|---|
101 | 7501 | 2015-10-01 | 900 |
101 | 7502 | 2015-10-02 | 1,200 |
102 | 7503 | 2015-10-02 | 3,000 |
102 | 7504 | 2015-10-03 | 2,454 |
101 | 7505 | 2015-10-03 | 1201 |
103 | 7506 | 2015-10-04 | 2,454 |
101 | 7507 | 2015-10-05 | 1201 |
101 | 7508 | 2015-10-05 | 1201 |
Assume that the records are distributed between AMPs as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.
RowHash | OrderNo | OrderDate |
---|---|---|
1 | 7505 | 2015-10-03 |
2 | 7504 | 2015-10-03 |
3 | 7501 | 2015-10-01 |
4 | 7508 | 2015-10-05 |
RowHash | OrderNo | OrderDate |
---|---|---|
1 | 7507 | 2015-10-05 |
2 | 7502 | 2015-10-02 |
3 | 7506 | 2015-10-04 |
4 | 7503 | 2015-10-02 |
If you run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. To avoid this, you can define the order date as Partitioned Primary Index. When rows are inserted into orders table, they are partitioned by the order date. Within each partition they will be ordered by their row hash.
The following data shows how the records will be stored in AMPs, if they are partitioned by Order Date. If a query is run to access the records by Order Date, then only the partition that contains the records for that particular order will be accessed.
Partition | RowHash | OrderNo | OrderDate |
---|---|---|---|
0 | 3 | 7501 | 2015-10-01 |
1 | 1 | 7505 | 2015-10-03 |
1 | 2 | 7504 | 2015-10-03 |
2 | 4 | 7508 | 2015-10-05 |
Partition | RowHash | OrderNo | OrderDate |
---|---|---|---|
0 | 2 | 7502 | 2015-10-02 |
0 | 4 | 7503 | 2015-10-02 |
1 | 3 | 7506 | 2015-10-04 |
2 | 1 | 7507 | 2015-10-05 |
Following is an example to create a table with partition primary Index. PARTITION BY clause is used to define the partition.
CREATE SET TABLE Orders ( StoreNo SMALLINT, OrderNo INTEGER, OrderDate DATE FORMAT YYYY-MM-DD , OrderTotal INTEGER ) PRIMARY INDEX(OrderNo) PARTITION BY RANGE_N ( OrderDate BETWEEN DATE 2010-01-01 AND 2016-12-31 EACH INTERVAL 1 DAY );
In the above example, the table is partitioned by OrderDate column. There will be one separate partition for each day.
Advertisements