Database Systems: The Complete Book (2nd Edition)

Category: Programming
Author: Jennifer Widom
This Month Stack Overflow 1


by anonymous   2019-01-13

There is some lack of precision in the definitions of primary & secondary indices.

Using two popular university texts as reference:

Fundamentals of Database Systems, Elmasri & Navathe defines them as:

A primary index as an index on an ordered file where the search key is the same as the sort key

A secondary index provides a secondary means of accessing a data file for which some primary access already exists. The data file records could be ordered, unordered, or hashed.

Database Systems: The Complete Book, Garcia-Molina et. al defines them as:

A primary index determines the location of the records of the data file

The secondary index is distinguished from the primary index in that a secondary index does not determine the placement of records in the data file. Rather, the secondary index tells us the current locations of records; that location may have been decided by a primary index on some other field

Some properties that hold true for either definition above:

  • primary keys can be primary indices
  • there can be at most 1 primary index per table
  • primary indices uniquely determine where a record is kept in physical storage.
  • All other indices are classified as secondary.

However, if the placement of records in the data file is not determined by any field, then a primary index cannot be constructed.

Thus for sorted files, it makes sense to talk about the primary index (which would be the list of fields upon which the sorting is based). I can't find other examples of physical file structures where a primary index can be constructed.

Postgresql utilizes a heap structure for the physical storage for records. Heaps are not sorted (pun alert: they're sorta sorted). Therefore, even the primary keys are implemented using secondary indices, and as such all indices in Postgresql are secondary.

Other RDBMS Systems do implement storage formats that support primary indices:

  • Mysql's InnoDB calls this clustered index
  • MSSQL also refer to the primay index as a clustered index
  • Oracle calls this index-organized tables

The language in the Postgres Documentation is imprecise.

All indexes in PostgreSQL are secondary indexes

This is true.

meaning that each index is stored separately from the table's main data area

This is not why all indices are secondary in Postgresql. Primary indices may also be stored separately from the table's main data area.

by saketuec   2018-11-12