Relational Database Primary Keys

Microsoft Access Articles

GUI Design

Resources

Online Shop

Info

Primary Keys

The essential job of a database table's primary key is to uniquely identify the rows in the table -- nothing more, nothing less. Each primary key value must be unique within a table so the database engine can tell the difference between rows. The same primary key value may appear in another table, but you can't duplicate it within a table. And, the primary key can't be null because the database engine requires a value to locate the record.

The second major job of the primary key is to provide a "hook" for creating table relationships. A primary key field is related to a corresponding field, the foreign key, in another table. The primary key and foreign key values are the same in records joined through a relationship.

A database table's primary key is always indexed. There is no exception to this rule. You must index the primary key so the database engine can quickly locate rows based on the key's value. The advice that indexing a primary key is optional is completely wrong.

A Replication ID makes a terrible choice as a primary key. Access provides the Replication ID data type to uniquely identify records in a replicated database. A Replication ID (which is actually a GUID, or globally unique ID) is guaranteed to be unique in the universe. You generate a GUID from a number of different bits of information, including (among others) the computer's name, the date and time, and the free byte count on the C: drive.

GUIDs are hard to read. Can you quickly tell whether {8EC7B8E3-3B51-4C0C-8481-7AEEF9074EBE} and {8EC7B8E3-3B51-4C0C-8481-7AEEF9074FBE} are the same value? (They aren't.) But, you can quickly and easily tell that 7817 and 7819 are different. GUIDs are hard to work with as primary keys because they're difficult for humans to compare.

Also, a Replication ID value occupies 16 bytes of memory and disk space, while a long integer is only 4 bytes long. Generally, smaller is better because the computer can handle 4 bytes a lot faster than it can handle a 16 byte Replication ID. Modern Pentium-class computers running 32-bit Windows can fetch an entire long integer value in a single process cycle, whereas at least four cycles are required to retrieve a Replication ID value. Machine cycles add up, and slow down an application's performance.

The comment that a Replication ID makes the ideal primary key choice because "the field will be automatically set to be used as an ID" isn't accurate. I don't see any advantage to using a Replication ID data type over the alternative long integer data type.

Microsoft provides the Auto-Number data type in Access databases (and the Identity attribute in SQL Server) so you have a convenient and efficient primary key for your tables.

I know some people say you should always use natural keys (Social Security Number, Employee ID, etc.) as primary keys, but I always use AutoNumbers or Identity fields as my primary keys. Tables are composed of design and data elements. The data is self-evident. The design includes the field names, indexes, constraints (such as validation rules), and key assignments. In my opinion, the choice of the primary key is a design issue, and has nothing to do with data. There is never a need for users to see the primary key. It's there to uniquely identify records and provide an anchor for table relationships.

Long integer AutoNumbers make the ideal primary key because they're automatic, can't be changed, and are guaranteed to be unique within a table.