databasedev.co.uk - database solutions and downloads for microsoft access

Introducing Higher Normal Forms

Are We There Yet?

Designing your tables to comply with 3NF (third normal form) is usually sufficient to ensure good design so, most of the time, you can stop right here. The higher normal forms address less common data problems. They are included here so you'll know what they are and what to do about them if you come across them. These normal forms do get a bit more complicated.

The truth is there are times when you will want to denormalize your data. That means you may sometimes want to put the data in two normalized tables back into one denormalized table. The reasons for doing this are usually associated with performance, for example, the speed at which queries run. But at least it will be a conscious decision and represents one of the beauties of normalization. Properly normalized tables can always be put back together with no loss or gain of data.

Higher Normal Forms - Boyce-Codd Normal Form:

A table is in third normal form (3NF) and all determinants are candidate keys.

Boyce-Codd normal form (BCNF) can be thought of as a "new" third normal form. It was introduced to cover situations that the "old" third normal form did not address. Keep in mind the mean of a determinant (determines the value in another field) and candidate keys (qualify for designation as primary key). This normal form applies to situations where you have overlapping candidate keys.

If a table has no non-key fields, it is automatically in BCNF.

A Practical Approach

Look for potential problems in updating existing data (modification anomaly) and in entering new data (insertion anomaly).

Imagine that you were designing a table for a college to hold information about courses, students, and teaching assistants. You have the following business rules.

  • Each course can have many students.
  • Each student can take many courses.
  • Each course can have multiple teaching assistants (TAs).
  • Each TA is associated with only one course.
  • For each course, each student has one TA.

Some sample data:

COURSES_STUDENTS_TA's
CourseNum Student TA
ENG101 Jones Clark
ENG101 Grayson Chen
ENG101 Samara Chen
MAT350 Grayson Powers
MAT350 Jones O'Shea
MAT350 Berg Powers

To uniquely identify each record, you could choose CourseNum + Student as a primary key. This would satisfy third normal form also because the combination of CourseNum and Student determines the value in TA. Another candidate key would be Student + TA. In this case, you have overlapping candidate keys (Student is in both). The second choice, however, would not comply with third normal form because the CourseNum is not determined by the combination of Student and TA; it only depends on the value in TA (see the business rules). This is the situation that Boyce-Codd normal form addresses; the combination of Student + TA could not be considered to be a candidate key.

If you wanted to assign a TA to a course before any students enrolled, you couldn't because Student is part of the primary key. Also, if the name of a TA changed, you would have to update it in multiple records.

If you assume you have just these fields, this data would be better stored in three tables: one with CourseNum and Student, another with Student and TA, and a third with CourseNum and TA.

COURSES
*CourseNum *Student
ENG101 Jones
ENG101 Grayson
ENG101 Samara
MAT350 Grayson
MAT350 Jones
MAT350 Berg
STUDENTS
*Student *TA
Jones Clark
Grayson Chen
Samara Chen
Grayson Powers
Jones O'Shea
Berg Powers
TA's
*CourseNum *TA
ENG101 Clark
ENG101 Chen
MAT350 O'Shea
MAT350 Powers

Above, showing tables that comply with BCNF

Fourth Normal Form:

A table is in Boyce-Codd normal form (BCNF) and there are no multi-valued dependencies.

A multi-valued dependency occurs when, for each value in field A, there is a set of values for field B and a set of values for field C but fields B and C are not related.

A Practical Approach

Look for repeated or null values in non-key fields.

A multi-valued dependency occurs when the table contains fields that are not logically related. An often used example is the following table:

MOVIES
*Movie *Star *Producer
Once Upon a Time Julie Garland Alfred Brown
Once Upon a Time Mickey Rooney Alfred Brown
Once Upon a Time Julie Garland Muriel Humphreys
Once Upon a Time Mickey Rooney Muriel Humphreys
Moonlight Humphrey Bogart Alfred Brown
Moonlight Julie Garland Alfred Brown

A movie can have more than one star and more than one producer. A star can be in more than one movie. A producer can produce more than one movie. The primary key would have to include all three fields and so this table would be in BCNF. But you have unnecessarily repeated values, with the data maintenance problems that causes, and you would have trouble with deletion anomalies.

The Star and the Producer really aren't logically related. The Movie determines the Star and the Movie determines the Producer. The answer is to have a separate table for each of those logical relationships - one holding Movie and Star and the other with Movie and Producer, as shown below:

STARS
*Movie *Star
Once Upon a Time Julie Garland
Once Upon a Time Mickey Rooney
Moonlight Humphrey Bogart
Moonlight Julie Garland
PRODUCERS
*Movie *Producer
Once Upon a Time Alfred Brown
Once Upon a Time Muriel Humphreys
Moonlight Alfred Brown

Above, showing tables that comply with 4NF

Below is another example of a common design error, and it's easily spotted by all the missing or blank values.

PROJECTS_EQUIPMENT
DeptCode ProjectNum ProjectMgrID Equipment PropertyID
IS 36-272-TC EN1-15 CD-ROM 657
IS     VGA desktop monitor 305
AC 35-152-TC EN1-15    
AC     Dot-matrix printer 358
AC     Calculator with tape 239
TW 30-452-T3 EN1-10 486 PC 275
TW 30-457-T3 EN1-15    
TW 31-124-T3 EN1-15 Laser printer 109
TW 31-238-TC EN1-15 Hand-held scanner 479
RI     Fax machine 775
MK     Laser printer 858
MK     Answering machine 187
TW 31-241-TC EN1-15 Standard 19200 bps modem 386
SL     486 Laptop PC 772
SL     Electronic notebook 458

Above, a table with many null values (note: it also does not comply with 3NF and BCNF)

It's the same problem here because not all of the data is logically related. As usual, the answer is more tables - one to hold the information on the equipment assigned to departments (with PropertyID as the primary key) and another with projects and departments. You'd have to know the business rules to know whether a project might involve more than one department or manager and be able to figure out the primary key. Assuming a project can have only one manager and be associated with only one department, the tables would be as follows.

EQUIPMENT
*PropertyID Equipment DeptCode
657 CD-ROM IS
305 VGA desktop monitor IS
358 Dot-matrix printer AC
239 Calculator with tape AC
275 486 PC TW
109 Laser printer TW
479 Hand-held scanner TW
775 Fax machine RI
858 Laser printer MK
187 Answering machine MK
386 Standard 19200 bps modem TW
772 486 Laptop PC SL
458 Electronic notebook SL
PROJECTS_EQUIPMENT
*ProjectNum ProjectMgrID DeptCode
36-272-TC EN1-15 IS
35-152-TC EN1-15 AC
30-452-T3 EN1-10 TW
30-457-T3 EN1-15 TW
31-124-T3 EN1-15 TW
31-238-TC EN1-15 TW
31-241-TC EN1-15 TW

Above, tables that eliminate the null values and comply with 4NF

Fifth Normal Form:

A table is in fourth normal form (4NF) and there are no cyclic dependencies.

A cyclic dependency can occur only when you have a multi-field primary key consisting of three or more fields. For example, let's say your primary key consists of fields A, B, and C. A cyclic dependency would arise if the values in those fields were related in pairs of A and B, B and C, and A and C.

Fifth normal form is also called projection-join normal form. A projection is a new table holding a subset of fields from an original table. When properly formed projections are joined, they must result in the same set of data that was contained in the original table.

A Practical Approach

Look for the number of records that will have to be added or maintained

Following is some sample data about buyers, the products they buy, and the companies they buy from on behalf of MegaMall, a large department store.

BUYING
*Buyer *Product *Company
Chris jeans Levi
Chris jeans Wrangler
Chris shirts Levi
Lori jeans Levi

Above, a table with cyclic dependencies

The primary key consists of all three fields. One data maintenance problem that occurs is that you need to add a record for every buyer who buys a product for every company that makes that product or they can't buy from them. That may not appear to be a big deal in this sample of 2 buyers, 2 products, and 2 companies (2 X 2 X 2 = 8 total records). But what if you went to 20 buyers, 50 products, and 100 companies (20 X 50 X 100 = 100,000 potential records)? It quickly gets out of hand and becomes impossible to maintain.

You might be tempted to solve this by dividing this into the following two tables.

BUYERS
*Buyer *Product
Chris jeans
Chris shirts
Lori jeans
PRODUCTS
*Product *Company
jeans Wrangler
jeans Levi
shirts Levi

However, if you joined the two tables above on the Product field, it would produce a record not part of the original data set (it would say that Lori buys jeans from Wrangler). This is where the projection-join concept comes in.

The correct solution would be three tables:

BUYERS
*Buyer *Product
Chris jeans
Chris shirts
Lori jeans
PRODUCTS
*Product *Company
jeans Wrangler
jeans Levi
shirts Levi
COMPANIES
*Buyer *Company
Chris Levi
Chris Wrangler
Lori Levi

Above, tables that comply with 5NF

When the first two tables are joined by Product and the result joined to the third table by Buyer and Company, the result is the original set of data.

In our scenario of 20 buyers, 50 products, and 100 companies, you would have, at most, 1000 records in the Buyers table (20 X 50), 5000 records in the Products table (50 X 100), and 2000 records in the Companies table (20 X 100). With a maximum of 8000 records, these tables would be much easier to maintain than the possible 100,000 records we saw earlier.