databasedev.co.uk - database solutions and downloads for microsoft access

Relational Database Design

A simple Database Normalization Primer

A poorly designed database can cause many problems to your application. Redundancy, inaccuracy, consistency problems are just some of the things that you might experience due to bad database design.

Normalisation is the process used to reduce, if not eliminate, these problems with your data. This simple primer will give you an insight into normalising a model to 3NF (Third Normal Form)

1NF (First Normal Forms):

Requires that there be no multi-valued attributes, and that there be no repeating groups. A multi-valued attribute would contain more than one value for that field in each row.

Now take a look at the following example table:

StudentID Course
11111 Word, Excel, Access
22222 PowerPoint, Outlook, Project

In the above table, the Course field is a multi-valued attribute. There is not a single value for each field.

Now take a look at the revised table:

StudentID Course 1 Course 2 Course 3
11111 Word Excel Access
22222 PowerPoint Outlook Project

This displays Course 1, Course 2 and Course 3 fields as repeating groups.

The following table displays how to correctly store this data, and where 1NF is satisfied:

StudentID Course
11111 Word
11111 Excel
11111 Access
22222 PowerPoint
22222 Outlook
22222 Project

The first two tables would make selecting a student enrolled on a certain course quite difficult. If we wish to query the data and find out the students enrolled on the PowerPoint course we would need to do the following:

  1. Design 1: You would need to pull out all of the course data and parse it somehow.
  2. Design 2: You would need to check three different fields to find the PowerPoint course.
  3. Design 3: With the correct design, you can simply create a SELECT statement such as: SELECT StudentID FROM StudentCourses WHERE Course = "PowerPoint"

2NF (Second Normal Forms):

Requires that any non-key field be dependant on the entire key. Consider this with the following example of the StudentCourses table, where we have made the StudentID and the CourseID a compound Primary Key:

StudentID CourseID StudentName CourseLocation Grade
11111 Word John Workshop 1 A
11111 Excel John Workshop 2 B

The StudentName field does not depend on the CourseID field, only on the StudentID field. The CourseLocation field is also only dependant on the CourseID field.

The data needs to be split into the following table designs:

tblStudents -

StudentID StudentName
11111 John

tblCourses -

CourseID CourseLocation
Word Workshop 1
Excel Workshop 2

tblStudentCourses -

StudentID CourseID Grade
11111 Word A
11111 Excel B

In the above example, the Grade field is the only dependant needing the combination of the StudentID and CourseID fields.

If we look at the first table, tblStudents and imagine that the StudentName contained a typo error, for example “Jonh”. If we applied the following SQL statement:

DELETE FROM tblStudents WHERE StudentName = "John"

Because we have erroneous data (“Jonh”) this row will not be deleted. If we try the following in the final design, then we will see that every course that John was taking will be deleted by using the ID field:

DELETE FROM tblStudentCourses WHERE StudentID = 11111

3NF (Third Normal Forms):

Third Normal form prohibits transitive dependencies. A transitive dependency exists when any attribute in a table is dependant upon any other non-key attribute in that table.

Let’s take a look at the following example of the CoursesSections table:

CourseID Section TeacherID TeacherName
Word 1 TA12345 Simon
Excel 1 TA12345 Simon

The CourseID and the Section of the Course uniquely identify the teacher. However, TeacherName depends on TeacherID and has no relation to CourseID or Section.

This data should be correctly stored as follows:

tblTeachers -

TeacherID TeacherName
TA12345 Simon

tblCourseSections -

CourseID Section TeacherID
Word 1 TA12345
Excel 1 TA12345

Splitting this data into the two tables removes the transitive dependency.