A poorly designed database can cause many problems to your application. Redundancy, inaccuracy, consistency problems are just some of the things that you might experience due to bad database design.
Normalisation is the process used to reduce, if not eliminate, these problems with your data. This simple primer will give you an insight into normalising a model to 3NF (Third Normal Form)
Requires that there be no multi-valued attributes, and that there be no repeating groups. A multi-valued attribute would contain more than one value for that field in each row.
Now take a look at the following example table:
StudentID | Course |
---|---|
11111 | Word, Excel, Access |
22222 | PowerPoint, Outlook, Project |
In the above table, the Course field is a multi-valued attribute. There is not a single value for each field.
Now take a look at the revised table:
StudentID | Course 1 | Course 2 | Course 3 |
---|---|---|---|
11111 | Word | Excel | Access |
22222 | PowerPoint | Outlook | Project |
This displays Course 1, Course 2 and Course 3 fields as repeating groups.
The following table displays how to correctly store this data, and where 1NF is satisfied:
StudentID | Course |
---|---|
11111 | Word |
11111 | Excel |
11111 | Access |
22222 | PowerPoint |
22222 | Outlook |
22222 | Project |
The first two tables would make selecting a student enrolled on a certain course quite difficult. If we wish to query the data and find out the students enrolled on the PowerPoint course we would need to do the following:
Requires that any non-key field be dependant on the entire key. Consider this with the following example of the StudentCourses table, where we have made the StudentID and the CourseID a compound Primary Key:
StudentID | CourseID | StudentName | CourseLocation | Grade |
---|---|---|---|---|
11111 | Word | John | Workshop 1 | A |
11111 | Excel | John | Workshop 2 | B |
The StudentName field does not depend on the CourseID field, only on the StudentID field. The CourseLocation field is also only dependant on the CourseID field.
The data needs to be split into the following table designs:
tblStudents -
StudentID | StudentName |
---|---|
11111 | John |
tblCourses -
CourseID | CourseLocation |
---|---|
Word | Workshop 1 |
Excel | Workshop 2 |
tblStudentCourses -
StudentID | CourseID | Grade |
---|---|---|
11111 | Word | A |
11111 | Excel | B |
In the above example, the Grade field is the only dependant needing the combination of the StudentID and CourseID fields.
If we look at the first table, tblStudents and imagine that the StudentName contained a typo error, for example “Jonh”. If we applied the following SQL statement:
DELETE FROM tblStudents WHERE StudentName = "John"
Because we have erroneous data (“Jonh”) this row will not be deleted. If we try the following in the final design, then we will see that every course that John was taking will be deleted by using the ID field:
DELETE FROM tblStudentCourses WHERE StudentID = 11111
Third Normal form prohibits transitive dependencies. A transitive dependency exists when any attribute in a table is dependant upon any other non-key attribute in that table.
Let’s take a look at the following example of the CoursesSections table:
CourseID | Section | TeacherID | TeacherName |
---|---|---|---|
Word | 1 | TA12345 | Simon |
Excel | 1 | TA12345 | Simon |
The CourseID and the Section of the Course uniquely identify the teacher. However, TeacherName depends on TeacherID and has no relation to CourseID or Section.
This data should be correctly stored as follows:
tblTeachers -
TeacherID | TeacherName |
---|---|
TA12345 | Simon |
tblCourseSections -
CourseID | Section | TeacherID |
---|---|---|
Word | 1 | TA12345 |
Excel | 1 | TA12345 |
Splitting this data into the two tables removes the transitive dependency.