Here is a review of the 6 forms of normalisation (yes the 4th is called 3.5 for some reason) – there are lots of summaries on the web of these, but I wanted to re-write them in my own words to ensure I understand them fully. Revision of these post real world experience is particularly enlightening.
Normalisation rules are cumulative ie a database or set of tables can be said to be 1 through to 5 compliant. To rank at the highest level it must comply with all previous forms.
That is to say that a database (or table) to be in the 2nd Normal Form must first fulfill all the criteria of the 1st Normal Form.
A database is said to be properly normalised if the first three normalised forms are inviolate.
Violations of normalisation forms subsequent to the third normal form may be legitimately required and as such are NOT absolutely required for a database to be classified as properly normalised.
Some web explanations do not list forms post 3 I would argue there is still a lot of value in trying to adhere to these less well forms but must admit to not always including them in design usually due to poor knowledge of the application domain.
The differing forms of normalisation start simple becoming increasingly complicated.
If you have a lot of experience in creating relational backends you may have deduced these issues for yourself.
(1NF) First Normal Form – Edgar F Codd – first stated 1970
Eliminate duplicate columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
(2NF) Second Normal Form – Edgar F Codd – first stated 1971
Create relationships between these new tables and their predecessors through the use of foreign keys.
(3NF) Third Normal Form – Edgar F Codd – first stated 1971
Remove columns that are not dependent on the primary key.
My example take on this is that calculated columns should NOT be included in tables
So here imagine you have an invoices table with the following
So according to the first and second normalisation forms there are no duplicated forms and the FKID customer no shows that there are relationships. However a Total figure can be calculated by multiplying unit price by quantity and as such violates the 3rd Normal Form.
The new table would resemble the following
Total is not stored and would be calculated on the form. It is noted that sometimes you wish to store things like titles because for instance unit prices may vary over time and you don’t record want to loose the overall total by updating unit price – especially unit price is referenced from another table. A similar argument relates to things like VAT which can be changed on a yearly basis. In such a case I would argue that you still don’t break the 3NF as the value of column is not dependent on the primary key it is dependent on the value of the VAT rate at any given time.
The 3.5 Normal Form or the Boyce Codd Normal Form
This is a slightly stronger version of the third normal form developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by the 3rd Normal form as originally defined.
Only in rare cases does a 3rd Normal Form table not meet the requirements of Boyd Codd Normal Form. A 3rd Normal Form table which does not have multiple overlapping candidate keys is guaranteed to be in Boyce Codd Normal Form.
Web investigation indicates that the most concise description is;
Every determinant must be a candidate key
and that Boyd Codd Normalised Form is not always possible
Person Shop Type Nearest Shop
Mark Jewellers H Samuel
Mark Barbers Mr Man
Natalia Coffee Shop Starbucks
John Coffee Shop Costa
John Barbers Mr Man
John Jewellers H Samuel
The issue above is that Mr Man on record 2 could be changed to a Hairdressers for shop type but the second last record John would not change and as such it would seem to be inconsistent. In this case I would split this table into two with Nearest Shop and Shop Type being linked and there being some kind of table that links People to Nearest shops.
The 4th Normal Form
A table is in the 4th Normal Form if it has no multi-valued dependencies.
Definition: Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.
For example lets say you have a you have a shop selling coffee – lets call it Starbucks and you have a table that lists the permutations of cofee sold in its store
CoffeeShop Coffee Product Location
Starbucks Café Late Livingston
Starbucks Americano Livingston
Starbucks Café Late Edinburgh
Starbucks Americano Edinburgh
If we know that all Starbucks sell the same products if there is a new product in one shop then we would expect that new product to be in the other shops so adding a Skinny Late to the Livingston shop implies that there should be another record for Skinny Lates in the Edinburgh shop.
To eliminate the possibility of anomalies we must place the facts about the products and the location of the shops into two tables
Starbucks Café Late
Starbucks Skinny Late
Location of Coffee shops
Alternatively if the coffee types did vary from location to location the original table would satisfy 4NF
The 5th Normal Form
A table is in fifth normal form or Project-Join Normal Form if it is in the Fourth Normal Form and it cannot have a loss-less decomposition into any number of smaller tables.
Or as I like to call it the – I know about it and try to adhere to it but sometimes I’m a bit lazy form.
I certainly try to keep my designs to the full six normalisation principles but there are instances where certain tables do not comply.