Database Normalisation – a review of the rules to refresh my knowledge

Here is a review of the 6 forms of normalisation (yes the 4th is called 3.5 for some reason) – there are lots of summaries on the web of these, but I wanted to re-write them in my own words to ensure I understand them fully. Revision of these post real world experience is particularly enlightening.

Normalisation rules are cumulative ie a database or set of tables can be said to be 1 through to 5 compliant. To rank at the highest level it must comply with all previous forms.

That is to say that a database (or table) to be in the 2nd Normal Form must first fulfill all the criteria of the 1st Normal Form.

A database is said to be properly normalised if the first three normalised forms are inviolate.

Violations of normalisation forms subsequent to the third normal form may be legitimately required and as such are NOT absolutely required for a database to be classified as properly normalised.

Some web explanations do not list forms post 3 I would argue there is still a lot of value in trying to adhere to these less well forms but must admit to not always including them in design usually due to poor knowledge of the application domain.

The differing forms of normalisation start simple becoming increasingly complicated.
If you have a lot of experience in creating relational backends you may have deduced these issues for yourself.

(1NF) First Normal Form – Edgar F Codd – first stated 1970
Eliminate duplicate columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

(2NF) Second Normal Form – Edgar F Codd – first stated 1971
No partial dependency on anything but the key fields

(3NF) Third Normal Form – Edgar F Codd – first stated 1971
Every non prime attribute of the primary key is non transitively dependent on every key of R. The example I saw was that if you had winners of wimbledon some of which repeated you would not store there birth dates. You would have a junction table of the winners and link to the names of the individuals and it would be in this table that you would store the dates of birth.

The 3.5 Normal Form or the Boyce Codd Normal Form
This is a slightly stronger version of the third normal form developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by the 3rd Normal form as originally defined.

Only in rare cases does a 3rd Normal Form table not meet the requirements of Boyd Codd Normal Form. A 3rd Normal Form table which does not have multiple overlapping candidate keys is guaranteed to be in Boyce Codd Normal Form.

Web investigation indicates that the most concise description is;
Every determinant must be a candidate key

and that Boyd Codd Normalised Form is not always possible

Nearest Shops
Person Shop Type Nearest Shop
Mark Jewellers H Samuel
Mark Barbers Mr Man
Natalia Coffee Shop Starbucks
John Coffee Shop Costa
John Barbers Mr Man
John Jewellers H Samuel

The issue above is that Mr Man on record 2 could be changed to a Hairdressers for shop type but the second last record John would not change and as such it would seem to be inconsistent. In this case I would split this table into two with Nearest Shop and Shop Type being linked and there being some kind of table that links People to Nearest shops.

The 4th Normal Form
A table is in the 4th Normal Form if it has no multi-valued dependencies.
Definition: Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.

For example lets say you have a you have a shop selling coffee – lets call it Starbucks and you have a table that lists the permutations of cofee sold in its store
CoffeeShop Coffee Product Location
Starbucks Café Late Livingston
Starbucks Americano Livingston
Starbucks Café Late Edinburgh
Starbucks Americano Edinburgh

If we know that all Starbucks sell the same products if there is a new product in one shop then we would expect that new product to be in the other shops so adding a Skinny Late to the Livingston shop implies that there should be another record for Skinny Lates in the Edinburgh shop.

To eliminate the possibility of anomalies we must place the facts about the products and the location of the shops into two tables

Products
Starbucks Café Late
Starbucks Americano
Starbucks Skinny Late

Location of Coffee shops
Starbucks Livingston
Starbucks Edinburgh

Alternatively if the coffee types did vary from location to location the original table would satisfy 4NF

The 5th Normal Form
A table is in fifth normal form or Project-Join Normal Form if it is in the Fourth Normal Form and it cannot have a loss-less decomposition into any number of smaller tables.

Or as I like to call it the – I know about it and try to adhere to it but sometimes I’m a bit lazy form.

I certainly try to keep my designs to the full six normalisation principles but there are instances where certain tables do not comply.

Predictable URL design

Being-predictable-allows-others-to-control-youAs we move to more and more web based applications it is going to be more and more important for developers to really think about how users access their web applications. Individual pages or search options may be more important than the simple landing pages. Particularly if you are wanting to get customers and users to be continually coming back to your site. Often the home page is very much a preamble that is likely to be skipped as quickly as possible so maybe if you have a single purpose site you want to have the search right up at the front on that landing page – alternatively if you have multiple applications essentially combined as part of a domain each search page needs to have a lot of care and attention to allow the user to get to his required information as quickly as possible.

In many cases it is particularly useful to have URLs to which can be passed parameters – I have already talked about my use of Google Maps in this respect but the list of sites that have predictable URLs that can be manipulated objectively with outside non-key referenced attributes reads as a list of the A to Z of some of the best companies on the web.

Examples as follows.

AMAZON
Linlithgow can be swapped out for anything and you can go straight to a list of the books you are interested in. Here’s an example of three different search strings Linlithgow, Quantum Mechanics and Programming.
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Ddigital-text&field-keywords=Linlithgow

http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Ddigital-text&field-keywords=Quantum_Mechanics

http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Ddigital-text&field-keywords=Programming

EBAY
The structure can change which is fine, as long as it is predictable and regular, here swapping two locations of the parameter “television” returns items for sale.
http://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR0.TRC0.H0.Xtelevision&_nkw=television&_sacat=0

S1 jobs
Some sites will require links that have some knowledge of the parameters before hand. S1 jobs usually build the search string from a dynamic drop down list.
http://www.s1jobs.com/jobs/aberdeen-aberdeenshire/find?a_or_e=employer&a_or_e=agency&publicsector=publicsector&onlyshowme=datesmart

http://www.s1jobs.com/jobs/edinburgh-and-lothians/find?a_or_e=employer&a_or_e=agency&publicsector=publicsector&onlyshowme=datesmart

In such circumstances a developer will be able to directly link by noting the parameters and making the same parameters available within their application that builds the link url. (example parameters include fife / edinburgh-and-lothians/ inverness /glasgow-dunbartonshire )

Train Line
Most search filters are time specific a good design default is to assume that the user is looking for current information.

http://www.thetrainline.com/live/departures/edinburgh-to-london-kings-cross

http://www.thetrainline.com/live/departures/london-kings-cross-to-edinburgh

Sky Scanner
Certain applications actively require dates – no problem.
http://www.skyscanner.net/transport/flights/edi/mosc/150101/150102/airfares-from-edinburgh-to-moscow-in-january-2015.html?rtn=1

http://www.skyscanner.net/transport/flights/mosc/edi/150101/140102/airfares-from-moscow-to-edinburgh-in-january-2015.html?rtn=1

Air B n B
Has a really nice implementation that is really easy to manipulate – Brevity in a URL is extremely nice thing to have.
https://www.airbnb.co.uk/s/Edinburgh–United-Kingdom?source=bb

https://www.airbnb.co.uk/s/St-Petersburg–Russia?source=bb

https://www.airbnb.co.uk/s/St-Petersburg–America?source=bb

Compare this with something like www.jobs.co.uk – a nice site which will not accept parameters from outside of it.
www.jobs.co.uk

Unfortunately within the UK public bodies just have not reached this level of complication possibly because they often have a much wider remit. I did a brief search and was unable to find an example although I expect there to be some out there.

Graphics vs Text

khmerconSo today it is a bit slow for me so I answered a question from a lad in Phnom Phen in Stack Overflow. He was looking for support for Khmer in MS Access 2013, the official language of Cambodia. He has a database but unfortunately he can’t get records to sort alphabetically for Khmer text. I’m not sure I found a solution I made some suggestions. I’m always really interested in questions like that because a solution for him would be the kind of thing that would be useful to anyone able to write Khmer which is presumably millions. I had a look at the alphabet – wow – computer fonts seem to do a terrible job of displaying it reducing it to the point where at a similar scale to other fonts I need a magnifying glass to distinguish differing letters. I guess you just have to choose larger font sizes but it was tiny at the scales I was seeing examples. It made me wonder how much cultures with difficult or tricky languages are disadvantaged in day to day commerce. Obviously to me their alphabet is only pictures.

But then in reality that is only what all alphabets are.

I always think on this when people say we need more pictures and icons in the UI

OBJECT NAMING CONVENTIONS – My UI Design Patterns

I like to do several things.

Name my tables T001TextVariableInCamelCase incrementing the number
Name my queries Q001TextVariableInCamelCase incrementing the number
Name my forms F001TextVariableInCamelCase incrementing the number

This works well for me – some of my first applications only had 2 numeral positions. I now give new projects 3 digits as the more successful applications generally have more than 100 forms and queries and I like to have everything in order. I have yet to come close to more than 200 forms in an application so I feel confident that 999 is satisfactory. Of course its not a major problem if I go over 999 forms it just doesn’t order so nicely in the database window which tweaks my OCD.

All names have no spaces and are in camel case.

I like to have the forms tagged with this incremental Form Prefix – eg F001 or in the picture below F103. When you have lots of forms it is a lot easier to refer to forms by this unique key. Additionally if you are speaking to a user over the phone you can immediately identify the form that they are on that they may wish additional features on or on which they have found bugs.

These pre-fix conventions are only an implementation of the principles of database design. They are effectively primary keys for differing objects. If I was in the access team I would probably hard wire this feature into the IDE so that code referred to the primary key and text descriptions could be changed retrospectively without breaking links and if possible forms would start with the PKID placed on them. (maybe with the feature available to switch this off)

HIDE MENUS – My UI Design Patterns

Really simple – create a new module and create the following simple functions.

I usually name them TurnMenuOn and TurnMenuOff.

Public Function TurnMenuOn()

   Application.CommandBars("Menu Bar").Enabled = True

End Function

Public Function TurnMenuOff()

   Application.CommandBars("Menu Bar").Enabled = False

End Function

Then you can run at start up by creating a macro that is titled autoexec
And using the Runcode action call the TurnMenuOn() function alternatively you can call it from the form opening.

It is often useful to create a couple of straight Macros that run these functions as well so that when you are in design mode you can quickly run the functions.

SEARCHING RECORDS – My UI Design Patterns

Well if it is good enough for Google

One single variable field with filtering only occurring after return has been hit. Search is across complete range of fields and searches for a string within all of the fields. Search string is maintained after search and results shown below. Don’t make the search case sensitive unless there is a good reason for doing so.

BANISH DOUBLE CLICKING – My UI Design Patterns

doubleclick

I eluded to it in my last post but when I have tabulated forms I like to have the labels activated and set to sort alphabetically ascending on left click and descending on a right click.

I use the mouse down event as trigger for this.

If Button = acLeftButton Then
    
    Me.OrderBy = "Q001Contacts.CompanyName, Q001Contacts.Surname, Q001Contacts.Firstname"
    Me.OrderByOn = True
    
    Else
    
    Me.OrderBy = "Q001Contacts.CompanyName DESC, Q001Contacts.Surname DESC, Q001Contacts.Firstname DESC"
    Me.OrderByOn = True
    
    End If

I use this pretty much on every single tabulated form that I have – This is not so different from Outlook or Itunes. What is different is that I tend to use a slightly more intelligent sort. In the above code you can probably see that for this particular column (its the company column) I first sort on company then I sort on the surname of the individual and then on the first name. I find it frustrating on things like Itunes or Outlook that you can be left searching around within a sort category sometimes as it is unclear what order within the sort that things are arranged by.

I use the left and right click rather than double because I used to find that double clicking would first sort the list one way and then immediately sort the other. A double click I believe sends mixed messages to the system and encourages lag. I still hate the idea of double clicking on things.

If I could change one thing about standard UI design it would be to get rid of the double click!!!

Needless to say I have banished double clicking from all my UIs.