Data Normalisation
The purpose of a relational database is to achieve the aims of a
database approach by defining the relationships between flat file structures or
tables. If data independence is to be achieved then the data stored in the
tables must be normalised. Normalised data has the following attributes:
- Each record in a particular table will have the same number of
fields. (This is called First Normal Form ).
- In addition to being in first normal form (1NF), each non-key data
item in the record is fully dependent on the primary key; i.e. the primary key
alone could be thought of as determining the other data in the record. (This is
called Second Normal Form). The term fully dependent here requires that each
non-key item must depend on the whole of the primary key. In some cases –
particularly when the primary key is concatenated – there may be non-key
data items that are dependent on only part of the primary key. In this case the
record structure is not in 2NF.
- In addition to being in second normal form (2NF), each non-key item
in the record is fully independent of any other non-key data item in the
record. (This is called Third Normal Form and is the usual limit of
normalisation)
The initial two entities given in fig 1 were not in first normal form.
The original attributes of member included the music that had been issued.
Since not every member plays in each item, different members would be issued
different numbers of parts. Similarly different musical works contain different
numbers of parts. Neither of these entities conforms with first normal form
since the number of occurrences of each attribute varies from one occurrence of
the entity to the next.
The redefined entities are all in first normal form because every
occurrence of a particular entity has the same structure as any other.

Fig 5 Member record structure
Let's look at the member entity. This has the attributes shown of
membership number, address and telephone. If mailing labels have to be produced
then the address must be broken down into a number of fields, one for each
line.
This is in first normal form because each occurrence of the entity
member possesses all of the attributes given (although for some the second
address and the telephone attribute may be an empty value).
Since the membership number identifies the member, once the number is
known all the other attribute values are determined. We say that the other
attribute values are functionally dependent on the membership number that is
the key field. The record structure is therefore in second normal form.

Fig 6 Two record structures in 3NF
However it is not in third normal form because most of the address is
functionally dependent on the postcode. Given a postcode the street name, town
and county can be determined. Assuming that Address_1 is used for the house
name - if there is one, and Address_2 is the house number and street name then
we can put Member records into 3NF by creating an additional entity called
Code. Address_2 is split into house number - which is functionally dependent on
Membership Number and so remains in the member record, and Street - which is
functionally dependent on the postcode and moves to the new record along with
town and county. Postcode in the member's record is now a foreign key –
i.e. it is a pointer to another type of record in which it is the primary
key.
You might think that county is functionally dependent on town but this
is not the case since the same town name can occur in several different
counties. Fixing the name of the town does not therefore tell you the name of
the county.
You might like to consider whether telephone number could be split into
two parts, one part of which is functionally dependent on the first part of the
post code and which should not therefore be present in the member record if it
is to be in strict 3NF.
It is important to remember that the purpose of normalisation is to
produce record structures that are independent of the applications that may
need to access them. The applications will access virtual records that will
draw related data from a number of different tables using the foreign keys as
pointers to put the data together
|