Organisation of Data for Effective Retrieval
A data file is made up of records. A single record will hold all the
data relating to one unit of processing. In a customer file one record will
store the data about one customer. The file will be composed of a number of
records with the same structure. Each record will be made up of a series of
fields such as name, credit rating, address etc. A field stores one item of
data for the record.
The structure of the data file is determined when the system that uses
it is designed. It is important that the correct record structure is chosen and
this will normally be done by investigating the output that the system is to
produce, the processing that must be carried out and the input that will be
necessary to enable this processing to be done.
If, for example, a system is required to output address labels for a
mail shot then the file structure must allow records for the intended
recipients to be selected. It must also contain sufficient information for the
address label. This will require fields containing Name, Address_1, Address_2,
Town, County, and Postcode. Note that because of the format required for the
address label, five fields are required to store the address. If in addition to
mail labels the file is required for a mail merge then an additional field
Salutation will be needed.
In many applications it will be necessary to identify a particular
record uniquely. A field that does this is called the key field or sometimes
the primary key field. A key field will normally have to be created specially
for an application in order to ensure uniqueness. Examples of key fields are
account number, part number and membership number. Data that would serve as a
key field in one application may not do so in another. In a bookshop the ISBN
will act as a key field to identify a stock item - a book. In a library this is
not sufficient and an acquisition number is added to the ISBN to uniquely
identify the individual copy of the book.
The designer will also specify the type of data that is to be stored in
the field. This will allow the database management program to check that data
stored in the field is consistent with the field type. This will allow the
database management system to prevent a user placing data of the wrong type in
a field. It will also allow the database management system to store the data in
the most efficient way. Possible data types for a field are:
| Numeric |
A variety of number formats will be
available depending on the size and type of numbers to be stored. Real
(fractional) and Integer are two examples. |
| Text |
Any grouping of characters. Normally a
maximum number of characters will be specified. |
| Date |
Date and time values. Usually can be
stored in a variety of formats. |
| Boolean |
True/False values. Sometimes categorised
as Yes/No |
| Counter |
A special field where the database
management system will automatically insert an increasing value each time a new
record is created. |
File Type and Organisation
Files can be categorised by the type of access that is required. This in
turn decides the way that the records are structured within the file.
| Serial Access |
Records are accessed in the order that
they are stored within the file. This is suitable for a transaction file in a
batch processing system where the records are in whatever order the source
documents in the batch were in originally. |
| Sequential Access |
Records are accessed in the order that
they are stored within the file but they are stored in a sorted order. A
transaction file will normally be sorted so that its records are in order of
the key field for the master file that they will be processed against. |
| Random Access |
The key field is used to generate a
number that is apparently random in that it bears no obvious relation to the
key field value. This number, which will always be the same for the same key
field value, is used to determine which record position in the file will hold
the record with this key. A particular record can then be found by using its
key field to obtain its position in the file. This allows a particular record
to be accessed and processed without accessing any other record in the file.
The software must be able to calculate the position of the record so all the
records must be the same length. Defining the record fields to be fixed length
fields ensures this. Fields such as Name or Description which are not naturally
fixed length are made so by padding them with spaces. Random access provides
fast access to a particular record by way of its key. It is very difficult
however to process a random access file in a pre-determined order. |
| Indexed Sequential |
This method of access allows direct
access to a particular record but also allows records to be processed in order
of say account number. This is done by having an index file in addition to the
main data file. The index file records consist of a master file key field value
together with data to allow the corresponding master file record to be found.
Records in the master file must still be composed of fixed length fields. It is
possible to have a series of index files to allow the master file to be
processed in a number of differently ordered sequences. Fields used to produce
sort orders based on anything other than the key field or which are used to
identify one or a group of records (e.g. surname) are called secondary keys.
The unique key field is then called the primary key. |
Three types of file can be identified:
| Transaction Files |
The records relate to individual
transactions. The transaction file will be used to update the master file in
batch processing applications. Note that changes caused by a transaction will
not be reflected in the master file until the transaction file has been
processed. This may cause problems when a mixture of real time access and batch
processing is used. If for example sales are recorded in a transaction file to
be processed at the end of the day while customer enquiries are handled by a
real time system then an item may appear in stock when in fact the last one has
been sold. Transaction files must be sorted in to master file key field order
before processing. |
| Master File |
Contains data that does not change very
often - e.g. employee names and addresses etc. Also contains transaction data
that has been built up over time or which is historical; e.g. cumulative pay or
previous meter reading. In Batch processing the master file must be accessible
in key field order so it has to be either a sequential file or more likely an
indexed sequential file. |
| Reference Files |
These contain reference data that
changes infrequently. Examples are index files for indexed sequential access or
data such as tax codes and rates. |
|