Mansfield Tutors

Organisation of Data for Effective Retrieval

A data file is made up of records. A single record will hold all the data relating to one unit of processing. In a customer file one record will store the data about one customer. The file will be composed of a number of records with the same structure. Each record will be made up of a series of fields such as name, credit rating, address etc. A field stores one item of data for the record.

The structure of the data file is determined when the system that uses it is designed. It is important that the correct record structure is chosen and this will normally be done by investigating the output that the system is to produce, the processing that must be carried out and the input that will be necessary to enable this processing to be done.

If, for example, a system is required to output address labels for a mail shot then the file structure must allow records for the intended recipients to be selected. It must also contain sufficient information for the address label. This will require fields containing Name, Address_1, Address_2, Town, County, and Postcode. Note that because of the format required for the address label, five fields are required to store the address. If in addition to mail labels the file is required for a mail merge then an additional field Salutation will be needed.

In many applications it will be necessary to identify a particular record uniquely. A field that does this is called the key field or sometimes the primary key field. A key field will normally have to be created specially for an application in order to ensure uniqueness. Examples of key fields are account number, part number and membership number. Data that would serve as a key field in one application may not do so in another. In a bookshop the ISBN will act as a key field to identify a stock item - a book. In a library this is not sufficient and an acquisition number is added to the ISBN to uniquely identify the individual copy of the book.

The designer will also specify the type of data that is to be stored in the field. This will allow the database management program to check that data stored in the field is consistent with the field type. This will allow the database management system to prevent a user placing data of the wrong type in a field. It will also allow the database management system to store the data in the most efficient way. Possible data types for a field are:

Numeric	A variety of number formats will be available depending on the size and type of numbers to be stored. Real (fractional) and Integer are two examples.
Text	Any grouping of characters. Normally a maximum number of characters will be specified.
Date	Date and time values. Usually can be stored in a variety of formats.
Boolean	True/False values. Sometimes categorised as Yes/No
Counter	A special field where the database management system will automatically insert an increasing value each time a new record is created.

File Type and Organisation

Files can be categorised by the type of access that is required. This in turn decides the way that the records are structured within the file.

Serial Access	Records are accessed in the order that they are stored within the file. This is suitable for a transaction file in a batch processing system where the records are in whatever order the source documents in the batch were in originally.
Sequential Access	Records are accessed in the order that they are stored within the file but they are stored in a sorted order. A transaction file will normally be sorted so that its records are in order of the key field for the master file that they will be processed against.
Random Access	The key field is used to generate a number that is apparently random in that it bears no obvious relation to the key field value. This number, which will always be the same for the same key field value, is used to determine which record position in the file will hold the record with this key. A particular record can then be found by using its key field to obtain its position in the file. This allows a particular record to be accessed and processed without accessing any other record in the file. The software must be able to calculate the position of the record so all the records must be the same length. Defining the record fields to be fixed length fields ensures this. Fields such as Name or Description which are not naturally fixed length are made so by padding them with spaces. Random access provides fast access to a particular record by way of its key. It is very difficult however to process a random access file in a pre-determined order.
Indexed Sequential	This method of access allows direct access to a particular record but also allows records to be processed in order of say account number. This is done by having an index file in addition to the main data file. The index file records consist of a master file key field value together with data to allow the corresponding master file record to be found. Records in the master file must still be composed of fixed length fields. It is possible to have a series of index files to allow the master file to be processed in a number of differently ordered sequences. Fields used to produce sort orders based on anything other than the key field or which are used to identify one or a group of records (e.g. surname) are called secondary keys. The unique key field is then called the primary key.

Three types of file can be identified:

Transaction Files	The records relate to individual transactions. The transaction file will be used to update the master file in batch processing applications. Note that changes caused by a transaction will not be reflected in the master file until the transaction file has been processed. This may cause problems when a mixture of real time access and batch processing is used. If for example sales are recorded in a transaction file to be processed at the end of the day while customer enquiries are handled by a real time system then an item may appear in stock when in fact the last one has been sold. Transaction files must be sorted in to master file key field order before processing.
Master File	Contains data that does not change very often - e.g. employee names and addresses etc. Also contains transaction data that has been built up over time or which is historical; e.g. cumulative pay or previous meter reading. In Batch processing the master file must be accessible in key field order so it has to be either a sequential file or more likely an indexed sequential file.
Reference Files	These contain reference data that changes infrequently. Examples are index files for indexed sequential access or data such as tax codes and rates.