Home

Relational database design

Relational database design

 

 

Relational database design

UNIT-3 RELATIONAL DATABASE DESIGN

PITFALLS IN DATABASE DESIGN

  • Redundant  information in tuples
  • Update anomalies

REDUNDANT INFORMATION IN TUPLE:
One goal of schema design is to minimize the storage space that the base relations occupy. Grouping attributes into relation schemas has a significant effect on the storage.
For (eg), compare the space used by the two base relations EMPLOYEE and DEPARTMENT with EMP_DEPT base relation which is the result of applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT.
In EMP_DEPT, the attribute values pertaining to a particular department are repeated for every employee who works for that department.
In contrast, each department’s information appears once in DEPARTMENT relation.
i.e, EMP_DEPT-(ename, eid, DOB, Address, dno, dname, dMGRid)

UPDATE ANOMALIES:
Types

  • insertion anomalies
  • Deletion anomalies
  • Modification anomalies

i) Insertion anomalies: These can be differentiated into two types

  • To insert a new employee tuple into EMP_DEPT, we must include the attribute values for the department that the employee works for, or nulls (if the employee does not work for a department as yet).
  • It is a difficult to insert a new department that has no employee as yet in EMP_DEPT relation. The only way is to place null values in the attributes for employee. This causes a problem because eid is a primary of EMP_DEPT.

ii) Deletion anomalies: If we delete for EMP_DEPT an employee tuple that happens to represents the last employee working for a particular department, the information concerning that department is lost from the database.

iii) Modification anomalies: In EMP_DEPT, if we change the value of one of the attributes of a particular department, say the manager of department 10 we must update the tuples of all employees work in that department. Otherwise the database will become inconsistent.

FUNCTIONAL DEPENDENCIES
Functional dependencies play a key role in differentiating good database design from bad database designs. A functional dependency is a type of constraint that is a generalization of the notation of key.

Definition:
A functional dependency is denoted is denoted by X→Y, between two sets of attribute X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X] also t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depends on, or are determined by, the values of the X component.
The values of the X component of a tuple uniquely (or functionally) determine the values of the Y component. Y is functionally dependent on X.
EMP-PROJ= {eid, pnumber, hours, ename, pname, plocation}
Eg., Consider the relation schema EMP-PROJ from the semantics of the attributes.
The following functional dependencies should hold.

  • Eidèename
  • Pnumberè{pname, plocation}
  • {eid, pnumber}èhours

Diagrammatic notation for displays FD’s

Each FD is displayed as a horizontal line. The LHS attributes of the FDs are connected by vertical lines to the lines representing FD, while the RHS attributes are connected by arrows pointing toward the attributes.

INFERENCE RULES FOR FUNCTIONAL DEPENDENCIES
F is the set of functional dependencies that are specified on relation schema R. Numerous other FDs hold in all legal relation instance that satisfy the dependencies in F.
The set of all such dependencies are called closure of F and is denoted by F.
F= {Eid è {ename, dob, address, dno}
dno è{ dname, dMGRid }
We can infer the following additional FDs from F:
eid è {dname, dMGRid }
dno è dname, eid è eid
An FD XèY is inferred from a set of dependencies F specifies on R if XèY holds in every relation state r that is legal extension of R. (ie) whenever r satisfies all the dependencies in F, XèY also holds in r.
The closure F+ of F is the set of all FDs that can be inferred from F. Inference rules that can be used to infer new dependencies from a given set of dependencies.
IR1 (reflexive rule): If X  Y then X èY.
IR2 (Augmentation rule): {XèY} ≠ XZ èYZ
IR3 (Transitive rule): {XèY, YèZ} ≠XèZ
IR4 (Decomposition (or) Projective Rule):  {XèYZ} ≠ XèY
IR5 (Union or Additive rule): {XèY, XèZ} ≠ XèYZ
IR6 (Pseudo Transitive Rule): {XèY, WYèZ} ≠ WXèZ

  • IR1 states that a set of attributes always determines itself or any of its subsets. Because IR1 generates dependencies that are always true. Such dependencies are called trivial.
  • IR2 says that adding same set of attributes to both LHS and RHS of a dependency results another valid dependency.
  • IR4 we can remove attributes from RHS.

            The inference rules IR1 through IR3 are known Armstrong’s Inference Rules (or) Armstrong’s Axioms. Thus for each set of attributes X, we determine the set X+ of attributes that are functionally determined by X based on F X+ is called the closure of X under F.

Algorithm: Calculate X+ (Closure of Attributes Sets)
X+ :=X;
Repeat
For each functional dependency x in X+ apply reflexivity and augmentation rules on x add the resulting functional dependencies to ‘X’.
For each pair of functional dependencies x1 and x2 can be combined using transitivity add the resulting functional dependency X+.
Until X+ does not change further.

CANONICAL COVER
An attribute of a functional dependency is said to be extraneous if we can remove it without changing the closure of the set of functional dependencies.
Definition: Consider a set F of functional dependencies and the functional dependency αèβ in F.

  • Attribute A is extraneous in α if A € α, and F logically implies {F-{αèβ}} U {{α-A}èβ}.
  • Attribute A is extraneous if β if A € β, and the set of functional dependencies

  {F - {α - β}} U {α {β – A}} logically implies F
A canonical cover Fc for F is a set of dependencies such that F logically implies all dependencies in Fc, and Fc logically implies all dependencies in F. Fc must have the following properties.

  • No functional dependency in Fc contains an extraneous attribute.
  • Each left side of functional dependency in Fc  is unique. (ie) there are no two dependencies α1èβ1 and α2èβ2 in  such that α1=α2.

(eg) Consider the following set F of functional dependencies on schema (A, B, C).
AèBC
BèC
AèB
ABèC
Compute the canonical cover for F.

  • There are two functional dependencies with the same set of attributes on the left side of the arrow.

                                       AèBC
AèB
These FD;s will be combine into AèBC

  • A is extraneous in ABèC because F logically implies {F==.{ AèBC}}U{ BèC). This assertion is true because BèC is already is our set of FDs.
  • C is extraneous in AèBC, since AèBC is logically implied by AèB and BèC

Canonical cover is     AèB
BèC.

Dependency Preservation:
This property ensures that a constraint on the original relation can be maintained by simply enforcing some constraint on each of the smaller relations.

NORMALIZATION
Normalization of data is a process of analyzing the given relation schema based on their FDs and primary keys to archive the desirable properties of

  • Minimizing redundancy
  • Minimizing the insertion, deletion and update anomalies.

First Normal Form (1NF)
It states that the domain of an attribute must include only atomic (Simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.

S#

S_NAME

S_ADDRESS

P#

P_NAME

P_CITY

P_STATUS

QTY

S001

HCL

NORTH STREET,
CHENNAI

P001

MOUSE

DELHI

100

150

S002

IBM

WEST STREET,
MADURAI

P001

MOUSE

MUMBAI

120

200

SOO2

IBM

WEST STREET,
MADURAI

P002

KEY BOARD

PUNE

75

150

S003

DELL

SOUTH STREET,
COIMBATORE

P003

HARD DISK

DELHI

100

180

S004

HCL

EAST STREET, TRICHY

P004

DVD DRIVE

PUNE

75

200

TABLE-A
In the table-A the supplier address contains street and city. 1NF states that the domain of an attribute must include only atomic (Simple, indivisible) values. So we have to split S_ADDRESS into S_STREET and S_CITY.


S#

S_NAME

S_STREET

S_CITY

P#

P_NAME

P_CITY

P_STATUS

QTY

S001

HCL

NORTH

CHENNAI

P001

MOUSE

DELHI

100

150

S002

IBM

WEST

MADURAI

P001

MOUSE

MUMBAI

120

200

S002

IBM

WEST

MADURAI

P002

KEY BOARD

PUNE

75

150

S003

DELL

SOUTH

COIMBATORE

P003

HARD DISK

DELHI

100

180

S004

HCL

EAST

TRICHY

P004

DVD DRIVE

PUNE

75

200

TABLE-B

SECOND NORMAL FORM (2NF)
A relation is said to be in the second normal form, if it is already in the first normal from and it has no partial dependency (or) full functional dependency.
In the above table B the key attributes are S# and P#. The non-key attributes must depend on key attribute. But the Table-B, S-NAME, S_STREET and S-CITY depend on S# and P_NAME, P_CITY and P_STATUS depends on P#. Only QTY depends on both key attributes S# and P#. So we have to split the table.


S#

S_NAME

S_STREET

S_CITY

S001

HCL

NORTH

CHENNAI

S002

IBM

WEST

MADURAI

S003

DELL

SOUTH

COIMBATORE

S004

HCL

EAST

TRICHY

 

 

 

 

P#

P_NAME

P_CITY

P_STATUS

 

P001

MOUSE

DELHI

100

 

P001

MOUSE

MUMBAI

120

 

P002

KEY BOARD

PUNE

75

 

P003

HARD DISK

DELHI

100

 

P004

DVD DRIVE

PUNE

75

 


TABLE-C                                                                                         TABLE-D


S#

P#

QTY

S001

P001

150

S002

P001

200

S002

P002

150

S003

P003

180

S004

P004

200

TABLE-E
THIRD NORMAL FORM (3NF):
A relation is said to be in the third normal form, if it is already in the second normal from and it has no transitive dependency.
In the above table C and E doesn’t having transitive dependency. But in the Table-D the non-key attributes P_NAME, P_CITY and P_STATUS depends on P#. But the P-STATUS transitively depends on P_CITY. So we have to break the table.


P#

P_NAME

P_CITY

P001

MOUSE

DELHI

P001

MOUSE

MUMBAI

P002

KEY BOARD

PUNE

P003

HARD DISK

DELHI

P004

DVD DRIVE

PUNE

P_CITY

P_STATUS

DELHI

100

MUMBAI

120

PUNE

75


BOYCE-CODD NORMAL FORM (BCNF)
A relation is said to be in Boyce-Codd normal form if it is already in the third normal from and every determinant is a candidate key. It is a stronger version of 3NF.

Determinant: It is any field (Simple field or composite field) on which some other field is fully functionally determinant.

Comparison of BCNF and 3NF

  1. 3NF design is always dependency preserving and lossless dependency preserving is difficult to achieve in BCNF sometimes.
  2. BCNF strictly removes transitive dependency.
  3. BCNF relation is in 3NF, but reverse is not possible

FOURTH NORMAL FORM (4NF)
A relation is said to be in the fourth normal form if it is already in BCNF and it has no multi valued dependency.
Consider an unnormalized relation that contains information about supplier, product and quantity. Each tuple contains supplier number, product number repeated for each supplier name and product name repeated for each product name.


S#

P#

QTY

S001

P001

100

 

P002

150

S002

P001

100

 

P002

150

S003

P001

200

Fourth normal form separates independent multivalued facts stored in one table into separate tables. So we rewrite the unnormalized data of the above table into normalized form.


S#

P#

QTY

S001

P001

100

S001

P002

150

S002

P001

100

S002

P002

150

S003

P001

200

FIFTH NORMAL FORM (5NF)
A relation is said to be in 5NF if it is already in 4NF and it has no join dependency.
Consider the Supplier_Product_Project relation.

S#

P#

J#

S001

P001

J001

S001

P001

J002

S001

P002

J001

S002

P001

J001

Supplier_Product_Project is obtainable if all the three projections are joined. Two projections if joined do not give back the original Supplier_Product_Project relations. We’ll get extra tuple while we are joining two projections. If all the three relations are joined, then only we’ll get the exact relations. This is called join dependency.


First we split the Supplier_Product_Project relation into three projections.


S#

P#

S001

P001

S001

P002

S002

P001

a) Supplier_Product


P#

J#

P001

J001

P001

J002

P002

J001

b) Product_Project


J#

S#

J001

S001

J002

S001

J001

S002

c) Project_Supplier



Join over parts


S#

P#

J#

S001

P001

J001

S001

P001

J002

S001

P002

J001

S002

P001

J001

                                                                                                                                   

 

 

                                                                  

 

 

 

Join Supplier_Product & Product_Project with Project_Supplier

 

Source: https://www.snscourseware.org/snsct/files/CW_588842082c824/UNIT-3.doc

Web site to visit: https://www.snscourseware.org

Author of the text: indicated on the source document of the above text

If you are the author of the text above and you not agree to share your knowledge for teaching, research, scholarship (for fair use as indicated in the United States copyrigh low) please send us an e-mail and we will remove your text quickly. Fair use is a limitation and exception to the exclusive right granted by copyright law to the author of a creative work. In United States copyright law, fair use is a doctrine that permits limited use of copyrighted material without acquiring permission from the rights holders. Examples of fair use include commentary, search engines, criticism, news reporting, research, teaching, library archiving and scholarship. It provides for the legal, unlicensed citation or incorporation of copyrighted material in another author's work under a four-factor balancing test. (source: http://en.wikipedia.org/wiki/Fair_use)

The information of medicine and health contained in the site are of a general nature and purpose which is purely informative and for this reason may not replace in any case, the council of a doctor or a qualified entity legally to the profession.

 

Relational database design

 

The texts are the property of their respective authors and we thank them for giving us the opportunity to share for free to students, teachers and users of the Web their texts will used only for illustrative educational and scientific purposes only.

All the information in our site are given for nonprofit educational purposes

 

Relational database design

 

 

Topics and Home
Contacts
Term of use, cookies e privacy

 

Relational database design