DBDM 16 Marks All 5 Units
DBDM 16 Marks All 5 Units
Components include:
o File manager manages allocation of disk space and data structures used to
represent information on disk.
o Database manager: The interface between low-level data and application
programs and queries.
o Query processor translates statements in a query language into low-level
instructions the database manager understands. (May also attempt to find an equivalent
but more efficient form.)
o DML precompiler converts DML statements embedded in an application program
to normal procedure calls in a host language. The precompiler interacts with the query
processor.
o DDL compiler converts DDL statements to a set of tables containing metadata
stored in a data dictionary.
In addition, several data structures are required for physical system implementation:
o Data files: store the database itself.
o Data dictionary: stores information about the structure of the database. It is used
heavily. Great emphasis should be placed on developing a good design and efficient
implementation of the dictionary.
o Indices: provide fast access to data items holding particular values.
Database users and Administrator
A primary goal of a database system is to retrieve information from and store new
information in the database. People who work with a database can be categorized as
database users or database administrators.
External level
The users' view of the database External level describes that part of the database
that is relevant to each user.
The external level consists of a number of different external views of the
database. Each user has a view of the 'real world' represented in a form that is familiar
for that user. The external view includes only those entities, attributes, and relationships
in the real world that the user is interested in.
The use of external models has some very major advantages,
• Makes application programming much easier.
• Simplifies the database designer's task.
• Helps in ensuring the database security.
Conceptual level
The community view of the database conceptual level describes what data is
stored in the database and the relationships among the data.
The middle level in the three level architecture is the conceptual level. This level
contains the logical structure of the entire database as seen by the DBA. It is a complete
view of the data requirements of the organization that is independent of any storage
considerations. The conceptual level represents:
• All entities, their attributes and their relationships
• The constraints on the data
• Semantic information about the data
• Security and integrity information.
The conceptual level supports each external view. However, this level must not
contain any storage-dependent details. For instance, the description of an entity should
contain only data types of attributes and their length, but not any storage consideration
such as the number of bytes occupied.
Internal level
The physical representation of the database on the computer Internal level
describes how the data is stored in the database.
The internal level covers the physical implementation of the database to achieve
optimal runtime performance and storage space utilization. It covers the data structures
and file organizations used to store data on storage devices.
The internal level is concerned with
• Storage space allocation for data and indexes.
• Record descriptions for storage.
• Record placement.
• Data compression and data encryption techniques.
• Below the internal level there is a physical level that may be managed by the
operating system under the direction of the DBMS.
Physical level
• The physical level below the DBMS consists of items only the operating system
knows such as exactly how the sequencing is implemented and whether the fields of
internal records are stored as contiguous bytes on the disk.
The data model is a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints. A data model provides a way
to describe the design of a data base at the physical, logical and view level.
The purpose of a data model is to represent data and to make the data
understandable.
According to the types of concepts used to describe the database structure, there
are three data models:
1. An external data model, to represent each user's view of the organization.
2. A conceptual data model, to represent the logical view that is DBMS independent.
3. An internal data model, to represent the conceptual schema in such a way that it can be
understood by the DBMS.
In the object-oriented data model (OODM) both data and their relationships are
contained in a single structure known as an object.
An object is described by its factual content. An object includes information about
relationships between the facts within the object, as well as information about its
relationships with other objects. Therefore, the facts within the object are given greater
meaning. The OODM is said to be a semantic data model because semantic indicates
meaning.
The OO data model is based on the following components:
An object is an abstraction of a real-world
entity. Attributes describe the properties of an
object.
Once the database schemas are complied and the database is populated with
data, users must have some means to manipulate the database. The DBMS provides a
set of operations or a language called the data manipulation language(DML) for
manipulations include retrieval, insertion, deletion, and modification of the data.
A language that allows the DBA or user to describe and name the entities,
attributes, and relationships required for the application, together with any associated
integrity and security constraints is called DDL.
The storage structure and access methods used by the database system by a set
of statements in a special type of DDL called a data storage and definition language.
These statements define the implementation details of the database schemas, which are
usually hidden from the users.
The data values stored in the database must satisfy certain consistency
constraints. The DDL provides facilities to specify the following constraints. The
database systems check these constraints every time the database is updated.
Domain Constraints:
There are cases to ensure that a value that appears in one relation for a given set
of attributes also appears for a certain set of attributes in another relation. Assertions
An assertion is any condition that the database must always satisfy. Domain
constraints and referential integrity constraints are special forms of assertions. When an
assertion is created, the system tests it for validity. If the assertion is valid then any
future modification to the database is allowed only if it does not cause that assertion to
be violated. Authorization
Insert authorization, which allows insertion of new data, but not modification of
existing data.
Update authorization, which allows modification, but not deletion, of data. Delete
authorization, which allows deletion of data.
We may assign the user all none or a combination of these types of authorization.
The output of the DDL is placed in the data dictionary, which contains metadata
that is, data about data.
DML is a language that provides a set of operations to support the basic data
manipulation operations on the data held in the database. Data Manipulation operations
usually include the following:
The part of a DML that involves data retrieval is called a Query language. A Query is a
statement requesting the retrieval of information. There are basically two types of DML
• Procedural DMLs.
Procedural DML: A language that allows the user to tell the system what data is needed
and exactly how to retrieve the data.
Non Procedural DML: A language that allows the user to state what data is needed
rather than how it is to be retrieved.
6. Briefly explain about Entity-Relationship model:
The semantic aspect of the model lies in its representation of the meaning of the
data. The E-R model is very useful in mapping the meanings and interactions of real-
world enterprises onto a conceptual schema.
The ERDs represent three main components entities, attributes and relationships.
Entity sets:
An entity is a thing or object in the real world that is distinguishable from all
other objects. Example:
An entity has a set of properties, and the values for some set of properties may
uniquely identify an entity. Example:
A person may have a person-id would uniquely identify one particular property whose
value uniquely identifies that person.
An entity set is a set of entities of the same type that share the same properties,
or attributes. Example:
Relationship sets:
A relationship that associates customer smith with loan L-16, specifies that
Smith is a customer with loan number L-16.
The number of entity sets that participate in a relationship set is also the degree
of the relationship set.
Attributes:
For each attribute, there is a set of permitted values, called the domain, or value
set, of that attribute. Example:
The domain of attribute customer name might be the set of all text strings of a
certain length.
An attribute of an entity set is a function that maps from the entity set into a
domain.
• Derived attribute.
The address attribute of the branch entity can be subdivided into street, city, and
postcode attributes.
Single-valued Attributes:
An attribute that holds a single value for each occurrence of an entity type is
called single valued attribute. Example:
Each occurrence of the Branch entity type has a single value for the branch
number (branch No) attribute (for example B003).
Multi-valued Attribute
An attribute that holds multiple values for each occurrence of an entity type is
called multi-valued attribute. Example:
Each occurrence of the Branch entity type can have multiple values for the telNo
attribute (for example, branch number B003 has telephone numbers 0141-339-2178 and
0141-339-4439).
Derived attributes
An attribute that represents a value that is derivable from the value of a related
attribute or set of attributes, not necessarily in the same entity type is called derived
attributes.
Dl Marketing 10M
D2 Development 12M
D3 Research 5M
Employee database: Emp:
El John Dl 40K
E2 Peter Dl 42K
E3 David D2 30K
E4 Sachin D2 35K
Columns in relations (table) have associated data types. The relational model includes
an open-ended set of data types, i.e. users will be able to define their own types as well
as being able to use system-defined or built in types. Every relation value has two pairs
2) A set of rows
The optimizer is the system component that determines how to implement user
requests. The process of navigating around the stored data in order to satisfy the user's
request is performed automatically by the system, not manually by the user. For this
reason, relational systems are sometimes said to perform automatic navigation.
The catalog will typically include two system relvars called TABLE and COLUMN.
The purpose of which is to describe the tables in the database and the columns in those
tables.
8. Explain about Advantages and disadvantages of DBMS:
• Data consistency
• Amount of data
• Sharing of data
• Improved security
• Enforcement of standards
• Economy of scale
• Increased productivity.
• Increased concurrency.
Data consistency
If a data item is stored more than once and the system is aware of this, the
system can ensure that all copies of the item are kept consistent.
With the integration of the operational data, it may be possible for the
organization to derive additional information from the same data. Example:
Sharing of data
The database belongs to the entire organization and can be shared by all
authorized user. In this way. more users share more of the data.
Database integrity refers to the validity and consistency of stored data. Integrity
is expressed in terms of constraints, which are consistency rules that the database is not
permitted to violate. Constraints may apply to data items within a single record or they
may apply to relationships between records. Example:
Improved security
A sales assistant may have access to all data relating to properties but no access to
sensitive data such as staff salary details.
Enforcement of standards
Integration allows the DBA to define and enforce the necessary standards such
as naming conventions, documentation standards, update procedures and access rules.
Economy of scale
Combining all the organization's operational data into one database, and creating a set
of applications that work on this one source of data, can result in cost savings.
Many DBMSs provide query languages or report writers that allow users to ask
ad hoc questions and to obtain the required information almost immediately at their
terminal, without requiring a programmer to write some software to extract this
information from the database.
Increased productivity
A DBMS separates the data descriptions from the applications, thereby making
applications immune to changes in the data descriptions. This is known as data
independence. The provision of data independence simplifies database application
maintenance.
Increased concurrency
Many DBMSs manage concurrent database access and ensure such problems cannot
occur.
Disadvantages of DBMSs
• Complexity
• Size
• Cost of DBMSs
• Cost of conversion
• Performance
• Higher impact of a failure Complexity
The provision of the functionality makes the DBMS an extremely complex piece of
software. Database designers and develops, the data and database administrators, and
end-users must understand this functionality to take full advantage of it. Failure to
understand the system can lead to bad design decisions, which can have serious
consequences for an organization. Size
EnhancedEntityRelationship(EER)Model
EER is a high-level datamodel that incorporates the extensions to the original ERmodel.Enhanced ERD
are high level models that represent the requirements and complexities of complex database.
In addition to ERmodel concepts EE-R includes−
Discuss the correspondence between the ER model construct and the relational model constructs
1. Entity Sets:
o Correspondence: An entity set in the ER model maps to a relation (table) in the relational
model.
o Mapping: Each entity set becomes a table in the relational model, with each entity becoming a
row, and attributes becoming columns.
2. Relationships:
o Correspondence: Relationships in the ER model can be represented using foreign keys in the
relational model.
o Mapping: Relationships become associations between tables, and foreign keys are used to
establish the connections between related entities.
3. Attributes:
o Correspondence: Attributes in the ER model map to columns in the relational model.
o Mapping: Each attribute becomes a column in the corresponding relation.
4. Keys:
o Correspondence: Keys in the ER model are mapped to primary keys in the relational model.
o Mapping: The primary key of a relation uniquely identifies each row in the table, just as a key
in the ER model uniquely identifies each entity.
To take any one example of Entity relation with relevant diagram (7)
Properties of relation,relational keys (6)
13.b) Show how each ER Model construct can be mapped to the relational model .Discuss the option for
mapping EER construct
1. Inheritance Hierarchies:
o Option: Each subclass can be mapped to a separate relation with a foreign key referencing the
superclass.
o Conditions: This option works well when each subclass has distinct attributes and when queries
need to access subclass-specific attributes individually.
2. Aggregation:
o Option: Aggregated objects can be mapped to separate relations, with foreign keys linking
them to the main relation.
o Conditions: This option is useful when complex objects are aggregated into higher-level
entities and when queries need to access the aggregated objects separately.
3. Specialization/Generalization:
o Option: Specialized entities can be mapped to separate relations with a foreign key referencing
the generalized entity.
o Conditions: This option is suitable when specialized entities have additional attributes or
relationships specific to their specialization, and when queries need to access specialization-
specific attributes individually.
4. Union Types:
o Option: Union types can be represented as separate relations, each representing a possible
option in the union.
o Conditions: This option is applicable when entities can have multiple types and when queries
need to access attributes specific to each type.
The choice of mapping options depends on the specific requirements of the application, the nature of the data
model, and the desired querying and performance characteristics. The selected mapping should accurately
represent the semantics of the EER model while ensuring efficient storage, retrieval, and manipulation of data
in the relational model.
14.a) Discuss correlated nested queries? Write a query to find the names of sailors who have not reserved a
red boat?
One of the most powerful features of SQL is nested queries. Anested queryis a query that has another query
embedded within it; the embedded query is called asubquery. When writing a query, we sometimes need to
express a condition that refers to a table that must itself be computed. The query used to compute this
subsidiary table is a subquery and appears as part of the main query. A subquery typically appears within
theWHEREclause of a query.Subqueries can sometimes appear in theFROMclause or theHAVINGclause
SELECTS.sname
FROMSailors S
WHERES.sidIN
(SELECTR.sid
FROMReserves R
WHERER bidIN(SELECTB.bid FROM Boats B WHEREB.color = ‘red’ )
The innermost sub query finds the set of bids of red boats (102 and 104 on instance B1). The sub query one
level above finds the set of sids of sailors who have reserved one of these boats. On instancesB1,R2, andS3,
this set of sids contains 22, 31, and 64. The top-level query finds the names of sailors whosesidis in this set
ofsids. For the example instances, we get Dustin, Lubber, and Horatio. To find the names of sailors who have
not reserved a red boat, we replace the outermost occurrence of IN by NOT IN
14.b) Write a query to find the names of sailors who have reserved a red boat?
SELECTS.sname
FROMSailors S
WHERES.sidIN
(SELECTR.sid
FROMReserves R
WHERER bidIN(SELECTB.bid FROM Boats B WHEREB.color = ‘red’ )
The innermost sub query finds the set of bids of red boats (102 and 104 on instance B1). The sub query one
level above finds the set of sids of sailors who have reserved one of these boats. On instancesB1,R2, andS3,
this set of sids contains 22, 31, and 64. The top-level query finds the names of sailors whosesidis in this set
ofsids. For the example instances, we get Dustin, Lubber, and Horatio. To find the names of sailors who have
not reserved a red boat, we replace the outermost occurrence of IN by NOT IN
15.a) Consider the following relations : EMPLOYEE (ENO, NAME, DATE_BORN, GENDER,
DATE_JOINED, DESIGNATION, BASIC_PAY, DEPARTMENT_NUMBER) DEPARTMENT
(DEPARTMENT NUMBER, NAME) Write SQL queries to perform the following :
(i) List the details of employees belonging to department number ‘CSE’. (3)
(ii) List the employee number, employee name, department number and department name of all
employees. (4)
(iii) List the department number and number of employees in each department. (3)
(iv) (iv) List the details of employees who earn less than the average basic pay of all employees. (3)
Employees table: 1. create an employee’s table with the following fields:
(Emp_id,First_name,Last_name,Phone_No,Hire_date,Job_id,Emp_Salary,Comission_Pct,manager
_id,Department_id)
Query: create table Employees (Emp_id NUMBER(6),First_name CHAR(25),Last_name
CHAR(20),Phone_No NUMBER(12),Hire_date DATE,Job_Id NUMBER(5),Emp_Salary
NUMBER(7),Comission_Pct NUMBER(5),manager_id NUMBER(5),Department_id
NUMBER(5));
2. Insert five records into the table employees:
Queries-5
Statements -4
Table – 4
15. b) A university registrar’s office maintains data about the following entities:
(1) Courses, including number, title, credits, syllabus, and prerequisites;
(2) Course offerings, including course number, year, semester. section number, instructor, timings, and
classroom;
(3) Students, including student-id, name, and program; and
(4) Instructors, including identification number, name, department, and title. Further, the enrolment of
students in courses and grades awarded to students in each course they are enrolled for must be appropriately
modelled. Model an entity relationship diagram for the registrar’s office.
(ii) Model an E-R diagram for a car-insurance company whose customers own one or more cars each. Each
car has associated with it zero to any number of recorded accidents. State any assumptions you make.(8+5)
Part- C(1x15 = 15)
16.a)Assume the following tables:
Degree(degcode, name, subject)
Candidate(seatno, degcode, name, semester, month, year, result)
Marks(seatno, degcode, semester, month, year, papcode, marks) In the degree table,
degcode is the degree code, name is the name of the degree (e.g. M.Sc., M.Com. etc.) and subject is the
subject of the course (e.g. Physics, Math etc.). In the marks table papcode is the paper code (e.g. A-1, A-2
etc.)
Solve the following queries using Relational Algebra / SQL:
Write a SELECT statement to Display
i)Display all the degree codes for which there are no candidates in the order of degcode.
ii)Display the names of all candidates who have appeared for their M.Sc. (Phys) examination in the order of
name.
iii)Display the name, subject and number of candidates for all degrees in which there are less than 5
candidates. iv)Display the names of all the candidates who have got less than 40 marks in exactly two
subjects.
v)Display the names of all the candidates who have got highest total marks in M.Sc. (Math).
16.b) Consider a suitable example of tuples/records for the above mentioned tables and write DML statements
(SQL) to answer for the queries listed below:
(i) Which courses does a specific professor teach?
(ii) (ii)What courses are taught by two specific professors?
(iii) (iii) What teaches a specific course and where is his/her office?
(iv) (iv) For a specific student number, in which courses is the student registered and what is his/her
name?
(v) (v) Who are the professors for a specific student?
(vi) (vi) Who are the students registered in a specific course?
The select, project and rename operations are called unary operations, because they operate on
one relation.
The union, Cartesian product, and set difference operations operate on pairs of relations and
are called binary operations.
Staff
Staffho Name Position Sex DOB Salary Branchno
SL21 John Manager M 1-9-45 30000 B005
SG37 Ann Assistant F 10-11-60 20000 B003
SG14 David Supervisor M 4-3-58 18000 B003
SA9 Mary Assistant F 3-6-40 12000 B007
SG9 Julie Manager F 4-5-70 9000 B003
SL41 Susan Assistant F 6-8-80 20000 B005
Branch
Branchno Street City Postcode
B005 22 Deer Rd London SW1 4EH
B007 16 Argyll St Aberdeen AB2 3SU
B003 163 Main St Glasgow Gil 9QX
B004 32 Manse Rd Bristol BS99INZ
B002 56 Clover Dr London NW10 6EU
PropertyforRent
Property Street City Postcode Type Rooms Rent Owner Staffno Bra
No no a no
Viewing
ciientNo properfyNo viewDate Comment
CR56 PA14 24-05-01 Too small
CR76 PG4 20-04-01 Too remote
CR56 PG4 26-05-01
CR62 PAH 14-05-01 No dining room
CR56 PG36 28-04-01
Private owner
ciientNo branchNo staftNo Datejoined
CR76 BOOS SL41 02-01-01
CR56 B003 SG37 11-04-00
CR74 B003 SG37 16-11-99
CR62 B007 SA9 07-03-00
σ Predicate (R)
Example:
List all staff with a salary greater than 10000.
Sol:
salary > 10000 (Staff).
The input relation is staff and the predicate is salary>10000. The selection operation defines a
relation containing only those staff tuples with a salary greater than 10000.
staffNo Name Position Sex DOB Salary branchNo
SL21 John Manager M 01-10-45 30000 B005
SG37 Ann Assistant F 10-11-60 12000 B003
SGI4 David Supervisor M 24-03-58 18000 B0003
SG5 Susan Manager F 03-06-40 24000 B0003
Projection (π):
The projection operation works on a single relation R and defines a relation that contains a
vertical subset of R, extracting the values of specified attributes and eliminating duplicates. Syntax:
π al,.......an(R)-
Example:
Produce a list of salaries for all staff, showing only the staffNo, name and salary.
Rename (ρ):
Rename operation can rename either the relation name or the attribute names or both Syntax:
Union
The union of two relations R and S defines a relation that contains all the tuples of R or S or
both R and S, duplicate tuples being eliminated. Union is possible only if the schemas of the two
relations match.
Syntax:
RUS
Example:
List all cities where there is either a branch office or a propertyforRent.
City
London
Aberdeen
Glassgow
Bristol
Set difference:
The set difference operation defines a relation consisting of the tuples that are in relation R,
but not in S. R and S must be union-compatible.
Syntax
R-S
Example:
List all cities where there is a branch office but no properties for rent.
Sol.:
Π city (Branch) – π city (propertyforRent).
The result of this operation is
City
Bristol
Intersection
The intersection operation defines a relation consisting of the set of all tuples that are
in both R and S. R and S must be union compatible.
Syntax:
R∩S
Example:
List all cities where there is both a branch office and at least one propertyforRent.
πciity (Branch) ∩ πCjty (propertyforRent)
The result of this operation is
City
London
Aberdeen
Glassgow
Cartesian product:
The Cartesian product operation defines a relation that is the concatenation of every tuple of
relation R with every tuple of relation S.
Syntax:
RXS
Example:
List the names and comments of all clients who have viewed a propertyforRent. Sol.:
The names of clients are held in the client relation and the details of viewings are held in the
viewing relation. To obtain the list of clients and the comments on properties they have viewed, we
need to combine these two relations.
3. Explain about Domain Relational Calculus:
Domain relational calculus uses the variables that take their values from domains of attributes.
An expression in the domain relational calculus has the following general
form
{dl,d2,...... dn/F(dl,d2,...............dm)} m > n
Where dl,d2, ....dn.....,dtn represent domain variables and F(dl,d2,.....dm)
pepresents a formula composed of atoms, where each atom has one of the following forms:
• R(dl,d2,........dn), where R is a relation of degree n and each d; is a domain
variable.
• dj Өdj, where dj and dj are domain variables and 9 is one of the comparison operations (<, <
>, >, =,
• dj Ө C, where d, is a domain variable, C is a constant and 8 is one of the
comparison operators. Recursively build up formulae from atoms using the following rules:
• An atom is a formula.
• If Fl and F2 are formulae, so are their conjunction Fl ∩ F2, their disjunction Fl U F2 and the
negation ~Fl.
Structural query language (SQL) is the standard command set used to communicate with
the relational database management systems. All tasks related to relational data management-
creating tables, querying the database for information.
Advantages of SQL:
• SQL is a high level language that provides a greater degree of abstraction than procedural
languages.
• Increased acceptance and availability of SQL.
• Applications written in SQL can be easily ported across systems.
• SQL as a language is independent of the way it is implemented internally.
• Simple and easy to leam.
• Set-at-a-time feature of the SQL makes it increasingly powerful than the record-at-a-time
processing technique.
• SQL can handle complex situations.
SQL operators:
• Arithmetic operators - are used to add, subtract, multiply, divide and negate data value (+, -,
*, /).
• Comparison operators - are used to compare one expression with another. Some comparison
operators are =, >, >=, <, <=, IN, ANY, ALL, SOME, BETWEEN, EXISTS, and so on.
• Logical operators - are used to produce a single result from combining the two separate
conditions. The logical operators are AND, OR and NOT.
• Set operators - combine the results of two separate queries into a single result. The set
operators are UNION, UNIONALL, INTERSECT, MINUS and so on.
Syntax:
To create a base table that is having the same structure as some existing table.
Syntax:
Create table base-table-name like tablename; Example:
Create table catalog like book;
This creates a table called catalog with the same structure as Book. Alter table
An existing base table can be modified by using the alter table statement.
Syntax:
Alter table base-table-name
Add column datatype {null / not null with default}; Example:
SQL>Alter table book
Add discount integer null; O/P: Table altered.
This adds another column discount with data type integer.
Truncate table
If there is no further use of records stored in a table and the structure has to be retained then
the records alone can be deleted. Syntax
Truncate table table-name; Example:
DESC
Desc command used to view the structure of the table.
Syntax
O/P:
Year integer
Price null integer
Drop table
An existing base table can be deleted at any time by using the drop table statement. Syntax
Data integrity refers to the correctness and completeness of the data in a database, i.e. an
integrity constraint is a mechanism used to prevent invalid data entry into the table.
The various types of integrity constraints are
(or)
Table constraints which refer other columns of the table should be defined at the table level.
Create table <table-name>
(column-name 1 datatype(size),
Column-name n datatype(size),
Constraint <constraint-name> primary key (column-name 1), Constraint
<constraint-name> foreign key
(foreign-column-name) references
referenced-table[(primary-column of referenced table)], constraint
<constraint-name> check(<condition>); Example:
A language in which SQL queries are embedded is referred to as a host language, and the
SQL structures permitted in the host language constitute embedded SQL.
EXEC SQL
<embedded SQL statements Before executing any SQL statements, the program must first
connect to the database by using
* SQL CA
The DBMS uses an SQL communications Area (SQL CA) to report runtime errors to the
application program. The SQLCA is a data structure that contains error variables and status
indicators.
Syntax:
In a homogeneous distributed data base system, each database in the system is by the same
vendor.
In a heterogeneous distributed database system, at least one of the databases will be that of
a different vendor.
Site 1
original relation r.
• Failure of a site
• Loss of messages
• Failure of a communication link
• Network partition
Advantages of DDBMS
• Reflects organizational structure
• Improved shareability
• Improved availability
• Improved reliability
• Improved performance
• Economics
• Modular growth
Disadvantages of DDBMS
• Complexity
• Cost
• Security
• Integrity control more difficult
• Lack of standards
• Lack of experience
• Database design more complex.
Characteristics of DDBMS
• A collection of logically related shared data
• The data is split into a number of fragments.
• Fragments may be replicated.
• Fragments / replicas are allocated to sites.
• The sites are linked by a communications network.
• The data at each site is under the control cf a DBMS.
• The DBMS at each site can handle local applications, autonomously.
• Each DBMS participates in atleast one global application.
The set of all functional dependencies that are implied by a given set of functional
dependencies X is called closure of X.
A set of inference rules, called Armstrong's axioms, specifies how new-functional
dependencies can be inferred from given ones.
Let A, B, C and D be subsets of the attributes of the relation R. Armstrong's axioms are as
follows:
1) Reflexivity
A->A
5) Decomposition
s# Status City
S3 50 London
S4 50 Paris
s# Status
S3 50
S4 50
SC
s# City
S3 London
S4 Paris
SST
s# Status
S3 50
S4 50
STC
Status City
50 London
50 Paris
case a:
In case a, no information is lost. The SST and SC values still tell us that supplier S3 has status 50
and city London, and supplier S4 has status 50 and city Paris. Therefore this first decomposition
is indeed nonloss.
case b:
Information definitely is lost. The SST values still tell us that both suppliers have status 50, but
STC values cannot tell us which supplier has which city. Therefore the second decomposition is
lossy.
Decomposition is really a process of projection.
In case (a) that no information is lost; if we join SST and SC back together again, we get back to
the original relations.
In case(b), if we join SST and SC together again, we do not get back the original relations, and so
we have lost information.
Thus, just as the decomposition operator for normalization purposes is projection, so the
decomposition operator is join.
First normal form is a relation in which the intersection of each row and column contains
one and only one value.
To transform the un-normalized table (a table that contains one or more repeating groups) to
first normal form, identify and remove the repeating groups within the table, (i.e multi valued
attributes, composite attributes, and their combinations). Example:
Multi valued attribute - phone number
Composite attributes - address. There are two common approaches to removing repeating groups
from un-normalized tables:
1) Remove the repeating groups by entering appropriate data in the empty columns of rows
containing the repeating data. This approach is referred to as 'flattening' the table, with this
approach, redundancy is introduced into the resulting relation, which is subsequently removed
during the normalization process.
2) Removing the repeating group by placing the repeating data, along with a copy of the
original key attribute(s), in a separate relation. A primary key is identified for the new
relation.
Example 1: (Multi valued).
Consider the contacts table, which contains the contact tracking information.
The above table contains a repeating group of the date and description of two
conversations.
The only advantage of designing the table like this is that it avoids the need
Example: To locate a conversation on a specific date here both the date columns have to be
searched.
To eliminate the repeating group, the group is moved to another table, which is then related
to the parent table. The primary key of the parent table (contact_ID) is stored in the second table.
First Normal form of contacts table:
Contacts Table
Contact _I D Name
Conversation table:
Contact _I D Contac_tdate Contact_desc
In First Normal form, every table should have a primary key, and each set of repeating groups should
appear in its own table.
Example2:
Consider the department relation.
A functional dependency, denoted by X —>Y, between two sets of attributes X and Y that are
subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
Example:
The relation holds the following functional dependencies.
FDI {SSN, PNumber} -> Hours.
A combination of SSN and PNumber values uniquely determines the number of Hours the employee
works on the project per week.
FD2 SSN -> EName.
The value of an employee's SSN value uniquely determines EName
FD3 PNumber -> {PName, PLocation}.
The value of a project's number uniquely determines the project Name and
location.
A functional dependency X—>Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold any more. Example:
{SSN, PNumber} ->Hours.
A functional dependency X —> Y is a partial dependency if some attribute A £ X can be
removed from X and the dependency still holds.
Example:
{SSN, PNumber} —> EName is partial because SSN —>EName holds.
Second normal form applies to relations with composite keys, ie. relations with a primary key
composed of two or more attributes. A relation with a single attribute primary key is automatically in
at least 2 NF.
A relation that is in first normal form and every non-primary-key attribute is fully
functionally dependent on the primary key is in Second Normal Form.
The Normalization of I NF relations to 2 NF involve the removal of partial dependencies. If
a partial dependency exists, remove the functionally dependent attributes from the relation by placing
them in a new relation along with a copy of their determinant.
5. Explain about third Normal Form:
Example:
To transform the EMPDept relation into third normal form, first remove the transitive
dependency by creating two new relations EDI and ED2.
6. Explain about Boyce codd Normal Form:
Relations that have redundant data may have problems called update anomalies, which are
classified as insertion, deletion or modification anomalies. These anomalies occur because, when the
data in one table is deleted or updated or new data is inserted, the related data is also not
correspondingly updated or deleted. One of the aims of the normalization is to remove the update
anomalies.
Boyce-codd Normal Form (BCNF) is based on functional dependencies that take into
account all candidate keys in a relation.
A candidate key is a unique identifier of each of the tuple.
For a relation with only one candidate key, third normal form and BCNF are
equivalent. A relation is in BCNF if any only if every determinant is a candidate key.
To test whether a relation is in BCNF, identify all the determinants and make sure that they
are candidate keys. A determinant is an attribute or.a group of attributes on which some other
attribute is fully functionally dependent.
The difference between third normal form and BCNF is that for a functional dependency
A ->B, the third normal form allows this dependency in a relation if 'B' is a primary-key attribute
and 'A' is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation,
'A' must be a candidate key.
(clientNo, interviewDate),
(staffNo, interviewDate, interviewtime),
and (roomNo, interviewDate, interviewTime).
Select (clientNo, interviewDate) to act as the primary key for this relation.
The client interview relation has the following functional dependencies:
fdl: clientNo, interviewdate->
interviewTime, staffNo, roomNo
fd2: staffNo, inerviewdate, interviewTime->
clientNo (Candidatekey).
Fd3: RoomNo, interviewdate, interviewtime —>
staffNo, clientNo(candidate)
fd4: staffNo, interviewdate—> roomNo
As functional dependencies fdl, fd2, and fd3 are all candidate keys for this relation, none of
these dependencies will cause problems for the relation.
This relation is not BCNF due to the presence of the (staffNo, interviewdate) determinant,
which is not a candidate key for the relation.
BCNF requires that all determinants in a relation must be a candidate key for the relat
Interview relation
clientNo Interviewdate interviewTime staffNo
Staffroom relation:
In this, members of staff called Ann Beech and David Ford work at branch B003, and
property owners called Carl Farrel and Tina Murphy are registered at branch B003.
However, as there is no direct relationship between members of staff and property owners.
The MVD in this relation
is branchNo-»Sname
branchNo->> OName
Forth-Normal Form
A relation that is in Boyce-codd normal form and contains no nontrivial multi-valued
dependencies is in Fourth Normal Form.
The normalization of BCNF relations to 4NF involves the removal of the MVD from the
relation by placing the attributes in a new relation along with a copy of the determinant(s).
Example:
Whenever we decompose a relation into two relations the resulting relations have the loss-
less join property. This property refers to the fact that we can rejoin the resulting relations to produce
the original relation.
Example:
The decomposition of the Branch staffowner relation
The decomposition of the Branch staffowner relation
branchNo SName OName
B003 Ann Beech Carol Farrel
B003 David Ford Carol Farrel
B003 Ann Beech Tina Murphy
BOOn David Ford Tina Murphy
into the Branchstaff
branchNo SName
B003 Ann Beech
B003 David Ford
and Branchowner
BranchNo OName
B003 Carol Farrel
B003 Tina Murphy
relation has the lossless-join property.
i.e. the original Branchstaffowner relation can be reconstructed by performing a join
operation.
Property item
propertyNo itemDescription
PG4 Bed
PG4 Chair
PG16 Bed
PG16 Bed
Item supplier
itemDescription suppiierNo
Bed SI
Chair S2
Property supplier
propertyNo suppiierNo
PG4 SI
PG4 S2
PG16 S2
The propertyitemsupplier relation with the form (A,B,C) satisfies the join dependency JD (R1(A,B),
R2(B,C). R3 (A, C)).
i.e. performing the join on all three will recreate the original propertyitemsupplier relation.
UNIT-IV
This is the initial state, the transaction stays in this state while it is executing.
• Partially committed
A transaction is in this state once the normal execution of the transaction cannot
proceed.
• Aborted
A transaction is said to be aborted when the transaction has rolled back and the
database is being restored to the consistent state prior to the start of the transaction.
• Committed
A transaction is in the committed state once it has been successfully executed and the database is
transformed into a new consistent state.
A transaction starts in the active state, A transaction contains a group of statements that form
a logical unit of work. When the transaction has finished executing the last statement, it enters the
partially committed state. At this point the transaction has completed execution, but it is still possible
that it may have to be aborted. This is because the actual output may still be in the main memory and
a hardware failure can still prevent the successful completion. The database system then writes
enough information to the disk. When the last of this information is written, the transaction enters the
committed states.
A transaction enters the failed state once the system determines that the transaction can no
longer proceed with its normal execution. This could be due to hardware failures or logical errors.
Such a transaction should be rolled back. When the roll back is complete, the transaction enters the
aborted state when a transaction aborts, the system has two options as follows:
There are properties that all transactions should possess. The four basic or so-called ACID,
properties of a transaction are
• Atomicity: The 'all or nothing' property. A transaction is an indivisible unit that is either
performed in its entirety or is not performed at all. It is the responsibility of the recovery
subsystem of the DBMS to ensure atomicity.
• Consistency: A transaction must transform the database from one consistent state to another
consistent state. It is the responsibility of both the DBMS and the application developers to
ensure consistency. The DBMS can ensure consistency by enforcing all the constraints that
have been specified on the database schema, such as integrity and enterprise constraints.
However in itself this is insufficient to ensure consistency.
Example:
A transaction that is intended to transfer money from one bank account to another and
the programmer makes an error in the transaction logic and debits one account but credits the
wrong account, then the database is in an inconsistent state.
• Isolation: Transactions execute independently of one another, i.e. the partial effects of
incomplete transactions should not be visible to other transactions. It is the responsibility of
the concurrency control subsystem to ensure isolation.
Centralized database require only one DP (Data processing). AH database operations take
place at only one site, and the consequences of database operations are immediately known to the
DBMS. In contrast distributed databases make it possible for a transaction to access data at several
sites. A final commit must not be issued until all sites have committed their parts of the transaction.
The two-phase commit protocol guarantees that if a portion of a transaction operation cannot be
committed; all changes made at the other sites participating in the transaction will be undone to
maintain a consistent database state.
Each DP maintains its own transaction log. The two-phase commit protocol requires that the
transaction entry log for each DP be written before the database fragment is actually updated.
Therefore, the two-phase commit protocol requires a Do-UNDO-REDO protocol and a write-ahead
protocol.
The DO-UNDO-REDO protocol is used by the DP to roll back and / or roll forward transactions with
the help of the system's transaction log entries. The DO-UNDO-REDO protocol defines three types
of operations:
• Do performs the operation and records the "before" and "after" values in the transaction log.
• UNDO reverses an operation, using the log entries written by the DO portion of the sequence.
• REDO redoes an operation, using the log entries written by the DO portion of the sequence.
To ensure that the DO, UNDO, and REDO operations, can survive a system crash while they
are being executed, a write-ahead protocol is used. The write-ahead protocol forces the log entry to
be written to permanent storage before the actual operation takes place.
The two-phase commit protocol defines the operations between two types of nodes: The
coordinator and one or more subordinates, or cohorts. The participating
nodes agree on a coordinator. Generally, the coordinator role is assigned to the node that
initiates the transaction.
However, different systems implement various, more sophisticated election
methods.
The protocol is implemented in two phases:
Phasel: Preparation
1) The coordinator sends a PREPARE TO COMMIT message to all subordinates.
2) The subordinates receive the message. Write the transaction log, using the write-ahead
protocol and send an acknowledgement (YES / PREPARED TO COMMIT or NO / NOT
PREPARED) message to the coordinator.
3) The coordinator makes sure that all nodes are ready to commit, or it aborts the action.
If all nodes are PREPARED TO COMMIT, the transaction goes to phase-2. If one or more
nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT message to all
subordinates.
Phase2: The Final Commit
1) The coordinator broadcasts a COMMIT message to all subordinates and waits for the
replies.
2) Each subordinate receives the COMMIT message, then updates the database using the DO
protocol.
3) The subordinates reply with a COMMITTED or NOT COMMITTED message to the
coordinator.
If one or more subordinates did not COMMIT, the coordinator sends an ABORT message,
thereby forcing them to UNDO all changes.
The objective of the two-phase commit is to ensure that all nodes commit their part of the
transaction, otherwise, the transaction is aborted. If one of the nodes fails to commit, the information
necessary to recover the database is in the transaction log, and the database can be recovered with the
DO-UNDO-REDO protocol.
4. Explain about Locking Protocols:
Locking is a procedure used to control concurrent access to data when one transaction is
accessing the database, a lock may deny access to other transactions to prevent incorrect results.
A transaction must obtain a read or write lock on a data item before it can perform a read or
write operation.
S X
S true False
X false False
The read lock is also called a shared lock. The write lock is also known as an exclusive lock.
The lock depending on its types gives or denies access to other operations on the same data item.
The basic rules for locking are
• If a transaction has a read lock on a data item, it can read the item but not update it.
• If a transition has a read lock on a data item, other transactions can obtain a read lock on the
data item, but no write locks.
• If a transaction has a write lock on a data item, it can both read and update the data item.
• If a transaction has a write lock on a data item, then other transactions cannot obtain either a
read lock or a write lock on the data item.
• All transactions that needs to access a data item must first acquire a read lock or write lock on
the data item depending on whether it is a ready only operation or not.
• If the data item for which the lock is requested is not already locked, the transaction is granted
the requested lock,
• If the item is currently lock, the DBMS determines what kind of lock is the current one. The
DBMS also finds out what lock is requested.
• If a read lock is requested on an item that is already under a read lock, then the requested will
be granted.
• If a read lock or a write lock is requested on an item that is already under a write lock, then
the request is denied and the transaction must wait until the lock is released. m
• A transaction continues to hold the lock until it explicitly releases it either during execution or
when it terminates.
• The effects of a write operation will be visible to other transactions only after the write lock is
released.
Live lock
Suppose a transaction T2 has a shared lock on a data item and another transaction T1 requests
and exclusive lock on the same data item. Ti will have to wait until T2 releases the lock. Mean
while, another transaction T3 request a shared lock on the data item. Since the lock request of
T3 is compatible to the lock granted to T2, T3 will be granted the shared lock on the data
item. At this point even if T2 releases the lock, Ti will have to wait until T3 also releases the
lock. The transaction T| can wait for an exclusive lock endlessly if other transactions continue
to request and acquire shared locks on the data item. The transaction T1 is starved (or is in
live lock), as it is not making any progress.
Two phase locking protocol requires that each transaction issue lock and unlock requests in
two phases:
1. Growing phase
A transaction may obtain locks, but may not release any lock.
2. Shrinking phase
A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once
the transaction releases a lock, it enters the shrinking phase, and it can issue not more lock requests.
The point in the schedule where the transaction has obtained its final lock (the end of its
growing phase) is called the lock point of the transaction.
Example 1:
T, T2
read - lock (y); read - lock (x);
read - item (y); read - item (x);
unlock (y); unlock ( x);
write - lock (x); write - lock (y);
read - item (x); read - item (y);
x = x + y; y = x + y;
write - item (x); write - item (y);
unlock ( x); unlock (y);
This is because the write- lock (x) operation follows the unlock (y) operation in Tj and
similarly the write - lock (y) operation follows the unlock (x) operation in T2.
Cascading rollback may occur under two-phase locking. Consider the partial schedule.
T5 T6 T7
Lock - x (A);
read (A);
lock-S(B);
read (B);
write (A);
Each transaction observes the two phase locking protocol, but the failure of T5 after the read
(A) step of T7 leads to cascading rollback of T6 and T7.
Cascading rollbacks can be avoided by a modification of two-phase locking called the strict
two-phase locking protocol. This protocol requires not only that locking be two phase, but also that
all exclusive mode locks taken by a transaction be held until that transaction commits.
Another variant of two - phase locking is the rigorous two-phase locking protocol* which requires
that all locks be held until the transaction commit.
If lock conversion is allowed then upgrading of locks (from read locked to write - locked)
must be done during the expanding phase, and downgrading of locks (from write-locked to read -
locked) must be done in the shrinking phase.
Strict two-phase locking and rigorous two-phase locking (with lock conversions) are used
extensively in commercial database systems.
In the concurrency - control schemes, each individual data item is used as the until on which
synchronization is performed.
There are circumstances, however where it would be advantages to group several data items,
and to treat them as one individual synchronization unit.
Example:
If a transaction Tj needs to access the entire database, and a locking protocol is used, then Tj
must lock each item in the database clearly, executing these locks is time-consuming. It would be
better if Tj could issue a single lock request to lock the entire database. If transaction Tj needs to
access only a few data items, it should not
Granularity
Granularity is the size of data items chosen as the unit of protection by a concurrency control
protocol.
Hierarchy of granularity
The granularity of locks is represented in a hierarchical structure where each node represents data
items of different sizes.
The root node represents the entire database, the level I Nodes represent files, the level 2
nodes represent pages, the level 3 nodes represent records, and the level 4 represent individual fields.
Database
File 1
File 2 File 3
Page 1
Page 2 Page 3
Record 1 Record 2
Field 1 Field 2
Whenever a node is locked, all its descendents are also locked. If another transaction requests
an incompatible lock on the same node, the DBMS clearly knows that the lock cannot be granted.
If another transaction requests a lock on any of the descendants of the locked node, the DBMS
checks the hierarchical path from the root to the requested node to determine if any of its ancestors
are locked before deciding whether to grant the lock. If it is already locked, it denies the request.
Additionally, a transaction may request a lock on a node and a descendant of the node is
already locked.
To make multiple granularity level locking, additional types of locks, called intention locks, are
needed,
The idea behind intention locks is for a transaction to indicate, along the path from the root to
the desired node, what type of lock it will require from one of the nodes descendants.
1. Intention - Shared (IS) indicates that a shared lock (S) will be requested on some
descendant node (S).
2. Intension - exclusive (IX) indicates that an exclusive lock) will be requested on some
descendant node (S).
3. Shared - intension - exclusive (SIX) indicates that the current node is locked in shared Mode
but an exclusive lock (S) will be requested on some descendant nodes (S).
IS IX S SIX X
SIX yes no no no no
X no no no no no
The multiple granularity locking (MGL) protocol consists of the following rules.
To ensure serializability with locking levels, a two-phase locking protocol is used as follows:
❖ No lock can be granted once any node has been unlocked.
❖ No node may be locked until its parent is locked by an intension lock.
❖ No node may be unlocked until all its descendants are unlocked.
The notation < lock - types > (< item>), is used to display the locking operations in the
schedule.
Deadlock occurs when each transaction T in a set of two or more transactions is waiting for some
item that is locked by some other transaction T in the set.
There is only one way to break deadlock: abort one or more of the transactions. This usually
involves undoing all the changes made by the aborted transactions (S).
Timeouts
A transaction that requests a lock will wait for only a system defined period of time. If the
lock has not been granted within this period, the lock request times out. In this case, the DBMS
assumes the transaction may be deadlocked, even though it may not be, and it aborts and
automatically restarts the transaction.
Deadlock Prevention
Wait - Die algorithm allows only an older transaction to wait for a younger one otherwise the
transaction is aborted and restarted with the same timestamp so that eventually it will become the
oldest active transaction and will not die.
Wound - wait, allows only a younger transaction can wait for an older one. if an older
transaction requests a lock held by a younger one the younger one is aborted.
Deadlock detection and Recovery
Deadlock detection is usually handled by the construction of a wait - for graph (WFG) that
shows the transaction dependencies, that is transaction Tj is dependent on Tj if transaction Tj holds
the lock on a data item that Tj is waiting for,
When a detection algorithm determines that a deadlock exists, the system must recover from
the deadlock. The most common solution is to roll back one or more transactions to break the
deadlock.
Starvation occurs when the same transaction is always chosen as the victim, and the
transaction can never complete.
Schedule is a sequence of the operations by a set of concurrent transactions that preserves the
order of the operations in each of the individual transactions.
Serial schedule is a schedule where the operations of each transaction are executed
consecutively without any interleaved operations from other transactions.
In a serial schedule, the transactions are performed in serial order, ie if Tj and T2 are
transactions, serial order would be Tj followed by T2 or T2 followed by Tj.
Non serial schedule is a schedule where the operations from a set of concurrent transactions
are interleaved.
The objective of serializability is to find non serial schedules that allow transactions to
execute concurrently without interfering with one another, and there by produce a database state that
could be produced by a serial execution.
Conflict serializability:
❖ It two transactions only read a data item, they do not conflict and order is not important.
❖ If two transactions either read or write completely separate data items, they do not conflict
and order is not important.
It one transaction writes a data item and another either reads or writes the same data
item, the order of execution is important.
The instructions I; and Ij conflict if they are operations by different transactions on the
same data item, and atleast one of these instructions is a write operation.
SI
T, T2
read (A)
write (A)
read (A)
write (A)
read (B)
write (B)
read (B)
write (B)
The write (A) instruction of Tj conflicts with the read (A) instruction of T 2. However, the
write (A) instruction of T2 does not conflict with the read (B) instruction of T t, because the two
instructions access different data items.
If a schedule S can be transformed into a schedule S' by a series of swaps of non conflicting
instructions, then S and S' are conflict equivalent.
Example
Consider the previous schedule swap the non conflicting instructions of previous schedule, (ie Si)
Swap the read (B) instruction of T| with the read (A) instruction of T2.
❖ Swap the write (B) instruction of T] with the write (A) instruction of T2.
❖ Swap the write (B) instruction of T] with the read (A) instruction of T2. The final result of
these swaps is a serial schedule.
S2
T, T2
read (A)
write (A)
read (B)
write (B)
read (A)
write (A)
read (B)
write (B)
Si is conflict equivalent to S2. The concept of conflict equivalence leads to the concept of conflict
serializability.
A schedule S is conflict serializable if it is conflict equivalents to a serial schedule.
Thus schedule S2 is conflict serializable, since it is conflict equivalent to the serial schedule S|.
View serializability:
The schedules S and S' are said to be view equivalent if the following conditions met:
For each data item x {< if transaction Ti reads the initial value of x in schedule S, then
transaction Tj must, in schedule S\ also read the initial value of x.
For each data item x , if transaction Tj executes read ( x) in schedule S, and if that
value was produced by a write (x) operation executed by transaction Tj, then the
read ( x) operation of transaction Tj must, in schedule S, also read the value of x
that was produced by the same write (x) operation of transaction T;.
For each data item xt the transaction that performs the final write (x) operation in
schedule S must perform the final write (x) operation in schedule S'.
Example
Schedule 1
T, T2
read (A)
write (A)
read (B)
write (B)
read (A)
write (A)
read (B)
write (B)
Schedule 2
Ti T2
read (A)
write (A)
read (A)
write (A)
read (B)
write (B)
read (B)
write (B)
The concept of view equivalent leads to the concept of view serializability. A schedule S is
view serializable if it is view equivalent to a serial schedule.
UNIT-V