DBMS Notes - All Units
In a DBMS, only the required information can be accessed through a simple query, instead of reading a complete file.
In a DBMS, related data resides in the same storage location and the same information is not
duplicated, which minimizes data redundancy and reduces data inconsistency.
A DBMS provides data consistency through normalization, so data-integrity problems are avoided.
Example of data integrity: User cannot insert string data in integer column
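This integrity check can be demonstrated with SQLite from Python. SQLite itself is flexibly typed, so the sketch below uses a CHECK constraint to enforce the integer-only rule; the table and column names are invented for illustration. Stricter engines typically reject such an insert without an explicit constraint.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The CHECK constraint rejects any value whose stored type is not integer
con.execute("""CREATE TABLE marks (
    stu_id INTEGER,
    score  INTEGER CHECK (typeof(score) = 'integer')
)""")
con.execute("INSERT INTO marks VALUES (1, 95)")            # valid row: accepted
try:
    con.execute("INSERT INTO marks VALUES (2, 'ninety')")  # string into integer column
except sqlite3.IntegrityError:
    print("rejected: non-integer value in integer column")
```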
Database security includes protecting the database and the various applications that access it.
Organizations must secure databases against cyber-security threats such as virus attacks, as well as
against misuse of data by those who have access to it.
Data can be shared: multiple users can access data from a single machine, and a DBMS provides
multiple user interfaces.
Unauthorized access is restricted in DBMS
A DBMS provides backup and recovery of data even if it is lost, including a crash-recovery
mechanism that protects the user from system failure.
You can easily query data in a database using the SQL language
Database-System Applications
Databases are widely used. Here are some representative applications:
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in factories,
inventories of items in warehouses and stores, and orders for items
Online retailers: For sales data noted above plus online order tracking, generation of recommendation lists,
and maintenance of online product evaluations.
• Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on
prepaid calling cards, and storing information about the communication networks
In the early days, database applications were built directly on top of file systems. A DBMS provides users with
a systematic way to create, retrieve, update and manage data. It is a middleware between the databases which
store all the data and the users or applications which need to interact with that stored database. A DBMS can
limit what data the end user sees, as well as how that end user can view the data, providing many views of a
single database schema.
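The "many views of a single schema" idea can be sketched with SQLite's CREATE VIEW; the employee table and its data below are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary INTEGER)")
con.execute("INSERT INTO employee VALUES (1, 'Smith', 50000), (2, 'Harry', 60000)")

# A view that hides the sensitive salary column from end users
con.execute("CREATE VIEW employee_public AS SELECT emp_id, name FROM employee")
print(con.execute("SELECT * FROM employee_public").fetchall())
# -> [(1, 'Smith'), (2, 'Harry')]
```

Different user groups can each be given a different view over the same stored schema.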
• Difficulty in accessing data. Suppose that one of the university clerks needs to find out the names of all students who
live within a particular postal-code area. The clerk asks the data-processing department to generate such a list. Because
the designers of the original system did not anticipate this request, there is no application program on hand to meet it.
There is, however, an application program to generate the list of all students. The university clerk now has two choices:
either obtain the list of all students and extract the needed information manually or ask a programmer to write the
necessary application program. Both alternatives are obviously unsatisfactory.
• Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application
programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints.
• Atomicity problems. A transaction must happen in its entirety or not at all; this requirement is called atomicity. It
is difficult to ensure atomicity in a conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many systems
allow multiple users to update the data simultaneously, which may lead to concurrent access anomalies.
• Security problems. Not every user of the database system should be able to access all the data.
Uses of DBMS
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data.
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers
to use complex data structures to represent data in the database. Since many database-system users
are not computer trained, developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system:
• Physical level. The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the database,
and what relationships exist among those data.
Although implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this complexity.
This is referred to as physical data independence.
Database administrators, who must decide what information to keep in the database, use the
logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database.
Figure below shows the relationship among the three levels of abstraction.
View of data
Data models
A data model gives shape and structure to the stored data; it makes the meaning of the data easier
to understand. In simple words, we can define a data model as “a collection of high-level data
descriptions that hide many low-level storage details”. A data model can also be defined as “a
collection of conceptual tools for describing data, data relationships and consistency constraints”.
A DBMS allows a user to define the stored data in terms of a data model.
There are various types of data model but the relational model is the most widely used model.
The different Data Models in DBMS are:
1. Hierarchical Model
2. Network Model
3. Relational Model
4. Object-Oriented Model
5. Object-Relational Model
6. Entity-Relationship Model
1. Hierarchical Model
The hierarchical model was the first DBMS model and one of the oldest database models. The
general shape of this model is like an organizational chart (Example-2). A node on the chart
represents a particular entity, and the terms parent and child are used in describing the model. It
organizes the data in a hierarchical tree structure: the hierarchy starts from the root, which holds
the root data, and expands in the form of a tree, adding child nodes to parent nodes.
This model used the tree as its basic structure. A tree is a data structure that consists of a hierarchy of
nodes, with a single node called the root, at the highest level. A node represents a particular entity.
A node may have any number of children, but each child node may have only one parent node.
Syntax
2. Network Model
This model is an extension of the hierarchical model. It was the most popular model before the
relational model. This model is the same as the hierarchical model; the only difference is that a
record can have more than one parent. The network model was developed to overcome the limited
scope of hierarchical model. It replaces the hierarchical tree with a graph.
In Network Model, multiple parent-child relationships are used. The network model uses a network
structure, which is a data structure of nodes and branches.
The main difference between the network model and the hierarchical model is that the network
model permits a child node to have more than one parent node, whereas the hierarchical model does
not allow a child node to have multiple parent nodes.
Example-1:
3. Relational Model
Relational Model is the most widely used model. In this model, the data is maintained in the form
of a two-dimensional table. All the information is stored in the form of rows and columns. The
basic structure of a relational model is tables. So, the tables are also called relations in the
relational model.
Rows hold the individual records, for example a student's roll number, name and course; the
columns are also known as attributes. An attribute represents a characteristic of a person, place or
thing, for example the Salary attribute in the example given below.
The relational data model is the most popular and extensively used data model. It allows the data
to be stored in tables called relations. The relations are normalized, and the normalized attribute
values are atomic. Each row in a relation is called a tuple and contains a unique record; the
attributes are the values in each column, all drawn from the same domain.
Popular examples of standard relational databases include Microsoft SQL Server, Oracle Database,
MySQL and IBM DB2.
4. Object-Oriented Model
The object-oriented model is based on a collection of objects. An object contains values stored in
variables within the object. An object also contains code that operates on the object. This code is
called methods. Objects that contain the same types of values and the same methods are grouped
together into classes. A class may be viewed as a definition for objects.
The only way in which one object can access the data of another object is by invoking a method of
that other object. This action is called sending a message to the object. The object-oriented data
model is one of the most developed data models and can hold video, graphics and audio files.
In this model, both the data and relationship are present in a single structure known as an object.
We can store audio, video, images, etc. in the database, which was not possible in the relational
model (although audio and video can be stored in a relational database, it is advisable not to). In
this model, two or more objects are connected through links, and these links relate one object to
other objects.
5. Object-Relational Model
An object-relational model is a combination of an object-oriented model and a relational model. It
supports objects, classes, inheritance etc. just like object-oriented models, and it supports data
types, tabular structures etc. like the relational data model.
One of the major goals of Object relational data model is to close the gap between relational database
and the object oriented database frequently used in many programming languages such as C++, C#,
Java etc.
This model offers many advanced features: for example, we can build complex data types from the
existing data types according to our requirements. The problem with this model is that it can become
complex and difficult to handle.
Example: ORDBMSs include PostgreSQL and Oracle.
6. Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In this model,
we represent the real-world problem in the pictorial form to make it easy to understand. It is also
very easy for the developers to understand the system by just looking at the ER diagram. We use the
ER diagram as a visual tool to represent an ER Model.
An ER model is the logical representation of data as objects and relationships among them. These
objects are known as entities, and a relationship is an association among these entities. A set of
attributes describes each entity. The ER model is widely used in database design and is best suited
for the conceptual design of a database.
1. Entity
2. Attribute
3. Relationship
Syntax:
Example:
Database languages
Database users
Database users are categorized based upon their interaction with the database.
2. System Analyst :
System Analyst is a user who analyzes the requirements of parametric end users. They check
whether all the requirements of end users are satisfied.
3. Sophisticated Users :
Sophisticated users can be engineers, scientists or business analysts who are familiar with the
database. They can develop their own database applications according to their requirements. They
do not write program code, but they interact with the database by writing SQL queries directly
through the query processor.
4. Database Designers :
Database designers are the users who design the structure of the database, which includes tables,
indexes, views, constraints, triggers and stored procedures. They control what data must be stored
and how the data items are to be related.
5. Application Programmers :
Application programmers are the back-end programmers who write the code for the application
programs. They are computer professionals. These programs can be written in programming
languages such as Visual Basic, Developer, C, FORTRAN, COBOL etc.
Various components of overall DBS architecture
1. Query Processor: It interprets the requests (queries) received from end user via an application
program into instructions. It also executes the user request which is received from the DML
compiler.
Query Processor contains the following components
DML Compiler: It processes the DML statements into low level instruction (machine
language), so that they can be executed.
DDL Interpreter: It processes the DDL statements into a set of table containing meta data
(data about data).
Embedded DML Pre-compiler: It processes DML statements embedded in an application
program into procedural calls.
Query Optimizer: It optimizes the instructions generated by the DML compiler by choosing an efficient evaluation plan before execution.
2. Storage Manager: Storage Manager is a program that provides an interface between the data
stored in the database and the queries received. It is also known as Database Control System. It
maintains the consistency and integrity of the database by applying the constraints and executing
the DCL statements. It is responsible for updating, storing, deleting, and retrieving data in the
database.
It contains the following components –
Authorization Manager: It ensures role-based access control, i.e., it checks whether a particular
user is privileged to perform the requested operation or not.
Integrity Manager: It checks the integrity constraints when the database is modified.
File Manager: It manages the file space and the data structure used to represent
information in the database.
Buffer Manager: It is responsible for cache memory and the transfer of data between the
secondary storage and main memory.
Data Dictionary: It contains information about the structure of every database object. It is the
repository of information that governs the metadata.
ER model
o ER model stands for Entity-Relationship model. It is a high-level data model, used to define the
data elements and the relationships among them for a specified system.
For example, suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc. The address can be another entity with attributes like
city, street name, pin code, etc and there will be a relationship between them.
Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity can be represented
as rectangles.
Consider an organization as an example- manager, product, employee, department etc. can be taken
as an entity.
a. Weak Entity
An entity that depends on another entity is called a weak entity. A weak entity does not contain any
key attribute of its own and is represented by a double rectangle.
2. Attribute
The attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of other attributes is known as a composite attribute. A composite
attribute is represented by an ellipse that is connected to the ellipses of its component attributes.
c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A double oval
is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from other attributes is known as a derived attribute. It is
represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another attribute like Date
of birth.
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to
represent the relationship.
a. One-to-One Relationship
When only one instance of an entity is associated with the relationship, then it is known as one to one
relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
When only one instance of the entity on the left, and more than one instance of an entity on the right
associates with the relationship then this is known as a one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made by only one
specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left, and only one instance of an entity on the right
associates with the relationship then it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left, and more than one instance of an entity on the
right associates with the relationship then it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
The relational model was proposed by E.F. Codd to model data in the form of relations or tables.
After designing the conceptual model of a database using an ER diagram, we need to convert the
conceptual model into the relational model, which can be implemented using any RDBMS language
such as Oracle SQL, MySQL etc. So we will see what the relational model is.
Relational Model represents how data is stored in Relational Databases. A relational database
stores data in the form of relations (tables).
The relational model represents data as tables with columns and rows. Each row is known as a
tuple, and each column of a table has a name, known as an attribute.
Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a
domain, dom(Ai)
Relational instance: In the relational database system, the relational instance is represented by a finite
set of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all columns
or attributes.
Relational key: A relational key is a set of one or more attributes that can uniquely identify a row
in the relation.
The DDL Commands in Structured Query Language are used to create and modify the schema of the
database and its objects. The syntax of DDL commands is predefined for describing the data. The
commands of Data Definition Language deal with how the data should exist in the database.
1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command
CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other database objects.
Example 1: This example describes how to create a new database using the CREATE DDL
command.
Suppose, you want to create a Books database in the SQL database. To do this, you have to write the
following DDL Command:
Example 2: This example describes how to create a new table using the CREATE DDL
command.
Example:
Suppose, you want to create a Student table with five columns in the SQL database. To do this, you
have to write the following DDL command:
CREATE TABLE Student
(
  Stu_Id INT,            -- first three column definitions reconstructed; the original listing was truncated
  First_Name VARCHAR(20),
  Last_Name VARCHAR(20),
  Age INT,
  Marks INT
);
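The statement can be tried out with Python's built-in sqlite3 module. This is only a sketch; any column names not shown in the truncated listing above are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Student (
    Stu_Id     INTEGER,
    First_Name TEXT,   -- assumed column
    Last_Name  TEXT,   -- assumed column
    Age        INTEGER,
    Marks      INTEGER
)""")
# PRAGMA table_info lists one row per column of the new table
cols = [row[1] for row in con.execute("PRAGMA table_info(Student)")]
print(cols)  # -> ['Stu_Id', 'First_Name', 'Last_Name', 'Age', 'Marks']
```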
DROP Command
DROP is a DDL command used to delete/remove the database objects from the SQL database. We
can easily remove the entire table, view, or index from the database using this DDL command.
Example 1: This example describes how to remove a database from the SQL database.
Suppose, you want to delete the Books database from the SQL database. To do this, you have to write
the following DDL command:
Example 2: This example describes how to remove the existing table from the SQL database.
Suppose, you want to delete the Student table from the SQL database. To do this, you have to write
the following DDL command:
Example 3: This example describes how to remove the existing index from the SQL database.
Suppose, you want to delete the index_city from the SQL database. To do this, you have to write the
following DDL command:
ALTER Command
ALTER is a DDL command which changes or modifies the existing structure of the database, and it
also changes the schema of database objects.
We can also add and drop constraints of the table using the ALTER command.
Example 1: This example shows how to add a new field to the existing table.
Suppose, you want to add the 'Father's_Name' column in the existing Student table. To do this, you
have to write the following DDL command:
Example 2: This example describes how to remove the existing column from the table.
Suppose, you want to remove the Age and Marks column from the existing Student table. To do this,
you have to write the following DDL command:
Example 3: This example describes how to modify the existing column of the existing table.
Suppose, you want to change the character size of the Last_Name field of the Student table. To do
this, you have to write the following DDL command:
TRUNCATE Command
TRUNCATE is another DDL command which deletes or removes all the records from the table.
This command also removes the space allocated for storing the table records.
Example
Suppose, you want to delete the record of the Student table. To do this, you have to write the
following TRUNCATE DDL command:
The above query successfully removed all the records from the student table. Let's verify it by using
the following SELECT statement:
RENAME Command
RENAME is a DDL command which is used to change the name of the database table.
Example
This query changes the name of the table from Student to Student_Details.
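In SQLite the rename is spelled ALTER TABLE ... RENAME TO (MySQL also accepts RENAME TABLE); a minimal sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (Stu_Id INTEGER, Marks INTEGER)")
con.execute("ALTER TABLE Student RENAME TO Student_Details")

# The catalog now lists only the new table name
tables = [r[0] for r in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # -> ['Student_Details']
```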
The DML commands in Structured Query Language change the data present in the SQL database.
We can easily access, store, modify, update and delete the existing records from the database using
DML commands.
1. SELECT Command
2. INSERT Command
3. UPDATE Command
4. DELETE Command
SELECT is the most important data manipulation command in Structured Query Language. The
SELECT command shows the records of the specified table. It also shows the particular record of a
particular column by using the WHERE clause.
Here, column_Name_1, column_Name_2, ….., column_Name_N are the names of those columns
whose data we want to retrieve from the table.
If we want to retrieve the data from all the columns of the table, we have to use the following
SELECT command:
Example 1: This example shows all the values of every column from the table.
This SQL statement displays the following values of the student table:
Example 2: This example shows all the values of a specific column from the table.
This SELECT statement displays all the values of Emp_Salary and Emp_Id column
of Employee table:
Example 3: This example describes how to use the WHERE clause with the SELECT DML
command.
BCA1001 Abhay 80
BCA1003 Bheem 80
BCA1005 Sumit 80
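A runnable version of this WHERE-clause query, using sqlite3 with assumed column names (one extra filler row is added so the filter has something to exclude):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (Stu_Id TEXT, Stu_Name TEXT, Stu_Marks INTEGER)")
con.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    ("BCA1001", "Abhay", 80),
    ("BCA1002", "Anuj", 65),     # filler row, filtered out by the WHERE clause
    ("BCA1003", "Bheem", 80),
    ("BCA1005", "Sumit", 80),
])
for row in con.execute("SELECT * FROM Student WHERE Stu_Marks = 80"):
    print(row)
```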
INSERT is another most important data manipulation command in Structured Query Language,
which allows users to insert data in database tables.
Example 1: This example describes how to insert the record in the database table.
Let's take the following student table, which consists of only 2 records of the student.
101 Ramesh 92 20
201 Jatin 83 19
Suppose, you want to insert a new record into the student table. For this, you have to write the
following DML INSERT command:
UPDATE is another most important data manipulation command in Structured Query Language,
which allows users to update or modify the existing data in database tables.
Suppose, you want to update Stu_Marks and Stu_Age of that student whose Stu_Id is 103 and 202.
To do this, you have to write the following DML Update command:
UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id IN (103, 202);
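Note that a row has only one Stu_Id, so a predicate written as Stu_Id = 103 AND Stu_Id = 202 would match nothing; IN (or OR) selects both rows. A sqlite3 sketch with assumed sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Marks INTEGER, Stu_Age INTEGER)")
con.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(101, 92, 20), (103, 70, 19), (202, 65, 20)])

# IN matches a row whose Stu_Id is either 103 or 202
con.execute("UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id IN (103, 202)")
print(con.execute("SELECT * FROM Student ORDER BY Stu_Id").fetchall())
# -> [(101, 92, 20), (103, 80, 21), (202, 80, 21)]
```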
DELETE is a DML command which allows SQL users to remove single or multiple existing records
from the database tables.
This Data Manipulation Language command does not necessarily remove data permanently: unlike
TRUNCATE, a DELETE can be rolled back before the transaction is committed. We use the WHERE
clause with the DELETE command to select specific rows from the table.
Suppose, you want to delete that product from the Product table whose Product_Id is P203. To do
this, you have to write the following DML DELETE command:
Suppose, you want to delete the records of those students whose Marks are greater than 70. To do
this, you have to write the following DML DELETE command:
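A plausible form of that statement, run through sqlite3 with invented sample rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (Stu_Id INTEGER, Marks INTEGER)")
con.executemany("INSERT INTO Student VALUES (?, ?)",
                [(101, 92), (201, 68), (301, 75)])

con.execute("DELETE FROM Student WHERE Marks > 70")        # removes rows 101 and 301
print(con.execute("SELECT * FROM Student").fetchall())     # -> [(201, 68)]
```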
Data integrity refers to the overall accuracy, completeness, and reliability of data. It can be specified
by the lack of variation between two instances or consecutive updates of a record, indicating that
your information is error-free. It also corresponds to the security and integrity controls and methods
of regulatory compliance.
Data security is important for protecting information from unauthorized access or modification,
which could lead to data loss or corruption.
Data security refers to the protection of data, while data integrity refers to the trustworthiness of
data. Data security focuses on how to minimize the risk of leaking intellectual property, business
documents, healthcare data, emails, trade secrets, and more.
Data integrity and data security are both important for ensuring the safety and reliability of
information. Data integrity is the process of ensuring that data is accurate and complete, while data
security is the process of protecting data from unauthorized access.
UNIT - II
Syllabus:
Relational Approach: Relational algebra and calculus, Relational algebra, selection and
projection, set operations, renaming, joins, division, examples of algebra queries, relational
calculus: Tuple relational calculus, Domain relational calculus, expressive power of algebra and
calculus.
Relational algebra is a procedural query language. It gives a step by step process to obtain the result
of the query. It uses operators to perform queries.
What is the need of relational algebra?
The main application of relational algebra is to provide a theoretical foundation for relational
databases, particularly query languages for such databases, chief among which is SQL. Relational
databases store tabular data represented as relations.
1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
Notation: σ p(r)
Where:
BRANCH_NAME LOAN_NO AMOUNT
Input:
σ BRANCH_NAME="Perryridge" (LOAN)
Output:
2. Project Operation:
o This operation shows only the attributes that we wish to appear in the result; the rest of the
attributes are eliminated from the table.
o It is denoted by ∏.
Where
NAME STREET CITY
Input:
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
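Selection and projection can be mimicked in plain Python over a relation modelled as a list of dicts; the CUSTOMER rows below are a small assumed sample consistent with the output table above.

```python
# A toy CUSTOMER relation; STREET is an assumed extra attribute
customer = [
    {"NAME": "Jones", "STREET": "Main",  "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
]

def select(rel, pred):
    """sigma: keep only the tuples satisfying the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi: keep only the listed attributes, eliminating duplicates."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

print(select(customer, lambda t: t["CITY"] == "Rye"))
print(project(customer, ["NAME", "CITY"]))
```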
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are
either in R or in S, or in both R and S.
o It eliminates duplicate tuples. It is denoted by ∪.
1. Notation: R ∪ S
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that
are common to both R and S.
o It is denoted by intersection ∩.
1. Notation: R ∩ S
Input:
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that are
in R but not in S.
o It is denoted by minus (−).
1. Notation: R − S
Input:
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
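Because relations are sets of tuples, union, intersection and difference behave exactly like Python's set operators; the names below are taken from the DEPOSITOR and BORROW relations above.

```python
depositor = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
borrower  = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

print(sorted(depositor | borrower))   # union: every customer name, duplicates eliminated
print(sorted(depositor & borrower))   # intersection -> ['Jones', 'Smith']
print(sorted(borrower - depositor))   # difference -> ['Curry', 'Hayes', 'Jackson', 'Williams']
```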
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other table. It
is also known as a cross product.
o It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
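The cross product pairs every EMPLOYEE row with every DEPARTMENT row, which is exactly what itertools.product computes:

```python
from itertools import product

employee   = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: concatenate each employee tuple with each department tuple
cross = [e + d for e, d in product(employee, department)]
print(len(cross))   # -> 9 (3 rows x 3 rows)
print(cross[0])     # -> (1, 'Smith', 'A', 'A', 'Marketing')
```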
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Relational Calculus
Relational calculus is a non-procedural query language. It uses mathematical predicate calculus
instead of algebra. It describes what result a query should produce, whereas relational algebra gives
the method to obtain the result: it informs the system what to do with the relation, but does not
inform how to perform it.
A tuple relational calculus is a non-procedural query language that specifies which tuples to select
from a relation. It can select tuples with a range of values or with particular attribute values. The
resulting relation can have one or more tuples.
The variable which is used in the condition is called tuple variable.
o The tuple relational calculus is specified to select the tuples in a relation. In TRC, filtering variable
uses the tuples of a relation.
It is denoted as below:
{t | EMPLOYEE (t) and t.SALARY>10000} – implies that it selects the tuples from EMPLOYEE
relation such that resulting employee tuples will have salary greater than 10000. It is example of
selecting a range of values.
{t | EMPLOYEE (t) AND t.DEPT_ID = 10} – this select all the tuples of employee name who work
for Department 10.
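The two tuple-calculus expressions map naturally onto Python list comprehensions; the EMPLOYEE tuples below are invented sample data.

```python
EMPLOYEE = [
    {"NAME": "Riya", "SALARY": 55000, "DEPT_ID": 10},
    {"NAME": "Zayn", "SALARY": 9000,  "DEPT_ID": 10},
    {"NAME": "Max",  "SALARY": 53000, "DEPT_ID": 24},
]

# {t | EMPLOYEE(t) and t.SALARY > 10000}
high_paid = [t for t in EMPLOYEE if t["SALARY"] > 10000]

# {t | EMPLOYEE(t) and t.DEPT_ID = 10}
dept_10 = [t for t in EMPLOYEE if t["DEPT_ID"] == 10]

print([t["NAME"] for t in high_paid])  # -> ['Riya', 'Max']
print([t["NAME"] for t in dept_10])    # -> ['Riya', 'Zayn']
```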
In above example t.SALARY and t.DEPT_ID are tuple variables. In the first example above, we have
specified the condition t.SALARY >10000. What is the meaning of it?
For all SALARY > 10000, display the employees. Here SALARY is called a bound variable. Any
tuple variable with a 'for all' (∀) or 'there exists' (∃) condition is called a bound variable. Here, for
any range of values of SALARY greater than 10000, the meaning of the condition remains the same.
Bound variables are those ranges of tuple variables whose meaning will not change if the tuple
variable is replaced by another tuple variable.
In the second example, we have used DEPT_ID= 10. That means only for DEPT_ID = 10 display
employee details. Such variable is called free variable. Any tuple variable without any ‘For All’ or
‘there exists’ condition is called Free Variable. If we change DEPT_ID in this condition to some
other variable, say EMP_ID, the meaning of the query changes.
For example, if we change the condition to EMP_ID = 10, the query will produce a different result set.
Free variables are those tuple variables whose meaning changes if the tuple variable is
replaced by another tuple variable.
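The idea of a TRC query as a predicate over whole tuples can be sketched in Python. This is an illustrative sketch only; the EMPLOYEE relation and its sample tuples are made up for the example.

```python
# Illustrative sketch: a TRC query is a predicate tested against each tuple.
# The EMPLOYEE relation below is hypothetical sample data.
from collections import namedtuple

Employee = namedtuple("Employee", ["EMP_ID", "NAME", "SALARY", "DEPT_ID"])

EMPLOYEE = [
    Employee(1, "Riya", 15000, 10),
    Employee(2, "Zayn", 8000, 10),
    Employee(3, "Sia", 12000, 20),
]

# {t | EMPLOYEE(t) AND t.SALARY > 10000}
high_paid = [t for t in EMPLOYEE if t.SALARY > 10000]

# {t | EMPLOYEE(t) AND t.DEPT_ID = 10}
dept_10 = [t for t in EMPLOYEE if t.DEPT_ID == 10]

print([t.NAME for t in high_paid])  # employees earning more than 10000
print([t.NAME for t in dept_10])    # employees of department 10
```

The comprehension condition plays the role of the formula after the `|`; the loop variable `t` is the tuple variable.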
o The second form of relation is known as Domain relational calculus. In domain relational calculus,
filtering variable uses the domain of attributes.
o Domain relational calculus uses the same operators as tuple calculus. It uses the logical connectives ∧
(and), ∨ (or) and ¬ (not).
o It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable.
Notation:
{ <x1, x2, ..., xn> | P(x1, x2, ..., xn) }
where x1, x2, ..., xn are domain variables and P is a formula (condition) over those variables.
In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in
TRC, mentioned above).
Here, <x1, x2, ..., xn> ϵ r, where 'r' is a relation on 'n' attributes and x1, x2, ..., xn are
domain variables.
Example-1: Find the FNo, FName, DNo, Qual and Salary of all the faculty who
are drawing more than 30000.
{ <f, n, d, q, s> | <f, n, d, q, s> ∈ FACULTY ∧ s > 30000 } (assuming the relation is named FACULTY)
Output:
FNo FName DNo Qual Salary
22 Riya 21 Ph.D 55000
24 Priya 22 M. Tech 50000
25 Zayn 22 M. Tech 42000
30 Sia 23 M. Tech 32000
33 Max 24 Ph.D 53000
35 Fred 24 Ph.D 52000
40 Trent 25 M. Tech 34000
UNIT - III
Syllabus:
A NoSQL database provides a mechanism for storage and retrieval of data other than the tabular
relations model used in relational databases. A NoSQL database doesn't use tables for storing data. It is
generally used for big data and real-time web applications.
In the early 1970s, flat file systems were used. Data were stored in flat files, and the biggest problem
with flat files was that each company implemented its own format; there were no standards. It was very
difficult to store data in, and retrieve data from, such files because there was no standard way to store
data.
Then the relational database was created by E.F. Codd, and these databases answered the question of
having no standard way to store data. But later the relational database also ran into a problem: it could
not handle big data. Because of this problem, there was a need for a database that could handle every
type of problem, and so the NoSQL database was developed.
NoSQL is an approach to database design that enables the storage and querying of data outside the
traditional structures found in relational databases. While it can still store data found within relational
database management systems (RDBMS), it just stores it differently compared to an RDBMS. The
decision to use a relational database versus a non-relational database is largely contextual, and it
varies depending on the use case.
NoSQL is also a type of distributed database, which means that information is copied and stored on
various servers, which can be remote or local. This ensures availability and reliability of data. If some
of the data goes offline, the rest of the database can continue to run.
Today, companies need to manage large data volumes at high speeds with the ability to scale up
quickly to run modern web applications in nearly every industry. In this era of growth within
cloud, big data, and mobile and web applications, NoSQL databases provide that speed and
scalability, making them a popular choice for performance and ease of use.
Advantages of NoSQL
o It supports query language.
o It provides fast performance.
o It provides horizontal scalability.
Types of NoSQL databases:
1. Document databases.
2. Key-value stores.
3. Column-oriented databases.
4. Graph databases.
1. Document databases:
A document database stores data in JSON, BSON, or XML documents (not Word documents or
Google Docs, of course). In a document database, documents can be nested. Particular elements can
be indexed for faster querying.
Documents can be stored and retrieved in a form that is much closer to the data objects used in
applications, which means less translation is required to use the data in an application. SQL data must
often be assembled and disassembled when moving back and forth between applications and storage.
Document databases are popular with developers because they have the flexibility to rework their
document structures as needed to suit their application, shaping their data structures as their
application requirements change over time. This flexibility speeds development because, in effect,
data becomes like code and is under the control of developers. In SQL databases, intervention by
database administrators may be required to change the structure of a database.
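The flexibility described above can be sketched with an in-memory Python dict standing in for a real document database. The collection name and fields are hypothetical; a real store (e.g. MongoDB) adds persistence, indexing and a query language.

```python
# Sketch of the document model: schemaless, nested JSON-like documents.
# The "customers" collection and its fields are made-up sample data.
import json

customers = {}  # a "collection" of documents keyed by _id

def insert(doc):
    customers[doc["_id"]] = doc

# Documents can be nested and need no fixed schema.
insert({"_id": 1, "name": "Alice", "orders": [{"item": "pen", "qty": 2}]})
insert({"_id": 2, "name": "Bob"})  # no 'orders' field at all

# The stored form is already close to the application's own objects,
# so little translation ("assembly/disassembly") is needed.
doc = customers[1]
print(json.dumps(doc))
print(doc["orders"][0]["item"])
```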
2. Key-value stores:
The simplest type of NoSQL database is a key-value store. Every data element in the database is
stored as a key-value pair consisting of an attribute name (or "key") and a value. In a sense, a key-value
store is like a relational database with only two columns: the key or attribute name (such as
"state") and the value (such as "Alaska").
Use cases include shopping carts, user preferences, and user profiles.
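At its core the key-value model is just get/put by key, as this minimal sketch shows. The keys and values are made up; real stores such as Redis or DynamoDB add persistence, expiry and distribution.

```python
# Minimal key-value store sketch: the whole interface is put/get by key.
store = {}

def put(key, value):
    store[key] = value

def get(key, default=None):
    return store.get(key, default)

put("state", "Alaska")                 # the two-column analogy: key, value
put("cart:user42", ["book", "lamp"])   # e.g. a shopping-cart use case

print(get("state"))
print(get("cart:user42"))
```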
3. Column-oriented databases:
While a relational database stores data in rows and reads data row by row, a column store is
organized as a set of columns. This means that when you want to run analytics on a small number of
columns, you can read those columns directly without consuming memory with the unwanted data.
Columns are often of the same type and benefit from more efficient compression, making reads even
faster. Columnar databases can quickly aggregate the value of a given column (adding up the total
sales for the year, for example). Use cases include analytics.
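The row-versus-column layout can be sketched with plain Python lists. The sales figures are made-up sample data; the point is only that the aggregate touches a single column array instead of every full row.

```python
# Row store vs column store, sketched with plain Python structures.

# Row-oriented: each record is stored together.
rows = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 250},
    {"id": 3, "region": "east", "sales": 175},
]

# Column-oriented: one array per column. An aggregate over one column
# reads only that array, not the unwanted columns of every row.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "sales": [100, 250, 175],
}

total_sales = sum(columns["sales"])  # reads a single column
print(total_sales)
```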
4. Graph databases:
A graph database focuses on the relationship between data elements. Each element is stored as a node
(such as a person in a social media graph). The connections between elements are called links or
relationships. In a graph database, connections are first-class elements of the database, stored
directly. In relational databases, links are implied, using data to express the relationships.
A graph database is optimized to capture and search the connections between data elements,
overcoming the overhead associated with JOINing multiple tables in SQL.
Very few real-world business systems can survive solely on graph queries. As a result graph
databases are usually run alongside other more traditional databases.
Use cases include fraud detection, social networks, and knowledge graphs.
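The "connections as first-class elements" idea can be sketched with an adjacency structure: each relationship is stored directly and traversed without a join. The names are made-up sample data.

```python
# Sketch of a graph database's core idea: edges stored directly
# (the social-network data below is hypothetical).
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Carol"},
}

def friends_of_friends(person):
    # Traverse stored links directly; no table join is needed.
    direct = friends.get(person, set())
    result = set()
    for f in direct:
        result |= friends.get(f, set())
    return result - direct - {person}

print(friends_of_friends("Alice"))
```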
SQL                                          NoSQL
Relational database management system        Non-relational or distributed database
(RDBMS)                                      system
These databases are best suited for          These databases are not so good for
complex queries                              complex queries
Pitfalls of relational database design:
1. Poor Design/Planning:
Improper planning of the database leaves you with no time to go back and fix errors, and a poorly
designed database is more exposed to malicious cyber attacks. Therefore, consider sitting down with
paper and drawing a data model as per the business requirements.
Developers can avoid poor planning/design by drawing a data model and reviewing it against the
business requirements before implementation.
2. Ignoring Normalization:
Normalization groups directly related data under a single table, while indirectly
related data is put under separate tables. These tables are connected with a logical relationship
between child and parent tables.
Lack of normalization reduces data consistency and leads to duplication of data, because the same
data may end up stored in more than one place.
3. Redundant Records:
Redundancy in a database is a condition in which the same set of data is stored at two different
places. This means two different spots or two separate fields in software. This is a problem for
developers because they have to keep several versions of data updated.
Redundant records lead to unnecessary growth in database size, which in turn decreases the
efficiency of the database and causes data corruption. Hence, the best option is to avoid preparing
redundant records, unless it’s essential for backups.
4. Poor Naming Standards:
The purpose of naming is to allow all future developers or programmers to easily understand the
components of the database and what their use was. This saves time for developers, and they need not
go through documents to understand the meaning of a name.
5. Lack of Documentation
Lack of documentation leads to the loss of vital information or a tedious handover process to a new
programmer.
Consider documenting everything you know from day one because any documentation is better than
none. Well-organized documentation throughout the project helps to wrap up everything smoothly
and in turn, helps build robust software.
The goal of documentation is to provide information to the support programmer to detect bugs and
fix them. Documentation starts with naming columns, objects, and tables in a database model.
Clarity of purpose helps to develop database designs, record size, create entities, choose database
engine management policies, and formats. By ignoring these, designs will be flawed fundamentally,
even though they are mathematically and structurally right.
8. Insufficient Indexing:
Insufficient indexing comes from a SQL configuration whose performance is affected due to
improper, excessive, or missing indexes. In case indexes aren’t created properly, the SQL server goes
through more records to retrieve the data that’s requested by the query.
Overall, indexing is a complex decision because too much indexing is as bad as too little indexing, as it
impacts the final outcome.
9. Lack of Testing:
The lack of database testing fails to give information on whether the data values stored and received
in the database are valid or not. Testing helps to save transaction data, avoids data loss, and prevents
unauthorized access to information.
For a decomposition of R into R1 and R2 to be lossless, the intersection of R1 and R2 cannot be null:
the sub-relations must contain a common attribute, and that common attribute must contain unique
data (i.e., be a key of at least one sub-relation).
Functional dependencies:
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant; the right side is known as the
dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.
Emp_Id → Emp_Name
Example:
ID → Name,
Name → DOB
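Whether an FD X → Y holds in a given table instance can be checked mechanically: every pair of rows that agrees on X must agree on Y. A small sketch, with made-up employee rows:

```python
# Sketch: test whether a functional dependency X -> Y holds in a table
# instance. The employee rows below are hypothetical sample data.
def fd_holds(rows, X, Y):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in X)
        val = tuple(row[a] for a in Y)
        if key in seen and seen[key] != val:
            return False  # two rows agree on X but differ on Y
        seen[key] = val
    return True

employees = [
    {"Emp_Id": 1, "Emp_Name": "Riya"},
    {"Emp_Id": 2, "Emp_Name": "Riya"},  # two ids share one name
    {"Emp_Id": 2, "Emp_Name": "Riya"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))  # holds
print(fd_holds(employees, ["Emp_Name"], ["Emp_Id"]))  # violated
```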
Armstrong axioms:
Armstrong's Axioms are a set of inference rules (axioms), developed by William W. Armstrong in
1974. They are used to infer all the functional dependencies on a relational database.
Primary Rule:
Reflexive Rule
Augmentation Rule
Transitive Rule
Reflexive Rule:
If Y ⊆ X (Y is a subset of the attributes of X), then X -> Y.
Augmentation Rule:
If X -> Y, then XZ -> YZ.
Proof sketch: for any pair of tuples t1 and t2,
if t1[X] = t2[X], then X -> Y implies t1[Y] = t2[Y];
on the other hand,
if t1[XZ] = t2[XZ], then t1[X] = t2[X] and t1[Z] = t2[Z],
so t1[Y] = t2[Y] and hence t1[YZ] = t2[YZ].
Transitive Rule:
If X -> Y and Y -> Z,
then X -> Z.
Secondary Rule:
Decomposition Rule
Union Rule
Pseudo Transitive Rule
Decomposition Rule:
If X -> YZ,
then X -> Y and X -> Z.
Union Rule:
If X -> Y and X -> Z,
then X -> YZ.
Pseudo-Transitive Rule:
If X -> Y and YZ -> W,
then XZ -> W.
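Taken together, these rules let us compute the closure X+ of an attribute set (all attributes functionally determined by X), which is the standard way to find keys. A sketch, using the ID → Name, Name → DOB example from above:

```python
# Sketch: attribute closure X+ under a set of FDs, the standard
# consequence of Armstrong's axioms.
def closure(attrs, fds):
    result = set(attrs)              # reflexivity: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if LHS is already determined, transitivity adds RHS
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"ID"}, {"Name"}), ({"Name"}, {"DOB"})]
print(sorted(closure({"ID"}, fds)))  # -> ['DOB', 'ID', 'Name']
```

Since the closure of {ID} is all attributes, ID is a key of this relation.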
Normalization for relational databases 1st, 2nd and 3rd normal forms:
For a table to be in the First Normal Form, it should follow these four rules: each cell must hold a
single (atomic) value, all values in a column must be of the same domain, each column must have a
unique name, and no two rows may be identical.
EXAMPLE:
EMPLOYEE_ID  NAME   JOB_CODE  JOB        STATE_CODE  HOME_STATE
E001         Alice  J01       Chef       26          Michigan
E001         Alice  J02       Waiter     26          Michigan
E002         Bob    J02       Waiter     56          Wyoming
E002         Bob    J03       Bartender  56          Wyoming
E003         Alice  J01       Chef       56          Wyoming
All the entries are atomic and there is a composite primary key (employee_id, job_code) so the table
is in the first normal form (1NF).
This table is not in 2NF, because the non-key attributes STATE_CODE and HOME_STATE depend
only on EMPLOYEE_ID, which is just part of the composite primary key (a partial dependency).
BCNF (Boyce-Codd Normal Form) is an advanced version of 3NF. A table is in BCNF if, for every
functional dependency X -> Y, X is a super key of the table. For BCNF, the table should be in 3NF,
and for every FD the LHS must be a super key.
Example
Consider a relation R with attributes (Student, Subject, Teacher):
Student  Subject   Teacher
Jhansi   Database  P.Naresh
Jhansi   C         K.Das
Subbu    Database  P.Naresh
Subbu    C         R.Prasad
Here Teacher -> Subject holds, but Teacher is not a super key of R, so R is not in BCNF. R is
decomposed into R1 and R2:
R1
Teacher   Subject
P.Naresh  Database
K.Das     C
R.Prasad  C
R2
Student  Teacher
Jhansi   P.Naresh
Jhansi   K.Das
Subbu    P.Naresh
Subbu    R.Prasad
All the anomalies which were present in R, now removed in the above two relations.
Note
BCNF decomposition does not always satisfy dependency preserving property. After BCNF
decomposition if dependency is not preserved then we have to decide whether we want to remain in
BCNF or rollback to 3NF. This process of rollback is called denormalization.
A multivalued dependency implies the presence of multiple other rows in the very same table; thus, a
multivalued dependency would prevent 4NF. Any multivalued dependency involves at least
three attributes of a table.
o Multivalued dependency occurs when two attributes in a table are independent of each other but,
both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third attribute
that's why it always requires at least three attributes.
Example: Suppose there is a bike manufacturing company which produces two colors (white and
black) of each model every year.
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other.
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
Whenever we can recreate a table by simply joining various tables, where each of these tables consists
of a subset of the table's attributes, the table is said to have a join dependency. Thus, it is like a
generalization of MVD. JD relates to 5NF: a relation is in 5NF only when it is
already in 4NF and cannot be decomposed further.
Example:
<Student>
Stu_Name  Stu_Skills         Stu_Job
Tag       Marketing          GK001
Barry     PR                 GK002
Paulo     Graphic Designing  GK002
We can decompose the table given above into the three tables given below, and thus it is not in
the Fifth Normal Form.
<Student_Skills>
Stu_Name  Stu_Skills
Tag       Marketing
Barry     PR
Paulo     Graphic Designing
<Student_Job>
Stu_Name  Stu_Job
Tag       GK001
Barry     GK002
Paulo     GK002
<Job_Skills>
Stu_Skills         Stu_Job
Marketing          GK001
PR                 GK002
Graphic Designing  GK003
The relations given above have a join dependency, so they are not in 5NF: the join of the three
relations given above is equal to the original relation <Student>.
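That "join of the projections equals the original" test can be checked mechanically. A sketch with a small sample instance (made up for illustration, not the full table above):

```python
# Sketch: verify a join dependency by natural-joining the three
# projections and comparing with the original relation.
# The sample instance below is made up for illustration.
from itertools import product

student = {
    ("Tag", "Marketing", "GK001"),
    ("Barry", "PR", "GK002"),
}

skills = {(n, s) for (n, s, j) in student}      # like <Student_Skills>
jobs = {(n, j) for (n, s, j) in student}        # like <Student_Job>
job_skills = {(s, j) for (n, s, j) in student}  # like <Job_Skills>

# Natural join of the three projections on their shared attributes.
joined = {
    (n1, s1, j2)
    for (n1, s1), (n2, j2) in product(skills, jobs)
    if n1 == n2 and (s1, j2) in job_skills
}

print(joined == student)  # True: the JD holds for this instance
```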
4th and 5th normal forms:
Example:
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes:
there is no relationship between COURSE and HOBBY, yet both depend on STU_ID (a multivalued
dependency).
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics
STUDENT_HOBBY
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
Example
Consider a relation R1 (SEMESTER, SUBJECT, LECTURER).
Suppose John takes both the Computer and Math classes for Semester 1 but doesn't take the Math
class for Semester 2. In this case, a combination of all three fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be
teaching it, so we leave LECTURER and SUBJECT as NULL. But all three columns together act as
the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1:
SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math
P2:
SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen
P3:
SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
Difference between 4NF and 5NF:
S.NO  4NF                                       5NF
1.    A relation in 4NF must also be in         A relation in 5NF must also be in 4NF
      BCNF (Boyce-Codd Normal Form).            (Fourth Normal Form).
2.    A relation in 4NF must not have any       A relation in 5NF must not have any
      multi-valued dependency.                  join dependency.
3.    A relation in 4NF may or may not be       A relation in 5NF is always in 4NF.
      in 5NF.
5.    If a relation is in Fourth Normal Form    If a relation is in Fifth Normal Form
      then it may have more redundancy.         then it will have less redundancy.
6.    If a relation is in Fourth Normal Form    If a relation is in Fifth Normal Form
      then it may be decomposed further into    then it cannot be decomposed further into
      sub-relations.                            sub-relations without any modification in
                                                meaning or facts.
TOPICS:
Transaction processing
Transaction processing ensures that related data is added to or deleted from the database
simultaneously, thus preserving data integrity in your application. In transaction processing, data is
not written to the database until a commit command is issued. When this happens, data is
permanently written to the database.
Examples include systems that manage sales order entry, airline reservations, payroll, employee
records, manufacturing, and shipping.
Atomic
Either all database changes for an entire transaction are completed or none of the changes are
completed.
Consistent
A transaction takes the database from one consistent state to another: the data satisfies all integrity
constraints both before and after the transaction.
Isolated
Transactions from concurrent applications do not interfere with each other. The updates from a
transaction are not visible to other transactions that execute concurrently until the transaction
commits.
Durable
Once a transaction commits, its changes survive system failures; the committed data is permanently
recorded.
Transaction concept
o A transaction is a set of logically related operations. It contains a group of tasks.
Transactions refer to a set of operations that are used for performing a set of logical work. Usually, a
transaction means the data present in the DB has changed. Protecting the user data from system
failures is one of the primary uses of DBMS.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account. This
small transaction contains several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
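If any of these ten steps fails, the steps already executed must be undone. That all-or-nothing behaviour can be sketched in Python; the account balances are hypothetical sample data.

```python
# Sketch: an atomic transfer with rollback on failure
# (balances are made-up sample data).
accounts = {"X": 1000, "Y": 500}

def transfer(src, dst, amount):
    old = dict(accounts)              # snapshot for rollback
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount       # both updates applied: commit
    except Exception:
        accounts.clear()
        accounts.update(old)          # rollback: restore the snapshot
        raise

transfer("X", "Y", 800)
print(accounts)                       # both balances updated together
```

A failed transfer (e.g. for more than the balance) raises and leaves both accounts exactly as they were.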
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a buffer in
main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take an example of a debit transaction from an account (with initial balance X = 4000), which
consists of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will be 3500.
But it may be possible that, because of a hardware, software, or power failure, the transaction
may fail before finishing all the operations in the set.
For example: If in the above transaction, the debit transaction fails after executing operation 2 then
X's value will remain 4000 in the database which is not acceptable by the bank.
Transaction State
1. Active State –
When the instructions of the transaction are running then the transaction is in active state. If all the
‘read and write’ operations are performed without any error then it goes to the “partially committed
state”; if any instruction fails, it goes to the “failed state”.
2. Partially Committed –
After completion of all the read and write operation the changes are made in main memory or local
buffer. If the changes are made permanent on the DataBase then the state will change to
“committed state” and in case of failure it will go to the “failed state”.
3. Failed State –
When any instruction of the transaction fails, it goes to the “failed state” or if failure occurs in
making a permanent change of data on Data Base.
4. Aborted State –
After any type of failure the transaction goes from the "failed state" to the "aborted state", and since
in the previous states the changes were only made to the local buffer or main memory, these
changes are deleted or rolled back.
5. Committed State –
It is the state when the changes are made permanent on the database; the transaction is then
complete and moves to the "terminated state".
The recovery-management component of a database system can support atomicity and durability
through a variety of schemes.
Here we are going to learn about one of the simplest schemes, called shadow copy.
In the shadow-copy scheme, a transaction that wants to update the database first creates a complete
copy of the database. All updates are done on the new database copy, leaving the original copy, the
shadow copy, untouched. If at any point the transaction has to be aborted, the system merely deletes
the new copy. The old copy of the database has not been affected.
This scheme, based on making copies of the database called shadow copies, assumes that only
one transaction is active at a time. The scheme also assumes that the database is simply a file on disk.
A pointer called db-pointer is maintained on disk; it points to the current copy of the database.
First, the operating system is asked to make sure that all pages of the new copy of the database have
been written out to disk. (Unix systems use the flush command for this purpose.)
After the operating system has written all the pages to disk, the database system updates the pointer
db-pointer to point to the new copy of the database; the new copy then becomes the current copy of
the database. The old copy of the database is then deleted.
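The essence of the scheme — update a fresh copy, then atomically swap the pointer — can be sketched with an in-memory dict standing in for the on-disk database file. This is illustrative only; a real implementation must also force the copy to disk before the pointer swap.

```python
# Sketch of the shadow-copy scheme. The "database" here is just an
# in-memory dict; names and values are made up.
import copy
import itertools

_names = itertools.count(1)
pages = {"v0": {"balance": 1000}}
db_pointer = "v0"                      # points at the current copy

def run_transaction(update, commit=True):
    global db_pointer
    old = db_pointer
    new = f"v{next(_names)}"
    pages[new] = copy.deepcopy(pages[old])  # complete copy; old untouched
    update(pages[new])                      # all updates hit the new copy
    if commit:
        db_pointer = new                    # the atomic db-pointer swap
        del pages[old]                      # old copy deleted after swap
    else:
        del pages[new]                      # abort: just delete the copy

run_transaction(lambda db: db.update(balance=200))
print(pages[db_pointer]["balance"])
```

An aborted transaction simply discards its copy; the shadow copy was never touched, so no undo work is needed.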
Concurrent executions
In a multi-user system, multiple users can access and use the same database at one time, which is
known as the concurrent execution of the database. It means that the same database is executed
simultaneously on a multi-user system by different users.
Most high-performance transactional systems need to run transactions concurrently to meet their
performance requirements. Thus, without concurrency control, such systems can neither provide
correct results nor maintain their databases consistently.
For example, deposit of money, balance enquiry, reservation of tickets etc. Every transaction starts
with delimiters begin transaction and terminates with end transaction delimiters. The set of
operations within these two delimiters constitute one transaction.
Concurrency control in DBMS is a procedure for managing simultaneous transactions, ensuring their
atomicity, isolation, consistency, and serializability. The several problems that arise when numerous
transactions execute simultaneously in a random manner are referred to as concurrency control
problems.
Serializability
Serializability is a property of a system describing how different processes operate on shared data.
A system is serializable if its result is the same as if the operations were executed in some
sequential order, meaning there is no overlap in execution. In a database management system
(DBMS), this can be accomplished by locking data so that no other process can access it while it is
being read or written.
Types of Serializability
Conflict serializability is a type of serializability in which conflicting operations on the same data
items are executed in an order that preserves database consistency. Two operations conflict when they
belong to different transactions, access the same data item, and at least one of them is a write; a
schedule is conflict serializable if it can be transformed into a serial schedule by swapping only
non-conflicting operations.
View serializability is a type of serializability in which each transaction produces results that are
equivalent to some well-defined sequential execution of all transactions in the system. Unlike
conflict serializability, which focuses on preventing inconsistencies within the database, view
serializability in DBMS focuses on providing users with consistent views of the database.
Recoverability
If a transaction performs a dirty read from an uncommitted transaction, and its commit
operation is delayed until the uncommitted transaction either commits or rolls back, such a
schedule is called a recoverable schedule.
Example
Let us consider two transaction schedules as given below −
T1                 T2
Read(A)
Write(A)
                   Read(A)
                   Write(A)
Commit
                   Commit (delayed)
The above schedule is a recoverable schedule for the reasons mentioned below:
Transaction T2 performs a dirty read operation on A.
The commit operation of transaction T2 is delayed until transaction T1 commits or rolls back.
T2 commits only after T1 commits.
In the above schedule, transaction T2 is not allowed to commit before T1 commits.
If transaction T1 fails, transaction T2 still has a chance to recover by rolling back.
Concurrency Control
Lock-based protocols
In this type of protocol, any transaction cannot read or write data until it acquires an appropriate lock
on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. Under a shared lock, the data item can only be read by the
transaction.
o It can be shared between transactions because, while a transaction holds a shared lock, it can't
update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: multiple transactions cannot modify the same data
simultaneously.
Simplistic lock-based protocols are the simplest way of locking data during a transaction. They allow
a transaction to acquire a lock on the data before an insert, delete, or update, and the
data item is unlocked after the transaction completes.
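The shared/exclusive compatibility rules (S is compatible with S; X is compatible with nothing) can be sketched as a tiny lock table. The transaction and item names are made up; a real lock manager would also queue waiters instead of just refusing.

```python
# Sketch of a lock table granting shared (S) and exclusive (X) locks.
locks = {}  # data item -> (mode, set of holding transaction ids)

def acquire(txn, item, mode):
    held = locks.get(item)
    if held is None:
        locks[item] = (mode, {txn})
        return True
    held_mode, holders = held
    if mode == "S" and held_mode == "S":
        holders.add(txn)                 # shared locks can coexist
        return True
    if holders == {txn}:
        locks[item] = (mode, holders)    # upgrade by the sole holder
        return True
    return False                         # conflict: caller must wait

print(acquire("T1", "A", "S"))   # True: first lock on A
print(acquire("T2", "A", "S"))   # True: S is compatible with S
print(acquire("T3", "A", "X"))   # False: X conflicts with held S locks
```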
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read
and write operations. It is the responsibility of the protocol that conflicting pairs of operations
are executed according to the timestamp values of the transactions.
o The timestamp ordering protocol is used to order the transactions based on their timestamps. The
order of the transactions is nothing but the ascending order of transaction creation.
o The priority of the older transaction is higher, which is why it executes first. To determine the
timestamp of a transaction, this protocol uses system time or a logical counter.
o The lock-based protocol is used to manage the order between conflicting pairs among transactions
at execution time. But timestamp-based protocols start working as soon as a transaction is
created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the
system at time 007 and transaction T2 at time 009. T1 has the higher
priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write'
operations on a data item.
Validation-based protocols
The validation-based protocol is also known as the optimistic concurrency control technique. In the
validation-based protocol, a transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It is used to read the value
of various data items and stores them in temporary local variables. It can perform all the write
operations on temporary variables without an update to the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the
actual data to check whether serializability would be violated.
3. Write phase: If the transaction passes validation, the temporary results are
written to the database; otherwise the transaction is rolled back.
Multiple granularities
o Multiple granularity can be defined as hierarchically breaking up the database into blocks which
can be locked.
o The multiple granularity protocol enhances concurrency and reduces lock overhead.
o It maintains track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy
can be represented graphically as a tree.
Multi-version schemes
Multi-version protocols minimize the delay for read operations by maintaining different versions of
data items. For each write operation performed, a new version of the data item is created, so that
whenever any transaction performs a read operation on that data, the appropriate version
is selected by the concurrency-control manager to make that read operation conflict-free and
successful.
When a write operation creates a new version, the new version records information such as the value
written and the timestamps of the transactions that wrote and last read it.
Deadlock handling
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to
give up locks. Deadlock is said to be one of the most feared complications in DBMS as no task ever
gets finished and is in waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update
some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade
table and needs to update the rows in the Student table held by Transaction T1.
Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and similarly,
transaction T2 is waiting for T1 to release its lock. All activities come to a halt state and remain at a
standstill. It will remain in a standstill until the DBMS detects the deadlock and aborts one of the
transactions.
Deadlock Avoidance
o When a database is stuck in a deadlock state, it is better to avoid the deadlock rather than
aborting or restarting the database, as that wastes time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method
like the "wait-for graph" is used for detecting deadlock situations, but this method is suitable only
for smaller databases. For larger databases, the deadlock prevention method can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, then the DBMS should detect
whether the transaction is involved in a deadlock or not. The lock manager maintains a Wait for the
graph to detect the deadlock cycle in the database.
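Detecting a deadlock with a wait-for graph amounts to finding a cycle in a directed graph. A small sketch, where an edge T1 → T2 means "T1 waits for a lock held by T2" (the transactions are made-up sample data):

```python
# Sketch: deadlock detection as cycle detection in a wait-for graph.
def has_deadlock(wait_for):
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in wait_for.get(node, ()):
            if nxt in on_stack:
                return True          # back edge: a cycle, i.e. deadlock
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)

# T1 waits for T2 and T2 waits for T1: the classic deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_deadlock({"T1": ["T2"], "T2": []}))      # False
```

On finding a cycle, the DBMS aborts one transaction in the cycle (the "victim") so the others can proceed.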
UNIT – 5
Syllabus:
Data Storage: Overview of physical storage media, magnetic disks, storage access, file
organization, organization of records in files.
Indexing and Hashing: Basic concepts, types of indexing, difference between B and B+ Indexing,
static hashing, Dynamic Hashing.
Several types of data storage exist in most computer systems. They vary in speed of access, cost per
unit of data, and reliability.
o Cache: the most costly and fastest form of storage. Usually very small, and managed by the
computer system hardware.
o Main Memory (MM): the storage area for data available to be operated on.
General-purpose machine instructions operate on main memory.
Contents of main memory are usually lost in a power failure or "crash".
Usually too small (even with megabytes) and too expensive to store the entire
database.
Magnetic disks
A magnetic disk is a storage device shaped like a gramophone record. The
disk is coated on both sides with a thin film of magnetic material. This magnetic material has the
property that it can store either '1' or '0' permanently. The magnetic material has a square-loop
hysteresis curve, so it can remain in one of two possible magnetization directions, which correspond
to binary '1' or '0'.
Bits are saved on the magnetized surface in spots along concentric circles known as tracks. The
tracks are frequently divided into sections known as sectors.
In this system, the smallest quantity of data that can be transferred is a sector. The subdivision of one
disk surface into tracks and sectors is displayed in the figure.
Storage access
A database is mapped into a number of different files, which are maintained by the underlying
operating system. Files are organized into blocks, and a block may contain one or more data items.
A major goal of the DBMS is to minimize the number of block transfers between the disk and
memory.
Data access refers to a user's ability to access or retrieve data stored within a database or other
repository. Users who have data access can store, retrieve, move, or manipulate stored data, which
can reside on a wide range of hard drives and external devices.
1. Each file is partitioned into fixed-length storage units, called blocks, which are the units of
both storage allocation and data transfer.
2. It is desirable to keep as many blocks as possible in main memory. Usually we cannot keep
all blocks in main memory, so we need to manage the allocation of the available main-memory space.
3. We need to use disk storage for the database and transfer blocks of data between main
memory and disk. We also want to minimize the number of such transfers, as they are time-
consuming.
4. The buffer is the part of main memory available for storing copies of disk blocks.
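The buffer described above can be sketched as a small pool of main-memory frames with least-recently-used (LRU) replacement. This is a minimal illustration; `read_block` is a hypothetical stand-in for an actual disk transfer.

```python
from collections import OrderedDict

def read_block(block_id):
    """Placeholder for a real disk transfer (a hypothetical helper)."""
    return f"data-of-{block_id}"

class BufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()       # block_id -> block contents
        self.disk_reads = 0               # count of actual block transfers

    def get(self, block_id):
        if block_id in self.frames:       # hit: no disk transfer needed
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        self.disk_reads += 1              # miss: transfer the block from disk
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # evict least recently used block
        self.frames[block_id] = read_block(block_id)
        return self.frames[block_id]

pool = BufferPool(capacity=2)
for b in ["B1", "B2", "B1", "B3", "B1"]:
    pool.get(b)
print(pool.disk_reads)   # 3: B1, B2, B3 read once each; B1 stays resident
```

Keeping frequently used blocks resident is exactly how the DBMS minimizes the number of block transfers mentioned above.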
File organization
File organization refers to the logical relationships among the various records that constitute a file,
particularly with respect to the means of identification of and access to any specific record. In simple
terms, storing files in a certain order is called file organization. File structure refers to the
format of the label and data blocks and of any logical control record.
File Organization
o A file is a collection of records. Using the primary key, we can access the records. The type and
frequency of access are determined by the type of file organization used for a given set of records.
o File organization is a logical relationship among various records. This method defines how file
records are mapped onto disk blocks.
o File organization describes the way in which records are stored in terms of blocks, and how the
blocks are placed on the storage medium.
o The first approach to mapping the database to files is to use several files and store only records of
one fixed length in any given file. An alternative approach is to structure the files so that they can
contain records of multiple lengths.
o Files of fixed-length records are easier to implement than files of variable-length records.
Heap file organization. Any record can be placed anywhere in the file where there is space for it.
There is no ordering of records.
Sequential file organization. Records are stored in sequential order, based on the value of the
search key of each record.
Hashing file organization. A hash function is computed on some attribute of each record. The
result of the function specifies in which block of the file the record should be placed -- to be
discussed later, since it is closely related to the indexing structure.
Clustering file organization. Records of several different relations can be stored in the same file.
Related records of the different relations are stored on the same block, so that one I/O operation
fetches related records from all the relations.
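The difference between the heap and sequential organizations above can be sketched with in-memory lists standing in for disk blocks. This is an illustrative simplification; the record values are made up.

```python
import bisect

def heap_insert(records, rec):
    """Heap organization: place the record in any free slot (append)."""
    records.append(rec)

def sequential_insert(records, rec, key):
    """Sequential organization: keep records ordered on the search key."""
    keys = [key(r) for r in records]
    records.insert(bisect.bisect(keys, key(rec)), rec)

heap, seq = [], []
for r in [("E3", "Carol"), ("E1", "Alice"), ("E2", "Bob")]:
    heap_insert(heap, r)
    sequential_insert(seq, r, key=lambda rec: rec[0])

print(heap)  # insertion order: E3, E1, E2
print(seq)   # search-key order: E1, E2, E3
```

Insertion is cheapest in the heap file, while the sequential file pays an ordering cost on insert in exchange for fast ordered scans and range queries.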
File organization helps ensure that:
Records can be selected as quickly as possible.
Insert, delete, and update operations on records are simple and fast.
Duplicate records are not created by inserting, updating, or deleting records.
Records are stored efficiently, minimizing storage cost.
Basic concepts
1. Indexing:
Indexing, as the name suggests, is a technique used to speed up access to data.
An index is a data structure that is used to locate and access data in a database table
quickly. Indexes can be created using one or more columns of a database table.
2. Hashing:
Hashing, as the name suggests, is a technique that uses a hash function, with the search key
as its parameter, to generate the address of a data record. It computes the location of the data record
on disk directly, without using an index structure. A good hash function uses a one-way hashing
algorithm, so the hash cannot be converted back into the original key. In simple words, it is the
process of converting a given key into another value, known as the hash value or simply the hash.
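The two techniques can be contrasted with a minimal sketch: an ordered index locates a key by binary search over sorted keys, while hashing computes the bucket address directly from the key. The table contents here are illustrative.

```python
import bisect

records = {101: "Alice", 205: "Bob", 330: "Carol"}

# Indexing: a sorted list of keys; binary search locates the key.
index_keys = sorted(records)

def index_lookup(key):
    pos = bisect.bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return records[key]
    return None

# Hashing: a hash function computes the bucket address directly.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]
for k, v in records.items():
    buckets[hash(k) % NUM_BUCKETS].append((k, v))

def hash_lookup(key):
    for k, v in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:
            return v
    return None

print(index_lookup(205), hash_lookup(205))   # Bob Bob
```

Note the trade-off: the ordered index also supports range queries (all keys between two values), while hashing answers only exact-match lookups, in expected constant time.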
Types of indexing:
1. Ordered index
2. Primary index
3. Clustering index
4. Sparse index
Ordered index
The indices are usually sorted to make searching faster. Sorted indices are known as
ordered indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10
bytes long. The IDs start at 1, 2, 3, ... and so on, and we have to search for the employee with ID 543.
o With no index, we have to scan the disk blocks from the start until we reach ID 543. The
DBMS finds the record after reading 543 × 10 = 5430 bytes.
o With an index in which each entry is 2 bytes, the DBMS finds the record after reading
542 × 2 = 1084 bytes, which is far less than in the previous case.
Primary Index
o If the index is created on the basis of the primary key of the table, it is known as a primary
index. Primary keys are unique to each record, so there is a 1:1 relation between index entries and
records.
o As primary keys are stored in sorted order, the performance of the searching operation is quite
efficient.
o The primary index can be classified into two types: dense index and sparse index.
Sparse index
o An index record appears for only some of the items in the data file. Each index entry points to a
block.
o Instead of pointing to each record in the main table, the index points to records in the main table
with gaps in between.
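A sparse-index lookup can be sketched as follows: find the last index entry whose key is less than or equal to the search key, then scan within that block. This is a minimal illustration with made-up block contents.

```python
import bisect

# Each block holds records sorted on the search key.
blocks = [[(10, "A"), (20, "B")], [(30, "C"), (40, "D")], [(50, "E")]]
# Sparse index: one entry per block, holding the block's first key.
sparse_index = [blk[0][0] for blk in blocks]

def sparse_lookup(key):
    i = bisect.bisect_right(sparse_index, key) - 1   # last entry <= key
    if i < 0:
        return None
    for k, v in blocks[i]:                           # scan inside the block
        if k == key:
            return v
    return None

print(sparse_lookup(40))   # D
print(sparse_lookup(25))   # None (no such record)
```

The index stores only three entries for five records; the saving grows with block size, at the cost of one in-block scan per lookup.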
Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is created on non-
primary-key columns, which may not be unique for each record.
o In this case, to identify records faster, we group two or more columns to obtain a unique value
and create an index on them. This method is called a clustering index.
o Records that have similar characteristics are grouped together, and indexes are created for these
groups.
A B+ tree eliminates a drawback of using a B-tree for indexing by storing data pointers only at the
leaf nodes of the tree. Thus, the structure of the leaf nodes of a B+ tree is quite different from the
structure of the internal nodes of a B tree.
Basis of Comparison | B tree | B+ tree
Pointers | All internal and leaf nodes have data pointers | Only leaf nodes have data pointers
Search | Since all keys are not available at the leaves, search often takes more time | All keys are at the leaf nodes, hence search is faster and more accurate
Static hashing
Static hashing is a hashing technique that enables users to look up a definite data set: the data in the
directory does not change, i.e., it is "static" or fixed. In this hashing technique, the resulting number
of data buckets in memory remains constant.
Operations Provided by Static Hashing
Static hashing provides the following operations −
Delete − Search for a record's address and delete the record at that address, or delete a chunk of
records stored at that address in memory.
Insertion − While entering a new record, the hash function h computes the bucket address h(K)
for the search key K, where the record is going to be stored.
Search − A record can be obtained by using the hash function to locate the address of the bucket
where the data is stored.
Update − A record can be updated once it has been traced to its data bucket.
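The static-hashing operations above can be sketched with a fixed bucket count and the common choice h(K) = K mod N as the hash function. This is an illustrative sketch, not a full implementation.

```python
NUM_BUCKETS = 4                               # fixed -> "static" hashing
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(key):
    return key % NUM_BUCKETS                  # bucket address h(K) for key K

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    return next((r for k, r in buckets[h(key)] if k == key), None)

def delete(key):
    buckets[h(key)] = [(k, r) for k, r in buckets[h(key)] if k != key]

insert(7, "grape")
insert(11, "plum")       # 7 and 11 both hash to bucket 3
print(search(7))         # grape
delete(7)
print(search(7))         # None
```

Because the bucket count never changes, keys that collide (like 7 and 11 above) pile up in the same bucket; overflow chains like this are the main weakness static hashing suffers as the data grows.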
Dynamic Hashing
Dynamic hashing is a hashing technique that enables users to look up a dynamic data set: the data
set can be modified by adding or removing data on demand, hence the name "dynamic" hashing.
The resulting set of data buckets grows or shrinks depending on the number of records.
In this hashing technique, the resulting number of data buckets in memory is ever-changing.
Operations Provided by Dynamic Hashing
Dynamic hashing provides the following operations −
Delete − Locate the desired location and delete the data (or a chunk of data) at that location.
Insertion − Insert new data into a data bucket if space is available in that bucket.
Query − Compute the bucket address and retrieve the data stored there.
Update − Perform a query and then update the data.
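The growth behavior can be sketched with a deliberately simplified scheme: when a bucket overflows, the bucket count doubles and all records are rehashed. (Real dynamic/extendible hashing splits only the overflowing bucket, but the doubling step illustrates how the bucket count tracks the data.)

```python
BUCKET_CAPACITY = 2                           # records per bucket before a split

class DynamicHash:
    def __init__(self):
        self.num_buckets = 2
        self.buckets = [[] for _ in range(self.num_buckets)]

    def insert(self, key, record):
        b = self.buckets[key % self.num_buckets]
        b.append((key, record))
        if len(b) > BUCKET_CAPACITY:          # overflow -> grow the structure
            self._double()

    def _double(self):
        old = [e for b in self.buckets for e in b]
        self.num_buckets *= 2
        self.buckets = [[] for _ in range(self.num_buckets)]
        for k, r in old:                      # rehash every record
            self.buckets[k % self.num_buckets].append((k, r))

    def search(self, key):
        return next((r for k, r in self.buckets[key % self.num_buckets]
                     if k == key), None)

dh = DynamicHash()
for k in [1, 3, 5, 7, 9]:
    dh.insert(k, f"rec{k}")
print(dh.num_buckets)      # 8: grew from the initial 2 buckets
print(dh.search(7))        # rec7
```

Unlike the static scheme, the number of buckets here adapts to the number of records, which is exactly the property the definition above describes.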