DBMS Notes - All Units

Database Management Systems (DBMS)


UNIT–1 TO UNIT-5 Notes
Syllabus:

Conceptual Modeling Introduction: Introduction to Databases: Purpose of database systems, view of data, data models, database languages, database users, various components of the overall DBMS architecture, various concepts of the ER model, basics of the relational model.
SQL Query – Basics: SQL – Data definition commands, queries with various options, data manipulation commands, views, joins, integrity and security.

Introduction to Databases

 A database-management system (DBMS) is a collection of interrelated data and a set of


programs to access those data.
 The collection of data, usually referred to as the database, contains information relevant to an
enterprise.
 The primary goal of a DBMS is to provide a way to store and retrieve database information that
is both convenient and efficient.
 Database systems are designed to manage large bodies of information.
 Management of data involves both defining structures for storage of information and providing
mechanisms for the manipulation of information.

In a DBMS, a simple query can retrieve only the required information instead of a complete file.
Because related data resides in the same storage location and the same information is not duplicated, data redundancy is minimized and data inconsistency is reduced.
A DBMS provides data consistency through normalization and enforces data integrity.
Example of data integrity: User cannot insert string data in integer column
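As a rough sketch of this integrity rule, Python's built-in sqlite3 module can be used (an illustrative choice, not something the notes prescribe; SQLite is loosely typed, so a CHECK constraint is added to make the "integer column rejects strings" rule explicit):

```python
# Sketch of a data-integrity rule with Python's sqlite3 (illustrative).
# SQLite is loosely typed, so a CHECK constraint makes the
# "integer column rejects strings" rule explicit.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " Roll_No INTEGER CHECK (typeof(Roll_No) = 'integer'),"
    " Name TEXT)"
)
conn.execute("INSERT INTO Student VALUES (101, 'Ramesh')")      # accepted

try:
    conn.execute("INSERT INTO Student VALUES ('abc', 'Bad')")   # rejected
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Only the integer row is stored; the string insert raises an integrity error instead of silently corrupting the column.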
Database security includes protecting the database and the various applications that access it. Organizations must secure databases against cyber-security threats such as virus attacks, as well as against misuse of the data by those who can access it.

Data can be shared (multiple users can access data from a single machine), and a DBMS provides multiple user interfaces.
Unauthorized access is restricted in DBMS

A DBMS provides backup and recovery of data even if it is lost. It also provides a crash-recovery mechanism, i.e., the DBMS protects the user from system failure.

You can easily query data in a database using the SQL language

Allow multiple users to access the database at the same time

Database-System Applications
Databases are widely used. Here are some representative applications:
• Enterprise Information
◦ Sales: For customer, product, and purchase information.

◦ Accounting: For payments, receipts, account balances, assets and other accounting information.

◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.

◦ Manufacturing: For management of the supply chain and for tracking production of items in factories, inventories of items in warehouses and stores, and orders for items.

◦ Online retailers: For sales data noted above plus online order tracking, generation of recommendation lists, and maintenance of online product evaluations.

• Banking and Finance


◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial instruments such
as stocks and bonds; also, for storing real-time market data to enable online trading by customers
and automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition to standard enterprise
information such as human resources and accounting).

• Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.

Purpose of Database systems

In the early days, database applications were built directly on top of file systems. A DBMS provides users with a systematic way to create, retrieve, update and manage data. It is middleware between the databases which store all the data and the users or applications which need to interact with that stored data. A DBMS can limit what data the end user sees, as well as how that end user can view the data, providing many views of a single database schema.

Database + database management system = database system


Keeping organizational information in a file-processing system has a number of major disadvantages:

• Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely to have different structures and the programs may be written in several programming languages. The same information may be duplicated in several files.


• Difficulty in accessing data. Suppose that one of the university clerks needs to find out the names of all students who
live within a particular postal-code area. The clerk asks the data-processing department to generate such a list. Because
the designers of the original system did not anticipate this request, there is no application program on hand to meet it.
There is, however, an application program to generate the list of all students. The university clerk has now two choices:
either obtain the list of all students and extract the needed information manually or ask a programmer to write the
necessary application program. Both alternatives are obviously unsatisfactory.

• Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application
programs to retrieve the appropriate data is difficult.

• Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints.

• Atomicity problems. A transaction must happen in its entirety or not at all; this requirement is called atomicity. It is difficult to ensure atomicity in a conventional file-processing system.

• Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many systems
allow multiple users to update the data simultaneously, which may lead to concurrent access anomalies.

• Security problems. Not every user of the database system should be able to access all the data.

Uses of DBMS

The main uses of DBMS are as follows −

 Data independence and efficient access of data.


 Application Development time reduces.
 Security and data integrity.
 Uniform data administration.
 Concurrent access and recovery from crashes.

View of Data
 A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data.
 A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers
to use complex data structures to represent data in the database. Since many database-system users
are not computer trained, developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system:

• Physical level. The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the database,
and what relationships exist among those data.

Although implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this complexity.
This is referred to as physical data independence.
Database administrators, who must decide what information to keep in the database, use the
logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database.

Figure below shows the relationship among the three levels of abstraction.

View of data

Data models

A data model gives shape and structure to the stored data; its diagrammatic form makes the meaning of the data easier to understand.

In simple words, we can define data model as “a collection of high-level data description that hide
many low-level storage details. A data model can also be defined as a collection of conceptual tools
for describing data, data relationships and consistency constraints”. A DBMS allows a user to define
the stored data in terms of data model.

There are various types of data model but the relational model is the most widely used model.
The different Data Models in DBMS are:

1. Hierarchical Model
2. Network Model
3. Relational Model
4. Object-Oriented Model
5. Object-Relational Model
6. Entity-Relationship Model

1. Hierarchical Model
The hierarchical model was the first DBMS model and one of the oldest database models. The general shape of this model is like an organization chart. A node on the chart represents a particular entity. The terms parent and child are used in describing a hierarchical model. This model organizes the data in a hierarchical tree structure: the hierarchy starts from the root, which holds the root data, and expands like a tree by adding child nodes to parent nodes.

This model used the tree as its basic structure. A tree is a data structure that consists of a hierarchy of
nodes, with a single node called the root, at the highest level. A node represents a particular entity.
A node may have any number of children, but each child node may have only one parent node.

Syntax

Example: Organization Chart

2. Network Model
This model is an extension of the hierarchical model. It was the most popular model before the
relational model. This model is the same as the hierarchical model; the only difference is that a
record can have more than one parent. The network model was developed to overcome the limited
scope of hierarchical model. It replaces the hierarchical tree with a graph.


In Network Model, multiple parent-child relationships are used. The network model uses a network
structure, which is a data structure of nodes and branches.

The main difference between the network model and the hierarchical model is that the network model permits a child node to have more than one parent node, whereas the hierarchical model does not allow a child node to have multiple parent nodes.

Example-1:

3. Relational Model

The relational model is the most widely used model. In this model, the data is maintained in the form of two-dimensional tables, and all the information is stored in the form of rows and columns. The basic structure of the relational model is the table, so tables are also called relations in the relational model.

Each row holds one record, for example a student's roll number, name and course. Columns are also known as attributes; an attribute represents a characteristic of a person, place or thing, for example the Salary attribute in the example below.

Example: In the given example, we have an Employee table.


The most popular and extensively used data model is the relational data model. It allows the data to be stored in tables called relations. The relations are normalized, so the values they hold are atomic. Each row in a relation is called a tuple and holds a unique combination of values. Each column is an attribute, and all values in a column belong to the same domain.


Popular examples of standard relational databases include Microsoft SQL Server, Oracle Database,
MySQL and IBM DB2.

4. Object-Oriented Model

The object-oriented model is based on a collection of objects. An object contains values stored in
variables within the object. An object also contains code that operates on the object. This code is
called methods. Objects that contain the same types of values and the same methods are grouped
together into classes. A class may be viewed as a definition for objects.

The only way in which one object can access the data of another object is by invoking a method of
that other object. This action is called sending a message to the object. An object-oriented data
model is one of the most developed data models which contain video, graphical files, and audio.

In this model, both the data and its relationships are present in a single structure known as an object. We can store audio, video, images, etc. in the database, which was not practical in the relational model (audio and video can be stored in a relational database, but doing so is generally not advised). In this model, two or more objects are connected through links, and we use these links to relate one object to other objects.

5. Object-Relational Model

An object-relational model is a combination of an object-oriented model and a relational model. So, it supports objects, classes, inheritance, etc. just like object-oriented models, and has support for data types, tabular structures, etc. like the relational data model.

One of the major goals of Object relational data model is to close the gap between relational database
and the object oriented database frequently used in many programming languages such as C++, C#,
Java etc.

We can have many advanced features like we can make complex data types according to our
requirements using the existing data types. The problem with this model is that this can get complex
and difficult to handle.

Example: ORDBMSs include PostgreSQL and Oracle.

6. Entity-Relationship Model

Entity-Relationship Model or simply ER Model is a high-level data model diagram. In this model,
we represent the real-world problem in the pictorial form to make it easy to understand. It is also
very easy for the developers to understand the system by just looking at the ER diagram. We use the
ER diagram as a visual tool to represent an ER Model.

An ER model is a logical representation of data as objects and relationships among them. These objects are known as entities, and a relationship is an association among these entities. A set of attributes describes each entity. The ER model is widely used in database design and is best suited for the conceptual design of a database.

An ER diagram has the following three components:

1. Entity
2. Attribute
3. Relationship

Syntax:

Example:

Database languages

The different types of DBMS languages are as follows −

 Data Definition Language (DDL) − Create, Drop, Truncate, Rename.


 Data Manipulation language (DML) − Select, Insert, Delete, Update.
 Data Control Language (DCL) − Revoke, Grant.
 Transaction Control Language (TCL) − Rollback, Commit.
The DBMS languages are pictorially represented as follows −


Data Definition Language (DDL)


It is a language that allows the user to define the data and their relationship to other types of data.
The DDL commands are: Create, Alter, Rename, Drop, Truncate.

Data Manipulation Language (DML)


It is a language that provides a set of operations to support the basic data manipulation operation on
data held in the database. The DML commands are: Insert, delete, update, select, merge, call.

Data Control Language (DCL)


DCL controls access to the stored data. It is mainly used to grant and revoke user access to a database. The DCL commands are: Grant, Revoke.

Transaction Control Language (TCL)


TCL is a language which manages the transactions within the database. It is used to make permanent (commit) or undo (roll back) the changes made by the data manipulation language statements. The TCL commands are: Commit, Rollback.
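As a small hedged sketch of the TCL commands, Python's built-in sqlite3 module (an illustrative choice, not part of the notes) exposes COMMIT and ROLLBACK directly:

```python
# COMMIT and ROLLBACK sketched with Python's sqlite3 module (illustrative).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Id INTEGER, Balance INTEGER)")
conn.execute("INSERT INTO Account VALUES (1, 500)")
conn.commit()        # COMMIT: the insert is now permanent

conn.execute("UPDATE Account SET Balance = 0 WHERE Id = 1")
conn.rollback()      # ROLLBACK: the uncommitted update is undone

balance = conn.execute("SELECT Balance FROM Account WHERE Id = 1").fetchone()[0]
print(balance)  # 500
```

The committed insert survives, while the rolled-back update leaves the balance unchanged.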

Database users

Database users are categorized based upon their interaction with the database.

These are the different types of database users in a DBMS:

1. Database Administrator (DBA):

A Database Administrator (DBA) is a person or team who defines the schema and controls all three levels of the database.
The DBA creates a new account id and password for a user who needs to access the database.
The DBA is also responsible for providing security to the database and allows only authorized users to access or modify the database.
 The DBA also monitors backup and recovery and provides technical support.
 The DBA has a DBA account in the DBMS, called a system or superuser account.
 The DBA repairs damage caused by hardware and/or software failures.

2. Naive / Parametric End Users:

Parametric end users are unsophisticated users who have no DBMS knowledge but frequently use database applications in their daily life to get the desired results.
For example, railway ticket-booking users are naive users. Bank clerks are also naive users: they have no DBMS knowledge, yet they still use the database to perform their assigned tasks.

3. System Analyst:
A system analyst analyzes the requirements of parametric end users and checks whether all the requirements of the end users are satisfied.
4. Sophisticated Users:
Sophisticated users can be engineers, scientists or business analysts who are familiar with the database. They can develop their own database applications according to their requirements. They don't write program code; instead, they interact with the database by writing SQL queries submitted directly through the query processor.
5. Database Designers:
Database designers are the users who design the structure of the database, which includes tables, indexes, views, constraints, triggers and stored procedures. They control what data must be stored and how the data items are related.

6. Application Programmers:
Application programmers are the back-end programmers who write the code for the application programs. They are computer professionals. These programs could be written in programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL, etc.

Various components of the overall DBMS architecture

1. Query Processor: It interprets the requests (queries) received from end user via an application
program into instructions. It also executes the user request which is received from the DML
compiler.
Query Processor contains the following components –

 DML Compiler: It processes the DML statements into low level instruction (machine
language), so that they can be executed.

 DDL Interpreter: It processes the DDL statements into a set of table containing meta data
(data about data).
 Embedded DML Pre-compiler: It processes DML statements embedded in an application
program into procedural calls.
 Query Optimizer: It chooses the lowest-cost evaluation plan for the instructions generated by the DML Compiler.
2. Storage Manager: Storage Manager is a program that provides an interface between the data
stored in the database and the queries received. It is also known as Database Control System. It
maintains the consistency and integrity of the database by applying the constraints and executing
the DCL statements. It is responsible for updating, storing, deleting, and retrieving data in the
database.
It contains the following components –
 Authorization Manager: It ensures role-based access control, i.e., it checks whether a particular user is privileged to perform the requested operation or not.

 Integrity Manager: It checks the integrity constraints when the database is modified.

 Transaction Manager: It controls concurrent access by scheduling the operations of the transactions it receives. Thus, it ensures that the database remains in a consistent state before and after the execution of a transaction.

 File Manager: It manages the file space and the data structure used to represent
information in the database.

 Buffer Manager: It is responsible for cache memory and the transfer of data between the
secondary storage and main memory.

3. Disk Storage: It contains the following components –


 Data Files: It stores the data.

 Data Dictionary: It contains the information about the structure of any database object. It is the repository of information that governs the metadata.

 Indices: They provide faster retrieval of data items.

Various concepts of ER model

ER model
o ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used to define the data elements and relationships for a specified system.


o It develops a conceptual design for the database, along with a simple and easy-to-understand view of the data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.

For example, suppose we design a school database. In this database, the student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.

Component of ER Diagram


1. Entity:

An entity may be any object, class, person or place. In an ER diagram, an entity is represented as a rectangle.

Consider an organization as an example- manager, product, employee, department etc. can be taken
as an entity.

a. Weak Entity

An entity that depends on another entity is called a weak entity. A weak entity does not contain any key attribute of its own and is represented by a double rectangle.

2. Attribute

The attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.


a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that is composed of several other attributes is known as a composite attribute. The composite attribute is represented by an ellipse, and the ellipses of its component attributes are connected to it.

c. Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.


d. Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed ellipse.

For example, a person's age changes over time and can be derived from another attribute like date of birth.

3. Relationship

A relationship is used to describe the relation between entities. Diamond or rhombus is used to
represent the relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When one instance of an entity is associated with exactly one instance of another entity, the relationship is known as a one-to-one relationship.

For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship

When only one instance of the entity on the left is associated with more than one instance of the entity on the right, the relationship is known as a one-to-many relationship.

For example, a scientist can invent many inventions, but each invention is made by one specific scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left, and only one instance of an entity on the right
associates with the relationship then it is known as a many-to-one relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance of an entity on the
right associates with the relationship then it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.


Basics of Relational Model

The relational model was proposed by E.F. Codd to model data in the form of relations or tables. After designing the conceptual model of a database using an ER diagram, we need to convert the conceptual model into the relational model, which can be implemented using any RDBMS such as Oracle or MySQL. So we will see what the relational model is.

Relational Model represents how data is stored in Relational Databases. A relational database
stores data in the form of relations (tables).

The relational model represents data as tables with columns and rows. Each row is known as a tuple, and each column has a name, also called an attribute.

Domain: It contains a set of atomic values that an attribute can take.

Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a
domain, dom(Ai)

Relational instance: In the relational database system, the relational instance is represented by a finite
set of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and name of all columns
or attributes.

Relational key: A relational key is a set of one or more attributes whose values uniquely identify a row in the relation.

Example: STUDENT Relation

NAME     ROLL_NO   PHONE_NO     ADDRESS     AGE
Ram      14795     7305758992   Noida       24
Shyam    12839     9026288936   Delhi       35
Laxman   33289     8583287182   Gurugram    20
Mahesh   27857     7086819134   Ghaziabad   27
Ganesh   17282     9028913988   Delhi       40


o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the attributes.
o The instance of schema STUDENT has 5 tuples.
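The schema/instance distinction above can be sketched with Python's built-in sqlite3 module (an illustrative choice, not part of the notes): the CREATE TABLE names the relation and its attributes, and the inserted rows form an instance. Only four of the five tuples are loaded here.

```python
# The STUDENT relation sketched with Python's sqlite3 module (illustrative;
# four of the five tuples from the table above are loaded).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (NAME TEXT, ROLL_NO INTEGER,"
    " PHONE_NO TEXT, ADDRESS TEXT, AGE INTEGER)"
)
rows = [
    ("Ram",    14795, "7305758992", "Noida",     24),
    ("Shyam",  12839, "9026288936", "Delhi",     35),
    ("Laxman", 33289, "8583287182", "Gurugram",  20),
    ("Mahesh", 27857, "7086819134", "Ghaziabad", 27),
]
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print(count)  # 4 tuples in this relation instance
```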

Data Definition commands:

DDL is an abbreviation of Data Definition Language.

The DDL Commands in Structured Query Language are used to create and modify the schema of the
database and its objects. The syntax of DDL commands is predefined for describing the data. The
commands of Data Definition Language deal with how the data should exist in the database.

Following are the five DDL commands in SQL:


1. CREATE Command

2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command

CREATE Command

CREATE is a DDL command used to create databases, tables, triggers and other database objects.

Examples of CREATE Command in SQL

Example 1: This example describes how to create a new database using the CREATE DDL
command.

Syntax to Create a Database:

CREATE DATABASE Database_Name;

Suppose, you want to create a Books database in the SQL database. To do this, you have to write the
following DDL Command:

CREATE DATABASE Books;

Example 2: This example describes how to create a new table using the CREATE DDL
command.

Syntax to create a new table:

CREATE TABLE table_name
(
    column_Name1 data_type(size),
    column_Name2 data_type(size),
    column_Name3 data_type(size),
    ...
    column_NameN data_type(size)
);

Example:

Suppose, you want to create a Student table with five columns in the SQL database. To do this, you
have to write the following DDL command:

CREATE TABLE Student
(
    Roll_No    Int,
    First_Name Varchar(20),
    Last_Name  Varchar(20),
    Age        Int,
    Marks      Int
);
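As a quick check, this DDL can be run through Python's built-in sqlite3 module (an illustrative choice, not part of the notes; SQLite parses Varchar(20) but does not enforce the size):

```python
# The CREATE TABLE Student DDL run through sqlite3 (illustrative; SQLite
# parses Varchar(20) but does not enforce the declared size).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        Roll_No    Int,
        First_Name Varchar(20),
        Last_Name  Varchar(20),
        Age        Int,
        Marks      Int
    )
""")

# The catalog (data dictionary) now lists the new table:
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['Student']
```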

DROP Command

DROP is a DDL command used to delete/remove the database objects from the SQL database. We
can easily remove the entire table, view, or index from the database using this DDL command.

Examples of DROP Command in SQL

Example 1: This example describes how to remove a database from the SQL database.

Syntax to remove a database:

DROP DATABASE Database_Name;

Suppose, you want to delete the Books database from the SQL database. To do this, you have to write
the following DDL command:

DROP DATABASE Books;

Example 2: This example describes how to remove the existing table from the SQL database.

Syntax to remove a table:

DROP TABLE Table_Name;

Suppose, you want to delete the Student table from the SQL database. To do this, you have to write
the following DDL command:

DROP TABLE Student;

Example 3: This example describes how to remove the existing index from the SQL database.

Syntax to remove an index:

DROP INDEX Index_Name;

Suppose, you want to delete the index_city from the SQL database. To do this, you have to write the
following DDL command:

DROP INDEX Index_city;

ALTER Command

ALTER is a DDL command which changes or modifies the existing structure of the database, and it also changes the schema of database objects.

We can also add and drop constraints of the table using the ALTER command.

Examples of ALTER Command in SQL

Example 1: This example shows how to add a new field to the existing table.

Syntax to add a new field to the table:

ALTER TABLE name_of_table ADD column_name column_definition;

Suppose, you want to add a Fathers_Name column to the existing Student table. To do this, you have to write the following DDL command:

ALTER TABLE Student ADD Fathers_Name Varchar(60);

Example 2: This example describes how to remove the existing column from the table.

Syntax to remove a column from the table:

ALTER TABLE name_of_table DROP COLUMN column_Name;

Suppose, you want to remove the Age and Marks columns from the existing Student table. To do this, you have to write the following DDL commands:

ALTER TABLE Student DROP COLUMN Age;
ALTER TABLE Student DROP COLUMN Marks;

Example 3: This example describes how to modify the existing column of the existing table.

Syntax to modify the column of the table:

ALTER TABLE table_name MODIFY ( column_name column_datatype(size));

Suppose, you want to change the character size of the Last_Name field of the Student table. To do this, you have to write the following DDL command:

ALTER TABLE Student MODIFY ( Last_Name varchar(25));
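A hedged sqlite3 sketch of ALTER TABLE (illustrative, not part of the notes; SQLite supports the ADD form shown here but not MODIFY, which is Oracle/MySQL syntax):

```python
# ALTER TABLE ... ADD sketched with sqlite3 (illustrative; SQLite accepts
# the ADD form but not MODIFY, which is Oracle/MySQL syntax).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Roll_No Int, First_Name Varchar(20))")
conn.execute("ALTER TABLE Student ADD Fathers_Name Varchar(60)")

# PRAGMA table_info lists one row per column; index 1 is the column name.
cols = [r[1] for r in conn.execute("PRAGMA table_info(Student)")]
print(cols)  # ['Roll_No', 'First_Name', 'Fathers_Name']
```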

TRUNCATE Command

TRUNCATE is another DDL command which deletes or removes all the records from the table.

This command also removes the space allocated for storing the table records.

Syntax of TRUNCATE command

TRUNCATE TABLE Table_Name;

Example

Suppose, you want to delete all the records of the Student table. To do this, you have to write the following TRUNCATE DDL command:

TRUNCATE TABLE Student;

The above query successfully removed all the records from the student table. Let's verify it by using
the following SELECT statement:

SELECT * FROM Student;
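A hedged sketch of the same effect with Python's sqlite3 module (illustrative; SQLite has no TRUNCATE statement, so the equivalent DELETE without a WHERE clause is used):

```python
# TRUNCATE-equivalent sketched with sqlite3 (illustrative; SQLite has no
# TRUNCATE statement, so DELETE without WHERE empties the table instead).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Roll_No Int, Marks Int)")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, 90), (2, 75)])

conn.execute("DELETE FROM Student")   # plays the role of TRUNCATE TABLE Student;

remaining = conn.execute("SELECT * FROM Student").fetchall()
print(remaining)  # [] -- all records gone, table structure kept
```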

RENAME Command

RENAME is a DDL command which is used to change the name of the database table.

Syntax of RENAME command

RENAME TABLE Old_Table_Name TO New_Table_Name;

Example

RENAME TABLE Student TO Student_Details ;

This query changes the name of the table from Student to Student_Details.

Data manipulation commands

DML Commands in SQL

DML is an abbreviation of Data Manipulation Language.

The DML commands in Structured Query Language change the data present in the SQL database.
We can easily access, store, modify, update and delete the existing records from the database using
DML commands.

Following are the four main DML commands in SQL:

1. SELECT Command
2. INSERT Command
3. UPDATE Command
4. DELETE Command

SELECT DML Command

SELECT is the most important data manipulation command in Structured Query Language. The
SELECT command shows the records of the specified table. It also shows the particular record of a
particular column by using the WHERE clause.

Syntax of SELECT DML command

SELECT column_Name_1, column_Name_2, ….., column_Name_N FROM Name_of_table;

Here, column_Name_1, column_Name_2, ….., column_Name_N are the names of those columns
whose data we want to retrieve from the table.

If we want to retrieve the data from all the columns of the table, we have to use the following
SELECT command:

SELECT * FROM table_name;

Examples of SELECT Command

Example 1: This example shows all the values of every column from the table.

SELECT * FROM Student;

This SQL statement displays the following values of the student table:

Example 2: This example shows all the values of a specific column from the table.

SELECT Emp_Id, Emp_Salary FROM Employee;

This SELECT statement displays all the values of Emp_Salary and Emp_Id column
of Employee table:

Example 3: This example describes how to use the WHERE clause with the SELECT DML
command.

SELECT * FROM Student WHERE Stu_Marks = 80;

The above SQL query shows the following table in result:


Student_ID   Student_Name   Student_Marks

BCA1001      Abhay          80

BCA1003      Bheem          80

BCA1005      Sumit          80
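The WHERE filtering in Example 3 can be reproduced with Python's sqlite3 module (an assumed demo environment; the extra student BCA1002 is an illustrative row added so the filter has something to exclude):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Student_ID TEXT, Student_Name TEXT, "
            "Student_Marks INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    ("BCA1001", "Abhay", 80),
    ("BCA1002", "Anita", 65),   # illustrative extra row, filtered out below
    ("BCA1003", "Bheem", 80),
    ("BCA1005", "Sumit", 80),
])

# WHERE keeps only rows whose Student_Marks equals 80.
rows = cur.execute(
    "SELECT * FROM Student WHERE Student_Marks = 80").fetchall()
for r in rows:
    print(r)
```

Only the three students with exactly 80 marks appear in the result.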

INSERT DML Command

INSERT is another most important data manipulation command in Structured Query Language,
which allows users to insert data in database tables.

Syntax of INSERT Command

INSERT INTO TABLE_NAME (column_Name1, column_Name2, column_Name3, .... column_NameN) VALUES (value_1, value_2, value_3, .... value_N);

Examples of INSERT Command

Example 1: This example describes how to insert the record in the database table.

Let's take the following student table, which consists of only 2 records of the student.

Stu_Id   Stu_Name   Stu_Marks   Stu_Age

101      Ramesh     92          20

201      Jatin      83          19

Suppose you want to insert a new record into the Student table. For this, you have to write a
DML INSERT command such as the following (the new student's values are illustrative):

INSERT INTO Student (Stu_Id, Stu_Name, Stu_Marks, Stu_Age) VALUES (301, 'Anita', 88, 20);
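A minimal sketch of the INSERT flow using Python's sqlite3 module (an assumed demo environment; the third student's values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Name TEXT, "
            "Stu_Marks INTEGER, Stu_Age INTEGER)")
# The two existing records from the notes' Student table.
cur.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                [(101, "Ramesh", 92, 20), (201, "Jatin", 83, 19)])

# Insert a third (illustrative) student, naming the columns explicitly.
cur.execute("INSERT INTO Student (Stu_Id, Stu_Name, Stu_Marks, Stu_Age) "
            "VALUES (301, 'Anita', 88, 20)")

count = cur.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(count)  # → 3
```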

UPDATE DML Command

UPDATE is another most important data manipulation command in Structured Query Language,
which allows users to update or modify the existing data in database tables.

Syntax of UPDATE Command

UPDATE Table_name SET column_name1 = value_1, ....., column_nameN = value_N WHERE CONDITION;

Here, 'UPDATE', 'SET', and 'WHERE' are the SQL keywords, and 'Table_name' is the name of the
table whose values you want to update.

Examples of the UPDATE command

UPDATE Product SET Product_Price = 80 WHERE Product_Id = 'P102' ;

Suppose you want to update Stu_Marks and Stu_Age of the students whose Stu_Id is 103 or 202. A
single row can never satisfy Stu_Id = 103 AND Stu_Id = 202, so the conditions must be combined
with OR (or IN):

UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id = 103 OR Stu_Id = 202;
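A sketch of this UPDATE with Python's sqlite3 module (sample rows assumed); the WHERE clause uses IN, which is equivalent to combining the two Stu_Id conditions with OR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Marks INTEGER, "
            "Stu_Age INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(103, 70, 20), (202, 65, 19), (301, 90, 21)])

# IN (103, 202) matches either id; AND would match no row at all.
cur.execute("UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 "
            "WHERE Stu_Id IN (103, 202)")

rows = cur.execute("SELECT Stu_Id, Stu_Marks, Stu_Age FROM Student "
                   "ORDER BY Stu_Id").fetchall()
print(rows)  # → [(103, 80, 21), (202, 80, 21), (301, 90, 21)]
```

Student 301 is untouched because it does not match the WHERE condition.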

DELETE DML Command

DELETE is a DML command which allows SQL users to remove single or multiple existing records
from the database tables.

Unlike TRUNCATE, DELETE is a DML operation, so the deletion can be rolled back before the
transaction is committed. We use the WHERE clause with the DELETE command to select the
specific rows to remove from the table.

Syntax of DELETE Command

DELETE FROM Table_Name WHERE condition;

Examples of DELETE Command

Suppose you want to delete the product from the Product table whose Product_Id is P203. To do
this, you have to write the following DML DELETE command:

DELETE FROM Product WHERE Product_Id = 'P203';

Suppose you want to delete the records of those students whose marks are greater than 70. To do
this, you have to write the following DML DELETE command:

DELETE FROM Student WHERE Stu_Marks > 70 ;
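A runnable sketch of a conditional DELETE using Python's sqlite3 module (sample rows assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Marks INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?)",
                [(101, 92), (201, 68), (301, 75)])

# Remove every student with marks above 70 (students 101 and 301).
cur.execute("DELETE FROM Student WHERE Stu_Marks > 70")

remaining = cur.execute("SELECT Stu_Id FROM Student").fetchall()
print(remaining)  # → [(201,)]
```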

Integrity and security

Data integrity refers to the overall accuracy, completeness, and reliability of data. It shows up as a
lack of variation between two instances or consecutive updates of a record, indicating that your
information is error-free. It also covers the security and integrity controls and methods used for
regulatory compliance.

Data integrity in a database is preserved by an array of error-checking and validation procedures,
rules, and principles executed during the integration flow designing phase. These checks and
correction procedures are based on predefined business rules. For instance, a rule may dictate
filtering out data with an incorrect date or time value.

Data integrity in a database is essential because it is a necessary constituent of data integration. If
data integrity is maintained, the data values stored within the database are consistent with the data
model and type.

Data security is important for protecting information from unauthorized access or modification,
which could lead to data loss or corruption.
Data security refers to the protection of data, while data integrity refers to the trustworthiness of
data. Data security focuses on how to minimize the risk of leaking intellectual property, business
documents, healthcare data, emails, trade secrets, and more.

Data integrity and data security are both important for ensuring the safety and reliability of
information. Data integrity is the process of ensuring that data is accurate and complete, while data
security is the process of protecting data from unauthorized access.

UNIT - II
Syllabus:

Relational Approach: Relational algebra and calculus, Relational algebra, selection and
projection, set operations, renaming, joins, division, examples of algebra queries, relational
calculus: Tuple relational calculus, Domain relational calculus, expressive power of algebra and
calculus.

Relational algebra and calculus

Relational algebra is a procedural query language. It gives a step-by-step process to obtain the result
of the query. It uses operators to perform queries.

What is relational algebra in DBMS?


Relational Algebra is a procedural query language which takes a relation as input and generates a
relation as output. Relational algebra mainly provides the theoretical foundation for relational
databases and SQL.

RELATIONAL ALGEBRA is a widely used procedural query language. It takes instances of
relations as input and gives instances of relations as output. It uses various operations to perform
this action. Relational algebra query operations are performed recursively on a relation. The
output of these operations is a new relation, which might be formed from one or more input relations.

What are the types of relational algebra?


Five basic operations in relational algebra:
1. Selection
2. Projection
3. Cartesian product
4. Union
5. Set Difference
What is the need of relational algebra?
The main application of relational algebra is to provide a theoretical foundation for relational
databases, particularly query languages for such databases, chief among which is SQL. Relational
databases store tabular data represented as relations.

Types of Relational operation

1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

Notation: σ p(r)

Where:

σ is used for selection (the predicate)

r is used for relation
p is a propositional logic formula which may use connectives such as AND, OR and NOT, together
with relational operators such as =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME   LOAN_NO   AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME   LOAN_NO   AMOUNT

Perryride L-15 1500

Perryride L-16 1300
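Selection can be sketched in Python by treating a relation as a list of dictionaries and σ as a row filter (the dictionary modelling is a choice made here, not part of the notes):

```python
# Each tuple of the LOAN relation is modelled as a dict.
LOAN = [
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood",   "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus",    "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

def select(predicate, relation):
    """sigma_p(r): keep only the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

result = select(lambda t: t["BRANCH_NAME"] == "Perryride", LOAN)
print([t["LOAN_NO"] for t in result])  # → ['L-15', 'L-16']
```

The two Perryride loans from the output table above are exactly what the filter returns.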

2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. Rest of the
attributes are eliminated from the table.
o It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION


NAME      STREET    CITY

Jones     Main      Harrison

Smith     North     Rye

Hays      Main      Harrison

Curry     North     Rye

Johnson   Alma      Brooklyn

Brooks    Senator   Brooklyn

Input:

∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
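Projection can be sketched in Python; since relations are sets, the helper also eliminates duplicate tuples (the tuple-and-index modelling is a choice made here, not part of the notes):

```python
# CUSTOMER relation as plain tuples: (NAME, STREET, CITY).
CUSTOMER = [
    ("Jones",   "Main",    "Harrison"),
    ("Smith",   "North",   "Rye"),
    ("Hays",    "Main",    "Harrison"),
    ("Curry",   "North",   "Rye"),
    ("Johnson", "Alma",    "Brooklyn"),
    ("Brooks",  "Senator", "Brooklyn"),
]

def project(relation, *indexes):
    """pi: keep only the chosen columns, eliminating duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        key = tuple(t[i] for i in indexes)
        if key not in seen:        # relations are sets, so drop duplicates
            seen.add(key)
            out.append(key)
    return out

result = project(CUSTOMER, 0, 2)   # the NAME and CITY columns
print(result)
```

All six (NAME, CITY) pairs are distinct here, so nothing is dropped.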

3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are either in
R or S or in both R & S.
o It eliminates the duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must hold the following condition:

o R and S must have the same number of attributes (with compatible domains).
o Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME   ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME   LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Mayes
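Union maps directly onto Python sets holding the projected customer names (the set modelling is a choice made here, not part of the notes):

```python
# Projected CUSTOMER_NAME columns of the two relations above.
DEPOSITOR = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
BORROW    = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

# Union keeps every name appearing in either relation, without duplicates;
# Jones and Smith occur in both but are counted once.
union = BORROW | DEPOSITOR
print(sorted(union))
```

Each input has six names with two names in common, so the union has ten.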

4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that are in
both (common to) R & S.
o It is denoted by ∩.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Smith

Jones

5. Set Difference:

o Suppose there are two relations R and S. The set difference operation contains all tuples that are in R
but not in S.
o It is denoted by minus (-).

Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
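Both set difference and the earlier set intersection can be sketched with Python sets over the projected customer names (the set modelling is a choice made here, not part of the notes):

```python
DEPOSITOR = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
BORROW    = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

difference   = BORROW - DEPOSITOR   # borrowers who are not depositors
intersection = BORROW & DEPOSITOR   # names common to both relations

print(sorted(difference))    # → ['Curry', 'Hayes', 'Jackson', 'Williams']
print(sorted(intersection))  # → ['Jones', 'Smith']
```

Note that difference is not symmetric: DEPOSITOR - BORROW would give a different result.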

6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other table. It
is also known as a cross product.
o It is denoted by X.

Notation: E X D

Example:

EMPLOYEE

EMP_ID   EMP_NAME   EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO   DEPT_NAME

A         Marketing

B         Sales

C         Legal

Input:

EMPLOYEE X DEPARTMENT

Output:

EMP_ID   EMP_NAME   EMP_DEPT   DEPT_NO   DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
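The cross product above can be sketched with itertools.product, pairing every EMPLOYEE tuple with every DEPARTMENT tuple (the tuple modelling is a choice made here, not part of the notes):

```python
from itertools import product

EMPLOYEE   = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# E X D concatenates each employee tuple with each department tuple.
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]
print(len(cross))   # → 9  (3 rows x 3 rows)
print(cross[0])     # → (1, 'Smith', 'A', 'A', 'Marketing')
```

A natural join would keep only the three rows where EMP_DEPT equals DEPT_NO.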

7. Rename Operation:

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to STUDENT1.

ρ(STUDENT1, STUDENT)

Relational Calculus

Relational calculus is a non-procedural query language. It uses mathematical predicate calculus
instead of algebra. It provides a description of the desired result, whereas relational algebra gives
the method to obtain the result. It informs the system what to do with the relations, but does not
inform how to perform it.

Relational calculus exists in two forms:

1. Tuple Relational Calculus (TRC)


2. Domain Relational Calculus (DRC).

1. Tuple Relational Calculus (TRC)

A tuple relational calculus is a non-procedural query language which specifies the tuples to select
from a relation. It can select tuples with a range of values or tuples with certain attribute values. The
resulting relation can have one or more tuples.
The variable which is used in the condition is called tuple variable.
o The tuple relational calculus is specified to select the tuples in a relation. In TRC, filtering variable
uses the tuples of a relation.

o The result of the relation can have one or more tuples.

It is denoted as below:

{t | P (t)} or {t | condition (t)} — this is also known as expression of relational calculus


Where t is the resulting tuples, P(t) is the condition used to fetch t.

{t | EMPLOYEE (t) and t.SALARY>10000} – implies that it selects the tuples from EMPLOYEE
relation such that resulting employee tuples will have salary greater than 10000. It is example of
selecting a range of values.

{t | EMPLOYEE (t) AND t.DEPT_ID = 10} – this selects all the tuples of employees who work
for Department 10.

In the above examples, t.SALARY and t.DEPT_ID are tuple variables. In the first example, we have
specified the condition t.SALARY > 10000. What is the meaning of it?

For all SALARY > 10000, display the employees. Here SALARY is called a bound variable.
Any tuple variable with a 'for all' (∀) or 'there exists' (∃) condition is called a bound variable. Here,
for any range of values of SALARY greater than 10000, the meaning of the condition remains the
same.

Bound variables are those tuple variables whose meaning will not change if the tuple
variable is replaced by another tuple variable.

In the second example, we have used DEPT_ID= 10. That means only for DEPT_ID = 10 display
employee details. Such variable is called free variable. Any tuple variable without any ‘For All’ or
‘there exists’ condition is called Free Variable. If we change DEPT_ID in this condition to some
other variable, say EMP_ID, the meaning of the query changes.

For example, if we change EMP_ID = 10, then above it will result in different result set. Free
variables are those ranges of tuple variables whose meaning will change if the tuple variable is
replaced by another tuple variable.
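The two TRC expressions above can be mimicked in Python, modelling each tuple t as a dictionary and the calculus condition as a filter predicate (the EMPLOYEE rows are hypothetical):

```python
# Hypothetical EMPLOYEE relation; each tuple t is a dict.
EMPLOYEE = [
    {"NAME": "Riya",  "SALARY": 55000, "DEPT_ID": 10},
    {"NAME": "Zayn",  "SALARY": 9000,  "DEPT_ID": 10},
    {"NAME": "Trent", "SALARY": 34000, "DEPT_ID": 25},
]

# { t | EMPLOYEE(t) and t.SALARY > 10000 }
high_paid = [t for t in EMPLOYEE if t["SALARY"] > 10000]

# { t | EMPLOYEE(t) and t.DEPT_ID = 10 }
dept10 = [t for t in EMPLOYEE if t["DEPT_ID"] == 10]

print([t["NAME"] for t in high_paid])  # → ['Riya', 'Trent']
print([t["NAME"] for t in dept10])     # → ['Riya', 'Zayn']
```

The predicate describes which tuples qualify; the system (here, the list comprehension) decides how to find them.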

2. Domain Relational Calculus (DRC)

o The second form of relational calculus is known as Domain Relational Calculus. In domain
relational calculus, the filtering variable uses the domains of attributes.
o Domain relational calculus uses the same operators as tuple calculus. It uses logical connectives ∧
(and), ∨ (or) and ┓ (not).
o It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable.

Notation:

{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}

Where

a1, a2, ..., an are attributes

P stands for a formula built from these attributes

In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in
TRC, mentioned above).

This is a ‘non-procedural query language’, whereas ‘Relational Algebra’ is a ‘procedural query


language’.

This language uses domain variables that take on values from an attribute's domain, instead of
values for an entire tuple.

An expression in the DRC is of the form :

{ < x1, x2, …,xn > / P ( x1, x2, …, xn ) }

Here, < x1, x2, …,xn > ϵ r, where ‘r’ is a relation on ‘n’ attributes and x1, x2, …,xn are
domain variables.

Example-1 : Find the Faculty No, Faculty name, DNo, Qual and salaries of all the faculty, who
are drawing more than 30000.

{ < f, n, d, q, s > / < f, n, d, q, s > ϵ FACULTY-1 ˄ s > 30000 }

Output :
FNo FName DNo Qual Salary
22 Riya 21 Ph.D 55000
24 Priya 22 M. Tech 50000
25 Zayn 22 M. Tech 42000
30 Sia 23 M. Tech 32000
33 Max 24 Ph.D 53000
35 Fred 24 Ph.D 52000
40 Trent 25 M. Tech 34000


UNIT - III
Syllabus:

Introduction to NoSQL: Introduction, Overview and History of NoSQL Databases – The


Definition of the Four Types of NoSQL Databases, differences between SQL and NoSQL.

Normalization: Pitfalls of RDBD, Lossless join decomposition, functional dependencies,


Armstrong axioms, normalization for relational databases 1st, 2nd and 3rd normal forms, BCNF,
Basic definitions of MVDs and JDs, 4th and 5th normal forms.

Overview and History of NoSQL Databases:

NoSQL Database is used to refer to a non-SQL or non-relational database.

It provides a mechanism for the storage and retrieval of data other than the tabular relations model
used in relational databases. A NoSQL database doesn't use tables for storing data. It is generally
used for big data and real-time web applications.

In the early 1970s, flat file systems were used. Data were stored in flat files, and the biggest problem
with flat files was that each company implemented its own format; there were no standards. It was
very difficult to store data in and retrieve data from these files because there was no standard way to
store data.

Then the relational database was created by E.F. Codd, and these databases answered the question of
having no standard way to store data. But later the relational database also ran into a problem: it
could not handle big data. Due to this, there was a need for a database which could handle every type
of problem, and so the NoSQL database was developed.

NoSQL, is an approach to database design that enables the storage and querying of data outside the
traditional structures found in relational databases. While it can still store data found within relational
database management systems (RDBMS), it just stores it differently compared to an RDBMS. The
decision to use a relational database versus a non-relational database is largely contextual, and it
varies depending on the use case.

Instead of the typical tabular structure of a relational database, NoSQL databases house data within
one data structure, such as a JSON document. Since this non-relational database design does not
require a schema, it offers rapid scalability to manage large and typically unstructured data sets.

NoSQL is also a type of distributed database, which means that information is copied and stored on
various servers, which can be remote or local. This ensures the availability and reliability of data. If
some of the data goes offline, the rest of the database can continue to run.

Today, companies need to manage large data volumes at high speeds with the ability to scale up
quickly to run modern web applications in nearly every industry. In this era of growth within
cloud, big data, and mobile and web applications, NoSQL databases provide that speed and
scalability, making it a popular choice for their performance and ease of use.

Advantages of NoSQL
o It supports query language.
o It provides fast performance.
o It provides horizontal scalability.

The Definition of the Four Types of NoSQL Databases:


Here are the four main types of NoSQL databases:

1. Document databases.
2. Key-value stores.
3. Column-oriented databases.
4. Graph databases.

1. Document databases:

A document database stores data in JSON, BSON, or XML documents (not Word documents or
Google Docs, of course). In a document database, documents can be nested. Particular elements can
be indexed for faster querying.

Documents can be stored and retrieved in a form that is much closer to the data objects used in
applications, which means less translation is required to use the data in an application. SQL data must
often be assembled and disassembled when moving back and forth between applications and storage.

Document databases are popular with developers because they have the flexibility to rework their
document structures as needed to suit their application, shaping their data structures as their
application requirements change over time. This flexibility speeds development because, in effect,
data becomes like code and is under the control of developers. In SQL databases, intervention by
database administrators may be required to change the structure of a database.

2. Key-value stores:

The simplest type of NoSQL database is a key-value store. Every data element in the database is
stored as a key-value pair consisting of an attribute name (or "key") and a value. In a sense, a
key-value store is like a relational database with only two columns: the key or attribute name (such
as "state") and the value (such as "Alaska").

Use cases include shopping carts, user preferences, and user profiles.

3. Column-oriented databases:

While a relational database stores data in rows and reads data row by row, a column store is
organized as a set of columns. This means that when you want to run analytics on a small number of
columns, you can read those columns directly without consuming memory with the unwanted data.
Columns are often of the same type and benefit from more efficient compression, making reads even
faster. Columnar databases can quickly aggregate the value of a given column (adding up the total
sales for the year, for example). Use cases include analytics.

4. Graph databases:

A graph database focuses on the relationship between data elements. Each element is stored as a node
(such as a person in a social media graph). The connections between elements are called links or
relationships. In a graph database, connections are first-class elements of the database, stored
directly. In relational databases, links are implied, using data to express the relationships.

A graph database is optimized to capture and search the connections between data elements,
overcoming the overhead associated with JOINing multiple tables in SQL.

Very few real-world business systems can survive solely on graph queries. As a result, graph
databases are usually run alongside other, more traditional databases.

Use cases include fraud detection, social networks, and knowledge graphs.

Differences between SQL and NoSQL:

SQL                                            NoSQL

Relational database management system          Non-relational or distributed database
(RDBMS)                                        system

Fixed or static or predefined schema           Dynamic schema

Not suited for hierarchical data storage       Best suited for hierarchical data storage

Best suited for complex queries                Not so good for complex queries

Vertically scalable                            Horizontally scalable

Follows ACID properties                        Follows CAP (consistency, availability,
                                               partition tolerance)

Examples: MySQL, PostgreSQL, Oracle,           Examples: MongoDB, HBase, Neo4j,
MS-SQL Server, etc.                            Cassandra, etc.

Pitfalls of RDBD:

1. Poor Design/Planning:
Improper planning of the database leaves you with no time to go back and fix errors and leads to
malicious cyber attacks. Therefore, consider sitting down with a paper and drawing a data model as
per business requirements.

Developers can avoid poor planning/design by checking off the following points.

1. Main tables of your database model


2. Names for tables
3. Rules for naming tables
4. Time span required for the project

2. Ignoring Normalization:
Normalization groups directly related data under a single table, while indirectly related data is put
under separate tables. These tables are connected with a logical relationship between child and
parent tables.

Lack of normalization reduces data consistency and leads to duplication of data, because the same
data ends up stored in more than one place.

3. Redundant Records:
Redundancy in a database is a condition in which the same set of data is stored at two different
places. This means two different spots or two separate fields in software. This is a problem for
developers because they have to keep several versions of data updated.


Redundant records lead to unnecessary growth in database size, which in turn decreases the
efficiency of the database and causes data corruption. Hence, the best option is to avoid preparing
redundant records, unless it’s essential for backups.

4. Poor Naming Standards:


Naming is a personal choice, however, it is an important aspect of documentation. Poor naming
standards result in messy and large data files, hence consider incorporating consistency.

The purpose of naming is to allow all future developers or programmers to easily understand the
components of databases and what was their use. This saves time for developers and they need not go
through documents to understand the meaning of a name.

5. Lack of Documentation
Lack of documentation leads to the loss of vital information or a tedious handover process to a new
programmer.

Consider documenting everything you know from day one because any documentation is better than
none. Well-organized documentation throughout the project helps to wrap up everything smoothly
and in turn, helps build robust software.

The goal of documentation is to provide information to the support programmer to detect bugs and
fix them. Documentation starts with naming columns, objects, and tables in a database model.

6. One Table to Hold All Domain Values:


The next common pitfall encountered while designing a database is to prepare one table to hold all
domain values. For example, you may have a range of values for varied areas such as order status,
account status, and payment status, each one of them with different values.

7. Ignoring Frequency or Purpose of the Data:


By ignoring the fundamental purpose of data, a designer shifts away from the primary goal of storing
and retrieving data efficiently when needed. Lack of purpose of data results in no knowledge of what
the data represents, at what rate and how it is going to be acquired, what will be the operational
volume, and how to use it.

Clarity of purpose helps to develop database designs, record size, create entities, choose database
engine management policies, and formats. By ignoring these, designs will be flawed fundamentally,
even though they are mathematically and structurally right.

8. Insufficient Indexing:
Insufficient indexing comes from a SQL configuration whose performance is affected due to
improper, excessive, or missing indexes. In case indexes aren’t created properly, the SQL server goes
through more records to retrieve the data that’s requested by the query.

Overall, indexing is a complex decision: too much indexing is as bad as too little indexing, as both
impact the final outcome.

9. Lack of Testing:
The lack of database testing fails to give information on whether the data values stored and received
in the database are valid or not. Testing helps to save transaction data, avoids data loss, and prevents
unauthorized access to information.

Lossless join decomposition:


Lossless-join decomposition is a process in which a relation is decomposed into two or more
relations. This property guarantees that no extra or missing tuples are generated and that no
information is lost from the original relation during the decomposition.
When the sub-relations are combined again, the resulting relation must be the same as the original
relation before decomposition.
Lossless join decomposition is a decomposition of a relation R into relations R1, R2 such that if we
perform a natural join of relations R1 and R2, it will return the original relation R. This is effective
in removing redundancy from databases while preserving the original data.
In other words, by lossless decomposition it becomes feasible to reconstruct the relation R from the
decomposed tables R1 and R2 by using joins.
Decompositions into 2NF, 3NF and BCNF can always be carried out losslessly.
In lossless decomposition, we select the common attribute, and the criterion for selecting the
common attribute is that it must be a candidate key or super key in either relation R1, R2, or both.
Consider a relation R if we decomposed it into sub-parts relation R1 and relation R2.
The decomposition is lossless when it satisfies the following statement −
 If we union the sub Relation R1 and R2 then it must contain all the attributes that are
available in the original relation R before decomposition.

	The intersection of R1 and R2 must not be empty: the sub-relations must share a common
attribute, and that common attribute must contain unique data (be a key) in at least one of the
sub-relations.
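A small Python sketch of a lossless-join check (the relation R(Emp, Dept, Loc) and the dependency Dept -> Loc are hypothetical): decompose R, natural-join the parts, and verify that the original relation comes back:

```python
# Hypothetical relation R(Emp, Dept, Loc) where Dept -> Loc holds.
R = {("Ramesh", "Sales", "Pune"),
     ("Jatin",  "Sales", "Pune"),
     ("Anita",  "HR",    "Delhi")}

R1 = {(e, d) for e, d, _ in R}        # R1(Emp, Dept)
R2 = {(d, l) for _, d, l in R}        # R2(Dept, Loc); Dept is a key of R2

def natural_join(r1, r2):
    """Join on the shared Dept attribute (last column of r1, first of r2)."""
    return {(e, d, l) for (e, d) in r1 for (d2, l) in r2 if d == d2}

# Because the common attribute Dept is a key of R2, the join
# reconstructs R exactly: the decomposition is lossless.
print(natural_join(R1, R2) == R)  # → True
```

If Dept were a key of neither sub-relation, the join could produce spurious tuples not present in R.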

Functional dependencies:

The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.

X → Y

The left side of FD is known as a determinant, the right side of the production is known as a
dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if
we know the Emp_Id, we can tell that employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency

1. Trivial functional dependency


oA → B has trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial like: A → A, B → B

Example:

Consider a table with two columns Employee_Id and Employee_Name.


{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.

2. Non-trivial functional dependency


oA → B has a non-trivial functional dependency if B is not a subset of A.
o When A intersection B is NULL, then A → B is called completely non-trivial.

Example:

ID → Name,
Name → DOB

Armstrong axioms:

Armstrong’s Axioms are a set of inference rules. They were developed by William W. Armstrong in
1974 and are used to infer all the functional dependencies on a relational database.

Armstrong’s Axioms has mainly two different sets of rules:


1. Primary Rule
2. Secondary Rule

Primary Rule:
 Reflexive Rule
 Augmentation Rule
 Transitive Rule

Reflexive Rule:
If Y ⊆ X, then X -> Y (a set of attributes functionally determines every subset of itself).

Augmentation Rule:
If X -> Y
then XZ -> YZ
Justification: for a pair of tuples t1 and t2, if t1[XZ] = t2[XZ], then t1[X] = t2[X], so X -> Y gives
t1[Y] = t2[Y]; combined with t1[Z] = t2[Z], this implies t1[YZ] = t2[YZ].

Transitive Rule:
If X -> Y and Y -> Z
then If X -> Z

Secondary Rule:
 Decomposition Rule
 Union Rule
 Pseudo Transitive Rule

Decomposition Rule:
If X -> YZ
then X -> Y and X -> Z

Union Rule:
If X -> Y and X -> Z
then X -> YZ

Pseudo Transitive Rule:


If X -> YZ and Y -> W
then X -> WZ
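Armstrong's axioms are typically applied through the attribute-closure algorithm, which repeatedly uses reflexivity, augmentation and transitivity to find everything a given attribute set determines. A minimal Python sketch (single-letter attribute names assumed):

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of FDs, each given as a
    (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If every attribute of the LHS is already determined,
            # the RHS attributes are determined too (transitivity).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical FDs: A -> B and B -> C.
fds = [("A", "B"), ("B", "C")]
print(closure("A", fds))  # → {'A', 'B', 'C'} (transitivity gives A -> C)
```

X is a super key exactly when its closure contains every attribute of the relation.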

Normalization for relational databases 1st, 2nd and 3rd normal forms:

First Normal Form (1NF):

For a table to be in the First Normal Form, it should follow the following 4 rules:

1. It should only have single(atomic) valued attributes/columns.


2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.

EXAMPLE:

EMPLOYEE_ID   NAME    JOB_CODE   JOB         STATE_CODE   HOME_STATE

E001          Alice   J01        Chef        26           Michigan

E001          Alice   J02        Waiter      26           Michigan

E002          Bob     J02        Waiter      56           Wyoming

E002          Bob     J03        Bartender   56           Wyoming

E003          Alice   J01        Chef        56           Wyoming

All the entries are atomic and there is a composite primary key (employee_id, job_code) so the table
is in the first normal form (1NF).
This table is not in 2NF.

Second Normal Form (2NF):


For a table to be in the Second Normal Form,

1. It should be in the First Normal form.


2. And, it should not have Partial Dependency.

The above table can be converted into 2NF:


Example of Second Normal Form (2NF)
employee_roles Table
EMPLOYEE_ID JOB_CODE
E001 J01
E001 J02
E002 J02
E002 J03
E003 J01
employees Table
EMPLOYEE_ID NAME STATE_CODE HOME_STATE
E001 Alice 26 Michigan
E002 Bob 56 Wyoming
E003 Alice 56 Wyoming
jobs table
JOB_CODE JOB
J01 Chef
J02 Waiter
J03 Bartender
home_state is now dependent on state_code. So, if you know the state_code, then you can find
the home_state value.

Third Normal Form (3NF):

A table is said to be in the Third Normal Form when,

1. It is in the Second Normal form.


2. And, it doesn't have Transitive Dependency.
Example of Third Normal Form (3NF)
employee_roles Table
EMPLOYEE_ID JOB_CODE JOB
E001 J01 Chef
E001 J02 Waiter

EMPLOYEE_ID JOB_CODE JOB


E002 J02 Waiter
E002 J03 Bartender
E003 J01 Chef
employees Table
EMPLOYEE_ID NAME STATE_CODE
E001 Alice 26
E002 Bob 56
E003 Alice 39
jobs Table
JOB_CODE JOB
J01 Chef
J02 Waiter
J03 Bartender
states Table
STATE_CODE HOME_STATE
26 Michigan
56 Wyoming

Now our database is in 3NF.

BCNF: Boyce Codd Normal Form

BCNF (Boyce Codd Normal Form) is the advanced version of 3NF. A table is in BCNF if, for every
functional dependency X -> Y, X is a super key of the table. That is, the table should be in 3NF and,
for every FD, the LHS must be a super key.

Example
Consider a relation R with attributes (student, subject, teacher).

Student Teacher Subject

Jhansi P.Naresh Database

jhansi K.Das C

subbu P.Naresh Database

subbu R.Prasad C

F: { (student, Teacher) -> subject


(student, subject) -> Teacher
Teacher -> subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF [since there is no transitive dependency]. A relation R is in BCNF if for
every non-trivial FD X->Y, X must be a key.
The above relation is not in BCNF, because in the FD Teacher -> Subject, Teacher is not a key. This
relation suffers from anomalies:
For example, if we try to delete the student Subbu, we will lose the information that R.Prasad
teaches C. These difficulties are caused by the fact that Teacher is a determinant but not a candidate key.

Decomposition for BCNF


Teacher-> subject violates BCNF [since teacher is not a candidate key].
If X->Y violates BCNF then divide R into R1(X, Y) and R2(R-Y).
So R is divided into two relations R1(Teacher, subject) and R2(student, Teacher).
R1

Teacher Subject

P.Naresh Database

K.DAS C

R.Prasad C
R2

Student Teacher

Jhansi P.Naresh

Jhansi K.Das

Subbu P.Naresh

Subbu R.Prasad

All the anomalies that were present in R are now removed in the above two relations.

Note
BCNF decomposition does not always satisfy dependency preserving property. After BCNF
decomposition if dependency is not preserved then we have to decide whether we want to remain in
BCNF or rollback to 3NF. This process of rollback is called denormalization.
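The lossless-join property of the BCNF decomposition above can be checked directly. A minimal sketch, with plain Python tuple sets standing in for relations (not part of any DBMS API):

```python
# Original relation R(student, teacher, subject)
R = {
    ("Jhansi", "P.Naresh", "Database"),
    ("Jhansi", "K.Das", "C"),
    ("Subbu", "P.Naresh", "Database"),
    ("Subbu", "R.Prasad", "C"),
}

# BCNF decomposition: R1(teacher, subject) and R2(student, teacher)
R1 = {(t, sub) for (_, t, sub) in R}
R2 = {(s, t) for (s, t, _) in R}

# Natural join of R2 and R1 on teacher reproduces R exactly (lossless join).
joined = {(s, t, sub) for (s, t) in R2 for (t2, sub) in R1 if t == t2}
print(joined == R)  # True
```

Note that the FD (student, subject) -> teacher is not checkable within either R1 or R2 alone, which is exactly the dependency-preservation issue the note above describes.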

Basic definitions of MVD: Multivalued dependency

A multivalued dependency (MVD) is a constraint under which one attribute determines a whole set of
values of another attribute, independently of the remaining attributes; its presence forces multiple
related rows into the same table. A non-trivial multivalued dependency therefore prevents a relation
from being in 4NF. Any multivalued dependency involves at least three attributes of a table.

o Multivalued dependency occurs when two attributes in a table are independent of each other but,
both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third attribute
that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors (white and
black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR
M2001 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other.

In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:

1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".
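An MVD BIKE_MODEL →→ MANUF_YEAR holds exactly when, for each model, the set of years and the set of colours combine freely — the rows per model form a cross product. A small illustrative check, assuming two colours per model as the example intends:

```python
rows = {
    ("M2001", 2008, "White"), ("M2001", 2008, "Black"),
    ("M3001", 2013, "White"), ("M3001", 2013, "Black"),
    ("M4006", 2017, "White"), ("M4006", 2017, "Black"),
}

def mvd_holds(rows):
    """BIKE_MODEL ->-> MANUF_YEAR (and COLOR): for each model,
    the years and colours present must combine freely."""
    models = {m for (m, _, _) in rows}
    for m in models:
        years = {y for (mm, y, _) in rows if mm == m}
        colors = {c for (mm, _, c) in rows if mm == m}
        expected = {(m, y, c) for y in years for c in colors}
        if expected != {r for r in rows if r[0] == m}:
            return False
    return True

print(mvd_holds(rows))  # True
```

Adding, say, ("M3001", 2014, "White") without the matching Black row would make the check fail, because the cross product for M3001 would no longer be complete.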

JDs: Join dependency


Join dependency or JD is a constraint that is similar to FD (functional dependency) or MVD
(multivalued dependency). JD is satisfied only when the concerned relation is a join of a specific
number of projections. Thus, such a type of constraint is known as a join dependency.


Whenever a table can be recreated by simply joining various tables, where each of these tables
consists of a subset of the original table's attributes, the table is said to have a join dependency.
JD is thus a generalization of MVD, and it relates to 5NF: a relation can be in 5NF only when it is
already in 4NF and cannot be decomposed further without loss.

Example:

<Student>

Stu_Name Stu_Skills Stu_Job (Assigned Work)
Tag Marketing GK001
Barry PR GK002
Paulo Graphic Designing GK003

We can decompose the table given above into the three tables given below; thus, it is not in
the Fifth Normal Form.

<Student_Skills>

Stu_Name Stu_Skills
Tag Marketing
Barry PR
Paulo Graphic Designing

<Student_Job>

Stu_Name Stu_Job
Tag GK001
Barry GK002
Paulo GK003

<Job_Skills>

Stu_Skills Stu_Job
Marketing GK001
PR GK002
Graphic Designing GK003

Our Join Dependency would be:


{(Stu_Name, Stu_Skills ), ( Stu_Name, Stu_Job), (Stu_Skills, Stu_Job)}

The original relation <Student> satisfies this join dependency, so it is not in 5NF: the natural
join of the three projections given above is equal to the original relation <Student>.
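This claim can be verified mechanically: project <Student> onto the three attribute pairs and natural-join the projections back. A sketch with plain tuple sets (names as in the example):

```python
student = {
    ("Tag", "Marketing", "GK001"),
    ("Barry", "PR", "GK002"),
    ("Paulo", "Graphic Designing", "GK003"),
}

skills = {(n, s) for (n, s, _) in student}      # (Stu_Name, Stu_Skills)
jobs_ = {(n, j) for (n, _, j) in student}       # (Stu_Name, Stu_Job)
skill_job = {(s, j) for (_, s, j) in student}   # (Stu_Skills, Stu_Job)

# Join the three projections on their shared attributes.
rejoined = {
    (n, s, j)
    for (n, s) in skills
    for (n2, j) in jobs_
    for (s2, j2) in skill_job
    if n == n2 and s == s2 and j == j2
}
print(rejoined == student)  # True: the JD holds
```

Since joining the three two-attribute projections reproduces <Student> exactly, the relation has the stated join dependency and is not in 5NF.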

OR

o Join decomposition is a further generalization of Multivalued dependencies.


o If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD)
exists.
o Here R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R(A, B, C, D).
o Alternatively, R1 and R2 form a lossless decomposition of R.
o A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join
decomposition of R.
o *((A, B, C), (C, D)) is a JD of R if the join of the projections is equal to the relation R.
o Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on are a JD of R.

4th and 5th normal forms:

Fourth normal form (4NF)


o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
o For a dependency A → B, if for a single value of A multiple values of B exist, then the relation
has a multi-valued dependency.

Example:

STUDENT

STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is
no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
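One caveat worth noting: strictly, the MVD on STU_ID means the wide STUDENT table should list every course–hobby combination per student (four rows for student 21). With those combinations filled in, the two-table decomposition is lossless, which the following sketch checks (plain tuple sets, for illustration only):

```python
# STUDENT with every COURSE x HOBBY combination per student, as the
# MVDs STU_ID ->-> COURSE and STU_ID ->-> HOBBY require
student = {
    (21, "Computer", "Dancing"), (21, "Computer", "Singing"),
    (21, "Math", "Dancing"), (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}

student_course = {(i, c) for (i, c, _) in student}   # 5 rows
student_hobby = {(i, h) for (i, _, h) in student}    # 5 rows

# The join on STU_ID reproduces STUDENT exactly: the decomposition is lossless.
rejoined = {(i, c, h) for (i, c) in student_course
            for (j, h) in student_hobby if i == j}
print(rejoined == student)  # True
```

The decomposition stores 10 short rows instead of 7 wide ones here, but more importantly each course and each hobby of a student is recorded exactly once, so the repetition cannot drift out of sync.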

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be
lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math classes for Semester 1 but does not take the
Math class for Semester 2. In this case, the combination of all these fields is required to identify
valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching
it, so we leave LECTURER and SUBJECT as NULL. But all three columns together act as the primary
key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1:

SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math

P2:

SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

P3:

SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen

Difference between 4NF and 5NF:

1. A relation in 4NF must also be in BCNF (Boyce Codd Normal Form); a relation in 5NF must also
be in 4NF (Fourth Normal Form).
2. A relation in 4NF must not have any multi-valued dependency; a relation in 5NF must not have
any join dependency.
3. A relation in 4NF may or may not be in 5NF; a relation in 5NF is always in 4NF.
4. Fourth Normal Form is weaker than Fifth Normal Form; Fifth Normal Form is stronger than
Fourth Normal Form.
5. A relation in Fourth Normal Form may still have some redundancy; a relation in Fifth Normal
Form will have less redundancy.
6. A relation in Fourth Normal Form may be decomposed further into sub-relations; a relation in
Fifth Normal Form cannot be decomposed further into sub-relations without changing the meaning
or facts.

Database Management Systems (DBMS)


UNIT – 4 & 5 Notes
Syllabus:

Transaction Management: Transaction processing, Transaction concept, transaction State,


implementation of atomicity and durability, concurrent executions, serializability, recoverability.
Concurrency Control: Lock-based protocols, time stamp-based protocols, validation-based protocols,
multiple granularities, multi-version schemes, deadlock handling.

TOPICS:
Transaction processing

Transaction processing ensures that related data is added to or deleted from the database
simultaneously, thus preserving data integrity in your application. In transaction processing, data is
not written to the database until a commit command is issued. When this happens, data is
permanently written to the database.
Examples include systems that manage sales order entry, airline reservations, payroll, employee
records, manufacturing, and shipping.

Transaction processing ensures that transactions are:

 Atomic

Either all database changes for an entire transaction are completed or none of the changes are
completed.

 Consistent

Database changes transform from one consistent database state to another.

 Isolated

Transactions from concurrent applications do not interfere with each other. The updates from a
transaction are not visible to other transactions that execute concurrently until the transaction
commits.

 Durable

Complete database operations are permanently written to the database.

Transaction concept
o A transaction is a set of logically related operations. It contains a group of tasks.

oA transaction is an action or series of actions. It is performed by a single user to perform operations


for accessing the contents of the database.

Transactions refer to a set of operations that are used for performing a set of logical work. Usually, a
transaction means the data present in the DB has changed. Protecting the user data from system
failures is one of the primary uses of DBMS.

Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account. This
small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:

Following are the main operations of transaction:

Read(X): Read operation is used to read the value of X from the database and stores it in a buffer in
main memory.

Write(X): Write operation is used to write the value back to the database from the buffer.

Let's take an example to debit transaction from an account which consists of following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will be 3500.
But it may be possible that, because of a failure of hardware, software or power, etc., the
transaction may fail before finishing all the operations in the set.

For example: If in the above transaction, the debit transaction fails after executing operation 2 then
X's value will remain 4000 in the database which is not acceptable by the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
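The buffered read/write model with Commit and Rollback can be sketched as a toy class (an illustration only, not a real DBMS interface):

```python
class MiniTxn:
    """Toy transaction: reads copy values into a buffer; only
    commit() writes the buffer back to the 'database'."""
    def __init__(self, db):
        self.db = db
        self.buffer = {}

    def read(self, key):          # R(X): database -> buffer
        self.buffer[key] = self.db[key]
        return self.buffer[key]

    def write(self, key, value):  # W(X): stays in the buffer for now
        self.buffer[key] = value

    def commit(self):             # make the buffered changes permanent
        self.db.update(self.buffer)
        self.buffer.clear()

    def rollback(self):           # undo: discard the buffered changes
        self.buffer.clear()

db = {"X": 4000}
t = MiniTxn(db)
t.write("X", t.read("X") - 500)
print(db["X"])   # still 4000: nothing is written until commit
t.commit()
print(db["X"])   # 3500
```

If the transaction failed after the write, a rollback would simply discard the buffer and the database would keep its old value — exactly the guarantee the debit example needs.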

Transaction State

These are different types of Transaction States:

1. Active State –
When the instructions of the transaction are running then the transaction is in active state. If all the
‘read and write’ operations are performed without any error then it goes to the “partially committed
state”; if any instruction fails, it goes to the “failed state”.

2. Partially Committed –
After completion of all the read and write operation the changes are made in main memory or local
buffer. If the changes are made permanent on the DataBase then the state will change to
“committed state” and in case of failure it will go to the “failed state”.

3. Failed State –
When any instruction of the transaction fails, it goes to the “failed state” or if failure occurs in
making a permanent change of data on Data Base.

4. Aborted State –
After having any type of failure the transaction goes from “failed state” to “aborted state” and since

in previous states, the changes are only made to local buffer or main memory and hence these
changes are deleted or rolled-back.

5. Committed State –
It is the state when the changes are made permanent on the Data Base and the transaction is
complete and therefore terminated in the “terminated state”.

Implementation of atomicity and durability

The recovery-management component of a database system can support atomicity and durability by a
variety of schemes.

Here we are going to learn about one of the simplest scheme called Shadow copy.

 In the shadow-copy scheme, a transaction that wants to update the database first creates a complete
copy of the database. All updates are done on the new database copy, leaving the original copy, the
shadow copy, untouched. If at any point the transaction has to be aborted, the system merely deletes
the new copy. The old copy of the database has not been affected.
 This scheme is based on making copies of the database, called shadow copies, assumes that only
one transaction is active at a time. The scheme also assumes that the database is simply a file on disk.
A pointer called db-pointer is maintained on disk; it points to the current copy of the database.

If the transaction completes, it is committed as follows:

 First, the operating system is asked to make sure that all pages of the new copy of the database have
been written out to disk. (Unix systems use the fsync system call for this purpose.)
 After the operating system has written all the pages to disk, the database system updates the pointer
db-pointer to point to the new copy of the database; the new copy then becomes the current copy of
the database. The old copy of the database is then deleted.
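The shadow-copy scheme can be sketched with ordinary files, where os.replace plays the role of the atomic db-pointer update. The filenames (db-pointer, db.copy0, db.copy1) are invented for illustration:

```python
import json
import os

def init_db(data, pointer="db-pointer"):
    with open("db.copy0", "w") as f:
        json.dump(data, f)
    with open(pointer, "w") as p:      # db-pointer names the current copy
        p.write("db.copy0")

def read_db(pointer="db-pointer"):
    with open(pointer) as p:
        current = p.read()
    with open(current) as f:
        return json.load(f)

def update_db(changes, pointer="db-pointer"):
    """Shadow-copy commit: write a full new copy, force it to disk,
    then atomically swing the pointer to it."""
    db = read_db(pointer)
    db.update(changes)
    with open(pointer) as p:
        old = p.read()
    new = "db.copy1" if old == "db.copy0" else "db.copy0"
    with open(new, "w") as f:
        json.dump(db, f)
        f.flush()
        os.fsync(f.fileno())           # the "write all pages to disk" step
    tmp = pointer + ".tmp"
    with open(tmp, "w") as p:
        p.write(new)
        p.flush()
        os.fsync(p.fileno())
    os.replace(tmp, pointer)           # atomic db-pointer update
    # the old copy could now be deleted

init_db({"X": 4000})
update_db({"X": 3500})
print(read_db()["X"])  # 3500
```

A crash before os.replace leaves the pointer at the old copy, so the update never happened (atomicity); a crash after it leaves the fsynced new copy in place (durability).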

Concurrent executions

In a multi-user system, multiple users can access and use the same database at one time, which is
known as the concurrent execution of the database. It means that the same database is executed
simultaneously on a multi-user system by different users.

Most high-performance transactional systems need to run transactions concurrently to meet their
performance requirements. Without concurrency control, such systems can neither provide correct
results nor maintain their databases in a consistent state.

For example, deposit of money, balance enquiry, reservation of tickets etc. Every transaction starts
with delimiters begin transaction and terminates with end transaction delimiters. The set of
operations within these two delimiters constitute one transaction.
Concurrency Control in DBMS is a procedure of managing simultaneous transactions ensuring their

atomicity, isolation, consistency, and serializability. Several problems that arise when numerous
transactions execute simultaneously in a random manner are referred to as concurrency control
problems.

Serializability

Serializability is a property of a system describing how different processes operate on shared data.
A system is serializable if its result is the same as if the operations were executed in some
sequential order, meaning there is no overlap in execution. In a database management system
(DBMS), this can be accomplished by locking data so that no other process can access it while it is
being read or written.

In a database management system (DBMS), serializability requires that transactions appear to


happen in a particular order, even if they execute concurrently. Transactions that are not
serializable may produce incorrect results. There are different types of serializability in DBMS
with different advantages and disadvantages. Here, we'll look closely at two of the most common
types: conflict serializability and view serializability in DBMS.

Types of Serializability

Conflict serializability is a type of serializability in which conflicting operations on the same data
items are executed in an order that preserves database consistency. Each transaction is assigned a
unique number, and the operations within each transaction are executed in order based on that
number. This ensures that no two conflicting operations are executed concurrently. For example,
consider a database with two tables: Customers and Orders. A customer can have multiple orders,
but each order can only be associated with one customer.

View serializability is a type of serializability in which each transaction produces results that are
equivalent to some well-defined sequential execution of all transactions in the system. Unlike
conflict serializability, which focuses on preventing inconsistencies within the database, view
serializability in DBMS focuses on providing users with consistent views of the database.
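Conflict serializability is commonly tested with a precedence graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict-serializable iff the graph is acyclic. A minimal sketch:

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) in execution order, op in {'R', 'W'}."""
    # Precedence graph: edge Ti -> Tj for each conflicting pair (Ti first).
    edges = set()
    for i, (ti, oi, xi) in enumerate(schedule):
        for (tj, oj, xj) in schedule[i + 1:]:
            # Conflict: same item, different txns, at least one write.
            if xi == xj and ti != tj and "W" in (oi, oj):
                edges.add((ti, tj))

    # Conflict-serializable iff the graph is acyclic (checked here by
    # repeatedly removing nodes with no incoming edge).
    nodes = {t for (t, _, _) in schedule}
    while nodes:
        free = {n for n in nodes
                if not any(u in nodes and v == n for (u, v) in edges)}
        if not free:
            return False   # a cycle remains: not conflict-serializable
        nodes -= free
    return True

s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
s2 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(conflict_serializable(s1))  # True: equivalent to serial T1 then T2
print(conflict_serializable(s2))  # False: edges T1->T2 and T2->T1 form a cycle
```

Any topological order of an acyclic precedence graph gives an equivalent serial schedule.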

Recoverability

If a transaction performs a dirty read from an uncommitted transaction, its commit operation must be
delayed until the uncommitted transaction either commits or rolls back. Schedules with this property
are called Recoverable Schedules.

Example
Let us consider two transaction schedules as given below −

T1              T2

Read(A)
Write(A)
-               Read(A)   // dirty read
-               Write(A)
Commit
-               Commit    // delayed

The above schedule is a recoverable schedule because of the reasons mentioned below −
 The transaction T2 performs a dirty read of A.
 The commit operation of transaction T2 is delayed until transaction T1 commits or rolls back.
 T2 is allowed to commit only after T1 has committed.
 If transaction T1 fails and rolls back, transaction T2 can still be rolled back, so no committed
transaction ever depends on lost data.

Concurrency Control
Lock-based protocols

In this type of protocol, any transaction cannot read or write data until it acquires an appropriate lock
on it. There are two types of lock:

1. Shared lock:

o It is also known as a Read-only lock. With a shared lock, the data item can only be read by the
transaction.
o It can be shared between transactions because, while a transaction holds a shared lock, it can't
update the data item.

2. Exclusive lock:

o With an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: multiple transactions cannot hold it on the same data item
simultaneously.

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking data during a transaction. Simplistic lock-based protocols require
all transactions to get a lock on the data before an insert, delete or update on it. The data item is
unlocked after the transaction completes.

2. Pre-claiming Lock Protocol


o Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which they need
locks.
o Before initiating an execution of the transaction, it requests DBMS for all the lock on all those data
items.

3. Two-phase locking (2PL)


o The two-phase locking protocol divides the execution phase of the transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the lock it
requires.
o In the second part, the transaction acquires all the locks. The third phase is started as soon as the
transaction releases its first lock.

4. Strict Two-phase locking (Strict-2PL)


o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the
transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release its exclusive
locks until the transaction commits or aborts.
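The shared/exclusive lock compatibility rules above can be sketched as a tiny lock table (a simplification: real lock managers also queue waiting transactions and detect deadlocks):

```python
class LockTable:
    def __init__(self):
        self.locks = {}   # item -> (mode, set of holding txns)

    def acquire(self, txn, item, mode):
        """mode 'S' (shared/read) or 'X' (exclusive/write).
        Returns True if granted, False if the txn would have to wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        hmode, holders = held
        if mode == "S" and hmode == "S":
            holders.add(txn)          # shared locks are compatible
            return True
        if holders == {txn}:          # upgrade by the sole holder
            self.locks[item] = ("X" if mode == "X" else hmode, {txn})
            return True
        return False                  # incompatible: the txn would wait

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lt = LockTable()
print(lt.acquire("T1", "A", "S"))  # True
print(lt.acquire("T2", "A", "S"))  # True: S is compatible with S
print(lt.acquire("T3", "A", "X"))  # False: X conflicts with the held S locks
```

Under 2PL, a transaction would call acquire only in its growing phase and release only in its shrinking phase; under Strict-2PL, all releases happen at commit or abort.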

Time stamp-based protocols

The timestamp-ordering protocol ensures serializability among transactions in their conflicting read
and write operations. This is the responsibility of the protocol system that the conflicting pair of tasks
should be executed according to the timestamp values of the transactions.

o The Timestamp Ordering Protocol is used to order the transactions based on their timestamps. The
order of transactions is nothing but the ascending order of transaction creation.

o An older transaction has higher priority, which is why it executes first. To determine the
timestamp of a transaction, this protocol uses system time or a logical counter.
o The lock-based protocol is used to manage the order between conflicting pairs among transactions
at the execution time. But Timestamp based protocols start working as soon as a transaction is
created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the
system at time 007 and transaction T2 entered the system at time 009. T1 has the higher
priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write'
operations on each data item.
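The basic timestamp-ordering rules can be sketched as follows: a read is rejected if the item was already written by a younger transaction, and a write is rejected if the item was already read or written by a younger transaction. A rejected operation means the transaction is rolled back and restarted with a new timestamp:

```python
class TimestampOrdering:
    def __init__(self):
        self.rts = {}   # item -> largest timestamp that read it
        self.wts = {}   # item -> largest timestamp that wrote it

    def read(self, ts, item):
        if ts < self.wts.get(item, 0):
            return False              # the value was overwritten by a younger txn
        self.rts[item] = max(self.rts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.rts.get(item, 0) or ts < self.wts.get(item, 0):
            return False              # too late to write: a younger txn got there
        self.wts[item] = ts
        return True

to = TimestampOrdering()
print(to.read(7, "A"))    # True: txn with ts 7 reads A
print(to.write(9, "A"))   # True: txn with ts 9 writes A
print(to.read(7, "A"))    # False: the older txn cannot read the newer write
```

These are the standard read/write rules; the simplification here is that rollback is represented only by the False return value.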

Validation-based protocols

The validation-based protocol is also known as an optimistic concurrency control technique. In the
validation-based protocol, a transaction is executed in the following three phases:

1. Read phase: In this phase, the transaction T is read and executed. It is used to read the value
of various data items and stores them in temporary local variables. It can perform all the write
operations on temporary variables without an update to the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the
actual data to see whether they violate serializability.
3. Write phase: If the transaction passes validation, the temporary results are written to the
database; otherwise, the transaction is rolled back.

Multiple granularities

It can be defined as hierarchically breaking up the database into blocks which can be locked. The
Multiple Granularity protocol enhances concurrency and reduces lock overhead. It keeps track of
what to lock and how to lock.

Let's start by understanding the meaning of granularity.

Granularity: It is the size of data item allowed to lock.

o It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy
can be graphically represented as a tree.

For example: Consider a tree which has four levels of nodes.

o The first level or higher level shows the entire database.


o The second level represents a node of type area. The higher level database consists of exactly these
areas.
o The area consists of children nodes which are known as files. No file can be present in more than
one area.
o Finally, each file contains child nodes known as records. The file contains exactly those records
that are its child nodes. No record is present in more than one file.
o Hence, the levels of the tree starting from the top level are as follows:
1. Database
2. Area
3. File
4. Record


Multi-version schemes

The multi-version protocol minimizes the delay for read operations by maintaining multiple versions
of data items. For each write operation performed, it creates a new version of the data item, so that
whenever any transaction performs a read operation on that item, the appropriate version is selected
by the concurrency-control manager to make the read operation conflict-free and successful.

When the write operation and new version of data is created then that new version contains some
information that is given below

1. Content: This field contains the data value of that version.


2. Write_timestamp: This field contains the timestamp of the transaction whose new version
is created.
3. Read_timestamp: This field contains the timestamp of the transaction that will read the
newly created value.
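Version selection in a multi-version scheme can be sketched: a reader with timestamp ts is given the version whose Write_timestamp is the largest one not exceeding ts. The data values below are invented for illustration:

```python
# Versions of one data item: (write_timestamp, content), oldest first.
versions = [(0, 100), (5, 250), (9, 400)]

def read_version(versions, ts):
    """Return the content of the latest version visible to a reader at ts."""
    visible = [(w, v) for (w, v) in versions if w <= ts]
    w, v = max(visible)   # largest write_timestamp <= ts
    return v

print(read_version(versions, 7))   # 250: written at ts 5, the ts-9 write is invisible
print(read_version(versions, 9))   # 400
```

Because a reader never touches versions written after its own timestamp, reads never block writers and vice versa, which is exactly the delay reduction described above.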

Deadlock handling


A deadlock is a condition where two or more transactions are waiting indefinitely for one another to
give up locks. Deadlock is said to be one of the most feared complications in DBMS as no task ever
gets finished and is in waiting state forever.

For example: In the student table, transaction T1 holds a lock on some rows and needs to update
some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade
table and needs to update the rows in the Student table held by Transaction T1.

Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and similarly,
transaction T2 is waiting for T1 to release its lock. All activities come to a halt state and remain at a
standstill. It will remain in a standstill until the DBMS detects the deadlock and aborts one of the
transactions.

Deadlock Avoidance
o When a database risks getting stuck in a deadlock state, it is better to avoid the deadlock than
to abort or restart transactions, which wastes time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like
the "wait-for graph" is used for detecting deadlock situations, but this method is suitable only for
smaller databases. For larger databases, a deadlock prevention method can be used.

Deadlock Detection

In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect
whether the transaction is involved in a deadlock or not. The lock manager maintains a wait-for
graph to detect deadlock cycles in the database.

UNIT – 5

Syllabus:

Data Storage: Overview of physical storage media, magnetic disks, storage access, file
organization, organization of records in files.
Indexing and Hashing: Basic concepts, types of indexing, difference between B and B+ Indexing,
static hashing, Dynamic Hashing.

Overview of physical storage media:

Several types of data storage exist in most computer systems. They vary in speed of access, cost per
unit of data, and reliability.

o Cache: the most costly and fastest form of storage. Usually very small, and managed by the
computer hardware.
o Main Memory (MM): the storage area for data available to be operated on.
 General-purpose machine instructions operate on main memory.
 Contents of main memory are usually lost in a power failure or ``crash''.
 Usually too small (even with megabytes) and too expensive to store the entire
database.

o Flash memory: EEPROM (electrically erasable programmable read-only memory).


 Data in flash memory survives power failures.
 Reading data from flash memory takes about 10 nano-secs (roughly as fast as from
main memory), while writing data into flash memory is more complicated: a write takes about 4-10
microsecs.
 To overwrite what has been written, one has to first erase an entire bank of the
memory. Flash memory supports only a limited number of erase cycles.
 It has found its popularity as a replacement for disks for storing small volumes of
data (5-10 megabytes).

o Magnetic-disk storage: primary medium for long-term storage.


 Typically the entire database is stored on disk.
 Data must be moved from disk to main memory in order for the data to be operated
on.
 After operations are performed, data must be copied back to disk if any changes
were made.
 Disk storage is called direct access storage as it is possible to read data on the disk
in any order (unlike sequential access).
 Disk storage usually survives power failures and system crashes.

o Optical storage: CD-ROM (compact-disk read-only memory), WORM (write-once read-


many) disk (for archival storage of data), and Juke box (containing a few drives and numerous disks
loaded on demand).
o Tape Storage: used primarily for backup and archival data.
 Cheaper, but much slower access, since tape must be read sequentially from the
beginning.
 Used as protection from disk failures!

Magnetic disks

A magnetic disk is a storage device that can be imagined as having the shape of a gramophone record.
The disk is coated on both sides with a thin film of magnetic material. This magnetic material has
the property that it can store either ‘1’ or ‘0’ permanently. The magnetic material has a square-loop
hysteresis curve, so it can remain in one of two possible magnetization directions, which correspond
to binary ‘1’ or ‘0’.
Bits are saved on the magnetized surface as marks along concentric circles known as tracks. The
tracks are frequently divided into areas known as sectors.
In this system, the smallest quantity of data that can be transferred is a sector. The subdivision of
one disk surface into tracks and sectors is displayed in the figure.


Advantages of Magnetic Disk


 Access time − With a magnetic disk, it is possible to access a record directly. Therefore, access
time is low in this case.
 Flexibility − A magnetic disk has the flexibility of being used as a sequential as well as a
direct access storage device.
 Transmission Speed − The rate of data transfer is fast for a magnetic disk.
 Reusable − Specific data can be removed and other data saved in the same place.
 Storage Capacity − It can store a very large amount of data.

Storage access

Storage Access. A database is mapped into a number of different files, which are maintained by the
underlying operating system. Files are organized into blocks, and a block may contain one or more
data items. A major goal of the DBMS is to minimize the number of block transfers between the disk
and memory.

Data access refers to a user's ability to access or retrieve data stored within a database or other
repository. Users who have data access can store, retrieve, move or manipulate stored data, which can
be stored on a wide range of hard drives and external devices.

1. Each file is partitioned into fixed-length storage units, called blocks, which are the units of
both storage allocation and data transfer.
2. It is desirable to keep as many blocks as possible in main memory. Usually, we cannot keep
all blocks in main memory, so we need to manage the allocation of available main memory space.
3. We need to use disk storage for the database, and to transfer blocks of data between main
memory and disk. We also want to minimize the number of such transfers, as they are time-
consuming.
4. The buffer is the part of main memory available for storage of copies of disk blocks.

File organization

File Organization refers to the logical relationships among various records that constitute the file,
particularly with respect to the means of identification and access to any specific record. In simple
terms, Storing the files in certain order is called file Organization. File Structure refers to the
format of the label and data blocks and of any
logical control record.

File Organization
o The File is a collection of records. Using the primary key, we can access the records. The type and
frequency of access can be determined by the type of file organization which was used for a given set
of records.
o File organization is a logical relationship among various records. This method defines how file
records are mapped onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of blocks, and
the blocks are placed on the storage medium.
o The first approach to mapping the database to files is to use several files and store records of
only one fixed length in any given file. An alternative approach is to structure our files so that
they can accommodate records of multiple lengths.
o Files of fixed length records are easier to implement than the files of variable length records.

Objective of file organization


o Optimal selection of records, i.e., records can be retrieved as quickly as possible.
o Insert, delete, and update operations on the records should be quick and easy.
o Duplicate records should not be introduced as a result of insert, update, or delete operations.
o Records should be stored efficiently so that the cost of storage is minimal.

Organization of records in files

A logical relationship between distinct records is referred to as file organization. This method
specifies how file records are mapped onto disk blocks. The term "file organization" refers to the
method by which records are organized into blocks and then placed on a storage medium.

There are several ways of organizing records in files.

 Heap file organization. Any record can be placed anywhere in the file where there is space for the
record. There is no ordering of records.

 Sequential file organization. Records are stored in sequential order, based on the value of the
search key of each record.
 Hashing file organization. A hash function is computed on some attribute of each record. The
result of the function specifies in which block of the file the record should be placed -- to be
discussed in chapter 11 since it is closely related to the indexing structure.

 Clustering file organization. Records of several different relations can be stored in the same file.
Related records of the different relations are stored on the same block so that one I/O operation
fetches related records from all the relations.
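As a small illustration (in Python, with made-up records) of the difference between the first two organizations above, heap files append anywhere while sequential files keep records sorted on the search key:

```python
import bisect

# Heap file organization: records are placed wherever there is space;
# there is no ordering, so insertion order is preserved.
heap_file = []
for rec in [(3, "Carol"), (1, "Alice"), (2, "Bob")]:
    heap_file.append(rec)                 # O(1) insert, but O(n) search

# Sequential file organization: records are kept sorted on the value
# of the search key (here, the first element of each tuple).
seq_file = []
for rec in [(3, "Carol"), (1, "Alice"), (2, "Bob")]:
    bisect.insort(seq_file, rec)          # insert preserving sort order

print(heap_file)   # [(3, 'Carol'), (1, 'Alice'), (2, 'Bob')] -- insertion order
print(seq_file)    # [(1, 'Alice'), (2, 'Bob'), (3, 'Carol')] -- search-key order
```

The trade-off visible here is the general one: heap files make insertion cheap, while sequential files make ordered scans and key lookups cheap.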

Indexing and Hashing

Basic concepts

1. Indexing:

Indexing, as the name suggests, is a technique or mechanism used to speed up access to data.
An index is basically a type of data structure that is used to locate and access data in a database
table quickly. Indexes can be created using one or more columns of a database table.
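A minimal sketch of the idea, using a plain Python dictionary as a stand-in for an index structure (the table and column names are invented for illustration):

```python
# A small table of rows, and an index built over its 'dept' column.
table = [
    {"id": 101, "name": "Alice", "dept": "CSE"},
    {"id": 102, "name": "Bob",   "dept": "ECE"},
    {"id": 103, "name": "Carol", "dept": "CSE"},
]

# The index maps each column value to the positions of rows holding it.
dept_index = {}
for pos, row in enumerate(table):
    dept_index.setdefault(row["dept"], []).append(pos)

# A lookup through the index touches only the matching rows,
# instead of scanning the whole table.
matches = [table[pos] for pos in dept_index.get("CSE", [])]
print([row["name"] for row in matches])   # ['Alice', 'Carol']
```

Real database indexes use B+ trees or hash structures rather than an in-memory dictionary, but the principle is the same: pay some extra storage and maintenance cost to make lookups fast.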

2. Hashing:

Hashing, as the name suggests, is a technique or mechanism that uses hash functions with search keys
as parameters to generate the address of a data record. It calculates the direct location of a data
record on disk without using an index structure. A good hash function uses a one-way hashing
algorithm, so the hash cannot be converted back into the original key. In simple words, it is a
process of converting a given key into another value known as the hash value, or simply the hash.
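A small Python sketch of these two properties, using the standard hashlib library for a one-way hash (the key and bucket count are made-up values):

```python
import hashlib

# A one-way hash maps a search key to a fixed-size value; the original
# key cannot be recovered from the digest.
key = "EMP-543"
digest = hashlib.sha256(key.encode()).hexdigest()

# Reducing the hash modulo the bucket count turns it into a bucket
# address, i.e., a direct location for the record.
num_buckets = 8
bucket = int(digest, 16) % num_buckets
print(len(digest), bucket)   # 64 hex characters; bucket in the range 0..7
```

Database hash files typically use much cheaper hash functions than SHA-256; the cryptographic one is used here only to make the one-way property concrete.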

Types of indexing:

1. Ordered index
2. Primary index
3. Clustering index
4. Sparse index

Ordered index

The indices are usually sorted to make searching faster. The indices which are sorted are known as
ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is 10
bytes long. Their IDs start with 1, 2, 3, ... and so on, and we have to search for the employee with ID 543.

o In the case of a database with no index, we have to search the disk blocks from the start until we
reach 543. The DBMS will read the record after reading 543 * 10 = 5430 bytes.
o In the case of an index (assuming each index entry is 2 bytes), we search using the index and the
DBMS will read the record after reading 542 * 2 = 1084 bytes, which is far less than in the previous case.

Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as primary
indexing. Primary keys are unique to each record, so there is a 1:1 relation between index entries and records.
o As primary keys are stored in sorted order, the performance of the search operation is quite
efficient.
o The primary index can be classified into two types: dense index and sparse index.

Sparse index
o In a sparse index, an index record appears for only some of the items in the data file. Each index entry points to a block.
o Instead of pointing to each record in the main table, the index points to records in the
main table at intervals.
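A sparse index as described above can be sketched in Python (the block size and keys are invented for illustration):

```python
import bisect

# A sorted data file, conceptually split into blocks of 3 records each.
records = [10, 20, 30, 40, 50, 60, 70, 80, 90]
BLOCK_SIZE = 3

# Sparse index: one entry per block, holding the first key in that block
# and the block's starting position in the data file.
sparse_index = [(records[i], i) for i in range(0, len(records), BLOCK_SIZE)]
# -> [(10, 0), (40, 3), (70, 6)]

def lookup(key):
    # Binary-search the index for the last anchor key <= the search key,
    # then scan only that one block of the data file.
    i = bisect.bisect_right([k for k, _ in sparse_index], key) - 1
    start = sparse_index[i][1]
    block = records[start:start + BLOCK_SIZE]
    return key in block

print(lookup(50), lookup(55))   # True False
```

The index holds only 3 entries for 9 records, which is the sparse-index trade-off: less index storage and maintenance, at the cost of a short scan within the target block.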

Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is created on non-
primary-key columns, which may not be unique for each record.
o In this case, to identify records faster, we group two or more columns to obtain a unique value
and create an index on them. This method is called a clustering index.
o Records that have similar characteristics are grouped together, and indexes are created for these groups.

Difference between B and B+ Indexing

The B+ tree eliminates a drawback of the B-tree for indexing by storing data pointers only at the
leaf nodes of the tree. Thus, the structure of the leaf nodes of a B+ tree is quite different from the
structure of the internal nodes of a B-tree.

 Pointers: In a B-tree, all internal and leaf nodes have data pointers. In a B+ tree, only leaf
nodes have data pointers.
 Search: In a B-tree, since all keys are not available at the leaves, search often takes more
time. In a B+ tree, all keys are at the leaf nodes, so search is faster and more accurate.
 Redundant keys: In a B-tree, no duplicates of keys are maintained. In a B+ tree, duplicates
of keys are maintained, and all keys are present at the leaves.

Static hashing

It is a hashing technique that enables users to look up a fixed data set. That is, the data in the
directory does not change; it is "static" or fixed. In this hashing technique, the resulting number of
data buckets in memory remains constant.
Operations Provided by Static Hashing
Static hashing provides the following operations −
 Delete − Search for the record's address and delete the record at that address, or delete a chunk of
records stored at that address in memory.
 Insertion − While entering a new record using static hashing, the hash function h calculates the
bucket address h(K) for the search key K, where the record is going to be stored.
 Search − A record can be obtained using a hash function by locating the address of the bucket
where the data is stored.
 Update − It supports updating a record once it is traced in the data bucket.

Advantages of Static Hashing


Static hashing is advantageous in the following ways −
 Offers unparalleled performance for small-size databases.
 Allows Primary Key value to be used as a Hash Key.
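A minimal Python sketch of static hashing with a fixed number of buckets (the employee IDs and the modulo hash function are illustrative choices):

```python
# Static hashing: the number of buckets is fixed when the file is created
# and never changes afterwards.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(key):
    # The hash function computes the bucket address directly from the key.
    return key % NUM_BUCKETS

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    # Only the one bucket h(key) is examined, not the whole file.
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None

for emp_id, name in [(101, "Alice"), (105, "Bob"), (102, "Carol")]:
    insert(emp_id, name)

print(search(105))   # Bob -- 105 % 4 == 1, same bucket as 101
```

Note that keys 101 and 105 collide in bucket 1; with a fixed bucket count, such collisions accumulate as the file grows, which is exactly the bucket-overflow problem static hashing suffers from.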

Dynamic Hashing

It is a hashing technique that enables users to look up a dynamic data set. That is, the data set is
modified by adding data to it or removing data from it on demand, hence the name "dynamic"
hashing. Thus, the number of resulting data buckets keeps increasing or decreasing depending on
the number of records.
In this hashing technique, the resulting number of data buckets in memory is ever-changing.
Operations Provided by Dynamic Hashing
Dynamic hashing provides the following operations −
 Delete − Locate the desired location and support deleting data (or a chunk of data) at that location.
 Insertion − Support inserting new data into the data bucket if there is a space available in the data
bucket.
 Query − Perform querying to compute the bucket address.
 Update − Perform a query to update the data.

Advantages of Dynamic Hashing


Dynamic hashing is advantageous in the following ways −
 It works well with scalable data.
 It can handle addressing large amount of memory in which data size is always changing.
 Bucket overflow occurs rarely or very late.
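A simplified Python sketch of the dynamic idea: here the whole file doubles its bucket count on overflow and rehashes everything, whereas real dynamic (extendible) hashing splits only the overflowing bucket, so treat this as an approximation:

```python
BUCKET_CAPACITY = 2   # records per bucket before an overflow is declared

class DynamicHashFile:
    """Simplified dynamic hashing: the number of buckets grows on demand."""

    def __init__(self):
        self.num_buckets = 2
        self.buckets = [[] for _ in range(self.num_buckets)]

    def _addr(self, key):
        return key % self.num_buckets

    def insert(self, key, record):
        self.buckets[self._addr(key)].append((key, record))
        if len(self.buckets[self._addr(key)]) > BUCKET_CAPACITY:
            self._grow()                   # bucket overflow: double and rehash

    def _grow(self):
        # Double the bucket count and redistribute every record under
        # the new address function.
        old = [pair for bucket in self.buckets for pair in bucket]
        self.num_buckets *= 2
        self.buckets = [[] for _ in range(self.num_buckets)]
        for key, record in old:
            self.buckets[self._addr(key)].append((key, record))

f = DynamicHashFile()
for i in range(7):
    f.insert(i, f"rec-{i}")
print(f.num_buckets)   # 4 -- grew from 2 when a bucket overflowed
```

This shows why bucket overflow is rare under dynamic hashing: instead of chaining overflow records onto a full bucket, the file itself grows to restore spare capacity.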
77

Comparison of static and dynamic hashing:

 Form of data: Static hashing handles fixed-size, non-changing data; dynamic hashing handles
variable-size, changing data.
 Result: With static hashing, the resulting data bucket is of fixed length; with dynamic hashing,
it is of variable length.
 Bucket overflow: With static hashing, bucket overflow can arise often, depending on memory
size; with dynamic hashing, it occurs very late or not at all.
 Complexity: Static hashing is simple; dynamic hashing is complex.

