
DBMS Notes

Database: A database is a collection of inter-related data which helps in efficient retrieval, insertion and deletion of data, and which organizes the data in the form of tables, views, schemas, reports etc. For example, a university database organizes the data about students, faculty, admin staff etc., which helps in efficient retrieval, insertion and deletion of data from it.

DDL is short for Data Definition Language, which deals with database schemas and descriptions of how the data should reside in the database.

DML is short for Data Manipulation Language, which deals with data manipulation and includes the most common SQL statements such as SELECT, INSERT, UPDATE, DELETE etc. It is used to store, modify, retrieve, delete and update data in a database.
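
For example, a minimal sketch in SQL (the STUDENT table here is a hypothetical illustration):

    -- DDL: define the schema
    CREATE TABLE STUDENT (
        ROLL_NO INT PRIMARY KEY,
        NAME    VARCHAR(50),
        AGE     INT
    );

    -- DML: store, retrieve, update and delete the actual data
    INSERT INTO STUDENT (ROLL_NO, NAME, AGE) VALUES (1, 'RAM', 18);
    SELECT NAME FROM STUDENT WHERE ROLL_NO = 1;
    UPDATE STUDENT SET AGE = 19 WHERE ROLL_NO = 1;
    DELETE FROM STUDENT WHERE ROLL_NO = 1;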

Database Management System: The software which is used to manage a database is called a Database Management System (DBMS). For example, MySQL, Oracle etc. are popular commercial DBMSs used in different applications. A DBMS allows users to perform the following tasks:

Data Definition: It helps in creation, modification and removal of definitions that define the organization of data in the database.
Data Updation: It helps in insertion, modification and deletion of the actual data in the database.
Data Retrieval: It helps in retrieval of data from the database, which can be used by applications for various purposes.
User Administration: It helps in registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control and recovering information corrupted by unexpected failures.

Disadvantages of file system

1. Redundancy of Data: Data is said to be redundant if the same data is copied in many places.
2. Inconsistency of Data: Data is said to be inconsistent if multiple copies of the same data do not match with each other.
3. Difficult Data Access: A user should know the exact location of a file to access data, so the process is very cumbersome and tedious.
4. Unauthorized Access: A file system may lead to unauthorized access to data. If a student gets access to the file having his marks, he can change them in an unauthorized way.
5. No Concurrent Access: The access of the same data by multiple users at the same time is known as concurrency. A file system does not allow concurrency, as data can be accessed by only one user at a time.
6. No Backup and Recovery: A file system does not incorporate any backup and recovery of data if a file is lost or corrupted.

Advantages of DBMS
1. Minimized redundancy and data inconsistency: Data is normalized in a DBMS to minimize redundancy, which helps in keeping data consistent.
2. Simplified Data Access: A user needs only the name of the relation, not the exact location, to access data, so the process is very simple.
3. Multiple data views: Different views of the same data can be created to cater to the needs of different users. For example, faculty salary information can be hidden from the student view of data but shown in the admin view.
4. Data Security: Only authorized users are allowed to access the data in a DBMS. Also, data can be encrypted by the DBMS, which makes it secure.
5. Concurrent access to data: Data can be accessed concurrently by different users at the same time in a DBMS.
6. Backup and Recovery mechanism: The DBMS backup and recovery mechanism helps to avoid data loss and data inconsistency in case of catastrophic failures.

Levels in DBMS
1. Physical Level: At the physical level, the information about the location of database objects in the data store is kept.
2. Conceptual/Logical Level: At the conceptual level, data is represented in the form of various database tables. For example, a STUDENT database may contain STUDENT and COURSE tables which will be visible to users, but users are unaware of their storage.
3. External Level/Views: An external level specifies a view of the data in terms of conceptual level tables. E.g., a professor might only want to see the marks of the students and is not interested in other details about the students.

Data independence means a change of data at one level should not affect another
level. Two types of data independence are present in this architecture:

1. Physical Data Independence: Any change in the physical location of tables and
indexes should not affect the conceptual level or external view of data.
2. Conceptual Data Independence: The data at conceptual level schema and
external level schema must be independent. This means a change in conceptual
schema should not affect external schema. e.g.; Adding or deleting attributes of
a table should not affect the user’s view of the table.
DBMS Architecture: 2-Tier, 3-Tier
Two tier architecture:
Two tier architecture is similar to a basic client-server model. The application at the client end directly communicates with the database at the server side. APIs like ODBC and JDBC are used for this interaction. Advantages of this type are that maintenance and understanding are easier and it is compatible with existing systems. However, this model gives poor performance when there are a large number of users.

Three tier architecture:

In this type, there is another layer between the client and the server. The client does not directly communicate with the server. Instead, it interacts with an application server which further communicates with the database system, where the query processing and transaction management take place.
Advantages:
• Enhanced scalability due to distributed deployment of application servers. Now, individual connections need not be made between client and server.
• Security is improved. This type of model prevents direct interaction of the client with the server, thereby reducing access to unauthorized data.
Disadvantages:
• Increased complexity of implementation and communication. It becomes difficult for this sort of interaction to take place due to the presence of middle layers.

There are seven types of database users in DBMS.

1. Database Administrator (DBA):
A Database Administrator (DBA) is a person/team who defines the schema and also controls the 3 levels of the database.
• The DBA also monitors recovery and backup and provides technical support.
• The DBA is also responsible for providing security to the database and allows only authorized users to access/modify the database.
• The DBA repairs damage caused due to hardware and/or software failures.

2. Naive / Parametric End Users:
Parametric end users are unsophisticated users who don't have any DBMS knowledge but frequently use database applications in their daily life to get the desired results. For example, railway ticket booking users are naive users.

3. System Analyst:
A system analyst is a user who analyzes the requirements of parametric end users. They check whether all the requirements of end users are satisfied.

4. Sophisticated Users:
Sophisticated users can be engineers, scientists or business analysts who are familiar with the database. They interact with the database by writing SQL queries directly through the query processor.

5. Database Designers:
Database designers are the users who design the structure of the database, which includes tables, indexes, views, constraints, triggers and stored procedures.

6. Application Programmers:
Application programmers are the back end programmers who write the code for the application programs.

7. Casual Users / Temporary Users:
Casual users are the users who occasionally use/access the database, but each time they access the database they require new information. For example, middle or higher level managers.

Trigger: A trigger is a stored procedure in a database which is automatically invoked whenever a special event in the database occurs. For example, a trigger can be invoked when a row is inserted into a specified table or when certain table columns are being updated.

Views in SQL are a kind of virtual table. A view also has rows and columns as they are in a real table in the database. We can create a view by selecting fields from one or more tables present in the database. A view can either have all the rows of a table or specific rows based on a certain condition.
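
To make both concrete, here is a minimal sketch; the MARKS column and STUDENT_LOG table are hypothetical, and the trigger uses MySQL-style syntax:

    -- A view: a virtual table built by selecting fields from STUDENT
    CREATE VIEW STUDENT_MARKS AS
    SELECT ROLL_NO, NAME, MARKS
    FROM STUDENT
    WHERE MARKS IS NOT NULL;

    -- A trigger: invoked automatically when a row is inserted into STUDENT
    CREATE TRIGGER LOG_NEW_STUDENT
    AFTER INSERT ON STUDENT
    FOR EACH ROW
    INSERT INTO STUDENT_LOG (ROLL_NO, REMARK) VALUES (NEW.ROLL_NO, 'INSERTED');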

Disadvantages of DBMS
1. Increased Cost:
These are different types of costs:
1. Cost of Hardware and Software –
2. Cost of Staff Training –
3. Cost of Data Conversion – To convert our data into a database management system, a lot of money is required, as it adds to the cost of the database management system.
2. Complexity:
5. Frequent Upgrade/Replacement Cycles:

ER MODEL
The ER Model is used to model the logical view of the system from the data perspective, and consists of these components:
Entity, Entity Type, Entity Set –
An entity may be an object with a physical existence – a particular person, car, house, or employee.
An entity is an object of an entity type, and the set of all entities is called an entity set.
Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB, Age, Address, Mobile_No are the attributes which define the entity type Student. In an ER diagram, an attribute is represented by an oval.

1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called a key attribute. For example, Roll_No will be unique for each student. In an ER diagram, a key attribute is represented by an oval with its name underlined.

2. Composite Attribute –
An attribute composed of many other attributes is called a composite attribute.

3. Multivalued Attribute –
An attribute consisting of more than one value for a given entity.

4. Derived Attribute –
An attribute which can be derived from other attributes of the entity type is known as a derived attribute. e.g.; Age (can be derived from DOB).

The complete entity type Student with its attributes can be represented in an ER diagram.

Relationship Type and Relationship Set:

A relationship type represents the association between entity types. For example, 'Enrolled in' is a relationship type that exists between the entity types Student and Course. In an ER diagram, a relationship type is represented by a diamond connected to the participating entities with lines.
A set of relationships of the same type is known as a relationship set. For example, a relationship set may depict that S1 is enrolled in C2, S2 is enrolled in C1 and S3 is enrolled in C3.
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called the degree of the relationship set.
1. Unary Relationship –
When there is only ONE entity set participating in a relation, the relationship is called a unary relationship. For example, one person is married to only one person.

2. Binary Relationship –
When there are TWO entity sets participating in a relation, the relationship is called a binary relationship. For example, Student is enrolled in Course.

3. n-ary Relationship –
When there are n entity sets participating in a relation, the relationship is called an n-ary relationship.

Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality. Cardinality can be of different types:
1. One to one – When each entity in each entity set can take part only once in the relationship, the cardinality is one to one. Let us assume that a male can marry only one female and a female can marry only one male. So the relationship will be one to one.

2. Many to one – When entities in one entity set can take part only once in the relationship set and entities in the other entity set can take part more than once in the relationship set, the cardinality is many to one. Let us assume that a student can take only one course but one course can be taken by many students. So the cardinality will be n to 1. It means that for one course there can be n students, but for one student there will be only one course.
3. Many to many – When entities in all entity sets can take part more than once in the relationship, the cardinality is many to many. Let us assume that a student can take more than one course and one course can be taken by many students.
Participation Constraint:
A participation constraint is applied to the entities participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If each student must enroll in a course, the participation of student will be total. Total participation is shown by a double line in the ER diagram.
2. Partial Participation – An entity in the entity set may or may NOT participate in the relationship.

Weak Entity Type and Identifying Relationship:

As discussed before, an entity type has a key attribute which uniquely identifies each entity in the entity set. But there exist some entity types for which a key attribute can't be defined. These are called weak entity types.
For example, a company may store the information of dependents (parents, children, spouse) of an employee. But the dependents don't have existence without the employee. So Dependent will be a weak entity type and Employee will be the identifying entity type for Dependent.
A weak entity type is represented by a double rectangle. The participation of a weak entity type is always total. The relationship between a weak entity type and its identifying strong entity type is called an identifying relationship, and it is represented by a double diamond.

Enhanced ER Model
The complexity of data is increasing, so it becomes more and more difficult to use the traditional ER model for database modeling. To reduce this complexity of modeling, enhancements were made to the existing ER model to make it able to handle complex applications in a better way.

Generalization and Specialization –

Specialized classes are often called subclasses, while a generalized class is called a superclass, probably inspired by object-oriented programming. A subclass is best understood by "IS-A analysis". The following statements hopefully make some sense: "Technician IS-A Employee", "Laptop IS-A Computer".
An entity is a specialized type/class of another entity. For example, a Technician is a special Employee in a university system, and Faculty is a special class of Employee. We call this phenomenon generalization/specialization. In the example here, Employee is a generalized entity class while Technician and Faculty are specialized classes of Employee.
Generalization –
Generalization is the process of extracting common properties from a set of entities and creating a generalized entity from them. It is a bottom-up approach.
Specialization –
In specialization, an entity is divided into sub-entities based on their characteristics. It is a top-down approach where a higher level entity is specialized into two or more lower level entities.

Aggregation is a collection of different things. It represents a "has-a" relationship. It is more specific than an association. It describes a part-whole or part-of relationship.

Relational Model
Relational Model represents how data is stored in Relational Databases. A
relational database stores data in the form of relations (tables).

• Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME.
• Relation Schema: A relation schema represents the name of the relation with its attributes. e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE) is the relation schema for STUDENT. If a schema has more than one relation, it is called a relational schema.
• Tuple: Each row in the relation is known as a tuple.
• Relation Instance: The set of tuples of a relation at a particular instance of time is called a relation instance.
• Degree: The number of attributes in the relation is known as the degree of the relation.
• Cardinality: The number of tuples in a relation is known as cardinality.
• Column: A column represents the set of values for a particular attribute.
• NULL Values: The value which is not known or unavailable is called a NULL value. It is represented by a blank space. e.g.; the PHONE of the STUDENT having ROLL_NO 4 is NULL.
Constraints in Relational Model
While designing the relational model, we define some conditions which must hold for data present in the database; these are called constraints. These constraints are checked before performing any operation (insertion, deletion and updation) on the database. If there is a violation of any constraint, the operation will fail.
Domain Constraints: These are attribute-level constraints. An attribute can only take values which lie inside the domain range.
Key Integrity: Every relation in the database should have at least one set of attributes which defines a tuple uniquely. That set of attributes is called a key. e.g.; ROLL_NO in STUDENT is a key. No two students can have the same roll number. So a key has two properties:
• It should be unique for all tuples.
• It can't have NULL values.
Referential Integrity: When one attribute of a relation can only take values from another attribute of the same relation or any other relation, it is called referential integrity. Let us suppose we have two relations, as in the sketch below.
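
A minimal sketch of two such relations, with each constraint expressed in SQL (the AGE range is an arbitrary illustration):

    CREATE TABLE STUDENT (
        ROLL_NO INT PRIMARY KEY,                  -- key integrity: unique, non-NULL
        NAME    VARCHAR(50) NOT NULL,
        AGE     INT CHECK (AGE BETWEEN 17 AND 60) -- domain constraint on AGE
    );

    CREATE TABLE STUDENT_COURSE (
        ROLL_NO   INT,
        COURSE_ID INT,
        -- referential integrity: ROLL_NO here may only take values
        -- that exist in STUDENT.ROLL_NO
        FOREIGN KEY (ROLL_NO) REFERENCES STUDENT (ROLL_NO)
    );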

SUPER KEYS:
Any set of attributes that allows us to identify unique rows (tuples) in a given relation is known as a super key. Out of these super keys we can always choose minimal ones, known as candidate keys, one of which can be used as the primary key. If a combination of two or more attributes is being used as the primary key, then we call it a composite key.

Different Types of Keys in Relational Model

Candidate Key: The minimal set of attributes which can uniquely identify a tuple is known as a candidate key. For example, STUD_NO in the STUDENT relation.
• The value of a candidate key is unique and non-null for every tuple.
• There can be more than one candidate key in a relation. For example, STUD_NO is a candidate key for the relation STUDENT.

Primary Key: There can be more than one candidate key in a relation, out of which one can be chosen as the primary key.

Alternate Key: A candidate key other than the primary key is called an alternate key.

Foreign Key: If an attribute can only take the values which are present as values of some other attribute, it will be a foreign key to the attribute to which it refers. The relation which is being referenced is called the referenced relation and the corresponding attribute is called the referenced attribute; the relation which refers to the referenced relation is called the referencing relation and the corresponding attribute is called the referencing attribute. The referenced attribute of the referenced relation should be its primary key. For example, STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in the STUDENT relation.


ANOMALIES
An anomaly is an irregularity, or something which deviates from the expected or normal state. When designing databases, we identify three types of anomalies: insert, update and delete.
Insertion Anomaly in Referencing Relation:
We can't insert a row in the REFERENCING RELATION if the referencing attribute's value is not present in the referenced attribute's values.
Deletion/Updation Anomaly in Referenced Relation:
We can't delete or update a row of the REFERENCED RELATION if the value of the REFERENCED ATTRIBUTE is used as a value of the REFERENCING ATTRIBUTE.
ON DELETE CASCADE: It will delete the tuples from the REFERENCING RELATION if the value used by the REFERENCING ATTRIBUTE is deleted from the REFERENCED RELATION.
ON UPDATE CASCADE: It will update the REFERENCING ATTRIBUTE in the REFERENCING RELATION if the attribute value used by the REFERENCING ATTRIBUTE is updated in the REFERENCED RELATION.
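
As a sketch, both cascade actions are declared on the foreign key of the referencing relation; the STUD_NO naming follows the keys section above:

    CREATE TABLE STUDENT_COURSE (
        STUD_NO   INT,
        COURSE_NO INT,
        FOREIGN KEY (STUD_NO) REFERENCES STUDENT (STUD_NO)
            ON DELETE CASCADE   -- deleting a STUDENT row deletes its rows here
            ON UPDATE CASCADE   -- updating STUDENT.STUD_NO propagates here
    );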

TODO : Steps to convert ER to Relational Model


-----------------------------------------------
In procedural languages, the program code is written as a sequence of instructions. The user has to specify "what to do" and also "how to do" (the step-by-step procedure). Examples: FORTRAN, COBOL, ALGOL, BASIC, C and Pascal.
In non-procedural languages, the user has to specify only "what to do" and not "how to do". They are also known as applicative or functional languages. Examples: SQL, PROLOG, LISP.
Relational Algebra is a procedural query language which takes a relation as input and generates a relation as output.
Projection is used to project required column data from a relation.
Selection is used to select required tuples of a relation.
Union in relational algebra is the same as the union operation in set theory; the only constraint is that for the union of two relations, both relations must have the same set of attributes.
Set Difference in relational algebra is the same set difference operation as in set theory, with the constraint that both relations should have the same set of attributes.
Cross Product between two relations, say A and B, written A × B, results in all the attributes of A followed by all the attributes of B. Each record of A pairs with every record of B.
Natural Join between two or more relations results in the set of all combinations of tuples that have equal values on their common attributes.
Conditional Join works similarly to natural join. In natural join, the default condition is equality between the common attributes, while in conditional join we can specify any condition such as greater than, less than or not equal.
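
Each operator has a rough SQL counterpart. A sketch, where A and B are hypothetical relations with the same attributes (EXCEPT is spelled MINUS in some systems, and not every dialect supports every operator):

    -- Projection: project the NAME column of STUDENT
    SELECT DISTINCT NAME FROM STUDENT;

    -- Selection: select the tuples with AGE > 18
    SELECT * FROM STUDENT WHERE AGE > 18;

    -- Union: both relations must have the same set of attributes
    SELECT * FROM A UNION SELECT * FROM B;

    -- Set difference: tuples in A but not in B
    SELECT * FROM A EXCEPT SELECT * FROM B;

    -- Cross product: every record of A paired with every record of B
    SELECT * FROM A CROSS JOIN B;

    -- Natural join: equality on all common attributes
    SELECT * FROM A NATURAL JOIN B;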

SQL JOIN
An SQL Join is used to combine data from two or more tables, based on a common field between them.
Note: The INNER keyword is optional; a simple JOIN is also considered an INNER JOIN.
What is the difference between inner join and outer join?
Outer Join is of 3 types:
1) Left outer join
2) Right outer join
3) Full join
1) Left outer join returns all rows of the table on the left side of the join. For rows that have no matching row on the right side, the result contains NULL on the right side.
2) Right outer join is similar to left outer join (Right replaces Left everywhere).
3) Full outer join contains the results of both left and right outer joins.
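
A sketch of the four joins, assuming hypothetical tables STUDENT(ROLL_NO, NAME) and STUDENT_COURSE(ROLL_NO, COURSE_ID); FULL OUTER JOIN is not supported by every dialect:

    -- INNER JOIN: only students that have a matching course row
    SELECT s.NAME, c.COURSE_ID
    FROM STUDENT s
    JOIN STUDENT_COURSE c ON s.ROLL_NO = c.ROLL_NO;

    -- LEFT OUTER JOIN: all students; COURSE_ID is NULL where no match exists
    SELECT s.NAME, c.COURSE_ID
    FROM STUDENT s
    LEFT OUTER JOIN STUDENT_COURSE c ON s.ROLL_NO = c.ROLL_NO;

    -- RIGHT OUTER JOIN: all course rows; NAME is NULL where no match exists
    SELECT s.NAME, c.COURSE_ID
    FROM STUDENT s
    RIGHT OUTER JOIN STUDENT_COURSE c ON s.ROLL_NO = c.ROLL_NO;

    -- FULL OUTER JOIN: results of both left and right outer joins
    SELECT s.NAME, c.COURSE_ID
    FROM STUDENT s
    FULL OUTER JOIN STUDENT_COURSE c ON s.ROLL_NO = c.ROLL_NO;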

Functional Dependency, Attribute Types, Normalization and its Types

Functional Dependency
A functional dependency A->B holds in a relation if any two tuples having the same value of attribute A also have the same value for attribute B, i.e. A uniquely determines B.
For example, in a STUDENT relation, the functional dependencies STUD_NO->STUD_NAME and STUD_NO->STUD_PHONE hold, but STUD_NAME->STUD_ADDR does not hold.
In A->B, A determines B, and B is determined by A.
Functional Dependency Set: The functional dependency set, or FD set, of a relation is the set of all FDs present in the relation.
Attribute Closure (A+): The attribute closure of an attribute set A is the set of all attributes which can be functionally determined from it.
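
A small illustrative example: suppose a relation R(A, B, C) has the FD set F = {A->B, B->C}. The closure of A is computed step by step:
A+ = {A} (start with the attribute itself)
A+ = {A, B} (apply A->B)
A+ = {A, B, C} (apply B->C)
Since A+ contains all attributes of R, A is a candidate key of R.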

An attribute that is not part of any candidate key is known as a non-prime attribute.
An attribute that is part of one of the candidate keys is known as a prime attribute.

Properties of FD / Armstrong's Axioms

Reflexivity: If Y is a subset of X, then X → Y.
Augmentation: If X → Y, then XZ → YZ.
Transitivity: If X → Y and Y → Z, then X → Z.
Union: If X → Y and X → Z, then X → YZ.
Decomposition: If X → YZ, then X → Y and X → Z.
Composition: If X → Y and A → B, then XA → YB.
Pseudotransitivity: If X → Y and WY → Z, then WX → Z.

A canonical cover of a set of functional dependencies F is a simplified set of functional dependencies that has the same closure as the original set F.

Trivial Functional Dependency

X → Y is trivial only when Y is a subset of X.

Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion and updation anomalies. Normal forms are used to eliminate or reduce redundancy in database tables.

1. First Normal Form –

A relation is in first normal form if every attribute in that relation is a single-valued attribute.

2. Second Normal Form –

To be in second normal form, a relation must be in first normal form and must not contain any partial dependency.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.

2NF tries to reduce the redundant data getting stored in memory.
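
An illustrative sketch with a hypothetical schema: in STUDENT_COURSE(STUD_NO, COURSE_NO, COURSE_NAME) with candidate key (STUD_NO, COURSE_NO), the FD COURSE_NO -> COURSE_NAME is a partial dependency, so COURSE_NAME is stored redundantly for every enrolled student. A 2NF decomposition removes it:

    -- Before: STUDENT_COURSE(STUD_NO, COURSE_NO, COURSE_NAME) violates 2NF,
    -- because COURSE_NAME depends on only part of the key (COURSE_NO).

    -- After: move the partially dependent attribute into its own relation.
    CREATE TABLE ENROLLMENT (
        STUD_NO   INT,
        COURSE_NO INT,
        PRIMARY KEY (STUD_NO, COURSE_NO)
    );

    CREATE TABLE COURSE (
        COURSE_NO   INT PRIMARY KEY,
        COURSE_NAME VARCHAR(50)   -- now stored once per course
    );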

3. Third Normal Form –

A relation is in third normal form if it is in second normal form and there is no transitive dependency for non-prime attributes.

Transitive dependency – If A->B and B->C are two FDs, then A->C is called a transitive dependency.

4. Boyce-Codd Normal Form (BCNF) –

A relation R is in BCNF if R is in third normal form and for every FD, the LHS is a super key. That is, a relation is in BCNF iff in every non-trivial functional dependency X → Y, X is a super key.
1. BCNF is free from redundancy.
2. If a relation is in BCNF, then 3NF is also satisfied.
3. If all attributes of a relation are prime attributes, then the relation is always in 3NF.

Normalization is done to reduce inconsistency and to remove anomalies (update, insert, delete).

Decomposition of relations:

The process of breaking up a relation into smaller subrelations is called decomposition. Decomposition is required in DBMS to convert a relation into a specific normal form, which further reduces redundancy, anomalies and inconsistency in the relation.

Lossless join decomposition is a decomposition of a relation R into relations R1, R2 such that if we perform a natural join of the two smaller relations, it will return the original relation. This is effective in removing redundancy from databases while preserving the original data.

In lossless decomposition we select the common element, and the criterion for selecting the common element is that it must be a candidate key or super key in either of the relations R1/R2, or in both.
Lossy decomposition: The decompositions R1, R2, …, Rn of a relation schema R are said to be lossy if their natural join results in the addition of extraneous (spurious) tuples compared with the original relation R.

Lossy here means inconsistency in the database.

For Fifth Normal Form, the decomposition should be lossless.

Denormalization is a database optimization technique in which we add redundant data to one or more tables. This can help us avoid costly joins in a relational database. Note that denormalization does not mean not doing normalization. It is an optimization technique that is applied after doing normalization.

Concurrency Control in DBMS

Concurrency control deals with the interleaved execution of more than one transaction, much like race conditions in an OS.

A transaction is a single logical unit of work which accesses and possibly modifies the contents of a database. Transactions access data using read and write operations.

Database operations to prevent inconsistency:

Commit: After all instructions of a transaction are successfully executed, the changes made by the transaction are made permanent in the database.
Rollback: If a transaction is not able to execute all operations successfully, all the changes made by the transaction are undone.
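
A sketch of both outcomes on a hypothetical ACCOUNT table (the transaction-start syntax varies slightly between systems):

    -- Successful transaction: both updates become permanent together
    START TRANSACTION;
    UPDATE ACCOUNT SET BALANCE = BALANCE - 500 WHERE ACC_NO = 1;
    UPDATE ACCOUNT SET BALANCE = BALANCE + 500 WHERE ACC_NO = 2;
    COMMIT;

    -- Failed transaction: undo every change made so far
    START TRANSACTION;
    UPDATE ACCOUNT SET BALANCE = BALANCE - 500 WHERE ACC_NO = 1;
    -- ... an error is detected here, so nothing is made permanent ...
    ROLLBACK;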

Properties of transactions:

Atomicity
By this, we mean that either the entire transaction takes place at once or doesn’t
happen at all. There is no midway i.e. transactions do not occur partially. Each
transaction is considered as one unit and either runs to completion or is not executed
at all. It involves the following two operations.
• Abort: If a transaction aborts, changes made to the database are not visible.
• Commit: If a transaction commits, changes made are visible.

Consistency
This means that integrity constraints must be maintained so that the database is
consistent before and after the transaction. It refers to the correctness of a database.
Isolation
This property ensures that multiple transactions can occur concurrently without leading to inconsistency of the database state. Transactions occur independently, without interference. Changes occurring in a particular transaction will not be visible to any other transaction until that change is written to memory or has been committed. This property ensures that concurrent execution of transactions results in a state that is equivalent to one achieved had they been executed serially in some order.

Durability:
This property ensures that once the transaction has completed execution, the updates
and modifications to the database are stored in and written to disk and they persist
even if a system failure occurs. These updates now become permanent and are stored
in non-volatile memory.

A schedule is a series of operations from one or more transactions. A schedule can be of two types:

• Serial Schedule: When one transaction completely executes before another transaction starts, the schedule is called a serial schedule. A serial schedule is always consistent.
• Concurrent Schedule: When operations of a transaction are interleaved with operations of other transactions in a schedule, the schedule is called a concurrent schedule. So we need concurrency control here. Concurrent schedules are of 2 types: serializable and non-serializable.

Locking in DBMS

Locking protocols are used in database management systems as a means of concurrency control. Multiple transactions may request a lock on a data item simultaneously. Hence, we require a mechanism to manage the locking requests made by transactions. Such a mechanism is called the Lock Manager.
The data structure required for the implementation of locking is called a lock table.
1. It is a hash table where the names of data items are used as the hashing index.
2. Each locked data item has a linked list associated with it.
3. Every node in the linked list represents the transaction which requested the lock, the mode of lock requested (shared/exclusive) and the current status of the request (granted/waiting).
4. Every new lock request for the data item is added at the end of the linked list as a new node.
5. Collisions in the hash table are handled by the technique of separate chaining.
The log is a sequence of log records, recording all the update activities in the database.

Because all database modifications must be preceded by the creation of a log record, the system has available both the old value prior to the modification of the data item and the new value that is to be written for the data item.
1. Undo: using a log record, the data item specified in the log record is set to its old value.
2. Redo: using a log record, the data item specified in the log record is set to its new value.
The database can be modified using two approaches:
1. Deferred Modification Technique: If the transaction does not modify the database until it has partially committed, it is said to use the deferred modification technique.
2. Immediate Modification Technique: If database modifications occur while the transaction is still active, it is said to use the immediate modification technique.

Use of Checkpoints –
When a system crash occurs, we must consult the log to determine which transactions need to be redone or undone. In principle, we would need to search the entire log to determine this information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already written their updates into the database. Although redoing them will cause no harm, it will cause recovery to take longer.

What is a Checkpoint?
A checkpoint is used to declare a point before which the DBMS was in a consistent state and all transactions were committed. During transaction execution, such checkpoints are traced. After execution, transaction log files are created.

Recovering from a checkpoint is faster than undoing all the transactions in the log.

Dirty Reads –
A dirty read occurs when a transaction is allowed to read a row that has been modified by another transaction which is not yet committed. It mainly occurs because of multiple transactions running at the same time without committing.

1. Conflict Serializable:
A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. Two operations are said to be conflicting if all of these conditions are satisfied:
• They belong to different transactions
• They operate on the same data item
• At least one of them is a write operation
2. View Serializable:
A schedule is called view serializable if it is view equal to a serial schedule (no overlapping transactions), i.e. some serial arrangement of its transactions is view equivalent to it.
Every conflict serializable schedule is view serializable, but a view serializable schedule need not be conflict serializable; this can only happen when the schedule contains blind writes. If there are no blind writes, a view serializable schedule is also conflict serializable.
View equivalence conditions:
S1 and S2 are schedules, and T1, Ti, Tj are transactions in those schedules.
1) Initial Read
If a transaction T1 reads data item A from the database in S1, then in S2 also T1 should read A from the database.
2) Updated Read
If Ti reads A which was updated by Tj in S1, then in S2 also Ti should read A as updated by Tj.
3) Final Write
If a transaction T1 performs the final update of A in S1, then in S2 also T1 should perform the final write operation.
Cascading schedule: When there is a failure in one transaction and this leads to the rolling back or aborting of other dependent transactions, such scheduling is referred to as cascading rollback or cascading abort.

Cascadeless Schedule:
Schedules in which transactions read values only after all transactions whose changes they are going to read have committed are called cascadeless schedules. This avoids a single transaction abort leading to a series of transaction rollbacks.
Transaction Isolation Levels in DBMS
1. Read Uncommitted – Read Uncommitted is the lowest isolation level. At this level, one transaction may read not-yet-committed changes made by another transaction, thereby allowing dirty reads. At this level, transactions are not isolated from each other.
2. Read Committed – This isolation level guarantees that any data read is committed at the moment it is read. Thus it does not allow dirty reads.
3. Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update or delete these rows, it avoids non-repeatable reads.
4. Serializable – This is the highest isolation level. A serializable execution is guaranteed to be equivalent to some serial execution of the transactions.
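
The level is usually chosen per transaction. A sketch in standard SQL syntax (support and exact behaviour vary by DBMS):

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; -- dirty reads possible
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- no dirty reads
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- no non-repeatable reads
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;     -- full isolation
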
Database recovery:
• From Backup
• From checkpoint
• From the logs
• Undo the transactions
• Caching/Buffering

Starvation or livelock is the situation when a transaction has to wait for an indefinite period of time to acquire a lock.
Reasons for starvation:
• The waiting scheme for locked items is unfair (e.g., a priority queue).

Solutions:
• Increasing the priority of waiting transactions
• Using a different algorithm, such as FCFS (non-priority based)

In a database, a deadlock is an unwanted situation in which two or more transactions are waiting indefinitely for one another to give up locks. Deadlock is said to be one of the most feared complications in DBMS, as it brings the whole system to a halt.
Solutions are similar to those in an OS: deadlock prevention, releasing resources from transactions and rolling back changes.

SQL vs NoSQL

• SQL: Relational database management system (RDBMS). NoSQL: Non-relational or distributed database system; data can be stored in the form of documents (JSON) or graph formats as well.
• SQL: These databases have a fixed, static or predefined schema. NoSQL: They have a dynamic schema.
• SQL: Not suited for hierarchical data storage. NoSQL: Best suited for hierarchical data storage.
• SQL: Best suited for complex queries. NoSQL: Not so good for complex queries.
• SQL: Vertically scalable. NoSQL: Horizontally scalable.
• SQL: Follows the ACID properties. NoSQL: Follows CAP (consistency, availability, partition tolerance).
• SQL: Queries are used for accessing the data. NoSQL: APIs fetch the data.
• Example of SQL: MySQL. Example of NoSQL: MongoDB (data stored in JSON format as name-value pairs).

RAID, or "Redundant Arrays of Independent Disks", is a technique which makes use of a combination of multiple disks instead of a single disk for increased performance, data redundancy, or both.
Data redundancy, although taking up extra space, adds to disk reliability. This means that in case of a disk failure, if the same data is also backed up onto another disk, we can retrieve the data and continue operation.

INDEXING
Indexing is a data structure technique which allows you to quickly retrieve records from a database file. An index is a small table having only two columns. The first column comprises a copy of the primary or candidate key of a table. Its second column contains a set of pointers holding the address of the disk block where that specific key value is stored.

It helps reduce the total number of I/O operations needed to retrieve data, since you do not need to scan every row of the table.

An index –
• Takes a search key as input
• Efficiently returns a collection of matching records.
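
A sketch: creating an index on a frequently searched column, so that lookups no longer need to scan the whole table (the index name is arbitrary):

    -- Index on the search key NAME of the STUDENT table
    CREATE INDEX IDX_STUDENT_NAME ON STUDENT (NAME);

    -- This lookup can now use the index instead of a full table scan
    SELECT * FROM STUDENT WHERE NAME = 'RAM';
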
Primary Index

A primary index is an ordered file of fixed-length records with two fields. The first field is the same as the primary key, and the second field points to the specific data block. In the primary index, there is always a one-to-one relationship between the entries in the index table.

Primary indexing in DBMS is further divided into two types.

• Dense Index: An index record is created for every search key value in the database. This helps you search faster but needs more space to store index records.
• Sparse Index: An index record appears for only some of the values in the file. In this indexing technique, the pointer points to some key in a block, and using that block address the other keys in the block can be obtained.

Clustering Index

Used when we have data ordered on a non-key field, i.e. with repetitive entries. The data file is ordered on a non-key field. Records with similar characteristics are grouped together and indexes are created for these groups. A sparse index table is used.

Non-clustered or Secondary Indexing

A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual pointers or references to the locations where the data is actually stored.

The actual data here (like the information on each page of a book) is not organized, but we have an ordered reference (like a contents page) to where the data actually lies. We can have only dense ordering, as the data is not ordered. It requires more time than other indexing methods.

Multilevel Indexing in a database is created when a primary index does not fit in memory. The index is divided into several other indexes, and we maintain a sparse table to refer to them.

A B-Tree is a self-balancing search tree. In most other self-balancing search trees (like AVL and Red-Black Trees), it is assumed that everything is in main memory. To understand the use of B-Trees, we must think of the huge amount of data that cannot fit in main memory. When the number of keys is high, the data is read from disk in the form of blocks.
The main idea of using B-Trees is to reduce the number of disk accesses. Most of the tree operations (search, insert, delete, max, min, etc.) require O(h) disk accesses, where h is the height of the tree.
A B-Tree is a fat tree: the height of B-Trees is kept low by putting the maximum possible number of keys in a B-Tree node.
Generally, the B-Tree node size is kept equal to the disk block size.

WHY B+ Trees

B-Trees store the data pointer (a pointer to the disk file block containing the key value) corresponding to a particular key value, along with that key value, in the nodes of the tree. This technique greatly reduces the number of entries that can be packed into a node of a B-Tree. B+ Trees eliminate this drawback by storing data pointers only at the leaf nodes of the tree.
