DBMS Notes
Data Independence:
1. The ability to modify a schema definition at one level without affecting a schema definition
at the next higher level is called data independence.
2. There are two kinds:
o Physical data independence
The ability to modify the physical schema without causing application programs to be
rewritten.
Modifications at this level are usually made to improve performance.
o Logical data independence
The ability to modify the conceptual schema without causing application programs to
be rewritten.
This is usually needed when the logical structure of the database is altered.
3. Logical data independence is harder to achieve because application programs are usually
heavily dependent on the logical structure of the data. An analogy can be made to abstract data
types in programming languages, which hide their internal representation behind a fixed interface.
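Physical data independence can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in DBMS; the notes do not name a system, so that choice is an assumption. The sketch adds an index to a hypothetical CUSTOMERS table. The index changes only how the data is stored and accessed. The application's query text and its result stay the same.

```python
import sqlite3

def young_customers(conn):
    # Application-level query: it says nothing about storage or indexes.
    return conn.execute(
        "SELECT NAME FROM CUSTOMERS WHERE AGE < 25 ORDER BY ID"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (?, ?, ?)",
    [(1, "Ramesh", 32), (2, "Khilan", 25), (3, "kaushik", 23), (6, "Komal", 22)],
)

before = young_customers(conn)

# Physical-level change (hypothetical tuning step): add an index on AGE.
conn.execute("CREATE INDEX idx_customers_age ON CUSTOMERS (AGE)")

after = young_customers(conn)  # same query text, same answer
print(before == after, after)  # True [('kaushik',), ('Komal',)]
```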
Types of Database Users:
Users are differentiated by the way they expect to interact with the system:
1. Application programmers - interact with system through DML calls.
2. Sophisticated users - form requests in a database query language.
3. Specialized users - write specialized database applications that do not fit into the
traditional data processing framework.
4. Naive users - invoke one of the permanent application programs that have been written
previously.
Database Administrator Roles and Responsibilities:
A Database Administrator, Database Analyst or Database Developer is the person responsible for
managing the information within an organization. As most companies continue to experience
inevitable growth of their databases, these positions are probably among the most secure within the IT
industry. In most cases, it is not an area that is targeted for layoffs or downsizing. On the
downside, however, database departments are often understaffed, requiring administrators
to perform a multitude of tasks.
Depending on the company and the department, this role can either be highly specialized or
incredibly diverse. The primary role of the Database Administrator is to administer, develop, maintain
and implement the policies and procedures necessary to ensure the security and integrity of the
corporate database. Sub roles within the Database Administrator classification may include security,
architecture, warehousing and/or business analysis. Other primary roles will include:
Implementation of data models
Database design
Database accessibility
Performance issues
Capacity issues
Data replication
Table Maintenance
Elements of Database System:
Database schema
Schema objects
Indexes
Tables
Fields and columns
Records and rows
Keys
Relationships
Data types
Database Development Process
The steps in the database development process are as follows:
enterprise modeling,
conceptual data modeling,
project initiation and planning & analysis phases of SDLC,
logical database design,
physical database design and creation,
database implementation, and
database maintenance.
In the SDLC model, stage one, project identification and selection, encompasses the enterprise
modeling function. Stage two, project initiation and planning, and stage three, analysis,
encompass the conceptual data modeling activities. The final four stages (logical design, physical
design, implementation and maintenance) are the same stages for software and for database
design and development, although the activities conducted within each stage are different.
Database maintenance activities, for example, consist of fine-tuning the database to optimize
performance, adding new data structures and so on. The maintenance of software consists of
fixing minor bugs and making minor modifications to software, such as changing the title of a
report.
Database Management System
Database Management System, or DBMS in short, refers to the technology of storing and
retrieving users' data with utmost efficiency along with safety and security features. A DBMS
allows its users to create their own databases relevant to the nature of the work they
want to do. These databases are highly configurable and offer a range of options.
A database is a collection of data that is related in some aspect. Data is a collection of facts and
figures that can be processed to produce information. The name of a student, her age, class and
subjects can be counted as data for recording purposes.
Data mostly represents recordable facts. Data aids in producing information, which is based on
facts. For example, if we have data about the marks obtained by all students, we can then determine
the toppers, the average marks, and so on.
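The marks example can be sketched with Python's `sqlite3` module, used here as a stand-in DBMS. The `marks` table and the student names are hypothetical. The stored rows are the data. The average and the topper are information that a query derives from them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, score INTEGER)")
# Data: one recorded fact per row (hypothetical values).
conn.executemany("INSERT INTO marks VALUES (?, ?)",
                 [("Mira", 91), ("Arjun", 78), ("Sara", 85)])

# Information: derived from the data by querying it.
average = conn.execute("SELECT AVG(score) FROM marks").fetchone()[0]
topper = conn.execute(
    "SELECT student FROM marks ORDER BY score DESC LIMIT 1").fetchone()[0]
print(topper, round(average, 2))  # Mira 84.67
```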
A database management system stores data in a way that makes it easier to retrieve and manipulate,
and helps to produce information.
Characteristics
Traditionally, data was organized in files. DBMS was a new concept then, and research aimed to
overcome the deficiencies of this traditional style of data management. Modern DBMS have the
following characteristics:
Real-world entity: Modern DBMS are more realistic and use real-world entities to design
their architecture, including the entities' behavior and attributes. For example, a school
database may use student as an entity and age as its attribute.
Relation-based tables: A DBMS allows entities and the relations among them to be stored as
tables. This simplifies the way data is stored. A user can understand the architecture of the
database just by looking at the table names.
Isolation of data and application: A database system is entirely different from its data.
The database is said to be an active entity, whereas the data is passive: it is what the
database works on and organizes. A DBMS also stores metadata, which is data about data, to
ease its own processing.
Less redundancy: A DBMS follows the rules of normalization, which split a relation when
any of its attributes has redundant values. Normalization, itself a mathematically rigorous
process, keeps the redundancy in the entire database as low as possible.
Consistency: A DBMS is designed to keep the database in a consistent state, which earlier
data-storing applications such as file-processing systems did not guarantee. Consistency is a
state in which every relation in the database remains consistent. There exist methods and
techniques that can detect an attempt to leave the database in an inconsistent state.
Query Language: A DBMS is equipped with a query language, which makes it more efficient
to retrieve and manipulate data. A user can apply as many different filtering options as he
or she wants, which was not possible with traditional file-processing systems.
ACID Properties: A DBMS follows the ACID properties, which stand for Atomicity,
Consistency, Isolation and Durability. These concepts are applied to transactions, which
manipulate data in the database. The ACID properties keep the database in a healthy state in
a multi-transaction environment and in case of failure.
Multiuser and Concurrent Access: A DBMS supports a multi-user environment and allows
users to access and manipulate data in parallel. Although there are restrictions on
transactions that attempt to handle the same data item, users are generally unaware of
them.
Multiple views: A DBMS offers multiple views for different users. A user in the sales
department will have a different view of the database than a person working in the production
department. This enables each user to have a focused view of the database according to their
requirements.
Security: Features like multiple views offer security to some extent, since users are
unable to access the data of other users and departments. A DBMS offers methods to impose
constraints while entering data into the database and retrieving data at a later stage. A DBMS
offers many different levels of security features, which enable multiple users to have
different views with different features. For example, not only can a user in the sales
department be prevented from seeing data of the purchase department, but how much of the sales
department's data he can see can also be managed. Because data is accessed through the DBMS
rather than as ordinary files in a traditional file system, it is much harder for a thief to
read it directly.
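Atomicity, the "A" in ACID, can be illustrated with a money transfer. This is a minimal sketch using Python's `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical. Both updates run inside one transaction. When the debit violates a CHECK constraint, the credit that was already applied is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was undone as well

ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # would make A negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)  # True False {'A': 70, 'B': 80}
```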
Users
A DBMS is used by various users for various purposes. Some are involved in retrieving data and
some in backing it up. Some of them are described as follows:
Administrators: A group of users maintains the DBMS and is responsible for
administering the database. They look after its usage and decide by whom it
should be used. They create user access and apply limitations to maintain isolation and
enforce security. Administrators also look after DBMS resources like system licenses,
the software applications and tools required, and other hardware-related maintenance.
Designer: This is the group of people who actually work on the design of the database.
The actual database starts with requirement analysis followed by a good design
process. These people keep a close watch on what data should be kept and in what format.
They identify and design the whole set of entities, relations, constraints and views.
End Users: This group contains the people who actually take advantage of the database
system. End users can be just viewers who pay attention to the logs or market rates, or they
can be as sophisticated as business analysts who make the most of it.
The design of a Database Management System depends heavily on its architecture. It can be
centralized, decentralized or hierarchical. DBMS architecture can be seen as single-tier or
multi-tier. An n-tier architecture divides the whole system into n related but independent modules,
which can be independently modified, altered, changed or replaced.
In 1-tier architecture, the DBMS is the only entity: the user works directly on the DBMS and uses it.
Any changes done here are made directly on the DBMS itself. It does not provide handy tools for
end users, and database designers and programmers usually prefer single-tier architecture.
If the DBMS architecture is 2-tier, there must be some application that uses the DBMS.
Programmers use 2-tier architecture, where they access the DBMS by means of an application. Here
the application tier is entirely independent of the database in terms of operation, design and
programming.
3-tier architecture
The most widely used architecture is 3-tier architecture, which separates its tiers from each
other on the basis of users. It is described as follows:
Entity-Relationship Model
Entity-Relationship model is based on the notion of real world entities and relationship among
them. While formulating real-world scenario into database model, ER Model creates entity set,
relationship set, general attributes and constraints.
ER Model is best used for the conceptual design of database.
ER Model is based on:
Entities and their attributes
Relationships among entities
These concepts are explained below.
[Image: ER Model]
Entity
An entity in the ER Model is a real-world entity, which has some properties called attributes.
Every attribute is defined by its set of values, called its domain.
For example, in a school database, a student is considered as an entity. Student has
various attributes like name, age and class etc.
Relationship
The logical association among entities is called a relationship. Relationships are mapped
with entities in various ways. Mapping cardinalities define the number of associations
between two entities.
Mapping cardinalities:
o one to one
o one to many
o many to one
o many to many
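One common way to store these cardinalities in relational tables is sketched below with Python's `sqlite3` module. The school tables are hypothetical. A one-to-many relationship puts a foreign key on the "many" side. A many-to-many relationship needs a separate linking table, whose key combines both foreign keys. A many-to-one relationship uses the same structure as one-to-many, read from the other side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one-to-many: one class has many students (foreign key on the "many" side)
    CREATE TABLE class   (class_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT,
                          class_id INTEGER REFERENCES class(class_id));
    -- many-to-many: a student takes many subjects, a subject has many students
    CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE takes   (student_id INTEGER REFERENCES student(student_id),
                          subject_id INTEGER REFERENCES subject(subject_id),
                          PRIMARY KEY (student_id, subject_id));
    INSERT INTO class   VALUES (1, '10-A');
    INSERT INTO student VALUES (1, 'Mira', 1), (2, 'Arjun', 1);
    INSERT INTO subject VALUES (1, 'Maths'), (2, 'Physics');
    INSERT INTO takes   VALUES (1, 1), (1, 2), (2, 1);
""")

students_in_class = conn.execute(
    "SELECT COUNT(*) FROM student WHERE class_id = 1").fetchone()[0]
maths_students = [name for (name,) in conn.execute(
    "SELECT s.name FROM student s JOIN takes t ON t.student_id = s.student_id "
    "WHERE t.subject_id = 1 ORDER BY s.name")]
mira_subjects = conn.execute(
    "SELECT COUNT(*) FROM takes WHERE student_id = 1").fetchone()[0]
print(students_in_class, maths_students, mira_subjects)  # 2 ['Arjun', 'Mira'] 2
```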
Relational Model
The most popular data model in DBMS is the Relational Model. It is a more scientific model than
the others. This model is based on first-order predicate logic and defines a table as an n-ary relation.
If the attributes are composite, they are further divided in a tree-like structure. Every node is then
connected to its attribute. That is, composite attributes are represented by ellipses that are
connected to an ellipse.
[Image: One-to-one]
One-to-many
When more than one instance of an entity is associated with the relationship, it is marked as
'N'. The image below shows that only one instance of the entity on the left and more than one
instance of the entity on the right can be associated with the relationship. It depicts a
one-to-many relationship.
[Image: One-to-many]
Many-to-one
When more than one instance of an entity is associated with the relationship, it is marked as
'N'. The image below shows that more than one instance of the entity on the left and only
one instance of the entity on the right can be associated with the relationship. It depicts a
many-to-one relationship.
[Image: Many-to-one]
Many-to-many
The image below shows that more than one instance of the entity on the left and more than
one instance of the entity on the right can be associated with the relationship. It depicts a
many-to-many relationship.
[Image: Many-to-many]
Participation Constraints
Total Participation: Each entity in the entity set is involved in the relationship. Total
participation is represented by double lines.
Partial participation: Not all entities are involved in the relationship. Partial
participation is represented by a single line.
[Image: Participation Constraints]
The ER Model has the power of expressing database entities in a conceptual hierarchical manner:
as we go up the hierarchy, it generalizes the view of entities, and as we go deeper in the
hierarchy, it gives us the details of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named Mira can be
generalized along with all the students; the entity shall be student, and further, a student is a person.
The reverse is called specialization, where a person is a student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entity contains
the properties common to all the entities being generalized, is called Generalization. In generalization, a number
of entities are brought together into one generalized entity based on their similar characteristics.
For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
[Image: Generalization]
Specialization
Specialization is the opposite of generalization, as mentioned above. In
specialization, a group of entities is divided into sub-groups based on their characteristics. Take the
group Person, for example. A person has a name, date of birth, gender, etc. These properties are
common to all persons (human beings). But in a company, a person can be identified as an employee,
employer, customer or vendor based on the role they play in the company.
[Image: Specialization]
Similarly, in a school database, a person can be specialized as teacher, student or staff, based on
the role they play in the school as entities.
Inheritance
We use all the above features of the ER Model to create classes of objects in object-oriented
programming. This makes it easier for the programmer to concentrate on what she is
programming. Details of entities are generally hidden from the user; this process is known as
abstraction.
One of the important features of Generalization and Specialization is inheritance, that is, the
attributes of higher-level entities are inherited by the lower-level entities.
[Image: Inheritance]
For example, attributes of a person like name, age, and gender can be inherited by lower level
entities like student and teacher etc.
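The notes compare these ER features with classes in object-oriented programming. The following is a minimal Python sketch of the Person/Student/Teacher example using `dataclasses`. The field values are hypothetical. `Student` and `Teacher` inherit the attributes of `Person` and add their own.

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Attributes of the higher-level (generalized) entity.
    name: str
    age: int
    gender: str

@dataclass
class Student(Person):
    # Inherits name, age and gender; adds its own attribute.
    roll_no: int

@dataclass
class Teacher(Person):
    subject: str

mira = Student(name="Mira", age=15, gender="F", roll_no=7)
print(isinstance(mira, Person), mira.name, mira.roll_no)  # True Mira 7
```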
The relational data model is the primary data model, which is used widely around the world for data
storage and processing. This model is simple and has all the properties and capabilities required
to process data with storage efficiency.
Concepts
Tables: In the relational data model, relations are saved in the format of tables. This format stores the
relation among entities. A table has rows and columns, where rows represent records and
columns represent the attributes.
Tuple: A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema: This describes the relation name (table name), attributes and their names.
Relation key: One or more attributes that can identify each row in the relation
(table) uniquely are called the relation key.
Attribute domain: Every attribute has some pre-defined value scope, known as attribute
domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions
are called Relational Integrity Constraints. There are three main integrity constraints.
Key Constraints
Domain constraints
Referential integrity constraints
Key Constraints:
There must be at least one minimal subset of attributes in the relation, which can identify a tuple
uniquely. This minimal subset of attributes is called a key for that relation. If there is more than
one such minimal subset, these are called candidate keys.
Key constraints force that:
in a relation with a key attribute, no two tuples can have identical values for the key attributes;
a key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
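Here is a minimal sketch of key constraints, using Python's `sqlite3` module as a stand-in DBMS. The `student` table and the `try_insert` helper are hypothetical. The DBMS rejects both a duplicate key value and a NULL key value. In SQLite, a PRIMARY KEY column does not always imply NOT NULL, so the sketch declares NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# roll_no is the key: PRIMARY KEY makes it unique, NOT NULL forbids NULL.
conn.execute("CREATE TABLE student (roll_no TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES ('S1', 'Mira')")

def try_insert(roll_no, name):
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (roll_no, name))
        return "inserted"
    except sqlite3.IntegrityError as exc:  # constraint violation
        return f"rejected: {exc}"

results = [
    try_insert("S1", "Arjun"),  # duplicate key value
    try_insert(None, "Sara"),   # NULL key value
    try_insert("S2", "Sara"),   # valid
]
for r in results:
    print(r)
```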
Domain constraints
Attributes have specific values in real-world scenarios. For example, age can only be a non-negative
integer. The same kind of constraint is applied to the attributes of a relation: every
attribute is bound to have a specific range of values. For example, age cannot be less than zero
and a telephone number cannot contain characters outside 0-9.
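The age rule can be written as a CHECK constraint. This sketch uses Python's `sqlite3` module. The `customers` table and the `try_insert` helper are hypothetical. The declared type INTEGER would still accept -3 on its own. The CHECK clause is what enforces the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK restricts AGE to its domain: values that are not negative.
conn.execute("""
    CREATE TABLE customers (
        id  INTEGER PRIMARY KEY,
        age INTEGER NOT NULL CHECK (age >= 0)
    )
""")

def try_insert(cid, age):
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?)", (cid, age))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

results = [try_insert(1, 25), try_insert(2, -3)]
print(results)  # [True, False]
```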
Referential integrity constraints
This integrity constraint works on the concept of a Foreign Key. A key attribute of a relation can
be referred to in another relation, where it is called a foreign key.
The referential integrity constraint states that if a relation refers to a key attribute of a different or
the same relation, that key element must exist.
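The following sketch shows referential integrity with Python's `sqlite3` module. The `class` and `student` tables are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. With that setting, the DBMS rejects a student row that points to a class that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE class (class_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE student (
        roll_no  INTEGER PRIMARY KEY,
        name     TEXT,
        class_id INTEGER REFERENCES class(class_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO class VALUES (1, '10-A')")
conn.execute("INSERT INTO student VALUES (1, 'Mira', 1)")  # class 1 exists

try:
    conn.execute("INSERT INTO student VALUES (2, 'Arjun', 99)")  # no class 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```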
Relational database systems are expected to be equipped with a query language that can assist their
users in querying the database instances. This empowers users to retrieve the results they
require. There are two kinds of query languages: relational algebra and relational
calculus.
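The sketch below implements two relational algebra operators, selection and projection, over a relation modeled as a Python set of tuples. The data reuses part of the CUSTOMERS table that appears later in these notes. The helper names are hypothetical. A set cannot contain duplicates, so projection collapses repeated values. This matches the rule that a relation instance has no duplicate tuples. The last query corresponds roughly to `SELECT DISTINCT NAME FROM CUSTOMERS WHERE AGE < 25` in SQL.

```python
# A relation instance as a set of tuples (a set cannot hold duplicate tuples).
# Attribute order of this hypothetical schema:
SCHEMA = ("ID", "NAME", "AGE", "ADDRESS")
CUSTOMERS = {
    (1, "Ramesh", 32, "Ahmedabad"),
    (2, "Khilan", 25, "Delhi"),
    (3, "kaushik", 23, "Kota"),
    (4, "Chaitali", 25, "Mumbai"),
}

def select(relation, schema, predicate):
    """Selection: keep only the tuples for which predicate(row) is true."""
    return {t for t in relation if predicate(dict(zip(schema, t)))}

def project(relation, schema, attributes):
    """Projection: keep only the named attributes; duplicates collapse in the set."""
    positions = [schema.index(a) for a in attributes]
    return {tuple(t[i] for i in positions) for t in relation}

ages = project(CUSTOMERS, SCHEMA, ["AGE"])  # the two 25s become one tuple
young = project(select(CUSTOMERS, SCHEMA, lambda r: r["AGE"] < 25), SCHEMA, ["NAME"])
print(sorted(ages))  # [(23,), (25,), (32,)]
print(young)         # {('kaushik',)}
```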
Mapping Entity
An entity is a real world object with some attributes.
Mapping Process (Algorithm):
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | Pune      |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to modify the ADDRESS and SALARY values of every row in the CUSTOMERS table, you
do not need a WHERE clause, and the UPDATE query would be as follows:
SQL> UPDATE CUSTOMERS
SET ADDRESS = 'Pune', SALARY = 1000.00;
Now, CUSTOMERS table would have the following records:
+----+----------+-----+---------+---------+
| ID | NAME     | AGE | ADDRESS | SALARY  |
+----+----------+-----+---------+---------+
|  1 | Ramesh   |  32 | Pune    | 1000.00 |
|  2 | Khilan   |  25 | Pune    | 1000.00 |
|  3 | kaushik  |  23 | Pune    | 1000.00 |
|  4 | Chaitali |  25 | Pune    | 1000.00 |
|  5 | Hardik   |  27 | Pune    | 1000.00 |
|  6 | Komal    |  22 | Pune    | 1000.00 |
|  7 | Muffy    |  24 | Pune    | 1000.00 |
+----+----------+-----+---------+---------+
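The UPDATE statement above can be tried with Python's `sqlite3` module, used here as a stand-in for the `SQL>` prompt. `rowcount` reports how many rows the UPDATE changed. Without a WHERE clause, every row changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "Pune", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# No WHERE clause, so the SET applies to every row in the table.
updated = conn.execute(
    "UPDATE CUSTOMERS SET ADDRESS = 'Pune', SALARY = 1000.00").rowcount
distinct = conn.execute("SELECT DISTINCT ADDRESS, SALARY FROM CUSTOMERS").fetchall()
print(updated, distinct)  # 7 [('Pune', 1000.0)]
```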
The SQL AND and OR operators are used to combine multiple conditions to narrow data in an
SQL statement. These two operators are called conjunctive operators.
These operators provide a means to make multiple comparisons with different operators in the
same SQL statement.
The AND Operator:
The AND operator allows the existence of multiple conditions in an SQL statement's WHERE
clause.
Syntax:
The basic syntax of AND operator with WHERE clause is as follows:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] AND [condition2]...AND [conditionN];
You can combine N number of conditions using AND operator. For an action to be taken by the
SQL statement, whether it be a transaction or query, all conditions separated by the AND must
be TRUE.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would fetch the ID, NAME and SALARY fields from the CUSTOMERS
table where salary is greater than 2000 AND age is less than 25 years:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 AND age < 25;
This would produce the following result:
+----+-------+----------+
| ID | NAME  | SALARY   |
+----+-------+----------+
|  6 | Komal |  4500.00 |
|  7 | Muffy | 10000.00 |
+----+-------+----------+
The OR Operator:
The OR operator is used to combine multiple conditions in an SQL statement's WHERE clause.
Syntax:
The basic syntax of OR operator with WHERE clause is as follows:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] OR [condition2]...OR [conditionN];
You can combine N number of conditions using the OR operator. For an action to be taken by the
SQL statement, whether it be a transaction or query, at least ONE of the conditions separated
by OR must be TRUE.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would fetch the ID, NAME and SALARY fields from the CUSTOMERS
table where salary is greater than 2000 OR age is less than 25 years:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 OR age < 25;
This would produce the following result:
+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  3 | kaushik  |  2000.00 |
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+
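Both queries can be checked with Python's `sqlite3` module, used here as a stand-in DBMS. The sketch adds `ORDER BY ID` because SQL does not otherwise guarantee row order. kaushik appears in the OR result even though his salary is not greater than 2000. His age satisfies the second condition.

```python
import sqlite3

ROWS = [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", ROWS)

# AND: every condition must be true.
and_ids = [row[0] for row in conn.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS "
    "WHERE SALARY > 2000 AND AGE < 25 ORDER BY ID")]
# OR: at least one condition must be true.
or_ids = [row[0] for row in conn.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS "
    "WHERE SALARY > 2000 OR AGE < 25 ORDER BY ID")]
print(and_ids)  # [6, 7]
print(or_ids)   # [3, 4, 5, 6, 7]
```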
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for
the database itself. Managing a database with anomalies is next to impossible.
Update anomalies: if data items are scattered and are not linked to each other properly,
then when we try to update one data item that has copies scattered at several places,
a few instances of it may get updated properly while a few are left with their old
values. This leaves the database in an inconsistent state.
Deletion anomalies: we try to delete a record, but parts of it are left undeleted because,
unknown to us, the data is also saved somewhere else.
Insert anomalies: we try to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
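An update anomaly can be reproduced in a few lines. This sketch uses Python's `sqlite3` module with hypothetical `emp` and `dept` tables. Each employee row repeats the department's location. Updating only one copy leaves two different locations for the same department. Storing the location once, in a separate table, avoids the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized design: the department's location is repeated in every employee row.
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, dept_location TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("Asha", "Sales", "Delhi"), ("Ravi", "Sales", "Delhi")])
# The department moves, but only one copy is updated: an update anomaly.
conn.execute("UPDATE emp SET dept_location = 'Mumbai' WHERE name = 'Asha'")
locations = conn.execute(
    "SELECT DISTINCT dept_location FROM emp WHERE dept = 'Sales' "
    "ORDER BY dept_location").fetchall()
print(locations)  # [('Delhi',), ('Mumbai',)]: one department, two locations

# Normalized design: the location is stored once, in its own table.
conn.executescript("""
    CREATE TABLE dept (dept TEXT PRIMARY KEY, location TEXT);
    CREATE TABLE emp2 (name TEXT, dept TEXT REFERENCES dept(dept));
    INSERT INTO dept VALUES ('Sales', 'Delhi');
    INSERT INTO emp2 VALUES ('Asha', 'Sales'), ('Ravi', 'Sales');
    UPDATE dept SET location = 'Mumbai' WHERE dept = 'Sales';
""")
normalized = conn.execute(
    "SELECT DISTINCT d.location FROM emp2 e JOIN dept d ON e.dept = d.dept").fetchall()
print(normalized)  # [('Mumbai',)]
```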
First Normal Form:
This is defined in the definition of relations (tables) itself. This rule defines that all the attributes
in a relation must have atomic domains. Values in an atomic domain are indivisible units.
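Here is a minimal Python sketch of bringing a relation into first normal form. The records and the comma-separated `subjects` field are hypothetical. The sketch splits the multi-valued field so that each row holds one atomic value per attribute.

```python
# Hypothetical unnormalized records: "subjects" packs several values into one field,
# so its domain is not atomic.
unnormalized = [
    {"student": "Mira", "subjects": "Maths, Physics"},
    {"student": "Arjun", "subjects": "Chemistry"},
]

def to_first_normal_form(rows):
    """Emit one (student, subject) tuple per subject, so every value is atomic."""
    return [
        (row["student"], subject.strip())
        for row in rows
        for subject in row["subjects"].split(",")
    ]

print(to_first_normal_form(unnormalized))
# [('Mira', 'Maths'), ('Mira', 'Physics'), ('Arjun', 'Chemistry')]
```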
Having seen how privilege can be abused intentionally, let us see how privilege can be abused
unintentionally. A company provides a "work from home" option to its employees, and an
employee takes a backup of sensitive data to work on from his home. This not only violates the
security policies of the organization, but may also result in a data security breach if the system at
home is compromised.
Concurrency Control
Concurrency control is a database management system (DBMS) concept that is used to address
conflicts arising from the simultaneous access or alteration of data that can occur in a multi-user
system. Concurrency control, when applied to a DBMS, is meant to coordinate simultaneous
transactions while preserving data integrity. In short, concurrency control governs multi-user
access to the database.
Concurrency Control Locking Strategies
Pessimistic Locking: This concurrency control strategy involves keeping an entity in a
database locked the entire time it exists in the database's memory. This limits or prevents
users from altering the data entity that is locked. There are two types of locks that fall
under the category of pessimistic locking: write lock and read lock. With write lock,
everyone but the holder of the lock is prevented from reading, updating, or deleting the
entity. With read lock, other users can read the entity, but no one except for the lock
holder can update or delete it.
Optimistic Locking: This strategy can be used when instances of simultaneous
transactions, or collisions, are expected to be infrequent. In contrast with pessimistic
locking, optimistic locking doesn't try to prevent the collisions from occurring. Instead, it
aims to detect these collisions and resolve them on the chance occasions when they
occur.
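Pessimistic (write) locking can be observed with SQLite, sketched here through Python's `sqlite3` module. SQLite locks the whole database file rather than a single entity, so this is a coarse-grained version of the idea. The `stock` table and file name are hypothetical. While one connection holds an EXCLUSIVE transaction, a second connection cannot even read. After COMMIT releases the lock, the second connection sees the updated value.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shop.db")

    # isolation_level=None: this connection issues BEGIN/COMMIT itself.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
    holder.execute("INSERT INTO stock VALUES ('pen', 10)")

    # timeout=0: do not wait for a lock; fail immediately instead.
    other = sqlite3.connect(path, timeout=0)

    holder.execute("BEGIN EXCLUSIVE")  # take the write lock up front
    holder.execute("UPDATE stock SET qty = qty - 1 WHERE item = 'pen'")
    try:
        other.execute("SELECT qty FROM stock").fetchone()
        blocked = False
    except sqlite3.OperationalError:  # "database is locked"
        blocked = True
    holder.execute("COMMIT")  # release the lock

    qty_after = other.execute("SELECT qty FROM stock").fetchone()[0]
    holder.close()
    other.close()

print(blocked, qty_after)  # True 9
```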
Pessimistic locking provides a guarantee that database changes are made safely. However, it
becomes less viable as the number of simultaneous users or the number of entities involved in a
transaction increases, because the potential for having to wait for a lock to be released also increases.
Optimistic locking can alleviate the problem of waiting for locks to release, but then users have
the potential to experience collisions when attempting to update the database.
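One common way to implement optimistic locking is a version number on each row, though other approaches exist. This sketch uses Python's `sqlite3` module with a hypothetical `stock` table and helpers. No lock is held between reading and writing. Instead, the UPDATE succeeds only if the row's version is unchanged. A zero row count signals a collision, which the caller must resolve, for example by re-reading and retrying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER, "
             "version INTEGER NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('pen', 10, 1)")

def read(item):
    return conn.execute(
        "SELECT qty, version FROM stock WHERE item = ?", (item,)).fetchone()

def write(item, new_qty, seen_version):
    """Update only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE stock SET qty = ?, version = version + 1 "
        "WHERE item = ? AND version = ?",
        (new_qty, item, seen_version),
    )
    return cur.rowcount == 1  # 0 rows changed means a collision was detected

# Two users read the same row; no locks are held while they work.
qty_a, ver_a = read("pen")
qty_b, ver_b = read("pen")

ok_a = write("pen", qty_a - 1, ver_a)  # first writer wins
ok_b = write("pen", qty_b - 2, ver_b)  # stale version: rejected
print(ok_a, ok_b, read("pen"))  # True False (9, 2)
```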