DBMS Notes

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 53

Database:

A database is a collection of information that is organized so that it can easily be accessed,


managed, and updated.
Database Management System (DBMS):
DataBase Management System (DBMS) is a software package that allows data to be effectively
stored, retrieved and manipulated and the data stored in a DBMS packege can be accessed by
multiple users and by multiple application programs like (SQL Server, Oracle, Ms-Access) .
Comparison of Traditional File-Based Approach and Database Approach
At the beginning, you should understand the rationale of replacing the traditional file-based
system with the database system.
File-based System
File-based systems were an early attempt to computerize the manual filing system. File-based
system is a collection of application programs that perform services for the end-users. Each
program defines and manages its data.
However, five types of problem are occurred in using the file-based approach:
1. Separation and isolation of data
When data is isolated in separate files, it is more difficult for us to access data that should be
available. The application programmer is required to synchronize the processing of two or more
files to ensure the correct data is extracted.
2. Duplication of data
When employing the decentralized file-based approach, the uncontrolled duplication of data is
occurred. Uncontrolled duplication of data is undesirable because:
i. Duplication is wasteful
ii. Duplication can lead to loss of data integrity
3. Data dependence
Using file-based system, the physical structure and storage of the data files and records are
defined in the application program code. This characteristic is known as program-data
dependence. Making changes to an existing structure are rather difficult and will lead to a
modification of program. Such maintenance activities are time-consuming and subject to error.
4. Incompatible file formats
The structures of the file are dependent on the application programming language. However file
structure provided in one programming language such as direct file, indexed-sequential file
which is available in COBOL programming, may be different from the structure generated by
other programming language such as C. The direct incompatibility makes them difficult to
process jointly.
Fixed queries / proliferation of application programs
File-based systems are very dependent upon the application programmer. Any required queries
or reports have to be written by the application programmer. Normally, a fixed format query or
report can only be entertained and no facility for ad-hoc queries if offered.
Database Approach:
In order to overcome the limitations of the file-based approach, the concept of database and the
Database Management System (DMS) was emerged in 60s.
Advantages
A number of advantages of applying database approach in application system are obtained
including:
1. Control of data redundancy
The database approach attempts to eliminate the redundancy by integrating the file. Although
the database approach does not eliminate redundancy entirely, it controls the amount of
redundancy inherent in the database.
2. Data consistency
By eliminating or controlling redundancy, the database approach reduces the risk of
inconsistencies occurring. It ensures all copies of the data are kept consistent.
3. More information from the same amount of data
With the integration of the operated data in the database approach, it may be possible to derive
additional information for the same data.
4. Sharing of data
Database belongs to the entire organization and can be shared by all authorized users.
5. Improved data integrity
Database integrity provides the validity and consistency of stored data. Integrity is usually
expressed in terms of constraints, which are consistency rules that the database is not permitted
to violate.
6. Improved security
Database approach provides a protection of the data from the unauthorized users. It may take the
term of user names and passwords to identify user type and their access right in the operation
including retrieval, insertion, updating and deletion.
7. Enforcement of standards
The integration of the database enforces the necessary standards including data formats, naming
conventions, documentation standards, update procedures and access rules.
8. Economy of scale
Cost savings can be obtained by combining all organization's operational data into one database
with applications to work on one source of data.
9. Balance of conflicting requirements
By having a structural design in the database, the conflicts between users or departments can be
resolved. Decisions will be based on the base use of resources for the organization as a whole
rather that for an individual entity.
10. Improved data accessibility and responsiveness
By having an integration in the database approach, data accessing can be crossed departmental
boundaries. This feature provides more functionality and better services to the users.
11. Increased productivity
The database approach provides all the low-level file-handling routines. The provision of these
functions allows the programmer to concentrate more on the specific functionality required by
the users. The fourth-generation environment provided by the database can simplify the database
application development.
12. Improved maintenance
Database approach provides a data independence. As a change of data structure in the database
will be affect the application program, it simplifies database application maintenance.
13. Increased concurrency
Database can manage concurrent data access effectively. It ensures no interference between
users that would not result any loss of information nor loss of integrity.
14. Improved backing and recovery services
Modern database management system provides facilities to minimize the amount of processing
that can be lost following a failure by using the transaction approach.
Disadvantages
In split of a large number of advantages can be found in the database approach, it is not without
any challenge. The following disadvantages can be found including:
1. Complexity
Database management system is an extremely complex piece of software. All parties must be
familiar with its functionality and take full advantage of it. Therefore, training for the
administrators, designers and users is required.
2. Size
The database management system consumes a substantial amount of main memory as well as a
large number amount of disk space in order to make it run efficiently.
3. Cost of DBMS
A multi-user database management system may be very expensive. Even after the installation,
there is a high recurrent annual maintenance cost on the software.
4. Cost of conversion
When moving from a file-base system to a database system, the company is required to have
additional expenses on hardware acquisition and training cost.
5. Performance
As the database approach is to cater for many applications rather than exclusively for a particular
one, some applications may not run as fast as before.
6. Higher impact of a failure
The database approach increases the vulnerability of the system due to the centralization. As all
users and applications reply on the database availability, the failure of any component can bring
operations to a halt and affect the services to the customer seriously.
Components of DBMS
A database management system (DBMS) consists of several components. Each component plays
very important role in the database management system environment. The major components of
database management system are:
 Software
 Hardware
 Data
 Procedures
 Database Access Language
Software
The main component of a DBMS is the software. It is the set of programs used to handle the
database and to control and manage the overall computerized database
1. DBMS software itself, is the most important software component in the overall system
2. Operating system including network software being used in network, to share the data of
database among multiple users.
3. Application programs developed in programming languages such as C++, Visual Basic
that are used to to access database in database management system. Each program
contains statements that request the DBMS to perform operation on database. The
operations may include retrieving, updating, deleting data etc . The application program
may be conventional or online workstations or terminals.
Hardware
Hardware consists of a set of physical electronic devices such as computers (together with
associated I/O devices like disk drives), storage devices, I/O channels, electromechanical devices
that make interface between computers and the real world systems etc, and so on. It is impossible
to implement the DBMS without the hardware devices, In a network, a powerful computer with
high data processing speed and a storage device with large storage capacity is required as
database server.
Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process
the data. In DBMS, databases are defined, constructed and then data is stored, updated and
retrieved to and from the databases. The database contains both the actual (or operational) data
and the metadata (data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and to use the
DBMS. The users that operate and manage the DBMS require documented procedures on hot use
or run the database management system. These may include.
1. Procedure to install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of database.
5. To change the structure of database.
6. To generate the reports of data retrieved from database.
Database Access Language
The database access language is used to access the data to and from the database. The users use
the database access language to enter new data, change the existing data in database and to
retrieve required data from databases. The user write a set of appropriate commands in a
database access language and submits these to the DBMS. The DBMS translates the user
commands and sends it to a specific part of the DBMS called the Database Jet Engine. The
database engine generates a set of results according to the commands submitted by user, converts
these into a user readable form called an Inquiry Report and then displays them on the screen.
The administrators may also use the database access language to create and maintain the
databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language.
Users
The users are the people who manage the databases and perform different operations on the
databases in the database system. There are three kinds of people who play different roles in
database system
1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
The people who write application programs in programming languages (such as Visual Basic,
Java, or C++) to interact with databases are called Application Programmer.
Database Administrators
A person who is responsible for managing the overall database management system is called
database administrator or simply DBA.
End-Users
The end-users are the people who interact with database management system to perform
different operations on database such as retrieving, updating, inserting, deleting data etc.
DBMS Architecture:
1. External view: This is a highest level of abstraction as seen by user. This level of abstraction
describes only the part of entire database. It is based on the conceptual model, is the end user
view of data environment. Each external view described by means of a schema called an external
schema or subschema.
2. Conceptual level: At this level of database abstraction all the database entities and the
relationships among them are included. One conceptual view represents the entire database. the
conceptual schema defines ths conceptual view.
3. Internal (physical) level : This lowest level of abstraction. it closest to physical storage
device. It describes how data are actually stored on the storage medium. The internal schema,
which contains the definition of the stored record, the method representing the data fields,
expresses the internal view and the access aids used.

Data Independence:
1. The ability to modify a scheme definition in one level without affecting a scheme definition
in a higher level is called data independence.
2. There are two kinds:
o Physical data independence
 The ability to modify the physical scheme without causing application programs to be
rewritten
 Modifications at this level are usually to improve performance
o Logical data independence
 The ability to modify the conceptual scheme without causing application programs to
be rewritten
 Usually done when logical structure of database is altered
3. Logical data independence is harder to achieve as the application programs are usually
heavily dependent on the logical structure of the data. An analogy is made to abstract data
types in programming languages.
Types of Database Users:
Users are differentiated by the way they expect to interact with the system:
1. Application programmers - interact with system through DML calls.
2. Sophisticated users - form requests in a database query language.
3. Specialized users - write specialized database applications that do not fit into the
traditional data processing framework.
4. Naive users - invoke one of the permanent application programs that have been written
previously.
Database Administrator Roles and Responsibilities:
A Database Administrator, Database Analyst or Database Developer is the person responsible for
managing the information within an organization. As most companies continue to experience
inevitable growth of their databases, these positions are probably the most solid within the IT
industry. In most cases, it is not an area that is targeted for layoffs or downsizing. On the
downside, however, most database departments are often understaffed, requiring administrators
to perform a multitude of tasks.
Depending on the company and the department, this role can either be highly specialized or
incredibly diverse. The primary role of the Database Administrator is to adminster, develop, maintain
and implement the policies and procedures necessary to ensure the security and integrity of the
corporate database. Sub roles within the Database Administrator classification may include security,
architecture, warehousing and/or business analysis. Other primary roles will include:
 Implementation of data models
 Database design
 Database accessibility
 Performance issues
 Capacity issues
 Data replication
 Table Maintainence
Elements of Database System:
 Database schema
 Schema objects
 Indexes
 Tables
 Fields and columns
 Records and rows
 Keys
 Relationships
 Data types
Database Development Process
The steps in the database development process are as follows:
 enterprise modeling,
 conceptual data modeling,
 project initiation and planning & analysis phases of SDLC,
 logical database design,
 physical database design and creation,
 database implementation, and
 Database Maintenance.
Using the SDLC model stage 1 project identification and selection encompasses the enterprise
modeling function. Stage two, project initiation and planning, and stage three, analysis,
encompasses the conceptual data modeling activities. The final four stages, logical and physical
design, implementation and maintenance are the same stages for software and for database
design and development although the activities conducted within each stage are different.
Database maintenance activities for example consist of fine-tuning the database to optimize
performance, adding new data structures and so on. The maintenance of software consists of
fixing minor bugs and making minor modifications to software such as changing the title of a
report.
Database Management System
Database Management System or DBMS in short, refers to the technology of storing and
retrieving users data with utmost efficiency along with safety and security features. DBMS
allows its users to create their own databases which are relevant with the nature of work they
want. These databases are highly configurable and offers bunch of options.
Database is collection of data which is related by some aspect. Data is collection of facts and
figures which can be processed to produce information. Name of a student, age, class and her
subjects can be counted as data for recording purposes.
Mostly data represents recordable facts. Data aids in producing information which is based on
facts. For example, if we have data about marks obtained by all students, we can then conclude
about toppers and average marks etc.
A database management system stores data, in such a way which is easier to retrieve, manipulate
and helps to produce information.
Characteristics
Traditionally data was organized in file formats. DBMS was all new concepts then and all the
research was done to make it to overcome all the deficiencies in traditional style of data
management. Modern DBMS has the following characteristics:
 Real-world entity: Modern DBMS are more realistic and uses real world entities to
design its architecture. It uses the behavior and attributes too. For example, a school
database may use student as entity and their age as their attribute.
 Relation-based tables: DBMS allows entities and relations among them to form as
tables. This eases the concept of data saving. A user can understand the architecture of
database just by looking at table names etc.
 Isolation of data and application: A database system is entirely different than its data.
Where database is said to active entity, data is said to be passive one on which the
database works and organizes. DBMS also stores metadata which is data about data, to
ease its own process.
 Less redundancy: DBMS follows rules of normalization, which splits a relation when
any of its attributes is having redundancy in values. Following normalization, which itself
is a mathematically rich and scientific process, make the entire database to contain as less
redundancy as possible.
 Consistency: DBMS always enjoy the state on consistency where the previous form of
data storing applications like file processing does not guarantee this. Consistency is a
state where every relation in database remains consistent. There exist methods and
techniques, which can detect attempt of leaving database in inconsistent state.
 Query Language: DBMS is equipped with query language, which makes it more
efficient to retrieve and manipulate data. A user can apply as many and different filtering
options, as he or she wants. Traditionally it was not possible where file-processing
system was used.
 ACID Properties: DBMS follows the concepts for ACID properties, which stands for
Atomicity, Consistency, Isolation and Durability. These concepts are applied on
transactions, which manipulate data in database. ACID properties maintains database in
healthy state in multi-transactional environment and in case of failure.
 Multiuser and Concurrent Access: DBMS support multi-user environment and allows
them to access and manipulate data in parallel. Though there are restrictions on
transactions when they attempt to handle same data item, but users are always unaware of
them.
 Multiple views: DBMS offers multiples views for different users. A user who is in sales
department will have a different view of database than a person working in production
department. This enables user to have a concentrate view of database according to their
requirements.
 Security: Features like multiple views offers security at some extent where users are
unable to access data of other users and departments. DBMS offers methods to impose
constraints while entering data into database and retrieving data at later stage. DBMS
offers many different levels of security features, which enables multiple users to have
different view with different features. For example, a user in sales department cannot see
data of purchase department is one thing, additionally how much data of sales department
he can see, can also be managed. Because DBMS is not saved on disk as traditional file
system it is very hard for a thief to break the code.
Users
DBMS is used by various users for various purposes. Some may involve in retrieving data and
some may involve in backing it up. Some of them are described as follows:

 Administrators: A bunch of users maintain the DBMS and are responsible for
administrating the database. They are responsible to look after its usage and by whom it
should be used. They create users access and apply limitation to maintain isolation and
force security. Administrators also look after DBMS resources like system license,
software application and tools required and other hardware related maintenance.
 Designer: This is the group of people who actually works on designing part of database.
The actual database is started with requirement analysis followed by a good designing
process. They people keep a close watch on what data should be kept and in what format.
They identify and design the whole set of entities, relations, constraints and views.
 End Users: This group contains the persons who actually take advantage of database
system. End users can be just viewers who pay attention to the logs or market rates or end
users can be as sophisticated as business analysts who takes the most of it.
The design of a Database Management System highly depends on its architecture. It can be
centralized or decentralized or hierarchical. DBMS architecture can be seen as single tier or
multi tier. n-tier architecture divides the whole system into related but independent n modules,
which can be independently modified, altered, changed or replaced.
In 1-tier architecture, DBMS is the only entity where user directly sits on DBMS and uses it.
Any changes done here will directly be done on DBMS itself. It does not provide handy tools for
end users and preferably database designer and programmers use single tier architecture.
If the architecture of DBMS is 2-tier then must have some application, which uses the DBMS.
Programmers use 2-tier architecture where they access DBMS by means of application. Here
application tier is entirely independent of database in term of operation, design and
programming.
3-tier architecture
Most widely used architecture is 3-tier architecture. 3-tier architecture separates it tier from each
other on basis of users. It is described as follows:

3-tier DBMS architecture


 Database (Data) Tier: At this tier, only database resides. Database along with its query
processing languages sits in layer-3 of 3-tier architecture. It also contains all relations and
their constraints.
 Application (Middle) Tier: At this tier the application server and program, which access
database, resides. For a user this application tier works as abstracted view of database.
Users are unaware of any existence of database beyond application. For database-tier,
application tier is the user of it. Database tier is not aware of any other user beyond
application tier. This tier works as mediator between the two.
 User (Presentation) Tier: An end user sits on this tier. From a users aspect this tier is
everything. He/she doesn't know about any existence or form of database beyond this
layer. At this layer multiple views of database can be provided by the application. All
views are generated by applications, which resides in application tier.
Multiple tier database architecture is highly modifiable as almost all its components are
independent and can be changed independently.
DBMS Data Models
Data model tells how the logical structure of a database is modeled. Data Models are
fundamental entities to introduce abstraction in DBMS. Data models define how data is
connected to each other and how it will be processed and stored inside the system.
The very first data model could be flat data-models where all the data used to be kept in same
plane. Because earlier data models were not so scientific they were prone to introduce lots of
duplication and update anomalies.

Entity-Relationship Model
Entity-Relationship model is based on the notion of real world entities and relationship among
them. While formulating real-world scenario into database model, ER Model creates entity set,
relationship set, general attributes and constraints.
ER Model is best used for the conceptual design of database.
ER Model is based on:
 Entities and their attributes
 Relationships among entities
These concepts are explained below.

[Image: ER Model]
 Entity
An entity in ER Model is real world entity, which has some properties called attributes.
Every attribute is defined by its set of values, called domain.
For example, in a school database, a student is considered as an entity. Student has
various attributes like name, age and class etc.
 Relationship
The logical association among entities is called relationship. Relationships are mapped
with entities in various ways. Mapping cardinalities define the number of association
between two entities.
Mapping cardinalities:
o one to one
o one to many
o many to one
o many to many
ER-Model is explained here.

Relational Model
The most popular data model in DBMS is Relational Model. It is more scientific model then
others. This model is based on first-order predicate logic and defines table as an n-ary relation.

[Image: Table in relational Model]


The main highlights of this model are:
 Data is stored in tables called relations.
 Relations can be normalized.
 In normalized relations, values saved are atomic values.
 Each row in relation contains unique value
 Each column in relation contains values from a same domain.
Database schema
Database schema skeleton structure of and it represents the logical view of entire database. It
tells about how the data is organized and how relation among them is associated. It formulates all
database constraints that would be put on data in relations, which resides in database.
A database schema defines its entities and the relationship among them. Database schema is a
descriptive detail of the database, which can be depicted by means of schema diagrams. All these
activities are done by database designer to help programmers in order to give some ease of
understanding all aspect of database.

[Image: Database Schemas]


Database schema can be divided broadly in two categories:
 Physical Database Schema: This schema pertains to the actual storage of data and its
form of storage like files, indices etc. It defines the how data will be stored in secondary
storage etc.
 Logical Database Schema: This defines all logical constraints that need to be applied on
data stored. It defines tables, views and integrity constraints etc.
Database Instance
It is important that we distinguish these two terms individually. Database schema is the skeleton
of database. It is designed when database doesn't exist at all and very hard to do any changes
once the database is operational. Database schema does not contain any data or information.
Database instances, is a state of operational database with data at any given time. This is a
snapshot of database. Database instances tend to change with time. DBMS ensures that its every
instance (state) must be a valid state by keeping up to all validation, constraints and condition
that database designers has imposed or it is expected from DBMS itself.
If the database system is not multi-layered then it will be very hard to make any changes in the
database system. Database systems are designed in multi-layers as we leant earlier.
Data Independence:
There's a lot of data in whole database management system other than user's data. DBMS
comprises of three kinds of schemas, which is in turn data about data (Meta-Data). Meta-data is
also stored along with database, which once stored is then hard to modify. But as DBMS
expands, it needs to be changed over the time satisfy the requirements of users. But if the whole
data were highly dependent it would become tedious and highly complex.
[Image: Data independence]
Data about data itself is divided in layered architecture so that when we change data at one layer
it does not affect the data layered at different level. This data is independent but mapped on each
other.
Logical Data Independence
Logical data is data about database, that is, it stores information about how data is managed
inside. For example, a table (relation) stored in the database and all constraints, which are
applied on that relation.
Logical data independence is a kind of mechanism, which liberalizes itself from actual data
stored on the disk. If we do some changes on table format it should not change the data residing
on disk.
Physical Data Independence
All schemas are logical and actual data is stored in bit format on the disk. Physical data
independence is the power to change the physical data without impacting the schema or logical
data.
For example, in case we want to change or upgrade the storage system itself, that is, using SSD
instead of Hard-disks should not have any impact on logical data or schemas.
Entity relationship model defines the conceptual view of database. It works around real world
entity and association among them. At view level, ER model is considered well for designing
databases.
Entity
A real-world thing either animate or inanimate that can be easily identifiable and distinguishable.
For example, in a school database, student, teachers, class and course offered can be considered
as entities. All entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. Entity set may contain entities with
attribute sharing similar values. For example, Students set may contain all the student of a
school; likewise Teachers set may contain all the teachers of school from all faculties. Entities
sets need not to be disjoint.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have values.
For example, a student entity may have name, class, age as attributes.
There exist a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of attributes:
 Simple attribute:
Simple attributes are atomic values, which cannot be divided further. For example,
student's phone-number is an atomic value of 10 digits.
 Composite attribute:
Composite attributes are made of more than one simple attribute. For example, a student's
complete name may have first_name and last_name.
 Derived attribute:
Derived attributes are attributes, which do not exist physical in the database, but there
values are derived from other attributes presented in the database. For example,
average_salary in a department should be saved in database instead it can be derived. For
another example, age can be derived from data_of_birth.
 Single-valued attribute:
Single valued attributes contain on single value. For example: Social_Security_Number.
 Multi-value attribute:
Multi-value attribute may contain more than one values. For example, a person can have
more than one phone numbers, email_addresses etc.
These attribute types can come together in a way like:
 simple single-valued attributes
 simple multi-valued attributes
 composite single-valued attributes
 composite multi-valued attributes
Entity-set and Keys
Key is an attribute or collection of attributes that uniquely identifies an entity among entity set.
For example, roll_number of a student makes her/him identifiable among students.
 Super Key: Set of attributes (one or more) that collectively identifies an entity in an
entity set.
 Candidate Key: Minimal super key is called candidate key that is, supers keys for which
no proper subset are a superkey. An entity set may have more than one candidate key.
 Primary Key: This is one of the candidate key chosen by the database designer to
uniquely identify the entity set.
Relationship
The association among entities is called relationship. For example, employee entity has relation
works_at with department. Another example is for student who enrolls in some course. Here,
Works_at and Enrolls are called relationship.
Relationship Set:
Relationship of similar type is called relationship set. Like entities, a relationship too can have
attributes. These attributes are called descriptive attributes.
Degree of relationship
The number of participating entities in an relationship defines the degree of the relationship.
 Binary = degree 2
 Ternary = degree 3
 n-ary = degree
Mapping Cardinalities:
Cardinality defines the number of entities in one entity set which can be associated to the
number of entities of other set via relationship set.
 One-to-one: one entity from entity set A can be associated with at most one entity of
entity set B and vice versa.
[Image: One-to-one relation]
 One-to-many: One entity from entity set A can be associated with more than one entities
of entity set B but from entity set B one entity can be associated with at most one entity.

[Image: One-to-many relation]


 Many-to-one: More than one entities from entity set A can be associated with at most
one entity of entity set B but one entity from entity set B can be associated with more
than one entity from entity set A.
[Image: Many-to-one relation]
 Many-to-many: one entity from A can be associated with more than one entity from B
and vice versa.

[Image: Many-to-many relation]


Now we shall learn how ER Model is represented by means of ER diagram. Every object like
entity, attributes of an entity, relationship set, and attributes of relationship set can be represented
by tools of ER diagram.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they
represent.

[Image: Entities in a school database]


Attributes
Attributes are properties of entities. Attributes are represented by means of eclipses. Every
eclipse represents one attribute and is directly connected to its entity (rectangle).

[Image: Simple Attributes]

If the attributes are composite, they are further divided in a tree like structure. Every node is then
connected to its attribute. That is composite attributes are represented by eclipses that are
connected with an eclipse.

[Image: Composite Attributes]

Multivalued attributes are depicted by double eclipse.


[Image: Multivalued Attributes]

Derived attributes are depicted by dashed eclipse.

[Image: Derived Attributes]


Relationship
Relationships are represented by diamond shaped box. Name of the relationship is written in the
diamond-box. All entities (rectangles), participating in relationship, are connected to it by a line.
Binary relationship and cardinality
A relationship where two entities are participating, is called a binary relationship. Cardinality is
the number of instance of an entity from a relation that can be associated with the relation.
 One-to-one
When only one instance of entity is associated with the relationship, it is marked as '1'.
This image below reflects that only 1 instance of each entity should be associated with
the relationship. It depicts one-to-one relationship

[Image: One-to-one]
 One-to-many
When more than one instance of entity is associated with the relationship, it is marked as
'N'. This image below reflects that only 1 instance of entity on the left and more than one
instance of entity on the right can be associated with the relationship. It depicts one-to-
many relationship

[Image: One-to-many]
 Many-to-one
When more than one instance of entity is associated with the relationship, it is marked as
'N'. This image below reflects that more than one instance of entity on the left and only
one instance of entity on the right can be associated with the relationship. It depicts
many-to-one relationship

[Image: Many-to-one]
 Many-to-many
This image below reflects that more than one instance of entity on the left and more than
one instance of entity on the right can be associated with the relationship. It depicts
many-to-many relationship

[Image: Many-to-many]

Participation Constraints
 Total Participation: Each entity in the entity is involved in the relationship. Total
participation is represented by double lines.
 Partial participation: Not all entities are involved in the relation ship. Partial
participation is represented by single line.
[Image: Participation Constraints]
ER Model has the power of expressing database entities in conceptual hierarchical manner such
that, as the hierarchical goes up it generalize the view of entities and as we go deep in the
hierarchy it gives us detail of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named, Mira can be
generalized along with all the students, the entity shall be student, and further a student is person.
The reverse is called specialization where a person is student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entities contain
the properties of all the generalized entities is called Generalization. In generalization, a number
of entities are brought together into one generalized entity based on their similar characteristics.
For an example, pigeon, house sparrow, crow and dove all can be generalized as Birds.

[Image: Generalization]
Specialization
Specialization is a process, which is opposite to generalization, as mentioned above. In
specialization, a group of entities is divided into sub-groups based on their characteristics. Take a
group Person for example. A person has name, date of birth, gender etc. These properties are
common in all persons, human beings. But in a company, a person can be identified as employee,
employer, customer or vendor based on what role do they play in company.
[Image: Specialization]
Similarly, in a school database, a person can be specialized as teacher, student or staff; based on
what role do they play in school as entities.
Inheritance
We use all above features of ER-Model, in order to create classes of objects in object oriented
programming. This makes it easier for the programmer to concentrate on what she is
programming. Details of entities are generally hidden from the user, this process known as
abstraction.
One of the important features of Generalization and Specialization, is inheritance, that is, the
attributes of higher-level entities are inherited by the lower level entities.
[Image: Inheritance]
For example, attributes of a person like name, age, and gender can be inherited by lower level
entities like student and teacher etc.
Relational data model is the primary data model, which is used widely around the world for data
storage and processing. This model is simple and have all the properties and capabilities required
to process data with storage efficiency.
Concepts
Tables: In relation data model, relations are saved in the format of Tables. This format stores the
relation among entities. A table has rows and columns, where rows represent records and
columns represents the attributes.
Tuple: A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema: This describes the relation name (table name), attributes and their names.
Relation key: Each row has one or more attributes which can identify the row in the relation
(table) uniquely, is called the relation key.
Attribute domain: Every attribute has some pre-defined value scope, known as attribute
domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions
are called Relational Integrity Constraints. There are three main integrity constraints.
 Key Constraints
 Domain constraints
 Referential integrity constraints
Key Constraints:
There must be at least one minimal subset of attributes in the relation, which can identify a tuple
uniquely. This minimal subset of attributes is called key for that relation. If there are more than
one such minimal subsets, these are called candidate keys.
Key constraints forces that:
 in a relation with a key attribute, no two tuples can have identical value for key attributes.
 key attribute can not have NULL values.
Key constrains are also referred to as Entity Constraints.
Domain constraints
Attributes have specific values in real-world scenario. For example, age can only be positive
integer. The same constraints has been tried to employ on the attributes of a relation. Every
attribute is bound to have a specific range of values. For example, age can not be less than zero
and telephone number can not be a outside 0-9.
Referential integrity constraints
This integrity constraints works on the concept of Foreign Key. A key attribute of a relation can
be referred in other relation, where it is called foreign key.
Referential integrity constraint states that if a relation refers to an key attribute of a different or
same relation, that key element must exists.
Relational database systems are expected to be equipped by a query language that can assist its
user to query the database instances. This way its user empowers itself and can populate the
results as required. There are two kinds of query languages, relational algebra and relational
calculus.
Mapping Entity
An entity is a real world object with some attributes.
Mapping Process (Algorithm):

[Image: Mapping Entity]


 Create table for each entity
 Entity's attributes should become fields of tables with their respective data types.
 Declare primary key
Mapping relationship
A relationship is association among entities.
Mapping process (Algorithm):

[Image: Mapping relationship]


 Create table for a relationship
 Add the primary keys of all participating Entities as fields of table with their respective
data types.
 If relationship has any attribute, add each attribute as field of table.
 Declare a primary key composing all the primary keys of participating entities.
 Declare all foreign key constraints.
Mapping Weak Entity Sets
A weak entity sets is one which does not have any primary key associated with it.
Mapping process (Algorithm):

[Image: Mapping Weak Entity Sets]


 Create table for weak entity set
 Add all its attributes to table as field
 Add the primary key of identifying entity set
 Declare all foreign key constraints
Mapping hierarchical entities
ER specialization or generalization comes in the form of hierarchical entity sets.
Mapping process (Algorithm):

[Image: Mapping hierarchical entities]


 Create tables for all higher level entities
 Create tables for lower level entities
 Add primary keys of higher level entities in the table of lower level entities
 In lower level tables, add all other attributes of lower entities.
 Declare primary key of higher level table the primary key for lower level table
SQL is a programming language for Relational Databases. It is designed over relational algebra
and tuple relational calculus. SQL comes as a package with all major distributions of RDBMS.
SQL comprises both data definition and data manipulation languages. Using the data definition
properties of SQL, one can design and modify database schema whereas data manipulation
properties allows SQL to store and retrieve data from database.
Data definition Language
SQL uses the following set of commands to define database schema:
CREATE
Creates new databases, tables and views from RDBMS
For example:
Create database tutorialspoint;
Create table article;
Create view for_students;
DROP
Drop commands deletes views, tables and databases from RDBMS
Drop object_type object_name;
Drop database tutorialspoint;
Drop table article;
Drop view for_students;
ALTER
Modifies database schema.
Alter object_type object_name parameters;
for example:
Alter table article add subject varchar;
This command adds an attribute in relation article with name subject of string type.
Data Manipulation Language
SQL is equipped with data manipulation language. DML modifies the database instance by
inserting, updating and deleting its data. DML is responsible for all data modification in
databases. SQL contains the following set of command in DML section:
 SELECT/FROM/WHERE
 INSERT INTO/VALUES
 UPDATE/SET/WHERE
 DELETE FROM/WHERE
These basic constructs allows database programmers and users to enter data and information into
the database and retrieve efficiently using a number of filter options.
SELECT/FROM/WHERE
 SELECT
This is one of the fundamental query command of SQL. It is similar to projection
operation of relational algebra. It selects the attributes based on the condition described
by WHERE clause.
 FROM
This clause takes a relation name as an argument from which attributes are to be
selected/projected. In case more than one relation names are given this clause
corresponds to cartesian product.
 WHERE
This clause defines predicate or conditions which must match in order to qualify the
attributes to be projected.
For example:
Select author_name
From book_author
Where age > 50;
This command will project names of author’s from book_author relation whose age is
greater than 50.
INSERT INTO/VALUES
This command is used for inserting values into rows of table (relation).
Syntax is
INSERT INTO table (column1 [, column2, column3 ... ]) VALUES (value1 [, value2, value3 ...
])
Or
INSERT INTO table VALUES (value1, [value2, ... ])
For Example:
INSERT INTO tutorialspoint (Author, Subject) VALUES ("anonymous", "computers");
UPDATE/SET/WHERE
This command is used for updating or modifying values of columns of table (relation).
Syntax is
UPDATE table_name SET column_name = value [, column_name = value ...] [WHERE
condition]
For example:
UPDATE tutorialspoint SET Author="webmaster" WHERE Author="anonymous";
DELETE/FROM/WHERE
This command is used for removing one or more rows from table (relation).
Syntax is
DELETE FROM table_name [WHERE condition];
For example:
DELETE FROM tutorialspoints
WHERE Author="unknown";
Declare foreign key constraints.
Creating a basic table involves naming the table and defining its columns and each column's data
type.
The SQL CREATE TABLE statement is used to create a new table.
Syntax:
Basic syntax of CREATE TABLE statement is as follows:
CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
PRIMARY KEY( one or more columns )
);
CREATE TABLE is the keyword telling the database system what you want to do. In this case,
you want to create a new table. The unique name or identifier for the table follows the CREATE
TABLE statement.
Then in brackets comes the list defining each column in the table and what sort of data type it is.
The syntax becomes clearer with an example below.
A copy of an existing table can be created using a combination of the CREATE TABLE
statement and the SELECT statement.
Example:
Following is an example, which creates a CUSTOMERS table with ID as primary key and NOT
NULL are the constraints showing that these fields can not be NULL while creating records in
this table:
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
You can verify if your table has been created successfully by looking at the message displayed
by the SQL server, otherwise you can use DESC command as follows:
SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | | |
| NAME | varchar(20) | NO | | | |
| AGE | int(11) | NO | | | |
| ADDRESS | char(25) | YES | | NULL | |
| SALARY | decimal(18,2) | YES | | NULL | |
+---------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Now, you have CUSTOMERS table available in your database which you can use to store
required information related to customers.
Syntax:
There are two basic syntaxes of INSERT INTO statement as follows:
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)]
VALUES (value1, value2, value3,...valueN);
Here, column1, column2,...columnN are the names of the columns in the table into which you
want to insert data.
You may not need to specify the column(s) name in the SQL query if you are adding values for
all the columns of the table. But make sure the order of the values is in the same order as the
columns in the table. The SQL INSERT INTO syntax would be as follows:
INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
Example:
Following statements would create six records in CUSTOMERS table:
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (3, 'kaushik', 23, 'Kota', 2000.00 );
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (6, 'Komal', 22, 'MP', 4500.00 );
You can create a record in CUSTOMERS table using second syntax as follows:
INSERT INTO CUSTOMERS
VALUES (7, 'Muffy', 24, 'Indore', 10000.00 );
All the above statements would produce the following records in CUSTOMERS table:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Syntax:
There are two basic syntaxes of INSERT INTO statement as follows:
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)]
VALUES (value1, value2, value3,...valueN);
Here, column1, column2,...columnN are the names of the columns in the table into which you
want to insert data.
You may not need to specify the column(s) name in the SQL query if you are adding values for
all the columns of the table. But make sure the order of the values is in the same order as the
columns in the table. The SQL INSERT INTO syntax would be as follows:
INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
Example:
Following statements would create six records in CUSTOMERS table:
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (3, 'kaushik', 23, 'Kota', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (6, 'Komal', 22, 'MP', 4500.00 );
You can create a record in CUSTOMERS table using second syntax as follows:
INSERT INTO CUSTOMERS
VALUES (7, 'Muffy', 24, 'Indore', 10000.00 );
All the above statements would produce the following records in CUSTOMERS table:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The SQL WHERE clause is used to specify a condition while fetching the data from single table
or joining with multiple tables.
If the given condition is satisfied then only it returns specific value from the table. You would
use WHERE clause to filter the records and fetching only necessary records.
The WHERE clause is not only used in SELECT statement, but it is also used in UPDATE,
DELETE statement, etc., which we would examine in subsequent chapters.
Syntax:
The basic syntax of SELECT statement with WHERE clause is as follows:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition]
You can specify a condition using comparison or logical operators like >, <, =, LIKE, NOT, etc.
Below examples would make this concept clear.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would fetch ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;
This would produce the following result:
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 4 | Chaitali | 6500.00 |
| 5 | Hardik | 8500.00 |
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+----------+----------+
Following is an example, which would fetch ID, Name and Salary fields from the CUSTOMERS
table for a customer with name Hardik. Here, it is important to note that all the strings should be
given inside single quotes ('') where as numeric values should be given without any quote as in
above example:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE NAME = 'Hardik';
This would produce the following result:
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 5 | Hardik | 8500.00 |
+----+----------+----------+
The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN
is a means for combining fields from two tables by using values common to each.
Consider the following two tables, (a) CUSTOMERS table is as follows:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
(b) Another table is ORDERS as follows:
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
Now, let us join these two tables in our SELECT statement as follows:
SQL> SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce the following result:


+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
Here, it is noticeable that the join is performed in the WHERE clause. Several operators can be
used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; they can all be
used to join tables. However, the most common operator is the equal symbol.
SQL Join Types:
There are different types of joins available in SQL:
 INNER JOIN: returns rows when there is a match in both tables.
 LEFT JOIN: returns all rows from the left table, even if there are no matches in the right
table.
 RIGHT JOIN: returns all rows from the right table, even if there are no matches in the
left table.
 FULL JOIN: returns rows when there is a match in one of the tables.
 SELF JOIN: is used to join a table to itself as if the table were two tables, temporarily
renaming at least one table in the SQL statement.
 CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or
more joined tables
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based
on one or more columns. Some database sorts query results in ascending order by default.
Syntax:
The basic syntax of ORDER BY clause is as follows:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
You can use more than one column in the ORDER BY clause. Make sure whatever column
you are using to sort, that column should be in column-list.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would sort the result in ascending order by NAME and
SALARY:
SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME, SALARY;

This would produce the following result:


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would sort the result in descending order by NAME:
SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME DESC;
This would produce the following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
+----+----------+-----+-----------+----------+
The SQL UPDATE Query is used to modify the existing records in a table.
You can use WHERE clause with UPDATE query to update selected rows otherwise all the rows
would be affected.
Syntax:
The basic syntax of UPDATE query with WHERE clause is as follows:
UPDATE table_name
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
You can combine N number of conditions using AND or OR operators.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would update ADDRESS for a customer whose ID is 6:
SQL> UPDATE CUSTOMERS
SET ADDRESS = 'Pune'
WHERE ID = 6;
Now, CUSTOMERS table would have the following records:

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | Pune | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to modify all ADDRESS and SALARY column values in CUSTOMERS table, you
do not need to use WHERE clause and UPDATE query would be as follows:
SQL> UPDATE CUSTOMERS
SET ADDRESS = 'Pune', SALARY = 1000.00;
Now, CUSTOMERS table would have the following records:
+----+----------+-----+---------+---------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+---------+
| 1 | Ramesh | 32 | Pune | 1000.00 |
| 2 | Khilan | 25 | Pune | 1000.00 |
| 3 | kaushik | 23 | Pune | 1000.00 |
| 4 | Chaitali | 25 | Pune | 1000.00 |
| 5 | Hardik | 27 | Pune | 1000.00 |
| 6 | Komal | 22 | Pune | 1000.00 |
| 7 | Muffy | 24 | Pune | 1000.00 |
+----+----------+-----+---------+---------+
The SQL AND and OR operators are used to combine multiple conditions to narrow data in an
SQL statement. These two operators are called conjunctive operators.
These operators provide a means to make multiple comparisons with different operators in the
same SQL statement.
The AND Operator:
The AND operator allows the existence of multiple conditions in an SQL statement's WHERE
clause.
Syntax:
The basic syntax of AND operator with WHERE clause is as follows:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] AND [condition2]...AND [conditionN];
You can combine N number of conditions using AND operator. For an action to be taken by the
SQL statement, whether it be a transaction or query, all conditions separated by the AND must
be TRUE.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would fetch ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000 AND age is less tan 25 years:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 AND age < 25;
This would produce the following result:
+----+-------+----------+
| ID | NAME | SALARY |
+----+-------+----------+
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+-------+----------+
The OR Operator:
The OR operator is used to combine multiple conditions in an SQL statement's WHERE clause.
Syntax:
The basic syntax of OR operator with WHERE clause is as follows:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] OR [condition2]...OR [conditionN]
You can combine N number of conditions using OR operator. For an action to be taken by the
SQL statement, whether it be a transaction or query, only any ONE of the conditions separated
by the OR must be TRUE.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would fetch ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000 OR age is less tan 25 years:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 OR age < 25;
This would produce the following result:
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 3 | kaushik | 2000.00 |
| 4 | Chaitali | 6500.00 |
| 5 | Hardik | 8500.00 |
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+----------+----------+
Normalization
If a database design is not perfect it may contain anomalies, which are like a bad dream for
database itself. Managing a database with anomalies is next to impossible.
 Update anomalies: if data items are scattered and are not linked to each other properly,
then there may be instances when we try to update one data item that has copies of it
scattered at several places, few instances of it get updated properly while few are left with
there old values. This leaves database in an inconsistent state.
 Deletion anomalies: we tried to delete a record, but parts of it left undeleted because of
unawareness, the data is also saved somewhere else.
 Insert anomalies: we tried to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring database to consistent state
and free from any kinds of anomalies.
First Normal Form:
This is defined in the definition of relations (tables) itself. This rule defines that all the attributes
in a relation must have atomic domains. Values in atomic domain are indivisible units.

[Image: Unorganized relation]


We re-arrange the relation (table) as below, to convert it to First Normal Form

[Image: Relation in 1NF]


Each attribute must contain only single value from its pre-defined domain.
Second Normal Form:
Before we learn about second normal form, we need to understand the following:
 Prime attribute: an attribute, which is part of prime-key, is prime attribute.
 Non-prime attribute: an attribute, which is not a part of prime-key, is said to be a non-
prime attribute.
Second normal form says, that every non-prime attribute should be fully functionally dependent
on prime key attribute. That is, if X → A holds, then there should not be any proper subset Y of
X, for that Y → A also holds.

[Image: Relation not in 2NF]


We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon
both and not on any of the prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called
partial dependency, which is not allowed in Second Normal Form.

[Image: Relation in 2NF]


We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
Third Normal Form:
For a relation to be in Third Normal Form, it must be in Second Normal form and the following
must satisfy:
 No non-prime attribute is transitively dependent on prime key attribute
 For any non-trivial functional dependency, X → A, then either
 X is a superkey or,
 A is prime attribute.

[Image: Relation not in 3NF]


We find that in above depicted Student_detail relation, Stu_ID is key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a
superkey nor City is a prime attribute. Additionally, Stu_ID → Zip → City, so there exists
transitive dependency.

[Image: Relation in 3NF]


We broke the relation as above depicted two relations to bring it into 3NF.
Database backup & recovery from catastrophic failure
So far we have not discovered any other planet in our solar system, which may have life on it,
and our own earth is not that safe. In case of catastrophic failure like alien attack, the database
administrator may still be forced to recover the database.
Remote backup, described next, is one of the solutions to save life. Alternatively, whole database
backups can be taken on magnetic tapes and stored at a safer place. This backup can later be
restored on a freshly installed database and bring it to the state at least at the point of backup.
Grown up databases are too large to be frequently backed-up. Instead, we are aware of
techniques where we can restore a database by just looking at logs. So backup of logs at frequent
rate is more feasible than the entire database. Database can be backed-up once a week and logs,
being very small can be backed-up every day or as frequent as every hour.
Remote Backup
Remote backup provides a sense of security and safety in case the primary location where the
database is located gets destroyed. Remote backup can be offline or real-time and online. In case
it is offline it is maintained manually.

DBMS Data Recovery


Crash Recovery
Though we are living in highly technologically advanced era where hundreds of satellite monitor
the earth and at every second billions of people are connected through information technology,
failure is expected but not every time acceptable.
DBMS is highly complex system with hundreds of transactions being executed every second.
Availability of DBMS depends on its complex architecture and underlying hardware or system
software. If it fails or crashes amid transactions being executed, it is expected that the system
would follow some sort of algorithm or techniques to recover from crashes or failures.
Failure Classification
To see where the problem has occurred we generalize the failure into various categories, as
follows:
Transaction failure
When a transaction is failed to execute or it reaches a point after which it cannot be completed
successfully it has to abort. This is called transaction failure. Where only few transaction or
process are hurt.
Reason for transaction failure could be:
 Logical errors: where a transaction cannot complete because of it has some code error or
any internal error condition
 System errors: where the database system itself terminates an active transaction because
DBMS is not able to execute it or it has to stop because of some system condition. For
example, in case of deadlock or resource unavailability systems aborts an active
transaction.
System crash
There are problems, which are external to the system, which may cause the system to stop
abruptly and cause the system to crash. For example interruption in power supply, failure of
underlying hardware or software failure.
Examples may include operating system errors.
Disk failure:
In early days of technology evolution, it was a common problem where hard disk drives or
storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any
other failure, which destroys all or part of disk storage
Recovery and Atomicity
When a system crashes, it many have several transactions being executed and various files
opened for them to modifying data items. As we know that transactions are made of various
operations, which are atomic in nature. But according to ACID properties of DBMS, atomicity of
transactions as a whole must be maintained that is, either all operations are executed or none.
When DBMS recovers from a crash it should maintain the following:
 It should check the states of all transactions, which were being executed.
 A transaction may be in the middle of some operation; DBMS must ensure the atomicity
of transaction in this case.
 It should check whether the transaction can be completed now or needs to be rolled back.
 No transactions would be allowed to left DBMS in inconsistent state.
There are two types of techniques, which can help DBMS in recovering as well as maintaining
the atomicity of transaction:
 Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
 Maintaining shadow paging, where are the changes are done on a volatile memory and
later the actual database is updated.
Log-Based Recovery
Log is a sequence of records, which maintains the records of actions performed by a transaction.
It is important that the logs are written prior to actual modification and stored on a stable storage
media, which is failsafe.
Database can be modified using two approaches:
1. Deferred database modification: All logs are written on to the stable storage and
database is updated when transaction commits.
2. Immediate database modification: Each log follows an actual database modification.
That is, database is modified immediately after every operation.
Recovery with concurrent transactions
When more than one transactions are being executed in parallel, the logs are interleaved. At the
time of recovery it would become hard for recovery system to backtrack all logs, and then start
recovering. To ease this situation most modern DBMS use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the memory
space available in the system. At time passes log file may be too big to be handled at all.
Checkpoint is a mechanism where all the previous logs are removed from the system and stored
permanently in storage disk. Checkpoint declares a point before which the DBMS was in
consistent state and all the transactions were committed.
Recovery
When system with concurrent transaction crashes and recovers, it does behave in the following
manner:

[Image: Recovery with concurrent transactions]


 The recovery system reads the logs backwards from the end to the last Checkpoint.
 It maintains two lists, undo-list and redo-list.
 If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn,
Commit>, it puts the transaction in redo-list.
 If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it
puts the transaction in undo-list.
All transactions in undo-list are then undone and their logs are removed. All transaction in redo-
list, their previous logs are removed and then redone again and log saved.
Database Security
Database security refers to the collective measures used to protect and secure a database or
database management software from illegitimate use and malicious threats and attacks.
It is a broad term that includes a multitude of processes, tools and methodologies that ensure
security within a database environment.
Database security, under the umbrella of information security, protects the confidentiality,
integrity and availability of an organization’s databases.
Confidentiality is the most important aspect of database security, and is most commonly
enforced through encryption. Encryption should be done both for data-in-transit and data-at-rest.
Integrity is yet another crucial aspect of database security, because it ensures that only the
correct people will be able to see privileged company information. The integrity of a database is
enforced through a User Access Control system that defines permissions for who can access
which data.
The integrity aspect extends beyond simply permissions, however. Security implementations
like authentication protocols, strong password policies, and ensuring unused accounts (like of
employees that have left the company) are locked or deleted, further strengthen the integrity of a
database.
Availability relates to the need for databases to be up and available for use. Databases need to
be dependable in order to be functional, which requires they be up and running whenever the
organization is. This means downtimes should be planned on weekends and servers kept up-to-
date.
Database security best practices that can help keep your databases safe from attackers:
 Ensure physical database security
 Use web application and database firewalls
 Harden your database to the fullest extent possible
 Encrypt your data
 Minimize value of databases
 Manage database access tightly
 Audit and monitor database activity
Database security threats
1. Privilege abuse: When database users are provided with privileges that exceeds their day-to-
day job requirement, these privileges may be abused intentionally or unintentionally.
Take, for instance, a database administrator in a financial institution. What will happen if he
turns off audit trails or create bogus accounts? He will be able to transfer money from one
account to another thereby abusing the excessive privilege intentionally.

Having seen how privilege can be abused intentionally, let us see how privilege can be abused
unintentionally. A company is providing a “work from home" option to its employees and the
employee takes a backup of sensitive data to work on from his home. This not only violates the
security policies of the organization, but also may result in data security breach if the system at
home is compromised.

2. Operating System vulnerabilities: Vulnerabilities in underlying operating systems like


Windows, UNIX, Linux, etc., and the services that are related to the databases could lead to
unauthorized access. This may lead to a Denial of Service (DoS) attack. This could be prevented
by updating the operating system related security patches as and when they become available.
3. Database rootkits: A database rootkit is a program or a procedure that is hidden inside the
database and that provides administrator-level privileges to gain access to the data in the
database. These rootkits may even turn off alerts triggered by Intrusion Prevention Systems
(IPS). It is possible to install a rootkit only after compromising the underlying operating system.
4. Weak authentication: Weak authentication models allow attackers to employ strategies such
as social engineering and brute force to obtain database login credentials and assume the identity
of legitimate database users.
5. Weak audit trails: A weak audit logging mechanism in a database server represents a critical
risk to an organization especially in retail, financial, healthcare, and other industries with
stringent regulatory compliance. Regulations such as PCI, SOX, and HIPAA demand extensive
logging of actions to reproduce an event at a later point of time in case of an incident. Logging of
sensitive or unusual transactions happening in a database must be done in an automated manner
for resolving incidents. Audit trails act as the last line of database defense. Audit trails can detect
the existence of a violation that could help trace back the violation to a particular point of time
and a particular user.

Concurrency Control
Concurrency control is a database management systems (DBMS) concept that is used to address
conflicts with the simultaneous accessing or altering of data that can occur with a multi-user
system. Concurrency control, when applied to a DBMS, is meant to coordinate simultaneous
transactions while preserving data integrity. The Concurrency is about to control the multi-user
access of Database
Concurrency Control Locking Strategies
 Pessimistic Locking: This concurrency control strategy involves keeping an entity in a
database locked the entire time it exists in the database's memory. This limits or prevents
users from altering the data entity that is locked. There are two types of locks that fall
under the category of pessimistic locking: write lock and read lock. With write lock,
everyone but the holder of the lock is prevented from reading, updating, or deleting the
entity. With read lock, other users can read the entity, but no one except for the lock
holder can update or delete it.
 Optimistic Locking: This strategy can be used when instances of simultaneous
transactions, or collisions, are expected to be infrequent. In contrast with pessimistic
locking, optimistic locking doesn't try to prevent the collisions from occurring. Instead, it
aims to detect these collisions and resolve them on the chance occasions when they
occur.
Pessimistic locking provides a guarantee that database changes are made safely. However, it
becomes less viable as the number of simultaneous users or the number of entities involved in a
transaction increase because the potential for having to wait for a lock to release will increase.
Optimistic locking can alleviate the problem of waiting for locks to release, but then users have
the potential to experience collisions when attempting to update the database.

You might also like