DBMS Module1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 144

DATABASE

MANAGEMENT
SYSTEM-
module 1
PRESENTED BY
-MRS. POONAM MISHRA
Introduction to
Databases and
transactions
What is database system?
A database-management system (DBMS) is a
collection of interrelated data and a set of programs to
access those data.
The collection of data, usually referred to as the
database, contains information relevant to an
enterprise. The primary goal of a DBMS is to provide a
way to store and retrieve database information that is
both convenient and efficient.
Manage large bodies of information.
The database system must ensure the safety of the
Information stored, despite system crashes or attempts
at unauthorized access
Purpose of database system
• Data redundancy and inconsistency.-
Files and application programs created by different
programmers have Different structures and written using
different programming languages.
The same information may be duplicated in several places (files).
This redundancy leads to higher storage and access cost.
In addition, it may lead to data inconsistency; that is, the various
copies of the same data may no longer agree.
? For example, a changed student address may be reflected in the
one department records but not elsewhere in the system.
Difficulty in accessing data:
? Suppose that one of the university clerks needs to find out
the names of all students who live within a particular
postal-code area.
? The clerk asks the data-processing department to generate
such a list.
? Because the designers of the original system did not
anticipate this request, there is no application program on
hand to meet it.
? There is, however, an application program to generate the
list of all students.
? The university clerk has now two choices: either obtain the
list of all students and extract the needed information
manually or ask a programmer to write the necessary
application program.
? Both alternatives are obviously unsatisfactory.
? Suppose that such a program is written, and that, several
days later, the same clerk needs to trim that list to include
only those students who have taken at least 60 credit hours.
? As expected, a program to generate such a list does not exist.
Again, the clerk has the preceding two options, neither of
which is satisfactory.
? The point here is that conventional file-processing
environments do not allow needed data to be retrieved in a
convenient and efficient manner.
? More responsive data-retrieval systems are required for
general use.
• Data isolation-
? Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the
appropriate data is difficult.
• Integrity problems-
? The data values stored in the database must satisfy certain types of
consistency constraints.
? Suppose the university maintains an account for each department,
and records the balance amount in each account.
? Suppose also that the university requires that the account balance of
a department may never fall below zero.
? Developers enforce these constraints in the system by adding
appropriate code in the various application programs.
? However, when new constraints are added, it is difficult to change
the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
• Atomicity problems-
? A computer system, like any other device, is subject to failure.
? In many applications, it is crucial that, if a failure occurs, the data
be restored to the consistent state that existed prior to the failure.
? Example: Consider a program to transfer Rs 500 from the account
balance of department A to the account balance of department B.
? If a system failure occurs during the execution of the program, it is
possible that the 500 Rs was removed from the balance of
department A but was not credited to the balance of department B,
resulting in an inconsistent database state.
? Clearly, it is essential to database consistency that either both the
credit and debit occur, or that neither occur. That is, the funds
transfer must be atomic—it must happen in its entirety or not at all.
It is difficult to ensure atomicity in a conventional file-processing
system.
• Concurrent-access anomalies-
? For the sake of overall performance of the system and
faster response, many systems allow multiple users to
update the data simultaneously.
? Indeed, today, the largest Internet retailers may have
millions of accesses per day to their data by shoppers.
? In such an environment, interaction of concurrent updates
is possible and may result in inconsistent data.
• Security problems-
? Not every user of the database system should be able to
access all the data.
? For example, in a university, payroll personnel need to see
only that part of the database that has financial
information.
? They do not need access to information about academic
records.
? But, since application programs are added to the
file-processing system in an ad hoc manner, enforcing such
security constraints is difficult.
? These difficulties, among others, prompted the
development of database systems.
View of Data
? A database system is a collection of interrelated data and a
set of programs that allow users to access and modify these
data.
? A major purpose of a database system is to provide users
with an abstract view of the data. That is, the system hides
certain details of how the data are stored and maintained.
1.Data Abstraction-
? As many database-system users are not computer trained,
developers hide the complexity from users through several
levels of abstraction, to simplify users interactions with the
system:
? Physical level-The lowest level of data abstraction which
describes how the data is actually stored.
? Logical level- The next-higher level of abstraction that
describes what data are stored in the database, and what
relationships exist among those data.
? The logical level thus describes the entire database in terms
of a small number of relatively simple structures.
? The logical level of abstraction is used by the Database
administrators, who must decide what information is to be
kept in the database.
? View level- This is the highest level of data abstraction which
describes only the part of the entire database.
? Many users of the database system do not need all the
information; instead, they need to access only a part of the
database.
? Even though the logical level uses simpler structures,
complexity remains because of the variety of information
stored in a large database.
? The view level of abstraction exists to simplify their
interaction with the system. The system may provide many
views for the same database. The views also provides a
security mechanisms to prevent some users from accessing
parts of the database.
? For example- clerks in the university registrar office can see
only that part of the database that has information about
students; they cannot access information about salaries of
instructors.
2.Instances and Schemas
? Databases change over time as information is inserted and
deleted.
? Instance - The collection of information stored in the
database at a particular moment is called an instance of
the database.
? Schema - The overall design of the database is called the
database schema.

? Schemas are changed infrequently, if at all. A database schema


corresponds to the variable declarations (along with associated type
definitions) in a program. Each variable has a particular value at a given
instant. The values of the variables in a program at a point in time
correspond to an instance of a database schema. Database systems have
several schemas, partitioned according to the levels of abstraction.
? Physical schema - The physical schema describes the
database design at the physical level.
? Logical schema- The logical schema describes the database
design at the logical level.
? View Schema - A database may also have several schemas
at the view level, sometimes called subschemas, that
describe different views of the database.
? Data Independence – Data Independence is mainly
defined as a property of DBMS that helps you to change
the database schema at one level of a system without
requiring to change the schema at the next level. It helps
to keep the data separated from all program that makes
use of it.
? We have namely two levels of data independence arising
from these levels of abstraction:
? 1.Physical level data independence
? 2.Logical level data independence
? Physical Level Data Independence -It refers to the
characteristic of being able to modify the physical schema
without any alterations to the conceptual or logical
schema, done for optimization purposes, e.g., the
Conceptual structure of the database would not be
affected by any change in storage size of the database
system server.
? Logical Level Data Independence - It refers characteristic
of being able to modify the logical schema without
affecting the external schema or application program. The
user view of the data would not be affected by any changes
to the conceptual view of the data. These changes may
include insertion or deletion of attributes, altering table
structures entities or relationships to the logical schema,
etc.
3.Data Models –
? Data model is a collection of conceptual tools for
describing data, data relationships, data semantics, and
consistency constraints.
? A data model provides a way to describe the design of a
database at the physical, logical, and view levels.
? The data models can be classified into four different
categories:
• Entity-Relationship Model -
? The entity-relationship (E-R) data model uses a collection
of basic objects, called entities, and relationships among
these objects. An entity is a “thing” or “object” in the real
world that is distinguishable from other objects.
• Relational Model-
? The relational model uses a collection of tables to
represent both data and the relationships among those
data.
? Each table has multiple columns, and each column has a
unique name.
? Tables are also known as relations. The relational model is
an example of a record-based model.
? Record-based models are so named because the database
is structured in fixed-format records of several types.
? Each table contains records of a particular type. Each
record type defines a fixed number of fields, or attributes.
The columns of the table correspond to the attributes of
the record type.
• Object-Based Data Model -
? Object-oriented programming (especially in Java, C++, or
C#) has become the dominant software-development
methodology.
? This led to the development of an object-oriented data
model that can be seen as extending the E-R model with
notions of encapsulation, methods (functions), and object
identity.
? The object-relational data model combines features of the
object-oriented data model and relational data model.
• Semi structured Data Model-
? The semi structured data model permits the
specification of data where individual data
items of the same type may have different sets
of attributes.
? This is in contrast to the data models mentioned
earlier, where every data item of a particular
type must have the same set of attributes.
? The Extensible Markup Language (XML) is
widely used to represent semi structured data.
Relational Databases
? A relational database is based on the relational model. It
uses a collection of tables to represent both data and the
relationships among those data. It also includes a DML
and DDL.
1.Tables -Each table has multiple columns and each column has
a unique name.
? Figure below represents a sample relational database
comprising two tables: one shows details of university
instructors and the other shows details of the various
university departments.
? The relational model is an example of a record-based model.
? Record-based models are so named because the database is
structured in fixed-format records of several types.
? Each table contains records of a particular type(for eg
Student table will contain data fields related to student only ).
Each record type ( eg student table is a record of particular
type ) defines a fixed number of fields or attributes. The
columns of the table correspond to the attributes of the
record type ( that means related to student type).
? The relational model hides low-level implementation details
from database developers and users.
Data Manipulation Language
? The SQL query language is nonprocedural.
? A query takes as input several tables (possibly only one)
and always returns a single table.
? Here is an example of an SQL query that finds the names
of all instructors in the History department:
select instructor.name
from instructor
where instructor.dept_name = ‘History’
? The query specifies that those rows from the table
instructor where the dept name is History must be
retrieved, and the name attribute of these rows must be
displayed.
Data Defination Language-
? SQL provides a rich DDL that allows one to define tables,
integrity constraints, assertions, etc.
? For instance, the following SQL DDL statement defines the
department table:
create table department
(depLname char(20),
building char (15),
budget numeric (12,2));
? Execution of the above DDL statement creates the
department table with three columns :dept_name,
building, and budget, each of which has a specific data
type associated with it.
? In addition, the DDL statement updates the data
dictionary, which contains metadata. The schema of a
table is an example of metadata.
Database Access from Application Programs-
? To access the database?
? DML statements need to be executed from the host
language. There are two ways to do this:
? By providing an application program interface (set of
procedures) that can be used to send DML and DDL
statements to the database and retrieve the results.
? The Open Database Connectivity (ODBC) standard for
use with the C language is a commonly used application
program interface standard.
? The Java Database Connectivity (JDBC) standard
provides corresponding features to the Java language.
Database Architecture
Database System Structure: The architecture of a database
system is greatly influenced by the underlying computer
system on which the database system runs.
? Database architecture can be seen as a single tier or
multi-tier (centralized, or client-server). But logically,
database architecture is of two types like: 2-tier
architecture and 3-tier architecture.
? Most users of a database system today are not present at
the site of the database system, but connect to it through a
network.
? In 1 tier architecture, the database is directly available to
the user. It means the user can directly sit on the DBMS
and use it.
? Any changes done here will directly be done on the
database itself.
? It is used for development of the local application, where
programmers can directly communicate with the database
for the quick response.
? In 2-Tier architecture or the basic client-server
architecture, applications on the client end can directly
communicate with the database at the server side. For this
interaction, API's like: ODBC, JDBC are used.
? The basic client/server architecture is used to deal with a
large number of PCs, web servers, database servers and
other components that are connected with networks.
? The client/server architecture consists of many PCs and a
workstation which are connected via the network.
? The user interfaces and application programs are run on
the client-side.
? The server side is responsible to provide the functionalities
like: query processing and transaction management.
? To communicate with the DBMS, client-side application
establishes a connection with the server side.
? In contrast, in a three-tier architecture, the client machine
acts as merely a front end and does not contain any direct
database calls. Instead, the client end communicates with
an application server, usually through a forms interface.
? The application server in turn communicates with a
database system to access data. The business logic of the
application, which says what actions to carry out under
what conditions, is embedded in the application server,
instead of being distributed across multiple clients.
? Three-tier applications are more appropriate for large
applications, and for applications that run on the World
Wide Web.
Transaction Management
1. Atomicity:
? Several operations on the database form a single logical
unit of work.
? Consider an example of funds transfer, in which one
department account(say A) is debited and another
department account (say B) is credited.
? It is essential that either both the credit and debit occur,
or that neither occur.
? This all-or-none requirement is called atomicity. In the
absence of failures, all transactions complete successfully,
and atomicity is achieved easily.
2. Consistency:
? In addition, it is essential that the execution of the funds
transfer preserve the consistency of the database.
? That is, the value of the sum of the balances of A and B
must be preserved. This correctness requirement is called
consistency.
3. Durability:
? Finally, after the successful execution of a funds transfer,
the new values of the balances of accounts A and B must
persist, despite the possibility of system failure. This
persistence requirement is called durability.
? A transaction is a collection of operations that performs a
single logical function in a database application. Each
transaction is a unit of both atomicity and consistency.
Thus, we require that transactions do not violate any data
base consistency constraints. That is, if the database was
consistent when a transaction started, the database must
be consistent when the transaction successfully terminates.
? However, during the execution of a transaction, it may be
necessary temporarily to allow inconsistency, since either
the debit of A or the credit of B must be done before the
other.
? This temporary inconsistency, although necessary, may
lead to difficulty if a failure occurs.
? It is the programmer’s responsibility to define properly
the various transactions, so that each preserves the
consistency of the database. For example, the transaction
to transfer funds from the account of department A to the
account of department B could be defined to be composed
of two separate programs: one that debits account A, and
another that credits account B. The execution of these two
programs one after the other will indeed preserve
consistency. However, each program by itself does not
transform the database from a consistent state to a new
consistent state. Thus, those programs are not
transactions.
4. Recovery Manager:
? Ensuring the atomicity and durability properties is the
responsibility of the database system itself specifically, of
the recovery manager.
5. Failure recovery:
? Because of various types of failure, a transaction may not
always complete its execution successfully. If we are to
ensure the atomicity property, a failed transaction must
have no effect on the state of the database.
? The database must be restored to the state in which it was
before the transaction in question started executing. The
database system must therefore perform failure recovery,
that is, detect system failures and restore the database to
the state that existed prior to the occurrence of the failure.
6. Concurrency-control manager:
? When several transactions update the database
concurrently, the consistency of data may no longer be
preserved, even though each individual transaction is
correct. It is the responsibility of the concurrency-control
manager to control the interaction among the concurrent
transactions, to ensure the consistency of the database.
The transaction manager consists of the
concurrency-control manager and the recovery manager.
DATA MODELS
The importance of Data Models
? A data model is a structure of the data that contains all the
required details of the data like the name of the data, size of the
data, relationship with other data and constraints that are
applied on the data.
? Data model gives an idea of how the final system or software will
look after when the development is completed.
? This concept is exactly like real world modelling in which before
constructing any project (Buildings, Bridges, Towers) engineers
create a model for it and gives the idea of how a project will look
like after Construction.
Importance of Data Models:
? A data model is a set of concepts that can be used to
describe the structure of data in a database.
? Data models are used to support the development of
information systems by providing the definition and
format of data to be involved in future system.
? Data model is acting like a guideline for development also
gives an idea about possible alternatives to achieve
targeted solution.
? A data model can sometimes be referred to as data
structure especially in the context of programming
languages.
Advantages of Data Models:
? Data model prevents the system from future risk and
failure by defining structure of data in advance.
? As we got an idea of final system at the beginning of
development itself so we can reduce the cost of project by
proper planning and cost estimation as actual system is
not yet developed.
? Data repetition and data type compatibility can be
checked and removed with help of data model.
? We can improve Graphical User Interface (GUI) of system
by making its model and get it approved by its future user
so it will be simple for them to operate system and make
entire system effective.
Basic Building blocks
The basic building block for any data model is Entities,
Attributes , relationships and constraints.
1.Entity:
? Entities are real time objects that exist. It can be a person,
place, object, event, concept. Entities are represented by a
rectangle box containing the entity name in it.
? An entity is having its own independent existence in real
world.
? E.g.: A Student, Faculty, Subject having independent
existence.
2. Attributes:
? Attribute is the set of characteristics representing an entity.
It is represented by an ellipse symbol with attribute name
on it.
? A particular entity will have some value for each of its
attributes . for e.g –An Employee has attributes name, age,
phone etc.
3.Relationships:
? It describes the association between two or more entities.
? It is represented using diamond symbol containing
relationship name with it.
? The data model generally uses three kinds of relationships :
one to many, many to many, one to one.
? Example: The relationship between two entities Student and
Class has many to many relationship.The degree of the
relationship is the number of participating entity types in a
particular relation.
TYPES OF RELATIONSHIPS
One is to one:
? One entity is associated with at most one other entity.
? E.g., One department can have only one manager
One is to Many:
? One entity is associated with any number of entities in
other entity.
? E.g., One teacher may teach to many students
Many is to Many:
? Many entitities are associated with any number of entities in

other entity.

? E.g., Books in library issued by students


4.Constraints:
? Constraints are conditions applied on the data.
? It provides the data integrity.
? Example: A student can take a maximum of 2 books from
the library is applied as a constraint on the student
database.
Business Rules
? Definition: Business rules are statements of a discrete
operational business policy or practice within specific
organisations that constraints the business.
? It is intended to control or influence the behaviour of the
business.
? Database designer needs to take help from concepts such
as entity, attributes and relationships to build a data
model, but the above things are not sufficient to describe a
system completely.
? Business rules may define actors and prescribe how they
should behave by setting constraints and help to manage
business change in the system.
Characteristics of Business Rules:
1. Atomicity: Rule should define any one aspect
of the system environment.
? E.g.: - College should have students in it.
2. Business format: Rule should be expressed in
business terms understandable to business
people.
? E.g.: ER diagram, object diagram etc
3. Business ownership: Each rule is governed
by a businessperson who is responsible for
verifying it,enforcing it,
and monitoring need for change.
? E.g.: End user or customer is responsible for
requirements submitted by him.
4. Classification: Each rule can be
classified by its data and
constraints.
5. Business Formalism: Each rule
can be implemented in the related
information system. Business rules
should be consistent and
non-redundant.
EXAMPLES:
? A student may take admission to college
? One subject is taught by only one professor
? A class consists of minimum 60 and maximum
80 students
Types of Business Rules:
1.DEFINITIONS:
? Define some business terms. Definitions
are incorporated in systems data
dictionary.
? E.g., A professor is someone who teaches to
students.
2. FACTS:
? Connect business terms in ways that make
business sense. Facts are implemented as
relationships between various data entities.
? E.g., A professor may have student
3. CONSTRAINTS:
? Shows how business rules and how business
terms are connected with each other.
Constraints usually state how many of one
data entity can be related to another data
entity.
? E.g., Each professor may teach up to four
subjects.
4. DERIVATIONS:
? Enable new knowledge or actions. Derivations
are often implemented as formulas and
triggers.
? E.g. A student pending fees is his fees paid
minus total fees.
The Evaluation of Data Models

1. Hierarchical Model
? In Hierarchical Model, a hierarchical relation is formed
by collection of relations and forms a tree-like
structure.
? The relationship can be defined in the form of parent
child type.
? One of the first and most popular Hierarchical Model
is Information Management System (IMS), developed
by IBM.
Advantages
● The design of the hierarchical model is simple.
● Provides Data Integrity since it is based on parent/ child
relationship
● Data sharing is feasible since the data is stored in a
single database.
● Even for large volumes of data, this model works
perfectly.
Disadvantages
● Implementation is complex.
● This model has to deal with anomalies like Insert,
Update and Delete.
● Maintenance is difficult since changes done in the
database may want you to do changes in the entire
database structure.
2. Network Model
· The Hierarchical Model creates
hierarchical tree with parent/ child
relationship, whereas the Network Model
has graph and links.
· The relationship can be defined in the
form of links and it handles many-to-many
relations. This itself states that a record
can have more than one parent.
Advantages
● Easy to design the Network Model
● The model can handle one-one, one-to-many,
many-to-many relationships.
● It isolates the program from other details.
● Based on standards and conventions.

Disadvantages
● Pointers bring complexity since the records are
based on pointers and graphs.
● Changes in the database isn’t easy that makes it
hard to achieve structural independence.
3. Relational Model
? A relational model groups data into
one or more tables. These tables are
related to each other using common
records.
? The data is represented in the form of
rows and columns i.e. tables:
Advantages
● The Relational Model does not have any issues that we
saw in the previous two models i.e. update, insert and
delete anomalies have nothing to do in this model.
● Changes in the database do not require you to affect the
complete database.
● Implementation of a Relational Model is easy.
● To maintain a Relational Model is not a tiresome task.

Disadvantages
● Database inefficiencies hide and arise when the model has
large volumes of data.
● The overheads of using relational data model come with
the cost of using powerful hardware and devices.
4. E-R model
? An E-R model is the logical
representation of data as objects
and relationship among them.
? This objects are known as Entities
and relationship is an association
between these entities.
Advantages

? It is simple to draw an ER diagram when we


know entities and relationships. It is an effective
communication tool.
? The ER Model can be easily integrated with
relational model.
? The design of ER is very logical and hence they
are easy to design and understand.They show
database capabilities like how tables, keys and
columns are used to find a solution to the given
question.
Disadvantages

ER model can represent limited relationships as


compared to other models and It is not possible to
indicate primary keys and foreign keys when they’re
expected.
ER models can be difficult to modify once they are
created. Any changes made to the model may require
extensive rework, which can be time-consuming and
expensive.
ER models do not provide support for business rules,
which can make it difficult to ensure data integrity and
enforce constraints.
5.Object Oriented Data Model

? The Object-Oriented Model in DBMS or OODM is the data


model where data is stored in the form of objects.
? The data and data relationship are stored together in a
single entity known as an object in the Object Oriented
Model.The Object-Oriented Database Management
System is built on top of Object Oriented Model.
? We can use the Object Oriented Model in DBMS to store
real-world entities. Here, we can store pictures, audio, video,
and other types of data, which was previously impossible to
store with the relational approach.
? This model works with object oriented programming
languages like python, Java, VB.net and Perl etc.
Advantages
· Database integrity can be achieved.
· Structural and database independence is created.
· We can store pictures, audio, video, and other
types of data, which was previously impossible
to store earlier.
Disadvantages
· It has complex navigational data access.
· There is a steep learning curve here.
· The transactions might be slow here.
Degree of Data Abstraction

? For the system to be usable, it must retrieve data


efficiently.
? The need for efficiency has led designers to use
complex data structures to represent data in the
database, hide the complexity from users through
several levels of abstraction, to simplify users
interactions with the system:
? Physical level: The lowest level of data abstraction
which describes how the data is actually stored.
? Logical level/ Conceptual level: The next-higher level of
abstraction that describes what data are stored in the
database, and what relationships exist among those data.
The logical level thus describes the entire database in terms
of a small number of relatively simple structures. The
logical level of abstraction is used by the Database
administrators, who must decide what information is to be
kept in the database.
? View level: The highest level of abstraction describes only
part of the entire database. Even though the logical level
uses simpler structures, complexity remains because of the
variety of information stored in a large database. Many
users of the database system do not need all this
information; instead, they need to access only a part of the
database. The view level of abstraction exists to simplify
their interaction with the system. The system may provide
many views for the same database
Type customer = record
customer-id: string;
customer-name: string;
customer-street: string;
customer-city: string;
end;

• This code defines a new record type called customer with


four fields. Each field has a name and a type associated
with it.
• A banking enterprise may have several such record
types, including account, with fields account number and
balance &employee, with fields employee name and
salary.
? The database system hides many of the lowest level
storage details from database programmers.
? Database administrators, on the other hand, may be aware
of certain details of the physical organization of the data.
? At the logical level, each such record is described by a type
definition, and the interrelationship of these record types
is defined as well.
? Programmers using a programming language work at this
level of abstraction.Similarly, database administrators
usually work at this level of abstraction
? Finally, at the view level, computer users see a set of
application programs that hide details of the data types.
Similarly, at the view level, several views of the database
are defined, and database users see these views.
? The views also provide a security mechanism to prevent
users from accessing certain parts of the database.
database designs
and er diagrams
Database design and ER Model: Overview

1. Database Design

Database design is a crucial aspect of building efficient


and effective database systems.
It involves the process of defining the structure,
organization, and relationships of data elements within a
database to meet the requirements of an organization or
application.
One of the fundamental tools used in database design is
the Entity-Relationship (ER) model, which provides a
graphical representation of the database schema.
o Overview of Database Design
1. Requirements Analysis: The first step in database
design involves understanding the requirements of the
system or application for which the database is being
designed. This includes identifying the data entities,
attributes, relationships, and constraints that need to be
represented in the database.

2. Conceptual Design: In the conceptual design phase,


the high-level structure of the database is defined using
the Entity-Relationship (ER) model. Entities represent
real-world objects, attributes represent properties of
entities, and relationships represent associations
between entities.
3. Logical Design: The logical design phase involves
translating the conceptual model into a logical schema that
can be implemented in a database management system
(DBMS). This includes defining tables, columns, keys,
constraints, and other database objects based on the ER
model.

4. Normalization: Normalization is the process of


organizing the data in the database to minimize redundancy
and dependency. It involves breaking down tables into
smaller, more manageable units and establishing
relationships between them to reduce data duplication and
improve data integrity.
5. Physical Design: In the physical design phase, the
logical schema is translated into the physical storage
structures and access methods supported by the underlying
hardware and software platform. This includes defining
storage allocation, indexing, partitioning, and other
optimization techniques to enhance performance and
scalability.

6. Implementation and Testing: Once the database design


is complete, it is implemented in the chosen DBMS, and the
database is populated with data. Testing is performed to
ensure that the database functions as expected and meets
the requirements specified during the design phase
ER-Model
? The entity-relationship (E-R) data model was developed
to facilitate database design by allowing specification of
an enterprise schema that represents the overall Logical
structure of a database.
? The E-R model is very useful in mapping the meanings
and interactions of Real-world enterprises onto a
conceptual schema
? The E-R data model Employs three basic concepts: entity
sets, relationship sets, and attributes.
? The E-R model also has an associated diagrammatic
representation, the E-R diagram,
1.Entity: An entity is a “thing” or “object” in the real world
that is distinguishable from all other objects.
? For example, each student in an school is an entity.
? An entity has a set of properties, and the values for some set
of properties may uniquely identity an entity. For instance,
a person may have a person-id property whose value An
entity may be concrete (specific real thing you store
information about in DB), such as a student or a book, or it
may be abstract (more general concept that helps you to
organize and design DB), such as a loan, or a holiday,
person or a concept.
? uniquely identifies that person
2.Entity Set: An entity set is a set of entities of the same type
that share the same properties, or attributes.
? The set of all people who are instructors at a given
university
3.Attributes: An entity is represented by a set of attributes.
Attributes are descriptive properties possessed by each
member of an entity set.
4. Simple and Composite attributes: the attributes that have
not been divided into subparts are called simple attributes.
(eg Roll No)
? Composite attributes, on the other hand, can be divided
into subparts (that is, other attributes).
? For example, an attribute name could be structured as a
composite attribute consisting of first name, middle initial,
and last name
5. Single-valued and multivalued attributes: The attributes in
our examples all have a single value for a particular entity.
? For instance, the student ID attribute for a specific student
entity refers to only one student ID. Such attributes are
said to be single valued.
? An instructor may have zero, one, or several phone
numbers, and different instructors may have different
numbers of phones. This type of attribute is said to be
multivalued.
6. Derived attribute: The value for this type of attribute
can be derived from the values of other related attributes
or entities.
· Eg- age can be derived from the birthdate
7. Key Attribute – An entity type usually has an attribute
whose values are distinct for each individual Entity in the
collection. Such an attribute is called a key attribute and
its values can be used to identify each entity uniquely.
8. Null Attribute- This attribute takes NULL value when
entity does not have a value for it or value is missing for
the field
9. Required or Optional Attributes- A required attribute is
mandatory or compulsory to have a value .
 Eg- Student id attribute enforces to have a unique and
mandatory value for an entity.
An optional attribute can or cannot have a value.
Eg- email id, phone number
10. Relationship: A relationship is an association among several
entities.
For example, we can define a relationship advisor that
associates instructor Katz with student Shankar. This
relationship specifies that Katz is an advisor to student
Shankar.
A relationship set is a set of relationships of the same type.
Constraints

An E-R enterprise schema may define certain


constraints to which the contents of a database
must conform. Mapping cardinalities and
Participation constraints, are the two important
types of constraints.
1. Mapping Cardinalities:

Mapping cardinalities, express the number of entities to


which another entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary
relationship sets, although they can contribute to the
description of relationship sets that involve more than two
entity sets.
For a binary relationship set R between entity sets A and B,
the mapping cardinality must be one of the following:

a
❖ One is to one:
A single record in one table is related to a single record in
another table.
E.g., One department can have only one manager, Each
person has one passport, and each passport is assigned to
one person.
❖ One is to Many:
A single record in one table can be related to multiple
records in another table.
E.g., One department may have many employees, but each
employee belongs to only one department.
❖ Many is to one:
Multiple records in one table are related to a single
record in another table.
E.g., Many students may be assigned to one professor,
but each student has only one professor.
❖ Many is to Many:
Multiple records in one table are related to multiple
records in another table.
E.g., Students can enroll in multiple courses, and each
course can have multiple students enrolled.
1.Participation Constraints:

The association between entity set is referred to


as participation.
The participation of an entity in a relationship is
said to be total if every entity participates in a
relationship with at least one realtion.
If only some entities are participated in a
relationship then it is said to be partial
participation of an entity. An Entity set may
participate in a relation totally or partially.
- Total Participation means every instance
of an entity must be involved in the
relationship. For instance, if every student
must be enrolled in at least one course, then
the participation of students in the
enrollment relationship is total.
- Partial Participation means that only
some instances of an entity are involved in
the relationship. For instance, not every
employee might be assigned a project, so the
participation of employees in the project
assignment relationship is partial.
❖ Keys:
➢ Primary Key: Uniquely identifies each record in a
table. It must be unique and cannot have null values.
➢ Foreign Key: Establishes a relationship between
two tables by referencing the primary key in another
table. It ensures referential integrity.
➢ Unique Key: Ensures that all values in a column or
a set of columns are unique across rows. It may
allow null values depending on the database system.
 Super Key: Any set of columns that can uniquely
identify a row. It includes both primary keys and
candidate keys, and can have extra columns.
 Composite Key: A super key consisting of more than
one column to uniquely identify a row.
 Simple Key: A key made up of a single column to
uniquely identify a row.
 Candidate Key: A minimal super key that is eligible
to become the primary key. It has no redundant
attributes.
ER-Diagrams

E-R diagram can express the overall logical structure of


a database graphically. E-R diagrams are simple and
clear.
 Rectangles divided into two parts represent entity sets. The first
part, which is shaded blue, contains the name of the entity set. The
second part contains the names of all the attributes of the entity set.
Diamonds represent relationship sets.
Undivided rectangles represent the attributes of a relationship set.
Attributes that are part of the primary key are underlined.
Lines link entity sets to relationship sets.
Dashed lines link attributes of a entity set to the relationship set.
Double lines indicate total participation of an entity in a relationship
set.
Double diamonds represent identifying relationship sets linked to
weak entity sets.
1. Mapping Cardinality:
The relationship set advisor, between the instructor and student entity
sets may be one-to-one, one-to-many, many-to-one, or
many-to-many. To distinguish among these types, we draw either a
directed line (→) or an undirected line (— ) between the relationship
set and the entity set in question, as follows
 One-to-one: We draw a directed line from the relationship set
advisor to both entity sets instructor and student (see Figure a). This
indicates that an instructor may advise at most one student, and a
student may have at most one advisor.
One-to-many: We draw a directed line from the relationship set
advisor to the entity set instructor and an undirected line to the
entity set student (see Figure b). This indicates that an instructor
may advise many students, but a student may have at most one
advisor
❖ Many-to-one: We draw an undirected line
from the relationship set advisor to the
entity set instructor and a directed line to
the entity set student. This indicates that
an instructor may advise at most one
student, but a student may have many
advisors.
❖ Many-to-many: We draw an undirected
line from the relationship set advisor to
both entity sets instructor and student (see
Figure (c). This indicates that instructor
may advise many students, and a student
may have many advisors
2. Strong Entity Set:
? A single rectangle is used for the representation of a
strong entity set.
? It contains sufficient attributes to form its primary key.
? A diamond symbol is used for the representation of the
relationship that exists between the two strong entity
sets.
? A single line is used for the representation of the
connection between the strong entity set and the
relationship.
? Total participation may or may not exist in the
relationship.
3. Weak entity set:
A double rectangle is used for the representation of a weak
entity set.
It does not contain sufficient attributes to form its primary
key.
A double diamond symbol is used for the representation of the
identifying relationship that exists between the strong and
weak entity set.
A double line is used for the representation of the connection
between the weak entity set and the relationship set.
Total participation always exists in the identifying relationship.
ERD Issues

? Complexity: As databases grow in complexity, ERDs can become intricate


and challenging to understand. Large numbers of entities, attributes, and
relationships can clutter the diagram, making it difficult to interpret.
? Ambiguity: ERDs may sometimes be ambiguous, especially when
representing complex relationships or business rules. Ambiguities can arise
in the interpretation of cardinality constraints, participation constraints, or
the nature of relationships between entities.
? Overgeneralization: ERDs may oversimplify the real-world domain they
represent, leading to overgeneralization of entities or relationships. This
oversimplification can result in a loss of important details or nuances in the
data model.
 Inconsistency: Inconsistencies can arise between the
conceptual model represented by the ERD and the
actual implementation in the database management
system. Changes made to the database schema may not
always be reflected accurately in the ERD, leading to
inconsistencies.
 Lack of Standardization: There is no universally
accepted standard notation for ERDs, leading to
potential confusion when different designers use
different symbols or conventions. This lack of
standardization can hinder communication and
collaboration among team members.
 Maintenance Challenges: ERDs require ongoing
maintenance to stay synchronized with changes in the
database schema. As the database evolves over time,
it may be challenging to update the ERD accurately,
especially in large and complex systems.
 Performance Issues: In some cases, overly complex
ERDs may lead to performance issues in database
systems. For example, a high degree of normalization
resulting from an ERD design may increase the
number of joins required for queries, impacting
performance.
 Documentation: ERDs should be well-documented
to ensure that other stakeholders can understand and
interpret them correctly. However, inadequate
documentation can make it challenging for users to
comprehend the design decisions and assumptions
underlying the ERD.
Codd’s rules
1. Information Rule: This rule simply requires that all data
should be presented in table form. This is the basis of relational
model.
2. Guaranteed Access Rule: Each data item (value) must be
accessible without ambiguity by providing table name and its
primary key of the row also include its column name to be
accessed.
3. Systematic Treatment of Null Values: Null values are not
equal to blank space or zero they are unknown unassigned values
which should be treated properly.
4. Active Online Catalog:
The structure description of the entire database must be
stored in an online Catalog, known as data dictionary
which can be accessed by authorized users. Users can
use the same query language to access the Catalog
which they use to access the database itself.
5. Comprehensive Data Sublanguage: The data access
language (SQL) must be the only means of accessing
data stored in the database and support DML, DDL etc.
6. View Updating Rule: All views of data are
theoretically updateable can be updated using system
also.
7. High Level Insert, Update And Delete: A database must
Support high-Level insertion, updation, and deletion. This
must not be limited to a single row, that is, it must also
support union, intersection and minus operations to yield sets
of data records.
8. Physical Data Independence: The data stored in a
database must be independent of the applications that access
the databases. Any Change in the physical structure of a
database must not have any impact on how the data is being
accessed by External applications.
9. Logical Data Independence: This rule states that changes
to the database to the database design should be done in a
way without the users being aware of it
10. Integrity Independence: The database language (like sql) should
support constraints on user input that maintain database integrity. No
component of a primary key can have a null value. If a foreign key is
defined in one table, any value in it must exist as a primary key in
another table.
11. Distribution Independence: The end-user must not be able to see
that the data is distributed over various locations. Users should always
get the impression that the data is located on one site only. This rule has
been regarded as the foundation of distributed database System.
12. Non-Subversion Rule : There should be no way to modify the
database structure other than through the multiple row database
Language (SQL). Most databases today support administrative tools that
allows some direct manipulation of the data Structure.
Relational Schemas

A relational schema is a blueprint that


defines the structure and organization of a
relational database. It represents the logical
view of the database and describes the
tables,
 Tables (Relations): In a relational schema, data is
organized into tables, also known as relations.
Each table represents an entity or concept, such as
customers, orders, products, etc.
Tables consist of rows and columns, where each
row represents a record or tuple, and each column
represents an attribute or field
❖ Attributes (Fields): Attributes define the
properties or characteristics of the entities
represented by the tables.
❖ Each column in a table corresponds to an
attribute, and each attribute has a specific data
type, such as integer, string, date, etc.
❖ Attributes are named and have domain
constraints that specify the allowable values for
that attribute.
❖ Primary Keys: A primary key is a unique
identifier for each record in a table.
❖ It uniquely identifies each row in the table and
ensures that there are no duplicate records.
❖ Primary keys can consist of one or more attributes
and are essential for establishing relationships
between tables.
❖ Foreign Keys: Foreign keys establish
relationships between tables in a relational
database.
❖ They represent attributes in one table that refer
to the primary key of another table.
❖ Foreign keys enforce referential integrity and
ensure that data remains consistent across
related tables.
❖ Relationships: Relationships define
associations and connections between entities in
the database.
❖ They specify how data in one table relates to
data in another table.
❖ Common types of relationships include
one-to-one, one-to-many, and many-to- many
relationships, which are represented using
foreign keys.
❖ Constraints: Constraints enforce rules and
conditions on the data stored in the database.
❖ They ensure data integrity and prevent
inconsistencies or errors.
❖ Common types of constraints include primary
key constraints, foreign key constraints, unique
constraints, and check constraints.
❖ Normalization: Normalization is the process of
organizing data in a relational database to minimize
redundancy and dependency.
❖ It involves dividing large tables into smaller, more
manageable tables and defining relationships
between them to eliminate data anomalies.

➔ A relational schema provides a structured and


organized representation of the database's logical
structure, allowing users and developers to
understand and interact with the data effectively.
➔ It serves as a foundation for database design,
implementation, and management, ensuring data
integrity, consistency, and efficiency.
Relational database
model
Logical view of data
Tables / Relations are logical structure which is a
collection of 2-dimensional tables consisting of
horizontal rows and vertical columns.
It is an abstract concept and do not represent how
data is stored in physical memory of computer
system.
Each table in database has its own unique table
name by which its contents can be referred.
● Each row in the table is known as tuple/record.
● Each column in table is known as
attributes/columns/ fields.
● Domain is a set of values that define the value range
for an attribute.
● A domain can be considered as a data type.
Examples of domains include integer, float, string,
date, etc.
● Relational instance, in the relational database
System, is represented, by a finite set of tuples
(same like instance of the table), Relation instance
do not have duplicate tuples.
● Relational schema contains the name of the
relation(table) and name of all columns or
attributes.
● Relational key -In relational database design, a
"relational key" is a fundamental concept used to
uniquely identify records in a table and to establish
relationships between tables.
1. Characteristics of Table/Relation:
A table is a 2-dimensional structure composed of rows and
columns.
Each table name/ relation name should be distinct from all other
relations.
 Each table row(tuple) represents a single entity (eg single
student) occurrence within the entity set.
 Each table column represents an attribute, and each column
has a distinct name.
Each row/column intersection represents a single data value .
 All values in a column must confirm to the same data format.
Each column has specific range of values known as
attribute domain.
The order of the rows and columns is immaterial to
DBMS.
 Each table must have an attribute or a combination
of attributes that uniquely identifies each row.
2. Attributes:

Each column in the table represents one data


item stored in database for that table.
Such column in database is called as attribute of
a table.
Tables must have at least one column in it and
no two columns can have same name.
 The ANSI/ISO SQL standard does not specify
a maximum number of columns in a table.
3. Tuple/Record

A single row or tuple contains all the information


about a single entity(eg single student data).
A table can have any number of rows from zero
to thousand.
If number of rows are zero, then it is called as
empty table.
Keys
The column value that uniquely identifies a single record in the table
is called as KEY of table.
An attribute or set of attributes whose values uniquely identify each
entity in an entity set is called as key for that entity set.
Any key consisting of single attribute is called a simple key while
that consisting of a combination of attributes is called a composite
key.
Keys are very important part of Relational database. They are used to
establish and identify relation between tables.
They also ensure that each record within a table can be uniquely
1. Types of keys:

 Super Key: Any set of columns that can uniquely identify


a row. It includes both primary keys and candidate keys,
and can have extra columns.
 Composite Key: A super key consisting of more than one
column to uniquely identify a row.
 Simple Key: A key made up of a single column to
uniquely identify a row.
 Candidate Key: A minimal super key that is eligible to
become the primary key. It has no redundant attributes.
➢ Primary Key: Uniquely identifies each record
in a table. It must be unique and cannot have
null values.
➢ Foreign Key: Establishes a relationship between
two tables by referencing the primary key in
another table. It ensures referential integrity.
➢ Unique Key: Ensures that all values in a column
or a set of columns are unique across rows. It
may allow null values depending on the database
system.
Integrity rules
● Integrity rules are simple and straightforward rules that each table
must follow.
● Integrity rules are used to maintain the quality of Information.
● These are very important in database design, when tables break any
of the integrity rules our database will contain errors when
retrieving information.
● Integrity constraints ensure that the data insertion, updation and
other processes have to be performed in such a way that the
integrity is not affected .
● There are four types of integrity rules that we will look at:
■ Entity Integrity Rule
■ Referential Integrity Rule
■ Domain Integrity Rule
■ Key Constraints
1. Entity Integrity Rule: The entity integrity rule
refers to rules the primary key must follow.
The primary key value cannot be null.
The primary key value must be unique.
If a table does not meet these two requirements,
we say the table is violating the entity integrity
rule.
For example, does this table violate entity
integrity rule? Where?
· The term Student violates the entity integrity rules at two places.

· Student Pooja - missing primary key

· Student Rani & Pankaj have the same primary key


2. Referential Integrity Rule:-The referential integrity
rule refers to the foreign key which is specified between
two tables.

· The foreign key may be null and may have the same
value but, the foreign key value must match a record
in the table it is referring to.

· Tables that do not follow this are violating the


referential integrity rule.
3. Domain Integrity Rule-
● Domain integrity is a fundamental concept in database
management systems (DBMS) that ensures the validity and
consistency of data within a database.
● Domain Constraints can be defined as the definition of a valid
set of valves for an attribute.
● The datatype of domain includes string, Character. integer, time,
date etc.
● The value of the attribute must be available in the
corresponding domain..
4.Key Constraints-
 Key constraints in a database management system (DBMS)
are rules that ensure the uniqueness and integrity of data
within a table.
 They help identify records and establish relationships
between tables.
 Keys are the Entity set that is used to identify an Entity
within its Entity set uniquely.
 An Entity Set can have multiple keys, but out of which one
key will be primary key.
 A primary key can contain a unique value in the relational
table.
 Foreign Key enforces a link between tables, ensuring
referential integrity.
 Unique Key ensures that all values in a column (or a
combination of columns) are unique across the table.
 Composite Key ensures uniqueness based on a
combination of columns.
By applying these key constraints, a database maintains
accurate, reliable, and consistent data, and helps to
establish meaningful relationships between different
tables.

You might also like