DBMS Introduction
DBMS Introduction
DBMS Introduction
System
INTRODUCTION TO DATABASE SYSTEMS
Formerly
Programme Manager, JCKIF, IIT Jodhpur
Senior Research Officer, RRSCW, ISRO
2
Database Management System
4AIML4-06 / 4CSDS4-06 / 4CYS4-06 / 4CM4-04
Entity Relationship
1. Introduction to database systems 11 Medium
model
CO1: Students will be able to learn and perform the concepts on transformation of the
conceptual model into logical data structures using E-R diagrams.
CO3: They will be able to apply decomposition of data into tables and also get to know
the conversion of un-normalized database into normalized database.
CO4: Students will be able to execute various advance SQL queries related to
Transaction Processing.
CO5: Understand the principles of Locking using concept of Concurrency control and
recovery management.
5
CO PO Mapping
Program Outcomes
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
Course Outcomes
CO1 M H
CO2 H
CO3 H M
CO4 H
H- HIGH
CO5 M M- MEDIUM
L- LOW
6
Teaching Scheme and Examination Scheme
Unit 2 Relationship Algebra and Calculus: Relationship Algebra Selection and Projection, Set Operations, Renaming, Joints,
Division, Relation Calculus, Expressive Power of Algebra and Calculus. SQL queries programming and Triggers: The Forms
of a Basic SQL Query, Union, Intersection and Except, Nested Queries, Correlated Nested Queries, Set-Comparison
Operations, Aggregate Operators, Null Values and Embedded SQL, Dynamic SQL, ODBC and JDBC, Triggers and Active
Databases.
Unit 3 Schema refinement and Normal forms: Introductions to Schema Refinement, Functional Dependencies, Boyce-Codd
Normal Forms, Third Normal Form, Normalization Decomposition into BCNF, Decomposition into 3-NF.
Unit 4 Transaction Processing: Introduction-Transaction State, Transaction properties, Concurrent Executions. Need of
Serializability, Conflict vs. View Serializability, Testing for Serializability, Recoverable Schedules, Cascade less Schedules.
Unit 5 Concurrency Control: Implementation of Concurrency: Lock-based protocols, Timestamp based protocols,
Validation-based protocols, Deadlock handling, Database Failure and Recovery: Database Failures, Recovery
Schemes: Shadow Paging and Log-based Recovery, Recovery with Concurrent transactions.
9
Introduction to Database
► Data :
► Facts or values used for reference or analysis.
► Data itself may not have clear meaning until it is processed.
► Data can exist in a variety of forms such as numbers, text or pictures on pieces of
paper, as bits and bytes stored in electronic memory, or as facts stored in a person's mind.
► Information:
► Processed data which give meaning and which can be used for making decisions.
► For example, consider the following list of facts
10
Introduction to Database
► The above fact satisfies the definition of data, but most people would agree that
the data are useless in present from.
► By adding a few additional data items and providing some structure the
above fact would look as presented below:
11
Introduction to Database
► Structured Data
❑ Data that resides in a fixed field within a record or file is called structured data.
These includes data contained in relational database & spreadsheets.
► Unstructured Data:
❑ These are the data that refers to information that does not reside in a
traditional row-column database.
❑ Unstructured data files often include text and multimedia like e-mail messages,
word documents, videos, photos, audio files etc.
13
Component of Database
► Traditional Applications:
► Numeric and Textual Databases
► More Recent Applications:
► Multimedia Databases
► Geographic Information Systems (GIS)
► Biological and Genome Databases
► Data Warehouses
► Mobile databases
► Real-time and Active Databases
15
Types of Databases and Database Applications
► Enterprise Information :
Sales – For customer, product, and purchase information.
Accounting – For payments, receipts, account balances and other accounting information.
Human Resources – For information about employees, salaries, payroll taxes, and benefits.
► Universities :
For student information.
For course registrations.
Standard enterprise information such as human resources and accounting like fees deposit and dues details.
► Telecommunication :
For keeping records of calls made.
For generating monthly bills.
For storing information about the communication networks
17
18
History of Database Systems
► Punched cards, invented by Herman Hollerith, were used at the very beginning of the
twentieth century to record U.S. census data, and mechanical systems were used to process the
cards and tabulate results.
► Punched cards were later widely used as a means of entering data into computers.
Techniques for data storage and processing have evolved over the years:
► 1950s and early 1960s: Magnetic tapes were developed for data storage.
► Late 1960s and 1970s: Widespread use of hard disks in the late 1960s changed the scenario
for data processing
► 1980s: A groundbreaking project at IBM Research that developed techniques for the
construction of an efficient relational database system - System R
The fully functional System R prototype led to IBM’s first relational database product, SQL/DS. At the same time, the
Ingres system was being developed at the University of California at Berkeley. It led to a commercial product of the
same name. Initial commercial relational database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a
major role in advancing techniques for efficient processing of declarative queries.
► Early 1990s: The SQL language was designed primarily for decision support applications
19
History of Database Systems
► 1990s: The major event of the 1990s was the explosive growth of the World Wide Web. Databases were deployed
much more extensively than ever before. Database systems now had to support very high transaction-processing
rates, as well as very high reliability and 24 × 7 availability (availability 24 hours a day, 7 days a week, meaning
no downtime for scheduled maintenance activities). Database systems also had to support Web interfaces to data.
► 2000s: The first half of the 2000s saw the emerging of XML and the associated query language XQuery as a new
database technology.
In this time period we have also witnessed the growth in “autonomic-computing/auto-admin” techniques for
minimizing system administration effort.
This period also saw a significant growth in use of open-source database systems, particularly PostgreSQL and
MySQL.
20
Data Models
21
Data Models: Basic Building Blocks
22
Data Models: Basic Building Blocks
23
Data Models
Hierarchical Model
► The hierarchical model was developed in the 1960s to manage large amount of data for
complex manufacturing projects.
► Its basic logical structure is represented by an upside-down tree.
► The hierarchical structure contains levels of segments.
► It depicts a set of 1:M relationships between a parent and its children segments
24
Data Models
Hierarchical Model
25
Data Models
Network Model
► The network model was created to represent complex data relationships more
effectively than the hierarchical model, to improve database performance, and to
impose a database standard.
► A user perceives the network model as a collection of records in 1:M relationships
26
Data Models
Relational Model
► The relational model was introduced by E. F. Codd in 1970.
► This data model is implemented through RDBMS; which is easier to understand
and implement
► The most important advantage of the RDBMS is its ability to hide the complexities
of the relational model from the user.
► Another reason for the relational data model’s rise to dominance is its powerful
and flexible query language.
► Generally, SQL is used for this purpose
27
Data Models
Relational Model
28
Data Models
Entity-Relationship(ER) Model
► Peter Chen first introduced the ER data model in 1976; it was the graphical representation of
entities and their relationships in a database structure that quickly became popular.
► Thus, the ER-model has become a widely accepted standard for data modeling
► ER models are normally represented in an ER diagram
29
Data Models
Entity-Relationship(ER) Model
30
Data Models
Object-Oriented(OO) Model
► In object-oriented data model, both data and their relationships are
contained in a single structure called an object.
Object-Relational(OR) Model
► By combining the features of relational data model and object-oriented
data model, object-relational data model was created
Semi-structured Model
► The semi-structured data model permits the specification of data
where individual data items of the same type may have different
sets of attributes.
► The XML (Extensible Markup Language) is widely used to represent
semi-structured data.
31
File Systems
► Before database management systems (DBMSs) were introduced, organizations usually stored information in file
systems. The system stores permanent records in various files, and it needs different application programs to extract
records from, and add records to, the appropriate files. Keeping organizational information in a file-processing
system has a number of major disadvantages:
32
File Systems
► Data redundancy and inconsistency: Same information may be duplicated in several places (files). For example, if a student has a double
major (say, music and mathematics) the address and telephone number of that student may appear in a file that consists of student records of
students in the Music department and in a file that consists of student records of students in the Mathematics department. This redundancy
leads to higher storage and access cost.
► Difficulty in accessing data
► Data isolation. Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve
the appropriate data is difficult.
► Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. Suppose the university
maintains an account for each department, and records the balance amount in each account. Suppose also that the university requires that the
account balance of a department may never fall below zero. Developers enforce these constraints in the system by adding appropriate code in
the various application programs.
► Atomicity problems. A computer system, like any other device, is subject to failure. In many applications, it is crucial that, if a failure
occurs, the data be restored to the consistent state that existed prior to the failure.
► Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many systems allow multiple users to
update the data simultaneously.
► Security problems. Not every user of the database system should be able to access all the data.
DBMS vs. File System
File Management System Database Management System
• Data Inconsistency is more in file system. • Data Inconsistency is less in database management system.
• User locates the physical address of the files • In Database Management System, user is unaware of physical
to access data in File Management System. address where data is stored.
• Security is low in File Management System. • Security is high in Database Management System.
• File Management System stores unstructured • Database Management System stores structured data which
data as isolated data files/entities. have well defined constraints and interrelation.
34
Function of DBMS
► Defining database schema: it must give facility for defining the database
structure also specifies access rights to authorized users.
► Manipulation of the database: The DBMS must have functions like insertion
of record into database, updation of data, deletion of data, retrieval of data
► Sharing of database: The DBMS must share data items for multiple users by
maintaining consistency of data.
► Protection of database: It must protect the database against unauthorized users.
► Database recovery: If for any reason the system fails DBMS must facilitate
data base recovery.
35
Advantages of DBMS
► Reduction of redundancies:
Centralized control of data by the DBA avoids unnecessary duplication of data and effectively reduces the total amount of data storage
► Sharing of Data:
A database allows the sharing of data under its control by any number of application programs or users.
► Data Integrity:
Data integrity means that the data contained in the database is both accurate and consistent. Therefore data values being entered for storage could be
checked to ensure that they fall with in a specified range and are of the correct format.
► Data Security:
The DBA who has the ultimate responsibility for the data in the DBMS can ensure that proper access procedures are followed including proper
authentication to access to the DataBase System and additional check before permitting access to sensitive data.
► Conflict Resolution:
DBA resolve the conflict on requirements of various user and applications. The DBA chooses the best file structure and access method to get
optional performance for the application.
► Data Independence:
Data independence is usually considered from two points of views; physically data independence and logical data independence.
Physical Data Independence allows changes in the physical storage devices or organization of the files to be made without requiring changes in the
conceptual view or any of the external views and hence in the application programs using the data base.
Logical Data Independence indicates that the conceptual schema can be changed without affecting the existing external schema or any application
program.
36
Disadvantage of DBMS
► External Schema: The external view is described by means of a schema called external schema. It
corresponds to different view of the data.
► Conceptual Schema: The conceptual view is defined by conceptual schema, which describes all the
entities, attributes and their relationships with the integrity constraints.
► Internal Schema: The internal level is defined by internal schema, which is a complete description of the
internal model.
There is only 1 conceptual schema and 1 internal schema per database and more than 1 external schema
could be there.
The Schema is also known as INTENSION.
44
Mapping between the Levels
External / Conceptual Mapping: Each external schema is related to the conceptual schema by
external conceptual mapping.
► This mapping gives the correspondence among the records and the relationships of the external
& conceptual views.
► There is a mapping from a particular logical record in the external view to one or more
conceptual record in the conceptual view.
Logical data independency is much more difficult to achieve than physical data independency as it
requires the flexibility in the design of the database and programmer has to see the future requirement or
modification in the design.
46
Database Basics
DATA ITEM:
► Also called as field in data processing and is the smallest unit of data that has meaning to its users.
► Eg: “e101”, ”sumit”
► A subschema is derived schema derived from existing schema as per the user requirement. There may be
more then one subschema create for a single conceptual schema.
48
Data Dictionary
A data dictionary or metadata is the data about the data. It is the self describing
nature of database. It contains information about each data element : names,
types, range of values, access authorization, indicates which application program
uses the data etc.
Used by the developers to develop the program, queries to manage and
manipulate the data.
► Active Data Dictionary: Managed automatically by data management
software. It is always consistent with the current structure of the database.
► Passive Data Dictionary: It is use for documentation purposes. Managed by
the user of system and modified manually.
49
Different kinds of databases
In a database, data is organized into tables consisting of rows and columns and it is indexed so data can
be updated, expanded, and deleted easily. Computer databases typically contain file records data like
transactions money in one bank account to another bank account, sales and customer details, fee details
of students, and product details.
► Relational Database: A relational database is made up of a set of tables with data that fits into a
predefined category.
► Distributed Database: A distributed database is a database in which portions of the database are
stored in multiple physical locations, and in which processing is dispersed or replicated among
different points in a network.
► Cloud Database: A cloud database is a database that typically runs on a cloud computing platform.
Database service provides access to the database. Database services make the underlying
software-stack transparent to the user.
50
Database Management System (DBMS).
The software which is used to manage databases is called Database Management System (DBMS). For Example, MySQL,
Oracle, etc. are popular commercial DBMS used in different applications.
Structured Query Language (SQL) is a computer language for storing, manipulating, and retrieving data stored in
relational database management systems (RDBMS).
DBMS allows users the following tasks:
► Data Definition: It helps in the creation, modification, and removal of definitions that define the organization of data
in the database.
► Data Updation: It helps in the insertion, modification, and deletion of the actual data in the database.
► Data Retrieval: It helps in the retrieval of data from the database which can be used by applications for various
purposes.
► User Administration: It helps in registering and monitoring users, enforcing data security, monitoring performance,
maintaining data integrity, dealing with concurrency control, and recovering information corrupted by unexpected
failure.
MySQL is an open-source Relational Database Management System that stores data in a structured format using rows and
columns.
51
Data Languages
DDL is the short name for Data Definition Language, which deals with database schemas and descriptions, of
how the data should reside in the database.
► CREATE: to create a database and its objects like (table, index, views, store procedure, function, and triggers)
► ALTER: alters the structure of the existing database
► DROP: delete objects from the database
► TRUNCATE: remove all records from a table, including all spaces allocated for the records are removed
► COMMENT: add comments to the data dictionary
► RENAME: rename an object
DDL is used to define database objects .The conceptual schema is specified by a set of definitions expressed by this language. It also gives some details about how to
implement this schema in the physical devices used to store the data. This definition includes all the entity sets and their associated attributes and their relationships.
The result of DDL statements will be a set of tables that are stored in special file called data dictionary.
53
DML : Data Manipulation Language
DML is the short name for Data Manipulation Language which deals with data manipulation and includes most common
SQL statements such SELECT, INSERT, UPDATE, DELETE, etc., and it is used to store, modify, retrieve, delete and update
data in a database.
► SELECT: retrieve data from a database
► INSERT: insert data into a table
► UPDATE: updates existing data within a table
► DELETE: Delete all records from a database table
► MERGE: UPSERT operation (insert or update)
► CALL: call a PL/SQL or Java subprogram
► EXPLAIN PLAN: interpretation of the data access path
► LOCK TABLE: concurrency Control
A DML is a language that enables users to access or manipulate data stored in the database. Data manipulation involves retrieval of data from the database, insertion of new
data into the database and deletion of data or modification of existing data.
There are basically two types of DML:
● Procedural: Which requires a user to specify what data is needed and how to get it.
● Non-Procedural: which requires a user to specify what data is needed with out specifying how to get it.
54
DCL : Data Control Language
► DCL is short for Data Control Language which acts as an access specifier to the database.
(basically to grant and revoke permissions to users in the database)
► GRANT: grant permissions to the user for running DML(SELECT, INSERT, DELETE,…)
commands on the table
► REVOKE: revoke permissions to the user for running DML(SELECT, INSERT, DELETE,…)
command on the specified table
This language enables user to grant authorization and canceling authorization of database objects.
55
TCL : Transactional Control Language
TCL is short for Transactional Control Language which acts as an manager for all types of
transactional data and all transactions. Some of the command of TCL are
► Role Back: Used to cancel or Undo changes made in the database
► Commit: It is used to apply or save changes in the database
► Save Point: It is used to save the data on the temporary basis in the database
56
Transaction Management
► The transaction is a sequence of one or more SQL statements that together forms a logical unit of
work.
► Each statement in the transaction perform a part of the task but all of them are required to
complete the task.
► To ensure the integrity of the data, the database system should maintain ACID property of the
transactions
A – Atomicity
C – Consistency
I – Isolation
D - Durability
57
Database Users:
Naive Users :
► Users who need not be aware of the presence of the database system or any other system supporting their
usage are considered naïve users . A user of an automatic teller machine falls on this category.
Online Users :
► These are users who may communicate with the database directly via an online terminal or indirectly via
a user interface and application program. These users are aware of the database system and also know the
data manipulation language system.
Application Programmers :
► Professional programmers who are responsible for developing application programs or user interfaces
utilized by the naïve and online user falls into this category.
58
Database Users:
► Database Administration :
A person who has central control over the system is called database administrator .
The function of DBA are :
1. Creation and modification of conceptual Schema definition
2. Implementation of storage structure and access method.
3. Schema and physical organization modifications .
4. Granting of authorization for data access.
5. Integrity constraints specification.
6. Execute immediate recovery procedure in case of failures
7. Ensure physical security to database
59
Database Users:
► Sophisticated users interact with the system without writing programs. Instead, they form their requests
either using a database query language or by using tools such as data analysis software. Analysts who
submit queries to explore data in the database fall in this category.
► Specialized users are sophisticated users who write specialized database applications that do not fit into
the traditional data-processing framework. Among these applications are computer-aided design systems,
knowledgebase and expert systems, systems that store data with complex data types (for example,
graphics data and audio data), and environment-modeling systems.
60
Basic architecture of a Data Base Management System
STRUCTURE OF A DBMS
► A database system is partitioned into modules that deal with each of the responsibilities of the overall system.
► The functional components of a database system can be broadly divided into the storage manager and the query processor
components.
► The storage manager is important because databases typically require a large amount of storage space. Corporate
databases range in size from hundreds of gigabytes to, for the largest databases, terabytes of data.
► Since the main memory of computers cannot store this much information, the information is stored on disks. Data
are moved between disk storage and main memory as needed.
► Since the movement of data to and from disk is slow relative to the speed of the central processing unit, it is
imperative that the database system structure the data so as to minimize the need to move data between disk and
main memory.
► The query processor is important because it helps the database system to simplify and facilitate access to data.
► The query processor allows database users to obtain good performance while being able to work at the view level
and not be burdened with understanding the physical-level details of the implementation of the system.
61
Basic architecture of a Data Base Management System
STRUCTURE OF A DBMS
Storage Manager
► The storage manager is the component of a database system
that provides the interface between the low- level data stored in
the database and the application programs and queries
submitted to the system.
► The storage manager is responsible for the interaction with the
file manager.
► The raw data are stored on the disk using the file system
provided by the operating system.
► The storage man-ager translates the various DML statements
into low-level file-system commands.
► Thus, the storage manager is responsible for actual handling of
data i.e. storing, retrieving, and updating data in the database.
62
Basic architecture of a Data Base Management System
STRUCTURE OF A DBMS
Storage Manager
The storage manager components include:
∙ Authorization and integrity manager, which tests for the
satisfaction of integrity constraints and checks the authority of users
to access data.
∙ Transaction manager, which ensures that the database remains in
a consistent) state despite system failures, and that concurrent
transaction executions proceed without conflicting.
∙ File manager, which manages the allocation of space on disk
storage and the data structures used to represent information stored
on disk.
∙ Buffer manager, which is responsible for fetching data from disk
storage into main memory, and deciding what data to cache in main
memory. The buffer manager is a critical part of the database
system, since it enables the database to handle data sizes that are
much larger than the size of main memory.
63
Basic architecture of a Data Base Management System
STRUCTURE OF A DBMS
The Query Processor
The query processor components include:
∙ DDL interpreter, which interprets DDL statements and
records the definitions in the data dictionary.
∙ DML compiler, which translates DML statements in a query
language into an evaluation plan consisting of low-level
instructions that the query evaluation engine understands.
STRUCTURE OF A DBMS
Database Architecture
65
66
THANK YOU
“Success is the sum of small efforts, repeated.” —R. Collier