DATA MANAGEMENT II Lecture I

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 18

DATA MANAGEMENT II

Course Outline
Introduction
 Theory: Data, Database & DBMS, History & Characteristics, Stages in creating DB Tests: Week 7 & 10
 Practical: Setting up a Relational Database (e.g., MySQL, PostgreSQL, SQLServer) Group Assignments:
 Management Tools: GitHub and Jira 1. Perform ETL Process using SQL on Real-life datasets from Kaggle
Database System Concepts and Architecture (any industry of your choice)
 Type of Databases, DB Users Objective: Practice SQL Skills
 Data Models and Categories: Conceptual, Representational, Self Descriptive, Physical i. Data cleaning: handling duplicate, null values, format &
 Three-Schema Architecture and Data Independence inconsistencies
ii. Importing dataset from different sources (pdf, sql, csv, Json)
Data Modeling Using the Entity–Relationship (ER) & Enhanced ER Models (EERD) 2. Dashboarding: Create Visualization Using MS PowerBI or Tableau
 Keys (PK, FK, CandidateK, compositeKey) & Constraints Objective: Practice PowerQuery, DAX
 Cardinality (1,*,Optional, Mandatory), Participation (overlap/disjoin) Analyze dataset to uncover insights on key trends
The Mapping ERD to Relational Model and Normalization
 Normalization: 1NF, 2NF, 3NF, BCNF, 4NF
 Mapping ERD to RM considerations Reading List:
Structured Query Language and Database Programming 3. Fundamentals of Database Systems, Elmasri and Navathe, 7th
 DDL, DML; CREATE, SELECT, INSERT, UPDATE, DELETE, DROP, TRUNCATE etc. Edition
 Database Connection (connect web app to database) 4. Database System Concepts, Abraham, Henry, Sudarshan, 7th Edition
5. Database Systems: Design, Implementation, and Management,
Complex Query Peter & Carlos 7th Edition
 Aggregate Functions: SUM, AVG, COUNT
 GROUP BY, HAVING, JOIN ( All types)
 Case Statements, Sub-Query, Stored Procedures, CTEs, TempTables, Views & Triggers
Transaction Processing
 ACID Properties (Atomicity, Consistency, Isolation, Durability)
 Concurrency Control Techniques
 Locking Protocols and Deadlocks
Advanced Topics: Distributed Databases, NOSQL Systems, Big Data, Blockchain
 Architecture of Distributed Databases
 Distributed Query Processing
INTRODUCTION
 Database is a collection of related data. By data, we mean known facts that can be recorded and that has
implicit meaning. Database has some source from which data is derived (Mini World), some degree of
interaction with events in the real world, and an audience that is actively interested in its contents. A database
represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD).
Changes to the miniworld are reflected in the database. It is a logically coherent collection of data with some
inherent meaning. A random assortment of data cannot correctly be referred to as a database A database is
designed, built, and populated with data for a specific purpose. It has an intended group of users and some
preconceived applications in which these users are interested

 Database Management System (DBMS) is a computerized system that enables users to create and maintain a
database. The DBMS is a general-purpose software system that facilitates the processes of defining,
constructing, manipulating, and sharing databases among various users and applications. DBMS protect
database from malicious or unauthorize users and from hardware/software failure

 Database Systems means combination of both database and DBMS


Database Users

• Database Administrator
• Responsible for authorizing access to the database, coordinating and monitoring its use, and
acquiring software and hardware resources as needed. The DBA is accountable for problems
such as security breaches and poor system response time.
• Database Designer(s)
• Responsible for identifying the data to be stored in the database and for choosing
appropriate structures to represent and store this data.
• End Users
• End users are the people whose jobs require access to the database for querying, updating,
and generating reports; the database primarily exists for their use.
• Software Engineers
• System Analysts and application programmers that design the whole software application
Stages in Designing
Database Application

 Requirements Specification and Analysis


 Conceptual Design (ER – Diagram)
 Logical Design (Data Model – with DBMS)
 Physical Design (populating the database with records)
Characteristics of Database
Approach

 Self-describing nature of a database system (by storing schema in


catalog)
 Insulation between programs and data, and data abstraction (program
data and program operation independence)
 Support of multiple views of the data

 Sharing of data and multiuser transaction processing


Architecture of Database System Environment
XTICS OF DB APPROACH
Self-describing nature of a database system
Database system contain both database and description or
definition of its structure plus constrain. This definition is stored in
the DBMS catalog, which contains information such as the structure
of each file, the type and storage format of each data item, and
various constraints on the data. The information stored in the
catalog is called meta-data

Insulation between Programs and Data,


and Data Abstraction
The structure of data files is stored in the DBMS
catalog
separately from the access programs. We call this
property program-data independence.
The characteristic that allows program-data
independence and program-operation independence is
called data abstraction.
History of Database System
• Hierarchical and Network Systems
• Lack of data abstraction and program-data independence capabilities
• intermixed conceptual relationship with physical storage
• Relational Model
• Providing Data Abstraction and Application Flexibility
• Object Model
• Ability to model complex datastructure
• But the model is too complex and lacks early standardization
• Object Relational
• Combine the fruits of Relational and Object models
• XML (eXtensible Mackup Language)
• Interchanging Data on the Web for E-Commerce Using XML
• Extending Database Capabilities for New Applications
• Offer extensions to better support specialized apps. E.g. X-rays, MRI, GPS data etc.
• NOSQL Databases
• Emergence of Big Data necessitates new systems to model and manage huge datasets
• Not Only SQL – meaning data stored both as SQL compliant and NOSQL as well.
Advantages of Database Approach
• Controlling Redundancy
• Reduces waste of storage and man power
• Improve data consistency
• Restricting Unauthorized Access
• Different users requires certain access levels to resources.
• Providing Persistent Storage for Program Objects
• Ability to store complex data structure without explicit conversion
• Providing Storage Structures and Search Techniques for Efficient Query Processing
• Using index, buffering and catching techniques to optimize and to plan query execution
• Providing Backup and Recovery
• Ensures the systems is restored to last consistent stage
• Providing multiple User Interface
• CMD, API, Forms and Menus depending on users request and technical skills
• Representing Complex Relationships among Data
• Enforcing Integrity Constraints
• Permitting Inferencing and Actions
• Using Rules and Triggers
Data Model and Abstraction

Data abstraction
Suppression of details of data organization and storage
Highlighting of the essential features for an improved understanding of data
Enable users to perceive data at their preferred level of detail
Data model
collection of concepts that can be used to describe the structure of a database.
Data types
Relationships
Constraints
Data models support basic (retrieval or updates) & dynamic operations as user defined
functions (e.g. computeCGPA)
Categories of Data Models

 Conceptual Model

 Physical/low level Model

 Representational Model

 Self Descriptive Model

 Others
THREE-SCHEMA ARCHITECTURE

.
Categories of Data Models

Conceptual Models : provides concepts that are close to the way users perceived
data. Data values and meta-data are stored separately. Entities, attributes, and
relationships are the building block of conceptual model.

Entities: represent real world objects or concepts such as student, teacher,


employee, subject, job, course

Attributes: represent the features/characteristics of an entity that helps explain the


entity such as student name, doB, gender etc.

Relationship: represents association of two or more entities. Student registers_for

Course. This denotes the relationship between Student and Course entity
Categories of Data Models cont.
Physical/low level Models: Describe the details of how data is stored on the computer
storage media as files – with records format, ordering and access path.

Representational Models: An intermediate and most widely used model that


abstracted away the details storage representation but ready for implementation.
Relational model is a good example.

Self Describing Models: The data values and meta-data are combined and stored
together. Examples include XML and NOSQL data models.
Schemas, Instances, and
Database State
 Database Schema: refers to the description of the database and normally specified
during initial design. A displayed pictorial schema is known as database schema
diagram.
Note:
 No actual data, only representation
of the structure of the record type.
 Schema diagram displays only some
aspect of the schema not ALL.

We call each object in the schema—


such as STUDENT or SECTION— a schema construct.

A constraint such as students majoring in computer science must take CSC1201 b4 going for
SIWES is difficult to represent diagrammatically.
Database State & Instances

The data in the database at a particular moment in time is called a database state or snapshot or
current set of occurrences or instances in the database. In a given database state, each schema
construct has its own current set of instances.

 A database can have only one current state but many previous states

 A newly created database schema begins with an empty database state, thereafter any
insert/update/delete operation change its state to a new one. Initial state of the database is when
its first populated with records.

 An instance of the database represent the actual record or an object inserted into database.

DBMS maintains validity of the database state and ensures database conforms to its schema –
(description of constrains)
Classification of DMBS

• Criteria : can be categorized base on:-


• Data Model (relational, object, NOSQL etc.)

• No of concurrent users (single vs multiple users)

• No of Sites (centralized vs distributed)


• Homogeneous: same DBMS across multiple site

• Heterogeneous: different DBMS at each site

• Cost and License:


• Open source vs close source

• Free vs commercial

• General vs Specialize purpose


END OF LECTURE 1

You might also like