DATA MANAGEMENT II Lecture I
DATA MANAGEMENT II Lecture I
DATA MANAGEMENT II Lecture I
Course Outline
Introduction
Theory: Data, Database & DBMS, History & Characteristics, Stages in creating DB Tests: Week 7 & 10
Practical: Setting up a Relational Database (e.g., MySQL, PostgreSQL, SQLServer) Group Assignments:
Management Tools: GitHub and Jira 1. Perform ETL Process using SQL on Real-life datasets from Kaggle
Database System Concepts and Architecture (any industry of your choice)
Type of Databases, DB Users Objective: Practice SQL Skills
Data Models and Categories: Conceptual, Representational, Self Descriptive, Physical i. Data cleaning: handling duplicate, null values, format &
Three-Schema Architecture and Data Independence inconsistencies
ii. Importing dataset from different sources (pdf, sql, csv, Json)
Data Modeling Using the Entity–Relationship (ER) & Enhanced ER Models (EERD) 2. Dashboarding: Create Visualization Using MS PowerBI or Tableau
Keys (PK, FK, CandidateK, compositeKey) & Constraints Objective: Practice PowerQuery, DAX
Cardinality (1,*,Optional, Mandatory), Participation (overlap/disjoin) Analyze dataset to uncover insights on key trends
The Mapping ERD to Relational Model and Normalization
Normalization: 1NF, 2NF, 3NF, BCNF, 4NF
Mapping ERD to RM considerations Reading List:
Structured Query Language and Database Programming 3. Fundamentals of Database Systems, Elmasri and Navathe, 7th
DDL, DML; CREATE, SELECT, INSERT, UPDATE, DELETE, DROP, TRUNCATE etc. Edition
Database Connection (connect web app to database) 4. Database System Concepts, Abraham, Henry, Sudarshan, 7th Edition
5. Database Systems: Design, Implementation, and Management,
Complex Query Peter & Carlos 7th Edition
Aggregate Functions: SUM, AVG, COUNT
GROUP BY, HAVING, JOIN ( All types)
Case Statements, Sub-Query, Stored Procedures, CTEs, TempTables, Views & Triggers
Transaction Processing
ACID Properties (Atomicity, Consistency, Isolation, Durability)
Concurrency Control Techniques
Locking Protocols and Deadlocks
Advanced Topics: Distributed Databases, NOSQL Systems, Big Data, Blockchain
Architecture of Distributed Databases
Distributed Query Processing
INTRODUCTION
Database is a collection of related data. By data, we mean known facts that can be recorded and that has
implicit meaning. Database has some source from which data is derived (Mini World), some degree of
interaction with events in the real world, and an audience that is actively interested in its contents. A database
represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD).
Changes to the miniworld are reflected in the database. It is a logically coherent collection of data with some
inherent meaning. A random assortment of data cannot correctly be referred to as a database A database is
designed, built, and populated with data for a specific purpose. It has an intended group of users and some
preconceived applications in which these users are interested
Database Management System (DBMS) is a computerized system that enables users to create and maintain a
database. The DBMS is a general-purpose software system that facilitates the processes of defining,
constructing, manipulating, and sharing databases among various users and applications. DBMS protect
database from malicious or unauthorize users and from hardware/software failure
• Database Administrator
• Responsible for authorizing access to the database, coordinating and monitoring its use, and
acquiring software and hardware resources as needed. The DBA is accountable for problems
such as security breaches and poor system response time.
• Database Designer(s)
• Responsible for identifying the data to be stored in the database and for choosing
appropriate structures to represent and store this data.
• End Users
• End users are the people whose jobs require access to the database for querying, updating,
and generating reports; the database primarily exists for their use.
• Software Engineers
• System Analysts and application programmers that design the whole software application
Stages in Designing
Database Application
Data abstraction
Suppression of details of data organization and storage
Highlighting of the essential features for an improved understanding of data
Enable users to perceive data at their preferred level of detail
Data model
collection of concepts that can be used to describe the structure of a database.
Data types
Relationships
Constraints
Data models support basic (retrieval or updates) & dynamic operations as user defined
functions (e.g. computeCGPA)
Categories of Data Models
Conceptual Model
Representational Model
Others
THREE-SCHEMA ARCHITECTURE
.
Categories of Data Models
Conceptual Models : provides concepts that are close to the way users perceived
data. Data values and meta-data are stored separately. Entities, attributes, and
relationships are the building block of conceptual model.
Course. This denotes the relationship between Student and Course entity
Categories of Data Models cont.
Physical/low level Models: Describe the details of how data is stored on the computer
storage media as files – with records format, ordering and access path.
Self Describing Models: The data values and meta-data are combined and stored
together. Examples include XML and NOSQL data models.
Schemas, Instances, and
Database State
Database Schema: refers to the description of the database and normally specified
during initial design. A displayed pictorial schema is known as database schema
diagram.
Note:
No actual data, only representation
of the structure of the record type.
Schema diagram displays only some
aspect of the schema not ALL.
A constraint such as students majoring in computer science must take CSC1201 b4 going for
SIWES is difficult to represent diagrammatically.
Database State & Instances
The data in the database at a particular moment in time is called a database state or snapshot or
current set of occurrences or instances in the database. In a given database state, each schema
construct has its own current set of instances.
A database can have only one current state but many previous states
A newly created database schema begins with an empty database state, thereafter any
insert/update/delete operation change its state to a new one. Initial state of the database is when
its first populated with records.
An instance of the database represent the actual record or an object inserted into database.
DBMS maintains validity of the database state and ensures database conforms to its schema –
(description of constrains)
Classification of DMBS
• Free vs commercial