Data Wrangling

📖 Project Overview

This project works with a clinical trial dataset covering 500 patients, of whom 350 participated in a trial comparing two insulin treatments: Novodra (injectable) and Auralin (oral).
The dataset includes patient details, treatment records, HbA1c measurements, and adverse reactions.

The main goals of data wrangling here are to:

- Clean and organize the raw data
- Handle missing or inconsistent values
- Prepare the data for analysis (statistical testing, visualization, reporting)

📂 Dataset Description

Patients Table (`patients`)

Contains demographic and baseline details:

- Identifiers (patient_id, name, contact, address)
- Demographics (sex, birthdate, age)
- Measurements (weight, height, BMI)

Treatments Table (`treatments`, `treatment_cut`)

Tracks treatment progress and effectiveness:

- Insulin doses (Auralin, Novodra)
- HbA1c levels (start, end, change)

Adverse Reactions Table (`adverse_reactions`)

Logs reported side effects for both treatment groups.

🛠️ Data Wrangling Steps

The wrangling process will include:

1. Loading Data – Import CSV/Excel files into Pandas
2. Exploring Structure – Use .info(), .head(), .describe()
3. Cleaning
   - Remove duplicates
   - Standardize column names
   - Handle missing values (imputation or removal)
4. Transformations
   - Convert datatypes (e.g., birthdate → datetime, zip_code → string)
   - Calculate derived columns (e.g., age from birthdate, BMI categories)
5. Merging Tables – Combine patients, treatments, and adverse reactions for complete analysis
6. Validation – Ensure values fall in the expected ranges (BMI, age ≥ 18, HbA1c values)
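
The loading, cleaning, and datatype steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline CSV text, its column headers, and the sample rows are made up for the example, and the real files would be read from disk with `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for a patients CSV file (made-up rows and headers).
raw_csv = io.StringIO(
    "Patient ID,Zip Code,Birthdate,Weight\n"
    "1,02101,1980-05-17,70.5\n"
    "1,02101,1980-05-17,70.5\n"
    "2,94110,1975-11-02,\n"
)

# Read zip codes as strings so leading zeros are preserved.
patients = pd.read_csv(raw_csv, dtype={"Zip Code": str})

# Standardize column names: trimmed, lowercase, underscores instead of spaces.
patients.columns = patients.columns.str.strip().str.lower().str.replace(" ", "_")

# Remove exact duplicate rows.
patients = patients.drop_duplicates().reset_index(drop=True)

# Convert birthdate from text to datetime.
patients["birthdate"] = pd.to_datetime(patients["birthdate"])

# Count missing values per column before choosing imputation or removal.
missing_counts = patients.isna().sum()
print(patients.dtypes)
print(missing_counts)
```

Counting missing values first makes it easier to decide, column by column, whether imputation or removal is the better choice.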

Data Wrangling - Clinical Trial Dataset

📖 Project Overview

This project works with a clinical trial dataset of 500 patients, of whom 350 participated in a study comparing two insulin treatments: Novodra (injectable) and Auralin (oral).
The dataset includes patient demographics, treatment details, HbA1c levels, and reported adverse reactions.

The goal is to clean, transform, and prepare the data for analysis.

📂 Dataset Structure

🧑 Patients Table (`patients`)

patient_id → Unique patient ID
assigned_sex → Sex at birth (Male/Female)
given_name, surname → Patient names
address, city, state, zip_code, country → Address details (all patients are in the US)
contact → Phone number and email
birthdate → Patient’s date of birth (only patients aged 18 or older were included)
weight, height, bmi → Body measurements (inclusion criterion: BMI 16–38)
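
As one way to work with these columns, the sketch below derives age from birthdate and flags rows that fall outside the inclusion criteria (age ≥ 18, BMI 16–38). The sample rows and the `reference_date` are assumptions made for illustration; the real trial date is not stated here.

```python
import pandas as pd

# Made-up sample rows using the column names described above.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "birthdate": pd.to_datetime(["1980-05-17", "2010-01-01", "1975-11-02"]),
    "bmi": [24.3, 21.0, 41.2],
})

# Assumed reference date for computing age (hypothetical).
reference_date = pd.Timestamp("2024-01-01")

# True where the birthday has already occurred in the reference year.
had_birthday = (
    (patients["birthdate"].dt.month < reference_date.month)
    | (
        (patients["birthdate"].dt.month == reference_date.month)
        & (patients["birthdate"].dt.day <= reference_date.day)
    )
)

# Age in completed years.
patients["age"] = (
    reference_date.year
    - patients["birthdate"].dt.year
    - (~had_birthday).astype(int)
)

# Flag rows that meet the inclusion criteria.
meets_criteria = (patients["age"] >= 18) & patients["bmi"].between(16, 38)
violations = patients.loc[~meets_criteria]
print(violations)
```

Flagging violations instead of dropping them right away keeps the decision about those rows visible and reviewable.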

💉 Treatments Table (`treatments`, `treatment_cut`)

given_name, surname → Patient identifiers
auralin → Baseline and final insulin doses (units “u”)
novodra → Baseline and final insulin doses for the Novodra group (units “u”)
hba1c_start, hba1c_end → HbA1c levels at start and end (%)
hba1c_change → Change in HbA1c (start − end)
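
Because hba1c_change is defined as start − end, it can be recomputed to fill gaps and to spot inconsistent entries. The sketch below assumes the three HbA1c columns are already numeric; the names and values in the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows; one change value is missing and one is inconsistent.
treatments = pd.DataFrame({
    "given_name": ["ana", "ben", "cara"],
    "surname": ["lee", "ortiz", "wu"],
    "hba1c_start": [7.63, 7.97, 7.65],
    "hba1c_end": [7.20, 7.47, 7.27],
    "hba1c_change": [0.43, None, 0.98],
})

# Recompute the change from its definition (start - end).
expected = (treatments["hba1c_start"] - treatments["hba1c_end"]).round(2)

# Fill missing recorded values with the recomputed ones.
treatments["hba1c_change"] = treatments["hba1c_change"].fillna(expected)

# Flag rows where the recorded change disagrees with the recomputed value.
mismatch = (treatments["hba1c_change"] - expected).abs() > 0.005
print(treatments[mismatch])
```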

⚠️ Adverse Reactions Table (`adverse_reactions`)

given_name, surname → Patient identifiers
adverse_reaction → Reported side effect
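
Since this table shares only given_name and surname with the treatments table, one possible way to combine them is a left merge on those two columns. The sketch below uses made-up rows and assumes that name casing may differ between tables, so it normalizes the keys first.

```python
import pandas as pd

# Made-up sample rows for illustration.
treatments = pd.DataFrame({
    "given_name": ["ana", "ben", "cara"],
    "surname": ["lee", "ortiz", "wu"],
    "hba1c_change": [0.43, 0.50, 0.38],
})
adverse_reactions = pd.DataFrame({
    "given_name": ["Ben"],
    "surname": ["Ortiz"],
    "adverse_reaction": ["headache"],
})

# Normalize the join keys so casing and stray spaces do not block matches.
for df in (treatments, adverse_reactions):
    for col in ("given_name", "surname"):
        df[col] = df[col].str.strip().str.lower()

# Left merge keeps every treatment row; patients without a reported
# reaction get NaN in adverse_reaction.
merged = treatments.merge(
    adverse_reactions, on=["given_name", "surname"], how="left"
)
print(merged)
```

A left merge is used so that patients with no reported reaction stay in the result. Joining on names alone would mis-assign reactions if two patients shared the same name, so a uniqueness check on the name pair is worth running first.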

🛠️ Data Wrangling Steps

Load Data → Import CSV/Excel files into Pandas
Explore → Use .info(), .head(), .describe()
Clean → Remove duplicates, standardize names, handle missing values
Transform → Convert datatypes, derive new columns (e.g., Age, BMI category)
Merge → Combine patients, treatments, and adverse reactions
Validate → Ensure correct ranges (Age ≥ 18, BMI 16–38, valid HbA1c values)
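
For the derived BMI category mentioned in the Transform step, one option is `pd.cut` with the standard WHO cut-offs (18.5, 25, 30). The sample rows and the label names below are assumptions for illustration.

```python
import pandas as pd

# Made-up sample rows.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "bmi": [17.0, 22.4, 27.9, 33.1],
})

# WHO cut-offs; right=False makes each bin include its lower edge,
# e.g. a BMI of exactly 25 counts as "overweight".
bins = [0, 18.5, 25, 30, float("inf")]
labels = ["underweight", "normal", "overweight", "obese"]
patients["bmi_category"] = pd.cut(
    patients["bmi"], bins=bins, labels=labels, right=False
)

# Count patients per category, in category order.
counts = patients["bmi_category"].value_counts().sort_index()
print(counts)
```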

🔍 Example Pandas Functions

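df.info() → Column datatypes and non-null counts
df.describe() → Summary statistics for numeric columns
df.isna().sum() → Number of missing values per column
df.drop_duplicates() → Remove duplicate rows
df.fillna() → Fill missing values with a given value
df.merge() → Join with another DataFrame on key columns
df.groupby() → Group rows by column values for aggregation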

Repository files:

- 1__adverse__reactions__Dataset.csv
- 2__treatments__cut__Dataset.csv
- 3__treatments__Dataset.csv
- 4__patients__Dataset.csv
- 5__ Data__Analysis__In__Data Wrangling.ipynb
- README.md
