101 NumPy Exercises for Data Analysis (Python)
The goal of these NumPy exercises is to serve both as a reference and as practice that takes you beyond the basics. The questions span four levels of difficulty, from L1 (easiest) to L4 (hardest).

If you want a quick refresher on NumPy, the following tutorials are a good starting point:
Numpy Tutorial Part 1: Introduction
Numpy Tutorial Part 2: Advanced numpy tutorials.
Related Post:
101 Practice exercises with pandas.
This is an interactive version — you can edit and run every code block directly in your browser. No installation needed. All code runs locally in your browser and nothing is sent to any server.
Click ‘Run’ or press Ctrl+Enter on any code block to execute it. The first run may take a few seconds to initialize.
1. Import NumPy and Check the Version
Import numpy as np and print the version number.
Difficulty Level: L1
Solve:
# Task: Import numpy as np and print the version number
import numpy as np
# Write your code below
Desired Output:
1.13.3
Why this matters: Verifying the NumPy version ensures compatibility with your code and helps debug version-specific behavior differences.
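One possible solution (the printed version will reflect whatever NumPy you have installed, not necessarily the 1.13.3 shown above):

```python
import numpy as np

# The installed version is exposed as a string attribute
print(np.__version__)
```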
2. Create a 1D Array
Create a 1D array of numbers from 0 to 9.
Difficulty Level: L1
Solve:
# Task: Create a 1D array of numbers from 0 to 9
import numpy as np
# Write your code below
Desired Output:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Why this matters: Creating sequential arrays is the starting point for indexing exercises, test data generation, and understanding how NumPy stores data in memory.
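One idiomatic solution, using `np.arange` with a single stop argument:

```python
import numpy as np

arr = np.arange(10)  # integers 0 through 9
print(arr)
```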
3. Create a Boolean Array
Create a 3×3 numpy array of all True values.
Difficulty Level: L1
Solve:
# Task: Create a 3x3 numpy array of all True's
import numpy as np
# Write your code below
Desired Output:
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]
Why this matters: Boolean arrays are used as masks for filtering data, and understanding how to create them is essential for conditional selection in data analysis.
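One way to do it, with `np.full` broadcasting a single fill value into the requested shape:

```python
import numpy as np

# Every element of the 3x3 array is initialized to True
mask = np.full((3, 3), True, dtype=bool)
print(mask)
```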
4. Extract Items That Satisfy a Condition
Extract all odd numbers from arr.
Difficulty Level: L1
Solve:
# Task: Extract all odd numbers from arr
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Write your code below
Desired Output:
array([1, 3, 5, 7, 9])
Why this matters: Boolean indexing is the primary way to filter rows in data cleaning and subsetting datasets before analysis.
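A standard boolean-indexing solution:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
odds = arr[arr % 2 == 1]  # the boolean mask keeps only odd elements
print(odds)
```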
5. Replace Items That Satisfy a Condition
Replace all odd numbers in arr with -1.
Difficulty Level: L1
Solve:
# Task: Replace all odd numbers in arr with -1
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Write your code below
Desired Output:
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
Why this matters: Conditional replacement is used to handle outliers, recode categories, and clean invalid entries in datasets.
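One way to do it, assigning through a boolean mask (note this modifies arr in place):

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[arr % 2 == 1] = -1  # in-place assignment through a boolean mask
print(arr)
```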
6. Replace Items Without Affecting the Original Array
Replace all odd numbers in arr with -1 without modifying the original arr.
Difficulty Level: L2
Solve:
# Task: Replace all odd numbers with -1 without changing the original arr
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Write your code below
Desired Output:
out: [ 0 -1  2 -1  4 -1  6 -1  8 -1]
arr: [0 1 2 3 4 5 6 7 8 9]
Why this matters: Preserving the original data while creating transformed copies is critical in ML pipelines where you need both raw and processed versions of features.
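The usual tool here is `np.where`, which builds a new array and leaves the original untouched:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Where the condition is True take -1, otherwise keep the original value
out = np.where(arr % 2 == 1, -1, arr)
print("out:", out)
print("arr:", arr)
```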
7. Reshape an Array
Convert the 1D array arr to a 2D array with 2 rows.
Difficulty Level: L1
Solve:
# Task: Convert a 1D array to a 2D array with 2 rows
import numpy as np
arr = np.arange(10)
# Write your code below
Desired Output:
[[0 1 2 3 4]
 [5 6 7 8 9]]
Why this matters: Reshaping is one of the most common operations when preparing data for ML models, which expect inputs in specific shapes (e.g., batches of samples).
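A typical solution; passing -1 lets NumPy infer the number of columns from the array size:

```python
import numpy as np

arr = np.arange(10)
out = arr.reshape(2, -1)  # -1 means "infer this dimension"
print(out)
```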
8. Stack Two Arrays Vertically
Stack arrays a and b vertically.
Difficulty Level: L2
Solve:
# Task: Stack arrays a and b vertically
import numpy as np
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
# Write your code below
Desired Output:
[[0 1 2 3 4]
 [5 6 7 8 9]
 [1 1 1 1 1]
 [1 1 1 1 1]]
Why this matters: Vertical stacking is used to combine datasets, append new observations to existing data, or merge training and validation sets.
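One way to do it; `np.vstack`, `np.concatenate(..., axis=0)`, and `np.r_` are all equivalent here:

```python
import numpy as np

a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)
out = np.vstack([a, b])  # stack row-wise: a's rows first, then b's
print(out)
```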
9. Stack Two Arrays Horizontally
Stack arrays a and b horizontally.
Difficulty Level: L2
Solve:
# Task: Stack arrays a and b horizontally
import numpy as np
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
# Write your code below
Desired Output:
[[0 1 2 3 4 1 1 1 1 1]
 [5 6 7 8 9 1 1 1 1 1]]
Why this matters: Horizontal stacking is used to add new features (columns) to a dataset, such as appending engineered features alongside existing ones.
10. Generate Custom Sequences Without Hardcoding
Using only numpy functions and the input array a, produce an array that first repeats each element 3 times, then tiles the whole array 3 times.
Difficulty Level: L2
Solve:
# Task: Create the pattern without hardcoding using numpy functions
import numpy as np
a = np.array([1,2,3])
# Write your code below
Desired Output:
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
Why this matters: Programmatic sequence generation is essential for creating repeated patterns in simulations, data augmentation, and feature engineering.
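A compact solution; `np.repeat` duplicates each element in place, while `np.tile` repeats the whole array, and `np.r_` concatenates the two results:

```python
import numpy as np

a = np.array([1, 2, 3])
# repeat -> [1 1 1 2 2 2 3 3 3], tile -> [1 2 3 1 2 3 1 2 3]
out = np.r_[np.repeat(a, 3), np.tile(a, 3)]
print(out)
```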
11. Get Common Items Between Two Arrays
Find the common items between arrays a and b.
Difficulty Level: L2
Solve:
# Task: Get the common items between a and b
import numpy as np
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
# Write your code below
Desired Output:
[2 4]
Why this matters: Set intersection is used in data cleaning to find shared records, overlapping features, or matching IDs across datasets.
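One way to do it; `np.intersect1d` returns the sorted, de-duplicated common values:

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
out = np.intersect1d(a, b)  # sorted unique values present in both
print(out)
```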
12. Remove Items Present in Another Array
From array a, remove all items that are present in array b.
Difficulty Level: L2
Solve:
# Task: From array a remove all items present in array b
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])
# Write your code below
Desired Output:
[1 2 3 4]
Why this matters: Set difference operations are used to exclude known values, filter out stop words, or remove already-processed records from a pipeline.
13. Find Positions Where Two Arrays Match
Get the positions (indices) where elements of a and b are equal.
Difficulty Level: L2
Solve:
# Task: Get the positions where elements of a and b match
import numpy as np
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
# Write your code below
Desired Output:
(array([1, 3, 5, 7]),)
Why this matters: Finding matching positions is used in evaluation metrics (e.g., comparing predicted vs. actual labels) and aligning paired datasets.
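A standard solution; called with only a condition, `np.where` returns the indices of the True entries:

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
positions = np.where(a == b)  # tuple of index arrays, one per dimension
print(positions)
```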
14. Extract Numbers Within a Given Range
From array a, extract all items between 5 and 10 (inclusive).
Difficulty Level: L2
Solve:
# Task: Get all items between 5 and 10 from a
import numpy as np
a = np.array([2, 6, 1, 9, 10, 3, 27])
# Write your code below
Desired Output:
[ 6 9 10]
Why this matters: Range-based filtering is used for outlier removal, selecting data within valid bounds, and binning continuous variables.
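One way to do it, combining two conditions with `&` (NumPy arrays do not support Python's `and`):

```python
import numpy as np

a = np.array([2, 6, 1, 9, 10, 3, 27])
# Parentheses are required: & binds tighter than the comparisons
out = a[(a >= 5) & (a <= 10)]
print(out)
```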
15. Vectorize a Scalar Function to Work on Arrays
Convert the scalar function maxx into a vectorized version that works element-wise on arrays a and b.
Difficulty Level: L2
Solve:
# Task: Vectorize the maxx function to work on arrays
import numpy as np
def maxx(x, y):
"""Get the maximum of two items"""
if x >= y:
return x
else:
return y
a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])
# Write your code below
Desired Output:
[6. 7. 9. 8. 9. 7. 5.]
Why this matters: Vectorizing custom functions lets you apply complex business logic across entire arrays without slow Python loops, which is critical for performance in large-scale data processing.
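One way to do it with `np.vectorize`; note this is a convenience wrapper (it still loops in Python internally), not a true performance optimization:

```python
import numpy as np

def maxx(x, y):
    """Get the maximum of two items"""
    return x if x >= y else y

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# otypes=[float] fixes the output dtype of the vectorized function
pair_max = np.vectorize(maxx, otypes=[float])
print(pair_max(a, b))
```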
16. Swap Two Columns in a 2D Array
Swap columns 1 and 2 in the array arr.
Difficulty Level: L2
Solve:
# Task: Swap columns 1 and 2 in the array arr
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[1 0 2]
 [4 3 5]
 [7 6 8]]
Why this matters: Reordering columns is a common step when aligning feature order across datasets or when a model expects features in a specific sequence.
17. Swap Two Rows in a 2D Array
Swap rows 1 and 2 in the array arr.
Difficulty Level: L2
Solve:
# Task: Swap rows 1 and 2 in the array arr
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[3 4 5]
 [0 1 2]
 [6 7 8]]
Why this matters: Row swapping is used in data shuffling, reordering observations, and implementing custom sorting logic for training data.
18. Reverse the Rows of a 2D Array
Reverse the row order of the 2D array arr.
Difficulty Level: L2
Solve:
# Task: Reverse the rows of a 2D array
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[6 7 8]
 [3 4 5]
 [0 1 2]]
Why this matters: Reversing rows is used in image flipping for data augmentation and in time series analysis to process data from most recent to oldest.
19. Reverse the Columns of a 2D Array
Reverse the column order of the 2D array arr.
Difficulty Level: L2
Solve:
# Task: Reverse the columns of a 2D array
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[2 1 0]
 [5 4 3]
 [8 7 6]]
Why this matters: Column reversal is used in image mirroring for data augmentation and when reordering features for visualization.
20. Create a 2D Array of Random Floats Between 5 and 10
Create a 2D array of shape 5×3 containing random decimal numbers between 5 and 10.
Difficulty Level: L2
Solve:
# Task: Create a 2D array of shape 5x3 with random floats between 5 and 10
import numpy as np
# Write your code below
Desired Output:
[[ 8.501  9.105  6.859]
 [ 9.763  9.877  7.135]
 [ 7.49   8.334  6.168]
 [ 7.75   9.945  5.274]
 [ 8.085  5.562  7.312]]
Why this matters: Generating random arrays within a specific range is used in weight initialization for neural networks and in Monte Carlo simulations.
21. Print Only 3 Decimal Places
Set NumPy print options so that the array rand_arr displays only 3 decimal places.
Difficulty Level: L1
Solve:
# Task: Print only 3 decimal places of the numpy array
import numpy as np
rand_arr = np.random.random((5,3))
# Write your code below
Desired Output:
[[ 0.443  0.109  0.97 ]
 [ 0.388  0.447  0.191]
 [ 0.891  0.474  0.212]
 [ 0.609  0.518  0.403]]
Why this matters: Controlling decimal precision makes array output readable during debugging and prevents noisy floating-point digits from cluttering your analysis.
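One way to do it; `np.set_printoptions(precision=3)` changes only how arrays are displayed, not the stored values (the seed below is arbitrary, just to make the demo repeatable):

```python
import numpy as np

np.random.seed(100)  # arbitrary seed for a repeatable demo
rand_arr = np.random.random((5, 3))

np.set_printoptions(precision=3)  # display only; values are unchanged
print(rand_arr)
```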
22. Suppress Scientific Notation in Print Output
Pretty print rand_arr by suppressing scientific notation (like 1e10).
Difficulty Level: L1
Solve:
# Task: Pretty print by suppressing scientific notation
import numpy as np
np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3
print(rand_arr)
# Write your code below
Desired Output:
[[ 0.000543  0.000278  0.000425]
 [ 0.000845  0.000005  0.000122]
 [ 0.000671  0.000826  0.000137]]
Why this matters: Suppressing scientific notation makes small or large numbers human-readable in reports and logs, especially during exploratory data analysis.
23. Limit the Number of Printed Items
Set NumPy print options so that array a displays a maximum of 6 elements, with the rest replaced by ellipsis.
Difficulty Level: L1
Solve:
# Task: Limit the number of items printed to a maximum of 6 elements
import numpy as np
a = np.arange(15)
print(a)
# Write your code below
Desired Output:
[ 0 1 2 ... 12 13 14]
Why this matters: Limiting printed output prevents your console from flooding when working with large arrays, making debugging faster and more manageable.
24. Print the Full Array Without Truncating
Print the full numpy array a without truncation, even when the print threshold is set low.
Difficulty Level: L1
Solve:
# Task: Print the full numpy array without truncating
import numpy as np
np.set_printoptions(threshold=6)
a = np.arange(15)
print(a)  # This truncates
# Write your code below to print the full array
Desired Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
Why this matters: Seeing every element is sometimes necessary for validating data integrity or inspecting small-to-medium arrays during debugging.
25. Import a Dataset With Mixed Types
Import the iris dataset from the URL, keeping the text (species) column intact alongside the numeric columns.
Difficulty Level: L2
Solve:
# Task: Import the iris dataset keeping the text intact
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# Write your code below
Desired Output:
[[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']]
Why this matters: Real-world datasets almost always mix numeric and categorical data, and loading them correctly is the first step in any data analysis pipeline.
26. Extract a Particular Column From a 1D Structured Array
Extract the text column species (5th field) from the 1D structured array iris_1d.
Difficulty Level: L2
Solve:
# Task: Extract the species column from 1D iris array
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
# Write your code below
Desired Output:
(150,)
[b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa']
Why this matters: Extracting specific columns from structured arrays is essential for isolating target labels or categorical features before model training.
27. Convert a 1D Structured Array to a 2D Numeric Array
Convert the 1D structured array iris_1d to a 2D array iris_2d by omitting the species text field and keeping only the four numeric columns.
Difficulty Level: L2
Solve:
# Task: Convert 1D iris to 2D array by omitting species text field
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
# Write your code below
Desired Output:
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
Why this matters: ML models require pure numeric feature matrices, so stripping text columns and converting to a 2D float array is a routine preprocessing step.
28. Compute Mean, Median, and Standard Deviation
Find the mean, median, and standard deviation of the sepallength column (1st column) from the iris dataset.
Difficulty Level: L1
Solve:
# Task: Find the mean, median, standard deviation of sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
5.84333333333 5.8 0.825301291785
Why this matters: These three summary statistics are the foundation of exploratory data analysis and are used to understand data distribution before building any model.
29. Normalize an Array to the 0-1 Range
Normalize the sepallength array so the minimum maps to 0 and the maximum maps to 1.
Difficulty Level: L2
Solve:
# Task: Normalize sepallength to range between 0 and 1
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Write your code below
Desired Output:
[ 0.222 0.167 0.111 0.083 0.194 0.306 0.083 0.194 0.028 0.167 0.306 0.139 0.139 0. 0.417 0.389 0.306 0.222 0.389 0.222 0.306 0.222 0.083 0.222 0.139 0.194 0.194 0.25 0.25 0.111 0.139 0.306 0.25 0.333 0.167 0.194 0.333 0.167 0.028 0.222 0.194 0.056 0.028 0.194 0.222 0.139 0.222 0.083 0.278 0.194 0.75 0.583 0.722 0.333 0.611 0.389 0.556 0.167 0.639 0.25 0.194 0.444 0.472 0.5 0.361 0.667 0.361 0.417 0.528 0.361 0.444 0.5 0.556 0.5 0.583 0.639 0.694 0.667 0.472 0.389 0.333 0.333 0.417 0.472 0.306 0.472 0.667 0.556 0.361 0.333 0.333 0.5 0.417 0.194 0.361 0.389 0.389 0.528 0.222 0.389 0.556 0.417 0.778 0.556 0.611 0.917 0.167 0.833 0.667 0.806 0.611 0.583 0.694 0.389 0.417 0.583 0.611 0.944 0.944 0.472 0.722 0.361 0.944 0.556 0.667 0.806 0.528 0.5 0.583 0.806 0.861 1. 0.583 0.556 0.5 0.944 0.556 0.583 0.472 0.722 0.667 0.722 0.417 0.694 0.667 0.667 0.556 0.611 0.528 0.444]
Why this matters: Min-max normalization is a standard preprocessing step for ML algorithms (e.g., KNN, SVM) that are sensitive to feature scales.
30. Compute the Softmax Score
Compute the softmax scores for the sepallength array.
Difficulty Level: L3
Solve:
# Task: Compute the softmax score of sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Write your code below
Desired Output:
[ 0.002 0.002 0.001 0.001 0.002 0.003 0.001 0.002 0.001 0.002 0.003 0.002 0.002 0.001 0.004 0.004 0.003 0.002 0.004 0.002 0.003 0.002 0.001 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.002 0.003 0.002 0.003 0.002 0.002 0.003 0.002 0.001 0.002 0.002 0.001 0.001 0.002 0.002 0.002 0.002 0.001 0.003 0.002 0.015 0.008 0.013 0.003 0.009 0.004 0.007 0.002 0.01 0.002 0.002 0.005 0.005 0.006 0.004 0.011 0.004 0.004 0.007 0.004 0.005 0.006 0.007 0.006 0.008 0.01 0.012 0.011 0.005 0.004 0.003 0.003 0.004 0.005 0.003 0.005 0.011 0.007 0.004 0.003 0.003 0.006 0.004 0.002 0.004 0.004 0.004 0.007 0.002 0.004 0.007 0.004 0.016 0.007 0.009 0.027 0.002 0.02 0.011 0.018 0.009 0.008 0.012 0.004 0.004 0.008 0.009 0.03 0.03 0.005 0.013 0.004 0.03 0.007 0.011 0.018 0.007 0.006 0.008 0.018 0.022 0.037 0.008 0.007 0.006 0.03 0.007 0.008 0.005 0.013 0.011 0.013 0.004 0.012 0.011 0.011 0.007 0.009 0.007 0.005]
Why this matters: Softmax converts raw scores into probabilities and is the final activation function in most classification neural networks.
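The usual numerically stable recipe, shown here on a small synthetic array standing in for the downloaded sepallength column:

```python
import numpy as np

x = np.array([5.1, 4.9, 4.7, 4.6, 5.0])  # stand-in for sepallength

# Subtracting the max before exponentiating avoids overflow
# without changing the result (softmax is shift-invariant)
e = np.exp(x - x.max())
softmax = e / e.sum()
print(softmax)
```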
31. Find Percentile Scores
Find the 5th and 95th percentile of the sepallength array.
Difficulty Level: L1
Solve:
# Task: Find the 5th and 95th percentile of sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Write your code below
Desired Output:
[ 4.6 7.255]
Why this matters: Percentiles are used to detect outliers, set clipping thresholds, and define confidence intervals in statistical analysis.
32. Insert Values at Random Positions
Insert np.nan at 20 random positions in the iris_2d dataset.
Difficulty Level: L2
Solve:
# Task: Insert np.nan at 20 random positions in iris_2d
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
[[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'5.0' b'3.6' b'1.4' b'0.2' b'Iris-setosa']
 [b'5.4' b'3.9' b'1.7' b'0.4' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'5.0' b'3.4' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.4' nan b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']]
Why this matters: Injecting missing values is used to simulate real-world incomplete data when testing how your cleaning pipeline handles gaps.
33. Find the Position of Missing Values
Find the number and position of missing values in iris_2d's sepallength (1st column).
Difficulty Level: L2
Solve:
# Task: Find number and position of missing values in sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Write your code below
Desired Output:
Number of missing values: 5
Position of missing values: (array([ 39,  88,  99, 130, 147]),)
Why this matters: Locating missing values is the first step in any data cleaning workflow — you need to know where gaps exist before deciding how to fill or drop them.
34. Filter a NumPy Array Based on Multiple Conditions
Filter the rows of iris_2d where petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0.
Difficulty Level: L3
Solve:
# Task: Filter rows where petallength > 1.5 and sepallength < 5.0
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Write your code below
Desired Output:
[[ 4.8  3.4  1.6  0.2]
 [ 4.8  3.4  1.9  0.2]
 [ 4.7  3.2  1.6  0.2]
 [ 4.8  3.1  1.6  0.2]
 [ 4.9  2.4  3.3  1. ]
 [ 4.9  2.5  4.5  1.7]]
Why this matters: Multi-condition filtering is used constantly in data analysis to subset records that meet specific criteria, such as selecting high-risk patients or underperforming products.
35. Drop Rows Containing Missing Values
Select the rows of iris_2d that do not have any nan value, keeping only complete rows.
Difficulty Level: L3
Solve:
# Task: Drop rows containing any nan value from iris_2d
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Introduce some nan values for the exercise
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Write your code below
Desired Output:
array([[ 4.9, 3. , 1.4, 0.2],
[ 4.7, 3.2, 1.3, 0.2],
[ 4.6, 3.1, 1.5, 0.2],
[ 5. , 3.6, 1.4, 0.2],
       [ 5.4,  3.9,  1.7,  0.4]])
Why this matters: Dropping rows with missing values is a standard data cleaning step before training ML models that cannot handle NaN inputs.
36. Find Correlation Between Two Columns
Compute the Pearson correlation between SepalLength (1st column) and PetalLength (3rd column) in iris_2d.
Difficulty Level: L2
Solve:
# Task: Find correlation between SepalLength and PetalLength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Write your code below
Desired Output:
0.871754157305
Why this matters: Correlation analysis is a core step in exploratory data analysis and feature selection for ML models.
37. Check for Missing Values in an Array
Determine whether iris_2d has any missing (nan) values and print the boolean result.
Difficulty Level: L2
Solve:
# Task: Check if iris_2d has any missing values
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Write your code below
Desired Output:
False
Why this matters: Detecting missing values early prevents silent errors in downstream computations like aggregation and model training.
38. Replace All Missing Values with Zero
Replace all nan values in iris_2d with 0 and print the first four rows.
Difficulty Level: L2
Solve:
# Task: Replace all nan values with 0
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Write your code below
Desired Output:
array([[ 5.1, 3.5, 1.4, 0. ],
[ 4.9, 3. , 1.4, 0.2],
[ 4.7, 3.2, 1.3, 0.2],
       [ 4.6,  3.1,  1.5,  0.2]])
Why this matters: Imputing missing values with zero is a quick baseline strategy in data cleaning pipelines before more sophisticated imputation.
39. Count Unique Values in an Array
Find the unique values and their counts in the species column of the iris dataset.
Difficulty Level: L2
Solve:
# Task: Find unique values and their counts in iris species
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
      dtype='|S15'), array([50, 50, 50]))
Why this matters: Checking class distribution is essential before training classifiers to detect imbalanced datasets.
40. Bin a Numeric Column into Categories
Bin the petal length (3rd column) of iris into a text array using these rules: less than 3 becomes 'small', 3 to 5 becomes 'medium', and 5 or above becomes 'large'.
Difficulty Level: L2
Solve:
# Task: Bin petal length into 'small', 'medium', 'large' categories
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
['small', 'small', 'small', 'small']
Why this matters: Binning continuous features into categories is a common feature engineering technique for decision trees and rule-based models.
41. Create a New Column from Existing Columns
Add a new column to iris_2d for volume, computed as (pi x petallength x sepallength^2) / 3.
Difficulty Level: L2
Solve:
# Task: Create a volume column from existing columns
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa', 38.13265162927291],
[b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa', 35.200498485922445],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa', 33.238050274980004]], dtype=object)
Why this matters: Deriving new features from existing columns is a fundamental feature engineering step that can improve model performance.
42. Probabilistic Sampling from a Categorical Array
Randomly sample from iris's species column such that setosa appears twice as often as versicolor and virginica.
Difficulty Level: L3
Solve:
# Task: Probabilistic sampling with setosa twice as likely
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'], dtype=object), array([77, 37, 36]))
Why this matters: Probabilistic sampling is used in class balancing, bootstrapping, and data augmentation for imbalanced datasets.
43. Get the Second Largest Value by Group
Find the second longest petallength among species setosa in the iris dataset.
Difficulty Level: L2
Solve:
# Task: Find the second longest petallength of species setosa
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
1.7
Why this matters: Computing grouped statistics beyond simple max/min is common in exploratory data analysis and outlier detection.
44. Sort a 2D Array by a Column
Sort the iris dataset based on the sepallength column (1st column) in ascending order.
Difficulty Level: L2
Solve:
# Task: Sort iris dataset by sepallength (1st column)
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
[[b'4.3' b'3.0' b'1.1' b'0.1' b'Iris-setosa']
 [b'4.4' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'3.0' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.5' b'2.3' b'1.3' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.6' b'1.0' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.2' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']]
Why this matters: Sorting datasets by a specific feature is fundamental for ranking, binary search, and ordered visualizations.
45. Find the Most Frequent Value in an Array
Find the most frequent value of petal length (3rd column) in the iris dataset.
Difficulty Level: L1
Solve:
# Task: Find the most frequent petal length value
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
b'1.5'
Why this matters: Identifying the mode of a feature helps detect dominant patterns and is used in imputation strategies.
46. Find the First Occurrence Greater Than a Value
Find the position of the first occurrence of a value greater than 1.0 in the petalwidth (4th column) of the iris dataset.
Difficulty Level: L2
Solve:
# Task: Find position of first value > 1.0 in petalwidth column
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
50
Why this matters: Locating threshold crossings is used in signal processing, event detection, and conditional filtering of datasets.
47. Clip Array Values to a Range
From the array a, replace all values greater than 30 with 30 and all values less than 10 with 10.
Difficulty Level: L2
Solve:
# Task: Clip values to range [10, 30]
import numpy as np
np.random.seed(100)
a = np.random.uniform(1, 50, 20)
# Write your code below
Desired Output:
[ 27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25 10.08 10. 11.77 30. 30. 10. 30. 14.43]
Why this matters: Clipping values to a valid range is essential for outlier handling and ensuring inputs stay within expected bounds for models.
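One way to do it; `np.clip` caps values at both ends in a single call (a nested `np.where` would be equivalent but more verbose):

```python
import numpy as np

np.random.seed(100)
a = np.random.uniform(1, 50, 20)

# Values below 10 become 10, values above 30 become 30
out = np.clip(a, 10, 30)
print(out)
```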
48. Get the Positions of Top N Values
Get the positions of the top 5 maximum values in the array a.
Difficulty Level: L2
Solve:
# Task: Find positions of top 5 maximum values
import numpy as np
np.random.seed(100)
a = np.random.uniform(1, 50, 20)
# Write your code below
Desired Output:
[18 7 3 10 15]
Why this matters: Finding top-N indices is used in ranking systems, recommendation engines, and selecting the highest-confidence predictions.
49. Compute Row-Wise Counts of All Possible Values
For each row in arr, count the occurrences of every value from 1 to 10.
Difficulty Level: L4
Solve:
# Task: Count occurrences of each value (1-10) per row
import numpy as np
np.random.seed(100)
arr = np.random.randint(1, 11, size=(6, 10))
print(arr)
# Write your code below
Desired Output:
[ 1  2  3  4  5  6  7  8  9 10]
[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
 [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
 [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
 [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
 [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
 [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]
Why this matters: Row-wise value counting is used in bag-of-words representations and histogram-based feature extraction.
50. Flatten an Array of Arrays into a 1D Array
Convert array_of_arrays (which contains arr1, arr2, and arr3) into a single flat 1D array.
Difficulty Level: L2
Solve:
# Task: Flatten an array of arrays into a 1d array
import numpy as np
arr1 = np.arange(3)
arr2 = np.arange(3, 7)
arr3 = np.arange(7, 10)
array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)
print(array_of_arrays)
# Write your code below
Desired Output:
[0 1 2 3 4 5 6 7 8 9]
Why this matters: Flattening nested array structures is a routine step when combining data from multiple sources into a single feature vector.
51. Generate One-Hot Encodings for an Array
Compute one-hot encodings (dummy binary variables) for each unique value in arr.
Difficulty Level: L4
Solve:
# Task: Create one-hot encodings for the array
import numpy as np
np.random.seed(101)
arr = np.random.randint(1, 4, size=6)
print(arr)
# Write your code below
Desired Output:
array([[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 1., 0., 0.]])
Why this matters: One-hot encoding converts categorical integers into binary vectors, which is required by most ML algorithms that expect numeric input.
52. Create Row Numbers Grouped by a Categorical Variable
Assign within-group row numbers to each element in species_small, restarting the count at 0 for each new species.
Difficulty Level: L3
Solve:
# Task: Create row numbers grouped by species
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
# Write your code below
Desired Output:
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]
Why this matters: Grouped row numbering is useful for creating sequence features and ranking within categories in data analysis.
53. Create Group IDs from a Categorical Variable
Assign a numeric group ID (0, 1, 2, …) to each element in species_small based on its unique species category.
Difficulty Level: L4
Solve:
# Task: Create group ids for each species category
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
# Write your code below
Desired Output:
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
Why this matters: Converting categorical labels to integer group IDs is a standard preprocessing step for label encoding in ML pipelines.
54. Rank Items in a 1D Array
Create the ranks for each value in the numeric array a, where rank 0 is the smallest value.
Difficulty Level: L2
Solve:
# Task: Create ranks for the array values
import numpy as np
np.random.seed(10)
a = np.random.randint(20, size=10)
print(a)
# Write your code below
Desired Output:
[4 2 6 0 8 7 9 3 5 1]
Why this matters: Ranking transforms are used in non-parametric statistics and for creating ordinal features from numeric data.
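One possible solution is the double-`argsort` trick: the first `argsort` gives the sorted order, and applying `argsort` again converts that order into per-element ranks:

```python
import numpy as np

np.random.seed(10)
a = np.random.randint(20, size=10)

# First argsort: indices that would sort a.
# Second argsort: the rank of each original element.
ranks = a.argsort().argsort()
print(ranks)
```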
55. Rank Items in a Multidimensional Array
Create a rank array of the same shape as the 2D array a, where ranking is applied globally across all elements.
Difficulty Level: L3
Solve:
# Task: Create ranks for a 2D array (global ranking)
import numpy as np
np.random.seed(10)
a = np.random.randint(20, size=[2, 5])
print(a)
# Write your code below
Desired Output:
[[4 2 6 0 8]
 [7 9 3 5 1]]
Why this matters: Global ranking across a matrix is useful for percentile-based normalization and creating competition-style leaderboards.
56. Find the Maximum Value in Each Row
Compute the maximum value for each row in the 2D array a.
Difficulty Level: L2
Solve:
# Task: Find maximum value in each row
import numpy as np
np.random.seed(100)
a = np.random.randint(1, 10, [5, 3])
print(a)
# Write your code below
Desired Output:
array([9, 8, 6, 3, 9])
Why this matters: Row-wise max is used in neural network output processing (e.g., selecting the predicted class) and feature aggregation.
57. Compute the Min-by-Max Ratio for Each Row
Compute the ratio of the minimum to the maximum value for each row in the 2D array a.
Difficulty Level: L3
Solve:
# Task: Compute min/max ratio for each row
import numpy as np
np.random.seed(100)
a = np.random.randint(1, 10, [5, 3])
print(a)
# Write your code below
Desired Output:
array([ 0.44444444, 0.125 , 0.5 , 1. , 0.11111111])
Why this matters: Min/max ratios help measure the spread within each observation, which is useful for anomaly detection and data quality checks.
58. Find Duplicate Records in an Array
Mark duplicate entries in array a as True (2nd occurrence onwards), with first occurrences marked as False.
Difficulty Level: L3
Solve:
# Task: Find duplicate entries (mark 2nd+ occurrences as True)
import numpy as np
np.random.seed(100)
a = np.random.randint(0, 5, 10)
print('Array: ', a)
# Write your code below
Desired Output:
[False True False True False False True True True True]
Why this matters: Identifying duplicates is a critical data cleaning step to prevent data leakage and inflated counts in analysis.
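A sketch of one approach: start by assuming every entry is a duplicate, then use the first-occurrence indices from `np.unique(..., return_index=True)` to flip those positions back to False:

```python
import numpy as np

np.random.seed(100)
a = np.random.randint(0, 5, 10)

# Mark everything True, then un-mark the first occurrence of each value
out = np.full(a.shape[0], True)
first_idx = np.unique(a, return_index=True)[1]
out[first_idx] = False
print(out)
```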
59. Find the Grouped Mean
Compute the mean of sepalwidth (2nd column) grouped by species (5th column) in the iris dataset.
Difficulty Level: L3
Solve:
# Task: Find mean sepalwidth grouped by species
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
[[b'Iris-setosa', 3.418], [b'Iris-versicolor', 2.770], [b'Iris-virginica', 2.974]]
Why this matters: Grouped aggregation is a fundamental operation in exploratory data analysis and feature engineering for structured data.
60. Convert a PIL Image to a NumPy Array
Import the image from the given URL and convert it to a numpy array, printing its shape.
Difficulty Level: L3
Solve:
# Task: Import an image from URL and convert to numpy array
import numpy as np
from io import BytesIO
from PIL import Image
import PIL, requests
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
# Write your code below
Desired Output:
A numpy array representation of the image with shape (height, width, 3)
Why this matters: Converting images to numpy arrays is the first step in any image processing or computer vision pipeline.
61. Drop All Missing Values from a 1D Array
Remove all nan values from the 1D array a and return only the valid elements.
Difficulty Level: L2
Solve:
# Task: Drop all nan values from a 1D array
import numpy as np
a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan])
print(a)
# Write your code below
Desired Output:
array([ 1., 2., 3., 5., 6., 7.])
Why this matters: Stripping NaN values from arrays is necessary before computing statistics or passing data to functions that do not handle missing values.
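One concise solution is a boolean mask built with `np.isnan`. Note that `a == np.nan` never works, because NaN compares unequal to everything, including itself:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan])

# ~np.isnan(a) is True exactly at the valid (non-NaN) positions
clean = a[~np.isnan(a)]
print(clean)
```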
62. Compute the Euclidean Distance Between Two Arrays
Compute the Euclidean distance between arrays a and b.
Difficulty Level: L3
Solve:
# Task: Compute euclidean distance between two arrays
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])
# Write your code below
Desired Output:
6.7082039324993694
Why this matters: Euclidean distance is the foundation of KNN, K-means clustering, and many similarity-based ML algorithms.
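A one-line solution: the Euclidean distance between two vectors is the L2 norm of their difference, which `np.linalg.norm` computes directly:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

# Equivalent to np.sqrt(np.sum((a - b)**2))
dist = np.linalg.norm(a - b)
print(dist)
```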
63. Find All Local Maxima (Peaks) in a 1D Array
Find the positions of all peaks in array a, where peaks are values surrounded by smaller values on both sides.
Difficulty Level: L4
Solve:
# Task: Find positions of all peaks (local maxima) in the array
import numpy as np
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
# Write your code below
Desired Output:
array([2, 5])
Why this matters: Peak detection is widely used in signal processing, time-series analysis, and identifying turning points in financial data.
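One possible solution sketch: a peak is a point where the sign of the first difference flips from +1 to -1, i.e. where the difference of the sign pattern equals -2:

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

# np.diff(a) is rising (+) before a peak and falling (-) after it,
# so np.diff of its sign is -2 exactly one step before each peak
doublediff = np.diff(np.sign(np.diff(a)))
peaks = np.where(doublediff == -2)[0] + 1
print(peaks)
```

This only detects strict interior peaks; plateaus and endpoints need extra handling (or `scipy.signal.find_peaks` for the general case).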
64. Subtract a 1D Array from a 2D Array Row-Wise
Subtract each element of b_1d from the corresponding row of a_2d, so that b_1d[0] is subtracted from all elements in row 0, b_1d[1] from row 1, and so on.
Difficulty Level: L2
Solve:
# Task: Subtract 1d array from 2d array row-wise
import numpy as np
a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])
# Write your code below
Desired Output:
[[2 2 2]
 [2 2 2]
 [2 2 2]]
Why this matters: Row-wise subtraction using broadcasting is essential for centering data and computing deviations from row-level baselines.
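A sketch of one solution: reshape `b_1d` into a column vector so broadcasting pairs `b_1d[i]` with row `i` instead of with each column:

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

# b_1d[:, None] has shape (3, 1), which broadcasts across each row;
# plain a_2d - b_1d would instead subtract element-wise per column
out = a_2d - b_1d[:, None]
print(out)
```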
65. Find the Index of the N’th Repetition of a Value
Find the index of the 5th repetition of the value 1 in array x.
Difficulty Level: L2
Solve:
# Task: Find the index of the 5th occurrence of value 1
import numpy as np
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
# Write your code below
Desired Output:
8
Why this matters: Locating the n’th occurrence of a value is useful for event-based analysis and pattern matching in sequential data.
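One possible solution: collect all positions of the value with `np.where`, then index into that list to pick the n'th one:

```python
import numpy as np

x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

# np.where(x == 1)[0] lists every index holding 1;
# element n-1 of that list is the n'th occurrence
n = 5
idx = np.where(x == 1)[0][n - 1]
print(idx)
```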
66. Convert NumPy datetime64 to Python datetime
Convert the numpy datetime64 object dt64 to a Python datetime.datetime object.
Difficulty Level: L2
Solve:
# Task: Convert numpy datetime64 to Python datetime
import numpy as np
dt64 = np.datetime64('2018-02-25 22:10:10')
print(dt64)
# Write your code below
Desired Output:
datetime.datetime(2018, 2, 25, 22, 10, 10)
Why this matters: Converting between NumPy and Python datetime types is necessary when interfacing with libraries that only accept native Python datetime objects.
67. Compute the Moving Average of an Array
Compute the moving average of array Z with a window size of 3.
Difficulty Level: L3
Solve:
# Task: Compute moving average with window size 3
import numpy as np
np.random.seed(100)
Z = np.random.randint(10, size=10)
print(Z)
# Write your code below
Desired Output:
array: [8 8 3 7 7 0 4 2 5 2]
moving average: [ 6.33 6. 5.67 4.67 3.67 2. 3.67 3. ]
Why this matters: Moving averages smooth noisy data and are widely used in time-series forecasting, signal processing, and financial analysis.
68. Create an Array Sequence Given Start, Length, and Step
Create a numpy array of length 10, starting from 5, with a step of 3 between consecutive numbers using the variables start, length, and step.
Difficulty Level: L2
Solve:
# Task: Create array sequence with start=5, length=10, step=3
import numpy as np
length = 10
start = 5
step = 3
# Write your code below
Desired Output:
array([ 5, 8, 11, 14, 17, 20, 23, 26, 29, 32])
Why this matters: Generating custom sequences is used for creating feature grids, time steps, and index ranges in scientific computing.
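A one-line solution sketch: since `np.arange` takes start, stop, and step, the exclusive stop for a sequence of a given length is `start + step * length`:

```python
import numpy as np

length = 10
start = 5
step = 3

# The exclusive upper bound start + step * length yields exactly
# `length` elements
seq = np.arange(start, start + step * length, step)
print(seq)
```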
69. Fill in Missing Dates in an Irregular Date Series
Given the array dates containing every other day, fill in the missing dates to create a continuous daily sequence.
Difficulty Level: L3
Solve:
# Task: Fill in missing dates to make a continuous date sequence
import numpy as np
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print(dates)
# Write your code below
Desired Output:
['2018-02-01' '2018-02-02' '2018-02-03' '2018-02-04' '2018-02-05' '2018-02-06' '2018-02-07' '2018-02-08' '2018-02-09' '2018-02-10' '2018-02-11' '2018-02-12' '2018-02-13' '2018-02-14' '2018-02-15' '2018-02-16' '2018-02-17' '2018-02-18' '2018-02-19' '2018-02-20' '2018-02-21' '2018-02-22' '2018-02-23']
Why this matters: Filling gaps in date sequences is a standard step in time-series preprocessing to ensure regular intervals for forecasting models.
70. Create Strides from a 1D Array
From the array arr, generate a 2D matrix using strides with a window length of 4 and stride of 2, producing rows like [0,1,2,3], [2,3,4,5], [4,5,6,7]....
Difficulty Level: L4
Solve:
# Task: Create a 2D matrix of strides with window length 4 and stride 2
import numpy as np
arr = np.arange(15)
print(arr)
# Write your code below
Desired Output:
[[ 0  1  2  3]
 [ 2  3  4  5]
 [ 4  5  6  7]
 [ 6  7  8  9]
 [ 8  9 10 11]
 [10 11 12 13]]
Why this matters: Sliding window views via strides are used in convolutional neural networks, rolling statistics, and efficient time-series feature extraction.
71. How to create an array using a function of its row and column indices?
Create a 4×5 array where each element equals the sum of its row and column index using np.fromfunction.
Difficulty Level: L1
Solve:
# Task: Create a 4x5 array where element (i,j) = i + j using np.fromfunction
import numpy as np
# Write your code below
Desired Output:
[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]]
Why this matters: This is handy when the value at each array position follows a mathematical formula based on its coordinates.
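One possible solution: `np.fromfunction` calls the supplied function once with full index grids `i` and `j`, so any vectorized expression of the coordinates works:

```python
import numpy as np

# The lambda receives index arrays i (rows) and j (columns),
# so i + j computes every element in one vectorized call
grid = np.fromfunction(lambda i, j: i + j, (4, 5), dtype=int)
print(grid)
```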
72. How does array broadcasting work?
Add the column vector a (shape 3×1) and row vector b (shape 3,) using broadcasting to produce a 3×3 result.
Difficulty Level: L1
Solve:
# Task: Add a column vector and row vector using broadcasting
import numpy as np
a = np.array([[1], [2], [3]])  # shape (3, 1)
b = np.array([10, 20, 30])     # shape (3,)
# Write your code below
Desired Output:
[[11 21 31]
 [12 22 32]
 [13 23 33]]
Why this matters: Broadcasting lets NumPy do math on arrays of different shapes without making copies, which is fundamental to efficient vectorized computation.
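The solution is simply `a + b`: NumPy stretches the (3, 1) column and the (3,) row to a common (3, 3) shape without copying data:

```python
import numpy as np

a = np.array([[1], [2], [3]])  # shape (3, 1)
b = np.array([10, 20, 30])     # shape (3,)

# Broadcasting rule: dimensions of size 1 are virtually repeated,
# so (3, 1) + (3,) -> (3, 3)
result = a + b
print(result)
```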
73. How to create coordinate grids with np.meshgrid?
Create coordinate matrices X and Y from arrays x and y using np.meshgrid, and print both.
Difficulty Level: L1
Solve:
# Task: Create coordinate matrices X and Y from x and y arrays
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5])
# Write your code below
Desired Output:
X:
[[1 2 3]
 [1 2 3]]
Y:
[[4 4 4]
 [5 5 5]]
Why this matters: Coordinate grids are essential for evaluating functions over a 2D plane, such as plotting 3D surfaces or computing heatmaps.
74. How to standardize columns of a 2D array (zero mean, unit variance)?
Standardize each column of the random array a so that each column has mean 0 and standard deviation 1, then verify by printing the resulting means and stds.
Difficulty Level: L2
Solve:
# Task: Standardize each column of a 2D array to mean=0, std=1
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 100, size=(5, 3))
print('Original:')
print(a)
# Write your code below
Desired Output:
Means after standardization: [-0. 0. 0.]
Stds after standardization: [1. 1. 1.]
Why this matters: Standardization is a critical preprocessing step in ML — features on different scales (e.g., age vs salary) can distort model training if not standardized.
75. How to create and use structured arrays?
Create a structured array with fields name (string), age (int), and weight (float) containing three records, then extract and print the name field.
Difficulty Level: L2
Solve:
# Task: Create a structured array with name, age, weight fields
import numpy as np
# Write your code below
# Create the dtype and the structured array, then extract names
Desired Output:
[('Alice', 25, 55.5) ('Bob', 30, 75. ) ('Charlie', 35, 65.2)]
Names: ['Alice' 'Bob' 'Charlie']
Why this matters: Structured arrays let you store heterogeneous data types in a single NumPy array, serving as a lightweight alternative to pandas DataFrames.
76. How to use fancy indexing to select specific elements?
Select elements at positions (0,1), (1,3), and (3,4) from the 4×5 array a using fancy indexing with row and column index arrays.
Difficulty Level: L1
Solve:
# Task: Select elements at (0,1), (1,3), (3,4) using fancy indexing
import numpy as np
a = np.arange(20).reshape(4, 5)
print(a)
# Write your code below
Desired Output:
[ 1 8 19]
Why this matters: Fancy indexing lets you grab arbitrary combinations of elements from an array, unlike regular slicing which only gives contiguous blocks.
77. How to compute cumulative sum and cumulative product?
Compute and print the cumulative sum and cumulative product of the array a = [1, 2, 3, 4, 5] using np.cumsum and np.cumprod.
Difficulty Level: L1
Solve:
# Task: Compute cumulative sum and cumulative product
import numpy as np
a = np.array([1, 2, 3, 4, 5])
# Write your code below
Desired Output:
Cumulative sum: [ 1 3 6 10 15]
Cumulative product: [ 1 2 6 24 120]
Why this matters: Running totals and cumulative products are essential for time series analysis, computing factorials, and tracking cumulative returns.
78. How to compute cumulative sum along a specific axis of a 2D array?
Compute the cumulative sum of the 2D array a along axis=0 (down columns) and axis=1 (across rows), and print both results.
Difficulty Level: L2
Solve:
# Task: Compute cumulative sum along axis=0 and axis=1
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 10, size=(3, 4))
print('Array:')
print(a)
# Write your code below
Desired Output:
Cumsum along rows (axis=0):
[[ 7  4  8  5]
 [14  7 15 13]
 [19 11 23 21]]
Cumsum along columns (axis=1):
[[ 7 11 19 24]
 [ 7 10 17 25]
 [ 5  9 17 25]]
Why this matters: Axis-specific cumulative sums are important when processing tabular data where rows and columns represent different dimensions, such as cumulative sales by month and region.
79. How to compute a dot product using np.einsum?
Compute the dot product of arrays a = [1, 2, 3] and b = [4, 5, 6] using np.einsum with the subscript string 'i,i->'.
Difficulty Level: L2
Solve:
# Task: Compute the dot product using np.einsum
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Write your code below
# Hint: the einsum subscript string for dot product is 'i,i->'
Desired Output:
32
Why this matters: `np.einsum` is a powerful one-liner for expressing array operations using index notation, and mastering it replaces dozens of specialized NumPy calls.
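The solution follows directly from the hint: in `'i,i->'`, the repeated index `i` multiplies matching elements, and the empty output side sums over it:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 'i,i->' : multiply element-wise over i, then sum (no output index left)
dot = np.einsum('i,i->', a, b)
print(dot)
```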
80. How to multiply two matrices using np.einsum?
Multiply matrices A and B using np.einsum with the subscript string 'ij,jk->ik' and print the result.
Difficulty Level: L2
Solve:
# Task: Matrix multiplication using np.einsum
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Write your code below
# Hint: the einsum subscript string for matrix multiply is 'ij,jk->ik'
Desired Output:
[[19 22]
 [43 50]]
Why this matters: Understanding einsum for matrix multiplication is the gateway to expressing far more complex tensor operations (batch matmul, traces, contractions) in one readable line.
81. How to apply a custom function to every row or column of an array?
Compute the range (max minus min) of each row in array a using np.apply_along_axis with axis=1.
Difficulty Level: L2
Solve:
# Task: Compute range (max - min) of each row using np.apply_along_axis
import numpy as np
np.random.seed(50)
a = np.random.randint(1, 20, size=(4, 5))
print('Array:')
print(a)
# Write your code below
Desired Output:
Range per row: [16 9 13 14]
Why this matters: `np.apply_along_axis` lets you run any custom summary function across rows or columns when NumPy doesn’t provide a built-in for it.
82. How to get the rank of each element in an array?
Get the rank (sorted position) of each element in array a using the double argsort trick: a.argsort().argsort().
Difficulty Level: L3
Solve:
# Task: Get the rank of each element using double argsort
import numpy as np
np.random.seed(10)
a = np.random.randint(20, size=10)
print('Array:', a)
# Write your code below
Desired Output:
Ranks: [4 2 6 0 8 7 9 3 5 1]
Why this matters: Ranking is used in non-parametric statistics, percentile calculations, and rank-based feature engineering for ML models.
83. How to compute matrix determinant and inverse?
Compute and print the determinant and inverse of matrix A = [[1, 2], [3, 4]] using np.linalg.det and np.linalg.inv.
Difficulty Level: L2
Solve:
# Task: Compute the determinant and inverse of matrix A
import numpy as np
A = np.array([[1, 2], [3, 4]])
# Write your code below
Desired Output:
Determinant: -2.0
Inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]
Why this matters: Determinants and matrix inverses are fundamental in linear regression, solving linear systems, and many ML algorithms like Gaussian processes.
84. How to solve a system of linear equations with NumPy?
Solve the system 3x + y = 9, x + 2y = 8 by passing the coefficient matrix A and constants vector b to np.linalg.solve.
Difficulty Level: L2
Solve:
# Task: Solve the system 3x + y = 9, x + 2y = 8
import numpy as np
# Coefficient matrix and constants vector
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
# Write your code below
Desired Output:
Solution [x, y]: [2. 3.]
Why this matters: Solving `Ax = b` underpins linear regression, circuit analysis, and many optimization problems in engineering and data science.
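A sketch of the solution: pass the coefficient matrix and the constants vector straight to `np.linalg.solve`, which is both faster and numerically safer than computing `inv(A) @ b`:

```python
import numpy as np

A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solves A @ [x, y] = b directly via LU factorization
solution = np.linalg.solve(A, b)
print('Solution [x, y]:', solution)
```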
85. How to compute eigenvalues and eigenvectors?
Compute and print the eigenvalues and eigenvectors of matrix A = [[4, -2], [1, 1]] using np.linalg.eig.
Difficulty Level: L3
Solve:
# Task: Compute eigenvalues and eigenvectors of matrix A
import numpy as np
A = np.array([[4, -2], [1, 1]])
# Write your code below
Desired Output:
Eigenvalues: [3. 2.]
Eigenvectors:
[[ 0.89442719 0.70710678]
 [ 0.4472136  0.70710678]]
Why this matters: Eigendecomposition is the backbone of PCA — it reveals the directions of maximum variance in your data and how much variance each direction explains.
86. How to fit a polynomial curve to noisy data?
Fit a degree-2 polynomial to the noisy data arrays x and y using np.polyfit, then evaluate it with np.polyval and print the fitted coefficients and first 5 predicted values.
Difficulty Level: L3
Solve:
# Task: Fit a quadratic polynomial to noisy data and recover coefficients
import numpy as np
np.random.seed(42)
x = np.linspace(0, 5, 10)
y = 2*x**2 - 3*x + 1 + np.random.randn(10)*0.5
# Write your code below
# Fit a degree-2 polynomial and print the coefficients
Desired Output:
Fitted coefficients [a, b, c]: [ 1.98 -2.89 1.15]
First 5 predicted values: [1.15 0.16 0.38 1.83 4.49]
Why this matters: Polynomial curve fitting is used for trend lines, simple regression, and approximating nonlinear relationships in data.
87. How to select rows with a boolean mask and specific columns at the same time?
Select rows from array a where mask is True, and take only the first 3 columns, using combined boolean and slice indexing: a[mask, :3].
Difficulty Level: L2
Solve:
# Task: Select rows by boolean mask, then first 3 columns
import numpy as np
a = np.arange(20).reshape(4, 5)
mask = np.array([True, False, True, False])
print('Array:')
print(a)
# Write your code below
Desired Output:
[[ 0  1  2]
 [10 11 12]]
Why this matters: Combining boolean row filters with column slicing is a common pattern when filtering data by a condition and extracting specific features in one step.
88. How to compute the outer product of two vectors?
Compute the outer product of arrays a = [1, 2, 3] and b = [4, 5, 6] using np.multiply.outer and print the resulting 3×3 matrix.
Difficulty Level: L1
Solve:
# Task: Compute outer product using np.multiply.outer
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Write your code below
Desired Output:
[[ 4  5  6]
 [ 8 10 12]
 [12 15 18]]
Why this matters: Outer products appear in attention mechanisms, covariance matrix computation, and rank-1 matrix updates.
89. How to compute a rolling/moving average of a 1D array?
Compute a 3-element moving average of array a using np.convolve with a uniform kernel of size 3 and mode='valid'.
Difficulty Level: L3
Solve:
# Task: Compute a 3-element moving average of a 1D array
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 20, size=10)
print('Array:', a)
# Write your code below
# Compute a moving average with window size 3
Desired Output:
Moving average (window=3): [ 7.67 5.33 7.33 8.67 8.33 10.67 13. 12. ]
Why this matters: Moving averages smooth noisy data and are widely used in time series analysis, stock price charts, and signal processing.
90. How to rotate a 2D array (like an image)?
Rotate the 3×4 array img by 90 degrees counter-clockwise using np.rot90 and print the result.
Difficulty Level: L1
Solve:
# Task: Rotate a 3x4 array 90 degrees counter-clockwise
import numpy as np
img = np.arange(12).reshape(3, 4)
print('Original:')
print(img)
# Write your code below
Desired Output:
Rotated 90 degrees:
[[ 3  7 11]
 [ 2  6 10]
 [ 1  5  9]
 [ 0  4  8]]
Why this matters: Array rotation is a common operation in image processing and data augmentation for training CNNs.
91. How to flip an array horizontally and vertically?
Flip the 3×4 array img left-to-right using np.fliplr and top-to-bottom using np.flipud, printing both results.
Difficulty Level: L1
Solve:
# Task: Flip a 3x4 array horizontally and vertically
import numpy as np
img = np.arange(12).reshape(3, 4)
print('Original:')
print(img)
# Write your code below
Desired Output:
Flipped left-right:
[[ 3  2  1  0]
 [ 7  6  5  4]
 [11 10  9  8]]
Flipped up-down:
[[ 8  9 10 11]
 [ 4  5  6  7]
 [ 0  1  2  3]]
Why this matters: Flipping is one of the simplest and most effective data augmentation techniques for increasing training data variety in deep learning.
92. How to pad an array with zeros on all sides?
Pad the 3×3 array of ones a with one layer of zeros on all sides using np.pad with mode='constant'.
Difficulty Level: L2
Solve:
# Task: Pad a 3x3 array of ones with zeros on all sides
import numpy as np
a = np.ones((3, 3))
# Write your code below
Desired Output:
[[0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0.]]
Why this matters: Zero-padding is essential in convolution operations (CNNs use it to control output size) and when aligning arrays of different sizes.
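A one-line solution sketch: `np.pad` with `pad_width=1` adds one layer on every edge, and `mode='constant'` fills it with a constant value (0 by default):

```python
import numpy as np

a = np.ones((3, 3))

# pad_width=1 adds a single border on all four sides
padded = np.pad(a, pad_width=1, mode='constant', constant_values=0)
print(padded)
```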
93. How to create sliding windows over an array?
Create sliding windows of size 4 over the array a = np.arange(10) using sliding_window_view and print the resulting 2D array of windows.
Difficulty Level: L3
Solve:
# Task: Create sliding windows of size 4
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
a = np.arange(10)
# Write your code below
Desired Output:
[[0 1 2 3]
 [1 2 3 4]
 [2 3 4 5]
 [3 4 5 6]
 [4 5 6 7]
 [5 6 7 8]
 [6 7 8 9]]
Why this matters: Sliding windows are used for computing rolling statistics, feature engineering on time series, and understanding how 1D convolution works under the hood.
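One possible solution: `sliding_window_view` produces every contiguous window of the requested size as rows of a 2D view, without copying any data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)

# Each of the len(a) - 4 + 1 = 7 rows is a zero-copy view into `a`
windows = sliding_window_view(a, window_shape=4)
print(windows)
```

Because the rows are views, writing to `windows` would modify `a`; call `.copy()` if you need independent data.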
94. How to compute min-max normalization to scale values between 0 and 1?
Min-max normalize each column of the 2D array a to the range [0, 1] by subtracting the column min and dividing by the column range, then verify the min and max of each column.
Difficulty Level: L2
Solve:
# Task: Min-max normalize each column of a 2D array to [0, 1]
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 100, size=(5, 3))
print('Original:')
print(a)
# Write your code below
Desired Output:
Min after normalization: [0. 0. 0.]
Max after normalization: [1. 1. 1.]
Why this matters: Min-max normalization is commonly used in neural networks and distance-based algorithms like KNN that are sensitive to feature magnitudes.
95. How to compute a weighted average?
Compute the weighted average of scores using weights with np.average, and compare it against the simple mean from np.mean.
Difficulty Level: L2
Solve:
# Task: Compute the weighted average of exam scores
import numpy as np
scores = np.array([85, 90, 78, 92, 88])
weights = np.array([0.1, 0.2, 0.3, 0.25, 0.15])  # must sum to 1
# Write your code below
Desired Output:
Weighted average: 86.1
Simple average: 86.6
Why this matters: Weighted averages assign different importance to values and are used in portfolio returns, ensemble model predictions, and survey analysis.
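A sketch of one solution: `np.average` accepts a `weights` argument and computes `sum(w * x) / sum(w)`, while `np.mean` ignores weights entirely:

```python
import numpy as np

scores = np.array([85, 90, 78, 92, 88])
weights = np.array([0.1, 0.2, 0.3, 0.25, 0.15])

# Weighted: sum(weights * scores) / sum(weights); simple: plain mean
weighted = np.average(scores, weights=weights)
simple = np.mean(scores)
print('Weighted average:', round(weighted, 1))
print('Simple average:', round(simple, 1))
```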
96. How to find the top-k similar items using cosine similarity?
Compute the cosine similarity between the query vector and each row in items using dot products and norms, then find the index of the most similar item with np.argmax.
Difficulty Level: L3
Solve:
# Task: Compute cosine similarity between a query vector and 5 item vectors
import numpy as np
np.random.seed(42)
items = np.random.rand(5, 4) # 5 items, 4 features each
query = np.random.rand(4) # query vector
print('Items shape:', items.shape)
print('Query:', np.round(query, 3))
# Write your code below
# Find the most similar item to the query
Desired Output:
Cosine similarities: [0.896 0.928 0.811 0.852 0.866]
Most similar item index: 1
Why this matters: Cosine similarity is the standard metric for comparing embeddings in NLP, recommendation systems, and information retrieval.
97. How to compute the covariance matrix of a dataset?
Compute the covariance matrix of the 3-feature data array using np.cov(data.T), then print the covariance between feature 0 and feature 1 (positive) and between feature 0 and feature 2 (negative).
Difficulty Level: L3
Solve:
# Task: Compute the covariance matrix of a dataset with 3 features
import numpy as np
np.random.seed(42)
# Simulate 100 samples with 3 correlated features
data = np.random.randn(100, 3)
data[:, 1] = data[:, 0] * 2 + np.random.randn(100) * 0.5  # feature 1 correlated with feature 0
data[:, 2] = -data[:, 0] + np.random.randn(100) * 0.3     # feature 2 negatively correlated
# Write your code below
Desired Output:
Covariance matrix shape: (3, 3)
Feature 0-1 covariance (should be positive): 1.88
Feature 0-2 covariance (should be negative): -0.96
Why this matters: The covariance matrix reveals how features co-vary and is the starting point for PCA, Mahalanobis distance, and multivariate statistics.
98. How to perform one-step PCA using eigendecomposition?
Reduce the 4-feature data array to 2 principal components by centering the data, computing the covariance matrix, performing eigendecomposition with np.linalg.eig, selecting the top-2 eigenvectors, and projecting the data.
Difficulty Level: L4
Solve:
# Task: Reduce a 4-feature dataset to 2 principal components using PCA
import numpy as np
np.random.seed(42)
# 50 samples, 4 features
data = np.random.randn(50, 4)
data[:, 1] = data[:, 0] * 1.5 + np.random.randn(50) * 0.3
data[:, 3] = data[:, 2] * -0.8 + np.random.randn(50) * 0.2
# Write your code below
# 1. Center the data
# 2. Compute covariance matrix
# 3. Get eigenvalues/eigenvectors
# 4. Project data onto top-2 components
Desired Output:
Original shape: (50, 4)
Projected shape: (50, 2)
Variance explained by top 2 components: 89.2%
Why this matters: PCA via eigendecomposition is the foundational dimensionality reduction technique in ML, and implementing it from scratch builds deep understanding of how it works.
99. How to compute 2D sliding windows over a matrix?
Create 3×3 sliding windows over the 5×5 array a using sliding_window_view(a, (3, 3)), then print the shape of the result and the first window (top-left corner).
Difficulty Level: L4
Solve:
# Task: Create 3x3 sliding windows over a 5x5 array
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
a = np.arange(25).reshape(5, 5)
print('Array:')
print(a)
# Write your code below
# Print the shape of the result and the first window (top-left corner)
Desired Output:
Windows shape: (3, 3, 3, 3)
First window (top-left):
[[ 0  1  2]
 [ 5  6  7]
 [10 11 12]]
Why this matters: 2D sliding windows are exactly what a convolution layer does in a CNN — understanding this builds intuition for how neural network filters work.
100. How to do batch matrix multiplication with np.einsum?
Multiply each pair of matrices in batches A (shape 3x2x3) and B (shape 3x3x2) using np.einsum('bij,bjk->bik', A, B) so that result[i] = A[i] @ B[i].
Difficulty Level: L4
Solve:
# Task: Batch matrix multiplication using np.einsum
import numpy as np
np.random.seed(42)
A = np.random.randint(1, 5, size=(3, 2, 3)) # 3 matrices of shape 2x3
B = np.random.randint(1, 5, size=(3, 3, 2)) # 3 matrices of shape 3x2
print('A shape:', A.shape)
print('B shape:', B.shape)
# Write your code below
# Multiply each pair: result[i] = A[i] @ B[i]
Desired Output:
Result shape: (3, 2, 2)
Result:
[[[22 17]
  [26 28]]
 [[18 16]
  [26 22]]
 [[23 13]
  [25 13]]]
Why this matters: Batch matrix multiplication is essential in deep learning for computing operations like attention scores across all samples in a batch simultaneously.
101. How to compute a pairwise Euclidean distance matrix without loops?
Compute the 5×5 pairwise Euclidean distance matrix for the points array (5 points in 3D) using broadcasting and np.einsum, without any Python loops.
Difficulty Level: L4
Solve:
# Task: Compute pairwise Euclidean distance matrix (no loops)
import numpy as np
np.random.seed(42)
points = np.random.rand(5, 3) # 5 points in 3D
print('Points:')
print(np.round(points, 4))
# Write your code below
# Compute the 5x5 distance matrix using broadcasting
Desired Output:
Distance matrix:
[[0.     1.0068 0.3527 1.0164 1.0284]
 [1.0068 0.     0.9973 0.8323 0.2419]
 [0.3527 0.9973 0.     1.1285 1.0968]
 [1.0164 0.8323 1.1285 0.     0.8206]
 [1.0284 0.2419 1.0968 0.8206 0.    ]]
Why this matters: Pairwise distance matrices are the foundation of KNN, hierarchical clustering, and DBSCAN, and computing them vectorized is critical for performance.
