101 NumPy Exercises for Data Analysis (Python)
The goal of these NumPy exercises is to serve both as a reference and as practice that takes you beyond the basics. The questions span four levels of difficulty, from L1 (easiest) to L4 (hardest).

If you want a quick refresher on NumPy, the following tutorials are a good starting point:
Numpy Tutorial Part 1: Introduction
Numpy Tutorial Part 2: Advanced numpy tutorials.
Related Post:
101 Practice exercises with pandas.
This is an interactive version — you can edit and run every code block directly in your browser. No installation needed. All code runs locally in your browser and nothing is sent to any server.
Click ‘Run’ or press Ctrl+Enter on any code block to execute it. The first run may take a few seconds to initialize.
1. Import NumPy and Check the Version
Import numpy as np and print the version number.
Difficulty Level: L1
Solve:
# Task: Import numpy as np and print the version number
import numpy as np
# Write your code below
Desired Output:
1.13.3
Why this matters: Verifying the NumPy version ensures compatibility with your code and helps debug version-specific behavior differences.
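One possible solution (the printed version will reflect whatever NumPy you have installed, not necessarily the 1.13.3 shown above):

```python
import numpy as np

# The installed version is exposed as a string attribute
print(np.__version__)
```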
2. Create a 1D Array
Create a 1D array of numbers from 0 to 9.
Difficulty Level: L1
Solve:
# Task: Create a 1D array of numbers from 0 to 9
import numpy as np
# Write your code below
Desired Output:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Why this matters: Creating sequential arrays is the starting point for indexing exercises, test data generation, and understanding how NumPy stores data in memory.
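One idiomatic solution, using `np.arange` with a single stop argument:

```python
import numpy as np

arr = np.arange(10)  # integers 0 through 9
print(arr)
```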
3. Create a Boolean Array
Create a 3×3 numpy array of all True values.
Difficulty Level: L1
Solve:
# Task: Create a 3x3 numpy array of all True's
import numpy as np
# Write your code below
Desired Output:
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]
Why this matters: Boolean arrays are used as masks for filtering data, and understanding how to create them is essential for conditional selection in data analysis.
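One way to do it, with `np.full` broadcasting a single fill value into the requested shape:

```python
import numpy as np

# Every element of the 3x3 array is initialized to True
mask = np.full((3, 3), True, dtype=bool)
print(mask)
```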
4. Extract Items That Satisfy a Condition
Extract all odd numbers from arr.
Difficulty Level: L1
Solve:
# Task: Extract all odd numbers from arr
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Write your code below
Desired Output:
array([1, 3, 5, 7, 9])
Why this matters: Boolean indexing is the primary way to filter rows in data cleaning and subsetting datasets before analysis.
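A standard boolean-indexing solution:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
odds = arr[arr % 2 == 1]  # the boolean mask keeps only odd elements
print(odds)
```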
5. Replace Items That Satisfy a Condition
Replace all odd numbers in arr with -1.
Difficulty Level: L1
Solve:
# Task: Replace all odd numbers in arr with -1
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Write your code below
Desired Output:
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
Why this matters: Conditional replacement is used to handle outliers, recode categories, and clean invalid entries in datasets.
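One way to do it, assigning through a boolean mask (note this modifies arr in place):

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[arr % 2 == 1] = -1  # in-place assignment through a boolean mask
print(arr)
```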
6. Replace Items Without Affecting the Original Array
Replace all odd numbers in arr with -1 without modifying the original arr.
Difficulty Level: L2
Solve:
# Task: Replace all odd numbers with -1 without changing the original arr
import numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Write your code below
Desired Output:
out: [ 0 -1  2 -1  4 -1  6 -1  8 -1]
arr: [0 1 2 3 4 5 6 7 8 9]
Why this matters: Preserving the original data while creating transformed copies is critical in ML pipelines where you need both raw and processed versions of features.
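The usual tool here is `np.where`, which builds a new array and leaves the original untouched:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Where the condition is True take -1, otherwise keep the original value
out = np.where(arr % 2 == 1, -1, arr)
print("out:", out)
print("arr:", arr)
```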
7. Reshape an Array
Convert the 1D array arr to a 2D array with 2 rows.
Difficulty Level: L1
Solve:
# Task: Convert a 1D array to a 2D array with 2 rows
import numpy as np
arr = np.arange(10)
# Write your code below
Desired Output:
[[0 1 2 3 4]
 [5 6 7 8 9]]
Why this matters: Reshaping is one of the most common operations when preparing data for ML models, which expect inputs in specific shapes (e.g., batches of samples).
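A typical solution; passing -1 lets NumPy infer the number of columns from the array size:

```python
import numpy as np

arr = np.arange(10)
out = arr.reshape(2, -1)  # -1 means "infer this dimension"
print(out)
```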
8. Stack Two Arrays Vertically
Stack arrays a and b vertically.
Difficulty Level: L2
Solve:
# Task: Stack arrays a and b vertically
import numpy as np
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
# Write your code below
Desired Output:
[[0 1 2 3 4]
 [5 6 7 8 9]
 [1 1 1 1 1]
 [1 1 1 1 1]]
Why this matters: Vertical stacking is used to combine datasets, append new observations to existing data, or merge training and validation sets.
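One way to do it; `np.vstack`, `np.concatenate(..., axis=0)`, and `np.r_` are all equivalent here:

```python
import numpy as np

a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)
out = np.vstack([a, b])  # stack row-wise: a's rows first, then b's
print(out)
```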
9. Stack Two Arrays Horizontally
Stack arrays a and b horizontally.
Difficulty Level: L2
Solve:
# Task: Stack arrays a and b horizontally
import numpy as np
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
# Write your code below
Desired Output:
[[0 1 2 3 4 1 1 1 1 1]
 [5 6 7 8 9 1 1 1 1 1]]
Why this matters: Horizontal stacking is used to add new features (columns) to a dataset, such as appending engineered features alongside existing ones.
10. Generate Custom Sequences Without Hardcoding
Using only numpy functions and the input array a, produce an array that first repeats each element 3 times, then tiles the whole array 3 times.
Difficulty Level: L2
Solve:
# Task: Create the pattern without hardcoding using numpy functions
import numpy as np
a = np.array([1,2,3])
# Write your code below
Desired Output:
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
Why this matters: Programmatic sequence generation is essential for creating repeated patterns in simulations, data augmentation, and feature engineering.
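A compact solution; `np.repeat` duplicates each element in place, while `np.tile` repeats the whole array, and `np.r_` concatenates the two results:

```python
import numpy as np

a = np.array([1, 2, 3])
# repeat -> [1 1 1 2 2 2 3 3 3], tile -> [1 2 3 1 2 3 1 2 3]
out = np.r_[np.repeat(a, 3), np.tile(a, 3)]
print(out)
```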
11. Get Common Items Between Two Arrays
Find the common items between arrays a and b.
Difficulty Level: L2
Solve:
# Task: Get the common items between a and b
import numpy as np
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
# Write your code below
Desired Output:
[2 4]
Why this matters: Set intersection is used in data cleaning to find shared records, overlapping features, or matching IDs across datasets.
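One way to do it; `np.intersect1d` returns the sorted, de-duplicated common values:

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
out = np.intersect1d(a, b)  # sorted unique values present in both
print(out)
```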
12. Remove Items Present in Another Array
From array a, remove all items that are present in array b.
Difficulty Level: L2
Solve:
# Task: From array a remove all items present in array b
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])
# Write your code below
Desired Output:
[1 2 3 4]
Why this matters: Set difference operations are used to exclude known values, filter out stop words, or remove already-processed records from a pipeline.
13. Find Positions Where Two Arrays Match
Get the positions (indices) where elements of a and b are equal.
Difficulty Level: L2
Solve:
# Task: Get the positions where elements of a and b match
import numpy as np
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
# Write your code below
Desired Output:
(array([1, 3, 5, 7]),)
Why this matters: Finding matching positions is used in evaluation metrics (e.g., comparing predicted vs. actual labels) and aligning paired datasets.
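A standard solution; called with only a condition, `np.where` returns the indices of the True entries:

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
positions = np.where(a == b)  # tuple of index arrays, one per dimension
print(positions)
```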
14. Extract Numbers Within a Given Range
From array a, extract all items between 5 and 10 (inclusive).
Difficulty Level: L2
Solve:
# Task: Get all items between 5 and 10 from a
import numpy as np
a = np.array([2, 6, 1, 9, 10, 3, 27])
# Write your code below
Desired Output:
[ 6 9 10]
Why this matters: Range-based filtering is used for outlier removal, selecting data within valid bounds, and binning continuous variables.
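One way to do it, combining two conditions with `&` (NumPy arrays do not support Python's `and`):

```python
import numpy as np

a = np.array([2, 6, 1, 9, 10, 3, 27])
# Parentheses are required: & binds tighter than the comparisons
out = a[(a >= 5) & (a <= 10)]
print(out)
```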
15. Vectorize a Scalar Function to Work on Arrays
Convert the scalar function maxx into a vectorized version that works element-wise on arrays a and b.
Difficulty Level: L2
Solve:
# Task: Vectorize the maxx function to work on arrays
import numpy as np
def maxx(x, y):
"""Get the maximum of two items"""
if x >= y:
return x
else:
return y
a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])
# Write your code below
Desired Output:
[6. 7. 9. 8. 9. 7. 5.]
Why this matters: Vectorizing custom functions lets you apply complex business logic across entire arrays without slow Python loops, which is critical for performance in large-scale data processing.
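One way to do it with `np.vectorize`; note this is a convenience wrapper (it still loops in Python internally), not a true performance optimization:

```python
import numpy as np

def maxx(x, y):
    """Get the maximum of two items"""
    return x if x >= y else y

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# otypes=[float] fixes the output dtype of the vectorized function
pair_max = np.vectorize(maxx, otypes=[float])
print(pair_max(a, b))
```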
16. Swap Two Columns in a 2D Array
Swap columns 1 and 2 in the array arr.
Difficulty Level: L2
Solve:
# Task: Swap columns 1 and 2 in the array arr
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[1 0 2]
 [4 3 5]
 [7 6 8]]
Why this matters: Reordering columns is a common step when aligning feature order across datasets or when a model expects features in a specific sequence.
17. Swap Two Rows in a 2D Array
Swap rows 1 and 2 in the array arr.
Difficulty Level: L2
Solve:
# Task: Swap rows 1 and 2 in the array arr
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[3 4 5]
 [0 1 2]
 [6 7 8]]
Why this matters: Row swapping is used in data shuffling, reordering observations, and implementing custom sorting logic for training data.
18. Reverse the Rows of a 2D Array
Reverse the row order of the 2D array arr.
Difficulty Level: L2
Solve:
# Task: Reverse the rows of a 2D array
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[6 7 8]
 [3 4 5]
 [0 1 2]]
Why this matters: Reversing rows is used in image flipping for data augmentation and in time series analysis to process data from most recent to oldest.
19. Reverse the Columns of a 2D Array
Reverse the column order of the 2D array arr.
Difficulty Level: L2
Solve:
# Task: Reverse the columns of a 2D array
import numpy as np
arr = np.arange(9).reshape(3,3)
# Write your code below
Desired Output:
[[2 1 0]
 [5 4 3]
 [8 7 6]]
Why this matters: Column reversal is used in image mirroring for data augmentation and when reordering features for visualization.
20. Create a 2D Array of Random Floats Between 5 and 10
Create a 2D array of shape 5×3 containing random decimal numbers between 5 and 10.
Difficulty Level: L2
Solve:
# Task: Create a 2D array of shape 5x3 with random floats between 5 and 10
import numpy as np
# Write your code below
Desired Output:
[[ 8.501  9.105  6.859]
 [ 9.763  9.877  7.135]
 [ 7.49   8.334  6.168]
 [ 7.75   9.945  5.274]
 [ 8.085  5.562  7.312]]
Why this matters: Generating random arrays within a specific range is used in weight initialization for neural networks and in Monte Carlo simulations.
21. Print Only 3 Decimal Places
Set NumPy print options so that the array rand_arr displays only 3 decimal places.
Difficulty Level: L1
Solve:
# Task: Print only 3 decimal places of the numpy array
import numpy as np
rand_arr = np.random.random((5,3))
# Write your code below
Desired Output:
[[ 0.443  0.109  0.97 ]
 [ 0.388  0.447  0.191]
 [ 0.891  0.474  0.212]
 [ 0.609  0.518  0.403]]
Why this matters: Controlling decimal precision makes array output readable during debugging and prevents noisy floating-point digits from cluttering your analysis.
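One way to do it; `np.set_printoptions(precision=3)` changes only how arrays are displayed, not the stored values (the seed below is arbitrary, just to make the demo repeatable):

```python
import numpy as np

np.random.seed(100)  # arbitrary seed for a repeatable demo
rand_arr = np.random.random((5, 3))

np.set_printoptions(precision=3)  # display only; values are unchanged
print(rand_arr)
```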
22. Suppress Scientific Notation in Print Output
Pretty print rand_arr by suppressing scientific notation (like 1e10).
Difficulty Level: L1
Solve:
# Task: Pretty print by suppressing scientific notation
import numpy as np
np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3
print(rand_arr)
# Write your code below
Desired Output:
[[ 0.000543  0.000278  0.000425]
 [ 0.000845  0.000005  0.000122]
 [ 0.000671  0.000826  0.000137]]
Why this matters: Suppressing scientific notation makes small or large numbers human-readable in reports and logs, especially during exploratory data analysis.
23. Limit the Number of Printed Items
Set NumPy print options so that array a displays a maximum of 6 elements, with the rest replaced by ellipsis.
Difficulty Level: L1
Solve:
# Task: Limit the number of items printed to a maximum of 6 elements
import numpy as np
a = np.arange(15)
print(a)
# Write your code below
Desired Output:
[ 0 1 2 ... 12 13 14]
Why this matters: Limiting printed output prevents your console from flooding when working with large arrays, making debugging faster and more manageable.
24. Print the Full Array Without Truncating
Print the full numpy array a without truncation, even when the print threshold is set low.
Difficulty Level: L1
Solve:
# Task: Print the full numpy array without truncating
import numpy as np
np.set_printoptions(threshold=6)
a = np.arange(15)
print(a)  # This truncates
# Write your code below to print the full array
Desired Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
Why this matters: Seeing every element is sometimes necessary for validating data integrity or inspecting small-to-medium arrays during debugging.
25. Import a Dataset With Mixed Types
Import the iris dataset from the URL, keeping the text (species) column intact alongside the numeric columns.
Difficulty Level: L2
Solve:
# Task: Import the iris dataset keeping the text intact
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# Write your code below
Desired Output:
[[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']]
Why this matters: Real-world datasets almost always mix numeric and categorical data, and loading them correctly is the first step in any data analysis pipeline.
26. Extract a Particular Column From a 1D Structured Array
Extract the text column species (5th field) from the 1D structured array iris_1d.
Difficulty Level: L2
Solve:
# Task: Extract the species column from 1D iris array
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
# Write your code below
Desired Output:
(150,)
[b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa' b'Iris-setosa']
Why this matters: Extracting specific columns from structured arrays is essential for isolating target labels or categorical features before model training.
27. Convert a 1D Structured Array to a 2D Numeric Array
Convert the 1D structured array iris_1d to a 2D array iris_2d by omitting the species text field and keeping only the four numeric columns.
Difficulty Level: L2
Solve:
# Task: Convert 1D iris to 2D array by omitting species text field
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
# Write your code below
Desired Output:
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
Why this matters: ML models require pure numeric feature matrices, so stripping text columns and converting to a 2D float array is a routine preprocessing step.
28. Compute Mean, Median, and Standard Deviation
Find the mean, median, and standard deviation of the sepallength column (1st column) from the iris dataset.
Difficulty Level: L1
Solve:
# Task: Find the mean, median, standard deviation of sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
5.84333333333 5.8 0.825301291785
Why this matters: These three summary statistics are the foundation of exploratory data analysis and are used to understand data distribution before building any model.
29. Normalize an Array to the 0-1 Range
Normalize the sepallength array so the minimum maps to 0 and the maximum maps to 1.
Difficulty Level: L2
Solve:
# Task: Normalize sepallength to range between 0 and 1
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Write your code below
Desired Output:
[ 0.222 0.167 0.111 0.083 0.194 0.306 0.083 0.194 0.028 0.167 0.306 0.139 0.139 0. 0.417 0.389 0.306 0.222 0.389 0.222 0.306 0.222 0.083 0.222 0.139 0.194 0.194 0.25 0.25 0.111 0.139 0.306 0.25 0.333 0.167 0.194 0.333 0.167 0.028 0.222 0.194 0.056 0.028 0.194 0.222 0.139 0.222 0.083 0.278 0.194 0.75 0.583 0.722 0.333 0.611 0.389 0.556 0.167 0.639 0.25 0.194 0.444 0.472 0.5 0.361 0.667 0.361 0.417 0.528 0.361 0.444 0.5 0.556 0.5 0.583 0.639 0.694 0.667 0.472 0.389 0.333 0.333 0.417 0.472 0.306 0.472 0.667 0.556 0.361 0.333 0.333 0.5 0.417 0.194 0.361 0.389 0.389 0.528 0.222 0.389 0.556 0.417 0.778 0.556 0.611 0.917 0.167 0.833 0.667 0.806 0.611 0.583 0.694 0.389 0.417 0.583 0.611 0.944 0.944 0.472 0.722 0.361 0.944 0.556 0.667 0.806 0.528 0.5 0.583 0.806 0.861 1. 0.583 0.556 0.5 0.944 0.556 0.583 0.472 0.722 0.667 0.722 0.417 0.694 0.667 0.667 0.556 0.611 0.528 0.444]
Why this matters: Min-max normalization is a standard preprocessing step for ML algorithms (e.g., KNN, SVM) that are sensitive to feature scales.
30. Compute the Softmax Score
Compute the softmax scores for the sepallength array.
Difficulty Level: L3
Solve:
# Task: Compute the softmax score of sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Write your code below
Desired Output:
[ 0.002 0.002 0.001 0.001 0.002 0.003 0.001 0.002 0.001 0.002 0.003 0.002 0.002 0.001 0.004 0.004 0.003 0.002 0.004 0.002 0.003 0.002 0.001 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.002 0.003 0.002 0.003 0.002 0.002 0.003 0.002 0.001 0.002 0.002 0.001 0.001 0.002 0.002 0.002 0.002 0.001 0.003 0.002 0.015 0.008 0.013 0.003 0.009 0.004 0.007 0.002 0.01 0.002 0.002 0.005 0.005 0.006 0.004 0.011 0.004 0.004 0.007 0.004 0.005 0.006 0.007 0.006 0.008 0.01 0.012 0.011 0.005 0.004 0.003 0.003 0.004 0.005 0.003 0.005 0.011 0.007 0.004 0.003 0.003 0.006 0.004 0.002 0.004 0.004 0.004 0.007 0.002 0.004 0.007 0.004 0.016 0.007 0.009 0.027 0.002 0.02 0.011 0.018 0.009 0.008 0.012 0.004 0.004 0.008 0.009 0.03 0.03 0.005 0.013 0.004 0.03 0.007 0.011 0.018 0.007 0.006 0.008 0.018 0.022 0.037 0.008 0.007 0.006 0.03 0.007 0.008 0.005 0.013 0.011 0.013 0.004 0.012 0.011 0.011 0.007 0.009 0.007 0.005]
Why this matters: Softmax converts raw scores into probabilities and is the final activation function in most classification neural networks.
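The usual numerically stable recipe, shown here on a small synthetic array standing in for the downloaded sepallength column:

```python
import numpy as np

x = np.array([5.1, 4.9, 4.7, 4.6, 5.0])  # stand-in for sepallength

# Subtracting the max before exponentiating avoids overflow
# without changing the result (softmax is shift-invariant)
e = np.exp(x - x.max())
softmax = e / e.sum()
print(softmax)
```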
31. Find Percentile Scores
Find the 5th and 95th percentile of the sepallength array.
Difficulty Level: L1
Solve:
# Task: Find the 5th and 95th percentile of sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Write your code below
Desired Output:
[ 4.6 7.255]
Why this matters: Percentiles are used to detect outliers, set clipping thresholds, and define confidence intervals in statistical analysis.
32. Insert Values at Random Positions
Insert np.nan at 20 random positions in the iris_2d dataset.
Difficulty Level: L2
Solve:
# Task: Insert np.nan at 20 random positions in iris_2d
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
[[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'5.0' b'3.6' b'1.4' b'0.2' b'Iris-setosa']
 [b'5.4' b'3.9' b'1.7' b'0.4' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'5.0' b'3.4' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.4' nan b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']]
Why this matters: Injecting missing values is used to simulate real-world incomplete data when testing how your cleaning pipeline handles gaps.
33. Find the Position of Missing Values
Find the number and position of missing values in iris_2d's sepallength (1st column).
Difficulty Level: L2
Solve:
# Task: Find number and position of missing values in sepallength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Write your code below
Desired Output:
Number of missing values: 5
Position of missing values: (array([ 39,  88,  99, 130, 147]),)
Why this matters: Locating missing values is the first step in any data cleaning workflow — you need to know where gaps exist before deciding how to fill or drop them.
34. Filter a NumPy Array Based on Multiple Conditions
Filter the rows of iris_2d where petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0.
Difficulty Level: L3
Solve:
# Task: Filter rows where petallength > 1.5 and sepallength < 5.0
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Write your code below
Desired Output:
[[ 4.8  3.4  1.6  0.2]
 [ 4.8  3.4  1.9  0.2]
 [ 4.7  3.2  1.6  0.2]
 [ 4.8  3.1  1.6  0.2]
 [ 4.9  2.4  3.3  1. ]
 [ 4.9  2.5  4.5  1.7]]
Why this matters: Multi-condition filtering is used constantly in data analysis to subset records that meet specific criteria, such as selecting high-risk patients or underperforming products.
35. Drop Rows Containing Missing Values
Select the rows of iris_2d that do not have any nan value, keeping only complete rows.
Difficulty Level: L3
Solve:
# Task: Drop rows containing any nan value from iris_2d
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Introduce some nan values for the exercise
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Write your code below
Desired Output:
array([[ 4.9, 3. , 1.4, 0.2],
[ 4.7, 3.2, 1.3, 0.2],
[ 4.6, 3.1, 1.5, 0.2],
[ 5. , 3.6, 1.4, 0.2],
       [ 5.4,  3.9,  1.7,  0.4]])
Why this matters: Dropping rows with missing values is a standard data cleaning step before training ML models that cannot handle NaN inputs.
36. Find Correlation Between Two Columns
Compute the Pearson correlation between SepalLength (1st column) and PetalLength (3rd column) in iris_2d.
Difficulty Level: L2
Solve:
# Task: Find correlation between SepalLength and PetalLength
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Write your code below
Desired Output:
0.871754157305
Why this matters: Correlation analysis is a core step in exploratory data analysis and feature selection for ML models.
37. Check for Missing Values in an Array
Determine whether iris_2d has any missing (nan) values and print the boolean result.
Difficulty Level: L2
Solve:
# Task: Check if iris_2d has any missing values
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Write your code below
Desired Output:
False
Why this matters: Detecting missing values early prevents silent errors in downstream computations like aggregation and model training.
38. Replace All Missing Values with Zero
Replace all nan values in iris_2d with 0 and print the first four rows.
Difficulty Level: L2
Solve:
# Task: Replace all nan values with 0
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
# Write your code below
Desired Output:
array([[ 5.1, 3.5, 1.4, 0. ],
[ 4.9, 3. , 1.4, 0.2],
[ 4.7, 3.2, 1.3, 0.2],
       [ 4.6,  3.1,  1.5,  0.2]])
Why this matters: Imputing missing values with zero is a quick baseline strategy in data cleaning pipelines before more sophisticated imputation.
39. Count Unique Values in an Array
Find the unique values and their counts in the species column of the iris dataset.
Difficulty Level: L2
Solve:
# Task: Find unique values and their counts in iris species
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
      dtype='|S15'), array([50, 50, 50]))
Why this matters: Checking class distribution is essential before training classifiers to detect imbalanced datasets.
40. Bin a Numeric Column into Categories
Bin the petal length (3rd column) of iris into a text array using these rules: less than 3 becomes 'small', 3 to 5 becomes 'medium', and 5 or above becomes 'large'.
Difficulty Level: L2
Solve:
# Task: Bin petal length into 'small', 'medium', 'large' categories
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
['small', 'small', 'small', 'small']
Why this matters: Binning continuous features into categories is a common feature engineering technique for decision trees and rule-based models.
41. Create a New Column from Existing Columns
Add a new column to iris_2d for volume, computed as (pi x petallength x sepallength^2) / 3.
Difficulty Level: L2
Solve:
# Task: Create a volume column from existing columns
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa', 38.13265162927291],
[b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa', 35.200498485922445],
[b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa', 33.238050274980004]], dtype=object)
Why this matters: Deriving new features from existing columns is a fundamental feature engineering step that can improve model performance.
42. Probabilistic Sampling from a Categorical Array
Randomly sample from iris's species column such that setosa appears twice as often as versicolor and virginica.
Difficulty Level: L3
Solve:
# Task: Probabilistic sampling with setosa twice as likely
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'], dtype=object), array([77, 37, 36]))
Why this matters: Probabilistic sampling is used in class balancing, bootstrapping, and data augmentation for imbalanced datasets.
43. Get the Second Largest Value by Group
Find the second longest petallength among species setosa in the iris dataset.
Difficulty Level: L2
Solve:
# Task: Find the second longest petallength of species setosa
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
1.7
Why this matters: Computing grouped statistics beyond simple max/min is common in exploratory data analysis and outlier detection.
44. Sort a 2D Array by a Column
Sort the iris dataset based on the sepallength column (1st column) in ascending order.
Difficulty Level: L2
Solve:
# Task: Sort iris dataset by sepallength (1st column)
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
[[b'4.3' b'3.0' b'1.1' b'0.1' b'Iris-setosa']
 [b'4.4' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'3.0' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.5' b'2.3' b'1.3' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.6' b'1.0' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.2' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']]
Why this matters: Sorting datasets by a specific feature is fundamental for ranking, binary search, and ordered visualizations.
45. Find the Most Frequent Value in an Array
Find the most frequent value of petal length (3rd column) in the iris dataset.
Difficulty Level: L1
Solve:
# Task: Find the most frequent petal length value
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
b'1.5'
Why this matters: Identifying the mode of a feature helps detect dominant patterns and is used in imputation strategies.
46. Find the First Occurrence Greater Than a Value
Find the position of the first occurrence of a value greater than 1.0 in the petalwidth (4th column) of the iris dataset.
Difficulty Level: L2
Solve:
# Task: Find position of first value > 1.0 in petalwidth column
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Write your code below
Desired Output:
50
Why this matters: Locating threshold crossings is used in signal processing, event detection, and conditional filtering of datasets.
47. Clip Array Values to a Range
From the array a, replace all values greater than 30 with 30 and all values less than 10 with 10.
Difficulty Level: L2
Solve:
# Task: Clip values to range [10, 30]
import numpy as np
np.random.seed(100)
a = np.random.uniform(1, 50, 20)
# Write your code below
Desired Output:
[ 27.63 14.64 21.8 30. 10. 10. 30. 30. 10. 29.18 30. 11.25 10.08 10. 11.77 30. 30. 10. 30. 14.43]
Why this matters: Clipping values to a valid range is essential for outlier handling and ensuring inputs stay within expected bounds for models.
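One way to do it; `np.clip` caps values at both ends in a single call (a nested `np.where` would be equivalent but more verbose):

```python
import numpy as np

np.random.seed(100)
a = np.random.uniform(1, 50, 20)

# Values below 10 become 10, values above 30 become 30
out = np.clip(a, 10, 30)
print(out)
```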
48. Get the Positions of Top N Values
Get the positions of the top 5 maximum values in the array a.
Difficulty Level: L2
Solve:
# Task: Find positions of top 5 maximum values
import numpy as np
np.random.seed(100)
a = np.random.uniform(1, 50, 20)
# Write your code below
Desired Output:
[18 7 3 10 15]
Why this matters: Finding top-N indices is used in ranking systems, recommendation engines, and selecting the highest-confidence predictions.
49. Compute Row-Wise Counts of All Possible Values
For each row in arr, count the occurrences of every value from 1 to 10.
Difficulty Level: L4
Solve:
# Task: Count occurrences of each value (1-10) per row
import numpy as np
np.random.seed(100)
arr = np.random.randint(1, 11, size=(6, 10))
print(arr)
# Write your code below
Desired Output:
[ 1  2  3  4  5  6  7  8  9 10]
[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
 [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
 [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
 [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
 [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
 [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]
Why this matters: Row-wise value counting is used in bag-of-words representations and histogram-based feature extraction.
50. Flatten an Array of Arrays into a 1D Array
Convert array_of_arrays (which contains arr1, arr2, and arr3) into a single flat 1D array.
Difficulty Level: L2
Solve:
# Task: Flatten an array of arrays into a 1d array
import numpy as np
arr1 = np.arange(3)
arr2 = np.arange(3, 7)
arr3 = np.arange(7, 10)
array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)
print(array_of_arrays)
# Write your code below
Desired Output:
[0 1 2 3 4 5 6 7 8 9]
Why this matters: Flattening nested array structures is a routine step when combining data from multiple sources into a single feature vector.
51. Generate One-Hot Encodings for an Array
Compute one-hot encodings (dummy binary variables) for each unique value in arr.
Difficulty Level: L4
Solve:
# Task: Create one-hot encodings for the array
import numpy as np
np.random.seed(101)
arr = np.random.randint(1, 4, size=6)
print(arr)
# Write your code below
Desired Output:
array([[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 1., 0., 0.]])
Why this matters: One-hot encoding converts categorical integers into binary vectors, which is required by most ML algorithms that expect numeric input.
52. Create Row Numbers Grouped by a Categorical Variable
Assign within-group row numbers to each element in species_small, restarting the count at 0 for each new species.
Difficulty Level: L3
Solve:
# Task: Create row numbers grouped by species
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
# Write your code below
Desired Output:
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]
Why this matters: Grouped row numbering is useful for creating sequence features and ranking within categories in data analysis.
53. Create Group IDs from a Categorical Variable
Assign a numeric group ID (0, 1, 2, …) to each element in species_small based on its unique species category.
Difficulty Level: L4
Solve:
# Task: Create group ids for each species category
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
# Write your code below
Desired Output:
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
Why this matters: Converting categorical labels to integer group IDs is a standard preprocessing step for label encoding in ML pipelines.
54. Rank Items in a 1D Array
Create the ranks for each value in the numeric array a, where rank 0 is the smallest value.
Difficulty Level: L2
Solve:
# Task: Create ranks for the array values
import numpy as np
np.random.seed(10)
a = np.random.randint(20, size=10)
print(a)
# Write your code below
Desired Output:
[4 2 6 0 8 7 9 3 5 1]
Why this matters: Ranking transforms are used in non-parametric statistics and for creating ordinal features from numeric data.
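One possible solution is the double-`argsort` trick: the first `argsort` gives the sorted order, and applying `argsort` again converts that order into per-element ranks:

```python
import numpy as np

np.random.seed(10)
a = np.random.randint(20, size=10)

# First argsort: indices that would sort a.
# Second argsort: the rank of each original element.
ranks = a.argsort().argsort()
print(ranks)
```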
55. Rank Items in a Multidimensional Array
Create a rank array of the same shape as the 2D array a, where ranking is applied globally across all elements.
Difficulty Level: L3
Solve:
# Task: Create ranks for a 2D array (global ranking)
import numpy as np
np.random.seed(10)
a = np.random.randint(20, size=[2, 5])
print(a)
# Write your code below
Desired Output:
[[4 2 6 0 8]
 [7 9 3 5 1]]
Why this matters: Global ranking across a matrix is useful for percentile-based normalization and creating competition-style leaderboards.
56. Find the Maximum Value in Each Row
Compute the maximum value for each row in the 2D array a.
Difficulty Level: L2
Solve:
# Task: Find maximum value in each row
import numpy as np
np.random.seed(100)
a = np.random.randint(1, 10, [5, 3])
print(a)
# Write your code below
Desired Output:
array([9, 8, 6, 3, 9])
Why this matters: Row-wise max is used in neural network output processing (e.g., selecting the predicted class) and feature aggregation.
57. Compute the Min-by-Max Ratio for Each Row
Compute the ratio of the minimum to the maximum value for each row in the 2D array a.
Difficulty Level: L3
Solve:
# Task: Compute min/max ratio for each row
import numpy as np
np.random.seed(100)
a = np.random.randint(1, 10, [5, 3])
print(a)
# Write your code below
Desired Output:
array([ 0.44444444, 0.125 , 0.5 , 1. , 0.11111111])
Why this matters: Min/max ratios help measure the spread within each observation, which is useful for anomaly detection and data quality checks.
58. Find Duplicate Records in an Array
Mark duplicate entries in array a as True (2nd occurrence onwards), with first occurrences marked as False.
Difficulty Level: L3
Solve:
# Task: Find duplicate entries (mark 2nd+ occurrences as True)
import numpy as np
np.random.seed(100)
a = np.random.randint(0, 5, 10)
print('Array: ', a)
# Write your code below
Desired Output:
[False True False True False False True True True True]
Why this matters: Identifying duplicates is a critical data cleaning step to prevent data leakage and inflated counts in analysis.
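A sketch of one approach: start by assuming every entry is a duplicate, then use the first-occurrence indices from `np.unique(..., return_index=True)` to flip those positions back to False:

```python
import numpy as np

np.random.seed(100)
a = np.random.randint(0, 5, 10)

# Mark everything True, then un-mark the first occurrence of each value
out = np.full(a.shape[0], True)
first_idx = np.unique(a, return_index=True)[1]
out[first_idx] = False
print(out)
```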
59. Find the Grouped Mean
Compute the mean of sepalwidth (2nd column) grouped by species (5th column) in the iris dataset.
Difficulty Level: L3
Solve:
# Task: Find mean sepalwidth grouped by species
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Write your code below
Desired Output:
[[b'Iris-setosa', 3.418], [b'Iris-versicolor', 2.770], [b'Iris-virginica', 2.974]]
Why this matters: Grouped aggregation is a fundamental operation in exploratory data analysis and feature engineering for structured data.
60. Convert a PIL Image to a NumPy Array
Import the image from the given URL and convert it to a numpy array, printing its shape.
Difficulty Level: L3
Solve:
# Task: Import an image from URL and convert to numpy array
import numpy as np
from io import BytesIO
from PIL import Image
import PIL, requests
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
# Write your code below
Desired Output:
A numpy array representation of the image with shape (height, width, 3)
Why this matters: Converting images to numpy arrays is the first step in any image processing or computer vision pipeline.
61. Drop All Missing Values from a 1D Array
Remove all nan values from the 1D array a and return only the valid elements.
Difficulty Level: L2
Solve:
# Task: Drop all nan values from a 1D array
import numpy as np
a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan])
print(a)
# Write your code below
Desired Output:
array([ 1., 2., 3., 5., 6., 7.])
Why this matters: Stripping NaN values from arrays is necessary before computing statistics or passing data to functions that do not handle missing values.
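One concise solution is a boolean mask built with `np.isnan`. Note that `a == np.nan` never works, because NaN compares unequal to everything, including itself:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan])

# ~np.isnan(a) is True exactly at the valid (non-NaN) positions
clean = a[~np.isnan(a)]
print(clean)
```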
62. Compute the Euclidean Distance Between Two Arrays
Compute the Euclidean distance between arrays a and b.
Difficulty Level: L3
Solve:
# Task: Compute euclidean distance between two arrays
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])
# Write your code below
Desired Output:
6.7082039324993694
Why this matters: Euclidean distance is the foundation of KNN, K-means clustering, and many similarity-based ML algorithms.
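A one-line solution: the Euclidean distance between two vectors is the L2 norm of their difference, which `np.linalg.norm` computes directly:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

# Equivalent to np.sqrt(np.sum((a - b)**2))
dist = np.linalg.norm(a - b)
print(dist)
```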
63. Find All Local Maxima (Peaks) in a 1D Array
Find the positions of all peaks in array a, where peaks are values surrounded by smaller values on both sides.
Difficulty Level: L4
Solve:
# Task: Find positions of all peaks (local maxima) in the array
import numpy as np
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
# Write your code below
Desired Output:
array([2, 5])
Why this matters: Peak detection is widely used in signal processing, time-series analysis, and identifying turning points in financial data.
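One possible solution sketch: a peak is a point where the sign of the first difference flips from +1 to -1, i.e. where the difference of the sign pattern equals -2:

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

# np.diff(a) is rising (+) before a peak and falling (-) after it,
# so np.diff of its sign is -2 exactly one step before each peak
doublediff = np.diff(np.sign(np.diff(a)))
peaks = np.where(doublediff == -2)[0] + 1
print(peaks)
```

This only detects strict interior peaks; plateaus and endpoints need extra handling (or `scipy.signal.find_peaks` for the general case).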
64. Subtract a 1D Array from a 2D Array Row-Wise
Subtract each element of b_1d from the corresponding row of a_2d, so that b_1d[0] is subtracted from all elements in row 0, b_1d[1] from row 1, and so on.
Difficulty Level: L2
Solve:
# Task: Subtract 1d array from 2d array row-wise
import numpy as np
a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])
# Write your code below
Desired Output:
[[2 2 2]
 [2 2 2]
 [2 2 2]]
Why this matters: Row-wise subtraction using broadcasting is essential for centering data and computing deviations from row-level baselines.
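A sketch of one solution: reshape `b_1d` into a column vector so broadcasting pairs `b_1d[i]` with row `i` instead of with each column:

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

# b_1d[:, None] has shape (3, 1), which broadcasts across each row;
# plain a_2d - b_1d would instead subtract element-wise per column
out = a_2d - b_1d[:, None]
print(out)
```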
65. Find the Index of the N’th Repetition of a Value
Find the index of the 5th repetition of the value 1 in array x.
Difficulty Level: L2
Solve:
# Task: Find the index of the 5th occurrence of value 1
import numpy as np
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
# Write your code below
Desired Output:
8
Why this matters: Locating the n’th occurrence of a value is useful for event-based analysis and pattern matching in sequential data.
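One possible solution: collect all positions of the value with `np.where`, then index into that list to pick the n'th one:

```python
import numpy as np

x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

# np.where(x == 1)[0] lists every index holding 1;
# element n-1 of that list is the n'th occurrence
n = 5
idx = np.where(x == 1)[0][n - 1]
print(idx)
```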
66. Convert NumPy datetime64 to Python datetime
Convert the numpy datetime64 object dt64 to a Python datetime.datetime object.
Difficulty Level: L2
Solve:
# Task: Convert numpy datetime64 to Python datetime
import numpy as np
dt64 = np.datetime64('2018-02-25 22:10:10')
print(dt64)
# Write your code below
Desired Output:
datetime.datetime(2018, 2, 25, 22, 10, 10)
Why this matters: Converting between NumPy and Python datetime types is necessary when interfacing with libraries that only accept native Python datetime objects.
67. Compute the Moving Average of an Array
Compute the moving average of array Z with a window size of 3.
Difficulty Level: L3
Solve:
# Task: Compute moving average with window size 3
import numpy as np
np.random.seed(100)
Z = np.random.randint(10, size=10)
print(Z)
# Write your code below
Desired Output:
array: [8 8 3 7 7 0 4 2 5 2]
moving average: [ 6.33 6. 5.67 4.67 3.67 2. 3.67 3. ]
Why this matters: Moving averages smooth noisy data and are widely used in time-series forecasting, signal processing, and financial analysis.
68. Create an Array Sequence Given Start, Length, and Step
Create a numpy array of length 10, starting from 5, with a step of 3 between consecutive numbers using the variables start, length, and step.
Difficulty Level: L2
Solve:
# Task: Create array sequence with start=5, length=10, step=3
import numpy as np
length = 10
start = 5
step = 3
# Write your code below
Desired Output:
array([ 5, 8, 11, 14, 17, 20, 23, 26, 29, 32])
Why this matters: Generating custom sequences is used for creating feature grids, time steps, and index ranges in scientific computing.
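A one-line solution sketch: since `np.arange` takes start, stop, and step, the exclusive stop for a sequence of a given length is `start + step * length`:

```python
import numpy as np

length = 10
start = 5
step = 3

# The exclusive upper bound start + step * length yields exactly
# `length` elements
seq = np.arange(start, start + step * length, step)
print(seq)
```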
69. Fill in Missing Dates in an Irregular Date Series
Given the array dates containing every other day, fill in the missing dates to create a continuous daily sequence.
Difficulty Level: L3
Solve:
# Task: Fill in missing dates to make a continuous date sequence
import numpy as np
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print(dates)
# Write your code below
Desired Output:
['2018-02-01' '2018-02-02' '2018-02-03' '2018-02-04' '2018-02-05' '2018-02-06' '2018-02-07' '2018-02-08' '2018-02-09' '2018-02-10' '2018-02-11' '2018-02-12' '2018-02-13' '2018-02-14' '2018-02-15' '2018-02-16' '2018-02-17' '2018-02-18' '2018-02-19' '2018-02-20' '2018-02-21' '2018-02-22' '2018-02-23']
Why this matters: Filling gaps in date sequences is a standard step in time-series preprocessing to ensure regular intervals for forecasting models.
70. Create Strides from a 1D Array
From the array arr, generate a 2D matrix using strides with a window length of 4 and stride of 2, producing rows like [0,1,2,3], [2,3,4,5], [4,5,6,7]....
Difficulty Level: L4
Solve:
# Task: Create a 2D matrix of strides with window length 4 and stride 2
import numpy as np
arr = np.arange(15)
print(arr)
# Write your code below
Desired Output:
[[ 0  1  2  3]
 [ 2  3  4  5]
 [ 4  5  6  7]
 [ 6  7  8  9]
 [ 8  9 10 11]
 [10 11 12 13]]
Why this matters: Sliding window views via strides are used in convolutional neural networks, rolling statistics, and efficient time-series feature extraction.
71. How to create an array using a function of its row and column indices?
Create a 4×5 array where each element equals the sum of its row and column index using np.fromfunction.
Difficulty Level: L1
Solve:
# Task: Create a 4x5 array where element (i,j) = i + j using np.fromfunction
import numpy as np
# Write your code below
Desired Output:
[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]]
Why this matters: This is handy when the value at each array position follows a mathematical formula based on its coordinates.
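One possible solution: `np.fromfunction` calls the supplied function once with full index grids `i` and `j`, so any vectorized expression of the coordinates works:

```python
import numpy as np

# The lambda receives index arrays i (rows) and j (columns),
# so i + j computes every element in one vectorized call
grid = np.fromfunction(lambda i, j: i + j, (4, 5), dtype=int)
print(grid)
```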
72. How does array broadcasting work?
Add the column vector a (shape 3×1) and row vector b (shape 3,) using broadcasting to produce a 3×3 result.
Difficulty Level: L1
Solve:
# Task: Add a column vector and row vector using broadcasting
import numpy as np
a = np.array([[1], [2], [3]])  # shape (3, 1)
b = np.array([10, 20, 30])     # shape (3,)
# Write your code below
Desired Output:
[[11 21 31]
 [12 22 32]
 [13 23 33]]
Why this matters: Broadcasting lets NumPy do math on arrays of different shapes without making copies, which is fundamental to efficient vectorized computation.
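The solution is simply `a + b`: NumPy stretches the (3, 1) column and the (3,) row to a common (3, 3) shape without copying data:

```python
import numpy as np

a = np.array([[1], [2], [3]])  # shape (3, 1)
b = np.array([10, 20, 30])     # shape (3,)

# Broadcasting rule: dimensions of size 1 are virtually repeated,
# so (3, 1) + (3,) -> (3, 3)
result = a + b
print(result)
```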
73. How to create coordinate grids with np.meshgrid?
Create coordinate matrices X and Y from arrays x and y using np.meshgrid, and print both.
Difficulty Level: L1
Solve:
# Task: Create coordinate matrices X and Y from x and y arrays
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5])
# Write your code below
Desired Output:
X:
[[1 2 3]
 [1 2 3]]
Y:
[[4 4 4]
 [5 5 5]]
Why this matters: Coordinate grids are essential for evaluating functions over a 2D plane, such as plotting 3D surfaces or computing heatmaps.
74. How to standardize columns of a 2D array (zero mean, unit variance)?
Standardize each column of the random array a so that each column has mean 0 and standard deviation 1, then verify by printing the resulting means and stds.
Difficulty Level: L2
Solve:
# Task: Standardize each column of a 2D array to mean=0, std=1
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 100, size=(5, 3))
print('Original:')
print(a)
# Write your code below
Desired Output:
Means after standardization: [-0. 0. 0.]
Stds after standardization: [1. 1. 1.]
Why this matters: Standardization is a critical preprocessing step in ML — features on different scales (e.g., age vs salary) can distort model training if not standardized.
75. How to create and use structured arrays?
Create a structured array with fields name (string), age (int), and weight (float) containing three records, then extract and print the name field.
Difficulty Level: L2
Solve:
# Task: Create a structured array with name, age, weight fields
import numpy as np
# Write your code below
# Create the dtype and the structured array, then extract names
Desired Output:
[('Alice', 25, 55.5) ('Bob', 30, 75. ) ('Charlie', 35, 65.2)]
Names: ['Alice' 'Bob' 'Charlie']
Why this matters: Structured arrays let you store heterogeneous data types in a single NumPy array, serving as a lightweight alternative to pandas DataFrames.
76. How to use fancy indexing to select specific elements?
Select elements at positions (0,1), (1,3), and (3,4) from the 4×5 array a using fancy indexing with row and column index arrays.
Difficulty Level: L1
Solve:
# Task: Select elements at (0,1), (1,3), (3,4) using fancy indexing
import numpy as np
a = np.arange(20).reshape(4, 5)
print(a)
# Write your code below
Desired Output:
[ 1 8 19]
Why this matters: Fancy indexing lets you grab arbitrary combinations of elements from an array, unlike regular slicing which only gives contiguous blocks.
77. How to compute cumulative sum and cumulative product?
Compute and print the cumulative sum and cumulative product of the array a = [1, 2, 3, 4, 5] using np.cumsum and np.cumprod.
Difficulty Level: L1
Solve:
# Task: Compute cumulative sum and cumulative product
import numpy as np
a = np.array([1, 2, 3, 4, 5])
# Write your code below
Desired Output:
Cumulative sum: [ 1 3 6 10 15]
Cumulative product: [ 1 2 6 24 120]
Why this matters: Running totals and cumulative products are essential for time series analysis, computing factorials, and tracking cumulative returns.
78. How to compute cumulative sum along a specific axis of a 2D array?
Compute the cumulative sum of the 2D array a along axis=0 (down columns) and axis=1 (across rows), and print both results.
Difficulty Level: L2
Solve:
# Task: Compute cumulative sum along axis=0 and axis=1
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 10, size=(3, 4))
print('Array:')
print(a)
# Write your code below
Desired Output:
Cumsum along rows (axis=0):
[[ 7  4  8  5]
 [14  7 15 13]
 [19 11 23 21]]
Cumsum along columns (axis=1):
[[ 7 11 19 24]
 [ 7 10 17 25]
 [ 5  9 17 25]]
Why this matters: Axis-specific cumulative sums are important when processing tabular data where rows and columns represent different dimensions, such as cumulative sales by month and region.
79. How to compute a dot product using np.einsum?
Compute the dot product of arrays a = [1, 2, 3] and b = [4, 5, 6] using np.einsum with the subscript string 'i,i->'.
Difficulty Level: L2
Solve:
# Task: Compute the dot product using np.einsum
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Write your code below
# Hint: the einsum subscript string for dot product is 'i,i->'
Desired Output:
32
Why this matters: `np.einsum` is a powerful one-liner for expressing array operations using index notation, and mastering it replaces dozens of specialized NumPy calls.
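The solution follows directly from the hint: in `'i,i->'`, the repeated index `i` multiplies matching elements, and the empty output side sums over it:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 'i,i->' : multiply element-wise over i, then sum (no output index left)
dot = np.einsum('i,i->', a, b)
print(dot)
```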
80. How to multiply two matrices using np.einsum?
Multiply matrices A and B using np.einsum with the subscript string 'ij,jk->ik' and print the result.
Difficulty Level: L2
Solve:
# Task: Matrix multiplication using np.einsum
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Write your code below
# Hint: the einsum subscript string for matrix multiply is 'ij,jk->ik'
Desired Output:
[[19 22]
 [43 50]]
Why this matters: Understanding einsum for matrix multiplication is the gateway to expressing far more complex tensor operations (batch matmul, traces, contractions) in one readable line.
81. How to apply a custom function to every row or column of an array?
Compute the range (max minus min) of each row in array a using np.apply_along_axis with axis=1.
Difficulty Level: L2
Solve:
# Task: Compute range (max - min) of each row using np.apply_along_axis
import numpy as np
np.random.seed(50)
a = np.random.randint(1, 20, size=(4, 5))
print('Array:')
print(a)
# Write your code below
Desired Output:
Range per row: [16 9 13 14]
Why this matters: `np.apply_along_axis` lets you run any custom summary function across rows or columns when NumPy doesn’t provide a built-in for it.
82. How to get the rank of each element in an array?
Get the rank (sorted position) of each element in array a using the double argsort trick: a.argsort().argsort().
Difficulty Level: L3
Solve:
# Task: Get the rank of each element using double argsort
import numpy as np
np.random.seed(10)
a = np.random.randint(20, size=10)
print('Array:', a)
# Write your code below
Desired Output:
Ranks: [4 2 6 0 8 7 9 3 5 1]
Why this matters: Ranking is used in non-parametric statistics, percentile calculations, and rank-based feature engineering for ML models.
83. How to compute matrix determinant and inverse?
Compute and print the determinant and inverse of matrix A = [[1, 2], [3, 4]] using np.linalg.det and np.linalg.inv.
Difficulty Level: L2
Solve:
# Task: Compute the determinant and inverse of matrix A
import numpy as np
A = np.array([[1, 2], [3, 4]])
# Write your code below
Desired Output:
Determinant: -2.0
Inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]
Why this matters: Determinants and matrix inverses are fundamental in linear regression, solving linear systems, and many ML algorithms like Gaussian processes.
84. How to solve a system of linear equations with NumPy?
Solve the system 3x + y = 9, x + 2y = 8 by passing the coefficient matrix A and constants vector b to np.linalg.solve.
Difficulty Level: L2
Solve:
# Task: Solve the system 3x + y = 9, x + 2y = 8
import numpy as np
# Coefficient matrix and constants vector
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
# Write your code below
Desired Output:
Solution [x, y]: [2. 3.]
Why this matters: Solving `Ax = b` underpins linear regression, circuit analysis, and many optimization problems in engineering and data science.
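A sketch of the solution: pass the coefficient matrix and the constants vector straight to `np.linalg.solve`, which is both faster and numerically safer than computing `inv(A) @ b`:

```python
import numpy as np

A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solves A @ [x, y] = b directly via LU factorization
solution = np.linalg.solve(A, b)
print('Solution [x, y]:', solution)
```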
85. How to compute eigenvalues and eigenvectors?
Compute and print the eigenvalues and eigenvectors of matrix A = [[4, -2], [1, 1]] using np.linalg.eig.
Difficulty Level: L3
Solve:
# Task: Compute eigenvalues and eigenvectors of matrix A
import numpy as np
A = np.array([[4, -2], [1, 1]])
# Write your code below
Desired Output:
Eigenvalues: [3. 2.]
Eigenvectors:
[[ 0.89442719 0.70710678]
 [ 0.4472136  0.70710678]]
Why this matters: Eigendecomposition is the backbone of PCA — it reveals the directions of maximum variance in your data and how much variance each direction explains.
86. How to fit a polynomial curve to noisy data?
Fit a degree-2 polynomial to the noisy data arrays x and y using np.polyfit, then evaluate it with np.polyval and print the fitted coefficients and first 5 predicted values.
Difficulty Level: L3
Solve:
# Task: Fit a quadratic polynomial to noisy data and recover coefficients
import numpy as np
np.random.seed(42)
x = np.linspace(0, 5, 10)
y = 2*x**2 - 3*x + 1 + np.random.randn(10)*0.5
# Write your code below
# Fit a degree-2 polynomial and print the coefficients
Desired Output:
Fitted coefficients [a, b, c]: [ 1.98 -2.89 1.15]
First 5 predicted values: [1.15 0.16 0.38 1.83 4.49]
Why this matters: Polynomial curve fitting is used for trend lines, simple regression, and approximating nonlinear relationships in data.
87. How to select rows with a boolean mask and specific columns at the same time?
Select rows from array a where mask is True, and take only the first 3 columns, using combined boolean and slice indexing: a[mask, :3].
Difficulty Level: L2
Solve:
# Task: Select rows by boolean mask, then first 3 columns
import numpy as np
a = np.arange(20).reshape(4, 5)
mask = np.array([True, False, True, False])
print('Array:')
print(a)
# Write your code below
Desired Output:
[[ 0  1  2]
 [10 11 12]]
Why this matters: Combining boolean row filters with column slicing is a common pattern when filtering data by a condition and extracting specific features in one step.
88. How to compute the outer product of two vectors?
Compute the outer product of arrays a = [1, 2, 3] and b = [4, 5, 6] using np.multiply.outer and print the resulting 3×3 matrix.
Difficulty Level: L1
Solve:
# Task: Compute outer product using np.multiply.outer
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Write your code below
Desired Output:
[[ 4  5  6]
 [ 8 10 12]
 [12 15 18]]
Why this matters: Outer products appear in attention mechanisms, covariance matrix computation, and rank-1 matrix updates.
89. How to compute a rolling/moving average of a 1D array?
Compute a 3-element moving average of array a using np.convolve with a uniform kernel of size 3 and mode='valid'.
Difficulty Level: L3
Solve:
# Task: Compute a 3-element moving average of a 1D array
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 20, size=10)
print('Array:', a)
# Write your code below
# Compute a moving average with window size 3
Desired Output:
Moving average (window=3): [ 7.67 5.33 7.33 8.67 8.33 10.67 13. 12. ]
Why this matters: Moving averages smooth noisy data and are widely used in time series analysis, stock price charts, and signal processing.
90. How to rotate a 2D array (like an image)?
Rotate the 3×4 array img by 90 degrees counter-clockwise using np.rot90 and print the result.
Difficulty Level: L1
Solve:
# Task: Rotate a 3x4 array 90 degrees counter-clockwise
import numpy as np
img = np.arange(12).reshape(3, 4)
print('Original:')
print(img)
# Write your code below
Desired Output:
Rotated 90 degrees:
[[ 3  7 11]
 [ 2  6 10]
 [ 1  5  9]
 [ 0  4  8]]
Why this matters: Array rotation is a common operation in image processing and data augmentation for training CNNs.
91. How to flip an array horizontally and vertically?
Flip the 3×4 array img left-to-right using np.fliplr and top-to-bottom using np.flipud, printing both results.
Difficulty Level: L1
Solve:
# Task: Flip a 3x4 array horizontally and vertically
import numpy as np
img = np.arange(12).reshape(3, 4)
print('Original:')
print(img)
# Write your code below
Desired Output:
Flipped left-right:
[[ 3  2  1  0]
 [ 7  6  5  4]
 [11 10  9  8]]
Flipped up-down:
[[ 8  9 10 11]
 [ 4  5  6  7]
 [ 0  1  2  3]]
Why this matters: Flipping is one of the simplest and most effective data augmentation techniques for increasing training data variety in deep learning.
92. How to pad an array with zeros on all sides?
Pad the 3×3 array of ones a with one layer of zeros on all sides using np.pad with mode='constant'.
Difficulty Level: L2
Solve:
# Task: Pad a 3x3 array of ones with zeros on all sides
import numpy as np
a = np.ones((3, 3))
# Write your code below
Desired Output:
[[0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0.]]
Why this matters: Zero-padding is essential in convolution operations (CNNs use it to control output size) and when aligning arrays of different sizes.
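A one-line solution sketch: `np.pad` with `pad_width=1` adds one layer on every edge, and `mode='constant'` fills it with a constant value (0 by default):

```python
import numpy as np

a = np.ones((3, 3))

# pad_width=1 adds a single border on all four sides
padded = np.pad(a, pad_width=1, mode='constant', constant_values=0)
print(padded)
```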
93. How to create sliding windows over an array?
Create sliding windows of size 4 over the array a = np.arange(10) using sliding_window_view and print the resulting 2D array of windows.
Difficulty Level: L3
Solve:
# Task: Create sliding windows of size 4
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
a = np.arange(10)
# Write your code below
Desired Output:
[[0 1 2 3]
 [1 2 3 4]
 [2 3 4 5]
 [3 4 5 6]
 [4 5 6 7]
 [5 6 7 8]
 [6 7 8 9]]
Why this matters: Sliding windows are used for computing rolling statistics, feature engineering on time series, and understanding how 1D convolution works under the hood.
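One possible solution: `sliding_window_view` produces every contiguous window of the requested size as rows of a 2D view, without copying any data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)

# Each of the len(a) - 4 + 1 = 7 rows is a zero-copy view into `a`
windows = sliding_window_view(a, window_shape=4)
print(windows)
```

Because the rows are views, writing to `windows` would modify `a`; call `.copy()` if you need independent data.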
94. How to compute min-max normalization to scale values between 0 and 1?
Min-max normalize each column of the 2D array a to the range [0, 1] by subtracting the column min and dividing by the column range, then verify the min and max of each column.
Difficulty Level: L2
Solve:
# Task: Min-max normalize each column of a 2D array to [0, 1]
import numpy as np
np.random.seed(42)
a = np.random.randint(1, 100, size=(5, 3))
print('Original:')
print(a)
# Write your code below
Desired Output:
Min after normalization: [0. 0. 0.]
Max after normalization: [1. 1. 1.]
Why this matters: Min-max normalization is commonly used in neural networks and distance-based algorithms like KNN that are sensitive to feature magnitudes.
95. How to compute a weighted average?
Compute the weighted average of scores using weights with np.average, and compare it against the simple mean from np.mean.
Difficulty Level: L2
Solve:
# Task: Compute the weighted average of exam scores
import numpy as np
scores = np.array([85, 90, 78, 92, 88])
weights = np.array([0.1, 0.2, 0.3, 0.25, 0.15])  # must sum to 1
# Write your code below
Desired Output:
Weighted average: 86.1
Simple average: 86.6
Why this matters: Weighted averages assign different importance to values and are used in portfolio returns, ensemble model predictions, and survey analysis.
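A sketch of one solution: `np.average` accepts a `weights` argument and computes `sum(w * x) / sum(w)`, while `np.mean` ignores weights entirely:

```python
import numpy as np

scores = np.array([85, 90, 78, 92, 88])
weights = np.array([0.1, 0.2, 0.3, 0.25, 0.15])

# Weighted: sum(weights * scores) / sum(weights); simple: plain mean
weighted = np.average(scores, weights=weights)
simple = np.mean(scores)
print('Weighted average:', round(weighted, 1))
print('Simple average:', round(simple, 1))
```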
96. How to find the top-k similar items using cosine similarity?
Compute the cosine similarity between the query vector and each row in items using dot products and norms, then find the index of the most similar item with np.argmax.
Difficulty Level: L3
Solve:
# Task: Compute cosine similarity between a query vector and 5 item vectors
import numpy as np
np.random.seed(42)
items = np.random.rand(5, 4) # 5 items, 4 features each
query = np.random.rand(4) # query vector
print('Items shape:', items.shape)
print('Query:', np.round(query, 3))
# Write your code below
# Find the most similar item to the query
Desired Output:
Cosine similarities: [0.896 0.928 0.811 0.852 0.866]
Most similar item index: 1
Why this matters: Cosine similarity is the standard metric for comparing embeddings in NLP, recommendation systems, and information retrieval.
97. How to compute the covariance matrix of a dataset?
Compute the covariance matrix of the 3-feature data array using np.cov(data.T), then print the covariance between feature 0 and feature 1 (positive) and between feature 0 and feature 2 (negative).
Difficulty Level: L3
Solve:
# Task: Compute the covariance matrix of a dataset with 3 features
import numpy as np
np.random.seed(42)
# Simulate 100 samples with 3 correlated features
data = np.random.randn(100, 3)
data[:, 1] = data[:, 0] * 2 + np.random.randn(100) * 0.5  # feature 1 correlated with feature 0
data[:, 2] = -data[:, 0] + np.random.randn(100) * 0.3     # feature 2 negatively correlated
# Write your code below
Desired Output:
Covariance matrix shape: (3, 3)
Feature 0-1 covariance (should be positive): 1.88
Feature 0-2 covariance (should be negative): -0.96
Why this matters: The covariance matrix reveals how features co-vary and is the starting point for PCA, Mahalanobis distance, and multivariate statistics.
98. How to perform one-step PCA using eigendecomposition?
Reduce the 4-feature data array to 2 principal components by centering the data, computing the covariance matrix, performing eigendecomposition with np.linalg.eig, selecting the top-2 eigenvectors, and projecting the data.
Difficulty Level: L4
Solve:
# Task: Reduce a 4-feature dataset to 2 principal components using PCA
import numpy as np
np.random.seed(42)
# 50 samples, 4 features
data = np.random.randn(50, 4)
data[:, 1] = data[:, 0] * 1.5 + np.random.randn(50) * 0.3
data[:, 3] = data[:, 2] * -0.8 + np.random.randn(50) * 0.2
# Write your code below
# 1. Center the data
# 2. Compute covariance matrix
# 3. Get eigenvalues/eigenvectors
# 4. Project data onto top-2 components
Desired Output:
Original shape: (50, 4)
Projected shape: (50, 2)
Variance explained by top 2 components: 89.2%
Why this matters: PCA via eigendecomposition is the foundational dimensionality reduction technique in ML, and implementing it from scratch builds deep understanding of how it works.
99. How to compute 2D sliding windows over a matrix?
Create 3×3 sliding windows over the 5×5 array a using sliding_window_view(a, (3, 3)), then print the shape of the result and the first window (top-left corner).
Difficulty Level: L4
Solve:
# Task: Create 3x3 sliding windows over a 5x5 array
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
a = np.arange(25).reshape(5, 5)
print('Array:')
print(a)
# Write your code below
# Print the shape of the result and the first window (top-left corner)
Desired Output:
Windows shape: (3, 3, 3, 3)
First window (top-left):
[[ 0  1  2]
 [ 5  6  7]
 [10 11 12]]
Why this matters: 2D sliding windows are exactly what a convolution layer does in a CNN — understanding this builds intuition for how neural network filters work.
100. How to do batch matrix multiplication with np.einsum?
Multiply each pair of matrices in batches A (shape 3x2x3) and B (shape 3x3x2) using np.einsum('bij,bjk->bik', A, B) so that result[i] = A[i] @ B[i].
Difficulty Level: L4
Solve:
# Task: Batch matrix multiplication using np.einsum
import numpy as np
np.random.seed(42)
A = np.random.randint(1, 5, size=(3, 2, 3)) # 3 matrices of shape 2x3
B = np.random.randint(1, 5, size=(3, 3, 2)) # 3 matrices of shape 3x2
print('A shape:', A.shape)
print('B shape:', B.shape)
# Write your code below
# Multiply each pair: result[i] = A[i] @ B[i]
Desired Output:
Result shape: (3, 2, 2)
Result:
[[[22 17]
  [26 28]]
 [[18 16]
  [26 22]]
 [[23 13]
  [25 13]]]
Why this matters: Batch matrix multiplication is essential in deep learning for computing operations like attention scores across all samples in a batch simultaneously.
101. How to compute a pairwise Euclidean distance matrix without loops?
Compute the 5×5 pairwise Euclidean distance matrix for the points array (5 points in 3D) using broadcasting and np.einsum, without any Python loops.
Difficulty Level: L4
Solve:
# Task: Compute pairwise Euclidean distance matrix (no loops)
import numpy as np
np.random.seed(42)
points = np.random.rand(5, 3) # 5 points in 3D
print('Points:')
print(np.round(points, 4))
# Write your code below
# Compute the 5x5 distance matrix using broadcasting
Desired Output:
Distance matrix:
[[0.     1.0068 0.3527 1.0164 1.0284]
 [1.0068 0.     0.9973 0.8323 0.2419]
 [0.3527 0.9973 0.     1.1285 1.0968]
 [1.0164 0.8323 1.1285 0.     0.8206]
 [1.0284 0.2419 1.0968 0.8206 0.    ]]
Why this matters: Pairwise distance matrices are the foundation of KNN, hierarchical clustering, and DBSCAN, and computing them vectorized is critical for performance.
