305 BA PYTHON - APR 2022 Answer Key


Total No. of Questions : 5]                              SEAT No. :
[5860]-318                                      [Total No. of Pages : 2
P6895

S.Y.M.B.A.
305 BA SC-BA-04 : MACHINE LEARNING & COGNITIVE INTELLIGENCE USING PYTHON
- By Pratik Patil

Time : 2½ Hours]                                         [Max. Marks : 50
Instructions to the candidates:
1) Figures to the right indicate full marks.
2) Assume suitable data if necessary.
3) Draw neat diagrams wherever necessary.
4) All questions are compulsory.

Q1) Solve any five:


a) State how to define variable in python?
b) Identify any two features of machine learning.
c) List various loops in python.
d) List any two differences between lists and sets.
e) What do you mean by operator overloading in python?
f) Define the term cognitive intelligence.
g) Identify the steps of CRISP - DM Methodology.
h) What do you mean by data visualisation?

a) Defining variables in Python:


• Direct assignment: Simply assign a value to a variable name using =.
Python
age = 30
name = "Alice"
• No explicit declaration: Python infers data types, so no need to declare them
beforehand.
b) Two features of machine learning:
1. Learning from data: Algorithms improve performance with experience
without explicit programming.
2. Generalization: Models make predictions on new, unseen data, not just data
used for training.
c) Loops in Python:
• for loop: Iterates over a sequence (list, tuple, string, etc.).
Python
fruits = ["apple", "banana", "cherry"]
for item in fruits:
    print(item)
• while loop: Repeats a block as long as a condition is True.
Python
count = 0
while count < 5:
    print(count)
    count += 1
d) Differences between lists and sets:
1. Ordering: Lists are ordered, preserving the sequence of elements. Sets are
unordered, so element position doesn't matter.
2. Duplicates: Lists allow duplicates, while sets only store unique elements.
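A minimal sketch of both differences (the values are illustrative):
Python
items = [3, 1, 3, 2]
print(items)        # [3, 1, 3, 2] -> order kept, duplicate 3 allowed

print(set(items))   # {1, 2, 3} -> duplicates removed, no guaranteed order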
e) Operator overloading in Python:
• Allowing custom behavior for built-in operators like +, -, *, / when used
with user-defined classes.
• Example: Defining + to concatenate strings in a custom class.
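A minimal sketch of that example, using a hypothetical Note class that overloads + via __add__:
Python
class Note:
    def __init__(self, text):
        self.text = text

    # Overload + so two Note objects can be joined with the usual operator
    def __add__(self, other):
        return Note(self.text + " " + other.text)

combined = Note("Hello") + Note("world")
print(combined.text)  # Hello world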
f) Cognitive intelligence:
• Ability of a system to understand and process information in a manner
similar to human thinking, including:
o Learning
o Reasoning
o Problem-solving
o Decision-making
g) Steps of CRISP-DM Methodology:
1. Business Understanding: Defining business objectives and understanding
the problem.
2. Data Understanding: Exploring and preparing the data.
3. Data Preparation: Cleaning, formatting, and structuring data for analysis.
4. Modeling: Building and evaluating predictive models.
5. Evaluation: Assessing model performance and understanding results.
6. Deployment: Implementing the model in a production environment.
h) Data visualization:
• Representing data graphically to:
o Explore patterns, trends, and relationships
o Communicate insights effectively
o Common types: bar charts, line graphs, scatter plots, pie charts,
heatmaps, etc.
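A minimal sketch of one such chart, assuming matplotlib is installed (the sales figures are made up for illustration):
Python
import matplotlib.pyplot as plt

# Illustrative sales figures (assumed data)
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 180]

plt.bar(months, sales)            # bar chart of sales by month
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.show()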
Q2) Solve any two:
a) Describe Numpy Arrays. Explain with example.

NumPy arrays are powerful tools for working with numerical data in Python. They
offer several advantages over standard Python lists:
• Efficiency: NumPy arrays are optimized for fast operations on large
datasets, making them essential for scientific computing and data analysis.
• Flexibility: They can handle multi-dimensional data, representing matrices,
vectors, and tensors.
• Broadcasting: NumPy allows operations on arrays of different shapes,
making it easy to perform element-wise calculations.
• Mathematical functions: It provides a vast library of mathematical functions
for linear algebra, Fourier transforms, random number generation, and
more.
Here are some examples:
1. Creating arrays:
Python
import numpy as np

# 1-dimensional array
numbers = np.array([1, 2, 3, 4, 5])

# 2-dimensional array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
2. Accessing elements:
Python
# Access the third element of the 1-D array
print(numbers[2])    # Output: 3

# Access the element at row 1, column 2 of the matrix
print(matrix[1, 2])  # Output: 6
3. Operations on arrays:
Python
# Sum of all elements in the 1-D array
sum_of_numbers = np.sum(numbers)

# Mean of the elements in the 1-D array
mean_of_numbers = np.mean(numbers)

# Transpose of the matrix
transposed_matrix = matrix.T
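4. Broadcasting (a minimal, self-contained sketch of the behaviour described above):
Python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Add a scalar to every element (the scalar is "broadcast" across the array)
print(numbers + 10)                     # [11 12 13 14 15]

# Add a 1-D array to every row of the matrix
print(matrix + np.array([10, 20, 30]))  # [[11 22 33] [14 25 36] [17 28 39]]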
NumPy arrays are fundamental to scientific computing in Python. They are also
extensively used in machine learning and data analysis libraries like Pandas and
Scikit-learn.
b) Distinguish between clustering and classification in machine learning.
Both clustering and classification are fundamental techniques in machine learning
used to uncover patterns and group data points. However, they have key
differences in their approach and goals:
Clustering:
• Unsupervised learning: No pre-existing labels or categories are provided to
the data.
• Grouping similar data points: Clusters are formed based on inherent
similarities between data points, like distances or shared features.
• Exploration and discovery: Used to identify hidden structures and patterns
in unlabeled data, often for initial data exploration or customer
segmentation.
Classification:
• Supervised learning: Requires a labeled training dataset where each data
point has a pre-assigned class label.
• Predicting class labels: The model learns to map data points to their
corresponding class labels based on the training data.
• Prediction and decision-making: Used for tasks like spam filtering, image
recognition, or customer churn prediction.
Here's an analogy to illustrate the difference:
• Imagine sorting fruits in a basket:
o Clustering: You group fruits based on their similarities, like color,
shape, or taste, without any pre-existing labels.
o Classification: You have labels like "apple," "banana," and "orange,"
and you sort the fruits based on their known characteristics to assign
the correct label to each.
Here's a table summarizing the key differences:
Feature         Clustering                                   Classification
Learning type   Unsupervised                                 Supervised
Data labels     No pre-existing labels                       Requires labeled training data
Goal            Group similar data points                    Predict class labels for new data points
Application     Data exploration, customer segmentation      Prediction, decision-making
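A minimal illustration of the two approaches with scikit-learn (assuming it is installed; the toy data points are made up):
Python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [8, 8], [9, 8]]       # four data points

# Clustering: no labels are given; KMeans discovers two groups on its own
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)                             # e.g. [0 0 1 1]

# Classification: labels are supplied, and the model learns to predict them
y = ["small", "small", "large", "large"]
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[2, 1]]))                # ['small']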
c) Discuss the Reinforcement learning with example.

Reinforcement learning (RL) is a type of machine learning where an agent learns
by interacting with its environment, receiving rewards for actions that lead to
desired outcomes. It's inspired by how humans and animals learn through trial and
error, gradually adjusting their behavior to achieve goals.
Key components of RL:
• Agent: The learner, making decisions and taking actions.
• Environment: The world in which the agent interacts, providing feedback in
the form of rewards and observations.
• Actions: Choices the agent can make to impact its environment.
• Rewards: Numerical signals indicating the desirability of an action's
outcome.
• Policy: A mapping of states to actions, guiding the agent's behavior.
• Value function: Estimates the expected future reward for being in a given
state or taking a particular action.
Example: Robot Navigation
• Agent: A robot.
• Environment: A maze.
• Actions: Moving forward, turning left, turning right.
• Rewards: +1 for reaching the goal, -1 for hitting a wall.
• Policy: The robot's strategy for navigating the maze.
• Value function: The robot's estimation of how good it is to be in a particular
location in the maze.
Learning process:
1. The robot starts in a random location and takes an action based on its initial
policy.
2. It observes the next state (its new location) and receives a reward from the
environment.
3. It updates its policy and value function based on this experience.
4. It repeats steps 1-3, gradually learning to navigate the maze efficiently to
reach the goal.
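A minimal sketch of this trial-and-error loop using tabular Q-learning on a toy one-dimensional corridor (the environment, reward values, and hyperparameters are assumptions for illustration, not part of the question):
Python
import random

# A tiny corridor world: states 0..4, goal at state 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]

def step(state, action):
    """Return (next_state, reward). Reaching the goal gives +1, otherwise 0."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Q-table: estimated future reward for each (state, action) pair
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration

for episode in range(500):
    state = 0
    while state != GOAL:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward = step(state, action)
        # Q-learning update rule
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print(Q)  # the "move right" values should dominate in every state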
Common RL algorithms:
• Q-learning
• Deep Q-networks (DQN)
• Policy gradients
• Actor-critic methods
Applications of RL:
• Robotics
• Game playing
• Recommender systems
• Finance
• Control systems
• Natural language processing
• Healthcare
• And more
Q3) Solve any one:
a) Explain the decision tree algorithm in machine learning with example.
Here's an explanation of decision trees in machine learning, with an example and a
visual:
Decision trees are a supervised learning algorithm that builds a tree-like model of
decisions and their possible consequences. They're used for classification and
regression tasks, making predictions by recursively asking questions about the data
features.
Key concepts:
• Root node: The topmost node, representing the entire dataset.
• Internal nodes: Nodes that represent decisions, splitting the data based on
feature values.
• Leaf nodes: Terminal nodes representing the final class labels
(classification) or predicted values (regression).
• Branches: Connections between nodes, indicating possible outcomes of
decisions.
Example: Predicting whether to play tennis based on weather conditions:
Decision tree:

[Decision tree structure:

• Root node: Outlook (Sunny, Overcast, Rainy)
• Sunny → Humidity: High → Play = No; Normal → Play = Yes
• Overcast → Play = Yes
• Rainy → Wind: Strong → Play = No; Weak → Play = Yes]
How it works:
1. Start at the root node: Consider the entire dataset.
2. Split based on the best feature: Choose the feature that best separates the
data (e.g., "Outlook").
3. Create branches for each feature value: Divide the data based on the
possible values (e.g., "Sunny," "Overcast," "Rainy").
4. Repeat for each child node: Continue splitting based on the most
informative features until reaching leaf nodes with clear predictions.
5. Make predictions: To classify a new data point, follow the tree's decisions
based on its feature values until reaching a leaf node, which provides the
prediction.
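A minimal sketch of this example with scikit-learn's DecisionTreeClassifier (the weather rows below are encoded numerically and invented for illustration, not taken from the question paper):
Python
from sklearn.tree import DecisionTreeClassifier

# Encoded features: [outlook, humidity, wind]
# outlook: 0 = Sunny, 1 = Overcast, 2 = Rainy; humidity: 0 = Normal, 1 = High; wind: 0 = Weak, 1 = Strong
X = [
    [0, 1, 0], [0, 1, 1], [1, 1, 0], [2, 1, 0],
    [2, 0, 0], [2, 0, 1], [1, 0, 1], [0, 0, 0],
]
y = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"]

tree = DecisionTreeClassifier().fit(X, y)

# Predict for a new day: Sunny outlook, Normal humidity, Strong wind
print(tree.predict([[0, 0, 1]]))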
Advantages:
• Easy to understand and interpret.
• Handle both numerical and categorical data.
• Not sensitive to outliers.
• Require little data preparation.
Disadvantages:
• Prone to overfitting if not properly pruned.

• Can be unstable with small changes in the data.


Applications:
• Loan approval

• Fraud detection
• Medical diagnosis
• Customer segmentation
• Risk assessment
• And more

b) Explain the concept of simple and multiple regression.


Simple and Multiple Regression Explained:
Both simple and multiple regression are statistical techniques used to analyze the
relationship between variables. However, they differ in the number of independent
variables used to predict the dependent variable:
Simple Regression:
• One independent variable: Explains the relationship between two variables -
a dependent variable (what you want to predict) and an independent variable
(what you think influences the dependent variable).
• Equation: y = mx + b, where:
o y is the dependent variable.
o x is the independent variable.
o m is the slope of the line representing the relationship.
o b is the y-intercept of the line where it crosses the y-axis.
• Example: Predicting house prices based on square footage.
Multiple Regression:
• Multiple independent variables: Explains the relationship between one
dependent variable and two or more independent variables.
• Equation: y = m1x1 + m2x2 + ... + mnxn + b, where:
o y is the dependent variable.
o x1, x2, ..., xn are the independent variables.
o m1, m2, ..., mn are the slopes for each independent variable.
o b is the y-intercept.
• Example: Predicting student grades based on study hours, exam difficulty,
and class attendance.
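A minimal sketch of both fits with scikit-learn's LinearRegression (the house-price figures below are invented for illustration):
Python
from sklearn.linear_model import LinearRegression

# Simple regression: predict price from square footage alone (toy data)
sqft = [[1000], [1500], [2000], [2500]]
price = [200, 290, 405, 500]
simple = LinearRegression().fit(sqft, price)
print(simple.coef_, simple.intercept_)       # slope m and intercept b

# Multiple regression: add a second predictor, number of bedrooms
X = [[1000, 2], [1500, 3], [2000, 3], [2500, 4]]
multiple = LinearRegression().fit(X, price)
print(multiple.coef_, multiple.intercept_)   # one slope per predictor, plus b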
Here's a table summarizing the key differences:
Feature                           Simple Regression                       Multiple Regression
Number of independent variables   1                                       2 or more
Equation                          y = mx + b                              y = m1x1 + m2x2 + ... + mnxn + b
Interpretation                    Slope m shows the effect of the         Each slope m1, m2, ..., mn shows the
                                  independent variable on the             effect of its independent variable,
                                  dependent variable                      holding the others constant
Application                       Understanding simple relationships      Analyzing complex relationships with
                                  with one influencing factor             multiple contributing factors
In conclusion:
• Simple regression is a good starting point for understanding basic
relationships.
• Multiple regression provides a more comprehensive analysis when several
factors influence the outcome.
Remember, choosing the right technique depends on the specific research question
and data you have.

Q4) Solve any one:


a) Discuss how the clustering is useful in marketing domain?
Clustering, a powerful unsupervised learning technique, is incredibly valuable in
the marketing domain. It helps marketers uncover hidden patterns and group
customers or data points based on shared characteristics, leading to several
powerful applications:

1. Customer Segmentation:

Imagine a vast sea of customers; clustering helps marketers segment them into
distinct groups based on demographics, purchase history, website behavior, or
other relevant factors. This allows for:

• Targeted marketing campaigns: Each segment can receive personalized
messaging, offers, and recommendations tailored to their interests and
needs, increasing campaign effectiveness and ROI.
• Efficient resource allocation: Resources can be focused on high-value
segments with the greatest potential for conversion or engagement.
2. Product Development and Pricing:

By analyzing customer clusters, marketers can gain insights into product
preferences and price sensitivity. This valuable information can be used to:

• Develop products that resonate with specific segments: Understanding what
each segment values helps create products that cater to their unique needs
and preferences.
• Optimize pricing strategies: Different segments may be willing to pay
varying prices for the same product. Clustering helps identify optimal
pricing strategies for each segment to maximize revenue.
3. Churn Prediction and Customer Retention:

Clustering can identify customer segments at risk of churn based on past behavior
or demographics. This allows marketers to:

• Implement targeted retention campaigns: Proactive outreach with
personalized offers or incentives can encourage at-risk customers to stay.
• Improve customer lifetime value: By understanding the factors that
contribute to churn, marketers can address them to retain valuable
customers for longer.
4. Market Research and Trend Identification:

Clustering can analyze large datasets of social media conversations, website
traffic, or survey responses to:

• Identify emerging trends and consumer preferences: Understanding what
different segments are talking about or interested in can inform product
development, marketing strategies, and brand positioning.
• Discover new market opportunities: Clusters may reveal previously
unknown segments with unique needs or untapped potential.
Here's an example of how clustering can be used in marketing:

Imagine an online clothing retailer. They can cluster their customers based on
purchase history, browsing behavior, and demographics. This might reveal:

• A cluster of young professionals who buy trendy clothes.


• A cluster of families with children who prioritize practical and comfortable
clothing.
• A cluster of older adults who prefer classic and timeless styles.
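A minimal sketch of how such segments might be produced with scikit-learn's KMeans (the customer features and numbers are assumptions for illustration):
Python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [age, average order value, orders per year]
customers = np.array([
    [27, 80, 12],
    [31, 95, 10],
    [42, 60,  4],
    [45, 55,  5],
    [68, 70,  3],
    [72, 75,  2],
])

segments = KMeans(n_clusters=3, n_init=10).fit_predict(customers)
print(segments)   # segment label for each customer, e.g. [0 0 1 1 2 2]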

The retailer can then use this information to:


• Send targeted email campaigns with relevant product recommendations to
each cluster.
• Develop new clothing lines specifically for each segment.
• Optimize their website and marketing materials to appeal to each segment's
preferences.

By leveraging the power of clustering, marketers can gain valuable insights into
their customers, optimize their marketing efforts, and ultimately drive business
growth.

b) Analyse K - Nearest Neighbour algorithm for machine learning.


Analyzing the K-Nearest Neighbors (KNN) Algorithm
K-Nearest Neighbors (KNN) is a simple yet powerful supervised learning
algorithm used for both classification and regression tasks. It works by comparing
a new data point to its k nearest neighbors in the training dataset and predicting its
class or value based on the majority vote or average of those neighbors.
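A minimal sketch of that idea with scikit-learn's KNeighborsClassifier (the training points and feature names are invented for illustration):
Python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: [height_cm, weight_kg] labelled by clothing size
X = [[150, 50], [155, 55], [180, 80], [185, 90]]
y = ["S", "S", "L", "L"]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 nearest neighbours
knn.fit(X, y)

print(knn.predict([[160, 58]]))   # majority vote of the 3 closest points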
Strengths:
• Easy to understand and implement: KNN involves minimal complex
calculations, making it accessible to beginners.
• Non-parametric: KNN doesn't assume any underlying data distribution,
making it flexible for various data types.
• Effective for multi-class problems: KNN performs well with numerous class
labels, unlike some algorithms limited to binary classification.
• Robust to outliers: KNN isn't overly sensitive to outliers in the training data.
Weaknesses:
• Computationally expensive: KNN requires comparing the new data point to
all training points during prediction, impacting efficiency for large datasets.
• Curse of dimensionality: Performance can significantly decrease in high-
dimensional data due to increased distance calculations.
• Choice of k can be crucial: Selecting the optimal k value significantly
impacts accuracy and requires experimentation.
• No model interpretation: KNN offers little insight into the relationships
between features and the target variable.
Applications:
• Recommendation systems: Recommending similar products or content
based on user preferences.
• Image recognition: Classifying images based on their similarities to labeled
examples.
• Anomaly detection: Identifying data points significantly different from their
neighbors, potentially indicating anomalies.
• Finance: Predicting loan defaults or fraud based on customer information.
Overall, KNN is a versatile and easy-to-use algorithm with inherent strengths and
weaknesses. Its simplicity and non-parametric nature make it a valuable tool for
various tasks, but its computational cost and dependence on k require careful
consideration for optimal performance.
Here are some additional points to consider for a more thorough analysis:
• Variants of KNN: Different versions of KNN exist, such as weighted KNN
or KNN with distance functions, offering potential improvements in specific
scenarios.
• Hybrid approaches: KNN can be combined with other algorithms, like
ensemble methods, to leverage its strengths and address its limitations.
• Parameter tuning: Optimizing the k value and other parameters is crucial for
KNN's effectiveness, requiring appropriate evaluation techniques.

Q5) Solve any one:


a) Design a code in python to print the following pattern.
*
* *
* * *
* * * *
* * * * *

Python code to print the pattern:

rows = 5
for i in range(1, rows + 1):
    for j in range(1, i + 1):
        print("*", end=" ")
    print()
Output:

*
* *
* * *
* * * *
* * * * *

b) "Machine learning will make companies more efficient and allow them to
streamline business processes of an organisation". Justify the statement.
I completely agree with the statement that "Machine learning will make companies
more efficient and allow them to streamline business processes of an
organization." Here's why:
Increased Efficiency:
• Automation: Machine learning automates repetitive tasks like data entry,
analysis, and reporting, freeing up human employees for more strategic
work.
• Improved Accuracy: Algorithms can analyze vast amounts of data and
identify patterns humans might miss, leading to more accurate decision-
making and less wasted resources.
• Predictive Capabilities: Machine learning can predict future events like
customer churn, equipment failures, and market trends, allowing companies
to proactively address challenges and optimize operations.
Streamlined Business Processes:
• Personalization: Recommendation engines and targeted marketing
campaigns can personalize the customer experience, leading to increased
customer satisfaction and loyalty.
• Resource Optimization: By analyzing data on resource usage, machine
learning can optimize inventory management, scheduling, and logistics,
minimizing downtime and costs.
• Reduced Errors: Algorithmic decision-making reduces human error in areas
like fraud detection, anomaly detection, and quality control, improving
overall process efficiency.
Examples:
• A manufacturing company uses machine learning to predict equipment
failures and schedule preventative maintenance, reducing downtime and
production losses.
• A retail store uses recommendation engines to suggest products to
customers based on their purchase history, increasing sales and customer
satisfaction.
• A bank uses machine learning to detect fraudulent transactions in real-time,
protecting its customers and reducing financial losses.
While concerns exist about potential job displacement, machine learning should be
seen as a tool to augment human capabilities, not replace them. By focusing on
tasks where machines excel and allowing humans to handle the creative and
strategic aspects, organizations can achieve significant increases in efficiency and
streamline their business processes.
Overall, the advantages of machine learning for improving efficiency and
streamlining business processes are undeniable. Companies that embrace this
technology are well-positioned to gain a competitive edge in the years to come.
