

IBM Professional Certification Program

Study Guide Series

Exam C1000-154: IBM Watson Data Scientist v1

Purpose of Exam Objectives
When an exam is developed, Subject Matter Experts work together to define the role the
certified individual will fill. They define the tasks and knowledge that an individual would
need to successfully perform this job role for the product or solution. This creates the
foundation for the objectives and measurement criteria, which form the basis of the
certification exam. Question writers then use these objectives to develop exam
questions.
It is recommended that you review these objectives and ask yourself the following questions:

• Do you know how to complete the task in the objective?


• Do you know why that task needs to be done?
• Do you know what will happen if you do it incorrectly?

If you are not familiar with a task, go through the objective, perform that task in your own
environment and read more information on the task. If there is an objective on a task,
there is a high likelihood that you WILL see a question about it on the actual exam.
Review the recommended learning designed to prepare you to take the certification
exam.
After reviewing the objectives in this guide and completing your own research, take the
assessment exam. While the assessment exam does not indicate which specific
questions were answered incorrectly, it does indicate overall performance by section.
This is a good indicator of whether you are prepared or need further preparation.

Study Resources:
Below is a high-level list of resources to help when you are preparing for the certification
exam. This list is not exhaustive; it is meant to help if you need more
information on the topics listed below in the Study Guide.

https://chartio.com/learn/charts/how-to-choose-data-visualization/
https://cloud.ibm.com/apidocs/natural-language-understanding
https://crunchingthedata.com/cs01-check-data-quality/
https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/welcome-main.html?audience=wdp
https://developer.ibm.com
https://hastie.su.domains/ISLR2/ISLRv2_website.pdf
https://learn.ibm.com/course/view.php?id=8710
https://scikit-learn.org/stable/#
https://mlops-guide.github.io
https://seaborn.pydata.org/
https://www.ibm.com/products/cloud-pak-for-data/governance
https://www.ibm.com/cloud/blog/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks
https://www.ibm.com/docs/en/cloud-paks/cp-data
https://www.ibm.com/docs/en/cognos-analytics
https://www.ibm.com/docs/en/iis
https://www.ibm.com/docs/en/spss-modeler
https://www.ibm.com/docs/en/wmla
https://www.ibm.com/garage/method/

Section 1 - Understand the business problem


1.1 Help business articulate and define business problems
• Understand the CRISP-DM Methodology
• Explain how IBM Garage Methodology works

1.2 Identify analytic techniques to address requirements


• Align on user intents for a solution
• Determine upskill requirements
• Assess feasibility of solution(s)
• Define key metrics

Section 2 – Collect and explore the data

2.1 Identify appropriate data sources


• Understand what data sources are available
• Browse data assets using Watson Knowledge Catalog
• Anticipate additional data sources that might be relevant

2.2 Collect data


• Add data assets from catalog to project (Watson Knowledge Catalog and Cloud Pak
for Data)
• Collect additional data
• Use SQL to fetch data from data warehouse
• Scrape data from webpage
• Use Python APIs for external data
• Profile and visualize data using Watson tools
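
As a rough illustration of the "Scrape data from webpage" objective, the sketch below parses an HTML table into records using only Python's standard `html.parser` module. The `SAMPLE_HTML` string, the `TableParser` class, and the `scrape_table` helper are hypothetical names made up for this example; a real scraper would first download the page and would have to respect the site's terms of use.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows
        self._row = None        # row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>widget</td><td>4.50</td></tr>
  <tr><td>gadget</td><td>12.00</td></tr>
</table>
"""


def scrape_table(html):
    """Return one dict per data row, keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
```

All scraped values arrive as strings, so a type-conversion step (for example `float(record["price"])`) normally follows before analysis.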
2.3 Assess data quality
• Understand what data quality is
• Analyze data quality in WKC and CPD

2.4 Perform exploratory data analysis (EDA)


• Determine steps for EDA
• Use pandas in Jupyter notebook for EDA

2.5 Connect and ingest all data sources


• Demonstrate knowledge of ETL process
• Connect to data sources using Cloud Pak for Data

Section 3 – Prepare the data

3.1 Preprocess and combine data from various data sources


• Identify potential issues with data
• Employ dimensionality reduction techniques for volume reduction
• Transform the data based on model requirements

3.2 Clean and validate the data


• Describe several methods for replacing missing values in data
• Describe several methods for detecting outliers in data
• Describe class imbalance and ways to avoid it
• Deduplicate data
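
The cleaning objectives above can be practiced on a small scale. The following is a minimal sketch, assuming numeric data held in plain Python lists: median imputation is one of several ways to replace missing values, and the 1.5 × IQR rule is one of several ways to flag outliers. The function names and the `ages` sample are invented for illustration.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence and the order."""
    return list(dict.fromkeys(rows))


# Hypothetical sample: one missing value and one implausible age.
ages = [23, 25, None, 24, 26, 25, 120]
filled = impute_median(ages)
outliers = iqr_outliers(filled)
unique_rows = deduplicate([("a", 1), ("b", 2), ("a", 1)])
```

The median is used here rather than the mean because it is less sensitive to the extreme value in the sample.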

3.3 Data integration


• Choose a method of data integration
o Data consolidation
o Data propagation
o Data virtualization
• Demonstrate working knowledge of SQL
o Data management
o Data manipulation
• Use a variety of tools to merge data from different sources
o SQL Join
o Pandas Merge
o SPSS Merge Node
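
To practice the SQL join objective without a data warehouse, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are hypothetical. A LEFT JOIN is used so that customers without orders are kept in the result.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 12.5);
""")

# LEFT JOIN keeps every customer; COALESCE turns a missing sum into 0.
query = """
    SELECT c.name,
           COUNT(o.id) AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Replacing `LEFT JOIN` with `JOIN` (an inner join) would drop the customer who has no orders, which is the main behavioral difference to understand between the two.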

3.4 Feature selection and engineering


• Identify and extract key features
o SPSS feature selection node
o Python sklearn feature selection
o Watson NLP APIs
o Avoid feature leakage
• Describe several methods of feature engineering
o Encoding
o Embedding
o Scaling
o Dimensionality reduction for model optimization
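
The encoding and scaling items above, together with the "avoid feature leakage" item, can be sketched as follows. This is a simplified standard-library illustration, not the scikit-learn implementation: the point is that the scaler statistics and the category list are learned from the training data only and then reused on the test data. All names and sample values are hypothetical.

```python
import statistics


def fit_standard_scaler(train_values):
    """Learn mean and standard deviation from the training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero variance
    return mean, std


def scale(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


def fit_one_hot(train_labels):
    """Fix the category order using the training data only."""
    return sorted(set(train_labels))


def one_hot(labels, categories):
    """Encode labels; categories unseen during fitting become all zeros."""
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical training and test values.
train_income = [30.0, 50.0, 70.0]
test_income = [90.0]
mean, std = fit_standard_scaler(train_income)
scaled_test = scale(test_income, mean, std)

categories = fit_one_hot(["red", "blue", "red"])
encoded = one_hot(["blue", "green"], categories)
```

Fitting the scaler on the combined training and test data would let information from the test set influence the features, which is one common form of leakage.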

Section 4 – Build the model

4.1 Select the right model class and toolset


• Demonstrate understanding of different types of machine learning and related
algorithms
o Supervised (Regression/Classification)
o Unsupervised (Clustering)
• Differentiate between machine learning and deep learning and describe when to
use each
• Select a small number of algorithms based on model requirements or use AutoAI
• Select a tool based on algorithm requirements and expertise

4.2 Split data


• Partition data into train data and test data
o Create data splits that are reproducible
o Stratified split in case of imbalanced data
• Understand the risk of data leakage for model training
• Understand and implement cross-validation
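
A minimal sketch of a reproducible, stratified split and of k-fold cross-validation indices is shown below, using only the standard library. In practice, scikit-learn's `train_test_split` (with its `stratify` and `random_state` parameters) and `StratifiedKFold` are the usual tools; the functions here are hypothetical stand-ins that show the idea.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx) that preserve the class proportions."""
    rng = random.Random(seed)  # a seeded local RNG makes the split reproducible
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


# Hypothetical imbalanced labels: 80% class 0, 20% class 1.
labels = [0] * 16 + [1] * 4
train_idx, test_idx = stratified_split(labels)
```

With these labels, the test set keeps the 80/20 class ratio, and calling `stratified_split(labels)` again with the same seed returns the same indices.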

4.3 Create models


• Implement Supervised Learning: Regression
o Linear regression
o Ridge/Lasso regression
o Logistic regression
o Random forest regression
• Implement Supervised Learning: Classification
o K-nearest neighbor (KNN)
o Random forest, decision tree
o Support Vector Machines (SVM)
o Naïve Bayes
• Describe several ensemble methods
o Bagging
o Boosting
o Stacking
• Implement Unsupervised Learning: Clustering
o k-means clustering
o Gaussian Mixture Model
• Implement Deep Learning models
o Deep Neural network
o Recurrent neural network (RNN)
o Convolutional neural network (CNN)
o Long short-term memory (LSTM)
• Watson Studio on Cloud Pak for Data as a Service
o AutoAI
o SPSS Modeler
o Deep Learning experiment
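
Of the algorithms listed above, k-nearest neighbor is simple enough to write out in full, which can help when reasoning about how it behaves. The sketch below is a bare-bones version using Euclidean distance and majority voting; the training points and the `knn_predict` name are made up for this example, and a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used instead.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster training data.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

near_first_cluster = knn_predict(train_X, train_y, (1.1, 1.0))
near_second_cluster = knn_predict(train_X, train_y, (5.1, 5.0))
```

Because the prediction depends directly on distances, features on very different scales should usually be scaled first, and `k` is a hyperparameter to tune.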

Section 5 – Evaluate the model

5.1 Perform hyperparameter tuning


• Understand hyperparameters for various algorithms
o Regression
o Classification
o Clustering
o Recommendation engines
o Deep Learning
• Describe the trade-offs between underfitting and overfitting a model
o Avoid underfitting or overfitting by splitting the data into training, testing, and
validation sets
• Explain the effect of hyperparameters and hyperparameter tuning
o Tuning is a trial-and-error process
o Tuning is based on the training output loss value
o Learning rate, number of epochs, hidden layers, hidden units, activation
functions
o AutoAI hyperparameter optimization
• Summarize search algorithms
o Grid Search
o Random Search
o Bayesian Optimization
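
The difference between grid search and random search can be sketched with a toy objective. In the example below, `loss_for` is a hypothetical stand-in for "train a model with these hyperparameters and return its loss"; the hyperparameter names and ranges are assumptions chosen for illustration. Bayesian optimization is not shown because it needs a surrogate model of the loss.

```python
import itertools
import random


def loss_for(learning_rate, n_hidden):
    """Hypothetical stand-in for training a model and returning its loss."""
    return (learning_rate - 0.1) ** 2 + ((n_hidden - 32) / 100) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best = None
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        loss = loss_for(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(n_trials, seed=0):
    """Sample n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
            "n_hidden": rng.choice([8, 16, 32, 64, 128]),
        }
        loss = loss_for(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "n_hidden": [8, 16, 32, 64]}
grid_loss, grid_params = grid_search(grid)
rand_loss, rand_params = random_search(n_trials=16)
```

Both searches evaluate 16 configurations here. Grid search can only return values that appear in the grid, while random search can land on learning rates between the grid points.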

5.2 Compare the performance of different models


• Different metrics for Regression Models
• Different metrics for Classification Models
o Confusion matrix
o AUC measures
o ROC curve
o Precision
o Recall
o F1-score
• Choose the best model
o Performance
o Explainability
o Complexity
o Dataset size
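
The classification metrics listed above all derive from the four cells of the confusion matrix. The following sketch computes them by hand for a binary problem; the label vectors are invented for illustration, and scikit-learn's `confusion_matrix`, `precision_score`, `recall_score`, and `f1_score` provide the same quantities in practice.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this sample there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so precision and recall are both 0.75.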

Section 6 – Deploy the solution

6.1 Understand deployment environment considerations


• Understand how to use libraries in Python
• Know which libraries are available in Cloud Pak for Data by default (e.g. Spark)
• Understand resources

6.2 Create data pipelines to automate model lifecycle


• Understand the difference between batch processing and streaming
• Know the different data sources available in Cloud Pak for Data
• Manage reading from and writing to different Cloud Pak for Data services (Watson
Studio, WKC, Data Virtualization)
• Automate data processing and model deployment with jobs in Watson Studio

6.3 Deploy models in a production setting


• Deploy models to Watson Machine Learning
o Deploy in Watson Machine Learning using notebooks
o Manage models with Watson Machine Learning
o Understand CI/CD

6.4 Validate model performance to business outcomes


• Understand application testing methods
o A/B Testing
o Multivariate testing
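
One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The sketch below is a minimal version using the normal approximation, which assumes reasonably large samples; the function name and the conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical experiment: variant A converts 100/1000, variant B 130/1000.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone. Whether the difference matters for the business is a separate question tied to the key metrics defined earlier.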

Section 7 – Governance and compliance

7.1 Govern and manage data


• Understand the governance artifacts in Watson Knowledge Catalog
• Apply data protection to data
o Obfuscating vs redacting
o Access roles and permissions
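
The contrast between redacting and obfuscating can be illustrated roughly as follows: redaction hides the value entirely, while obfuscation replaces it with a different value that keeps the original format. The two functions below are illustrative assumptions written for this guide and are not Watson Knowledge Catalog's masking implementation; check the product documentation for the exact behavior of each masking method.

```python
import random
import string


def redact(value, mask_char="X"):
    """Replace every character with a mask character, hiding the content."""
    return mask_char * len(value)


def obfuscate(value, seed=0):
    """Replace letters and digits with random ones of the same kind.

    Punctuation and length are kept, so the result still looks like
    a value of the original format.
    """
    rng = random.Random(seed)
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical identifier, not a real person's data.
original = "555-12-3456"
redacted = redact(original)
obfuscated = obfuscate(original)
```

The redacted value is useless for testing format-dependent code, while the obfuscated value can still pass format validation without exposing the original.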

7.2 Govern and manage models


• Manage model deployments
• Evaluate model bias
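
One widely used fairness metric for evaluating model bias is the disparate impact ratio: the rate of favorable outcomes for a monitored group divided by the rate for a reference group. The sketch below computes it from predictions and group labels; the data, group names, and function names are hypothetical, and this is only one of several possible bias measures.

```python
def favorable_rate(predictions, groups, group, favorable=1):
    """Share of favorable predictions within one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(p == favorable for p in selected) / len(selected)


def disparate_impact(predictions, groups, monitored, reference, favorable=1):
    """Ratio of favorable-outcome rates: monitored group / reference group."""
    return (
        favorable_rate(predictions, groups, monitored, favorable)
        / favorable_rate(predictions, groups, reference, favorable)
    )


# Hypothetical model outputs for two groups of four people each.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["reference"] * 4 + ["monitored"] * 4

ratio = disparate_impact(predictions, groups, "monitored", "reference")
```

A ratio of 1.0 means both groups receive favorable outcomes at the same rate. A commonly cited rule of thumb treats values below 0.8 as a signal to investigate further.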

Section 8 – Visualization and Storytelling

8.1 Utilize appropriate visualizations and tools


• Implement visualization using tools
o Cognos Dashboards
o Open-source visualizations
• Employ the type of visualization

8.2 Articulate findings to business community


• Match data literacy of your audience
• Communicate using stories

Section 9 – Strategy and Lifecycle

9.1 Understand and utilize the Data Science/AI Lifecycle


• Understand the AI Ladder
• Explain the challenges of adopting AI
• Assess progress in infusing AI into the organization
• Understand the stages of AI lifecycle
• Understand design thinking for modern organizations

9.2 Collaborate with IT on technical and data architectures


• Understand the relationship between data and cloud architectures
• Define the relationship between data architecture and AI adoption
9.3 Illustrate the value of governed data
• Articulate value of common, consistent, and trusted data
o Data quality
o Common sourcing
o Consistent transformation

9.4 Understand and articulate IBM Cloud Pak for Data value proposition
• Understand the scalability and flexibility of a modern cloud architecture
• Understand deployment at scale with trust and transparency
• Explain self-service analytics
• IBM Cloud Continuous Delivery

Next Steps
1. Take the assessment test for IBM Watson Data Scientist v1.
2. If you pass the assessment exam, visit pearsonvue.com/ibm to schedule your
testing session.
If you fail the assessment exam, review your results by section and focus on the
sections where you need improvement. You can take the assessment exam as many
times as you like ($30 per exam); however, you will receive the same questions
each time, only in a different order.
