Sample MCQs

Question One:

Please select the most appropriate answer and write it on the answer booklet.
1. In SQL, a -------------------- subquery always returns two or more columns and a single row.
a. row b. scalar c. table d. none of the above

2. -------------------- is a prevalent approach to recommender systems where the user will be recommended items similar to the ones the user preferred in the past.
a. Content-based recommendation b. Collaborative filtering c. Non-collaborative filtering d. none of the above
3. The scaling-out approach to increasing the capacity of relational DBMSs has the following advantages over the scaling-up approach: --------------------.
a. Data will be resilient b. More concurrency c. More processing and storage capacity for the same cost d. none of the above
4. Spearman’s ρ is equivalent to Pearson’s r measure of correlation between two continuous variables but applies to -------------------- data.
a. categorical b. ordinal c. interval d. ratio
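As a minimal Python sketch (not taken from the course material), Spearman's ρ can be computed as Pearson's r applied to the ranks of the values rather than to the values themselves. The `ranks`, `pearson` and `spearman` helpers and the satisfaction/visits data are hypothetical; tied values are given the average of their positions, which is one common convention.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson's correlation coefficient r on the raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks only."""
    return pearson(ranks(x), ranks(y))


# Hypothetical ordinal data: satisfaction (1 = poor .. 5 = excellent) vs. repeat visits.
satisfaction = [1, 2, 3, 4, 5]
visits = [2, 3, 10, 11, 50]
print(pearson(satisfaction, visits))   # below 1: the raw values are not linear
print(spearman(satisfaction, visits))  # about 1: the rank orders agree perfectly
```

Because only the ordering survives the ranking step, the measure stays meaningful when the gaps between categories (for example "poor" vs. "fair") have no numeric size.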
5. Document databases differ from relational databases in that --------------------.
a. each document in a collection can have a different set of keys b. the values in a document need not be atomic c. both a. & b. above d. none of the above
6. The -------------------- stage of the MapReduce programming pattern operates on the partial results produced by the preceding stage.
a. reduce b. map c. design d. none of the above
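The classic word-count example can be sketched in plain Python to show how the stages hand data to each other. This is an in-memory simulation under simplifying assumptions, not the API of Hadoop or any other framework; `map_phase`, `shuffle` and `reduce_phase` are hypothetical names.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the partial results emitted by the map stage for one key."""
    return key, sum(values)


documents = ["the cat sat", "the dog sat", "the end"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

In a real cluster the map calls run in parallel on different machines, which is why the reducer only ever sees partial results grouped by key.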
7. In contrast with operational systems, data warehousing systems, where responsiveness of large queries is the critical design factor, use the -------------------- data model.
a. un-normalized multidimensional b. normalized multidimensional c. un-normalized single-dimensional d. all of the above
8. Additive measures in a data cube are numeric values associated with facts that --------------------.
a. can be meaningfully combined along any dimension b. cannot be meaningfully combined along arbitrary dimensions c. cannot be meaningfully combined along any dimension d. none of the above
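A small Python sketch can contrast an additive measure with a semi-additive one. The fact rows, store names and the choice of "sales amount" and "closing stock" as measures are hypothetical; closing stock is used because it is the textbook example of a measure that can be summed across stores but not across time.

```python
# Hypothetical fact rows: (day, store, sales_amount, closing_stock)
facts = [
    ("Mon", "A", 100, 40),
    ("Mon", "B", 150, 60),
    ("Tue", "A", 120, 35),
    ("Tue", "B", 130, 55),
]

# Additive measure: sales can be summed along any dimension.
total_sales = sum(row[2] for row in facts)                        # 500
sales_store_a = sum(row[2] for row in facts if row[1] == "A")     # 220, along time
sales_monday = sum(row[2] for row in facts if row[0] == "Mon")    # 250, along store

# Semi-additive measure: stock can be summed across stores on one day...
stock_tuesday = sum(row[3] for row in facts if row[0] == "Tue")   # 90
# ...but summing it across days counts the same inventory twice,
# so along time a non-summing aggregate such as the last value is used.
naive_stock_store_a = sum(row[3] for row in facts if row[1] == "A")     # 75, not meaningful
closing_stock_store_a = [row[3] for row in facts if row[1] == "A"][-1]  # 35
```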
9. In the K-means clustering algorithm, the number of clusters --------------------.
a. is computed by the algorithm b. needs to be specified in advance c. is equal to the number of data points d. both a. & c. above
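A minimal one-dimensional K-means in Python makes the role of k visible: it is a parameter the caller supplies, not something the loop discovers. The `kmeans` function, its initialisation (k random input points; k-means++ is another common choice) and the sample data are assumptions for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain 1-D k-means; k must be supplied by the caller."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen input points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return sorted(centroids)


data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(kmeans(data, k=2))  # [1.5, 10.5]
```

Note that no labelled training data is involved: the groups emerge from the distances alone, once k has been fixed.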
10. Relying on Euclidean distance to determine document similarity is --------------------, especially in very high dimensionality.
a. very unreliable b. very reliable c. the preferred approach d. none of the above
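A short Python sketch shows why Euclidean distance can mislead on term-count vectors: it is dominated by document length, whereas cosine similarity compares only the direction (the mix of terms). The three-word vocabulary and the document vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two term-count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical term counts over the vocabulary ["data", "cube", "query"].
short_doc = [1, 2, 0]
long_doc = [10, 20, 0]   # same topic mix, ten times as long
other_doc = [0, 0, 3]    # a different topic, similar length to short_doc

# Euclidean distance rates other_doc as closer to short_doc than long_doc is...
print(euclidean(short_doc, long_doc), euclidean(short_doc, other_doc))
# ...while cosine similarity rates long_doc as identical in topic and other_doc as unrelated.
print(cosine_similarity(short_doc, long_doc), cosine_similarity(short_doc, other_doc))
```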
Question Two:
Please select the best and most appropriate answer and write it on the answer booklet.
1. Which type of subquery can be used within the following query?

select name from employee where id != (<subquery>)
a. table b. scalar c. row d. none of the above
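The query above can be tried against an in-memory SQLite database using Python's standard `sqlite3` module. The `employee` rows are hypothetical, and `SELECT MAX(id) FROM employee` is just one example of a subquery that yields exactly one column and one row, which is what the `!=` comparison needs. Be aware that SQLite silently uses the first row if such a subquery returns several rows, whereas systems such as PostgreSQL raise an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)

# != compares id with a single value, so the subquery must yield one column
# and one row: here an aggregate returning the highest id.
rows = conn.execute(
    "SELECT name FROM employee "
    "WHERE id != (SELECT MAX(id) FROM employee) "
    "ORDER BY id"
).fetchall()
print(rows)  # [('Ada',), ('Grace',)]
conn.close()
```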
2. In recommender systems, when computing the similarity between a favourite item x and another item y, a feature Fi is calculated as the fraction of the values of y in x, if Fi is a __________ property.
a. multi-valued b. single-valued c. single-valued numerical d. single-valued, non-numerical
3. Unlike multidimensional databases, __________ databases are used to store unaggregated data.
a. Key-value b. Document c. NoSQL d. Relational
4. In __________, the data values themselves are not used directly; instead, they are used to give a rank ordering of the data items.
a. Pearson’s correlation b. χ² test c. k-NN classification d. Spearman’s correlation
5. The performance of replicated systems can be improved by combining replication with __________.
a. backup and recovery b. sharding c. transaction management d. CRUD operations
6. According to the CAP theorem, __________ means that each update should appear to be instantaneous.
a. atomic consistency b. availability c. tolerance to network partition d. durability
7. __________ is part of data integration and is concerned with ensuring domain consistency.
a. Format integration b. Semantic integration c. Data cleansing d. Data scrubbing
8. A commonly used method to find an empirical optimal k in the k-NN classification algorithm is cross-validation, which checks how well the algorithm classifies __________ data.
a. the new b. its own training c. un-labeled d. un-classified
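One way to picture this is leave-one-out cross-validation, sketched below in Python: each labelled example is held out in turn, classified from the remaining examples, and the accuracy for each candidate k is compared. Leave-one-out is one variant (k-fold is another); the training set, the candidate values 1, 3 and 5, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical labelled training data: (feature, label) with one numeric feature.
training = [(1.0, "A"), (1.2, "A"), (1.9, "A"), (2.4, "A"),
            (3.0, "B"), (3.2, "B"), (3.9, "B")]


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def loocv_accuracy(train, k):
    """Leave-one-out: classify each training point using all the others."""
    correct = 0
    for i, (x, label) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        correct += knn_predict(rest, x, k) == label
    return correct / len(train)


scores = {k: loocv_accuracy(training, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # the k with the highest held-out accuracy
print(scores, best_k)
```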
9. The find_one() method in PyMongo returns its result as a Python __________.
a. list b. dict c. dataframe d. none of the above
10. __________ distance is considered a more appropriate measure of textual document similarity than __________ distance.
a. Cosine, Euclidean b. Euclidean, Cosine c. Vertical, horizontal d. Horizontal, vertical
Question Three:
1. All SQL transaction isolation levels guarantee that the _____________ problem will never occur.
a. Lost update b. Dirty read c. Non-repeatable read d. none of the above
2. In SQL, a _________ subquery always returns a single column and a single row, that is, a single value.
a. row b. table c. scalar d. none of the above
3. According to Rob Kitchin's characterization of data, the structure of data can be: ____________.
a. structured b. semi-structured c. unstructured d. any of the above
4. A non-empty cell in a cube is called a fact and can contain several ________________.
a. cubes b. tables c. measures d. relations
5. Spreadsheets are easily shared, possibly outside the business for which they were created, which creates a problem of data ____________.
a. cleansing b. sharing c. control d. acquisition
6. Replication is mostly concerned with data resilience whereas ________ is concerned with
capacity and performance.
a. sharding b. centralisation c. the user interface d. none of the above
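Hash-based sharding, sketched below in Python, spreads records across several servers so that each holds only part of the data and handles only part of the load. Hashing is one strategy (range-based sharding is another); the customer keys and three-shard layout are hypothetical. `hashlib` is used instead of Python's built-in `hash()` because string hashing is randomised per process, which would send the same key to different shards on different runs.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a record key to one of num_shards servers via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


customers = ["alice", "bob", "carol", "dave", "erin", "frank"]
shards = {i: [] for i in range(3)}
for name in customers:
    shards[shard_for(name, 3)].append(name)
print(shards)  # each customer stored on exactly one shard
```

Replication would then keep copies of each shard on other servers, which is how the two techniques are typically combined.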
7. In a data warehouse hypercube, _________ measure values can be combined along some but not all dimensions.
a. semi-additive b. additive c. non additive d. all of the above
8. Classification tasks like identifying email spam and classifying credit applications, which use labelled training data to try to classify unseen instances, are known as _______________ tasks.
a. regression b. unsupervised learning c. supervised learning d. clustering
9. A relation is in First Normal Form (1NF) if each attribute contains only ______ values,
that is, it has no repeating groups of values.
a. atomic b. numerical c. string d. none of the above
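In Python terms, a repeating group is an attribute holding a list; bringing the relation into 1NF means every attribute holds one atomic value. The minimal sketch below flattens a hypothetical student/phone record into one row per phone number; a designer might instead move phone numbers into a separate table, which is the usual next step in normalisation.

```python
# Hypothetical unnormalised records: "phones" holds a repeating group of values.
unnormalised = [
    {"student_id": 1, "name": "Ann", "phones": ["555-0101", "555-0102"]},
    {"student_id": 2, "name": "Ben", "phones": ["555-0201"]},
]

# 1NF: one atomic value per attribute, so each phone number becomes its own row.
first_normal_form = [
    {"student_id": r["student_id"], "name": r["name"], "phone": phone}
    for r in unnormalised
    for phone in r["phones"]
]
for row in first_normal_form:
    print(row)
```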
10. Which of the following descriptive statistical techniques requires no training data?
a. classification b. clustering c. KNN d. all of the above
Question Four:
1. Which data quality problem is being exhibited by a data set that contains dates using different calendars in the same column?
a. lack of completeness b. lack of uniformity c. lack of validity d. lack of accuracy
2. In SQL, a _________ subquery always returns a single column and a single row, that is, a single value.
a. row b. table c. scalar d. none of the above
3. The primary responsibilities for processing personal data include: __________________.
a. identity b. accuracy c. security d. all of the above
4. A non-empty cell in a cube is called a fact and can contain several ________________.
a. cubes b. tables c. measures d. relations
5. Which one of the following approaches is used to handle dirty data?
a. Fix it b. Remove it c. Both a and b d. None of the above
6. Replication is mostly concerned with data resilience whereas ________ is concerned with
capacity and performance.
a. sharding b. centralisation c. the user interface d. none of the above
7. __________ is concerned with segmenting a diverse group of data into a number of similar subgroups.
a. clustering b. correlation c. combination d. regression
8. __________ is part of data integration and is concerned with ensuring domain consistency.
a. Format integration b. Semantic integration c. Data cleansing d. Data scrubbing
9. A relation is in First Normal Form (1NF) if each attribute contains only ______ values,
that is, it has no repeating groups of values.
a. atomic b. numerical c. string d. none of the above
10. Which of the following descriptive statistical techniques requires no training data?
a. classification b. clustering c. KNN d. all of the above
