Sample MCQs


Question One:

Please select the most appropriate answer and write it on the answer booklet.
1. In SQL, a ________ subquery always returns two or more columns and a single row.
a. row b. scalar c. table d. none of the above

2. ________ is a prevalent approach to recommender systems where the user will be recommended
items similar to the ones the user preferred in the past.
a. Content-based recommendation b. Collaborative filtering c. Non-collaborative filtering d. none of the above

3. The scaling out approach to increasing the capacity of relational DBMSs has the following advantages over
the scaling up approach: ________.
a. Data will be resilient
b. More concurrency
c. More processing & storage capacity for the same cost
d. none of the above
4. Spearman’s ρ is equivalent to Pearson’s R² measure of correlation between two continuous variables but
applies to ________ data.
a. categorical b. ordinal c. interval d. ratio

5. Document databases differ from relational databases in that ________.
a. each document in a collection can have a different set of keys
b. the values in the document need not be atomic
c. both a. & b. above
d. none of the above

6. The ________ stage of the MapReduce programming pattern operates on the partial results produced by the
preceding stage.
a. reduce b. map c. design d. none of the above

7. In contrast with operational systems, data warehousing systems – where responsiveness of large queries is
the critical design factor – use the ________ data model.
a. un-normalized multidimensional model
b. normalized multidimensional model
c. the un-normalized single dimensional
d. all of the above

8. Additive measures in a data cube are numeric values associated with facts that ________.
a. can be meaningfully combined along any dimension
b. cannot be meaningfully combined along arbitrary dimensions
c. cannot be meaningfully combined along any dimensions
d. none of the above

9. In the K-means clustering algorithm, the number of clusters ________.
a. is computed by the algorithm
b. needs to be specified in advance
c. is equal to the number of data points
d. both a. & c. above

10. Relying on Euclidean distance to determine document similarity is ________, especially in very high
dimensionality.
a. very unreliable b. very reliable c. the preferred approach d. none of the above
Question Two:

Please select the best and most appropriate answer and write it on the answer booklet.

1. Which type of subquery can be used within the following query?

select name from employee where id != (<subquery>)
a. table b. scalar c. row d. none of the above
2. In recommender systems, when computing the similarity between a favourite item x and another
item y, a feature Fi is calculated as the fraction of the values of y in x, if Fi is a ________ property.
a. multi-valued b. single-valued c. single-valued numerical d. single-valued, non-numerical property
3. Unlike multidimensional databases, ________ databases are used to store unaggregated
data.
a. Key-value b. Document c. NoSQL d. Relational
4. In ________, the data values themselves are not used, but are used to give a rank ordering of
the data items.
a. Pearson’s correlation b. χ² test c. k-NN classification d. Spearman’s correlation
5. The performance of replicated systems can be improved by combining replication with ________.
a. backup and recovery b. sharding c. transaction management d. CRUD operations
6. According to the CAP theorem, ________ means that each update should appear to be instantaneous.
a. atomic consistency b. availability c. tolerance to network partition d. durability
7. ________ is part of data integration and is concerned with ensuring domain consistency.
a. Format integration b. Semantic Integration c. Data cleansing d. Data scrubbing
8. A commonly used method to find an empirical optimal k in the k-NN classification algorithm is the
cross-validation method which checks how well the algorithm classifies ________ data.
a. the new b. its own training c. un-labeled d. un-classified
9. The find_one() method in PyMongo places its results in a Python ________.
a. list b. dict c. dataframe d. none of the above
10. ________ distance is considered a more appropriate measure of textual document similarity
than ________ distance.
a. Cosine, Euclidean b. Euclidean, Cosine c. Vertical, Horizontal d. Horizontal, Vertical
Question Three:
Please select the best and most appropriate answer and write it on the answer booklet.

1. All SQL transaction isolation levels guarantee that the _____________ problem will never occur.
a. Lost update b. Dirty read c. Non-repeatable read d. none of the above
2. In SQL, a _________ subquery always returns a single column and a single row, that is, a single
value
a. row b. table c. scalar d. none of the above

3. According to Rob Kitchin's characterization of data, the structure of data can be: ____________.
a. structured b. semi-structured c. unstructured d. any of the above

4. A non-empty cell in a cube is called a fact and can contain several ________________
a. cubes b. tables c. measures d. relations

5. Spreadsheets are easily shared, possibly outside the business for which they were created, which
creates a problem of data ____________.
a. cleansing b. sharing c. control d. acquisition

6. Replication is mostly concerned with data resilience whereas ________ is concerned with
capacity and performance.
a. sharding b. centralisation c. the user interface d. none of the above

7. In a data warehouse hypercube, _________ measure values can be combined along
some but not all dimensions.
a. semi-additive b. additive c. non-additive d. all of the above

8. Classification tasks like identifying email spam and classifying credit applications, which
use labelled training data to try to classify unseen instances, are known as
_______________ tasks.
a. regression b. unsupervised learning c. supervised learning d. clustering
9. A relation is in First Normal Form (1NF) if each attribute contains only ______ values,
that is, it has no repeating groups of values.
a. atomic b. numerical c. string d. none of the above

10. Which of the following descriptive statistical techniques requires no training data?
a. classification b. clustering c. KNN d. all of the above
Question Four:
Please select the best and most appropriate answer and write it on the answer booklet.

1. Which data quality problem is being exhibited by a data set that contains dates using
different calendars in the same column?
a. lack of completeness b. lack of uniformity c. lack of validity d. lack of accuracy
2. In SQL, a _________ subquery always returns a single column and a single row, that is, a single
value
a. row b. table c. scalar d. none of the above
3. The primary responsibilities for processing personal data include: __________________.
a. identity b. accuracy c. security d. all of the above

4. A non-empty cell in a cube is called a fact and can contain several ________________
a. cubes b. tables c. measures d. relations

5. Which one of the following approaches is used to handle dirty data?
a. Fix it b. Remove it c. Both a and b d. None of the above
6. Replication is mostly concerned with data resilience whereas ________ is concerned with
capacity and performance.
a. sharding b. centralisation c. the user interface d. none of the above

7. __________ is concerned with segmenting a diverse group of data into a number of similar
subgroups.
a. clustering b. correlation c. combination d. regression

8. __________ is part of data integration and is concerned with ensuring domain consistency.
a. Format integration b. Semantic Integration c. Data cleansing d. Data scrubbing
9. A relation is in First Normal Form (1NF) if each attribute contains only ______ values,
that is, it has no repeating groups of values.
a. atomic b. numerical c. string d. none of the above

10. Which of the following descriptive statistical techniques requires no training data?
a. classification b. clustering c. KNN d. all of the above
