Cloudera CCD-470 Exam - CertifySky.com
Free CCD-470 Sample Questions:
In a MapReduce job, you want each of your input files processed by a single map task. How do
you configure a MapReduce job so that a single map task processes each input file regardless of
how many blocks the input file occupies?
A. Increase the parameter that controls minimum split size in the job configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom FileInputFormat and override the method isSplitable to always return false.
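The relationship between file size, split size, and the number of map tasks can be sketched in plain Java. This is a simplified model, not Hadoop code: SplitSketch and countSplits are hypothetical names, and the model ignores details such as how a small final remainder is handled. It assumes one map task per input split.

```java
// Simplified model of how a file-based input format counts splits.
// SplitSketch and countSplits are hypothetical names, not Hadoop classes.
class SplitSketch {
    // Returns the number of input splits (one map task each) for a single file.
    static long countSplits(long fileBytes, long splitBytes, boolean splittable) {
        if (fileBytes <= 0 || splitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (!splittable) {
            return 1; // a non-splittable file is handed to one map task as a whole
        }
        return (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(countSplits(300 * mb, 64 * mb, true));  // 5
        System.out.println(countSplits(300 * mb, 64 * mb, false)); // 1
    }
}
```

In this model, a file reported as non-splittable always yields one split, however many blocks it occupies.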
Which process describes the lifecycle of a Mapper?
A. The JobTracker calls the TaskTracker’s configure() method, then its map() method and finally
its close() method.
B. The TaskTracker spawns a new Mapper to process all records in a single input split.
C. The TaskTracker spawns a new Mapper to process each key-value pair.
D. The JobTracker spawns a new Mapper to process all records in a single file.
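The configure, map, and close calls named in the options can be traced with a small simulation. This is a sketch only: RecordingMapper and runTask are hypothetical stand-ins, not Hadoop classes, and the model assumes that one mapper instance handles every record of one input split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LifecycleSketch {
    // Hypothetical stand-in for a mapper; it records the calls it receives.
    static class RecordingMapper {
        final List<String> calls = new ArrayList<>();
        void configure() { calls.add("configure"); }
        void map(String key, String value) { calls.add("map(" + key + ")"); }
        void close() { calls.add("close"); }
    }

    // Simulates one map task: set up once, map every record, tear down once.
    static List<String> runTask(List<String> splitRecords) {
        RecordingMapper mapper = new RecordingMapper();
        mapper.configure();
        for (int i = 0; i < splitRecords.size(); i++) {
            mapper.map(String.valueOf(i), splitRecords.get(i));
        }
        mapper.close();
        return mapper.calls;
    }

    public static void main(String[] args) {
        // prints [configure, map(0), map(1), close]
        System.out.println(runTask(Arrays.asList("line one", "line two")));
    }
}
```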
Which statement best describes when the reduce method is first called in a MapReduce job?
A. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The programmer can configure in the job what percentage of the intermediate data
should arrive before the reduce method begins.
B. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The reduce method is called only after all intermediate data has been copied and
C. Reduce methods and map methods all start at the beginning of a job, in order to provide
optimal performance for map-only or reduce-only jobs.
D. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has
completed. The reduce method is called as soon as the intermediate key-value pairs start to
You have written a Mapper which invokes the following five calls to OutputCollector.collect:
output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));
How many times will the Reducer’s reduce method be invoked?
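The grouping that happens between map and reduce can be modelled in plain Java. This is a sketch under stated assumptions: GroupingSketch, emitted, and group are hypothetical names, plain strings stand in for Text, and the model assumes the reduce method is called once per distinct key. The order of values within a key is simply insertion order here, which a real job does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupingSketch {
    // The five (key, value) pairs emitted by the mapper in the question.
    static String[][] emitted() {
        return new String[][] {
            {"Apple", "Red"}, {"Banana", "Yellow"}, {"Apple", "Yellow"},
            {"Cherry", "Red"}, {"Apple", "Green"}
        };
    }

    // Groups values by key, keeping keys sorted as the shuffle phase does.
    static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Each printed line corresponds to one reduce call in this model.
        for (Map.Entry<String, List<String>> e : group(emitted()).entrySet()) {
            System.out.println("reduce(" + e.getKey() + ", " + e.getValue() + ")");
        }
    }
}
```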
To process input key-value pairs, your mapper needs to load a 512 MB data file into memory. What
is the best way to accomplish this?
A. Serialize the data file, insert it into the JobConf object, and read the data into memory in the
configure method of the mapper.
B. Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.
C. Place the data file in the DataCache and read the data into memory in the configure method of
D. Place the data file in the DistributedCache and read the data into memory in the configure
method of the mapper.
In a MapReduce job, the reducer receives all values associated with same key. Which statement
best describes the ordering of these values?
A. The values are in sorted order.
B. The values are arbitrarily ordered, and the ordering may vary from run to run of the same
C. The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always
have the same ordering.
D. Since the values come from mapper outputs, the reducers will receive contiguous sections of
You need to create a job that does frequency analysis on input data. You will do this by writing a
Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into
individual characters. For each one of these characters, you will emit the character as a key and
an IntWritable as the value. As this will produce proportionally more intermediate data than input
data, which two resources should you expect to be bottlenecks?
A. Processor and network I/O
B. Disk I/O and network I/O
C. Processor and RAM
D. Processor and disk I/O
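The mapper described in this question can be modelled without Hadoop to see how many intermediate records one line produces. This is a sketch only: CharMapperSketch is a hypothetical name, plain Java types stand in for Text and IntWritable, and the emitted value is assumed to be 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CharMapperSketch {
    // Emits one (character, 1) record for every character of the input line.
    static List<Map.Entry<Character, Integer>> map(String line) {
        List<Map.Entry<Character, Integer>> out = new ArrayList<>();
        for (char c : line.toCharArray()) {
            out.add(new SimpleEntry<Character, Integer>(c, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "hadoop";
        System.out.println(line.length() + " input characters -> "
                + map(line).size() + " intermediate records");
    }
}
```

Every input character becomes a full key-value record, so the intermediate data carries a value and record overhead for each character of input.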
You want to count the number of occurrences for each unique word in the supplied input data.
You’ve decided to implement this by having your mapper tokenize each word and emit a literal
value 1, and then have your reducer increment a counter for each literal 1 it receives. After
successfully implementing this, it occurs to you that you could optimize this by specifying a
combiner. Will you be able to reuse your existing Reducer as your combiner in this case and why
or why not?
A. Yes, because the sum operation is both associative and commutative and the input and output
types to the reduce method match.
B. No, because the sum operation in the reducer is incompatible with the operation of a Combiner.
C. No, because the Reducer and Combiner are separate interfaces.
D. No, because the Combiner is incompatible with a mapper which doesn’t use the same data
type for both the key and value.
E. Yes, because Java is a polymorphic object-oriented language and thus reducer code can be
reused as a combiner.
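The effect of running the same summing function as both combiner and reducer can be checked with a small simulation. This is a sketch, not Hadoop code: CombinerSketch and its methods are hypothetical names, each input string stands for the input of one map task, and the reducer is assumed to add up the values it receives.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {
    // Mapper model: emits (word, 1) for every whitespace-separated word.
    static List<Map.Entry<String, Integer>> mapWords(String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new SimpleEntry<String, Integer>(w, 1));
            }
        }
        return out;
    }

    // Sums the values per key; used as both combiner and reducer here.
    static Map<String, Integer> sum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // All map output goes straight to the reducer.
    static Map<String, Integer> withoutCombiner(String... mapInputs) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String in : mapInputs) {
            all.addAll(mapWords(in));
        }
        return sum(all);
    }

    // Each map task's output is summed locally before the final reduce.
    static Map<String, Integer> withCombiner(String... mapInputs) {
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (String in : mapInputs) {
            combined.addAll(sum(mapWords(in)).entrySet());
        }
        return sum(combined);
    }

    public static void main(String[] args) {
        System.out.println(withoutCombiner("a b a", "b a")); // {a=3, b=2}
        System.out.println(withCombiner("a b a", "b a"));    // {a=3, b=2}
    }
}
```

Both paths give the same totals because addition is associative and commutative. A reducer that counted how many values arrived, instead of adding them, would not give the same totals after combining.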
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop
daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce
E. Secondary NameNode
Which project gives you a distributed, scalable data store that allows you random, realtime
read/write access to hundreds of terabytes of data?