What is the better way to define a function on an RDD?
# as Proc
rdd.map(lambda{|x| x*2})
# as block
rdd.map {|x| x*2}
Which method should be supported?
As Proc:
- the same way as in Python
- currently implemented
As block:
- what about aggregate(zero_value, seq_op, comb_op)?
- that method needs 2 functions, but a Ruby method call can take only one block
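To make the two-function problem concrete, here is a minimal sketch. ToyRDD is a hypothetical stand-in backed by an array of partitions, not the real RDD class; it only illustrates why both seq_op and comb_op end up being passed as Procs.

```ruby
# Hypothetical stand-in for an RDD, backed by an array of partitions.
class ToyRDD
  def initialize(partitions)
    @partitions = partitions
  end

  # seq_op folds the elements inside one partition;
  # comb_op merges the per-partition results.
  # Both are Procs, because a call can carry only one block.
  def aggregate(zero_value, seq_op, comb_op)
    partials = @partitions.map { |part| part.reduce(zero_value, &seq_op) }
    partials.reduce(zero_value, &comb_op)
  end
end

rdd = ToyRDD.new([[1, 2], [3, 4]])
sum = rdd.aggregate(0, lambda { |acc, x| acc + x }, lambda { |a, b| a + b })
```

A block could replace at most one of the two arguments, so a block-only API would still need a Proc for the other one.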
Both:
- what about reduce_by_key(f, num_partitions=nil)?
- if you would like to use a block and num_partitions, you have to pass nil for f:
rdd.reduce_by_key(nil, 2){|x,y| x+y}
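One possible way to support both styles, sketched under assumptions: ToyPairRDD is a hypothetical stand-in holding [key, value] pairs in memory, and num_partitions is accepted only to mirror the signature above (this toy ignores it). The method takes whichever of f or the block is given.

```ruby
# Hypothetical stand-in for an RDD of [key, value] pairs; not the real RDD class.
class ToyPairRDD
  def initialize(pairs)
    @pairs = pairs
  end

  # Accepts the function either as a Proc (f) or as a block.
  # num_partitions is ignored here; it only mirrors the signature.
  def reduce_by_key(f = nil, num_partitions = nil, &block)
    func = f || block
    raise ArgumentError, "pass a Proc or a block" if func.nil?

    @pairs.group_by(&:first).map do |key, group|
      [key, group.map(&:last).reduce(&func)]
    end.to_h
  end
end

rdd = ToyPairRDD.new([["a", 1], ["b", 2], ["a", 3]])
as_proc  = rdd.reduce_by_key(lambda { |x, y| x + y }, 2)
as_block = rdd.reduce_by_key(nil, 2) { |x, y| x + y }
```

Both calls give the same result in this sketch, but the block form still needs the nil placeholder whenever num_partitions is passed positionally.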