Skip to content

Block or Proc? #3

Open
Open
@ondra-m

Description

@ondra-m

What is better way how to define a function on RDD?

# as Proc
rdd.map(lambda{|x| x*2})

# as block
rdd.map {|x| x*2}

Which method should be supported?

As Proc:

  • the same way as in Python
  • currently implemented

As block:

  • what about aggregate(zero_value, seq_op, comb_op)
  • method needs 2 function

Both:

  • what about reduce_by_key(f, num_partitions=nil)

  • if you would like to use block and num_partitions:

      rdd.reduce_by_key(nil, 2){|x,y| x+y}

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions