doc: document the growing lineage problem #746

@SemyonSinchenko

Description

Is your feature request related to a problem? Please describe.
Any call on a GraphFrame built from DataFrames that already contain complex transformations may blow up the query plan and make execution very slow.

Describe the solution you would like
Add a section to the documentation that explains the growing plan problem and recommends explicitly checkpointing DataFrames with complex transformations before passing them to GraphFrame(vertices, edges).
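
A minimal PySpark sketch of what such a recommendation could look like. The helper names `truncate_lineage` and `build_graph` and the `checkpoint_dir` argument are hypothetical and not part of GraphFrames. It assumes a classic (non-Connect) session where `spark.sparkContext` is available:

```python
def truncate_lineage(df, eager=True):
    """Hypothetical helper: materialize `df` and return a DataFrame
    whose logical plan no longer contains the upstream transformations."""
    # DataFrame.checkpoint writes the data to the checkpoint directory
    # and returns a new DataFrame that reads from that saved copy.
    return df.checkpoint(eager=eager)


def build_graph(spark, vertices, edges, checkpoint_dir):
    """Hypothetical helper: checkpoint both inputs, then build the graph."""
    # Imported lazily so that truncate_lineage has no hard dependency
    # on the graphframes package.
    from graphframes import GraphFrame

    # A checkpoint directory must be set before calling checkpoint().
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    # vertices is expected to have an "id" column,
    # edges is expected to have "src" and "dst" columns.
    return GraphFrame(truncate_lineage(vertices), truncate_lineage(edges))
```

It could then be called as `build_graph(spark, vertices, edges, "/tmp/checkpoints")`, where the path is only a placeholder. Checkpointing costs one extra write and read of the data, so it is probably worthwhile only when the input plans are already large.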

It may be a new section in https://graphframes.io/04-user-guide/13-configurations.html

Component

  • Scala Core Internal
  • Scala API
  • Spark Connect Plugin
  • Infrastructure
  • PySpark Classic
  • PySpark Connect
  • Documentation

Additional context

Are you planning on creating a PR?

  • I'm willing to make a pull-request
