At LinkedIn, we rely heavily on offline analytics to make data-driven decisions. Apache Spark provides much of the compute infrastructure powering use cases like data warehousing, data science, AI/ML, A/B testing, and metrics reporting. Spark runs at large scale at LinkedIn, with 150k+ unique jobs responsible for 300k+ executions consuming 200 petabyte-hours of compute da