sparklyr — R interface for Apache Spark

  • Connect to Spark from R: the sparklyr package provides a complete dplyr backend.
  • The Spark connection (sc) returned by spark_connect provides a remote dplyr data source to the Spark cluster.
  • You can connect to both local instances of Spark as well as remote Spark clusters.
  • You can copy R data frames into Spark using the copy_to function, although more typically you'll read data from within the Spark cluster using the spark_read family of functions.
  • Since Spark is a general-purpose cluster computing system, there are many potential applications for extensions (e.g. interfaces to custom machine learning pipelines, interfaces to third-party Spark packages, etc.).
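
The points above can be sketched as a short R session. This is a minimal illustration rather than the article's own example: it assumes the sparklyr and dplyr packages are installed and that a local Spark installation is available (for instance via spark_install()). The table names "mtcars" and "mtcars_csv" and the temporary CSV file are arbitrary choices made for this sketch.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally,
# e.g. with sparklyr::spark_install()). For a remote cluster, `master`
# would point at that cluster instead.
sc <- spark_connect(master = "local")

# Copy an R data frame into Spark; the result is a remote dplyr table.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed in Spark;
# collect() brings the (small) result back into R.
mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

print(mpg_by_cyl)

# Reading data with the spark_read family: write a temporary CSV
# (illustrative only) and load it straight into Spark.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
mtcars_csv_tbl <- spark_read_csv(sc, name = "mtcars_csv", path = csv_path)
csv_rows <- sdf_nrow(mtcars_csv_tbl)

spark_disconnect(sc)
```

In practice copy_to suits small data frames like this one; larger datasets would normally be read in place with spark_read_csv, spark_read_parquet, or spark_read_json so the data never passes through the R session.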

@kdnuggets: “sparklyr — R interface for Apache #Spark #rstats #BigData”

If you use the RStudio IDE, you should also download the latest preview release of the IDE, which includes several enhancements for interacting with Spark (see the RStudio IDE section of the full article for more details).