Make pleasingly parallel R code with rxExecBy

Using the foreach package (available on CRAN) is one simple way of speeding up pleasingly parallel problems using R. A better idea would be to leave the data where it is, and run R within the data repository, in parallel. When your data is sitting in SQL Server or Spark, you can specify a set of keys to partition the data by, and an R function (any…

Where Europe lives, in 14 lines of R Code

Via Max Galka, always a great source of interesting data visualizations, we have this lovely visualization of population density in Europe in 2011, created by Henrik Lindberg. Impressively, the chart was created with just 14 lines of R…

NumPy Cheat Sheet

NumPy is the library that gives Python its ability to work with data at speed. Originally launched in 1995 as ‘Numeric,’ NumPy is the foundation on which many important Python data science libraries are built, including Pandas, SciPy and…

Programming as a Way of Thinking

With modern programming languages (I’ll use Python as an example), we use functions, objects, modules, and libraries to extend the language, and that doesn’t just make programs better; it changes what programming is. Programming used to be…