Challenge of the Week

The #DataScience Challenge of the Week (and a Solution) #abdsc #BigData via @DataScienceCtrl



@JulioTrujillo_: The #DataScience Challenge of the Week (and a Solution) #abdsc #BigData via @DataScienceCtrl

We recently posted a challenge: creating data videos. You can check the challenge here, including training material and data to produce these videos with open source software, in particular with R. Here we provide the solution (including the video) produced by one of the participants. Solutions from other participants will be posted soon as well. Our full list of challenges of the week can be found here.

The Solution

The participant created a 3D version of one of our videos, adding several features along the way. The video shows clusters growing, shrinking, merging, or being born over time, as in a birth-and-death process or in the formation of galaxies. You can check the video, explanations, and data here. Below is a screenshot.

Takeaway from this challenge, according to the author:

The number of clusters steadily decreases (7 at 20s [~167 iterations], 6 at 40s [~333 iterations], 5 at the end [500 iterations]).
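The iteration counts quoted above follow from a simple proportion. A minimal sketch, assuming the 500 iterations are spread evenly over the video and that the video runs about 60 seconds (a length inferred from the quoted figures, not stated by the author); the function name is hypothetical:

```python
def iteration_at(seconds, total_iterations=500, video_seconds=60.0):
    """Approximate iteration shown at a given video timestamp.

    Assumes iterations are spread evenly over the video. The 60 s default
    is inferred from the quoted figures, not stated by the author.
    """
    return round(seconds * total_iterations / video_seconds)


for t in (20, 40, 60):
    print(f"{t}s -> ~{iteration_at(t)} iterations")
```

If the video length differs, pass it as `video_seconds`; the proportion stays the same.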

Around the middle of the video the clusters appear to be fairly stable; however, further iterations result in a significant change in cluster location and number. A local minimum was detected, but it was not the global minimum.
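One way to make this kind of apparent stability visible is to look for plateaus in the cluster count over the iterations. The sketch below is only an illustration: the function name and the toy series are made up, it tracks cluster number only (not location), and it is not the participant's method. A long plateau shows that the count stopped changing for a while, not that the process has converged.

```python
def plateaus(counts, min_length):
    """Return (start, end, value) runs where the cluster count stays
    constant for at least `min_length` consecutive iterations.

    `end` is exclusive, so the run covers counts[start:end].
    """
    runs = []
    start = 0
    for i in range(1, len(counts) + 1):
        # Close the current run at the end of the series or when the value changes.
        if i == len(counts) or counts[i] != counts[start]:
            if i - start >= min_length:
                runs.append((start, i, counts[start]))
            start = i
    return runs


# Made-up cluster counts per iteration, for illustration only.
toy_counts = [8, 7, 7, 7, 7, 6, 6, 5, 5, 5]
print(plateaus(toy_counts, min_length=3))
```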

One cluster is especially small (and potentially suspect) at the end of the iterations in this simulation.

One of the clusters is unstable: points are exchanging between it and a nearby cluster – further iterations may reduce the number of clusters through consolidation.

There is a lot more movement of points within the z dimension than along x or y.  This would be worth investigating as a potential issue with the clustering algorithm or visualization – or perhaps something interesting is going on!   
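A quick check of this observation is to compare the average per-axis displacement of points between consecutive frames. The sketch below assumes each frame is a list of (x, y, z) tuples with every point at the same index in each frame; that layout, the function name, and the toy data are assumptions, not the format of the challenge data.

```python
def mean_abs_displacement(frames):
    """Mean absolute per-axis movement between consecutive frames.

    `frames` is a list of frames; each frame is a list of (x, y, z) tuples,
    with the same point stored at the same index in every frame.
    """
    totals = [0.0, 0.0, 0.0]
    steps = 0
    for prev, curr in zip(frames, frames[1:]):
        for p, q in zip(prev, curr):
            for axis in range(3):
                totals[axis] += abs(q[axis] - p[axis])
            steps += 1
    if steps == 0:
        return (0.0, 0.0, 0.0)
    return tuple(t / steps for t in totals)


# Made-up toy data: two points tracked over three frames.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.1, 0.0, 0.5), (1.0, 1.1, 1.5)],
    [(0.1, 0.1, 1.0), (1.1, 1.1, 2.0)],
]
dx, dy, dz = mean_abs_displacement(toy_frames)
print(f"x: {dx:.3f}  y: {dy:.3f}  z: {dz:.3f}")
```

If the three axes are on different scales, dividing each result by that axis's range before comparing would help separate a real effect from a plotting artifact.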

There appear to be several outlier points that persist through the last third of the video and move around outside of any cluster. These points are likely worth investigating further to understand their source and behavior.
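To follow up on such points, one simple option is to flag every point whose distance to its nearest cluster centroid exceeds a threshold. This is a minimal sketch under stated assumptions: the centroids are known for the frame being examined, the threshold is chosen by hand, and the function name and data are hypothetical rather than taken from the challenge.

```python
import math


def flag_outliers(points, centroids, max_distance):
    """Return indices of points farther than `max_distance` from every centroid.

    `centroids` must contain at least one entry.
    """
    outliers = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, c) for c in centroids)
        if nearest > max_distance:
            outliers.append(i)
    return outliers


# Made-up toy data, not taken from the challenge data set.
toy_centroids = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
toy_points = [(0.2, 0.1, 0.0), (5.1, 4.9, 5.0), (2.5, 2.5, 9.0)]
print(flag_outliers(toy_points, toy_centroids, max_distance=1.0))
```

Running this on each frame of the last third of the video and keeping the indices that are flagged repeatedly would isolate the persistent outliers.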
