Chapter 3: Decision Tree Classifier — Coding (Machine Learning 101)
Now let’s explore some tuning parameters and try to make training faster.

Minimum sample split

Ideally, a decision tree stops splitting the working set on features either when it runs out of features or when the working set ends up in a single class. With the minimum sample split parameter, the decision tree classifier stops splitting once the number of items in the working set drops below the specified value. Following is the diagram where the minimum sample split is 10.
Try providing this parameter as 40 and train the model again. What is the accuracy here?
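A minimal sketch of the step above, using sklearn's `DecisionTreeClassifier` with `min_samples_split=40`. The variable names (`features_train`, etc.) and the synthetic data are stand-ins for the email features prepared earlier in the tutorial, so the snippet runs on its own:

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data so the snippet is self-contained; in the tutorial this
# would be the vectorized spam/non-spam email data.
X, y = make_classification(n_samples=500, random_state=42)
features_train, features_test, labels_train, labels_test = train_test_split(
    X, y, random_state=42)

# min_samples_split=40: a node is not split further once it holds
# fewer than 40 samples, which keeps the tree smaller and training faster.
model = tree.DecisionTreeClassifier(min_samples_split=40)
model.fit(features_train, labels_train)
print(accuracy_score(labels_test, model.predict(features_test)))
```

Larger values of `min_samples_split` trade a little accuracy for a simpler, less overfit tree.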
Criteria for split: criterion

In the theory part we learned that one good splitting decision is to take the split that provides the best information gain.
Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Try these two and check what the accuracy is. You can find the detailed parameters in the sklearn documentation.

Thoughts

A decision tree is a classification strategy, as opposed to an algorithm for classification.
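The comparison suggested above can be sketched as a small loop over the two supported criteria. Again, the synthetic data is a placeholder for the tutorial's email features:

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the vectorized email features.
X, y = make_classification(n_samples=500, random_state=42)
features_train, features_test, labels_train, labels_test = train_test_split(
    X, y, random_state=42)

# Train one tree per split criterion and compare test accuracy.
for criterion in ("gini", "entropy"):
    model = tree.DecisionTreeClassifier(criterion=criterion)
    model.fit(features_train, labels_train)
    acc = accuracy_score(labels_test, model.predict(features_test))
    print(criterion, acc)
```

On most datasets the two criteria give very similar accuracy; “gini” is slightly cheaper to compute because it avoids the logarithm.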
In this second part we explore the sklearn library’s decision tree classifier. We shall tune the parameters discussed in the theory part and check the accuracy results.
While you will get a fair enough idea about the implementation just by reading, I strongly recommend you open an editor and code along with the tutorial. It will give you better insight and longer-lasting learning.
0. What shall we be doing?
The coding exercise is an extension of the previous Naive Bayes classifier program, which classifies email into spam and non-spam. Not to worry if you haven’t gone through Naive Bayes (chapter 1), although I would suggest you complete it first; the same code snippets are discussed in an abstract way here as well.
I have created a git repository for the data set and the sample code. You can download it from here (use the chapter 3 folder). In case that fails, you can use or refer to my version (classifier.py in the chapter 3 folder) to understand how it works.
2. A little bit about cleaning
You may skip this part if you have already gone through the coding part of Naive Bayes (this is for readers who have jumped directly here).
Before we can apply the sklearn classifiers, we must clean the data. Cleaning involves removing stop words, extracting the most common words from the text, etc. In the code example we perform the following steps:
To understand in…