# Support Vector Machine April 10, 2020

## Transcript

1. Welcome to the
Covid Coding Program

2. Let’s Start Basics of
Machine Learning!
I’m Charmi Chokshi,
an ML Engineer at Shipmnts.com
and a passionate tech speaker. A
critical thinker and your mentor of
the day!
Let’s connect:
@CharmiChokshi

3. Let’s classify some points!

4. Which hyperplane to choose?

5. Which hyperplane to choose?

6. Margin

7. Which hyperplane to choose?

8. Which hyperplane to choose?

9. Robust to Outliers

10. Which hyperplane to choose?

11. Tada!! Introducing a new feature

12. Kernel trick

13. Hyperplane

14. Large Margin
- In logistic regression, we take the output of the linear
function and squash the value into the range (0, 1) using
the sigmoid function. If the squashed value is greater than a
threshold value (0.5), we assign it the label 1; otherwise we assign it the
label 0.
- In SVM, we take the output of the linear function, and if that
output is greater than or equal to 1, we identify it with one class; if
the output is less than or equal to -1, we identify it with the other class. Since the
threshold values are changed to 1 and -1 in SVM, we obtain
this reinforcement range of values ([-1, 1]), which acts as the
margin.
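The two decision rules above can be sketched side by side. This is a minimal illustration in plain Python; the weights, bias, and sample points are hypothetical values chosen for the example, not taken from the slides.

```python
import math

def linear_output(w, x, b):
    """Linear function w.x + b, shared by both models."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def logistic_label(z, threshold=0.5):
    """Squash z into (0, 1) with the sigmoid, then compare to the threshold."""
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if p > threshold else 0

def svm_region(z):
    """Report where z falls relative to the SVM margin band [-1, 1]."""
    if z >= 1:
        return "class +1"
    if z <= -1:
        return "class -1"
    return "inside margin"

w, b = [2.0, -1.0], 0.5  # hypothetical weights and bias
for x in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    z = linear_output(w, x, b)
    print(x, z, logistic_label(z), svm_region(z))
```

The middle point gets a logistic label of 0, yet its raw output of -0.5 sits inside the margin band, which is exactly the region SVM training tries to keep empty.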

15. Cost Function
- In the SVM algorithm, we are looking to maximize the margin
between the data points and the hyperplane. The loss
function that helps maximize the margin is hinge loss.
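For reference, the hinge loss is commonly written as follows, assuming labels $y \in \{-1, +1\}$ and a raw linear output $f(x)$ (the symbols are a notational assumption, not taken from the slides):

$$
\ell\bigl(y, f(x)\bigr) = \max\bigl(0,\; 1 - y\, f(x)\bigr)
$$

The loss is zero once a point is on the correct side with $y\, f(x) \ge 1$, and grows linearly as the point moves into the margin or onto the wrong side.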

16. Cost Function
- The cost is 0 if the predicted value and the actual value are of
the same sign and the prediction lies on or outside the margin. If not, we then calculate the loss
value. We also add a regularization parameter to the cost
function. The objective of the regularization parameter is to
balance the margin maximization and loss. After adding the
regularization parameter, the cost function has two parts: the regularization term and the summed hinge loss.
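A minimal sketch of such a regularized cost, assuming the common form λ‖w‖² plus the sum of max(0, 1 − yᵢ(w·xᵢ)) with labels in {-1, +1} and no bias term. The function name `svm_cost`, the parameter `lam`, and the toy data are illustrative only.

```python
def svm_cost(w, X, y, lam):
    """Regularized hinge-loss cost: lam * ||w||^2 + sum of per-sample hinge losses."""
    # Regularization term: a smaller ||w|| corresponds to a wider margin.
    reg = lam * sum(wi * wi for wi in w)
    # Hinge loss: zero for points with y * (w.x) >= 1, linear otherwise.
    loss = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi))
        loss += max(0.0, 1.0 - yi * score)
    return reg + loss

# Toy data (hypothetical): two positive points and one negative point.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
w = [1.0, 0.0]

cost = svm_cost(w, X, y, lam=0.1)
print(cost)
```

Only the second point contributes to the loss here: it has the correct sign but lies inside the margin. A larger `lam` puts more weight on a wide margin, while a smaller one puts more weight on fitting every training point.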

17. Pros and Cons
Pros:
○ It works really well with a clear margin of separation
○ It is effective in high dimensional spaces.
○ It uses a subset of training points in the decision function
(called support vectors), so it is also memory efficient.
Cons:
○ It doesn’t perform well when we have a large data set because
the required training time is high.
○ It also doesn’t perform very well when the data set is
noisy, i.e. when the target classes overlap.
○ SVM doesn’t directly provide probability estimates; these are
calculated using an expensive five-fold cross-validation. This option is
included in the related SVC class of the Python scikit-learn
library.
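Two of the points above (support vectors and probability estimates) can be seen in a short scikit-learn sketch. It assumes scikit-learn is installed and uses a synthetic data set from `make_blobs`; the data set and parameter values are illustrative choices, not from the talk.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (an assumption for illustration only).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

# probability=True asks SVC to fit an extra calibration step using
# internal cross-validation, which makes training slower.
clf = SVC(kernel="linear", C=1.0, probability=True, random_state=0)
clf.fit(X, y)

# Only the support vectors are kept for the decision function.
n_sv = clf.support_vectors_.shape[0]
print("support vectors:", n_sv, "of", len(X), "training points")

# Raw signed scores versus calibrated probabilities for a few samples.
scores = clf.decision_function(X[:3])
proba = clf.predict_proba(X[:3])
print(scores)
print(proba)
```

If probabilities are not needed, leaving `probability` at its default of `False` and using `decision_function` avoids the extra cross-validation cost.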

18. Comparing 10 Classification algos

19. Thank You!