# Backpropagation – Long – Medium

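1. Gradient descent

Let's look at a simple neural network demo:

- model: `y = sigmoid(w · x)`
- input: `x = np.array([1, 2])`
- target: `target = np.array(0.5)`
- weights: `w = np.array([0.5, -0.5])`
- error (SSE): E = 1/2 Σ (target − y)²

Now, how do we update the weights w to minimize the error? We take the derivative of the error with respect to the weights to get Δw, then apply w += Δw.

The derivative of the output with respect to the weights is

$$\frac{\partial y}{\partial w} = \text{sigmoid}'(w \cdot x)\, x = \text{sigmoid}(w \cdot x)\,\bigl(1 - \text{sigmoid}(w \cdot x)\bigr)\, x$$

so, by the chain rule, the descent direction for the error is

$$-\frac{\partial E}{\partial w} = (\text{target} - y)\,\text{sigmoid}(w \cdot x)\,\bigl(1 - \text{sigmoid}(w \cdot x)\bigr)\, x$$

and the weight step is

$$\Delta w = \eta \left(-\frac{\partial E}{\partial w}\right) = \eta\,(\text{target} - y)\,\text{sigmoid}(w \cdot x)\,\bigl(1 - \text{sigmoid}(w \cdot x)\bigr)\, x$$

η is the learning rate, which controls the size of each descent step; typical values are 0.1, 0.01, 0.001 and 0.0001.

Code is:

```python
import numpy as np

# Sigmoid function used for activations
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input-to-output weights
weights = np.array([-0.8, 0.5])
# The learning rate, eta in the weight step equation
learnrate = 0.5

# The neural network output (y-hat),
# same as sigmoid(x[0]*weights[0] + x[1]*weights[1])
nn_output = sigmoid(np.dot(x, weights))
# Output error (y - y-hat)
error = y - nn_output
# Error term (lowercase delta)
error_term = error * sigmoid_prime(np.dot(x, weights))
# Gradient descent step,
# same as [learnrate * error_term * x[0], learnrate * error_term * x[1]]
del_w = learnrate * error_term * x
```

2. Backpropagation

Backpropagation is an extension of gradient descent. The calculation is almost the same, but when the network has a hidden layer between the input layer and the output layer, we need to update more than one set of weights.

Now let's walk through the weight updates, building on gradient descent. First we differentiate with respect to wH, then we pass that result back to differentiate with respect to wX. Here wH holds the weights from the hidden layer to the output, and wX holds the weights from the input layer to the hidden layer:

$$h = \text{sigmoid}(w_X \cdot x), \qquad \text{output} = \text{sigmoid}(w_H \cdot h), \qquad \text{error} = y - \text{output}$$

The output error term and the step for wH are

$$\delta_{\text{out}} = \text{error} \cdot \text{sigmoid}(w_H \cdot h)\,\bigl(1 - \text{sigmoid}(w_H \cdot h)\bigr), \qquad \Delta w_H = \eta\, \delta_{\text{out}}\, h$$

To reach wX, the output error term is sent back through wH and multiplied by the derivative of the hidden activation. For hidden unit j and input i:

$$\delta_j = \delta_{\text{out}}\, w_{H,j}\, h_j\,(1 - h_j), \qquad \Delta w_{X,ij} = \eta\, \delta_j\, x_i$$

where h_j (1 − h_j) is the sigmoid derivative at hidden unit j.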
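To make the two-layer update concrete, here is a minimal sketch of one backpropagation step, followed by a finite-difference check that the step really is η times the negative error gradient. The network size, the input, the weights, and the helper names (`forward`, `backprop_steps`, `sse`) are assumptions chosen for illustration, not values from the article. The hidden error term is the output error term multiplied by wH and by the sigmoid derivative at each hidden unit.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, wX, wH):
    """Return the hidden activations and the network output."""
    h = sigmoid(x @ wX)        # shape (n_hidden,)
    output = sigmoid(h @ wH)   # scalar
    return h, output

def backprop_steps(x, y, wX, wH, learnrate):
    """Weight steps for one example, following the chain rule."""
    h, output = forward(x, wX, wH)
    error = y - output
    # Output error term: error * sigmoid'(wH . h)
    output_error_term = error * output * (1 - output)
    # Hidden error term: output term sent back through wH,
    # times sigmoid'(wX . x) at each hidden unit
    hidden_error_term = output_error_term * wH * h * (1 - h)
    del_wH = learnrate * output_error_term * h
    del_wX = learnrate * np.outer(x, hidden_error_term)
    return del_wX, del_wH

def sse(x, y, wX, wH):
    """Squared error 1/2 (y - output)^2 for one example."""
    _, output = forward(x, wX, wH)
    return 0.5 * (y - output) ** 2

# Illustrative values (assumptions, not taken from the article)
x = np.array([0.5, 0.1, -0.2])
y = 0.6
wX = np.array([[0.5, -0.6],
               [0.1, -0.2],
               [0.1, 0.7]])
wH = np.array([0.1, -0.3])
learnrate = 0.5

del_wX, del_wH = backprop_steps(x, y, wX, wH, learnrate)

# Finite-difference estimate of dE/dw for one input-to-hidden weight
eps = 1e-6
wX_plus = wX.copy()
wX_plus[0, 1] += eps
wX_minus = wX.copy()
wX_minus[0, 1] -= eps
numeric_grad = (sse(x, y, wX_plus, wH) - sse(x, y, wX_minus, wH)) / (2 * eps)

# The backprop step should match -learnrate * dE/dw
print(del_wX[0, 1], -learnrate * numeric_grad)
```

The two printed numbers should agree to many decimal places; a mismatch is usually a sign that a factor was dropped somewhere in the chain.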
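3. Gradient vanishing

Why does the gradient vanish with sigmoid?

The derivative of sigmoid is sigmoid * (1 − sigmoid). Sigmoid outputs lie in (0, 1), so this derivative is never larger than 0.25. Backpropagation multiplies in one more such factor for every layer it passes through, so the more hidden layers there are, the more sharply the gradient shrinks, and the weight updates for the first layer become very small.

Writing sig′ for one sigmoid-derivative factor, the pattern looks like this:

sig′ (output layer)

sig′ * sig′ (layer n − 1)

…

sig′ * sig′ * sig′ * … * sig′ (layer 1)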
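A quick way to see how fast this shrinks is to multiply the sigmoid-derivative factors directly. The sketch below is a simplification under stated assumptions: every layer is evaluated at the same pre-activation `z`, and the weight factors that also appear in the real chain rule are left out. The helper name `first_layer_factor` is made up for illustration. At `z = 0` the sigmoid derivative takes its largest value, so this is the most favourable case.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def first_layer_factor(n_layers, z=0.0):
    """Product of n_layers sigmoid-derivative factors, all evaluated at z.

    Assumption: weights are ignored, so this isolates the sigmoid part only.
    """
    return sigmoid_prime(z) ** n_layers

# How much of the gradient survives after passing through n sigmoid layers
for n in (1, 2, 5, 10):
    print(n, first_layer_factor(n))
```

Even in this best case each extra layer multiplies the gradient by 0.25, so ten layers leave less than one millionth of it for the first layer.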
4. Code

Here is the full code for gradient descent and backpropagation; you can run it yourself.
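As a minimal sketch of how the pieces fit together (this is not the article's own listing), the loop below repeats the forward pass, the backward pass, and the `w += Δw` update on a single example. The input and target are the ones from the single-step example; the hidden layer size, the starting weights, the number of epochs, and the function name `train` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(x, y, wX, wH, learnrate=0.5, epochs=1000):
    """Repeat forward pass, backpropagation, and weight update on one example."""
    wX, wH = wX.copy(), wH.copy()
    errors = []
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(x @ wX)
        output = sigmoid(h @ wH)
        error = y - output
        errors.append(0.5 * error ** 2)

        # Backward pass: error terms for the output and hidden layers
        output_error_term = error * output * (1 - output)
        hidden_error_term = output_error_term * wH * h * (1 - h)

        # Gradient descent step: w += delta_w
        wH += learnrate * output_error_term * h
        wX += learnrate * np.outer(x, hidden_error_term)
    return wX, wH, errors

# Input and target from the single-step example; starting weights are assumed
x = np.array([0.1, 0.3])
y = 0.2
wX0 = np.array([[0.5, -0.6, 0.1],
                [0.1, -0.2, 0.7]])
wH0 = np.array([0.1, -0.3, 0.2])

wX, wH, errors = train(x, y, wX0, wH0)
print("first error:", errors[0])
print("last error: ", errors[-1])
```

The squared error should fall steadily over the epochs. With real data you would loop over many examples and average the steps, which this sketch leaves out.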

