In this lesson we are going to give a short revision of calculus. This will help you picture backpropagation and gradient descent more easily in future lessons. You do not need to understand calculus to work with deep learning. The majority of people using deep learning, apart from the hardcore researchers of course, do not understand the calculus behind backpropagation and gradient descent.
Let’s start with the function f(a) = 3a. This means we get the result of the function f(a) by multiplying 3 by whatever the value of a is. In this graph the x-axis represents the “a” values and the y-axis represents the f of a values. For example, let’s say a equals 2; then f of a is going to be 6, since we multiply 2 by 3 to get f of a. Let’s increase a by a tiny amount, say 0.001, so that a becomes 2.001.
When we compute f(a) again we get 6.003. We can see that whatever a is, f of a is 3 times that value. When we plot our two “a” values and their corresponding f of a values, this is what we get. We can compute the slope of f of a by dividing the vertical change by the horizontal change. Over here we have named the vertical change height and the horizontal change width. The height is 0.003 and the width is 0.001, so when we perform this division we arrive at the answer 3. This is known as the derivative. The slope and the derivative mean the same thing. This is often written as “d” “f of a” over “d a”, and the answer is 3. Also, I should point out that you don’t always need to plot a graph in order to compute derivatives. We shall define basic rules for finding derivatives later on in this lesson.
Let’s see another example. Over here we have the function y equals x squared.
We computed the derivative as delta y divided by delta x.
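The delta y over delta x calculation can be sketched in a few lines of Python. The helper name `slope` and the step of 0.001 are our own choices for illustration; they are not taken from the slides.

```python
def slope(f, x, width=0.001):
    """Approximate the slope of f at x as height / width (delta y / delta x)."""
    height = f(x + width) - f(x)   # vertical change
    return height / width          # divided by the horizontal change


def f(a):
    return 3 * a      # the straight line from the first example


def y(x):
    return x ** 2     # the curve from the second example


print(slope(f, 2))    # very close to 3, wherever we measure it
print(slope(y, 2))    # close to 4
print(slope(y, 5))    # close to 10: on a curve the slope changes with x
```

For the straight line the slope is the same everywhere, while for x squared it depends on where we measure it, which is why we want rules that give the derivative at any point.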
Now let’s take a look at some rules of derivatives.
The power rule states that the derivative of x raised to the power n is n times x raised to the power n minus 1. If n equals zero, then the derivative is equal to zero, because x raised to the power zero is the constant 1 and the derivative of any constant is equal to zero. If we want to find the derivative of a constant A multiplied by a function f of x, we can find the derivative of the function f of x and then multiply the answer by the constant.
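Here is a minimal sketch of the power rule and the constant multiple rule, compared against a numerical estimate. The function names, the constant A = 5 and the test point x = 2 are assumptions made for this illustration.

```python
def power_rule(n, x):
    """Derivative of x**n at x according to the power rule: n * x**(n - 1)."""
    if n == 0:
        return 0.0          # x**0 is the constant 1, so its derivative is 0
    return n * x ** (n - 1)


def numeric_derivative(f, x, step=1e-6):
    """Central-difference estimate of the derivative, used only as a check."""
    return (f(x + step) - f(x - step)) / (2 * step)


x = 2.0

# Power rule for x cubed: 3 * x**2
print(power_rule(3, x), numeric_derivative(lambda t: t ** 3, x))

# Constant multiple rule with A = 5: differentiate x cubed, then multiply by A
A = 5
print(A * power_rule(3, x), numeric_derivative(lambda t: A * t ** 3, x))
```

In each printed pair the first number comes from the rule and the second from the numerical estimate, so the two should agree to several decimal places.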
The derivative of the function f of x is sometimes written as f prime of x.
The derivative of the sum of two functions f of x and g of x is equal to the derivative of f of x plus the derivative of g of x, as we can see in these two examples.
We already mentioned that the derivative of a constant equals zero. The derivative of a function multiplied by a constant is equal to the derivative of the function multiplied by that constant. We also said the derivative of the sum of two functions f of x plus g of x is equal to the derivative of f of x plus the derivative of g of x.
The difference rule works the same way as the sum rule.
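The sum and difference rules can be checked in the same way. The two functions below, x cubed and 5 x squared, are examples we picked for this sketch and are not the ones shown on the slides.

```python
def numeric_derivative(f, x, step=1e-6):
    """Central-difference estimate of the derivative at x."""
    return (f(x + step) - f(x - step)) / (2 * step)


def f(x):
    return x ** 3          # by the power rule, f'(x) = 3x^2


def g(x):
    return 5 * x ** 2      # by the constant multiple rule, g'(x) = 10x


def f_prime(x):
    return 3 * x ** 2


def g_prime(x):
    return 10 * x


x = 2.0

# Sum rule: (f + g)' = f' + g'
sum_by_rule = f_prime(x) + g_prime(x)
sum_numeric = numeric_derivative(lambda t: f(t) + g(t), x)

# Difference rule: (f - g)' = f' - g'
diff_by_rule = f_prime(x) - g_prime(x)
diff_numeric = numeric_derivative(lambda t: f(t) - g(t), x)

print(sum_by_rule, sum_numeric)
print(diff_by_rule, diff_numeric)
```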
The product rule says the derivative of the product of two functions is equal to the derivative of the first function multiplied by the second function, plus the first function multiplied by the derivative of the second function.
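As a small worked example of the product rule, take f(x) = x² and g(x) = x³. These are our own choice of functions, not ones from the slides. Their product is x⁵, so the power rule gives us an answer to compare against:

```latex
\frac{d}{dx}\left[x^{2}\cdot x^{3}\right]
  = \underbrace{2x}_{f'(x)}\cdot x^{3} + x^{2}\cdot\underbrace{3x^{2}}_{g'(x)}
  = 2x^{4} + 3x^{4}
  = 5x^{4}
  = \frac{d}{dx}\,x^{5}
```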
The derivative of sin x is equal to cos x. The derivative of cos x is equal to minus sin x. The derivative of e raised to the power x, where e is Euler’s number, is the same e raised to the power x.
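These three derivatives can be sanity-checked numerically with Python's standard `math` module. The helper `numeric_derivative` and the test point x = 0.5 are assumptions made for this sketch.

```python
import math


def numeric_derivative(f, x, step=1e-6):
    """Central-difference estimate of the derivative at x."""
    return (f(x + step) - f(x - step)) / (2 * step)


x = 0.5

# Each line prints the numerical estimate next to the value the rule predicts.
print(numeric_derivative(math.sin, x), math.cos(x))    # d/dx sin x = cos x
print(numeric_derivative(math.cos, x), -math.sin(x))   # d/dx cos x = -sin x
print(numeric_derivative(math.exp, x), math.exp(x))    # d/dx e^x = e^x
```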
The chain rule is the one that is heavily applied in backpropagation.
It states that the derivative of a composite function is the derivative of the outside function, evaluated at the inside function, multiplied by the derivative of the inside function.
It is often expressed as dz over dx equals dz over dy multiplied by dy over dx.
Or, (f of g) prime equals f prime of g multiplied by g prime.
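Written out in symbols, the two spoken forms of the chain rule look like this. Here y stands for the inside function g(x) and z for the outside function f(y), which is our reading of the narration:

```latex
\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx}
\qquad\text{or}\qquad
\bigl(f(g(x))\bigr)' = f'\bigl(g(x)\bigr)\cdot g'(x)
```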
Let’s see an example.
Let’s say we want to find the derivative of the function (3x + 2x²)², that is, 3x plus 2x squared in brackets, all squared.
We first find the derivative of the outside function, the square, which gives 2(3x + 2x²), and then multiply it by the derivative of the content in the bracket, which is 3 + 4x. The answer is 2(3x + 2x²)(3 + 4x).
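The chain rule example can be checked with a short sketch. It assumes the function on the slide is (3x + 2x²)², and the names `h`, `h_prime` and `numeric_derivative` are ours.

```python
def h(x):
    """The example function: (3x + 2x^2), all squared."""
    return (3 * x + 2 * x ** 2) ** 2


def h_prime(x):
    """Chain rule: derivative of the outside times derivative of the inside."""
    outside = 2 * (3 * x + 2 * x ** 2)   # derivative of "something squared"
    inside = 3 + 4 * x                   # derivative of 3x + 2x^2
    return outside * inside


def numeric_derivative(f, x, step=1e-6):
    """Central-difference estimate of the derivative, used only as a check."""
    return (f(x + step) - f(x - step)) / (2 * step)


x = 1.0
print(h_prime(x))                 # 2 * (3 + 2) * (3 + 4) = 70
print(numeric_derivative(h, x))   # should be very close to 70
```

Backpropagation applies the same idea many times over: each layer plays the role of an inside function for the layer that follows it.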