Deep Learning Basics

Lecture notes written for teaching undergraduate students in the Audio Signal Processing lecture.

Taein Kim

May 02, 2022

Transcript

  1. Machine Learning (Source: [Link], page 19)

    - Artificial Intelligence (AI): Subfield of computer science concerned with solving tasks humans are good at (natural language, speech, image recognition, etc.)
    - Machine Learning: The field of study that gives computers the ability to learn without being explicitly programmed*
      * Arthur L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers", IBM Journal of Research and Development 3.3 (1959), pp. 210-229
    - Typical tasks: Regression, Classification
  2. Deep Learning

    - Deep Learning: Part of a broader family of machine learning methods based on artificial neural networks with representation learning
    - Artificial Neural Network: Computing systems vaguely inspired by the biological neural networks that constitute animal brains
    - Representation Learning: A set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data [Source] [Source]
  3. Perceptron 01.

    - Perceptron: Outputs a signal from multiple inputs (see the sketch below)
    [Figure: perceptron with inputs x1, x2, weights w1, w2, bias b, and output y at the neuron (node)]
    - A signal is a flow: perceptron signals create a flow and transmit information forward
    - A perceptron signal can take one of two values: 'flow / no flow'
    - The neuron outputs 1 only when the weighted sum of the input signals exceeds a specified threshold
    - w1, w2 (weights): Determine how strongly each input signal affects the result
    - b (bias): Determines how easily the neuron activates
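    A minimal sketch of a single perceptron in Python, following the rule above that the neuron outputs 1 only when the weighted sum plus bias exceeds 0; the weight and bias values are illustrative, not taken from the slide:

    ```python
    def perceptron(x1, x2, w1=0.5, w2=0.5, b=-0.7):
        # Output 1 only when the weighted sum of inputs plus bias exceeds 0
        total = w1 * x1 + w2 * x2 + b
        return 1 if total > 0 else 0

    # With these illustrative weights the perceptron behaves like an AND gate
    print([perceptron(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
    ```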
  4. Perceptron 01. Limits of a single Perceptron: Linear structure

    - OR gate with a perceptron: Can be implemented
    - XOR gate with a perceptron: Cannot be implemented (not linearly separable)
    - Single-layer perceptron: Cannot separate a non-linear region
    - Multi-layer perceptron: Can represent a non-linear region (see the sketch below)
    [Figure: two-layer perceptron with inputs x1, x2, intermediate signals s1, s2, and output y]
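    A sketch of how the XOR gate can be built by stacking perceptrons (NAND and OR feeding into AND), illustrating why a multi-layer perceptron can represent a non-linear region; the gate weights are illustrative choices, not from the slide:

    ```python
    def gate(x1, x2, w1, w2, b):
        return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

    def AND(x1, x2):  return gate(x1, x2, 0.5, 0.5, -0.7)
    def OR(x1, x2):   return gate(x1, x2, 0.5, 0.5, -0.2)
    def NAND(x1, x2): return gate(x1, x2, -0.5, -0.5, 0.7)

    def XOR(x1, x2):
        # Two layers: s1 = NAND(x1, x2), s2 = OR(x1, x2), then AND(s1, s2)
        s1, s2 = NAND(x1, x2), OR(x1, x2)
        return AND(s1, s2)

    print([XOR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
    ```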
  5. Neural Networks 02. Neural Network = Multi-layered Perceptrons

    [Figure: Input Layer → Hidden Layer → Output Layer]
    - Input Layer: The layer where the training data features are fed in; it has as many neurons as the feature dimensions
    - Hidden Layer: Every layer between the input layer and the output layer; multiple nonlinear discriminant functions of the input are learned here
    - Output Layer: Outputs the value corresponding to the input data
  6. Perceptron and the activation function 02.

    [Figure: perceptron with inputs x1, x2, weights w1, w2, and output y, where h(x) = y is the activation function]
    - Activation function h(x): Determines whether the weighted sum of the input signals causes activation
    - Using a linear activation function negates the point of deepening the layers
    - With a linear activation the network is equivalent to a "network without hidden layers", no matter how many layers are stacked
    - Composition of linear functions: if h(x) = cx, then f(x) = h(h(h(x))) = c*c*c*x = ax, which is still linear
    - Therefore we use nonlinear activation functions in multi-layer perceptrons (neural networks); see the sketch below
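    A small sketch contrasting two common nonlinear activation functions (sigmoid and ReLU) used in place of a linear h(x); the example values are arbitrary:

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        return np.maximum(0, x)

    x = np.array([-1.0, 0.0, 2.0])
    print(sigmoid(x))  # smooth values between 0 and 1
    print(relu(x))     # [0. 0. 2.]
    ```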
  7. Implementing a 3-layered Neural Network 02.

    [Figure: network with inputs x1, x2, hidden units a1, a2, a3, outputs y1, y2, and weights such as $w_{11}^{(1)}$, $w_{12}^{(1)}$]
    ※ Bias is ignored in the figure
    First-layer signal:
    $A^{(1)} = X W^{(1)} + B^{(1)}$
    $X = [x_1 \; x_2]$,  $A^{(1)} = [a_1 \; a_2 \; a_3]$,  $B^{(1)} = [b_1 \; b_2 \; b_3]$
    $W^{(1)} = \begin{pmatrix} w_{11} & w_{21} & w_{31} \\ w_{12} & w_{22} & w_{32} \end{pmatrix}$
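    A minimal NumPy sketch of the first-layer computation A(1) = X W(1) + B(1) above, followed by a nonlinear activation; the numeric values are placeholders chosen only so the shapes match (2 inputs, 3 hidden units):

    ```python
    import numpy as np

    X = np.array([1.0, 0.5])              # inputs x1, x2
    W1 = np.array([[0.1, 0.3, 0.5],
                   [0.2, 0.4, 0.6]])      # 2x3 weight matrix W(1)
    B1 = np.array([0.1, 0.2, 0.3])        # biases b1, b2, b3

    A1 = np.dot(X, W1) + B1               # a1, a2, a3
    Z1 = 1.0 / (1.0 + np.exp(-A1))        # sigmoid activation of the first layer
    print(A1, Z1)
    ```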
  8. Designing the output layer 02.

    - Neural networks can be used for both classification and regression
    - Regression: Identity function; Classification: Softmax function
    - Identity function: Outputs the input signal as-is
    - Softmax function: Output signals can be interpreted as probabilities; like probabilities, all output values sum to 1 (see the sketch below)
    [Figure: output layer applying σ() to a1, a2 to produce y1, y2]
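    A sketch of the softmax function for the classification output layer; subtracting the maximum before exponentiating is a standard overflow guard, not something stated on the slide:

    ```python
    import numpy as np

    def softmax(a):
        a = a - np.max(a)          # subtract max to avoid overflow in exp
        exp_a = np.exp(a)
        return exp_a / np.sum(exp_a)

    y = softmax(np.array([0.3, 2.9, 4.0]))
    print(y, y.sum())              # outputs behave like probabilities and sum to 1.0
    ```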
  9. Training 03. Training a Neural Network

    - Training: Automatically obtaining the optimal values of the weight parameters from the training data
    - Loss function: An indicator that enables neural networks to learn
    - Gradient method: A technique for making the value of the loss function as small as possible
    - End-to-end machine learning: the network learns directly from raw input to result
    - Learning and evaluation are repeated by splitting the data into training and test sets; optimal parameters are found using the training data only, and the test data is evaluated separately to measure general performance
    - Overfitting: Over-optimization for a specific dataset only
    [Figure: classical pipeline "Input → Feature (SIFT, HOG, etc.) → Machine Learning (e.g. SVM) → Result" vs. end-to-end "Input → Neural Network (Deep Learning) → Result"]
  10. 03. Loss function

    - Loss function: An indicator that enables neural networks to learn
    - Neural network learning expresses the current state as this indicator, then finds the weight parameter values that make the indicator best
    - Typical choices are SSE and CEE (see below)
    - Neural network learning is the process of finding parameters that make the loss function as small as possible
    - The minimum of the loss function is searched for via the derivative of the loss function with respect to the weight parameters
    - The loss function is used as the indicator (rather than accuracy) because it changes continuously as the parameters change, whereas accuracy barely responds to small parameter changes
    - Sum of squares for error (SSE) and cross entropy* error (CEE); see the sketch below
      * Entropy: A measure of uncertainty (the cross entropy measures the difference between the actual and predicted values)
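    A minimal sketch of the two loss functions named above, SSE and cross-entropy error, for a single one-hot target; the small epsilon inside the log is a standard guard against log(0) and the example values are arbitrary:

    ```python
    import numpy as np

    def sum_squares_error(y, t):
        # SSE: half the sum of squared differences between prediction y and target t
        return 0.5 * np.sum((y - t) ** 2)

    def cross_entropy_error(y, t):
        # CEE for a one-hot target t; eps avoids log(0)
        eps = 1e-7
        return -np.sum(t * np.log(y + eps))

    t = np.array([0, 0, 1, 0])            # one-hot ground truth
    y = np.array([0.1, 0.1, 0.7, 0.1])    # predicted probabilities
    print(sum_squares_error(y, t), cross_entropy_error(y, t))
    ```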
  11. 03. Gradient method

    - Optimal parameters = the parameter values at which the loss function is minimal
    - Gradient method: Move in the direction indicated by the gradient, recompute the gradient at the new position, and repeat to search for the minimum
    - η: Learning rate* (a hyperparameter)
      * Hyperparameter: A parameter that must be set manually by a person
    - Steps of neural network training (see the sketch below):
      1) Mini-batch: Randomly select part of the training data
      2) Calculate gradient: Compute the gradient of the loss with respect to each weight parameter
      3) Update parameters: Update the weight parameters very slightly in the direction indicated by the gradient
      4) Repeat steps 1 to 3
    - Because the data is randomized as mini-batches, this is called stochastic gradient descent (SGD)
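    A toy sketch of the gradient-descent update W ← W − η · ∂L/∂W on a simple quadratic loss whose gradient is known in closed form; the loss, starting point, and learning rate are all illustrative assumptions, not from the slide:

    ```python
    import numpy as np

    target = np.array([1.0, -2.0])

    def loss(w):
        return np.sum((w - target) ** 2)   # toy loss, minimal at w = target

    def gradient(w):
        return 2.0 * (w - target)

    w = np.zeros(2)                        # initial parameters
    eta = 0.1                              # learning rate (hyperparameter)
    for step in range(100):
        w -= eta * gradient(w)             # small step against the gradient
    print(w, loss(w))                      # w approaches [1, -2], loss approaches 0
    ```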
  12. Backpropagation 04. Backward propagation of the error

    - Gradient descent with numerical differentiation requires a lot of computation time (see the sketch below)
    - Backpropagation is used to calculate the gradients of the weight parameters efficiently
    [Figure: computational graph example (100 × 2 × 1.1 = 220) showing forward propagation and backward propagation]
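    For contrast, a sketch of numerical differentiation (central difference), which needs two loss evaluations per parameter and therefore becomes slow for networks with many weights; the toy loss is an illustrative assumption:

    ```python
    import numpy as np

    def numerical_gradient(f, w, h=1e-4):
        # Central difference: two evaluations of f for every single parameter
        grad = np.zeros_like(w)
        for i in range(w.size):
            orig = w[i]
            w[i] = orig + h
            f_plus = f(w)
            w[i] = orig - h
            f_minus = f(w)
            grad[i] = (f_plus - f_minus) / (2 * h)
            w[i] = orig
        return grad

    f = lambda w: np.sum(w ** 2)                          # toy loss
    print(numerical_gradient(f, np.array([3.0, -1.0])))  # approximately [6, -2]
    ```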
  13. Backpropagation 04.

    - Backpropagation transfers 'local derivatives' from right to left, the opposite direction of forward propagation
    - This can be justified with the chain rule
    - Backpropagation through an addition node: Multiply by 1 (pass the upstream gradient on unchanged)
    - Backpropagation through a multiplication node: Swap the two forward-pass inputs and multiply by the upstream gradient (see the sketch below)
    [Figure: node f with input x and output y; the upstream gradient E becomes E(∂y/∂x)]
    Source: http://wiki.hash.kr/index.php/%EC%97%AD%EC%A0%84%ED%8C%8C
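    A minimal sketch of addition and multiplication nodes with forward and backward passes, following the two rules above; the class names and the 100 × 2 × 1.1 values mirror the figure and are otherwise illustrative:

    ```python
    class MulLayer:
        def forward(self, x, y):
            self.x, self.y = x, y
            return x * y
        def backward(self, dout):
            # Multiplication node: swap the stored inputs and multiply by the upstream gradient
            return dout * self.y, dout * self.x

    class AddLayer:
        def forward(self, x, y):
            return x + y
        def backward(self, dout):
            # Addition node: pass the upstream gradient to both inputs unchanged
            return dout * 1, dout * 1

    # Forward: 100 * 2 = 200, then 200 * 1.1 = 220
    price_layer, tax_layer = MulLayer(), MulLayer()
    out = tax_layer.forward(price_layer.forward(100, 2), 1.1)
    # Backward: start from d(out)/d(out) = 1
    d200, d_tax = tax_layer.backward(1.0)
    d_price, d_qty = price_layer.backward(d200)
    print(out, d_price, d_qty, d_tax)   # ~220.0 2.2 110.0 200 (up to floating-point rounding)
    ```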
  14. Implementing layers 04.

    - Activation function layers: ReLU, Sigmoid
    - Affine and Softmax layers
    - Affine layer backward pass: Apply the transposed weight matrix (see the sketch below)
    [Figure: computational graphs for the ReLU, Sigmoid, Affine, and Softmax layers]
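    A sketch of ReLU and Affine layers with forward/backward methods; the Affine backward applies the transposed matrices as noted above. The class structure and shapes are illustrative assumptions:

    ```python
    import numpy as np

    class Relu:
        def forward(self, x):
            self.mask = (x <= 0)
            out = x.copy()
            out[self.mask] = 0
            return out
        def backward(self, dout):
            dout = dout.copy()
            dout[self.mask] = 0          # no gradient flows where the input was <= 0
            return dout

    class Affine:
        def __init__(self, W, b):
            self.W, self.b = W, b
        def forward(self, x):
            self.x = x
            return np.dot(x, self.W) + self.b
        def backward(self, dout):
            dx = np.dot(dout, self.W.T)       # transposed weight matrix
            self.dW = np.dot(self.x.T, dout)  # gradient w.r.t. the weights
            self.db = np.sum(dout, axis=0)    # gradient w.r.t. the bias
            return dx
    ```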
  15. Conclusion

    1. Neural networks are composed of multi-layered perceptrons
    2. Neural networks consist of compositions of activation functions
    3. The activation function determines the output value from the weights and biases
    4. Neural network learning means determining the weights and biases from known results (labeled data)
    5. The goal of learning the weights is to minimize the loss function
    6. The loss function is minimized by computing derivatives and applying gradient descent
    7. Since numerical differentiation requires huge computing time, backpropagation is used instead
  16. Homework

    • Create a new project named ‘AudioDSPWeek10’ and add `train.py` and `eval.py` modules
    • Run each module, `train.py` and `eval.py`, and show the results (see the screenshot below)
    • Explain the meaning of each step