
Week 4_Building your Deep Neural Network Step by Step Lab (Andrew Ng)

by HYUNHP 2022. 2. 13.

Hello, HELLO

 

Today I'd like to walk through "Neural Networks and Deep Learning," the first course of Andrew Ng's Deep Learning Specialization on DeepLearning.AI. The course aims to help you understand the capabilities, challenges, and consequences of deep learning, and to give you the knowledge and skills to apply machine learning to your work, raise your technical level, and take the next step in the AI field. It is organized as follows.

 

~ Introduction

~ Basics of Neural Network programming

~ One hidden layer Neural Networks

~ Deep Neural Networks

 

"Neural Networks and Deep Learning" (Andrew Ng)에서 지금까지 배운 내용을 바탕으로 레이어 활성화 함수로 Relu 함수를 적용하고, 출력 함수로 sigmoid 함수를 적용하는 Deep neural network를 구현하고자 합니다. 추후에 "Neural Networks and Deep Learning"을 듣게 되신다면, 정답 코드가 궁금하신 분들은 참고하시기 바랍니다.

 

By the end of this lab, you will be able to:

 

  • Use non-linear units like ReLU to improve your model
  • Build a deeper neural network (with more than 1 hidden layer)
  • Implement an easy-to-use neural network class

CHAPTER 1. 'Packages'

 

CHAPTER 2. 'Outline'

 

CHAPTER 3. 'Initialization'

 

CHAPTER 4. 'Forward Propagation Module'

 

CHAPTER 5. 'Cost Function'

 

CHAPTER 6. 'Backward Propagation Module'


CHAPTER 1. 'Packages'

 

The packages used in this lab are as follows.

  • numpy is the main package for scientific computing with Python.
  • matplotlib is a library to plot graphs in Python.
  • dnn_utils provides some necessary functions for this notebook.
  • testCases provides some test cases to assess the correctness of your functions.
  • np.random.seed(1) is used to keep all the random function calls consistent. It helps grade your work. Please don't change the seed.

 

import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases import *
from dnn_utils import sigmoid, sigmoid_backward, relu, relu_backward
from public_tests import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)
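
 

The four helpers imported from dnn_utils are not shown in the notebook. As a rough sketch of what they do, assuming the course's usual convention that each forward helper returns the activation together with Z as its cache:

 

import numpy as np

def sigmoid(Z):
    """Sigmoid activation; returns A and the cache (Z) used by the backward pass."""
    A = 1 / (1 + np.exp(-Z))
    return A, Z

def relu(Z):
    """ReLU activation; returns A and the cache (Z) used by the backward pass."""
    A = np.maximum(0, Z)
    return A, Z

def sigmoid_backward(dA, cache):
    """dZ = dA * s(Z) * (1 - s(Z)), the chain rule through sigmoid."""
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def relu_backward(dA, cache):
    """dZ = dA where Z > 0 and 0 elsewhere, the chain rule through ReLU."""
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ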

 

 

CHAPTER 2. 'Outline'

 

To build the network, we first define the helper functions listed below; in the next assignment, these functions will be used to build both a two-layer neural network and an L-layer neural network.

 

  • A function that initializes the parameters of an L-layer network
  • A function that performs forward propagation
  • A function that computes the cost
  • A function that performs backward propagation
  • A function that updates the parameters

 


 

CHAPTER 3. 'Initialization'

 

□ 3.1. 2-layer Neural Network

 

  • The two-layer network has the structure Linear -> ReLU -> Linear -> Sigmoid.
  • The weights W are randomly initialized with np.random.randn(shape) * 0.01; the constant 0.01 is simply the scale chosen for this lab.
  • The biases b can safely be initialized to zero, so np.zeros(shape) is used.
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(1)
    
    #(≈ 4 lines of code)
    # W1 = ...
    # b1 = ...
    # W2 = ...
    # b2 = ...
    # YOUR CODE STARTS HERE
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
        
    # YOUR CODE ENDS HERE
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

 

Checking the randomly initialized parameters gives the output below.

 

parameters = initialize_parameters(3,2,1)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

initialize_parameters_test(initialize_parameters)

 


□ 3.2. L-layer Neural Network

 

Next, let's implement a neural network with L layers. At every layer, the computation must respect the dimensions of the parameters W and b. Writing n^[l] for the number of units in layer l, the parameter dimensions of each layer can be summarized as follows. (The input X is (12288, 209), with m = 209 examples.)

 

 

For layer l, the weight matrix W has shape (n^[l], n^[l-1]) and the bias vector b has shape (n^[l], 1). When WA + b is computed in Python, the bias b of shape (n^[l], 1) is broadcast across every column, so it is added to each of the m examples.
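
 

As a quick sanity check on those shapes, here is a toy sketch (the sizes 3, 2, and m = 4 are made up for illustration) showing numpy broadcasting handling the bias:

 

import numpy as np

W = np.random.randn(2, 3)       # (n^[l], n^[l-1])
A_prev = np.random.randn(3, 4)  # (n^[l-1], m)
b = np.random.randn(2, 1)       # (n^[l], 1)

Z = np.dot(W, A_prev) + b       # b is broadcast across the m columns
print(Z.shape)                  # (2, 4), i.e. (n^[l], m)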

 

# GRADED FUNCTION: initialize_parameters_deep

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network
    
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims) # number of layers in the network, including the input layer

    for l in range(1, L):
        #(≈ 2 lines of code)
        # parameters['W' + str(l)] = ...
        # parameters['b' + str(l)] = ...
        # YOUR CODE STARTS HERE
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        
        # YOUR CODE ENDS HERE
        
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l - 1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

        
    return parameters

 

Checking the randomly initialized parameters gives the output below.

 

parameters = initialize_parameters_deep([5,4,3])

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

initialize_parameters_deep_test(initialize_parameters_deep)
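
 

If you want to sanity-check a deeper configuration, a short loop over the returned dictionary prints every parameter's shape. (The [12288, 20, 7, 5, 1] architecture here is the one used in the follow-up assignment; any list of layer sizes works.)

 

params = initialize_parameters_deep([12288, 20, 7, 5, 1])
for key in sorted(params):
    print(key, params[key].shape)
# W1 (20, 12288), W2 (7, 20), W3 (5, 7), W4 (1, 5), then b1 (20, 1), ..., b4 (1, 1)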

 


CHAPTER 4. 'Forward Propagation Module'

 

□ 4.1. Linear Forward

 

In CHAPTER 3 we initialized the parameters of the two-layer and L-layer networks; now let's implement forward propagation. We will write the following three functions in order.

 

  • LINEAR
  • LINEAR -> ACTIVATION where ACTIVATION will be either ReLU or Sigmoid.
  • [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID (whole model)

The linear forward step computes Z^[l] = W^[l] A^[l-1] + b^[l] and can be implemented as follows.

# GRADED FUNCTION: linear_forward

def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter 
    cache -- a python tuple containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
    
    #(≈ 1 line of code)
    # Z = ...
    # YOUR CODE STARTS HERE
    Z = np.dot(W, A) + b
    # YOUR CODE ENDS HERE
    cache = (A, W, b)
    
    return Z, cache

 

Running the linear_forward function gives the output below.

 

t_A, t_W, t_b = linear_forward_test_case()
t_Z, t_linear_cache = linear_forward(t_A, t_W, t_b)
print("Z = " + str(t_Z))

linear_forward_test(linear_forward)

 


□ 4.2. Linear-Activation Forward

 

Two kinds of activation functions are used: sigmoid and ReLU. The hidden layers use ReLU, while the final output layer uses sigmoid. Sigmoid is used at the output because it squashes any real value into (0, 1), which makes the result easy to interpret as a probability in binary classification.
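
 

As a toy illustration (the scores below are made up for this post), sigmoid turns arbitrary scores into values in (0, 1) that can be thresholded at 0.5 for a binary decision:

 

import numpy as np

z = np.array([[-2.0, 0.0, 3.0]])
a = 1 / (1 + np.exp(-z))       # every entry lands in (0, 1)
pred = (a > 0.5).astype(int)   # threshold at 0.5 for the class label
print(a)                       # [[0.1192... 0.5 0.9525...]]
print(pred)                    # [[0 0 1]]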

 

# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value 
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    
    if activation == "sigmoid":
        #(≈ 2 lines of code)
        # Z, linear_cache = ...
        # A, activation_cache = ...
        # YOUR CODE STARTS HERE
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        
        # YOUR CODE ENDS HERE
    
    elif activation == "relu":
        #(≈ 2 lines of code)
        # Z, linear_cache = ...
        # A, activation_cache = ...
        # YOUR CODE STARTS HERE
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        
        # YOUR CODE ENDS HERE
    cache = (linear_cache, activation_cache)

    return A, cache

 

Running the linear_activation_forward function gives the output below.

 

t_A_prev, t_W, t_b = linear_activation_forward_test_case()

t_A, t_linear_activation_cache = linear_activation_forward(t_A_prev, t_W, t_b, activation = "sigmoid")
print("With sigmoid: A = " + str(t_A))

t_A, t_linear_activation_cache = linear_activation_forward(t_A_prev, t_W, t_b, activation = "relu")
print("With ReLU: A = " + str(t_A))

linear_activation_forward_test(linear_activation_forward)

 

 


□ 4.3. L-Layer Model


Next, we implement a function that runs forward propagation through all of the layers.

  • Use the functions you've previously written
  • Use a for loop to replicate [LINEAR->RELU] (L-1) times
  • Don't forget to keep track of the caches in the "caches" list. To add a new value c to a list, you can use list.append(c).

 

# GRADED FUNCTION: L_model_forward

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
    
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
    
    Returns:
    AL -- activation value from the output (last) layer
    caches -- list of caches containing:
                every cache of linear_activation_forward() (there are L of them, indexed from 0 to L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network
    
    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    # The for loop starts at 1 because layer 0 is the input
    for l in range(1, L):
        A_prev = A 
        #(≈ 2 lines of code)
        # A, cache = ...
        # caches ...
        # YOUR CODE STARTS HERE
        A, cache = linear_activation_forward(A_prev, parameters['W'+ str(l)], parameters['b' + str(l)], activation = "relu")
        caches.append(cache)
        
        # YOUR CODE ENDS HERE
    
    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    #(≈ 2 lines of code)
    # AL, cache = ...
    # caches ...
    # YOUR CODE STARTS HERE
    AL, cache = linear_activation_forward(A, parameters['W'+ str(L)], parameters['b' + str(L)],activation = "sigmoid")
    caches.append(cache)
    
    # YOUR CODE ENDS HERE
          
    return AL, caches

 

Running the L_model_forward function gives the output below.

 

t_X, t_parameters = L_model_forward_test_case_2hidden()
t_AL, t_caches = L_model_forward(t_X, t_parameters)

print("AL = " + str(t_AL))

L_model_forward_test(L_model_forward)

 


CHAPTER 5. 'Cost Function'

 

Now let's write the cost function, which measures how far the predictions AL are from the true labels Y. The cross-entropy cost referred to as equation (7) in the notebook is J = -(1/m) * Σ_i [ y^(i) log(a^(i)) + (1 - y^(i)) log(1 - a^(i)) ].

# GRADED FUNCTION: compute_cost

def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    
    m = Y.shape[1]

    # Compute loss from aL and y.
    # (≈ 1 lines of code)
    # cost = ...
    # YOUR CODE STARTS HERE
    cost = (-1/m) * np.sum(Y * np.log(AL) + (1-Y)*np.log(1-AL), axis = 1)
    
    # YOUR CODE ENDS HERE
    
    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).

    
    return cost

 

Running the compute_cost function gives the output below.

 

t_Y, t_AL = compute_cost_test_case()
t_cost = compute_cost(t_AL, t_Y)

print("Cost: " + str(t_cost))

compute_cost_test(compute_cost)
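
 

As a quick hand check with toy numbers (not the grader's test case): with AL = [[0.8, 0.1]] and Y = [[1, 0]], the cost is -(log 0.8 + log 0.9) / 2 ≈ 0.1643.

 

AL = np.array([[0.8, 0.1]])  # toy predicted probabilities
Y = np.array([[1, 0]])       # toy true labels

print(compute_cost(AL, Y))   # ~0.1643 = -(log 0.8 + log 0.9) / 2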

 


CHAPTER 6. 'Backward Propagation Module'

 

Using the outputs and caches stored during forward propagation, we now implement backward propagation to compute the gradients of the cost function. Just like the forward pass, it is built in three steps.

  1. LINEAR backward
  2. LINEAR -> ACTIVATION backward where ACTIVATION computes the derivative of either the ReLU or sigmoid activation
  3. [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID backward (whole model)

□ 6.1. Linear Backward

 

# GRADED FUNCTION: linear_backward

def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    #(≈ 3 lines of code)
    # dW = ...
    # db = ... sum by the rows of dZ with keepdims=True
    # dA_prev = ...
    # YOUR CODE STARTS HERE
    dW = (1/m) * np.dot(dZ, A_prev.T)
    db = (1/m) * np.sum(dZ, axis = 1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    
    # YOUR CODE ENDS HERE
    
    return dA_prev, dW, db

 

Running the linear_backward function gives the output below.

 

t_dZ, t_linear_cache = linear_backward_test_case()
t_dA_prev, t_dW, t_db = linear_backward(t_dZ, t_linear_cache)

print("dA_prev: " + str(t_dA_prev))
print("dW: " + str(t_dW))
print("db: " + str(t_db))

linear_backward_test(linear_backward)

 


□ 6.2. Linear-Activation Backward

 

The sigmoid_backward and relu_backward helpers are used to obtain dZ. If g is the layer's activation function and g' is its derivative, they compute dZ^[l] = dA^[l] * g'(Z^[l]).

 

# GRADED FUNCTION: linear_activation_backward

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.
    
    Arguments:
    dA -- post-activation gradient for current layer l 
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
    
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache
    
    if activation == "relu":
        #(≈ 2 lines of code)
        # dZ =  ...
        # dA_prev, dW, db =  ...
        # YOUR CODE STARTS HERE
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        
        # YOUR CODE ENDS HERE
        
    elif activation == "sigmoid":
        #(≈ 2 lines of code)
        # dZ =  ...
        # dA_prev, dW, db =  ...
        # YOUR CODE STARTS HERE
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        
        # YOUR CODE ENDS HERE
    
    return dA_prev, dW, db

 

Running the linear_activation_backward function gives the output below.

 

t_dAL, t_linear_activation_cache = linear_activation_backward_test_case()

t_dA_prev, t_dW, t_db = linear_activation_backward(t_dAL, t_linear_activation_cache, activation = "sigmoid")
print("With sigmoid: dA_prev = " + str(t_dA_prev))
print("With sigmoid: dW = " + str(t_dW))
print("With sigmoid: db = " + str(t_db))

t_dA_prev, t_dW, t_db = linear_activation_backward(t_dAL, t_linear_activation_cache, activation = "relu")
print("With relu: dA_prev = " + str(t_dA_prev))
print("With relu: dW = " + str(t_dW))
print("With relu: db = " + str(t_db))

linear_activation_backward_test(linear_activation_backward)

 


□ 6.3. L-Model Backward

 

Now we implement the backward function for the entire network. L_model_forward stored a cache of (A_prev, W, b, Z) at every layer, and the backward module uses those caches to compute the gradients. The pass is initialized with dAL = -(Y/AL - (1 - Y)/(1 - AL)), the derivative of the cross-entropy cost with respect to AL, and then walks back through the layers.

 

# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group
    
    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
    
    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ... 
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ... 
    """
    grads = {}
    L = len(caches) # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
    
    # Initializing the backpropagation
    #(1 line of code)
    # dAL = ...
    # YOUR CODE STARTS HERE
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
    
    # YOUR CODE ENDS HERE
    
    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: "grads["dAL-1"], grads["dWL"], grads["dbL"]
    #(approx. 5 lines)
    # current_cache = ...
    # dA_prev_temp, dW_temp, db_temp = ...
    # grads["dA" + str(L-1)] = ...
    # grads["dW" + str(L)] = ...
    # grads["db" + str(L)] = ...
    # YOUR CODE STARTS HERE
    current_cache = caches[L-1]
    dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, activation = 'sigmoid')
    grads["dA" + str(L-1)] = dA_prev_temp
    grads["dW" + str(L)] = dW_temp
    grads["db" + str(L)] = db_temp
    
    # YOUR CODE ENDS HERE
    
    # Loop from l=L-2 to l=0
    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 1)], current_cache". Outputs: "grads["dA" + str(l)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)] 
        #(approx. 5 lines)
        # current_cache = ...
        # dA_prev_temp, dW_temp, db_temp = ...
        # grads["dA" + str(l)] = ...
        # grads["dW" + str(l + 1)] = ...
        # grads["db" + str(l + 1)] = ...
        # YOUR CODE STARTS HERE
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dA_prev_temp, current_cache, activation="relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp        
        
        # YOUR CODE ENDS HERE

    return grads

 

Running backward propagation (L_model_backward) gives the output below.

 

t_AL, t_Y_assess, t_caches = L_model_backward_test_case()
grads = L_model_backward(t_AL, t_Y_assess, t_caches)

print("dA0 = " + str(grads['dA0']))
print("dA1 = " + str(grads['dA1']))
print("dW1 = " + str(grads['dW1']))
print("dW2 = " + str(grads['dW2']))
print("db1 = " + str(grads['db1']))
print("db2 = " + str(grads['db2']))

L_model_backward_test(L_model_backward)
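
 

If you ever doubt a backward pass, a centered finite-difference check on a single parameter entry is a quick sanity test. This is a toy sketch (gradient checking is covered properly in Course 2 of the specialization, and grad_check_entry is a name made up for this post):

 

import copy

def grad_check_entry(parameters, X, Y, key="W1", i=0, j=0, eps=1e-7):
    """Compare one analytic gradient entry with a centered finite difference."""
    p_plus, p_minus = copy.deepcopy(parameters), copy.deepcopy(parameters)
    p_plus[key][i, j] += eps
    p_minus[key][i, j] -= eps
    cost_plus = compute_cost(L_model_forward(X, p_plus)[0], Y)
    cost_minus = compute_cost(L_model_forward(X, p_minus)[0], Y)
    numeric = (cost_plus - cost_minus) / (2 * eps)
    AL, caches = L_model_forward(X, parameters)
    analytic = L_model_backward(AL, Y, caches)["d" + key][i, j]
    print(numeric, analytic)  # the two values should agree to several decimals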

 


□ 6.4. Update Parameters

 

We ran forward propagation to get the predictions, then backward propagation to get the gradients; the final step is to use those gradients to update the parameters that train the network.

Here α is the learning rate, and the update rule for every layer l is W^[l] = W^[l] - α·dW^[l] and b^[l] = b^[l] - α·db^[l]; the updated values are stored in parameters.

 

# GRADED FUNCTION: update_parameters

def update_parameters(params, grads, learning_rate):
    """
    Update parameters using gradient descent
    
    Arguments:
    params -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients, output of L_model_backward
    learning_rate -- the learning rate, a scalar
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
                  parameters["W" + str(l)] = ... 
                  parameters["b" + str(l)] = ...
    """
    parameters = params.copy()
    L = len(parameters) // 2 # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    #(≈ 2 lines of code)
    for l in range(L):
        # parameters["W" + str(l+1)] = ...
        # parameters["b" + str(l+1)] = ...
        # YOUR CODE STARTS HERE
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]  
        # YOUR CODE ENDS HERE
                                                                                        
    return parameters

 

Running update_parameters gives the updated parameter values shown below; in the next lab, the network is trained using these functions and the updated parameters.

 

t_parameters, grads = update_parameters_test_case()
t_parameters = update_parameters(t_parameters, grads, 0.1)

print ("W1 = "+ str(t_parameters["W1"]))
print ("b1 = "+ str(t_parameters["b1"]))
print ("W2 = "+ str(t_parameters["W2"]))
print ("b2 = "+ str(t_parameters["b2"]))

update_parameters_test(update_parameters)
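
 

Putting everything together, here is a minimal sketch of how the functions above compose into a training loop. (The layer sizes, learning rate, iteration count, and the train_x / train_y arrays are hypothetical placeholders, not part of this assignment's graded code.)

 

def train(train_x, train_y, layer_dims, learning_rate=0.0075, num_iterations=2500):
    """Toy loop: initialize, then repeat forward -> cost -> backward -> update."""
    parameters = initialize_parameters_deep(layer_dims)
    for i in range(num_iterations):
        AL, caches = L_model_forward(train_x, parameters)   # forward pass
        cost = compute_cost(AL, train_y)                    # cross-entropy cost
        grads = L_model_backward(AL, train_y, caches)       # backward pass
        parameters = update_parameters(parameters, grads, learning_rate)
        if i % 100 == 0:
            print("Cost after iteration " + str(i) + ": " + str(cost))
    return parameters

# Hypothetical usage: X of shape (n_x, m), Y of shape (1, m)
# trained = train(train_x, train_y, layer_dims=[12288, 20, 7, 5, 1])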

 


■ Wrap-up

 

"Neural Networks and Deep Learning" (Andrew Ng)에서 지금까지 배운 내용을 바탕으로 레이어 활성화 함수로 Relu 함수를 적용하고, 출력 함수로 sigmoid 함수를 적용하는 Deep neural network 코드에 대해서 정리해봤습니다.

 

I hope you have another pleasant day today.

Please leave a like and a comment :)

 

Thank you.

 
