Why is Cross Entropy usually preferred over MSE for classification tasks?

To understand this, let's look at how backpropagation works through several kinds of layers.

conv layer → activation function → loss function → differentiation and backpropagation → weight update by gradient descent

$$ \big(f(g(x))\big)^\prime = \frac{\partial f}{\partial x} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x} $$
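The chain rule above can be checked numerically. The following is a minimal sketch, not something from the original notes: the functions `g(x) = 3x + 1` and `f(u) = u**2` are hypothetical examples chosen only for illustration. It compares the chain-rule gradient with a finite-difference estimate.

```python
# Hypothetical example functions (assumptions for illustration only):
# g(x) = 3x + 1 and f(u) = u**2, so f(g(x)) = (3x + 1)**2.
def g(x):
    return 3.0 * x + 1.0

def f(u):
    return u ** 2

def dg_dx(x):
    # Derivative of g with respect to x.
    return 3.0

def df_dg(u):
    # Derivative of f with respect to its input u = g(x).
    return 2.0 * u

def chain_rule_grad(x):
    # df/dx = df/dg * dg/dx
    return df_dg(g(x)) * dg_dx(x)

def numerical_grad(x, eps=1e-6):
    # Central-difference approximation of d f(g(x)) / dx.
    return (f(g(x + eps)) - f(g(x - eps))) / (2.0 * eps)

x = 2.0
analytic = chain_rule_grad(x)  # 2 * (3*2 + 1) * 3
numeric = numerical_grad(x)
print(analytic, numeric)
```

The two values should agree closely, which is the property backpropagation relies on when it multiplies local derivatives layer by layer.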

ReLU

$$ f(x) = \max(0, x) $$

Differentiating it gives

$$ f^\prime(x) = \begin{cases} 1 & (x > 0) \\ 0 & (x < 0) \end{cases} $$

If the input x was greater than 0 during forward propagation, backpropagation passes the upstream gradient downstream unchanged.

If the input x was less than 0, backpropagation sends 0 downstream (by the chain rule, the gradient becomes 0).
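The ReLU behavior described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `relu_forward` and `relu_backward` and the sample values are hypothetical, and the gradient at exactly x = 0 is set to 0 by convention.

```python
def relu_forward(xs):
    # Forward pass: keep positive inputs, replace the rest with 0.
    return [x if x > 0 else 0.0 for x in xs]

def relu_backward(xs, upstream):
    # Backward pass: pass the upstream gradient through where the
    # forward input was positive, and send 0 downstream otherwise.
    # (Treating x == 0 as "not positive" is an assumed convention.)
    return [g if x > 0 else 0.0 for x, g in zip(xs, upstream)]

xs = [-2.0, -0.5, 1.0, 3.0]        # inputs seen in the forward pass
upstream = [0.1, 0.2, 0.3, 0.4]    # gradients arriving from the next layer

out = relu_forward(xs)
grads = relu_backward(xs, upstream)
print(out)    # [0.0, 0.0, 1.0, 3.0]
print(grads)  # [0.0, 0.0, 0.3, 0.4]
```

Note that the backward pass needs the forward inputs (or at least their signs), which is why layer implementations typically cache them.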


Sigmoid

[Figure: graph of the sigmoid function and its derivative]

The blue curve is the sigmoid function, and the red curve is its derivative.

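$$ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma^\prime(x) = \sigma(x)\,(1 - \sigma(x)) $$

This is the derivative. You don't need to follow the derivation; just remember that the derivative is always below 0.3 (its maximum is 0.25, reached at x = 0).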
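The bound on the sigmoid derivative can be checked numerically. The sketch below is an illustration rather than part of the original notes; the helper names `sigmoid` and `sigmoid_grad` and the sampled range [-10, 10] are assumptions. It uses the standard identity that the sigmoid's derivative equals `s * (1 - s)`, where `s` is the sigmoid output.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid, written in terms of its own output.
    s = sigmoid(x)
    return s * (1.0 - s)

# Sample the derivative on a grid from -10 to 10 in steps of 0.1
# (an arbitrary range chosen for illustration).
xs = [i / 10.0 for i in range(-100, 101)]
grads = [sigmoid_grad(x) for x in xs]

max_grad = max(grads)
print(max_grad)           # peak value, reached at x = 0
print(sigmoid_grad(5.0))  # far from 0 the derivative is close to 0
```

On this grid the largest value is 0.25, so every local gradient that backpropagation multiplies through a sigmoid shrinks the upstream gradient to a quarter of its size or less.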
