[cs231n] Lecture 3 | Loss Functions and Optimization (Notes)

므느르으 2021. 4. 3. 14:00

📚 Stanford cs231n

Loss Functions and Optimization

✅ TODO
✔ Define a loss function that quantifies our unhappiness with the scores across the training data
✔ Come up with a way of efficiently finding the parameters that minimize the loss function (optimization)

W๊ฐ€ ์ข‹์€์ง€ ์•ˆ ์ข‹์€์ง€ ์ •๋Ÿ‰ํ™” ํ•ด์ฃผ๋Š” ๊ฒƒ์ด ๋ฐ”๋กœ Loss function ์ด๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด๋ฅผ ์ข‹์€ ์ชฝ์œผ๋กœ ๋ฐœ์ „์‹œํ‚ค๋Š” ๊ฒƒ์„ Optimization์ด๋ผ ํ•œ๋‹ค.

image

To keep the example simple, assume there are only 3 classes.
Looking at the scores W produces for the three images, only the car image is classified correctly, which means the linear classifier is not working well.

Multiclass SVM loss

image

  • How the SVM loss works
    1. Look at each category; if it is the correct category, skip it.
    2. Otherwise, compute (score of that category) - (score of the correct category) + 1, and if the result is greater than 0, add it to the loss.
    3. If the result is less than 0, that category contributes 0 to the loss.
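The three steps above can be sketched in a few lines of NumPy (a sketch; the scores are the cat/car/frog example from the slides):

```python
import numpy as np

def svm_loss(scores, y):
    """Multiclass SVM (hinge) loss for one example.

    scores: 1-D array of class scores; y: index of the correct class.
    Adds max(0, s_j - s_y + 1) for every class j except the correct one.
    """
    margins = np.maximum(0, scores - scores[y] + 1.0)  # desired margin of 1
    margins[y] = 0.0  # the correct class contributes no loss
    return margins.sum()

# Scores for (cat, car, frog) on the cat image, label = cat
print(svm_loss(np.array([3.2, 5.1, -1.7]), 0))  # ≈ 2.9, as on the slide
```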

image

โ“ ์ ์ˆ˜๊ฐ€ ๋‚ฎ์œผ๋ฉด ์ข‹์€ ๊ฒƒ์ธ๊ฐ€?
๐Ÿ‘‰ ๋งž๋‹ค. ๊ตฌํ•˜๋Š” ๊ฒƒ์ด ๊ฒฐ๊ตญ ์ •๋‹ต๊ณผ์˜ ์ฐจ์ด๊ฐ€ ์–ผ๋งˆ๋‚˜ ํฐ์ง€ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฐ’์ด๋ฏ€๋กœ ์ด ์ ์ˆ˜๊ฐ€ ๋‚ฎ์œผ๋ฉด ์ •๋‹ต๊ณผ ๋น„์Šทํ•˜๋‹ค๋Š” ์˜๋ฏธ์ด๋‹ค.
โ“ ์ž๋™์ฐจ์˜ score์„ ์กฐ๊ธˆ ๋ฐ”๊พผ๋‹ค๋ฉด?
๐Ÿ‘‰ ์ด๋ฏธ ๋‹ค๋ฅธ class์™€์˜ ๊ฒฉ์ฐจ๊ฐ€ ์žˆ์œผ๋ฏ€๋กœ ์˜ํ–ฅ ์—†์Œ. ์ฆ‰, ๋ฐ์ดํ„ฐ์˜ ๋ณ€ํ™”์— ๋‘”๊ฐํ•˜๋‹ค๊ณ  ํ•ด์„ ๊ฐ€๋Šฅํ•˜๋‹ค. score์˜ ์ˆซ์ž ๊ทธ ์ž์ฒด๋ณด๋‹ค๋Š” ์ •๋‹ต ํด๋ž˜์Šค์™€ ๋‹ค๋ฅธ ํด๋ž˜์Šค๊ฐ„์˜ ์ฐจ์ด๊ฐ€ ์ค‘์š”ํ•˜๋‹ค.
โ“ ์ตœ์†Ÿ๊ฐ’ / ์ตœ๋Œ“๊ฐ’์€?
๐Ÿ‘‰ ์ตœ์†Œ๋Š” 0, ์ตœ๋Œ€๋Š” ๋ฌดํ•œ๋Œ€
โ“ ์ •๋‹ต ํด๋ž˜์Šค๋ฅผ ์ œ์™ธํ•˜์ง€ ์•Š๊ณ  ๊ณ„์‚ฐํ•˜๋ฉด?
๐Ÿ‘‰ ํ‰๊ท  ๊ฐ’์ด 1 ์ฆ๊ฐ€ํ•œ๋‹ค. ์ด๋ ‡๊ฒŒ ๋˜๋ฉด loss์˜ ์ตœ์†Ÿ๊ฐ’์ด 1์ด ๋˜๋ฏ€๋กœ ์ •๋‹ต ํด๋ž˜์Šค๋ฅผ ์ œ์™ธํ•˜์—ฌ ์ตœ์†Ÿ๊ฐ’์ด 0์ด ๋˜๋„๋ก ํ•œ๋‹ค.
โ“ Loss ๊ฐ’์ด 0์ธ W๋Š” ์œ ์ผํ•œ๊ฐ€?
๐Ÿ‘‰ ์•„๋‹ˆ๋‹ค! W์˜ loss๊ฐ€ 0์ด๋ผ๋ฉด 2W ์—ญ์‹œ 0์˜ loss๋ฅผ ๊ฐ–๋Š”๋‹ค.
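That last point is easy to check numerically (a small sketch with made-up scores):

```python
import numpy as np

def svm_loss(scores, y):
    margins = np.maximum(0, scores - scores[y] + 1.0)
    margins[y] = 0.0  # the correct class is skipped
    return margins.sum()

# Made-up scores where the correct class (index 0) wins by more than the margin
s = np.array([5.0, 2.0, 1.0])
print(svm_loss(s, 0), svm_loss(2 * s, 0))  # both 0.0: doubling W keeps zero loss
```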

Regularization

image

So far we have only fit W to the training set, but what we ultimately want is a classifier that also works on the test set! Regularization is what keeps W from overfitting the training set. There are several kinds of regularization, shown below.

image
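A tiny sketch of why L2 regularization has a preference among weights that produce the same score (the x, w1, w2 values are the toy example from the lecture):

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

# Both weight vectors give the same score, so the data loss cannot tell them apart.
print(w1 @ x, w2 @ x)                # 1.0 1.0

# The L2 penalty sum(W^2) is smaller for w2: it prefers spread-out weights.
print(np.sum(w1 ** 2), np.sum(w2 ** 2))  # 1.0 0.25
```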

Softmax Classifier

image

  • Softmax์˜ ์ž‘๋™ ์›๋ฆฌ
    1. ๊ฐ ํด๋ž˜์Šค๋งˆ๋‹ค score๋ฅผ ๊ตฌํ•˜๊ณ  exp๋ฅผ ๊ณฑํ•œ๋‹ค.
    2. ์ด ์ˆ˜๋“ค์„ normalizationํ•ด์„œ ํ™•๋ฅ ๋กœ ๋งŒ๋“ค์–ด์ค€๋‹ค. ์ฆ‰ ์ „๋ถ€ ๋”ํ•˜๋ฉด 1์ด ๋˜๋„๋ก ํ•œ๋‹ค.
    3. ์ด ๊ฐ’์— -log๋ฅผ ์”Œ์šด๋‹ค. ํ™•๋ฅ ์ด 0์— ๊ฐ€๊นŒ์šด ๊ฒฝ์šฐ loss๊ฐ€ ๋ฌดํ•œ๋Œ€๋กœ ๊ฐ€๊ณ , ํ™•๋ฅ ์ด 1์— ๊ฐ€๊นŒ์šธ์ˆ˜๋ก loss๊ฐ€ 0์— ๊ฐ€๊นŒ์›Œ์ง„๋‹ค.
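The three steps translate directly to NumPy (a sketch; shifting by the max is a standard numerical-stability trick, and the scores are again the cat/car/frog example from the slides):

```python
import numpy as np

def softmax_loss(scores, y):
    """Cross-entropy loss: exponentiate, normalize to probabilities, -log of the correct class."""
    shifted = scores - scores.max()          # stability shift; does not change the result
    probs = np.exp(shifted) / np.exp(shifted).sum()  # probabilities summing to 1
    return -np.log(probs[y])

# Unnormalized scores for (cat, car, frog), label = cat
print(round(softmax_loss(np.array([3.2, 5.1, -1.7]), 0), 2))  # ≈ 2.04
```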

โ“ ์ตœ์†Ÿ๊ฐ’ / ์ตœ๋Œ“๊ฐ’์€?
๐Ÿ‘‰ ์ด๋ก ์ ์œผ๋กœ๋Š” ์ตœ์†Ÿ๊ฐ’์ด 0, ์ตœ๋Œ“๊ฐ’์€ ๋ฌดํ•œ๋Œ€์ง€๋งŒ ์‹ค์ œ๋กœ๋Š” ๋‚˜์˜ฌ ๊ฐ€๋Šฅ์„ฑ ๊ฑฐ์˜ ์—†๋‹ค.
โ“ ๋ฐ์ดํ„ฐ์˜ score๋ฅผ ์กฐ๊ธˆ ๋ฐ”๊พผ๋‹ค๋ฉด?
๐Ÿ‘‰ ํ™•๋ฅ ๋กœ ๊ณ„์‚ฐํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ฐ์ดํ„ฐ์˜ ๋ณ€ํ™”์— ๋ฏผ๊ฐํ•˜๊ฒŒ ๋ฐ˜์‘ํ•œ๋‹ค(SVM๊ณผ ๋Œ€๋น„๋จ)

image

Optimization

  1. Random Search
    Literally trying random values of W. Bad! Never use it in practice.
  2. Follow the slope : Gradient Descent
  • Numerical Method
    image
    Compute the finite difference one dimension at a time.
    Extremely inefficient!
  • Analytic Gradient
    image
    Use calculus to get the whole gradient in one shot! Exact and fast, but easy to get wrong when deriving by hand.

image

์•ž์— -๋ฅผ ๋ถ™์—ฌ์„œ ์Œ์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ฐ–๋Š”๋‹ค๋ฉด +๋ฐฉํ–ฅ, ์–‘์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ฐ–๋Š”๋‹ค๋ฉด -๋ฐฉํ–ฅ์œผ๋กœ ๊ฐ€๋„๋ก ํ•œ๋‹ค.
์˜ฌ๋ฐ”๋ฅธ Step size(Learning rate)๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค. ๊ธฐ์šธ๊ธฐ๊ฐ€ 0์ธ ์ง€์ ์— ๊ฐ€์žฅ ๋น ๋ฅด๊ฒŒ ๋„๋‹ฌํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ.
adam optimizer, rms prop ๋“ฑ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์„ ๋ฐฐ์šธ ๊ฒƒ์ด๋‹ค!
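Vanilla gradient descent in full, on a toy quadratic loss (a sketch; the loss and learning rate here are assumptions for illustration):

```python
import numpy as np

# Minimize the toy loss sum(w^2) by repeatedly stepping against the gradient.
w = np.array([4.0, -3.0])
learning_rate = 0.1          # the step size we have to choose well

for _ in range(100):
    grad = 2 * w             # analytic gradient of sum(w^2)
    w -= learning_rate * grad  # move in the negative gradient direction

print(np.allclose(w, 0.0, atol=1e-6))  # True: converged near the minimum
```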

Stochastic Gradient Descent(SGD)

image

Until now we computed the loss over all N examples at once.
This becomes very inefficient and slow as N grows, because a single update of W requires a pass over the entire dataset.
--> This is where SGD comes in.
We set a minibatch size, usually 32, 64, or 128, and use that slice of the data for each update.
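One SGD update might be sketched like this (the names `sgd_step` and `loss_grad` are assumptions for illustration, not cs231n code):

```python
import numpy as np

def sgd_step(W, X, y, loss_grad, batch_size=64, lr=1e-2):
    """Sample a minibatch, estimate the gradient on it, and take one step.

    loss_grad(W, X_batch, y_batch) -> gradient of the loss w.r.t. W (assumed signature).
    """
    idx = np.random.choice(X.shape[0], batch_size, replace=False)
    grad = loss_grad(W, X[idx], y[idx])  # noisy estimate of the full gradient
    return W - lr * grad                 # step in the negative gradient direction
```

Each call touches only `batch_size` examples instead of all N, so W gets updated many times per pass over the data.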

โ— Image Features
์ด์ „๊นŒ์ง€๋Š” ์ด๋ฏธ์ง€ ์ „์ฒด๋ฅผ ๊ทธ๋ƒฅ ์‚ฌ์šฉํ–ˆ๋‹ค๋ฉด, ํŠน์ง•๋“ค์„ ๋ฝ‘์•„๋‚ด๊ณ  ์ด๋ฅผ linear regression์— ์ด์šฉํ•˜๋Š” ๋ฐฉ์‹์ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค.

  1. Color Histogram
    Count how often each color appears and use the counts as features.
  2. Histogram of Oriented Gradients (HoG)
    Build a histogram of local edge orientations as the feature.
  3. Bag of Words
    A technique widely used in natural language processing.

Nowadays CNNs, which learn features from the input image on their own, are what is mainly used.
