glowing713
Frontend-Deep-Dive
[NLP] BoW(Bag of words)

2021. 2. 17. 00:32

Bag of Words

 

Bag of Words (BoW) is the simplest technique for representing words and documents in numeric form.

It is said to have been widely used in text mining before deep learning techniques were applied.

 

 

Step 1. Constructing the vocabulary containing unique words

 

Example sentences: "John really really loves this movie", "Jane really likes this song"

In these sentences, "really" and "this" appear more than once, so each is added to the vocabulary only once.

 

Vocabulary: {"John", "really", "loves", "this", "movie", "Jane", "likes", "song"}
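
Step 1 can be sketched in a few lines of Python. This is a minimal sketch, assuming plain whitespace tokenization; the helper name build_vocabulary is hypothetical and only used for illustration. It keeps each word once, in order of first appearance, so the result matches the vocabulary listed above.

```python
def build_vocabulary(sentences):
    """Return the unique words of all sentences, in order of first appearance."""
    vocabulary = []
    seen = set()
    for sentence in sentences:
        # Assumption: words are separated by whitespace only.
        for word in sentence.split():
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


sentences = [
    "John really really loves this movie",
    "Jane really likes this song",
]
vocabulary = build_vocabulary(sentences)
print(vocabulary)
# ['John', 'really', 'loves', 'this', 'movie', 'Jane', 'likes', 'song']
```

A list is used instead of a set so that every word keeps a fixed position, which the later steps rely on.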

 

 

Step 2. Encoding unique words to one-hot vectors (one-hot encoding)

 

Vocabulary: {"John", "really", "loves", "this", "movie", "Jane", "likes", "song"}

  John: [1 0 0 0 0 0 0 0]
really: [0 1 0 0 0 0 0 0]
 loves: [0 0 1 0 0 0 0 0]
  this: [0 0 0 1 0 0 0 0]
 movie: [0 0 0 0 1 0 0 0]
  Jane: [0 0 0 0 0 1 0 0]
 likes: [0 0 0 0 0 0 1 0]
  song: [0 0 0 0 0 0 0 1]

 

In this way, each word is turned into a one-hot vector.
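
The encoding above could be written roughly as follows. This is a sketch under the assumption that the vocabulary is an ordered list; the function name one_hot is hypothetical. Each word gets a vector of vocabulary length with a single 1 at the word's position.

```python
vocabulary = ["John", "really", "loves", "this", "movie", "Jane", "likes", "song"]


def one_hot(word, vocabulary):
    """Return a vector with a 1 at the word's index and 0 everywhere else."""
    vector = [0] * len(vocabulary)
    # list.index raises ValueError if the word is not in the vocabulary.
    vector[vocabulary.index(word)] = 1
    return vector


one_hot_vectors = {word: one_hot(word, vocabulary) for word in vocabulary}
print(one_hot_vectors["John"])   # [1, 0, 0, 0, 0, 0, 0, 0]
print(one_hot_vectors["loves"])  # [0, 0, 1, 0, 0, 0, 0, 0]
```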

At this point, the similarity between words can be measured in the following two ways.

 

1. Euclidean distance

2. Cosine similarity

 

 

1. Euclidean Distance Similarity

 

This measure treats two vectors as more similar the shorter the distance between them is. The distance is the L2 norm of the difference between the two vectors.
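
As a rough illustration, the distance can be computed with the standard library alone. The function name euclidean_distance is hypothetical, and the two inputs are the one-hot vectors for "John" and "Jane" from Step 2.

```python
import math


def euclidean_distance(a, b):
    """L2 norm of the difference between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


john = [1, 0, 0, 0, 0, 0, 0, 0]
jane = [0, 0, 0, 0, 0, 1, 0, 0]

print(euclidean_distance(john, john))  # 0.0
print(euclidean_distance(john, jane))  # 1.4142135623730951
```

Any two different one-hot vectors differ in exactly two positions, so their distance is always the square root of 2. The measure becomes more informative on vectors that hold word counts for a whole sentence.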

 

2. Cosine Similarity

 

๋‹จ์–ด์˜ ์ˆ˜๊ฐ€ ๋Š˜์–ด๋‚˜๋ฉด ์‹ค์ œ ์ •๋‹ต์— ๋ชจ์ˆœ๋˜๋Š” ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ฌ ๋•Œ๊ฐ€ ์žˆ๋‹ค.

์ด ๋ถ€๋ถ„์„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์ด ๋ฒกํ„ฐ์˜ ํฌ๊ธฐ๋ฅผ ๋ฌด์‹œํ•˜๊ณ  ์œ ์‚ฌ๋„๋ฅผ ์ธก์ •ํ•˜๋Š” ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„์˜ ์žฅ์ ์ด๋‹ค.

๋ฒกํ„ฐ์™€ ๋ฒกํ„ฐ ๊ฐ„์˜ ์œ ์‚ฌ๋„๋ฅผ ๋น„๊ตํ•  ๋•Œ ๋‘ ๋ฒกํ„ฐ ๊ฐ„์˜ ์‚ฌ์ž‡๊ฐ์„ ๊ตฌํ•ด์„œ ์–ผ๋งˆ๋‚˜ ์œ ์‚ฌํ•œ์ง€ ์ˆ˜์น˜๋กœ ๋‚˜ํƒ€๋‚ธ๋‹ค.

๋ฒกํ„ฐ ๋ฐฉํ–ฅ์ด ๋น„์Šทํ• ์ˆ˜๋ก ๋‘ ๋ฒกํ„ฐ๋Š” ์„œ๋กœ ์œ ์‚ฌํ•˜๋ฉฐ, ๋ฒกํ„ฐ ๋ฐฉํ–ฅ์ด 90๋„ ์ผ๋•Œ๋Š” ๋‘ ๋ฒกํ„ฐ ๊ฐ„์˜ ๊ด€๋ จ์„ฑ์ด ์—†์œผ๋ฉฐ, ๋ฒกํ„ฐ ๋ฐฉํ–ฅ์ด ๋ฐ˜๋Œ€๊ฐ€ ๋ ์ˆ˜๋ก ๋‘ ๋ฒกํ„ฐ๋Š” ๋ฐ˜๋Œ€ ๊ด€๊ณ„๋ฅผ ๋ณด์ธ๋‹ค.
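
The following sketch shows the three cases described above and the effect of ignoring magnitude. The function name cosine_similarity and the doubled vector are assumptions made for illustration; the doubled vector stands for a text that repeats the same words twice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (90 degrees: unrelated)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite directions)

doc = [1, 2, 1, 1, 1, 0, 0, 0]
doubled = [2 * x for x in doc]

# Same direction, different magnitude: cosine similarity is approximately 1.
same_direction = cosine_similarity(doc, doubled)

# Euclidean distance, in contrast, grows with the magnitude (about 2.83 here).
distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(doc, doubled)))
print(same_direction, distance)
```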

 

Reference: https://sy-programmingstudy.tistory.com/13

 

 

์œ„์—์„œ ์ƒ์„ฑํ•œ ๋ฒกํ„ฐ๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํ•˜๋‚˜์˜ ๋ฌธ์žฅ/๋ฌธ์„œ๋ฅผ One-hot ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

"John really really loves this movie" => John + really + really + loves + this + movie => [1 2 1 1 1 0 0 0]

"Jane really likes this song" => Jane + really + likes + this + song => [0 1 0 1 0 1 1 1]

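
The two sentence vectors above can be reproduced with a short sketch. It assumes whitespace tokenization and the vocabulary order from Step 1; the function name bag_of_words is hypothetical. Counting each word and reading the counts in vocabulary order gives the same result as adding up the one-hot vectors.

```python
from collections import Counter

vocabulary = ["John", "really", "loves", "this", "movie", "Jane", "likes", "song"]


def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.split())
    # Counter returns 0 for words that do not occur in the sentence.
    return [counts[word] for word in vocabulary]


first = bag_of_words("John really really loves this movie", vocabulary)
second = bag_of_words("Jane really likes this song", vocabulary)
print(first)   # [1, 2, 1, 1, 1, 0, 0, 0]
print(second)  # [0, 1, 0, 1, 0, 1, 1, 1]
```

Words outside the vocabulary are simply ignored in this sketch, and word order within the sentence is lost, which is where the name "bag" of words comes from.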