
Decision Trees & Ensembles

계란소년 2025. 3. 5. 15:42

1. Decision Trees

  • A supervised learning algorithm that classifies data using a tree structure
  • Each node splits the data on a specific criterion, and each leaf node determines the final predicted value
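As a rough illustration of what "splitting on a criterion" means, the sketch below scores one candidate split with Gini impurity. The post does not say which criterion is used; Gini is assumed here because it is the default for scikit-learn's DecisionTreeClassifier. The helper names `gini` and `split_impurity` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity of the two children after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy data: one feature, two classes
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]

parent = gini(labels)                        # 0.5: the two classes are evenly mixed
child = split_impurity(values, labels, 2.0)  # 0.0: both children are pure
print("impurity before:", parent, "after:", child)
```

A tree learner tries many thresholds like this and keeps the one that lowers the impurity the most.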

 

Characteristics

  • Easy to interpret and can be visualized
  • Prone to overfitting
  • Performance can vary depending on the split criterion
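The interpretability point can be seen by printing a fitted tree as plain if/else rules. This is a minimal sketch, assuming the iris dataset bundled with scikit-learn (it is not part of the original notes) and `export_text` from `sklearn.tree`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: iris is used only as a small stand-in dataset
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# Render the learned splits as indented text rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one split condition, and each `class:` line is a leaf prediction.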

 

Implementing a decision tree with scikit-learn

from sklearn.tree import DecisionTreeClassifier

# Create and train the model (X_train and y_train are assumed to come from an earlier train/test split)
dt = DecisionTreeClassifier(max_depth=3, random_state=42)
dt.fit(X_train, y_train)

# Predict on the held-out test data
y_pred = dt.predict(X_test)
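The snippet above assumes that X_train, y_train, and X_test already exist. Here is a self-contained sketch, assuming the breast cancer dataset bundled with scikit-learn (not used in the original notes). It also fits an unrestricted tree next to the max_depth=3 tree, so the overfitting risk mentioned earlier shows up as a gap between training and test accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a bundled dataset stands in for the post's unspecified X, y
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Depth-limited tree versus a tree that is allowed to grow until its leaves are pure
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
deep = DecisionTreeClassifier(max_depth=None, random_state=42).fit(X_train, y_train)

for name, model in [("max_depth=3", shallow), ("unlimited depth", deep)]:
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```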

 

Cross-validation and validation sets

Cross-validation

  • A way to measure performance more reliably by training and validating the model several times on different portions of the data
  • K-fold cross-validation: split the data into K folds, train on K-1 folds and validate on the remaining fold, and repeat K times so that each fold serves as the validation set once
from sklearn.model_selection import cross_val_score

scores = cross_val_score(dt, X_train, y_train, cv=5)
print("Mean cross-validation score:", scores.mean())
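To make the K-fold procedure concrete, the sketch below writes the loop out by hand and checks it against `cross_val_score`. It assumes the iris dataset and a `StratifiedKFold` splitter, which is what scikit-learn uses for classifiers when `cv` is an integer. Neither appears in the original notes.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
dt = DecisionTreeClassifier(max_depth=3, random_state=42)

cv = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, val_idx in cv.split(X, y):
    model = clone(dt)                      # fresh, unfitted copy for every fold
    model.fit(X[train_idx], y[train_idx])  # train on K-1 folds
    manual_scores.append(model.score(X[val_idx], y[val_idx]))  # validate on the remaining fold

# The library helper runs the same loop internally
auto_scores = cross_val_score(dt, X, y, cv=cv)
print("manual:", np.round(manual_scores, 3))
print("helper:", np.round(auto_scores, 3))
```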

 

Validation set

Split the data into three sets (training, validation, and test), and use the validation set during hyperparameter tuning to find the best model

from sklearn.model_selection import train_test_split

# Hold out 20% of the data as a validation set (a separate test set is assumed to be split off the same way)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
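A fuller sketch of the three-way split might look like this. The dataset, the 60/20/20 proportions, and the candidate `max_depth` values are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# First hold out the test set, then carve a validation set out of the rest (roughly 60/20/20)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval
)

# Tune max_depth using only the validation set
best_depth, best_score = None, -1.0
for depth in [1, 2, 3, 5, 8]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_depth, best_score = depth, score

# Refit on train + validation, then touch the test set exactly once
final = DecisionTreeClassifier(max_depth=best_depth, random_state=42).fit(X_trainval, y_trainval)
test_score = final.score(X_test, y_test)
print("best max_depth:", best_depth, "validation:", best_score, "test:", test_score)
```

Keeping the test set out of the tuning loop is what makes the final score an honest estimate.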

 

2. Ensemble Learning

A technique that combines several classifiers into a single meta-classifier to achieve better generalization performance than any individual classifier

Types

  • Bagging: trains several classifiers independently and decides the final prediction by majority vote
  • Boosting: trains classifiers sequentially, with each one reducing the errors of the previous one
  • Random Forest: an ensemble of decision trees built on bagging

 

Bagging

  • Short for Bootstrap Aggregating
  • Trains several models independently and combines their predictions
  • Uses weak learners such as decision trees, and each model is trained on a random sample of the data
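The mechanics can be sketched by hand: draw a bootstrap sample for each tree, fit the trees independently, and take a majority vote. This is an illustrative sketch, not how a library implements it. The iris dataset, the number of models (25), and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rng = np.random.default_rng(42)
n_models = 25
models = []
for _ in range(n_models):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Collect every model's prediction, shape (n_models, n_test)
all_preds = np.stack([m.predict(X_test) for m in models])

# Majority vote for each test sample (iris has 3 classes labelled 0, 1, 2)
y_pred = np.array([np.bincount(col, minlength=3).argmax() for col in all_preds.T])
accuracy = (y_pred == y_test).mean()
print("bagged accuracy:", accuracy)
```

In practice, scikit-learn's `BaggingClassifier` in `sklearn.ensemble` wraps the same idea.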

 

Random Forest

A model that trains several decision trees and combines them to make the final prediction

from sklearn.ensemble import RandomForestClassifier

# Create and train the model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predict
y_pred = rf.predict(X_test)

 

Advantages

  • Independent training: the models are trained in parallel, each on different training data
  • Bootstrap sampling: each tree is trained on a random sample drawn with replacement from the training data -> generalizes better than a single decision tree
  • Diversity: only a random subset of the features is considered when choosing splits in each tree, which helps prevent overfitting
  • Feature importances can be computed automatically
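The last two points map onto the `max_features` parameter and the `feature_importances_` attribute of `RandomForestClassifier`. A minimal sketch, assuming the iris dataset (not part of the original notes):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# max_features="sqrt": each split considers only a random subset of the features
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
rf.fit(iris.data, iris.target)

# Importances are computed during training and sum to 1
ranked = sorted(
    zip(iris.feature_names, rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```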

 

Boosting

A method that improves performance by training weak learners sequentially

AdaBoost

  • Combines simple classifiers to improve performance
  • Strengthens learning by giving more weight to misclassified samples

 

๊ทธ๋ž˜๋””์–ธํŠธ ๋ถ€์ŠคํŒ… (Gradient Boosting)

  • ์ด์ „ ํŠธ๋ฆฌ์˜ ์˜ค์ฐจ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ƒˆ๋กœ์šด ํŠธ๋ฆฌ๋ฅผ ํ•™์Šต
  • ์„ฑ๋Šฅ์ด ๋›ฐ์–ด๋‚˜์ง€๋งŒ ๊ณผ๋Œ€์ ํ•ฉ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ์Œ
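"Training on the errors of the previous trees" is easiest to see with squared-error regression, where each new tree is fit to the current residuals. The notes discuss classifiers, so the regression setting is an assumption made for clarity. The synthetic sine data, the learning rate of 0.1, and the 50 rounds are also illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (hypothetical)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residual = y - prediction           # errors left by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # small corrective step
    mse_history.append(np.mean((y - prediction) ** 2))

print("training MSE: start", mse_history[0], "end", mse_history[-1])
```

The training error keeps falling as trees are added, which is also why too many rounds can overfit. For classification, `GradientBoostingClassifier` in `sklearn.ensemble` applies the same idea with a classification loss.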

 

Bagging vs. boosting

| Technique | How it works | Advantages | Disadvantages |
|---|---|---|---|
| Bagging | Trains independent models in parallel, then takes a majority vote | Reduces variance, helps prevent overfitting | Does little to reduce bias |
| Boosting | Trains models sequentially, each reflecting the errors of the previous one | Can reduce both bias and variance | Prone to overfitting |
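To try the comparison on real data, one option is to cross-validate a single tree, a bagging-style model, and two boosting models side by side. This is only a sketch. The breast cancer dataset, the estimator counts, and the default hyperparameters are assumptions, and which method comes out ahead depends on the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "random forest (bagging)": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost (boosting)": AdaBoostClassifier(n_estimators=100, random_state=42),
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

# Mean 5-fold cross-validation accuracy for each model
results = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in results.items():
    print(f"{name}: {score:.3f}")
```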