
Time Series Data

계란소년 2025. 3. 16. 20:48

Concept

 

  • Data collected over time: values observed successively at specific time intervals
  • Because the analysis focuses on changes and patterns over time, it is useful for forecasting, trend analysis, anomaly detection, and similar tasks

 

Models


Traditional time series analysis models

 

  • AR (Auto-Regressive) model: predicts the current value from the series' own past values (autoregression)
  • MA (Moving Average) model: predicts the current value from past error (residual) terms
  • ARMA (Auto-Regressive Moving Average): combines the AR and MA models
  • ARIMA (Auto-Regressive Integrated Moving Average) model: differences a non-stationary series to make it stationary, then forecasts
  • SARIMA (Seasonal ARIMA) model: an ARIMA model that accounts for seasonality
  • VAR (Vector Auto-Regression) model: analyzes multivariate time series data
  • Exponential Smoothing model: forecasts by giving more weight to recent data
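
To make the AR idea from the list above concrete, here is a minimal sketch of an AR(1) model, x[t] = phi * x[t-1] + noise, fitted by ordinary least squares with NumPy. The helper names `simulate_ar1` and `fit_ar1`, the coefficient 0.7, and the series length are assumptions chosen for illustration, not something taken from the post; in practice a statistics library would normally do the fitting.

```python
import numpy as np

def simulate_ar1(n, phi, sigma=1.0, seed=0):
    """Simulate x[t] = phi * x[t-1] + noise (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(scale=sigma)
    return x

def fit_ar1(x):
    """Estimate phi by least squares: regress x[t] on x[t-1] without an intercept."""
    prev, curr = x[:-1], x[1:]
    return float(prev @ curr / (prev @ prev))

x = simulate_ar1(n=2000, phi=0.7)   # assumed true coefficient
phi_hat = fit_ar1(x)
next_value = phi_hat * x[-1]        # one-step-ahead AR(1) forecast
print(f"estimated phi: {phi_hat:.3f}, next-step forecast: {next_value:.3f}")
```

The estimate should land near the coefficient used in the simulation; higher-order AR models regress on several lagged values in the same way.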

 

Deep learning models

 

Traditional statistical models are useful when data is limited or the pattern is clear, whereas deep learning models are strong at learning complex patterns and nonlinearity.

  • RNN: learns by carrying past data forward step by step, but the vanishing gradient problem makes long-term dependencies hard to learn
  • LSTM: addresses this RNN problem with a structure that can learn long-term dependencies; it uses an input gate, a forget gate, and an output gate to keep only the important information
  • GRU: similar to LSTM but with a simpler structure, so it can train faster
  • 1D CNN: extracts local patterns; also used on time series data
  • TCN: a CNN-based model that can learn long-term dependencies
  • Transformer-based models: models that grew out of natural language processing
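
The three LSTM gates mentioned in the list above can be sketched as a single forward step in plain NumPy. This is only an illustration of the standard LSTM update, not code from the post: the function name `lstm_step`, the stacking order of the weight rows, and the random weights are all assumptions, and no training is performed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative sketch).

    W has shape (4 * hidden, input + hidden). The rows are assumed to be
    stacked as [input gate, forget gate, output gate, candidate].
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0 * hidden:1 * hidden])  # input gate: how much new information to write
    f = sigmoid(z[1 * hidden:2 * hidden])  # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: how much cell state to expose
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate values
    c = f * c_prev + i * g                 # updated cell state (long-term memory)
    h = o * np.tanh(c)                     # updated hidden state
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 1, 4             # assumed sizes for the example
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for value in [0.1, 0.5, -0.2]:             # a tiny univariate sequence
    h, c = lstm_step(np.array([value]), h, c, W, b)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through almost unchanged, which is how the structure can hold information over many steps.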

 

Characteristics

 

  1. Temporal dependence: data points are recorded successively over time, so the relationship between earlier and later time steps matters
  2. Autocorrelation: earlier values tend to influence later values
  3. Trend and seasonality: a long-term upward or downward trend over time, plus seasonal periodic patterns

 

1. Trend

  • Long-term direction: the data steadily increases or decreases as time passes
  • A trend is a long-term change
  • It has a consistent direction, increasing or decreasing
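
As a small illustration of the trend idea above, a linear trend can be estimated with a degree-1 polynomial fit and then subtracted. The synthetic data here (baseline 10, slope 0.09, Gaussian noise) is an assumption made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
y = 10 + 0.09 * t + rng.normal(scale=2.0, size=t.size)  # assumed linear trend plus noise

# Fit y ~ slope * t + intercept by least squares
slope_hat, intercept_hat = np.polyfit(t, y, deg=1)

# Subtracting the fitted line leaves the detrended residual
detrended = y - (slope_hat * t + intercept_hat)
print(f"estimated slope: {slope_hat:.4f}")
```

A positive estimated slope indicates an increasing trend and a negative one a decreasing trend.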

 

2. Seasonality

  • A pattern that repeats at a fixed period
  • There is a recurring periodic pattern
  • Seasonality does not have to relate to the calendar seasons (weekly and daily cycles also count)

 

3. Autocorrelation

  • A meaningful relationship exists between past and present data
  • A pattern in which earlier values influence the current value
  • Today's temperature is likely to be similar to yesterday's
  • A stock price is likely to move similarly to yesterday's price
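
The "today resembles yesterday" idea above can be measured with the sample autocorrelation at a given lag. The sketch below compares white noise with a random walk; the function name `autocorr` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)
white = rng.normal(size=1000)   # independent values: no memory of the past
walk = np.cumsum(white)         # random walk: each value is the previous value plus noise

print(f"white noise, lag 1: {autocorr(white, 1):.3f}")
print(f"random walk, lag 1: {autocorr(walk, 1):.3f}")
```

The random walk, where each value builds on the previous one, should show a lag-1 autocorrelation close to 1, while the white noise should stay near 0.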

 

4. Noise

  • Random fluctuation that has no pattern and cannot be predicted
  • A sharp price change in the stock market caused by unexpected news

 

import numpy as np
import matplotlib.pyplot as plt

# Time series plotting function
def plot_series(time, series, format="-", start=0, end=None):
    """
    Plot the given time series.

    Parameters:
    - time: time array (x-axis)
    - series: time series values (y-axis)
    - format: line style (default "-")
    - start, end: range of the slice to plot
    """
    plt.plot(time[start:end], series[start:end], format)
    plt.xlabel("Time")
    plt.ylabel("Value")
    plt.grid(True)

# Trend function
def trend(time, slope=0):
    """
    Generate a linear trend with the given slope.

    Parameters:
    - time: time array
    - slope: slope (positive = increasing, negative = decreasing)

    Returns:
    - the trend component of the time series
    """
    return slope * time

# Seasonal pattern function
def seasonal_pattern(season_time):
    """
    Define the pattern that represents one season.

    Parameters:
    - season_time: position within the season as a fraction (0 to 1)

    Returns:
    - seasonal pattern value (an arbitrarily chosen pattern)
    """
    return np.where(season_time < 0.4,  # when the season fraction is below 0.4
                    np.cos(season_time * 2 * np.pi),  # use a cosine
                    1 / np.exp(3 * season_time))  # otherwise an exponential decay

# Seasonality function
def seasonality(time, period, amplitude=1, phase=0):
    """
    Generate seasonal data that repeats with the given period.

    Parameters:
    - time: time array
    - period: season length (e.g. 365 days = yearly cycle)
    - amplitude: seasonal amplitude (scales the pattern)
    - phase: phase (shifts the starting position)

    Returns:
    - the seasonal component of the time series
    """
    season_time = ((time + phase) % period) / period  # normalize to a fraction of the period (0 to 1)
    return amplitude * seasonal_pattern(season_time)  # apply the seasonal pattern

# Noise function
def noise(time, noise_level=1, seed=None):
    """
    Generate random noise for the time series.

    Parameters:
    - time: time array
    - noise_level: scale of the noise
    - seed: random seed (for reproducibility)

    Returns:
    - array of random noise values
    """
    rnd = np.random.RandomState(seed)  # random number generator (seed can be fixed)
    return rnd.randn(len(time)) * noise_level  # normally distributed random values

# Build the time series
time = np.arange(4 * 365 + 1, dtype="float32")
baseline = 10  # baseline (initial level)
slope = 0.09  # trend slope (how much the series rises per time step)
amplitude = 15  # seasonality size (amplitude of the pattern)
noise_level = 6  # noise size (amount of variation)

# Combine the components
series = baseline + trend(time, slope)  # add the trend
series += seasonality(time, period=365, amplitude=amplitude)  # add seasonality
series += noise(time, noise_level=noise_level, seed=42)  # add noise

# Plot the time series
plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.title("Synthetic Time Series Data")
plt.show()

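import numpy as np
import matplotlib.pyplot as plt

# NOTE: the post does not show the helper or the split below; both are assumptions added so the cell runs.
def moving_average_forecast(series, window_size):
    """Forecast each step as the mean of the previous window_size values."""
    return np.array([series[t:t + window_size].mean() for t in range(len(series) - window_size)])

split_time = 1000  # assumed train/validation split point
time_valid, x_valid = time[split_time:], series[split_time:]

# Moving average forecast (window_size = 5)
window_size = 5  # window size used for the moving average
# Forecast index i predicts time i + window_size, so slicing from split_time - 5 aligns it with the validation period
moving_avg = moving_average_forecast(series, window_size)[split_time - window_size:]

# Plot
plt.figure(figsize=(10, 6))  # figure size
plot_series(time_valid, x_valid)  # actual values x_valid over time_valid
plot_series(time_valid, moving_avg)  # moving average forecast
plt.show()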

Actual vs. predicted values

 

This value is the forecast obtained by taking the one-year difference (diff_series), computing its 5-day moving average, and then adding the 5-day moving average over the most recent week.

The error can be seen to have decreased.

This adds back the seasonality that was removed earlier. As a result, diff_moving_avg_plus_smooth_past is the differenced value with seasonality removed plus the change over the most recent week, which makes it a forecast that reflects seasonality.
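
The code for this last step is not shown in the post, so the following is only a sketch of the idea under stated assumptions: difference the series at a 365-step lag, forecast the differenced series with a 5-step moving average, then add back values from one year earlier, either raw or smoothed with a centered 5-step window. The split point (1000), the random seed, the centered smoothing window, and every name other than `diff_series` and `diff_moving_avg_plus_smooth_past` are assumptions. In particular, the sketch takes the added-back term from one year before each forecast time, since that is what restores the yearly pattern.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    """Forecast each step as the mean of the previous window_size values."""
    return np.array([series[t:t + window_size].mean()
                     for t in range(len(series) - window_size)])

def mae(actual, predicted):
    return float(np.mean(np.abs(actual - predicted)))

# Synthetic series: baseline + trend + yearly pattern + noise (simplified stand-in for the post's data)
rng = np.random.default_rng(42)
time = np.arange(4 * 365 + 1, dtype="float64")
season_time = (time % 365) / 365
pattern = np.where(season_time < 0.4, np.cos(season_time * 2 * np.pi), 1 / np.exp(3 * season_time))
series = 10 + 0.09 * time + 15 * pattern + rng.normal(scale=6, size=time.size)

period, window = 365, 5
split_time = 1000                      # assumed train/validation split
x_valid = series[split_time:]

# 1) Differencing at a one-year lag removes most of the trend and seasonality
diff_series = series[period:] - series[:-period]

# 2) Moving average forecast of the differenced series, aligned to the validation period
diff_moving_avg = moving_average_forecast(diff_series, window)[split_time - period - window:]

# 3a) Add back the raw values from one year earlier
diff_moving_avg_plus_past = series[split_time - period:-period] + diff_moving_avg

# 3b) Add back a smoothed copy of those values (window centered on t - period)
half = window // 2
smooth_past = moving_average_forecast(series[split_time - period - half:-(period - half - 1)], window)
diff_moving_avg_plus_smooth_past = smooth_past + diff_moving_avg

print("MAE with raw past:     ", mae(x_valid, diff_moving_avg_plus_past))
print("MAE with smoothed past:", mae(x_valid, diff_moving_avg_plus_smooth_past))
```

Smoothing the past values keeps last year's noise from being copied straight into the forecast, which is why the second error should come out lower than the first on this synthetic data.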