ROC Curve

devson 2024. 5. 23. 23:38

ROC Curve

The ROC (Receiver Operating Characteristic) curve describes the performance of a binary classifier as its classification threshold varies.

As the ROC curve graph above shows, it plots the true positive rate against the false positive rate at each threshold.

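threshold: the cut-off value for binary classification. The larger it is, the smaller the fraction of samples classified as true.
- e.g.) with a threshold of 0.3, a sample is classified as true if its predicted probability of being true is 0.3 or higher. (0.5 is the usual default.)
false positive rate: $\frac{\text{FP}}{\text{FP} + \text{TN}}$
- the fraction of actual negatives that are wrongly classified as positive
true positive rate: $\frac{\text{TP}}{\text{TP} + \text{FN}}$
- the fraction of actual positives that are correctly classified as positive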

Predicted \ Actual	True	False
True	TP, True Positive	FP, False Positive
False	FN, False Negative	TN, True Negative
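The two rates can be computed directly from the four cells of the table. The following is a minimal sketch in plain Python, not part of the original post. The helper names (`confusion_counts`, `rates`) and the toy labels and scores are made up for illustration, and it assumes a sample is predicted positive when its score is greater than or equal to the threshold.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(y_true, y_score, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp, fp, fn, tn = confusion_counts(y_true, y_score, threshold)
    fpr = fp / (fp + tn)  # share of actual negatives predicted positive
    tpr = tp / (tp + fn)  # share of actual positives predicted positive
    return fpr, tpr


# Hypothetical toy data: 4 actual positives followed by 4 actual negatives.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

print(confusion_counts(toy_labels, toy_scores, 0.5))  # (3, 1, 1, 3)
print(rates(toy_labels, toy_scores, 0.5))             # (0.25, 0.75)
```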

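Placing the ROC curve next to the classifier's predicted-probability density plot for each class gives the picture below.

The vertical dashed line in the left plot is the threshold. Moving it left or right moves the corresponding point on the ROC curve in the right plot in the opposite direction.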

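For example, the lower the threshold, the more readily the model predicts positive, so

the probability of wrongly classifying an actual negative as positive (false positive rate) goes up, and
the probability of correctly classifying an actual positive as positive (true positive rate) goes up as well.

In other words, as the threshold changes, the false positive rate and the true positive rate rise and fall together.

(This does not mean that either rate is proportional to the threshold itself.)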
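A small threshold sweep makes this relationship concrete. This is only a sketch on hypothetical toy data; `roc_point` is an illustrative helper, not a scikit-learn function, and it again assumes scores greater than or equal to the threshold are predicted positive.

```python
def roc_point(y_true, y_score, threshold):
    """Return (FPR, TPR) for one threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return fp / negatives, tp / positives


# Hypothetical toy data: 4 actual positives followed by 4 actual negatives.
sweep_labels = [1, 1, 1, 1, 0, 0, 0, 0]
sweep_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sweep = [(th, *roc_point(sweep_labels, sweep_scores, th)) for th in (0.25, 0.5, 0.75)]
for th, fpr, tpr in sweep:
    print(f"threshold={th:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")

# threshold=0.25  FPR=0.50  TPR=1.00
# threshold=0.50  FPR=0.25  TPR=0.75
# threshold=0.75  FPR=0.00  TPR=0.50
```

Raising the threshold lowers both rates and lowering it raises both, but neither rate changes by a fixed amount per unit of threshold.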

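How the ROC curve changes with classifier performance

When model performance improves

The classes are separated more accurately, so at a given threshold the true positive rate rises while the false positive rate falls.

As a result, the ROC curve bends closer to the top-left corner, as shown below.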

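When model performance degrades

More samples are misclassified, so at a given threshold the true positive rate falls while the false positive rate rises.

As a result, the ROC curve moves away from the top-left corner and bends toward the bottom-right, as shown below.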

The ROC curve therefore gives an intuitive view of performance: the closer it bends toward the top-left corner, the better the model.

Also, the better the model, the larger the area under the ROC curve, which makes this area another metric for binary classifier performance.

This area is called the AUC (Area Under the Curve).

(Area Under the Curve is a general term and is not used only for ROC curves.)
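To see how the area tracks performance, the sketch below builds ROC points by hand and integrates them with the trapezoidal rule. It is an illustration under stated assumptions rather than the way scikit-learn computes it internally: `roc_points`, `trapezoid_auc`, and both score lists are hypothetical, and all scores are distinct so there are no ties to handle.

```python
def roc_points(y_true, y_score):
    """ROC points obtained by using every distinct score as a threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 4 actual positives followed by 4 actual negatives.
auc_labels = [1, 1, 1, 1, 0, 0, 0, 0]
strong_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]  # classes mostly separated
weak_scores = [0.9, 0.6, 0.4, 0.2, 0.8, 0.7, 0.3, 0.1]    # classes heavily mixed

strong_auc = trapezoid_auc(roc_points(auc_labels, strong_scores))
weak_auc = trapezoid_auc(roc_points(auc_labels, weak_scores))
print(strong_auc, weak_auc)  # 0.8125 0.5625
```

The scorer that separates the classes better gets the larger area, while the mixed one stays close to 0.5, the area under the diagonal.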

(For a related interactive demo, see the graph at the top of this blog post. Moving it around by hand is a good way to build intuition.)

Code example

A simple binary classification example shows how to compute the ROC curve and AUC with scikit-learn.

The dataset is the breast cancer tumor binary classification dataset that ships with scikit-learn.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

RANDOM_SEED = 42

breast_cancer_data = load_breast_cancer()
X = breast_cancer_data.data
y = breast_cancer_data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.3, stratify=y,
                                                    random_state=RANDOM_SEED)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

# ((398, 30), (171, 30), (398,), (171,))

The binary classifier is a LogisticRegression model, trained as follows.
- Its test accuracy came out to about 0.9415.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# train
model = LogisticRegression(random_state=RANDOM_SEED)
model.fit(X_train, y_train)

# evaluate
y_pred = model.predict(X_test)
accuracy_score(y_test, y_pred)

# 0.9415204678362573

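sklearn.metrics.roc_curve(y_true, y_score) returns the data for the ROC curve.
- y_score is the predicted score for the positive class (here, its predicted probability), not the hard class prediction.
- It returns the false positive rates, the true positive rates, and the thresholds at which they were computed.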

from sklearn.metrics import roc_curve

positive_prediction_probabilities = model.predict_proba(X_test)[:, 1]
false_positive_rates, true_positive_rates, thresholds = \
    roc_curve(y_test, positive_prediction_probabilities)

for fpr, tpr, th in zip(false_positive_rates, true_positive_rates, thresholds):
    print(f"Threshold: {th} - FPR: {fpr}, TPR: {tpr}")
    
# Threshold: inf - FPR: 0.0, TPR: 0.0
# Threshold: 0.9999550596085504 - FPR: 0.0, TPR: 0.009345794392523364
# Threshold: 0.9848986625939533 - FPR: 0.0, TPR: 0.6261682242990654
# Threshold: 0.9834229169982125 - FPR: 0.015625, TPR: 0.6261682242990654
# Threshold: 0.9391698962484204 - FPR: 0.015625, TPR: 0.8130841121495327
# Threshold: 0.9347940251848812 - FPR: 0.03125, TPR: 0.8130841121495327
# Threshold: 0.8625769438111659 - FPR: 0.03125, TPR: 0.9065420560747663
# Threshold: 0.8475391852064701 - FPR: 0.0625, TPR: 0.9065420560747663
# Threshold: 0.7023202565248234 - FPR: 0.0625, TPR: 0.9719626168224299
# Threshold: 0.6904250200409706 - FPR: 0.078125, TPR: 0.9719626168224299
# Threshold: 0.6791986522466532 - FPR: 0.078125, TPR: 0.9813084112149533
# Threshold: 0.5887847716439181 - FPR: 0.125, TPR: 0.9813084112149533
# Threshold: 0.35883415654168577 - FPR: 0.125, TPR: 1.0
# Threshold: 1.4799762153953177e-45 - FPR: 1.0, TPR: 1.0

ROC curve plot

import matplotlib.pyplot as plt

plt.plot(false_positive_rates, true_positive_rates)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")

Computing the AUC

from sklearn.metrics import roc_auc_score

roc_auc = roc_auc_score(y_test, positive_prediction_probabilities)
roc_auc

Alternatively, the AUC can be shown on the ROC curve plot.

from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

roc_auc = roc_auc_score(y_test, positive_prediction_probabilities)

plt.plot(false_positive_rates, true_positive_rates)
plt.title(f"AUC: {roc_auc:.4f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")

Finding the optimal threshold (optimal cut-off point)

The threshold of a binary classifier defaults to 0.5,
but the ROC curve can be used to find the threshold with the best classification performance (the optimal cut-off point).

There are several ways to do this (reference),
and this section covers one of the most common, Youden's Index.

Youden's Index (J) is the J in the graph below, computed as the difference between the true positive rate and the false positive rate.
After computing J for each threshold, the threshold with the largest J can be taken as the optimal threshold.
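Written out as a formula, with $t$ denoting the threshold (the symbols $t$ and $t^{*}$ are notation introduced here for illustration, not taken from the original figure):

$$
J(t) = \text{TPR}(t) - \text{FPR}(t) = \frac{\text{TP}}{\text{TP} + \text{FN}} - \frac{\text{FP}}{\text{FP} + \text{TN}},
\qquad
t^{*} = \arg\max_{t} J(t)
$$

Because specificity is $1 - \text{FPR}$, this is the same as sensitivity + specificity - 1. On the ROC plot, $J(t)$ is the vertical distance between the curve and the diagonal line $\text{TPR} = \text{FPR}$.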

Code example

Computing Youden's Index for each threshold

for fpr, tpr, th in zip(false_positive_rates, true_positive_rates, thresholds):
    j = tpr - fpr  # Youden's Index at this threshold
    print(f"Threshold: {th} - youden's index {j}")
    
# Threshold: inf - youden's index 0.0
# Threshold: 0.9999550596085504 - youden's index 0.009345794392523364
# Threshold: 0.9848986625939533 - youden's index 0.6261682242990654
# Threshold: 0.9834229169982125 - youden's index 0.6105432242990654
# Threshold: 0.9391698962484204 - youden's index 0.7974591121495327
# Threshold: 0.9347940251848812 - youden's index 0.7818341121495327
# Threshold: 0.8625769438111659 - youden's index 0.8752920560747663
# Threshold: 0.8475391852064701 - youden's index 0.8440420560747663
# Threshold: 0.7023202565248234 - youden's index 0.9094626168224299
# Threshold: 0.6904250200409706 - youden's index 0.8938376168224299
# Threshold: 0.6791986522466532 - youden's index 0.9031834112149533
# Threshold: 0.5887847716439181 - youden's index 0.8563084112149533
# Threshold: 0.35883415654168577 - youden's index 0.875
# Threshold: 1.4799762153953177e-45 - youden's index 0.0

Finding the optimal threshold with Youden's Index

import numpy as np

optimal_threshold_index = np.argmax(true_positive_rates - false_positive_rates)
fpr, tpr, optimal_threshold = (
    false_positive_rates[optimal_threshold_index],
    true_positive_rates[optimal_threshold_index],
    thresholds[optimal_threshold_index],
)
j = tpr - fpr
print(f"{optimal_threshold=}, {j=}, {fpr=}, {tpr=}")

# optimal_threshold=0.7023202565248234, j=0.9094626168224299, fpr=0.0625, tpr=0.9719626168224299
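Once a threshold has been chosen, it has to be applied to the predicted probabilities by hand, because `model.predict` uses the default cut-off. The following is a minimal sketch in plain Python. The `classify` helper and the example probabilities are hypothetical, and 0.7023 is the optimal threshold from above, rounded. With the NumPy arrays used in this post, the equivalent expression would be `(positive_prediction_probabilities >= optimal_threshold).astype(int)`.

```python
def classify(probabilities, threshold=0.5):
    """Predict 1 when the positive-class probability is >= threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical positive-class probabilities for four samples.
example_probabilities = [0.95, 0.72, 0.65, 0.40]

default_labels = classify(example_probabilities)                  # cut-off 0.5
tuned_labels = classify(example_probabilities, threshold=0.7023)  # tuned cut-off

print(default_labels)  # [1, 1, 1, 0]
print(tuned_labels)    # [1, 1, 0, 0]
```

The third sample, at 0.65, is the only one whose label changes: it clears the default cut-off of 0.5 but not the stricter tuned one.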

optimal threshold plot

from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

roc_auc = roc_auc_score(y_test, positive_prediction_probabilities)

plt.plot(false_positive_rates, true_positive_rates)
# Mark the optimal point with a red star and label it with its threshold
plt.plot(fpr, tpr, "r*")
plt.text(fpr+0.02, tpr-0.07, f"Optimal Threshold: {optimal_threshold:.4f}", fontsize=12, color="red")

plt.title(f"AUC: {roc_auc:.4f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
