This is the harmonic mean of precision and recall; the harmonic mean does not give unfair weight to extreme values.
The higher the F1 score, the greater the predictive power of the classification model. A score close to 1 indicates a near-perfect model, while a score close to 0 indicates poor predictive capability.
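As a minimal sketch, F1 can be computed directly from confusion-matrix counts (the tp, fp, fn values below are assumed for illustration):

```python
# Assumed confusion-matrix counts (illustrative values only).
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
print(round(f1, 3))
```

In practice `sklearn.metrics.f1_score(y_true, y_pred)` computes the same quantity from label arrays.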
Sensitivity and Specificity
True Positive Rate (TPR / Sensitivity): measures the proportion of positive instances correctly classified by the model.
Sensitivity = TP / (TP + FN)
True Negative Rate (TNR / Specificity): measures the proportion of negative instances correctly classified by the model.
Specificity = TN / (TN + FP)
False Positive Rate (FPR / 1 − Specificity): the proportion of negative instances incorrectly classified as positive.
FPR = 1 − TNR = FP / (FP + TN)
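The three rates above can be sketched from assumed confusion-matrix counts (tp, fn, tn, fp below are illustrative, not from any real model):

```python
# Assumed confusion-matrix counts for illustration.
tp, fn, tn, fp = 40, 10, 45, 5

sensitivity = tp / (tp + fn)   # TPR: fraction of positives caught
specificity = tn / (tn + fp)   # TNR: fraction of negatives correctly rejected
fpr = fp / (fp + tn)           # equals 1 - specificity
print(sensitivity, specificity, fpr)
```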
ROC Curves
Receiver Operating Characteristic (ROC) is a graphical representation used to evaluate binary classification.
The ROC curve plots the true positive rate against the false positive rate at different classification thresholds.
AUC-ROC
Looks at the area under the ROC curve across all thresholds.
AUC is scale-invariant: it measures how well predictions are ranked rather than their absolute values.
AUC is also threshold-invariant: it measures the quality of the model's predictions irrespective of which classification threshold is chosen.
A perfect model has an AUC-ROC of 1
A random model has an AUC-ROC of 0.5
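AUC equals the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one, which is why only the ranking of scores matters. A pure-Python sketch on assumed example scores:

```python
from itertools import product

# Assumed example scores with labels (1 = positive, 0 = negative).
data = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.3, 0), (0.2, 0)]
pos = [s for s, y in data if y == 1]
neg = [s for s, y in data if y == 0]

# AUC = P(a random positive scores higher than a random negative); ties count 0.5.
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p, n in product(pos, neg))
auc = wins / (len(pos) * len(neg))
print(auc)
```

Because only the ranking matters, applying any monotonic transform to the scores (e.g. squaring them) leaves the AUC unchanged, which is the scale-invariance noted above. `sklearn.metrics.roc_auc_score` computes this directly.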
Log Loss
Log Loss heavily penalises confident but wrong classifications.
Log loss is a measure of how close a predicted probability comes to the true label in classification.
Well suited to multi-class classification.
Log loss takes into account the predicted probabilities for all classes present in the sample.
Logarithmic Loss = −(1/N) * Σ_{i=1..N} Σ_{j=1..M} y_ij * log(p_ij)
y_ij indicates whether sample i belongs to class j (1) or not (0)
p_ij is the predicted probability of sample i belonging to class j
Log Loss has a range of [0, ∞), so it has no upper bound; the closer it is to 0, the better the model.
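The double sum above can be sketched in plain Python (the one-hot labels y and predicted probabilities p below are assumed values for a 3-sample, 3-class example):

```python
import math

# Assumed data: 3 samples, 3 classes; y is one-hot, p is predicted probabilities.
y = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
p = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.2, 0.7]]

n = len(y)
# Only the probability assigned to the true class contributes, since y_ij is 0 elsewhere.
log_loss = -sum(y[i][j] * math.log(p[i][j])
                for i in range(n) for j in range(len(y[i]))) / n
print(round(log_loss, 4))
```

`sklearn.metrics.log_loss` implements the same formula (with probability clipping for numerical safety).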
Regression
Mean Absolute Error
MAE = (1/n) * Σ_{i=1..n} |y_i − ŷ_i|
Looks at average magnitude, in the same unit as the target.
Less sensitive to outliers compared to MSE.
Lower MAE = better performance (MAE ≥ 0).
Mean Squared Error
MSE = (1/n) * Σ_{i=1..n} (y_i − ŷ_i)²
MSE takes the average of the square of the difference between the original values and the predicted values.
Good for computing the gradient.
Good for when the target column is distributed around the mean.
Can amplify outliers.
Root Mean Squared Error
Quantifies the average magnitude of the errors or residuals.
Measures how well predicted values align with actual values.
A smaller RMSE indicates the model’s predictions are closer to the actual values.
RMSE = √MSE
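The three regression errors above can be computed side by side; the y_true and y_pred values below are assumed for illustration:

```python
import math

# Assumed actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

n = len(y_true)
mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n     # mean absolute error
mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n   # mean squared error
rmse = math.sqrt(mse)                                         # back in the target's units
print(mae, mse, rmse)
```

Note how the single error of 1.0 dominates MSE more than MAE, illustrating MSE's sensitivity to outliers.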
Coefficient of Determination
Measures how well the model fits, i.e. how much the real values vary around the regression line.
R² typically ranges from 0 to 1 (it can be negative when the model fits worse than simply predicting the mean); a value of 0 indicates the model explains none of the variance.
The higher the value, the better the fit.
R² = Explained Variation / Total Variation
from sklearn.metrics import r2_score
r2 = r2_score(y, y_pred)
Root Mean Squared Log Error
Usually used when we don’t want to penalise huge absolute differences between the predicted and the actual values.
Appropriate when both the predicted and actual values can be very large numbers, since the logarithm compresses their scale and the metric effectively measures relative error.
RMSLE = √( (1/n) * Σ_{i=1..n} (log(p_i + 1) − log(a_i + 1))² )
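A minimal sketch with assumed predictions p and actuals a (note that `math.log1p(x)` computes log(x + 1) directly):

```python
import math

# Assumed predictions and actuals; deliberately large values.
p = [600, 1200, 40000]
a = [500, 1500, 35000]

n = len(p)
rmsle = math.sqrt(sum((math.log1p(pi) - math.log1p(ai)) ** 2
                      for pi, ai in zip(p, a)) / n)
print(round(rmsle, 4))
```

Even though the last pair differs by 5000 in absolute terms, its contribution is comparable to the others because only the relative difference matters.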
Clustering
Adjusted Rand Score
RI = 2(a + b) / (n(n − 1))
Calculates the share of observation pairs for which the two splits, i.e. the initial labelling and the clustering result, are consistent.
Where n is the number of observations in a sample,
a is the number of observation pairs with the same labels located in the same cluster,
b is the number of observation pairs with different labels located in different clusters.
(The adjusted Rand score additionally corrects RI for chance agreement.)
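The pair counting behind RI can be sketched directly (the labels and cluster assignments below are assumed toy data):

```python
from itertools import combinations

# Assumed true labels and cluster assignments for 6 observations.
labels = [0, 0, 0, 1, 1, 1]
clusters = [0, 0, 1, 1, 2, 2]

a = b = 0
for i, j in combinations(range(len(labels)), 2):
    same_label = labels[i] == labels[j]
    same_cluster = clusters[i] == clusters[j]
    if same_label and same_cluster:
        a += 1          # consistent pair: together in both splits
    elif not same_label and not same_cluster:
        b += 1          # consistent pair: apart in both splits

n = len(labels)
ri = 2 * (a + b) / (n * (n - 1))  # share of consistent pairs
print(ri)
```

`sklearn.metrics.adjusted_rand_score(labels, clusters)` gives the chance-corrected version of this quantity.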
Silhouette
The silhouette score shows to what extent the mean distance between an object and the objects of its own cluster (a) differs from the mean distance to the objects of the nearest other cluster (b).
s = (b − a) / max(a, b)
The value lies between −1 and +1.
A value close to +1 indicates good clustering.
A value close to −1 indicates bad clustering.
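A minimal pure-Python sketch on assumed 1-D data with two well-separated clusters (with only two clusters, the other cluster is automatically the nearest one):

```python
# Assumed 1-D points grouped into two clusters.
clusters = {0: [1.0, 1.2, 1.4], 1: [8.0, 8.2, 8.4]}

scores = []
for cid, points in clusters.items():
    other = [p for c, pts in clusters.items() if c != cid for p in pts]
    for x in points:
        own = [p for p in points if p is not x]
        a = sum(abs(x - p) for p in own) / len(own)      # mean intra-cluster distance
        b = sum(abs(x - p) for p in other) / len(other)  # mean distance to the other cluster
        scores.append((b - a) / max(a, b))

silhouette = sum(scores) / len(scores)  # mean silhouette over all points
print(round(silhouette, 3))
```

Because the clusters are tight and far apart, the score lands close to +1; `sklearn.metrics.silhouette_score` generalises this to n-dimensional data and any number of clusters.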
Natural Language Processing
BERTScore
Scores a candidate text against a reference by matching tokens via contextual embeddings from BERT, rather than by exact n-gram overlap.
BLEU
Bilingual Evaluation Understudy
Metric for machine translation tasks; scores a candidate translation by its n-gram overlap with one or more reference translations.
Cross-Entropy
Cross-entropy is a measure of the difference between two probability distributions for a given random variable or set of events.
H(P, Q) = −Σ_x P(x) * log Q(x)
This computes the “surprise factor” of seeing a result.
P - true distribution of the data
Q - distribution predicted by model
The lower the cross-entropy, the better the model matches the true distribution.
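A small sketch with assumed distributions P and Q over three outcomes, also computing the entropy of P itself for comparison:

```python
import math

# Assumed true distribution P and model distribution Q over three outcomes.
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]

h_pq = -sum(p * math.log(q) for p, q in zip(P, Q))  # cross-entropy H(P, Q)
h_pp = -sum(p * math.log(p) for p in P)             # entropy H(P) = H(P, P)
print(round(h_pq, 4), round(h_pp, 4))
```

H(P, Q) is always at least H(P), with equality exactly when Q = P, which is why minimising cross-entropy pushes the model's distribution toward the true one.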
Perplexity
Language Model Evaluation
Measures how well a model predicts a sample; it captures the model’s level of uncertainty. Lower perplexity indicates a better language model. Perplexity is the exponential of the model’s average per-token cross-entropy.
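As a sketch, perplexity can be computed as the exponential of the average negative log-likelihood; the per-token probabilities below are assumed values a language model might assign to a four-token sample:

```python
import math

# Assumed per-token probabilities a language model assigned to a test sample.
token_probs = [0.2, 0.1, 0.25, 0.05]

# Average negative log-likelihood (cross-entropy in nats), then exponentiate.
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(round(perplexity, 2))
```

Equivalently, perplexity is the inverse geometric mean of the token probabilities: roughly, the model is as uncertain as if it were choosing uniformly among that many tokens at each step.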