Adisorn Owatsiriwong, D.Eng.
Introduction:
Naive Bayes (NB) classification is a supervised Machine Learning (ML) technique used to classify data. The features can be categorical, such as gender or keywords, or continuous values, such as prices. NB uses probability based on Bayes' theorem to decide which class a data point most likely belongs to. NB is widely used in many applications, for example spam mail filters, project bidding evaluation, risk analysis, and medical assessment.
Pros:
NB is a simple and robust ML algorithm. It works well even with small datasets, and its computation is cheap and straightforward.
Cons:
To work with NB, the features must be assumed to be conditionally independent of one another given the class. NB also ignores the order of features in its analysis; for example, it treats 'Dear' 'Friend' and 'Friend' 'Dear' as the same data point. In practice, this usually does not cause problems for classification.
Mathematics and implementation:
Bayes' theorem states that

P(A|x) = P(x|A) P(A) / P(x)

where P(A) is the prior probability of class A, P(x|A) is the likelihood of the data x given class A, and P(x) is the overall probability of the data.
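As a quick numerical illustration (all numbers here are made up), the posterior for a spam filter with a single keyword can be computed directly:

% Worked example of Bayes' theorem with made-up numbers
pSpam = 0.3;           % prior P(A): fraction of mail that is spam
pWordGivenSpam = 0.8;  % likelihood P(x|A): keyword appears in spam
pWordGivenHam = 0.1;   % likelihood P(x|B): keyword appears in normal mail
pWord = pWordGivenSpam*pSpam + pWordGivenHam*(1 - pSpam);  % evidence P(x)
pSpamGivenWord = pWordGivenSpam*pSpam / pWord  % posterior P(A|x), about 0.77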
The steps to perform Naive Bayes for categorical features x = (x1, ..., xn) are as follows (a minimal numerical sketch follows the list):
1. To obtain P(A|x), compute P(A) and P(x1, ..., xn|A), where P(x1, ..., xn|A) = P(x1|A) P(x2|A) ... P(xn|A)
2. Do the same for P(B|x)
3. For given data x, if P(A|x) > P(B|x), the data x is classified as A, and vice versa
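A minimal sketch of these three steps for two classes (the priors and conditional probabilities below are hypothetical, as if estimated from training counts):

% Hand-computed Naive Bayes scores for two classes A and B (hypothetical values)
pA = 0.5; pB = 0.5;      % priors P(A), P(B)
pxGivenA = [0.6, 0.3];   % P(x1|A), P(x2|A) from training counts
pxGivenB = [0.2, 0.7];   % P(x1|B), P(x2|B)
scoreA = pA * prod(pxGivenA);  % proportional to P(A|x)
scoreB = pB * prod(pxGivenB);  % proportional to P(B|x)
if scoreA > scoreB
    disp('x classified as A');
else
    disp('x classified as B');
end

Note that the scores are only proportional to the posteriors; the common factor P(x) cancels in the comparison.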
If the value of some feature xc is continuous, we can compute P(xc|A) from the Gaussian (normal) distribution. For example, let x = price and class = win; then the probability P(x = price | win) can be computed from

P(x|win) = 1/(σ√(2π)) exp(−(x − μ)² / (2σ²))

where μ and σ are the mean and standard deviation of price among winning bids.
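For instance, a sketch using normpdf from the Statistics and Machine Learning Toolbox (the class mean and standard deviation here are assumed values):

% Gaussian likelihood P(x = price | win) with assumed class statistics
muWin = 100;    % mean price among winning bids (assumed)
sigmaWin = 15;  % standard deviation of price among winning bids (assumed)
price = 95;     % new observation
pPriceGivenWin = normpdf(price, muWin, sigmaWin)  % Gaussian density value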
The following MATLAB code can be used to perform Naive Bayes classification for 2 features and 3 classes (the Statistics and Machine Learning Toolbox is required):
% Naive Bayes Classification Template for 2 Features and 3 Classes (A, B, C)
% Step 1: Prepare Your Data
% Suppose you have the following training data:
% Features are stored in a matrix X, where each row is a sample and each column is a feature.
% Labels (classes) are stored in a vector y.
% Example Data (Features: X1, X2; Classes: A, B, C)
X = [
5.1, 3.5; % Sample 1
4.9, 3.0; % Sample 2
6.2, 3.4; % Sample 3
5.9, 3.0; % Sample 4
6.0, 2.2; % Sample 5
4.7, 3.2; % Sample 6
5.8, 2.7; % Sample 7
6.5, 3.0; % Sample 8
];
% Labels corresponding to the data
% 'A', 'B', 'C' represent the classes
y = categorical({'A', 'A', 'B', 'B', 'C', 'C', 'A', 'B'});
% Step 2: Train the Naive Bayes Model
% Fit a Naive Bayes model using the training data.
% 'DistributionNames' specifies the distribution type for each feature.
model = fitcnb(X, y, 'DistributionNames', 'normal');
% Step 3: Make Predictions
% You can now use the trained model to make predictions on new data.
newData = [5.5, 3.1; % New sample 1
6.3, 2.9]; % New sample 2
% Predict the class for the new data
predictedLabels = predict(model, newData);
% Display the predicted labels
disp('Predicted Class Labels:');
disp(predictedLabels);
% Step 4: Evaluate the Model (Optional)
% You can evaluate the model's performance on a test set or using cross-validation.
% Example: Confusion matrix on training data
predictedTrainLabels = predict(model, X);
confMat = confusionmat(y, predictedTrainLabels);
disp('Confusion Matrix:');
disp(array2table(confMat, 'VariableNames', {'Predicted_A', 'Predicted_B', 'Predicted_C'}, ...
'RowNames', {'Actual_A', 'Actual_B', 'Actual_C'}));
% You can also calculate accuracy, precision, recall, etc.
accuracy = sum(diag(confMat)) / sum(confMat(:));
disp(['Training Accuracy: ', num2str(accuracy * 100), '%']);
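For the cross-validation mentioned in Step 4, a sketch on a realistically sized dataset could look like the following (the eight-sample toy set above is too small: some folds would leave a class with a single sample, making its variance estimate degenerate):

% k-fold cross-validation sketch; assumes enough samples per class in every fold
cvModel = crossval(model, 'KFold', 5);  % 5-fold partitioned model
cvLoss = kfoldLoss(cvModel);            % average misclassification rate over folds
disp(['Cross-Validated Error: ', num2str(cvLoss * 100), '%']);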
Reference: MathWorks, MATLAB Statistics and Machine Learning Toolbox documentation, 2024.