The Naive Bayes model is a widely used method for classifying data. It is useful because it supports multivariate analysis and yields an estimated probability. The algorithm is based on Bayes' theorem of conditional probability, which gives the probability that an event occurs given that another event has occurred. It also assumes that the predictors are independent of one another, an idealized assumption that may keep the model from fitting the data well.
One should understand how the data actually contributes to fitting the model and judge more strictly the different alternatives and events that could occur. For example, one could use data with different features for a chemical element such as helium, for instance red, odorless, liquid, and gas. We know in advance from the context of our problem that we will be working with a gas, so the Naive Bayes machine learning model may not be the best choice. This is illustrated by trying to find the probability that helium is odorless given that it is liquid, P(Odorless|Liquid). As mentioned before, helium is handled in the gas phase, so the model may not be very helpful in this case. Another consideration is that chemical properties also influence the fluid phase, so it may not make sense to classify a chemical as helium based on the probability that it is in the liquid phase given that the color red occurred.
Bayes' theorem has been used in fields such as medicine and environmental science. More specifically, this model is used for tasks such as spam classification and word classification.
The Naive Bayes formula is written as follows:
P(c|x) = P(x|c) * P(c) / P(x)
P(c|x) = Probability of class c given that event x occurred
P(x|c) = Probability that event x occurs given that class c occurred
P(x) = Probability that event x occurred
P(c) = Probability that class c occurred
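Bayes' theorem can be arranged to give the probability of a class given an observed event. The sketch below shows that arithmetic in Python. The function name `posterior` and all of the numbers are assumptions made up for illustration; they do not come from any real dataset.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(c|x) = P(x|c) * P(c) / P(x)."""
    if evidence <= 0:
        raise ValueError("P(x) must be positive")
    return likelihood * prior / evidence


# Hypothetical values, chosen only to illustrate the calculation.
p_x_given_c = 0.8  # P(x|c): probability of event x when class c occurred
p_c = 0.3          # P(c): probability of class c
p_x = 0.5          # P(x): probability of event x

p_c_given_x = posterior(p_x_given_c, p_c, p_x)
print(round(p_c_given_x, 2))
```

With these assumed inputs the result is 0.8 * 0.3 / 0.5, that is, 0.48.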
This is easier to understand with some concrete events. Suppose that we want to classify a pressure gauge as correctly manufactured based on its observed features, such as length, height, and width.
P(right pressure gauge | length, height, width) = P(length | right pressure gauge) x P(height | right pressure gauge) x P(width | right pressure gauge) x P(right pressure gauge) / (P(length) x P(height) x P(width))
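The pressure gauge calculation can be sketched in Python as follows. This is only an illustration under stated assumptions: the second class `defective`, the function name `naive_bayes_posteriors`, and every probability are hypothetical. Each likelihood is read as the probability that a measurement falls within tolerance given the class. Because the denominator is the same for every class, the sketch skips it and simply rescales the class scores so that they sum to one.

```python
from math import prod

# Hypothetical probabilities, invented for illustration only.
priors = {"right": 0.9, "defective": 0.1}

# P(feature within tolerance | class) for each observed feature.
likelihoods = {
    "right": {"length": 0.95, "height": 0.90, "width": 0.92},
    "defective": {"length": 0.40, "height": 0.50, "width": 0.30},
}


def naive_bayes_posteriors(priors, likelihoods):
    """Multiply the prior by each feature likelihood, then normalize."""
    scores = {c: priors[c] * prod(likelihoods[c].values()) for c in priors}
    total = sum(scores.values())
    return {c: score / total for c, score in scores.items()}


posteriors = naive_bayes_posteriors(priors, likelihoods)
best = max(posteriors, key=posteriors.get)
print(best, posteriors)
```

The class with the largest posterior is taken as the prediction. The product over features is exactly where the independence assumption enters the calculation.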
There are several advantages that we can find in this method:
- Multiple variables can be integrated into the calculations to classify data
- It is easy to integrate with known features in a dataset
- A good understanding of the possible events in a specific classification or probabilistic case helps to train and fit the model.
On the other hand, there are several disadvantages that may lead us to avoid using this model.
- Predictors are assumed to be independent of one another, which may not be true; when it is not, this assumption leads to a model that does not fit the data properly
- Computing the Naive Bayes probabilities requires more computational effort as more features are included
- For continuous features, data is commonly assumed to be normally distributed, which is a strong assumption that we should verify against our data and specific case.
- Zero frequency: when a feature value or variable does not appear in your data for a class, its probability is taken as zero, which makes the whole product zero. One option is to assign a very low probability or value to the corresponding feature to cope with this issue.
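The zero-frequency workaround can be illustrated with a short Python sketch. The text above does not name a specific technique, so additive (Laplace) smoothing is used here as an assumption: a small constant `alpha` is added to every count before computing the probabilities. The function name `smoothed_likelihoods` and the sample observations are hypothetical.

```python
from collections import Counter


def smoothed_likelihoods(values, vocabulary, alpha=1.0):
    """Estimate P(value | class) with additive smoothing.

    alpha=0 gives the raw relative frequencies; alpha>0 keeps
    unseen values from getting a probability of exactly zero.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}


# Hypothetical observations of a "phase" feature for one class.
observed = ["gas", "gas", "gas", "gas"]
phases = ["gas", "liquid"]

unsmoothed = smoothed_likelihoods(observed, phases, alpha=0.0)
smoothed = smoothed_likelihoods(observed, phases, alpha=1.0)
print(unsmoothed)
print(smoothed)
```

Without smoothing, "liquid" gets probability zero and would cancel every product it appears in. With `alpha=1.0` it gets 1/6, and the probabilities still sum to one.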