I wonder if anyone can comment on the motivation for using this sequential form of Bayes over the original naive form. In the sequential form, the result at each step is used as the prior for the next step, whereas in the original form a single prior is used throughout. Researching this suggests the sequential form can produce different answers depending on the ordering of the calculations (although I just tried a simple two-feature example and got the same answer both ways). But I'm troubled by the idea that the sequential form is formulated assuming new information is provided piecewise, because the way I see this modeled, all the features (i.e., sensors) are known a priori. At calculation time we know the available sensors and their values, so the only thing to do is run through the calculations. Maybe the module should allow itself to be configured to use one approach or the other? Or maybe the difference in results is just small?
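For what it's worth, here is the two-feature check I mentioned, sketched in Python with made-up numbers (none of the names or values come from the module itself). When each sequential step uses a proper total-probability denominator, the two forms agree:

```python
# Hypothetical prior and per-sensor likelihoods; purely illustrative values.
p_h = 0.2
sensors = [
    # (P(obs | H), P(obs | not H)) for each observed sensor reading
    (0.9, 0.4),
    (0.7, 0.2),
]

# Original form: one prior, all likelihoods at once.
num = p_h
alt = 1 - p_h
for p_t, p_f in sensors:
    num *= p_t
    alt *= p_f
batch = num / (num + alt)

# Sequential form: each posterior becomes the next prior.
post = p_h
for p_t, p_f in sensors:
    post = post * p_t / (post * p_t + (1 - post) * p_f)

print(round(batch, 6), round(post, 6))  # identical when done this way
```

So at least in this formulation the difference really is zero; the ordering sensitivity would only show up if the denominator is computed some other way.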
Another thing I don't get about the way the code is expressed: the denominator is the product over all the sensors, i.e. P(sensor1)*P(sensor2)…, which assumes independence between the sensors. The sensors are presumably conditionally independent given the hypothesis (but see my previous comments on that), but they are not marginally independent, because they are correlated through the hypothesis. So I would think the denominator should really be computed using the law of total probability, i.e. as P(H)*P(sensor1|H)*P(sensor2|H)*… + (1-P(H))*P(sensor1|notH)*P(sensor2|notH)*….
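To make the concern concrete, here's a small sketch (all numbers made up, nothing taken from the actual code) showing that multiplying the per-sensor marginals gives a different denominator than the law of total probability applied to the joint observation:

```python
# Hypothetical prior and likelihoods: (P(s|H), P(s|not H)) per sensor.
p_h = 0.2
likes = [(0.9, 0.4), (0.7, 0.2)]

# Per-sensor marginals multiplied together -- what the code appears to do.
marginals = [p_h * p_t + (1 - p_h) * p_f for p_t, p_f in likes]
naive_den = marginals[0] * marginals[1]

# Law of total probability on the joint observation.
total_den = p_h * 0.9 * 0.7 + (1 - p_h) * 0.4 * 0.2

print(naive_den, total_den)  # 0.15 vs 0.19 -- not the same quantity
```

The gap between the two is exactly the correlation between the sensors that flows through H, which is why treating the marginals as independent undercounts the denominator here.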
While I'm at it, one more thing that isn't making sense in the computation (if I understand it right): if a sensor is supposed to be considered but isn't in the TRUE state, then the prior just propagates through, as if the FALSE state has no say in the calculation. That just doesn't seem right. Suppose we have sensors S1 and S2, and when we run the computation S1 is TRUE and S2 is FALSE; I would think we want to compute P(H|S1, not S2), not just P(H|S1) while ignoring S2.
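A sketch of what I mean, with illustrative numbers: the FALSE sensor should enter through its complement likelihood, P(not S2|H) = 1 - P(S2|H), and doing so changes the answer versus dropping S2 entirely:

```python
# Hypothetical values; (P(s|H), P(s|not H)) for each sensor.
p_h = 0.2
p_s1_h, p_s1_nh = 0.9, 0.4   # S1 observed TRUE
p_s2_h, p_s2_nh = 0.7, 0.2   # S2 observed FALSE

# Include the FALSE observation via its complement likelihoods.
num = p_h * p_s1_h * (1 - p_s2_h)
posterior = num / (num + (1 - p_h) * p_s1_nh * (1 - p_s2_nh))  # P(H|S1, not S2)

# Ignoring S2 entirely, as the code seems to do:
num_ign = p_h * p_s1_h
posterior_ign = num_ign / (num_ign + (1 - p_h) * p_s1_nh)  # P(H|S1)

print(posterior, posterior_ign)  # the two posteriors differ
```

With these numbers the FALSE reading on S2 pulls the posterior well below what P(H|S1) alone gives, so it's evidence, not a no-op.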
Another thought about this whole thing: if all the conditional probabilities are known (such as from a config file), then the logic can just be reduced to a lookup table. That is, for each state of a class, the table holds 2**NUMBER_OF_SENSORS entries, one per combination of sensor states, so there's no need to recompute the probability. It's not really a big optimization, but it's another thought on how to implement this. OTOH, if NUMBER_OF_SENSORS >> 1, the lookup tables start to get large.
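A rough sketch of the lookup-table idea (probabilities made up; nothing here is from the actual module). The table is built once from the configured likelihoods, and run-time evaluation becomes a single dict lookup keyed on the tuple of sensor states:

```python
from itertools import product

# Hypothetical config: prior and (P(s|H), P(s|not H)) per sensor.
p_h = 0.2
likes = [(0.9, 0.4), (0.7, 0.2)]

# Precompute P(H | sensor states) for all 2**N combinations.
table = {}
for states in product([False, True], repeat=len(likes)):
    num, alt = p_h, 1 - p_h
    for on, (p_t, p_f) in zip(states, likes):
        num *= p_t if on else (1 - p_t)
        alt *= p_f if on else (1 - p_f)
    table[states] = num / (num + alt)

# Run time: one lookup instead of the whole computation.
print(table[(True, False)])  # P(H | S1, not S2)
```

Note this version also bakes in the two concerns above: FALSE states contribute via their complements, and the denominator is the total probability of that exact state combination.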