Pattern Recognition Project EEL 6825
Decision Maker for a Stock Market Buyer/Seller
by Francis & Harish
Guide: Dr. John G. Harris
The Project Report
Introduction
Earlier Works
Implementation
The Bayes Classifier:
The data was assumed to be normally distributed.
We implemented the Bayes classifier for two (buy, sell) and three (buy, sell, retain) classes (C), for different amounts of data (N) and for D = 2 dimensions.
Plots:
C=2, N=200.
C=2, N=450.
C=3, N=200.
C=3, N=450.
The error rates and the computation times for all the classifiers are given in the Results Table in the Appendix.
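The original MATLAB code is not reproduced in this report; the following is a minimal Python/NumPy sketch of the quadratic Gaussian Bayes rule described above. The synthetic buy/sell data here is purely illustrative and is not the stock data used in the project.

```python
import numpy as np

def fit_gaussian_bayes(X_by_class):
    """Estimate mean, covariance and prior for each class (normality assumed)."""
    n_total = sum(len(X) for X in X_by_class)
    params = []
    for X in X_by_class:
        mu = X.mean(axis=0)
        sigma = np.cov(X, rowvar=False)          # D x D covariance estimate
        prior = len(X) / n_total
        params.append((mu, sigma, prior))
    return params

def bayes_classify(x, params):
    """Assign x to the class with the largest log posterior (up to a constant)."""
    scores = []
    for mu, sigma, prior in params:
        diff = x - mu
        sign, logdet = np.linalg.slogdet(sigma)
        maha = diff @ np.linalg.solve(sigma, diff)
        scores.append(np.log(prior) - 0.5 * (logdet + maha))
    return int(np.argmax(scores))

# Toy usage with synthetic 2-D data for C = 2 classes (buy = 0, sell = 1)
rng = np.random.default_rng(0)
buy  = rng.normal([0, 0], 1.0, size=(200, 2))
sell = rng.normal([3, 3], 1.0, size=(200, 2))
params = fit_gaussian_bayes([buy, sell])
print(bayes_classify(np.array([2.8, 3.1]), params))   # expected: 1 (sell)
```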
The Linear Classifier:
The Fisher classifier was again applied with two and three classes. The weight vector W and the threshold w0 were found, and this gave some of the clearest results. For the three-class case, the W matrix has d x (c-1) dimensions, and it also told us which feature was the most important.
Plots:
C=2, N=200. (w0 = 1.0852e+003)
Error Vs w0.
C=2, N=450. (w0 = 1.5084e+003)
Error Vs w0.
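A minimal sketch of the two-class Fisher discriminant, assuming class-labelled NumPy arrays as above. The midpoint threshold shown is only one reasonable choice of w0 and is not necessarily the value used for the Error Vs w0 plots.

```python
import numpy as np

def fisher_two_class(X1, X2):
    """Return the Fisher direction w and a simple midpoint threshold w0."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two (unnormalised) class scatter matrices
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)            # w proportional to S_w^{-1}(m1 - m2)
    w0 = 0.5 * (w @ m1 + w @ m2)                 # threshold between projected means
    return w, w0

def fisher_classify(X, w, w0):
    """Class 0 if the projection exceeds w0, else class 1."""
    return np.where(X @ w > w0, 0, 1)

# Toy usage with synthetic data
rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], 1.0, size=(200, 2))      # class 0 (e.g. buy)
X2 = rng.normal([3, 3], 1.0, size=(200, 2))      # class 1 (e.g. sell)
w, w0 = fisher_two_class(X1, X2)
labels = np.r_[np.zeros(200), np.ones(200)]
err = np.mean(fisher_classify(np.vstack([X1, X2]), w, w0) != labels)
print(w, w0, err)
```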
KL Dimensionality Reduction:
The KL dimensionality reduction technique was employed as a tool to reduce the number of dimensions from 15 to 10 or 2, and the effect of the reduced dimensionality on the classifiers was studied. It was found that if we retain 10 dimensions the error remains low, but if we retain only 2 dimensions the error starts to increase.
Plots:
KL Dimensionality Reduction Performance
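A sketch of the KL (Karhunen-Loeve) reduction from 15 dimensions to k, assuming the feature matrix is N x 15; it keeps the eigenvectors of the sample covariance with the largest eigenvalues, as in standard principal component analysis.

```python
import numpy as np

def kl_transform(X, k):
    """Project N x D data onto the k eigenvectors of the covariance
    with the largest eigenvalues (Karhunen-Loeve expansion)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the top-k components
    W = eigvecs[:, order]                        # D x k projection matrix
    return Xc @ W, W, mu

# Toy usage: reduce 15-D data to 10 and to 2 dimensions
rng = np.random.default_rng(2)
X = rng.normal(size=(450, 15))
X10, _, _ = kl_transform(X, 10)
X2, _, _ = kl_transform(X, 2)
print(X10.shape, X2.shape)                       # (450, 10) (450, 2)
```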
Parzen Windows
The Parzen window was applied to three classes by varying the window volume, and the performance was charted against the volume for different dimensions. It can be seen from the graph that as the volume increases, the error increases.
Plots:
Performance of Parzen Windows with Volume
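A sketch of the Parzen-window class-conditional density estimate, assuming a hypercube kernel whose side length h sets the volume V = h^d that is swept in the plot above; the report does not specify the kernel, so the hypercube is an assumption.

```python
import numpy as np

def parzen_density(x, X, h):
    """Hypercube Parzen estimate of p(x) from samples X with window side h."""
    n, d = X.shape
    inside = np.all(np.abs(X - x) <= h / 2.0, axis=1)   # samples inside the cube
    volume = h ** d
    return inside.sum() / (n * volume)

def parzen_classify(x, X_by_class, h):
    """Pick the class whose estimated density times prior is largest."""
    n_total = sum(len(X) for X in X_by_class)
    scores = [parzen_density(x, X, h) * len(X) / n_total for X in X_by_class]
    return int(np.argmax(scores))

# Toy usage for C = 3 classes (buy, sell, retain) in 2-D
rng = np.random.default_rng(3)
classes = [rng.normal(c, 1.0, size=(200, 2)) for c in ([0, 0], [3, 3], [0, 3])]
print(parzen_classify(np.array([2.9, 3.2]), classes, h=1.0))   # expected: 1
```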
k-NN Classifier
The k-NN classifier was evaluated exhaustively with k = 1, 2, 3, 4, 5, 6, and 10, and its performance was charted. This was done for three classes. We observed that the error did not decrease much as k increased, since ties occurred in every case with three classes. Ties were rejected, and the performance is charted below. The cases k = 5, 6, and 10 are shown in the Results Table in the Appendix. The k-NN classifier showed excellent results.
Plots:
Performance of k for varying dimensions (d)
Performance of k for varying amounts of data (N)
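A sketch of the k-NN rule with the tie handling described above (tied votes are rejected). The Euclidean metric is an assumption, since the report does not specify the distance measure.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X, y, k):
    """Return the majority label among the k nearest neighbours,
    or None when the vote is tied (ties are rejected, as in the report)."""
    dists = np.linalg.norm(X - x, axis=1)        # Euclidean distances to all samples
    nearest = y[np.argsort(dists)[:k]]
    counts = Counter(nearest).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None                              # tie -> reject
    return counts[0][0]

# Toy usage for C = 3 classes and k = 3
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(c, 1.0, size=(150, 2)) for c in ([0, 0], [3, 3], [0, 3])])
y = np.repeat([0, 1, 2], 150)
print(knn_classify(np.array([0.1, 2.8]), X, y, k=3))   # expected: 2
```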
Neural Networks
The neural networks took a long time to produce a good answer, especially as we decreased the step size for more accuracy, but the results here were also good. Another interesting observation was that we expected the tansig-tansig network to give better results than tansig-purelin, but that was not the case. The following plots give a clear picture.
Plots:
Performance based on # of hidden units
Performance based on the Step size
tansig-tansig Vs tansig-purelin
Classification: tansig-tansig with 20 hidden units and step size=300
Classification: tansig-tansig with 40 hidden units and step size=80
Classification: tansig-purelin with 40 hidden units and step size=80
MSE Vs Epochs
It is observed that the decision boundaries formed by the neural network are curves rather than straight lines, and there is more than one boundary solution. This is because the neural network trains on the data points using gradient descent, i.e. it searches for a minimum among neighbouring solutions. The neural network can be trained to separate most of the non-separable points. Nevertheless, the training process often gets trapped in a local minimum, as observed in the figure shown in the results. The longer the training time (the higher the epoch count), the higher the possibility that the solution gets trapped in a local minimum. Overfitting the data points only makes the decision boundary worse. One of the best ways to avoid local minima is to use a smaller step size; however, this leads to longer training and classification times.
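The networks in the report were built with the MATLAB toolbox (tansig hidden layer, purelin or tansig output). As a rough NumPy equivalent, the sketch below trains a one-hidden-layer tansig-purelin style network by plain gradient descent on a toy two-class problem; the hidden-unit count, learning rate, and epoch count are illustrative and are not the settings used in the report.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy two-class problem: 2-D points, targets +1 (buy) / -1 (sell)
X = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),
               rng.normal([3, 3], 1.0, size=(200, 2))])
t = np.r_[np.ones(200), -np.ones(200)].reshape(-1, 1)

# One hidden layer: tanh ("tansig") hidden units, linear ("purelin") output
n_hidden, lr, epochs = 20, 0.01, 500
W1 = rng.normal(scale=0.5, size=(2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(epochs):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    y = h @ W2 + b2
    err = y - t                                   # gradient of MSE at the output
    # Backward pass (full-batch gradient descent on mean squared error)
    dW2 = h.T @ err / len(X); db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)              # derivative of tanh
    dW1 = X.T @ dh / len(X); db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

pred = np.sign(np.tanh(X @ W1 + b1) @ W2 + b2)
print("training error:", np.mean(pred != t))
```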
Support Vector Machines
Support Vector Machines for classification tasks perform structural risk minimisation. They create a classifier with minimised VC Dimension. If the VC Dimension is low, the expected probability of error is low as well, which means good generalisation.
Support Vector Machines use a linear separating hyperplane to create a classifier, yet some problems cannot be linearly separated in the original input space. Support Vector Machines can non-linearly transform the original input space into a higher dimensional feature space. In this feature space it is trivial to find a linear optimal separating hyperplane. This hyperplane is optimal in the sense of being a maximal margin classifier with respect to the training data.
Open problems with this approach lie in two areas:
1. The theoretical problem of which non-linear transformation to use.
2. The practical problem of creating an efficient implementation, as the basic algorithm has memory requirements that grow quadratically with the number of training examples and computational complexity that also grows with the number of training examples.
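The report does not say which SVM implementation was used. As an illustration of the non-linear (kernel) mapping described above, the sketch below uses scikit-learn's SVC with an RBF kernel on a toy non-linearly separable problem; the library, kernel choice, and parameters are assumptions, not the project's actual setup.

```python
import numpy as np
from sklearn.svm import SVC

# Toy non-linearly separable problem: inner cluster vs. surrounding ring
rng = np.random.default_rng(6)
inner = rng.normal(0.0, 0.5, size=(200, 2))
angle = rng.uniform(0, 2 * np.pi, 200)
ring = np.c_[3 * np.cos(angle), 3 * np.sin(angle)] + rng.normal(0, 0.3, size=(200, 2))
X = np.vstack([inner, ring])
y = np.r_[np.zeros(200), np.ones(200)]

# The RBF kernel implicitly maps the inputs into a higher-dimensional feature
# space where a maximal-margin separating hyperplane can be found.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("number of support vectors per class:", clf.n_support_)
```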
Classification using Support Vector Machines
More theory on SVM
Inference & Conclusion
Appendix
Results Table
Comparison of Computational Time for the Various Classifiers
Computation Time based on the Volume for different N in Parzen Windows
Computation Time based on the k for different N in k-NN Classifier