1. A. P. Bhuvaneswari - Research Scholar, Dept. of Computer Science and Engineering, JNTUA University, Ananthapuramu, A.P., India.
2. Dr. R. Praveen Sam - Professor, Dept. of Computer Science and Engineering, G. Pulla Reddy Engineering College, Kurnool, A.P., India.
3. Dr. C. Shoba Bindu - Professor, Dept. of Computer Science and Engineering, JNTUA College of Engineering, Ananthapuramu, A.P., India.
Machine learning needs no explicit programming: it learns from the given data by identifying patterns, much as humans recognize them. Humans sometimes make mistakes, whereas with machines the scope for error is smaller, and the basic requirement is quality data for training. Because data is generated from multiple sources, it arrives in different formats, is huge in volume, and accumulates unwanted content, which is what makes it big data. Pre-processing is therefore necessary, because accurate results come from quality data, and features that are not required must be removed. Feature engineering on big data can yield quality data which, when used to train machine learning models, produces accurate results. In this paper, different dimensionality reduction algorithms are applied to different datasets, and the quality of the results in identifying diseases is examined. Early identification of a disease helps in taking the necessary protective measures to increase life span. Disease datasets with various dependency variables are pre-processed, their features are reduced with different dimensionality reduction algorithms, and the similarity among the data points is then identified by applying k-means clustering. The accuracy of the results is tested with different supervised machine learning algorithms for different diseases.
Machine learning, dimensionality reduction, feature engineering, accuracy, prediction, logistic regression, clustering.
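The pipeline summarized above (pre-processing, dimensionality reduction, then k-means clustering) can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes principal component analysis (computed here by power iteration) as the dimensionality reduction step, since the abstract does not name a specific algorithm, and it runs on synthetic data with two informative and two noise features rather than on a disease dataset. All function and variable names are hypothetical. The supervised evaluation step is omitted; the clusters are simply compared with the known synthetic labels.

```python
import math
import random

def standardize(rows):
    """Pre-processing: scale each feature to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def first_principal_component(rows, iters=200):
    """Top PCA direction via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def kmeans_1d(values, iters=50):
    """Two-cluster k-means (Lloyd's algorithm) on one-dimensional data."""
    centroids = [min(values), max(values)]  # deterministic initialization
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid.
        labels = [0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1
                  for x in values]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [x for x, l in zip(values, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Synthetic stand-in for a disease dataset (assumption, not real data):
# features 0 and 1 differ between the two groups, features 2 and 3 are noise.
rng = random.Random(0)
data, truth = [], []
for label, center in ((0, 0.0), (1, 5.0)):
    for _ in range(30):
        data.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0),
                     rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        truth.append(label)

scaled = standardize(data)
pc = first_principal_component(scaled)
# Reduce four features to one by projecting onto the first component.
reduced = [sum(a * b for a, b in zip(row, pc)) for row in scaled]
clusters, centroids = kmeans_1d(reduced)

# Cluster ids are arbitrary, so score agreement under the better labeling.
agree = sum(1 for p, t in zip(clusters, truth) if p == t) / len(truth)
accuracy = max(agree, 1.0 - agree)
print(f"agreement with known groups: {accuracy:.2f}")
```

In practice a library implementation of the reduction and clustering steps would normally be preferred; the pure-Python version is used here only to keep the sketch self-contained.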