{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "

\n", "Classification of Breast Cancer Data \n", "
\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "Breast Cancer Analysis

\n", "\n", "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer/." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Importing the Necessary Libraries

" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.metrics import confusion_matrix \n", "from sklearn.metrics import classification_report \n", "from sklearn.model_selection import train_test_split\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.svm import SVC\n", "from sklearn.naive_bayes import GaussianNB\n", "from sklearn.ensemble import RandomForestClassifier\n", "import matplotlib.pyplot as plt\n", "from imblearn.over_sampling import SMOTE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Creating a Pandas DataFrame from a CSV file

\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   no-recurrence-events  30-39  premeno  30-34  0-2  no  3  left   left_low   no.1
0  no-recurrence-events  40-49  premeno  20-24  0-2  no  2  right  right_up   no
1  no-recurrence-events  40-49  premeno  20-24  0-2  no  2  left   left_low   no
2  no-recurrence-events  60-69  ge40     15-19  0-2  no  2  right  left_up    no
3  no-recurrence-events  40-49  premeno  0-4    0-2  no  2  right  right_low  no
4  no-recurrence-events  60-69  ge40     15-19  0-2  no  2  left   left_low   no
\n", "
" ], "text/plain": [ " no-recurrence-events 30-39 premeno 30-34 0-2 no 3 left left_low \\\n", "0 no-recurrence-events 40-49 premeno 20-24 0-2 no 2 right right_up \n", "1 no-recurrence-events 40-49 premeno 20-24 0-2 no 2 left left_low \n", "2 no-recurrence-events 60-69 ge40 15-19 0-2 no 2 right left_up \n", "3 no-recurrence-events 40-49 premeno 0-4 0-2 no 2 right right_low \n", "4 no-recurrence-events 60-69 ge40 15-19 0-2 no 2 left left_low \n", "\n", " no.1 \n", "0 no \n", "1 no \n", "2 no \n", "3 no \n", "4 no " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('./breast-cancer.csv')\n", "data.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Breast Cancer Data Description

\n", "
\n", "This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.

\n", "Let's now add column labels to all columns in the data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Class                 age    menopause  tumor-size  inv-nodes  node-caps  deg-malig  breast  breast-quad  irradiat
0  no-recurrence-events  40-49  premeno    20-24       0-2        no         2          right   right_up     no
1  no-recurrence-events  40-49  premeno    20-24       0-2        no         2          left    left_low     no
2  no-recurrence-events  60-69  ge40       15-19       0-2        no         2          right   left_up      no
3  no-recurrence-events  40-49  premeno    0-4         0-2        no         2          right   right_low    no
4  no-recurrence-events  60-69  ge40       15-19       0-2        no         2          left    left_low     no
\n", "
" ], "text/plain": [ " Class age menopause tumor-size inv-nodes node-caps \\\n", "0 no-recurrence-events 40-49 premeno 20-24 0-2 no \n", "1 no-recurrence-events 40-49 premeno 20-24 0-2 no \n", "2 no-recurrence-events 60-69 ge40 15-19 0-2 no \n", "3 no-recurrence-events 40-49 premeno 0-4 0-2 no \n", "4 no-recurrence-events 60-69 ge40 15-19 0-2 no \n", "\n", " deg-malig breast breast-quad irradiat \n", "0 2 right right_up no \n", "1 2 left left_low no \n", "2 2 right left_up no \n", "3 2 right right_low no \n", "4 2 left left_low no " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_index = [ 'Class', 'age','menopause','tumor-size','inv-nodes','node-caps','deg-malig','breast','breast-quad','irradiat']\n", "data.columns = data_index\n", "data.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Data Variables

\n", "
Each row in breast-cancer.csv contains an individual case of a woman with breast cancer. There are 285 cases in this data set. The data set is available from the UCI repository (http://archive.ics.uci.edu/ml/datasets/Breast+Cancer).
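
The count of 285 depends on how the file is read. The head of the raw frame earlier in the notebook shows a data record sitting in the header position, which suggests the file has no header row of its own. A minimal sketch, assuming that layout, passes the column names to `pd.read_csv` so that no record is consumed as a header. The helper name `load_cases` and the three inline sample rows are illustrative only and are not part of the notebook.

```python
import io
import pandas as pd

COLUMNS = ['Class', 'age', 'menopause', 'tumor-size', 'inv-nodes',
           'node-caps', 'deg-malig', 'breast', 'breast-quad', 'irradiat']

def load_cases(source):
    # header=None stops pandas from treating the first record as column names;
    # names=COLUMNS labels the columns in the same step.
    return pd.read_csv(source, header=None, names=COLUMNS)

# Three illustrative records in the layout assumed above (no header row).
sample = io.StringIO('''no-recurrence-events,30-39,premeno,30-34,0-2,no,3,left,left_low,no
no-recurrence-events,40-49,premeno,20-24,0-2,no,2,right,right_up,no
no-recurrence-events,40-49,premeno,20-24,0-2,no,2,left,left_low,no
''')

cases = load_cases(sample)
print(cases.shape)
```

Called as `load_cases('./breast-cancer.csv')`, this would keep the first record as data, so the frame would hold one more row than a default `pd.read_csv` call followed by renaming the columns.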
\n", "\n", "
\n", "Each row, or sample, consists of the following attributes:\n", "* **1. Age:** age (in years at last birthday) of the patient at the time of diagnosis;\n", "* **2. Menopause:** whether the patient is pre- or postmenopausal at the time of diagnosis;\n", "* **3. Tumor size:** the greatest diameter (in mm) of the excised tumor;\n", "* **4. Inv-nodes:** the number (range 0 - 39) of axillary lymph nodes that contain metastatic breast cancer visible on histological examination;\n", "* **5. Node caps:** if the cancer metastasises to a lymph node, it may, although outside the original site of the tumor, remain “contained” by the capsule of the lymph node. Over time, and with more aggressive disease, the tumor may replace the lymph node and then penetrate the capsule, allowing it to invade the surrounding tissues (yes = 1, no = 0);\n", "* **6. Degree of malignancy:** the histological grade (range 1-3) of the tumor. Grade 1 tumors predominantly consist of cells that, while neoplastic, retain many of their usual characteristics. Grade 3 tumors predominantly consist of cells that are highly abnormal;\n", "* **7. Breast:** the breast in which the cancer occurred (left = 1, right = 2);\n", "* **8. Breast quadrant:** the breast may be divided into four quadrants, using the nipple as a central point (left_up = 1, left_low = 2, right_up = 3, right_low = 4, central = 5);\n", "* **9. Irradiation:** whether the patient received radiation therapy, a treatment that uses high-energy x-rays to destroy cancer cells (yes = 1, no = 0).\n", "\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "##### data\n", "#data[data['breast-quad'].str.contains('left') & data['breast'].str.contains('right')]\n", "#data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find rows that contain null values, if any exist" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Class  age  menopause  tumor-size  inv-nodes  node-caps  deg-malig  breast  breast-quad  irradiat
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Class, age, menopause, tumor-size, inv-nodes, node-caps, deg-malig, breast, breast-quad, irradiat]\n", "Index: []" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[data.isnull().any(axis = 1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Cleaning and Preparing the Data

\n", "
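Several columns in this data set hold ranges as strings, such as '40-49' for age or '0-4' for tumor size, and the cleaning step replaces each range with a single representative number. A minimal sketch of that idea, assuming the midpoint of the range is the value wanted: the helper `range_midpoint` is hypothetical and computes the number from the label instead of listing every range by hand.

```python
def range_midpoint(label):
    # Convert a range label such as '40-49' to the midpoint of its bounds.
    # A label without a dash (for example '39') comes back as that number.
    parts = str(label).split('-')
    low = float(parts[0])
    high = float(parts[-1])
    return (low + high) / 2

for label in ['40-49', '0-4', '0-2', '39']:
    print(label, range_midpoint(label))
```

With pandas, a function like this could be applied to a whole column through `Series.map`, for example `CleanData['age'].map(range_midpoint)`. A hand-written dictionary remains the safer choice when some labels should not be treated as numeric ranges.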
\n", "Binarize node-caps & irradiat columns" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Class  age   menopause  tumor-size  inv-nodes  node-caps  deg-malig  breast  breast-quad  irradiat
0  0      44.5  1          22          1          0          2          2       3.0          0
1  0      44.5  1          22          1          0          2          1       2.0          0
2  0      64.5  2          17          1          0          2          2       1.0          0
3  0      44.5  1          2           1          0          2          2       4.0          0
4  0      64.5  2          17          1          0          2          1       2.0          0
\n", "
" ], "text/plain": [ " Class age menopause tumor-size inv-nodes node-caps deg-malig \\\n", "0 0 44.5 1 22 1 0 2 \n", "1 0 44.5 1 22 1 0 2 \n", "2 0 64.5 2 17 1 0 2 \n", "3 0 44.5 1 2 1 0 2 \n", "4 0 64.5 2 17 1 0 2 \n", "\n", " breast breast-quad irradiat \n", "0 2 3.0 0 \n", "1 1 2.0 0 \n", "2 2 1.0 0 \n", "3 2 4.0 0 \n", "4 1 2.0 0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CleanData = data.copy()\n", "\n", "#Binarize node-caps & irradiat& Class\n", "CleanData['node-caps']= (CleanData['node-caps']=='yes').astype(int)\n", "CleanData['irradiat']= (CleanData['irradiat']=='yes').astype(int)\n", "CleanData['Class']= (CleanData['Class']=='recurrence-events').astype(int)\n", "\n", "CleanData.head()\n", "\n", "#Convert *Breast Quadrant* string descriptive Information into number.\n", "#Create a Dictionary of the mapping & Replace Values\n", "quad = {'left_up':1, 'left_low': 2, 'right_up':3, 'right_low':4, 'central':5} \n", "CleanData = CleanData.replace({'breast-quad': quad})\n", "CleanData['breast-quad'] = CleanData['breast-quad'].apply(pd.to_numeric, downcast='float', errors='coerce')\n", "CleanData[CleanData.isnull().any(axis = 1)]\n", "CleanData = CleanData.dropna()\n", "CleanData.head()\n", "\n", "\n", "#Convert *Breast* string descriptive Information into number (Left = 1, Right = 2)\n", "#Create a Dictionary of the mapping & Replace Values\n", "breast = {'left':1, 'right':2} \n", "CleanData = CleanData.replace({'breast': breast})\n", "CleanData.head()\n", "\n", "#Convert *menopause* string descriptive Information into number.\n", "#Create a Dictionary of the mapping & Replace Values\n", "menopause = {'premeno':1, 'ge40': 2, 'lt40':3} \n", "CleanData = CleanData.replace({'menopause': menopause})\n", "CleanData.head()\n", "\n", "\n", "#Convert 'inv-nodes' to the median of its average range.\n", "nodes = {'0-2':1, '3-5':4,'6-8':7,'9-11':10, 
'12-14':13,'15-17':16,'18-20':19,'21-23':22,'24-26':25,'27-29':28,'30-32':31,'33-35':34,\n", " '36-38':37,'39':39}\n", "CleanData = CleanData.replace({'inv-nodes': nodes})\n", "CleanData['inv-nodes'].describe()\n", "\n", "\n", "#Convert age to the midpoint of its range.\n", "age = {'20-29':24.5, '30-39':34.5,'40-49':44.5,'50-59':54.5, '60-69':64.5,'70-79':74.5,'80-89':84.5,'90-99':94.5}\n", "CleanData = CleanData.replace({'age': age})\n", "CleanData.head()\n", "\n", "#Convert tumor-size to the midpoint of its range.\n", "Tumor = {'0-4':2, '5-9':7,'10-14':12,'15-19':17, '20-24':22,'25-29':27,'30-34':32,'35-39':37,'40-44':42,'45-49':47,'50-54':52}\n", "CleanData = CleanData.replace({'tumor-size': Tumor})\n", "CleanData.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Data Visualisation

\n", "
\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Class\n", "0 [[AxesSubplot(0.125,0.670278;0.215278x0.209722...\n", "1 [[AxesSubplot(0.125,0.670278;0.215278x0.209722...\n", "dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "CleanData.groupby('Class').hist(figsize=(10, 10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "********************** Commence Classification Task****************************

\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We aim to classify patient data by class: recurrence events vs. no-recurrence events. This target variable is stored in 'Output'.\n", "\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "Output=CleanData['Class']\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int32')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Output.dtype\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Use Cancer Characteristics & other gynecological details to predict

the possibility of Breast Cancer Recurrence in Women. \n", "

\n" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "features_list = ['age','menopause','tumor-size','inv-nodes','node-caps','deg-malig']" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "features = CleanData[features_list]\n" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   age   menopause  tumor-size  inv-nodes  node-caps  deg-malig
0  44.5  1          22          1          0          2
1  44.5  1          22          1          0          2
2  64.5  2          17          1          0          2
3  44.5  1          2           1          0          2
4  64.5  2          17          1          0          2
\n", "
" ], "text/plain": [ " age menopause tumor-size inv-nodes node-caps deg-malig\n", "0 44.5 1 22 1 0 2\n", "1 44.5 1 22 1 0 2\n", "2 64.5 2 17 1 0 2\n", "3 44.5 1 2 1 0 2\n", "4 64.5 2 17 1 0 2" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "features.head()" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 0\n", "2 0\n", "3 0\n", "4 0\n", "Name: Class, dtype: int32" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Output.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Perform Test and Train split\n", "\n", "
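Because recurrence cases are the minority class here, a purely random split can leave the train and test sets with different class ratios. The following is a sketch of a stratified split in plain Python, which splits each class separately so both parts keep roughly the same ratio. It is an alternative shown for illustration, not the method used in this notebook; the function name `stratified_split` and the toy labels are hypothetical.

```python
import random

def stratified_split(labels, test_size=0.33, seed=324):
    # Return (train_idx, test_idx) so that each class keeps roughly
    # the same share in both parts.
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 70 cases of class 0 and 30 cases of class 1.
labels = [0] * 70 + [1] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

scikit-learn's `train_test_split` offers the same behaviour through its `stratify` parameter, for example by adding `stratify=Output` to the call.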

\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## REMINDER: Training Phase\n", "\n", "In the **training phase**, the learning algorithm uses the training data to adjust the model’s parameters to minimize errors. At the end of the training phase, you get the trained model.\n", "\n", "\n", "
\n", "In the **testing phase**, the trained model is applied to test data. Test data is separate from the training data, and is previously unseen by the model. The model is then evaluated on how it performs on the test data. The goal in building a classifier model is to have the model perform well on training as well as test data.\n" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [], "source": [ "features_train, features_test, Output_train, Output_test = train_test_split(features, Output, test_size = 0.33, random_state = 324)\n" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number instances in features_train dataset: (198, 6)\n", "Number instances in Output_train dataset: (198,)\n", "Number instances in features_test dataset: (86, 6)\n", "Number instances in Output_test dataset: (86,)\n" ] } ], "source": [ "print(\"Number instances in features_train dataset: \", features_train.shape)\n", "print(\"Number instances in Output_train dataset: \", Output_train.shape)\n", "print(\"Number instances in features_test dataset: \", features_test.shape)\n", "print(\"Number instances in Output_test dataset: \", Output_test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "SMOTE Technique to address Data Imbalance\n", "\n", "
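SMOTE balances the classes by creating synthetic minority samples: it picks a minority point, picks one of its nearest minority neighbours, and places a new point somewhere on the line segment between the two. The sketch below shows only that interpolation step in plain Python. It is not the imbalanced-learn implementation, and the function name `smote_like_oversample` and the toy points are hypothetical.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=2):
    # Create n_new synthetic points by interpolating between a minority
    # sample and one of its k nearest minority neighbours.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features (for example age and tumor size).
minority = [[44.5, 22.0], [54.5, 27.0], [34.5, 32.0], [64.5, 17.0]]
new_points = smote_like_oversample(minority, n_new=6)
print(len(minority) + len(new_points))
```

Because the new points are interpolated, integer-coded categorical features such as menopause or node-caps can end up with fractional values after resampling. Only the training set should be resampled; the test set keeps its original class ratio. Recent imbalanced-learn releases expose the resampling call as `fit_resample`.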

" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before OverSampling, counts of Recurrent Class '1': 53\n", "Before OverSampling, counts of No-Recurrent Class '0': 137 \n", "\n" ] } ], "source": [ "print(\"Before OverSampling, counts of Recurrent Class '1': {}\".format(sum(Output_train==1)))\n", "print(\"Before OverSampling, counts of No-Recurrent Class '0': {} \\n\".format(sum(Output_train==0)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Resampling using SMOTE" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "sm = SMOTE(random_state=2)\n", "features_train_res, Output_train_res = sm.fit_sample(features_train, Output_train)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "After OverSampling, the shape of features_X: (286, 6)\n", "After OverSampling, the shape of Output_y: (286,) \n", "\n", "After OverSampling, counts of Recurrent Class '1': 143\n", "After OverSampling, counts of Non-Recurrent Class '0': 143\n" ] } ], "source": [ "print('After OverSampling, the shape of features_X: {}'.format(features_train_res.shape))\n", "print('After OverSampling, the shape of Output_y: {} \\n'.format(Output_train_res.shape))\n", "\n", "print(\"After OverSampling, counts of Recurrent Class '1': {}\".format(sum(Output_train_res==1)))\n", "print(\"After OverSampling, counts of Non-Recurrent Class '0': {}\".format(sum(Output_train_res==0)))" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "age float64\n", "menopause int64\n", "tumor-size int64\n", "inv-nodes int64\n", "node-caps int32\n", "deg-malig int64\n", "dtype: object" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check features of the training and testing sets.\n", "\n", 
"#type(features_train)\n", "features_train.dtypes\n", "#type(features_test)\n", "#type(Output_train)\n", "#Output_train.dtype\n", "#type(Output_test)\n", "#features_train.describe()\n", "#Output_train.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Fit on Train Set\n", "

\n" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "GaussianNB(priors=None)" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Trying Different Types of Classifiers\n", "#1. Decision Tree Classifier\n", "#recurrence_classifier = DecisionTreeClassifier(max_leaf_nodes=19, random_state=0)\n", "\n", "#2. Logistic Regression\n", "#recurrence_classifier = LogisticRegression(random_state = 0)\n", "\n", "#3. K-Nearest Neighbours\n", "#recurrence_classifier = KNeighborsClassifier(n_neighbors = 3, metric = 'minkowski', p = 2)\n", "\n", "#4. Support Vector Classification\n", "#recurrence_classifier = SVC(kernel = 'linear', random_state = 0)\n", "\n", "#5. Gaussian Naïve Bayes Algorithm\n", "recurrence_classifier = GaussianNB()\n", "\n", "#6. Random Forest Algorithm\n", "#recurrence_classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)\n", "\n", "#****************************FITTING CLASSIFIER ON TRAINING SET******************************\n", "recurrence_classifier.fit(features_train_res, Output_train_res)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sklearn.naive_bayes.GaussianNB" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(recurrence_classifier)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Predict on Test Set \n", "\n", "

\n" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "ModelPredictions = recurrence_classifier.predict(features_test)" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0])" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ModelPredictions[:10]" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "53 0\n", "231 1\n", "200 1\n", "8 0\n", "62 0\n", "26 0\n", "11 0\n", "155 0\n", "36 0\n", "31 0\n", "Name: Class, dtype: int32" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#True labels of the first ten test cases, for comparison with ModelPredictions[:10]\n", "Output_test.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Measure Accuracy of the Classifier\n", "
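With imbalanced classes, accuracy alone can look acceptable even when the minority class is predicted poorly, which is why per-class precision and recall are reported as well. A small sketch in plain Python makes the definitions concrete. The helper `binary_metrics` and the toy labels are hypothetical and are not results from this notebook.

```python
def binary_metrics(y_true, y_pred, positive=1):
    # Count the four outcomes of a binary classifier for the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        'accuracy': (tp + tn) / len(pairs),
        # Precision: share of predicted positives that are truly positive.
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        # Recall: share of true positives that the model found.
        'recall': tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy case: 7 negatives, 3 positives, and a model that always predicts 0.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
always_negative = [0] * 10
print(binary_metrics(y_true, always_negative))
```

In this toy case the model reaches 70% accuracy while finding none of the positive cases, so its recall for class 1 is zero.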

\n" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.74 0.81 0.77 57\n", " 1 0.54 0.45 0.49 29\n", "\n", "avg / total 0.67 0.69 0.68 86\n", "\n" ] } ], "source": [ "#Model accuracy by algorithm:\n", "\n", "\n", "#1. Decision Tree Classifier: 64.27%\n", "#2. Logistic Regression: 73.17%\n", "#3. K-Nearest Neighbours: 67.54%\n", "#4. Support Vector Machine Model (SVC): 70%\n", "#5. Gaussian Naive Bayes Algorithm: 73.17%\n", "#6. Random Forest Classifier: 70.21%\n", "\n", "accuracy_score(y_true = Output_test, y_pred = ModelPredictions)\n", "confusion_matrix(y_true = Output_test, y_pred = ModelPredictions)\n", "print(classification_report(y_true = Output_test, y_pred = ModelPredictions))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }