Q1. Do you think the model will be able to classify the activities based on the data?
Plotting the static and dynamic activities shows that the data points for dynamic activities are far more spread out than those for static activities, which indicates higher variability. Dynamic activities involve more diverse movement patterns, so the sensors record a wider range of values. This variance gives the model a broader set of patterns to learn from, so it should distinguish between the dynamic activities more easily. Static activities, in contrast, show little variability, which may make it harder for the model to discern the subtle differences between them.
Subplots of static activities with mean and variance
import os
import pandas as pd
import matplotlib.pyplot as plt

combined_dir = "./Combined"
dataset_dir = os.path.join(combined_dir, "Train")
offset = 0
time = 10
figure, axis = plt.subplots(3, 6, figsize=(15, 12))
folder_count = 0
for folder in ["LAYING", "SITTING", "STANDING"]:
    files = os.listdir(os.path.join(dataset_dir, folder))
    X_train = []
    y_train = []
    count = 0
    for file in files:
        if count != 5:
            count += 1
        else:
            break
        df = pd.read_csv(os.path.join(dataset_dir, folder, file), sep=",", header=0)
        df = df[offset:offset + time * 50]
        df["linacc"] = df["accx"]**2 + df["accy"]**2 + df["accz"]**2
        axis[folder_count, count-1].plot(df["linacc"].values)
        axis[folder_count, count-1].set_title(folder)
        axis[folder_count, count-1].set_ylim(0, 1.5)
    # Calculate mean and variance for the folder
    mean_linacc = df["linacc"].mean()
    variance_linacc = df["linacc"].var()
    # Plot mean and variance subplot
    axis[folder_count, 5].bar(["Mean", "Variance"], [mean_linacc, variance_linacc], color=['green', 'red'])
    axis[folder_count, 5].set_title(f"{folder} - Mean and Variance")
    folder_count += 1
plt.subplots_adjust(left=0, bottom=0, right=1.8, top=2, wspace=0.25, hspace=0.6)
plt.savefig("./Figures/Q2.StaticDataAcceleration_visualisation.pdf", bbox_inches='tight')
plt.show()
Subplots of dynamic activities with mean and variance
combined_dir = "./Combined"
dataset_dir = os.path.join(combined_dir, "Train")
offset = 0
time = 10
figure, axis = plt.subplots(3, 6, figsize=(15, 12))
folder_count = 0
for folder in ["WALKING", "WALKING_DOWNSTAIRS", "WALKING_UPSTAIRS"]:
    files = os.listdir(os.path.join(dataset_dir, folder))
    X_train = []
    y_train = []
    count = 0
    for file in files:
        if count != 5:
            count += 1
        else:
            break
        df = pd.read_csv(os.path.join(dataset_dir, folder, file), sep=",", header=0)
        df = df[offset:offset + time * 50]
        df["linacc"] = df["accx"]**2 + df["accy"]**2 + df["accz"]**2
        axis[folder_count, count-1].plot(df["linacc"].values)
        axis[folder_count, count-1].set_title(folder)
        axis[folder_count, count-1].set_ylim(0, 6)
    # Calculate mean and variance for the folder
    mean_linacc = df["linacc"].mean()
    variance_linacc = df["linacc"].var()
    # Plot mean and variance subplot
    axis[folder_count, 5].bar(["Mean", "Variance"], [mean_linacc, variance_linacc], color=['green', 'red'])
    axis[folder_count, 5].set_title(f"{folder} - Mean and Variance")
    folder_count += 1
plt.subplots_adjust(left=0, bottom=0, right=1.8, top=2, wspace=0.25, hspace=0.6)
plt.savefig("./Figures/Q2.DynamicDataAcceleration_visualisation.pdf", bbox_inches='tight')
plt.show()
Q2. Do you think we need a machine learning model to differentiate between static and dynamic activities based on the data?
Visualising the data shows a clear difference between static and dynamic activities. The static activities have a very small range of values for each sensor, whereas the dynamic activities have a much larger range. Dynamic activities involve more movement, so the sensors record a wider range of values and generate more variance in the data. This means that we can differentiate between static and dynamic activities without a machine learning model, for example by comparing the variance of the acceleration signal against a threshold.
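A rule of this kind can be sketched in a few lines: compute the variance of the squared acceleration magnitude over a window and compare it with a threshold. The signals below are synthetic and the threshold of 0.05 is a hypothetical value chosen for illustration; neither is measured from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)


def squared_magnitude(acc):
    """Return accx^2 + accy^2 + accz^2 for an (n_samples, 3) array."""
    return (acc ** 2).sum(axis=1)


def is_dynamic(acc, threshold=0.05):
    """Label a window as dynamic when its magnitude variance exceeds a threshold.

    The default threshold is a hypothetical value; in practice it would be
    read off the variance bars in the plots.
    """
    return squared_magnitude(acc).var() > threshold


# Synthetic 10 s windows at 50 Hz (500 samples), standing in for real recordings.
t = np.arange(500) / 50.0
static_window = np.column_stack([
    np.ones(500),   # gravity on one axis
    np.zeros(500),
    np.zeros(500),
]) + rng.normal(0.0, 0.01, size=(500, 3))
# Add a 2 Hz oscillation on every axis to imitate movement.
dynamic_window = static_window + 0.5 * np.sin(2 * np.pi * 2.0 * t)[:, None]

print(is_dynamic(static_window))
print(is_dynamic(dynamic_window))
```

With real recordings the threshold would need to be chosen so that it sits between the largest static variance and the smallest dynamic variance.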
Q3. Training Decision Tree using Training Set
from MakeDataset import *
from sklearn.tree import DecisionTreeClassifier
Q4. Training Decision Tree for varying depths (2 to 8) using Training Set
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.tree import plot_tree

label_names = ["LAY", "SIT", "STAND", "WALK", "W_D", "W_U"]
accuracy_values = []
recall_scores = []
for i in range(2, 9):
    dt = DecisionTreeClassifier(max_depth=i)
    dt.fit(xtrain, ytrain)

    # Plot individual decision tree
    plt.figure(figsize=(40, 40))
    plot_tree(dt, filled=True, fontsize=10)
    plt.title(f'Decision Tree with max_depth={i}')
    plt.savefig(f"./Figures/Q4.DecisionTree-{i}.pdf", bbox_inches='tight')
    plt.show()

    # Evaluate the model
    yPred = dt.predict(X_test.reshape(36, 1500))
    acc = accuracy_score(y_test, yPred)
    print("Accuracy Score:", acc)
    accuracy_values.append(acc)
    label_names1 = [1, 2, 3, 4, 5, 6]
    recall_per_class = recall_score(y_test, yPred, labels=label_names1, average=None)

    # Print recall for each class
    print(f'Recall Scores for Decision Tree (max_depth={i}):')
    for j, recall in enumerate(recall_per_class):
        print(f'Recall for class {label_names[j]}: {recall:.4f}')
    recall_scores.append(recall_score(y_test, yPred, average='weighted'))

    # Plot confusion matrix
    cm = confusion_matrix(y_test, yPred, labels=[1, 2, 3, 4, 5, 6])
    plt.figure(figsize=(6, 6))
    ax = plt.subplot()
    sns.heatmap(cm, annot=True, fmt='g', ax=ax)
    ax.set_xlabel('Predicted labels')
    ax.set_ylabel('True labels')
    ax.set_title(f'Confusion Matrix for Decision Tree (max_depth={i})')
    ax.tick_params(axis='x', labelsize=10)
    ax.tick_params(axis='y', labelsize=10)
    ax.xaxis.set_ticklabels(label_names)
    ax.yaxis.set_ticklabels(label_names)
    plt.savefig(f"./Figures/Q4.ConfusionMatrix-{i}.pdf", bbox_inches='tight')
    plt.show()
Accuracy Score: 0.4444444444444444
Recall Scores for Decision Tree (max_depth=2):
Recall for class LAY: 0.6667
Recall for class SIT: 0.0000
Recall for class STAND: 0.0000
Recall for class WALK: 1.0000
Recall for class W_D: 0.0000
Recall for class W_U: 1.0000
Accuracy Score: 0.6111111111111112
Recall Scores for Decision Tree (max_depth=3):
Recall for class LAY: 0.0000
Recall for class SIT: 0.0000
Recall for class STAND: 0.8333
Recall for class WALK: 1.0000
Recall for class W_D: 0.8333
Recall for class W_U: 1.0000
Accuracy Score: 0.6666666666666666
Recall Scores for Decision Tree (max_depth=4):
Recall for class LAY: 0.0000
Recall for class SIT: 0.5000
Recall for class STAND: 0.6667
Recall for class WALK: 1.0000
Recall for class W_D: 0.8333
Recall for class W_U: 1.0000
Accuracy Score: 0.5833333333333334
Recall Scores for Decision Tree (max_depth=5):
Recall for class LAY: 0.3333
Recall for class SIT: 0.5000
Recall for class STAND: 0.0000
Recall for class WALK: 1.0000
Recall for class W_D: 0.6667
Recall for class W_U: 1.0000
Accuracy Score: 0.5833333333333334
Recall Scores for Decision Tree (max_depth=6):
Recall for class LAY: 0.0000
Recall for class SIT: 0.3333
Recall for class STAND: 0.3333
Recall for class WALK: 1.0000
Recall for class W_D: 0.8333
Recall for class W_U: 1.0000
Accuracy Score: 0.6666666666666666
Recall Scores for Decision Tree (max_depth=7):
Recall for class LAY: 0.5000
Recall for class SIT: 0.5000
Recall for class STAND: 0.5000
Recall for class WALK: 1.0000
Recall for class W_D: 0.5000
Recall for class W_U: 1.0000
Accuracy Score: 0.6944444444444444
Recall Scores for Decision Tree (max_depth=8):
Recall for class LAY: 0.3333
Recall for class SIT: 0.5000
Recall for class STAND: 0.5000
Recall for class WALK: 1.0000
Recall for class W_D: 0.8333
Recall for class W_U: 1.0000
Does the accuracy change when the depth is increased? Plot the accuracies and reason why such a result has been obtained.
Test accuracy generally increases with depth, from 0.44 at depth 2 to 0.69 at depth 8, but the trend is not monotonic: it dips to 0.58 at depths 5 and 6. As the depth of a decision tree increases, the tree becomes more complex and can capture more intricate patterns in the training data, so it fits the training set more closely. However, deeper trees are more prone to overfitting, which can lead to poor generalisation on the test set: beyond a certain depth the model starts memorising the training data, and accuracy on new, unseen data (the test set) can decrease. The plot of accuracy against depth shows this. With only 36 test samples, each prediction changes the accuracy by about 2.8 percentage points, so small fluctuations between depths should be interpreted with caution.
plt.plot(range(2, 9), accuracy_values)
plt.xlabel('Max Depth')
plt.ylabel('Accuracy')
plt.title('Accuracy vs Max Depth')
plt.savefig("./Figures/Q4.Accuracy-Vs-MaxDepth.pdf", bbox_inches='tight')
plt.show()
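The overfitting argument is easier to check when training and test accuracy are recorded side by side for each depth. The following is a minimal sketch on synthetic data from `make_classification`, not the accelerometer dataset; the sample sizes, noise level and `random_state` values are assumptions made only so that the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise (flip_y); not the real dataset.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    n_classes=3, flip_y=0.1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

depths = list(range(2, 9))
train_acc, test_acc = [], []
for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    # score() returns mean accuracy on the given data.
    train_acc.append(tree.score(X_tr, y_tr))
    test_acc.append(tree.score(X_te, y_te))

for depth, tr, te in zip(depths, train_acc, test_acc):
    print(f"max_depth={depth}: train={tr:.3f}, test={te:.3f}")
```

A widening gap between the two columns as depth grows is the usual sign of overfitting. Applying the same loop to the real training and test sets would show whether that is what happens here.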
Implement Principal Component Analysis (PCA) on the acceleration data and plot the results
Use TSFEL (a featurizer library) to create features (your choice which ones you feel are useful) and then perform PCA to obtain two features. Plot a scatter plot to visualize the different classes of activities. Are you able to see any difference?
The extracted features are normalised before PCA is applied so that all features contribute equally. The scatter plot shows clear separation between most classes, which indicates that the chosen features capture distinctive patterns for each activity.
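The effect of normalising before PCA can be shown with a small sketch. It uses scikit-learn's `StandardScaler` and `PCA` on a randomly generated feature matrix whose columns have very different scales. The matrix is a hypothetical stand-in for the TSFEL features, and the notebook's own normalisation step may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 60 windows x 10 features, with deliberately
# different scales per column to show why normalisation matters.
scales = np.logspace(-2, 3, num=10)
features = rng.normal(size=(60, 10)) * scales

# Without scaling, the largest-scale column dominates the first component.
raw_ratio = PCA(n_components=2).fit(features).explained_variance_ratio_

# Standardise each feature to zero mean and unit variance, then project to 2D.
scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

print("Explained variance ratio without scaling:", raw_ratio)
print("Explained variance ratio with scaling:   ", pca.explained_variance_ratio_)
print("Projected shape:", components.shape)
```

The two columns of `components` are what would be passed to a scatter plot, coloured by activity label.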
Q6. Use the features obtained from TSFEL and train a Decision Tree. Report the accuracy and confusion matrix using test set.
import numpy as np
from sklearn.model_selection import train_test_split

time = 10
offset = 100
folders = ["LAYING", "SITTING", "STANDING", "WALKING", "WALKING_DOWNSTAIRS", "WALKING_UPSTAIRS"]
classes = {"WALKING": 1, "WALKING_UPSTAIRS": 2, "WALKING_DOWNSTAIRS": 3, "SITTING": 4, "STANDING": 5, "LAYING": 6}
combined_dir = os.path.join("Combined")

# Pool the existing TSFEL train and test sets, then draw a stratified 60/40 split
X = np.concatenate((X_train_tsfel, X_test_tsfel))
y = np.concatenate((y_train_tsfel, y_test_tsfel))
X_train_tsfel, X_test_tsfel, y_train_tsfel, y_test_tsfel = train_test_split(X, y, test_size=0.4, random_state=4, stratify=y)
print("Training data shape: ", X_train_tsfel.shape)
print("Testing data shape: ", X_test_tsfel.shape)
print("Training data shape: ", y_train_tsfel.shape)
print("Testing data shape: ", y_test_tsfel.shape)
Training data shape: (108, 384)
Testing data shape: (72, 384)
Training data shape: (108,)
Testing data shape: (72,)
Train Decision Tree with varying depths (2-8) and compare the accuracies obtained in Q4 with the accuracies obtained using the featurised training set. Plot the accuracies obtained in Q4 against the accuracies obtained in this question.
# Display names are matched positionally to labels 1..6. If the labels follow the
# `classes` mapping defined earlier (1=WALKING, ..., 6=LAYING), this order does not
# line up with the activities, so check it before reading the printed recalls.
label_names = ["LAY", "SIT", "STAND", "WALK", "W_D", "W_U"]
accuracy_values_tsfel = []
recall_scores = []
for i in range(2, 9):
    dt = DecisionTreeClassifier(max_depth=i)
    dt.fit(X_train_tsfel, y_train_tsfel)

    # Plot individual decision tree
    plt.figure(figsize=(45, 45))
    plot_tree(dt, filled=True, fontsize=10)
    plt.title(f'Decision Tree with max_depth={i}')
    plt.savefig(f"./Figures/Q6.DecisionTree-Tsfel-{i}.pdf", bbox_inches='tight')
    plt.show()

    # Evaluate the model
    yPred = dt.predict(X_test_tsfel)
    acc = accuracy_score(y_test_tsfel, yPred)
    print("Accuracy Score:", acc)
    accuracy_values_tsfel.append(acc)
    label_names1 = [1, 2, 3, 4, 5, 6]
    recall_per_class = recall_score(y_test_tsfel, yPred, labels=label_names1, average=None)

    # Print recall for each class
    print(f'Recall Scores for Decision Tree (max_depth={i}):')
    for j, recall in enumerate(recall_per_class):
        print(f'Recall for class {label_names[j]}: {recall:.4f}')
    recall_scores.append(recall_score(y_test_tsfel, yPred, average='weighted'))

    # Plot confusion matrix
    cm = confusion_matrix(y_test_tsfel, yPred, labels=[1, 2, 3, 4, 5, 6])
    plt.figure(figsize=(6, 6))
    ax = plt.subplot()
    sns.heatmap(cm, annot=True, fmt='g', ax=ax)
    ax.set_xlabel('Predicted labels')
    ax.set_ylabel('True labels')
    ax.set_title(f'Confusion Matrix for Decision Tree (max_depth={i})')
    ax.tick_params(axis='x', labelsize=10)
    ax.tick_params(axis='y', labelsize=10)
    ax.xaxis.set_ticklabels(label_names)
    ax.yaxis.set_ticklabels(label_names)
    plt.savefig(f"./Figures/Q6.ConfusionMatrix-Tsfel-{i}.pdf", bbox_inches='tight')
    plt.show()
Accuracy Score: 0.5277777777777778
Recall Scores for Decision Tree (max_depth=2):
Recall for class LAY: 0.7500
Recall for class SIT: 0.8333
Recall for class STAND: 0.0000
Recall for class WALK: 0.0000
Recall for class W_D: 0.7500
Recall for class W_U: 0.8333
Accuracy Score: 0.6944444444444444
Recall Scores for Decision Tree (max_depth=3):
Recall for class LAY: 0.7500
Recall for class SIT: 0.8333
Recall for class STAND: 0.7500
Recall for class WALK: 0.2500
Recall for class W_D: 0.9167
Recall for class W_U: 0.6667
Accuracy Score: 0.6666666666666666
Recall Scores for Decision Tree (max_depth=4):
Recall for class LAY: 0.7500
Recall for class SIT: 0.6667
Recall for class STAND: 0.7500
Recall for class WALK: 0.5833
Recall for class W_D: 0.5833
Recall for class W_U: 0.6667
Accuracy Score: 0.625
Recall Scores for Decision Tree (max_depth=5):
Recall for class LAY: 0.7500
Recall for class SIT: 0.7500
Recall for class STAND: 0.8333
Recall for class WALK: 0.3333
Recall for class W_D: 0.7500
Recall for class W_U: 0.3333
Accuracy Score: 0.7361111111111112
Recall Scores for Decision Tree (max_depth=6):
Recall for class LAY: 0.7500
Recall for class SIT: 0.7500
Recall for class STAND: 0.7500
Recall for class WALK: 0.5000
Recall for class W_D: 1.0000
Recall for class W_U: 0.6667
Accuracy Score: 0.625
Recall Scores for Decision Tree (max_depth=7):
Recall for class LAY: 0.7500
Recall for class SIT: 0.7500
Recall for class STAND: 0.7500
Recall for class WALK: 0.5000
Recall for class W_D: 0.5833
Recall for class W_U: 0.4167
Accuracy Score: 0.7083333333333334
Recall Scores for Decision Tree (max_depth=8):
Recall for class LAY: 0.7500
Recall for class SIT: 0.7500
Recall for class STAND: 0.7500
Recall for class WALK: 0.7500
Recall for class W_D: 0.7500
Recall for class W_U: 0.5000
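The comparison plot requested in this question can be sketched from the accuracies printed above. The values below are copied from those outputs and rounded to four decimals; the variable names are illustrative. The two curves come from different test splits (36 samples for the raw data, 72 for the TSFEL features), so the comparison is indicative only.

```python
import matplotlib.pyplot as plt

depths = list(range(2, 9))
# Test accuracies copied from the printed outputs, rounded to four decimals.
raw_accuracy = [0.4444, 0.6111, 0.6667, 0.5833, 0.5833, 0.6667, 0.6944]
tsfel_accuracy = [0.5278, 0.6944, 0.6667, 0.6250, 0.7361, 0.6250, 0.7083]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(depths, raw_accuracy, marker="o", label="Raw data (Q4)")
ax.plot(depths, tsfel_accuracy, marker="s", label="TSFEL features (Q6)")
ax.set_xlabel("Max Depth")
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy vs Max Depth: raw data and TSFEL features")
ax.legend()
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; elsewhere, call `plt.show()` or `fig.savefig(...)`. In the notebook itself, the lists `accuracy_values` and `accuracy_values_tsfel` could be passed directly instead of the copied numbers.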
Are there any participants/activities where the model performance is bad? If yes, why?
The confusion matrix for the model trained on the featurised data shows that the model struggles to classify laying and sitting. The data for these two activities is very similar, so the model cannot reliably tell them apart. This is also evident in the scatter plot of the featurised data, where the points for these two activities lie very close together, and the recall for these two activities is correspondingly low.
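Per-class recall can be read directly from a confusion matrix: it is the diagonal entry divided by the row sum, with rows as true labels in scikit-learn's convention. The sketch below uses an invented 3-class matrix to show how two mutually confused classes end up with low recall; the numbers are hypothetical and are not the notebook's results.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true labels, columns: predicted).
# The numbers are invented to show two classes being confused with each other.
class_names = ["LAYING", "SITTING", "WALKING"]
cm = np.array([
    [6, 6, 0],    # LAYING: half of the samples predicted as SITTING
    [5, 7, 0],    # SITTING: often predicted as LAYING
    [0, 1, 11],   # WALKING: rarely confused
])

# Recall per class = correct predictions / number of true samples in that class.
recall = np.diag(cm) / cm.sum(axis=1)

for name, value in zip(class_names, recall):
    print(f"Recall for class {name}: {value:.4f}")
```

The off-diagonal entries in a row show which classes absorb the misclassified samples, which is how a confused pair such as laying and sitting is identified.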
Deployment
We collected data from four people for all six activities and use the data of one person as test data; see “Mini-Project/Deployment/Test”.
import seaborn as sns
from sklearn.tree import plot_tree
from MakeDataset_deployment import *
from matplotlib import pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
(180, 500, 3)
(180,)
(30, 500, 3)
(30,)
Training data shape: (180, 500, 3)
Testing data shape: (30, 500, 3)
yPred_dep = dt_raw_dep.predict(x_test_dep.reshape(30, 1500))
print("Accuracy Score:", accuracy_score(y_test_dep, yPred_dep))
cm = confusion_matrix(y_test_dep, yPred_dep, labels=[1, 2, 3, 4, 5, 6])

# Plot confusion matrix
plt.figure(figsize=(6, 6))
ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt='g', ax=ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix for Decision Tree')
plt.savefig("./Figures/Q7.ConfusionMatrix-Deployment-rawdata.pdf", bbox_inches='tight')
plt.show()
Accuracy Score: 0.16666666666666666
Train a decision tree model on the training data with TSFEL features
from MakeDataset_deployment_tsfel import *

dt_tsfel_dep = DecisionTreeClassifier(max_depth=6)
dt_tsfel_dep.fit(X_train_tsfel, y_train_tsfel)
plt.figure(figsize=(50, 50))
plot_tree(dt_tsfel_dep, filled=True, fontsize=14)
plt.savefig("./Figures/Q7.DecisionTree-Deployment-Tsfel.pdf", bbox_inches='tight')
plt.show()
(1, 384)
(30, 384)
(30,)
[6 6 6 6 6 4 4 4 4 4 5 5 5 5 5 1 1 1 1 1 3 3 3 3 3 2 2 2 2 2]
Training data shape: (126, 384)
Testing data shape: (84, 384)
Training data shape: (126,)
Testing data shape: (84,)
Predict on test data with TSFEL features
yPred_tsfel = dt_tsfel_dep.predict(X_test_tsfel)
print("Accuracy Score:", accuracy_score(y_test_tsfel, yPred_tsfel))
accuracy_values_tsfel.append(accuracy_score(y_test_tsfel, yPred_tsfel))
cm = confusion_matrix(y_test_tsfel, yPred_tsfel, labels=[1, 2, 3, 4, 5, 6])

# Plot confusion matrix
plt.figure(figsize=(6, 6))
ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt='g', ax=ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix for Decision Tree on TSFEL')
plt.savefig("./Figures/Q7.ConfusionMatrix-Deployment-Tsfel.pdf", bbox_inches='tight')
plt.show()
Accuracy Score: 0.6904761904761905
Explain why the model succeeded or failed?
The raw data, collected at 200 Hz, was preprocessed and downsampled to 50 Hz before being given to the trained decision tree as a test case. The model trained on raw data could not classify it accurately: its accuracy of 0.17 is what random guessing would achieve with six balanced classes. The likely cause is the variation between the data used for training and the data collected for testing, which remains even after preprocessing. The model trained on TSFEL features classified its test data far better (accuracy 0.69) despite the same variation. Note, however, that the printed shapes show the TSFEL model was evaluated on an 84-sample test split rather than on the 30 held-out recordings, so the two accuracies are not strictly comparable.
Thus, TSFEL appears to find features that persist despite the variation between the training data and the preprocessed test data collected from different devices.
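The downsampling step from 200 Hz to 50 Hz can be sketched as follows. The notebook does not show how it was actually done, so this is only one simple option, assumed for illustration: average every four consecutive samples, which also smooths the signal slightly before the rate is reduced. The function name and the synthetic recording are hypothetical.

```python
import numpy as np


def downsample_by_averaging(signal, factor):
    """Downsample an (n_samples, n_channels) array by averaging blocks.

    Averaging `factor` consecutive samples acts as a crude low-pass filter
    before the rate reduction. Trailing samples that do not fill a complete
    block are dropped.
    """
    n_blocks = signal.shape[0] // factor
    trimmed = signal[: n_blocks * factor]
    return trimmed.reshape(n_blocks, factor, signal.shape[1]).mean(axis=1)


# Synthetic 10 s recording at 200 Hz with three axes (accx, accy, accz).
rng = np.random.default_rng(0)
recording_200hz = rng.normal(size=(2000, 3))

# 200 Hz / 4 = 50 Hz, giving 500 samples for a 10 s window.
recording_50hz = downsample_by_averaging(recording_200hz, factor=4)
print(recording_50hz.shape)  # (500, 3)
```

The resulting 500 x 3 window has the same shape as the samples in the training set, which is what the `reshape(30, 1500)` call before prediction relies on.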