Non-residential building occupancy modeling. Part III. Defining occupancy patterns.
Recall that in Part II we obtained a building occupancy dataset. In this part I will apply the k-means clustering algorithm to identify typical occupancy patterns at daily resolution. To get a better picture of the input data, see the table below:
| Month | Day | Weekday | Minutes | Temperature | Relative humidity | Air Velocity | Occupancy out of 24 |
|---|---|---|---|---|---|---|---|
| 7 | 31 | 3 | 779 | 23 | 89 | 1.3411 | 0 |
| 8 | 1 | 4 | 823 | 24 | 88 | 3.578 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... |
Thus, the first task is to estimate an appropriate k, the number of centroids (clusters). If you do not remember how the k-means algorithm works, go to this site. To estimate k, I plotted k against the cost function (labelled 'Score' in the plot):
```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# Input data without a header row
df = pd.read_csv('NewClusterInput.csv', header=None)

# Fit one model per candidate number of clusters
Nc = range(1, 20)
kmeans1 = [KMeans(n_clusters=i) for i in Nc]
# score() returns the negative of the k-means cost on the data
score = [km.fit(df).score(df) for km in kmeans1]

plt.plot(Nc, score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.savefig('Elbow Curve.png')
plt.show()
```

This gives the elbow curve. Note that the score reported by scikit-learn is the negative of the k-means cost, so it is the absolute value of the plotted quantity that matters. You might notice that for k > 3 the absolute value of the cost function decreases sharply. I decided to take k=4 and ran k-means clustering with 4 centroids to obtain the distinct occupancy profiles.
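As a reminder of what k-means and its cost function actually compute, here is a minimal pure-Python sketch of the algorithm (assignment step, update step, and the sum of squared distances to the nearest centroid). The helper names `kmeans_once` and `kmeans`, the toy 2-D data, and the choice of 10 random restarts are illustrative assumptions and not part of the analysis above; scikit-learn's `KMeans` does the same job far more efficiently, and its `score()` is the negative of the cost computed here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random start."""
    centroids = rng.sample(points, k)  # k data points picked at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Cost: sum of squared distances from each point to its centroid.
    cost = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, cost


def kmeans(points, k, n_init=10, seed=0):
    """Best of n_init random restarts, judged by the cost."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Toy data (made up for illustration): four noisy groups of 2-D points.
data_rng = random.Random(42)
group_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points = [
    (cx + data_rng.uniform(-1, 1), cy + data_rng.uniform(-1, 1))
    for cx, cy in group_centres
    for _ in range(25)
]

# Cost for each candidate k; this is the quantity behind an elbow curve.
costs = {k: kmeans(points, k)[2] for k in range(1, 8)}
for k, cost in costs.items():
    print(k, round(cost, 2))

# Final clustering with the chosen number of clusters.
centroids, labels, cost = kmeans(points, 4)
```

On the real data, each row of the input file would take the place of a toy point, and each centroid would correspond to one typical occupancy profile.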

