Machine Learning. Part III

Non-residential building occupancy modeling. Part III. Defining occupancy patterns.

Recall that in Part II we obtained a building occupancy data set. In this part I will apply the k-means clustering algorithm to identify typical occupancy patterns at daily resolution. To give a better picture of the input data, see the table below:

Month	Day	Weekday	Minutes	Temperature	Relative humidity	Air velocity	Occupancy (out of 24)
7	31	3	779	23	89	1.3411	0
8	1	4	823	24	88	3.578	2
...	...	...	...	...	...	...	...

To build this table I transformed the initial data: the Matlab absolute time readings were converted into month, day, weekday (1 = Sunday, 2 = Monday, etc.) and time of day in minutes (with a 15-minute time step); temperature, air velocity and relative humidity are general outdoor data; and the last column is the total building occupancy. The original data contained binary occupancy for each occupant, but I combined it to better describe the daily occupancy of the building as a whole. One remark: since we are not modeling individual occupancy, we cannot use personal inputs such as temperature, CO2, humidity, etc. measured for each person. We can only use general data that are the same for every occupant, such as the outdoor weather readings in the table above.
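The time conversion described above can be sketched in Python as follows. This assumes the Matlab readings are datenum serial day numbers (days counted from year 0, with the fractional part giving the time of day); the names datenum_to_features and total_occupancy are hypothetical and not taken from the original script. MATLAB's datenum and Python's date ordinal differ by a fixed 366 days, and the weekday is remapped to the 1 = Sunday convention used in the table.

```python
from datetime import date

# Python's date ordinal counts 0001-01-01 as day 1; MATLAB's datenum counts
# from year 0, so the same calendar day is 366 larger in MATLAB.
MATLAB_DATENUM_OFFSET = 366
MINUTES_PER_DAY = 24 * 60


def datenum_to_features(datenum):
    """Split a MATLAB datenum into (month, day, weekday, minutes since midnight).

    weekday uses MATLAB's convention: 1 = Sunday, 2 = Monday, ..., 7 = Saturday.
    Assumes the input really is a MATLAB datenum (hypothetical helper).
    """
    # Round to the nearest whole minute first, then split into day and minute.
    total_minutes = round(datenum * MINUTES_PER_DAY)
    day_number, minutes = divmod(total_minutes, MINUTES_PER_DAY)
    d = date.fromordinal(day_number - MATLAB_DATENUM_OFFSET)
    # isoweekday(): Monday = 1 ... Sunday = 7  ->  Sunday = 1 ... Saturday = 7
    weekday = d.isoweekday() % 7 + 1
    return d.month, d.day, weekday, minutes


def total_occupancy(presence_flags):
    """Combine per-occupant binary presence flags (0/1) into one head count."""
    return sum(presence_flags)
```

For instance, datenum_to_features(730486) gives (1, 1, 7, 0): 1 January 2000, a Saturday, at midnight.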
The first task, therefore, is to choose an appropriate k, the number of centroids (clusters). If you do not remember how the k-means algorithm works, go to this site. To estimate k, I plotted the cost function against k (in scikit-learn, KMeans.score returns the negative of the k-means cost, i.e. of the within-cluster sum of squared distances):

 import matplotlib.pyplot as plt
 import pandas as pd
 from sklearn.cluster import KMeans

 # Input table as in the example above (the CSV has no header row)
 df = pd.read_csv('NewClusterInput.csv', header=None)

 # Fit k-means for k = 1..19 and record each model's score
 # (score = negative within-cluster sum of squared distances)
 Nc = range(1, 20)
 score = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(df).score(df) for k in Nc]

 plt.plot(Nc, score)
 plt.xlabel('Number of Clusters')
 plt.ylabel('Score')
 plt.title('Elbow Curve')
 plt.savefig('Elbow Curve.png')
 plt.show()
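Since the link to the k-means explanation is not reproduced here, a minimal NumPy sketch of the standard Lloyd iteration may help. It is not the scikit-learn implementation (which uses k-means++ seeding and several restarts): it starts from k randomly chosen data points and alternates an assignment step and an update step. The returned cost is the within-cluster sum of squared distances, the quantity whose negative KMeans.score reports, so plotting it against k gives an elbow curve of the same kind, mirrored in sign. The function name kmeans and its parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative sketch): returns (labels, centroids, cost).

    cost is the k-means objective, i.e. the sum of squared distances from each
    point to its assigned centroid (scikit-learn's inertia_).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: squared distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and cost for the converged centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    cost = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, cost
```

An elbow curve from this sketch would be costs = [kmeans(X, k)[2] for k in range(1, 20)], plotted against k.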

We will get the following elbow curve:

You might notice that for k > 3 the absolute value of the cost function decreases sharply. I decided to take k = 4 and run k-means with 4 centroids. After running k-means, I obtained the following distinct occupancy profiles:
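The post does not show how the profiles were extracted. One way to obtain daily profiles, sketched below under stated assumptions, is to pivot the table so that each row is one calendar day and each column one time-of-day slot, then cluster whole days with KMeans(n_clusters=4); each cluster centre is then an average occupancy curve over the day. The column names Month, Day, Minutes and Occupancy and the function daily_profiles are hypothetical, and clustering whole days is an assumption rather than the exact procedure used here, since the elbow script clusters the raw rows of the input file.

```python
import pandas as pd
from sklearn.cluster import KMeans


def daily_profiles(df, k=4, seed=0):
    """Cluster whole days of occupancy; return (profiles, day_labels).

    Assumes df has hypothetical columns Month, Day, Minutes and Occupancy.
    """
    # One row per calendar day, one column per time-of-day slot.
    days = df.pivot_table(index=["Month", "Day"], columns="Minutes",
                          values="Occupancy", aggfunc="sum", fill_value=0)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(days.to_numpy())
    # Each centroid is an average occupancy curve over the day.
    profiles = pd.DataFrame(km.cluster_centers_, columns=days.columns)
    labels = pd.Series(km.labels_, index=days.index, name="cluster")
    return profiles, labels
```

Plotting each row of profiles against its Minutes columns would give one curve per typical day.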

Sultan Yerumbayev's blog
