Sunday, May 31, 2009

For my roommate (cont'd)


Upon reviewing the methodology of the analysis, we found two major problems:
1) The data did not include the dates with 0 entries.
2) Fitting a trendline over individual dates is not right, since there could be trends within each of the 2 periods (i.e. a positive/negative trend within the month of April, and a positive/negative trend within the month of May) which skew the result.

New methodology: Calculate the average number of Twitter posts per day for each of the past few months, counting the days with 0 entries. This way, trends within each month will not affect the result. The aggregation period is chosen to be one month for convenience, especially since I've been here for one month. If aggregation is done over a period shorter than one month, we will need to keep in mind the potential negative/positive trends within each month.
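Here is a rough sketch of how the monthly averaging could be done so that days with 0 entries still count. It is not the exact calculation used for the numbers in this post; the function name `monthly_averages` and the sample dates are made up for illustration, and it assumes the raw data is simply a list with one date per post.

```python
from collections import Counter
from datetime import date, timedelta

def monthly_averages(post_dates, start, end):
    """Average posts per day for each month between start and end (inclusive).

    Every calendar day in the range is counted, so days with no posts
    pull the average down instead of being silently skipped.
    """
    posts_per_day = Counter(post_dates)
    totals = Counter()  # total posts per (year, month)
    days = Counter()    # calendar days seen per (year, month)
    day = start
    while day <= end:
        key = (day.year, day.month)
        totals[key] += posts_per_day[day]  # Counter gives 0 for days without posts
        days[key] += 1
        day += timedelta(days=1)
    return {key: totals[key] / days[key] for key in days}

# Hypothetical sample: 3 posts on April 1, 1 post on April 3, none on April 2 and 4.
sample = [date(2009, 4, 1)] * 3 + [date(2009, 4, 3)]
print(monthly_averages(sample, date(2009, 4, 1), date(2009, 4, 4)))
# prints {(2009, 4): 1.0}
```

Averaging only over the days that appear in the data would give 2.0 for this sample instead of 1.0, which is exactly the first problem listed above.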

Jan - 4.258065
Feb - 4.214286
March - 5.354839
April - 3.733333
May - 3.310345

Conclusion (pending Raj reproducing the same result): The May average is lower than each of the Jan-April averages. My intuition from everyday observation was right: there is a negative correlation between my being here and Raj's daily Twitter posts.
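For anyone who wants to double-check the comparison, here is a small sketch using the monthly figures listed above. The variable names are my own, and the baseline is a plain unweighted mean of the four monthly averages, which is a simplification (it is not quite the same as the daily average over Jan-April, since the months have different lengths).

```python
# Monthly averages copied from the list above.
monthly_avg = {
    "Jan": 4.258065,
    "Feb": 4.214286,
    "March": 5.354839,
    "April": 3.733333,
    "May": 3.310345,
}

# Baseline: every month before May.
baseline = [avg for month, avg in monthly_avg.items() if month != "May"]
baseline_mean = sum(baseline) / len(baseline)

# Is May lower than every one of the earlier months?
may_is_lowest = all(monthly_avg["May"] < avg for avg in baseline)

print(round(baseline_mean, 2), may_is_lowest)
# prints 4.39 True
```

This only confirms that May is the lowest of the five months. With a single month of "me being here", it says nothing about whether the drop is larger than the usual month-to-month variation.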

