WiFi Fingerprinting

The goal of this analysis is to identify which location variables are most useful for tracking customer location from their access to public WiFi networks. To accomplish this goal, this report investigates the relationships among the following variables to determine which offer the most pertinent information for predicting customer locations and developing a navigation system: Longitude, Latitude, Floor, Building ID (specific building), Space ID (specific space), Relative Position (location within a space, such as internal or external), User ID, Phone ID, and Timestamp. Based on how these variables relate to the received signal strength of the network, this report offers specific recommendations on which key location points will be vital in developing a comprehensive navigation system. The dataset contained 527 attributes and over 2 million data points.
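The report does not state which model was used for locationing. Purely as an illustration, here is a sketch of one common fingerprinting approach: k-nearest neighbors over received-signal-strength vectors, predicting a (building, floor) pair from a new scan. The record layout (`scan`, `building`, `floor`), the access point IDs, and the -110 dBm placeholder for undetected access points are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical placeholder RSSI (dBm) for an access point that was not detected in a scan.
NO_SIGNAL = -110.0


def fingerprint_vector(scan, access_points):
    """Map a {access_point_id: rssi_dBm} scan onto a fixed-order vector."""
    return [scan.get(ap, NO_SIGNAL) for ap in access_points]


def predict_location(reference, scan, access_points, k=3):
    """Return the majority (building, floor) among the k reference fingerprints closest to the scan."""
    query = fingerprint_vector(scan, access_points)
    # Rank reference records by Euclidean distance between signal-strength vectors.
    ranked = sorted(
        reference,
        key=lambda rec: math.dist(query, fingerprint_vector(rec["scan"], access_points)),
    )
    votes = Counter((rec["building"], rec["floor"]) for rec in ranked[:k])
    return votes.most_common(1)[0][0]
```

Euclidean distance on raw dBm values is the simplest choice here; scaling the signals or using a different distance metric may work better on real data.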

Residential Energy Usage - Time Series Analysis

This was a time series analysis of four years of residential energy usage data, aimed at predicting energy usage from sub-metering. The dataset contained approximately 2 million data points across nine attributes, with a focus on three sub-meters covering different areas of the house and their energy usage. The data was collected at one-minute intervals over a four-year period. The goal of the analysis was to create a predictive model of sub-metering to determine whether the client (a residential builder) would find sub-metering effective, and to provide recommendations for further analysis and sub-metering usage.
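At one-minute resolution, four years of readings amount to roughly 4 × 365 × 1,440 ≈ 2.1 million rows, which is why minute data is usually aggregated before modeling. Below is a minimal sketch, assuming each reading is a `(timestamp, watt_hours)` pair for a single sub-meter: it sums readings into daily totals and produces a one-step-ahead forecast with simple exponential smoothing. The smoothing model, the `alpha` value, and the start date are stand-ins chosen for illustration, not necessarily what this analysis used; the same functions would be applied to each of the three sub-meters separately.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals(readings):
    """Sum (timestamp, watt_hours) minute readings into per-day totals, ordered by date."""
    totals = defaultdict(float)
    for ts, wh in readings:
        totals[ts.date()] += wh
    return [totals[day] for day in sorted(totals)]


def ses_forecast(series, alpha=0.3):
    """One-step-ahead forecast from simple exponential smoothing (alpha in (0, 1])."""
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


if __name__ == "__main__":
    start = datetime(2007, 1, 1)  # hypothetical start date
    # Three days of synthetic readings: 1, 2, then 3 Wh per minute.
    readings = [(start + timedelta(minutes=m), 1.0 + m // 1440) for m in range(3 * 1440)]
    days = daily_totals(readings)
    print(days)  # [1440.0, 2880.0, 4320.0]
    print(ses_forecast(days))
```

A seasonal model would likely suit household energy data better; exponential smoothing is shown only because it is short and easy to verify.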

Customer Buying Behavior

The data used for this analysis consisted of two survey datasets: one of fully completed responses and one of incomplete responses. The completed dataset contained 10,000 responses covering the following categories: salary, age, education level, model of primary car, zip code, credit limit, and brand of computer. The incomplete survey contained only 5,000 responses with the same categories, but its computer brand attribute held no usable customer information. The analysis therefore consisted of developing and training a model on the completed survey data to predict customer computer brand preference in the incomplete survey data.

The first step for both datasets was inspecting the data for missing values and transforming specific attributes into types that can be analyzed. This included converting the car model, zip code, and computer brand attributes to factors and converting education level to an ordinal variable. I then created training and testing datasets to determine the best predictive model using two algorithms, K-Nearest Neighbors (KNN) and Random Forest. To build the model, the completed survey data was split into 75% training data and 25% testing data. Each algorithm was applied to the completed survey data and to the training and testing sets in order to build a model that more accurately predicts customer computer brand preference.
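The split-and-classify workflow can be sketched in plain Python. This is a hypothetical illustration, not the code used in the analysis: the education labels and their ranking, the field names (`salary`, `age`, `credit`, `education`, `brand`), and the choice of `k` are assumptions. Car model and zip code are left out because KNN needs an extra encoding step for unordered categories, and Random Forest is not shown.

```python
import math
import random
from collections import Counter

# Hypothetical education labels, ranked so that order carries meaning (ordinal encoding).
EDUCATION_RANK = {"high school": 0, "some college": 1, "bachelor": 2, "graduate": 3}


def encode(row):
    """Convert one survey response into a numeric feature vector."""
    return [row["salary"], row["age"], row["credit"], EDUCATION_RANK[row["education"]]]


def train_test_split(rows, train_frac=0.75, seed=42):
    """Shuffle a copy of the rows and split them into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(vectors):
    """Learn per-feature min/max from training vectors; return a scaling function."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]

    def scale(v):
        # Constant features (hi == lo) map to 0.0 to avoid division by zero.
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(v, lows, highs)]

    return scale


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fit the scaler on the training rows, then score KNN predictions on the test rows."""
    train_vecs = [encode(r) for r in train]
    scale = fit_min_max(train_vecs)
    train_X = [scale(v) for v in train_vecs]
    train_y = [r["brand"] for r in train]
    hits = sum(knn_predict(train_X, train_y, scale(encode(r)), k) == r["brand"] for r in test)
    return hits / len(test)
```

Features are min-max scaled with bounds learned from the training set only, so salary (in the tens of thousands) does not swamp the education rank in the distance calculation.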

GitHub Portfolio