Below are projects we have completed either as a team or as a solo project.
Time Series Analysis and Forecasting
Time series analysis is the process of uncovering a pattern in a time series and then using that pattern to forecast future values. The forecast is based solely on past values of the variable of interest and/or on past forecast errors. A time series usually exhibits random behavior, upward or downward trends, or seasonal or cyclical effects.
Time series generally have components such as:
- random behavior
- trends (upward or downward)
- seasonal effects
- cyclical effects
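As a rough illustration of a forecast built only from past values and past forecast errors, here is a minimal sketch of simple exponential smoothing. The technique, the function name `exponential_smoothing`, the smoothing constant `alpha`, and the sample numbers are all assumptions chosen for illustration; the original project may have used a different method.

```python
def exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts for a series.

    forecasts[t] is the forecast for series[t]; the final entry is the
    forecast for the next period, which has not been observed yet.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecasts = [series[0]]  # common convention: seed with the first observation
    for actual in series:
        error = actual - forecasts[-1]               # most recent forecast error
        forecasts.append(forecasts[-1] + alpha * error)
    return forecasts


# Hypothetical data, for illustration only.
demand = [10, 12, 11, 13]
print(exponential_smoothing(demand, alpha=0.5))
```

A larger `alpha` reacts faster to recent changes, while a smaller one smooths out more of the random behavior.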
PROJECTS
Data Mining using Cluster Analysis
For this project we used cluster analysis, an unsupervised learning technique that finds patterns within the data. To measure similarity over a large data set, we clustered observations to build a typology of groups. Our data set had information on patients with heart disease, including the following variables: gender, age of death, age of diagnosis, weight status, cholesterol status, and smoking status. We used SAS Visual Analytics to find similarities among the individuals who had died. Using box plots and parallel coordinates plots, we were able to characterize the clusters. Within SAS Visual Analytics, we were also able to determine our best k, or number of clusters.
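The idea of grouping observations and comparing candidate values of k can be sketched with k-means. This is an assumption: the project used SAS Visual Analytics, and the text does not say which clustering algorithm or selection rule it applied. The points, function names, and the use of within-cluster sum of squares (WCSS) below are illustrative only.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # start from k randomly chosen points
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[i])   # keep an empty cluster's centroid
        if updated == centroids:               # converged
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, wcss


# Hypothetical two-variable observations, for illustration only.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
for k in (1, 2, 3):
    _, _, wcss = k_means(points, k)
    print(k, round(wcss, 2))
```

One common heuristic is to pick the k after which WCSS stops dropping sharply (the "elbow").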
Data Visualization Using Tableau
Tableau is an incredible tool for the visualization, analysis, and manipulation of relational data. Our assignment was to use this tool to explore annual data from Chicago's Crime portal and do our best to pull insight from the many rows of raw data. The first step in our process was to clean the data and get it into a format suitable for Tableau. We did this by loading the data set into Access and running queries to pull just the data we would use for the analysis. Our key findings include when, where, and what types of crime are most prevalent, and we used a variety of data visualizations to make these insights readable and informative.
Multiple Linear Regression Analysis
Regression is the process of using data to formulate relationships among variables. It can also be used in short-term forecasting. A regression model has one response variable (y); the remaining variables are explanatory variables (x). This multiple linear regression analysis example was completed using research data that assessed how age, systolic blood pressure, and smoking relate to the risk of stroke. In this example, we assessed the descriptive statistics and correlation matrix before completing the regression. Being a smoker had the highest correlation with risk of stroke. A multiple linear regression equation was then developed, which can be used to predict risk based on the explanatory variables.
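To make the idea of a fitted regression equation concrete, the sketch below estimates the coefficients by ordinary least squares using the normal equations. The data values, the risk scores, and the function names are hypothetical and are not taken from the research data; the original analysis was presumably done in a statistics package rather than in code like this.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                     # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_linear_regression(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    design = [[1.0] + list(row) for row in rows]     # prepend intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * target for d, target in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Hypothetical rows: (age, systolic blood pressure, smoker flag 0/1).
rows = [(50, 120, 0), (60, 140, 1), (70, 130, 0), (55, 150, 1), (65, 125, 1), (45, 135, 0)]
risk = [12.0, 24.0, 20.0, 25.0, 23.0, 11.0]          # made-up risk scores
coefficients = fit_linear_regression(rows, risk)
print(coefficients)
print(predict(coefficients, (58, 138, 1)))
```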
Data Mining using Partitioning
This technique uses supervised learning to predict or classify the data. We used data on high-credit-risk loan applications with the following variables: the customer’s loan purpose, checking and savings account balances, months employed, gender, marital status, age, housing, and job. We used the XLMiner kNN data mining technique to create a model that predicts and classifies customers as high risk or low risk for the requested loan. The credit union will approve or deny credit based on our results. We characterized customers using the test Lift chart, ROC chart, and Decile-Lift charts. We also determined the best k for the data with the XLMiner kNN technique.
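The k-nearest-neighbors idea, and one simple way to choose k, can be sketched as follows. This is not XLMiner's implementation. The records, the two features, and the validation split are hypothetical, and the features are assumed to be already scaled to comparable ranges.

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, query, k):
    """Majority label among the k training records closest to the query."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    hits = sum(knn_predict(train, features, k) == label for features, label in validation)
    return hits / len(validation)


# Hypothetical scaled features: (months employed, savings balance).
train = [
    ((1, 1), "high risk"), ((1, 2), "high risk"), ((2, 1), "high risk"),
    ((8, 8), "low risk"), ((9, 8), "low risk"), ((8, 9), "low risk"),
]
validation = [((1.5, 1.5), "high risk"), ((8.5, 8.5), "low risk")]

# Try odd values of k (to avoid tied votes) and keep the most accurate one.
best_k = max((1, 3, 5), key=lambda k: accuracy(train, validation, k))
print(best_k, knn_predict(train, (7, 7), best_k))
```

Choosing k on a validation set rather than on the training data helps avoid picking a k that only memorizes the training records.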
Linear Programming and Optimization
Excel is a powerful tool when the correct add-ins are installed. For our assignment, our team used the Solver add-in to optimize profit for the iDesgin business by determining which types of projects, and how many of each, would generate the most revenue at the lowest cost while satisfying a set of constraints.
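The structure of such a problem (decision variables, an objective, and constraints) can be sketched with a small brute-force search over whole-number project counts. This is deliberately not the algorithm Solver uses, and the project types, profits, and capacities are invented for illustration.

```python
from itertools import product

# Hypothetical project types: profit per project and resources consumed.
projects = {
    "logo": {"profit": 350, "design_hours": 10, "review_hours": 2},
    "website": {"profit": 1200, "design_hours": 40, "review_hours": 5},
}
# Hypothetical constraints: resources available in the planning period.
capacity = {"design_hours": 160, "review_hours": 24}


def best_mix(projects, capacity, max_count=20):
    """Enumerate every mix of project counts and keep the most profitable feasible one."""
    names = list(projects)
    best_profit, best_plan = 0, dict.fromkeys(names, 0)
    for counts in product(range(max_count + 1), repeat=len(names)):
        feasible = all(
            sum(n * projects[name][resource] for n, name in zip(counts, names)) <= limit
            for resource, limit in capacity.items()
        )
        if not feasible:
            continue
        profit = sum(n * projects[name]["profit"] for n, name in zip(counts, names))
        if profit > best_profit:
            best_profit, best_plan = profit, dict(zip(names, counts))
    return best_profit, best_plan


print(best_mix(projects, capacity))
```

Enumeration only works for tiny problems; a dedicated solver scales to many variables and constraints.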
Risk Simulation
Monte Carlo risk simulation shows how uncertainty in the inputs influences the outputs. The Analytic Solver Platform in Microsoft Excel can run the simulation within a spreadsheet. It also shows the probability of the outcomes. This example focuses on a hypothetical pharmaceutical company that analyzes risk and its effect on net present value.
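A minimal sketch of the same idea in code: draw the uncertain inputs many times, compute net present value (NPV) for each draw, and summarize the distribution. The input distributions, the discount rate, the investment amount, and the five-year horizon are assumptions made up for this example and do not come from the original spreadsheet model.

```python
import random


def npv(rate, initial_investment, cash_flows):
    """Net present value of cash flows received at the end of years 1, 2, ..."""
    discounted = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))
    return discounted - initial_investment


def simulate_npv(trials=10_000, seed=42):
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Hypothetical uncertain inputs (in millions).
        annual_revenue = rng.triangular(20.0, 60.0, 35.0)   # low, high, most likely
        annual_cost = rng.uniform(10.0, 20.0)
        cash_flows = [annual_revenue - annual_cost] * 5      # five equal years
        results.append(npv(0.10, 60.0, cash_flows))
    return results


results = simulate_npv()
mean_npv = sum(results) / len(results)
prob_loss = sum(r < 0 for r in results) / len(results)
print(f"mean NPV: {mean_npv:.2f}, probability of a negative NPV: {prob_loss:.1%}")
```

Fixing the seed makes a run repeatable; changing it shows how much the summary statistics vary between runs.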
Decision Tree Visualization and Analysis
Decision tree analysis in SAS Visual Analytics builds tree-like statistical models for classification and regression. Using the values of one or more predictor data items to predict a response data item, the SAS Visual Analytics decision tree displays a series of nodes as a tree, where each branch and leaf below the response represents a split in the values of a predictor data item. For our assignment, we built a decision tree to predict whether someone is likely to die from coronary heart disease. Along with our tree model, we also examined the lift, ROC, and misclassification assessments to further assess the accuracy of our model.
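To show what a single split on a predictor looks like, the sketch below picks the feature and threshold that best separate the response. Gini impurity is used here as an assumed splitting criterion; the text does not say which criterion SAS Visual Analytics applied, and the patient records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher for a more mixed node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_names):
    """Return (weighted impurity, feature, threshold) of the best binary split."""
    best = None
    for index, name in enumerate(feature_names):
        for threshold in sorted({row[index] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[index] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[index] > threshold]
            if not left or not right:
                continue                      # a split must send records both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best


# Hypothetical records: (age, smoker flag 0/1) with a made-up outcome.
rows = [(40, 0), (45, 1), (60, 0), (65, 1)]
labels = ["alive", "alive", "died", "died"]
print(best_split(rows, labels, ["age", "smoker"]))
```

A full tree repeats this search inside each branch until a stopping rule is met, such as a minimum node size.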