BUS5PA Predictive Analytics 2018 BUS5PA Assignment 2

BUS5PA Predictive Analytics
Semester 2, 2018
Assignment 2: Building and Evaluating Predictive Models using SAS Enterprise Miner

Release Date: 6th September 2018
Due Date: 7th October 2018 11.55pm
Weight: 40%
Format of Submission: A report (electronic form) + electronic submission of project in LMS site.

Objective:
a) Demonstrate knowledge of building different types of predictive models using SAS Enterprise Miner
b) Demonstrate skill and knowledge in applying predictive models in a real-life predictive analytics task
c) Relate theoretical knowledge of predictive models and best practices to application scenarios

Business Case
Predictive Model for Target Marketing

A department store offers a new clothing line and the management wants to determine which customers are likely to purchase these products, for target marketing. The management intend to use data from their customer loyalty program to identify the customers who bought from the clothing line. The Customer_Purchase data set contains 12 variables and more than 15,000 observations that can be used for training a predictive model. The management intend to predict whether a new customer is likely to buy from the clothing line.

The variables in the data set are shown below with the appropriate roles and levels. Although two target variables are listed, we will concentrate on the binary variable TargetBuy.
Name           Model Role   Measurement Level   Description
ID             ID           Nominal             Customer loyalty identification number
affl           Input        Interval            Affluence grade on a scale from 1 to 30
age            Input        Interval            Age, in years
clusterGroup   Input        Nominal             Neighborhood group
gender         Input        Nominal             M = male, F = female, U = unknown
region         Input        Nominal             Geographic region
tvReg          Input        Nominal             Television region
loyaltyStatus  Input        Nominal             Loyalty status: tin, silver, gold, or platinum
totSpent       Input        Interval            Total amount spent
totTime        Input        Interval            Time as loyalty card member
TargetBuy      Target       Binary              Purchased from new clothing line? 1 = Yes, 0 = No
TargetAmt      Rejected     Interval            Number of items bought from the clothing line

1. Setting up the project and exploratory analysis (10%)
a. Create a new project and use the Customer_Purchase data set as a data source. Use the data source in a diagram.
b. As noted above, only TargetBuy is used for this analysis, and it should have a role of Target. Can TargetAmt be used as an input for a model that is used to predict TargetBuy? Why or why not? Justify your answer.
c. Carry out a data exploration by using a StatExplore node. Explain your findings.
d. Create a Data Partition with 50% of the data for training and 50% for validation.

2. Decision tree based modeling and analysis (25%)
a. Create two Decision Tree models, using a different model assessment statistic to build each one. Explain each decision tree by answering: How many leaves are in the optimal tree? Which variable was used for the first split? What were the competing splits for this first split?
b. Create another Decision Tree model that allows three-way splits (up to 3 branches). You can select the most suitable model assessment statistic to build the tree. Explain the decision tree by stating how many leaves are in the optimal tree.
c. Which of the decision tree models appears to be better? Explain.

3. Regression based modeling and analysis (25%)
a. In preparation for regression, is any missing value imputation needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?
b. Use an Impute node connected to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs.
c. Create a Regression model. You can choose stepwise selection and use validation error as the selection criterion.
d. Run the Regression node and view the results. Which variables are included in the final model? Explain (very briefly) what this means to the department store management. Which variables are important in this model? Explain. What is the validation ASE? What does this mean?

4. Model Comparison and Scoring (25%)
a. Compare and contrast the results from the decision tree and regression based analysis. Describe and justify how you ascertained the better model.
b. Would it have been sufficient to use only one modeling technique (decision tree or regression)? Provide justifications for your answer. (Hint: You may use screenshots from your Enterprise Miner project in the report. The answer for this part should not exceed 2 pages.)
c. What are the advantages of using a decision tree model? What advantages would a regression model provide? Students are expected to use the lecture discussions on features of decision trees and regression and apply this understanding to the data analysis. Give examples from the models you have built.
d. Use the Customer_Purchase_Score data set to score the best model. Explain the output using plots.

5. Extending current knowledge with additional reading (15%)
Refer to Data Mining Techniques, Linoff and Berry, Chapter 5.
Pages 156 and 157 of this chapter discuss model stability and identify four key issues:
a) Just getting things wrong
b) Overfitting
c) Sample bias
d) Future not being like the past

As the analytics team leader, how do you explain to your team what this means in the context of your current assignment (Customer_Purchase data analysis)? Discuss how each of these issues could impact the above business case. (Your answers to this question MUST relate to the business case. NO marks will be given for general definitions of these terms.)
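
Note on Questions 3b and 3d: the imputation rule (U for unknown class values, the overall mean for unknown interval values, plus an indicator for each imputed input) and the validation ASE can be illustrated outside Enterprise Miner. The Python sketch below is only an illustration under stated assumptions. It is not how Enterprise Miner implements these steps. The function names, the M_ prefix for indicator columns, and the toy rows are hypothetical. It also assumes that ASE for a binary target is the mean squared difference between the 0/1 target and the predicted probability.

```python
from statistics import mean


def impute_with_indicators(rows, class_vars, interval_vars):
    """Fill missing values (None) and add an M_<var> indicator per input.

    Class variables get "U"; interval variables get the overall mean of
    the non-missing values. In practice the mean would be taken from the
    training partition only (assumption for this sketch).
    """
    means = {
        v: mean(r[v] for r in rows if r[v] is not None)
        for v in interval_vars
    }
    imputed = []
    for r in rows:
        new = dict(r)
        for v in class_vars:
            new["M_" + v] = int(r[v] is None)  # 1 if the value was imputed
            if r[v] is None:
                new[v] = "U"
        for v in interval_vars:
            new["M_" + v] = int(r[v] is None)
            if r[v] is None:
                new[v] = means[v]
        imputed.append(new)
    return imputed


def average_squared_error(targets, probabilities):
    """Mean of (target - predicted probability)^2 over all cases."""
    return mean((y - p) ** 2 for y, p in zip(targets, probabilities))


# Hypothetical toy rows, not taken from Customer_Purchase.
toy_rows = [
    {"gender": "F", "age": 40},
    {"gender": None, "age": 60},
    {"gender": "M", "age": None},
]
filled = impute_with_indicators(toy_rows, ["gender"], ["age"])

# A model that predicts 0.5 for every case has an ASE of 0.25.
ase = average_squared_error([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

Under this assumed definition, a lower ASE on the validation partition means the predicted probabilities sit closer to the observed 0/1 outcomes on data the model was not fitted to.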