Quantitative Analysis of Credit
This project is based on the attached data, which include a splitting variable. In the spreadsheet under the tab “Data,” you will find data pertaining to 1,000 personal loan accounts. The tab “Data Dictionary” describes what the various variables mean.
Please submit a single, well-formatted PDF or Word file. The instructor should not need to go searching for your answers! In addition, please upload an Excel file with your model outputs. Please answer all the questions. Supply supporting documentation and show calculations as needed.
As part of a new credit application, the company collects information about the applicant. The company then decides how much credit to extend (the variable CREDIT_EXTENDED). For these 1,000 accounts, we also have information on how profitable each account turned out to be (the variable NPV). A negative value indicates a net loss, which typically happens when the debtor defaults on his/her payments.
The goal of this assignment is to investigate how this data can be used to better manage the bank’s credit extension program. First, we want to develop a classification model that classifies a new credit account as “profitable” or “not profitable.” Second, we want to compare its performance in the context of decision support with that of a linear regression model that predicts NPV directly.

Data Preparation
The data preparation repeats the steps from the live session:

The goal is to predict whether or not a new credit will result in a profitable account. Create a new variable to use as the dependent variable.
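A minimal sketch of this step, assuming each account is a dictionary (for example, a row from Python's `csv.DictReader`, which yields strings) with an `NPV` field. The assignment only says that a negative NPV is a loss, so treating NPV = 0 as "not profitable" is an assumption you should state in your write-up; the name `PROFITABLE` is hypothetical.

```python
def add_profitable_flag(rows, npv_key="NPV", flag_key="PROFITABLE"):
    """Add a 0/1 target to each row: 1 if NPV > 0, otherwise 0.

    Assumption: an account with NPV exactly 0 counts as not profitable.
    float() is used because values read from a CSV file are strings.
    """
    for row in rows:
        row[flag_key] = 1 if float(row[npv_key]) > 0 else 0
    return rows


accounts = [{"NPV": "120.5"}, {"NPV": "-310"}, {"NPV": "0"}]
add_profitable_flag(accounts)
print([a["PROFITABLE"] for a in accounts])  # [1, 0, 0]
```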

Create dummy variables for all categorical variables with more than 2 values (or if you prefer, you can sort your variables into numerical and categorical when you run the model).
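The idea behind dummy coding with a base level can be sketched as follows: a categorical variable with k levels becomes k-1 indicator columns, and the omitted "base" level is the case where all of them are 0. The column `PURPOSE` and its levels are hypothetical placeholders; check the Data Dictionary for the real categorical variables.

```python
def make_dummies(rows, column, base=None):
    """One-hot encode `column` in place, leaving out one base level.

    If `base` is None, the first level in sorted order is used as the base.
    Returns the names of the dummy columns that were added.
    """
    levels = sorted({row[column] for row in rows})
    if base is None:
        base = levels[0]
    kept = [lvl for lvl in levels if lvl != base]
    names = [f"{column}_{lvl}" for lvl in kept]
    for row in rows:
        for lvl, name in zip(kept, names):
            row[name] = 1 if row[column] == lvl else 0
    return names


# Hypothetical categorical column, for illustration only.
rows = [{"PURPOSE": "car"}, {"PURPOSE": "education"}, {"PURPOSE": "furniture"}]
print(make_dummies(rows, "PURPOSE"))  # ['PURPOSE_education', 'PURPOSE_furniture']
```

Dropping the base level avoids perfect collinearity with the intercept and also saves columns, which matters under the 50-column limit mentioned below.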

Split the data into 2 parts using the splitting variable that has been added to the data set. This is to ensure a more balanced split between the validation and training samples. Note that Analytic Solver Data Mining only allows 50 columns in the analysis, so leave out your base dummies (if you created them) when partitioning. After the data partition, you should have 666 rows in your training data and 334 in your validation data.
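Partitioning on a precomputed flag, rather than drawing a random sample, can be sketched like this. The column name `SPLIT` and the coding "1 = training" are assumptions made for illustration; the actual name and values of the splitting variable are in the data set.

```python
def partition(rows, split_key="SPLIT", train_value=1):
    """Split rows into (training, validation) using the splitting variable.

    `split_key` and `train_value` are hypothetical placeholders.
    """
    training, validation = [], []
    for row in rows:
        # Compare as strings so that CSV values ("1") match integers (1).
        if str(row[split_key]) == str(train_value):
            training.append(row)
        else:
            validation.append(row)
    return training, validation
```

A quick sanity check after partitioning is `(len(training), len(validation)) == (666, 334)`, matching the sizes stated above.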

The Assignment

Applying Logistic Regression

If one fits a Logistic Regression Model using all the independent variables, one observes a) a gap in the classification performance between the training data and the validation data, and b) very high p-values for some of the variables. The performance gap between the training and validation may be a sign of overfitting, and the high p-values may be a sign of “useless” variables in the model, or of multicollinearity.

Our goal is to classify credit requests as “profitable” or “not profitable.” To that end, choose “forward selection” as the variable selection method and set FIN down to 1.5 (this lowers the threshold for a variable to enter the model, resulting in more models to choose from). Select one of the forward selection models based on the principles discussed in the book and/or the tutorials on the course resource center, and run it.
Note: Exclude CREDIT_EXTENDED and any other variables not appropriate for the analysis.
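The mechanics of forward selection with an entry threshold can be sketched generically. This is not Analytic Solver's implementation: `entry_stat` is a hypothetical function standing in for whatever statistic the tool compares against FIN, and the toy numbers below are invented. The sketch only shows why a lower threshold yields a longer sequence of candidate models.

```python
def forward_selection(candidates, entry_stat, f_in=1.5):
    """Greedy forward selection with an entry threshold.

    entry_stat(selected, var) returns the statistic for adding `var` to the
    model that already contains `selected`. At each step the variable with
    the largest statistic enters, provided it is at least `f_in`.
    Returns one (variables, statistic) entry per step, i.e. the sequence of
    candidate models you can choose from afterwards.
    """
    selected = []
    remaining = list(candidates)
    path = []
    while remaining:
        scores = {var: entry_stat(selected, var) for var in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < f_in:
            break  # no remaining variable clears the entry threshold
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), scores[best]))
    return path


TOY_STATS = {"A": 10.0, "B": 4.0, "C": 1.0}  # invented numbers, illustration only


def toy_stat(selected, var):
    return TOY_STATS[var]


print(len(forward_selection(TOY_STATS, toy_stat, f_in=5.0)))  # 1 model
print(len(forward_selection(TOY_STATS, toy_stat, f_in=0.5)))  # 3 models
```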

Include the model (the variables and the corresponding regression coefficients) as an Exhibit.

Why did you select this particular model?

Based on your model, and setting the cut-off value to 0.5, please provide the following information (based on the validation data):
- The sensitivity of the model
- The specificity of the model
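As a reminder of the definitions, here is a small sketch that computes sensitivity and specificity from predicted probabilities at a given cut-off. It assumes "profitable" (label 1) is the positive class and that an account is classified as profitable when its probability is at least the cut-off; whether the boundary uses >= or > is a convention worth checking in your tool's output.

```python
def confusion_counts(probs, labels, cutoff=0.5):
    """Return (TP, FN, TN, FP), treating label 1 ("profitable") as positive."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity_specificity(probs, labels, cutoff=0.5):
    tp, fn, tn, fp = confusion_counts(probs, labels, cutoff)
    sensitivity = tp / (tp + fn)  # share of profitable accounts classified as profitable
    specificity = tn / (tn + fp)  # share of unprofitable accounts classified as not profitable
    return sensitivity, specificity
```

For example, with probabilities `[0.9, 0.7, 0.4, 0.6, 0.2, 0.1]` and labels `[1, 1, 1, 0, 0, 0]`, both values are 2/3 at a cut-off of 0.5.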

ROC Curves
We now want to compare the predictive performance of the model on the training sample and on the validation sample. Create a single figure that compares the ROC curves for both the training sample and the validation sample. Refer to the ROC tutorials in the resource center as needed for a step-by-step guide to creating an ROC curve. Alternatively, you can combine the two curves that Analytic Solver Data Mining provides into a single plot.
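If you build the curves yourself rather than reusing the tool's charts, the points can be computed by sweeping the cut-off over every distinct predicted probability. This is a minimal sketch assuming label 1 means "profitable"; the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(probs, labels):
    """Return ROC points (false positive rate, true positive rate).

    Each distinct probability is used as a cut-off, from highest to lowest;
    tied scores are added together so that ties do not create false steps.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    pairs = sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Calling `roc_points` once with the training scores and once with the validation scores gives two point lists that can be drawn on the same axes (an Excel scatter chart with lines works).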

Include a clean figure as an Exhibit.

Finding the “best” cut-off
Create a data table that calculates the total NPV as a function of the cut-off, based on the training data (assuming we extend credit to every account classified as “profitable”). Select the best cut-off. Include the table as an Exhibit.
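In Excel this is typically a one-variable Data Table; the same calculation can be sketched in Python. The sketch assumes credit is extended to every account whose predicted probability of being profitable is at least the cut-off, and that rejected accounts contribute 0 to the total. The cut-off grid in steps of 0.05 is an arbitrary choice.

```python
def npv_by_cutoff(probs, npvs, cutoffs):
    """Return (cutoff, number_approved, total_npv) rows, one per cut-off."""
    table = []
    for c in cutoffs:
        approved = [npv for p, npv in zip(probs, npvs) if p >= c]
        table.append((c, len(approved), sum(approved)))
    return table


def best_cutoff(table):
    """Cut-off with the highest total NPV (the first one wins on ties)."""
    return max(table, key=lambda row: row[2])[0]


cutoffs = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
```

Running `npv_by_cutoff` on the validation probabilities and NPVs produces the validation table, and reading off the row for the training-selected cut-off gives the validation profit.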

What is your selected cut-off?

Create the same table for the validation data. Include the table as an Exhibit.

Apply the cut-off you selected based on the training data to the validation data. What is the total profit on the validation data?
Provide a figure that shows the cumulative NPV as a function of the cut-off for both the training data and the validation data.

Comparison with linear regression
Repeat the model development from our first live session (you need to repeat the steps because we now have a new data split). Rerun variable selection to find a “good” model on the updated data.

Include the model (the variables and the corresponding regression coefficients) as an Exhibit.

Create a data table that summarizes the total profit on the training data as a function of the NPV cut-off for extending credit. Note that the cut-off is now in dollars, so you will need to investigate what a good cut-off is (for example, -$50, $50, or something else). Select the best cut-off.
Include the table as an Exhibit.
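One way to avoid guessing the dollar grid is to try every distinct predicted NPV as a cut-off, plus one cut-off above all predictions (approve nobody, total 0). Total profit can only change at those values, so on a given data set this scan contains the best cut-off for that data. The sketch assumes credit is extended when the predicted NPV is at least the cut-off; `predicted` and `actual` are hypothetical lists of predicted and realized NPVs.

```python
def profit_by_dollar_cutoff(predicted, actual, cutoffs=None):
    """Return (cutoff, number_approved, total_actual_npv) rows sorted by cut-off.

    If no cut-offs are given, every distinct predicted value is tried,
    plus infinity (approve nobody).
    """
    if cutoffs is None:
        cutoffs = sorted(set(predicted)) + [float("inf")]
    rows = []
    for c in sorted(cutoffs):
        approved = [a for p, a in zip(predicted, actual) if p >= c]
        rows.append((c, len(approved), sum(approved)))
    return rows


predicted = [120.0, -30.0, 60.0, -80.0]  # invented values, illustration only
actual = [200, -150, 90, -40]
table = profit_by_dollar_cutoff(predicted, actual)
best = max(table, key=lambda row: row[2])
print(best)  # (60.0, 2, 290)
```

Passing an explicit grid such as `[-50, 50]` reproduces the kind of table suggested above; the exhaustive scan is useful for checking that the grid does not miss a better cut-off.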

What is your selected cut-off?

Create the same table for the validation data and include it as an Exhibit.

Apply the cut-off you found to the validation data. What is the total profit on the validation data?

Provide a figure that shows the cumulative NPV as a function of the cut-off for both the training data and the validation data.

Model comparison
Compare the performance of the logistic regression model and the linear regression model. How does the total profit compare for the two models? Which model would you select as the foundation of a decision support system and why?
