major upload of (python) course material & solutions

2025-12-03 14:39:45 +01:00
parent 52552e20cb
commit e95a0b2ecc
39 changed files with 13598 additions and 0 deletions
--- a/Classification/02_Default_data.ipynb
+++ b/Classification/02_Default_data.ipynb
@@ -0,0 +1,184 @@
+{
+ "cells": [
+  {
+   "cell_type": "raw",
+   "id": "6cbef61b-0897-42bf-b456-c0a409b87c41",
+   "metadata": {},
+   "source": [
+    "\\vspace{-4cm}\n",
+    "\\begin{center}\n",
+    "  \\LARGE{Machine Learning for Economics and Finance}\\\\\n",
+    "  \\Large{Task 1: Logistic Regressions}\\\\[0.5cm]\n",
+    "  \\Large{\\textbf{02\\_Default\\_data}}\\\\[1.0cm]\n",
+    "  \\large{Ole Wilms}\\\\[0.5cm]\n",
+    "  \\large{July 29, 2024}\\\\\n",
+    "\\end{center}"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "id": "13be77f3-44f0-4983-b4cb-bd3e4b5dba8b",
+   "metadata": {},
+   "source": [
+    "\\setcounter{secnumdepth}{0}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "72f918a4-cdd4-4b46-a88f-f4b43c3c3a88",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "## Task 1: Logistic Regressions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b3f9fc6-db4f-47b0-9dfa-e41d9f85a5ba",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "1.1 Set the seed to $1$, then randomly split the data into $7000$ observations for training and $3000$ observations for testing. Call these two datasets *train_data* and *test_data*, respectively. (Hint: reuse the data-splitting code from 01 Auto_data_2.R or Auto_data_2.Rmd.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "335aa198-5a94-4c5a-8ad8-67c78bcf71f5",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "116c466d-0627-43d6-adbe-a937ac846a28",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "1.2 Fit a logistic regression of *default* on *income* using *train_data*. Analyze the significance of\n",
+    "the estimated coefficients."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2e38a201-7f2d-4999-beab-5739217a9318",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43c6dade-5a22-476a-b3bf-bfd1b880038d",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "1.3 Compute the *out-of-sample accuracy* and *error rate* and compare them to the *in-sample statistics*. Do\n",
+    "you think this is a good model for predicting *default*?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "44028726-1eff-436f-bc47-04a6786ae3ad",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c28971ef-8bee-462d-9612-88f1534bfcb5",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "1.4 Add *balance* as a predictor to the model from Task $1.2$ and compute the *out-of-sample error rate* and *accuracy*. Do you\n",
+    "think this is a good model for predicting *default*?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3a7216df-adf5-4df0-9593-69c1a7649f64",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f267ef66-1775-42a8-a1e9-45fda849f4d9",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "1.5 Compare the results for Task $1.4$ to a model with only *balance* as a predictor. Which model\n",
+    "would you choose?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "28082bd5-8fe1-4160-aec0-1a92aebfa671",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7ccad70f-5ef5-42c8-8c2e-22e76943d281",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "1.6 Re-estimate the model from Task $1.4$ using different *seeds* to draw your\n",
+    "*training* and *test data*. Does your *test error rate* change with the seed? What’s going on here?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9ab2f559-83b1-4a66-b1dc-8799b8301d85",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "date": " ",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.1"
+  },
+  "title": " ",
+  "toc-autonumbering": false,
+  "toc-showcode": false,
+  "toc-showmarkdowntxt": false,
+  "toc-showtags": false
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/Classification/02_Default_data.pdf
+++ b/Classification/02_Default_data.pdf
--- a/Classification/02_Default_data_solution.ipynb
+++ b/Classification/02_Default_data_solution.ipynb
--- a/Classification/02_Default_data_solution.pdf
+++ b/Classification/02_Default_data_solution.pdf
--- a/Classification/default_data.parquet
+++ b/Classification/default_data.parquet
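
The seeded split from Task 1.1 and the accuracy and error rate from Task 1.3 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the course's solution: the helper names `split_indices` and `accuracy_and_error` are hypothetical, and the sketch works on row indices and plain label lists rather than on the actual contents of `default_data.parquet`, whose column layout is not shown in this commit.

```python
import random


def split_indices(n_obs, n_train, seed):
    """Return (train_idx, test_idx), a seeded random partition of range(n_obs).

    Hypothetical helper: the first n_train shuffled indices form the
    training set, the remaining ones form the test set.
    """
    rng = random.Random(seed)  # local generator, so the seed only affects this split
    shuffled = list(range(n_obs))
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def accuracy_and_error(y_true, y_pred):
    """Return (accuracy, error rate) for two equally long label sequences."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy, 1.0 - accuracy


# 7000 training rows and 3000 test rows, seed set to 1 as in Task 1.1.
train_idx, test_idx = split_indices(n_obs=10_000, n_train=7_000, seed=1)

# Toy labels only, to show the arithmetic: 3 of 4 predictions are correct.
acc, err = accuracy_and_error([1, 0, 0, 1], [1, 0, 1, 1])
```

Calling `split_indices` with a different `seed` gives a different partition, which is the setup Task 1.6 asks about. The same two statistics can be computed on the training rows and on the test rows to compare in-sample with out-of-sample performance.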