Add the second step: python-exercises

This commit is contained in:
2025-12-03 13:14:52 +01:00
parent ee8c81afbd
commit 52552e20cb
5 changed files with 1447 additions and 0 deletions

View File

@@ -0,0 +1,425 @@
{
"cells": [
{
"cell_type": "raw",
"id": "6cbef61b-0897-42bf-b456-c0a409b87c41",
"metadata": {},
"source": [
"\\vspace{-4cm}\n",
"\\begin{center}\n",
" \\LARGE{Machine Learning for Economics and Finance}\\\\[0.5cm]\n",
" \\Large{\\textbf{Python Exercises}}\\\\[1.0cm]\n",
" \\large{Ole Wilms}\\\\[0.5cm]\n",
" \\large{April 24, 2024}\\\\\n",
"\\end{center}"
]
},
{
"cell_type": "raw",
"id": "13be77f3-44f0-4983-b4cb-bd3e4b5dba8b",
"metadata": {},
"source": [
"\\setcounter{secnumdepth}{0}"
]
},
{
"cell_type": "raw",
"id": "a4c564a3-8712-4601-84b4-72b51df8bbbf",
"metadata": {},
"source": [
"\\tableofcontents"
]
},
{
"cell_type": "markdown",
"id": "040dc2a4-910e-4cf5-9d1e-62fe7d0a8efd",
"metadata": {},
"source": [
"## Important Instructions\n",
" - The purpose of these exercises is to get to know Python by solving some basic programming exercises\n",
" - In case you struggle with some problems, please post your questions on the OpenOlat Forum.\n",
" - Particularly difficult questions are marked by $\\color{red}{\\text{(D)}}$. Dont worry if you cannot solve these questions right away. Throughout the course, these programming concepts will become easier to understand.\n",
" - Sample solutions to the exercises will be provided next week. However, I strongly encourage all students to work on the exercises beforehand."
]
},
{
"cell_type": "raw",
"id": "d1a6cda1-d74f-4a81-8c17-cdd83a0dae17",
"metadata": {},
"source": [
"\\newpage"
]
},
{
"cell_type": "markdown",
"id": "87902d82-5336-456b-bec8-403530c75f00",
"metadata": {
"tags": []
},
"source": [
"## Task 1: Constructing a dataset\n",
"\n",
"1. Create different kinds of vectors with $6$ entries each:\n",
" - vector $a$: a vector with only ones (hint: you can use the `np.repeat()` function)\n",
" - vector $b$: a vector of integers that goes from $1$ to $6$ (hint: you can use the `np.arange()` function)\n",
" - vector $c$: a vector where each entry is drawn from a normal distribution with mean $2$ standard deviation $5$.\n",
" - vector $d$: a vector where each entry consists of one of the words in \"*Machine Learning for Economics and Finance*\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1cf1749-9e5b-434a-8f45-5d63db20ee2a",
"metadata": {},
"outputs": [],
"source": []
},
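{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-1",
"metadata": {},
"outputs": [],
"source": [
"# One possible sketch (not the official sample solution): create the four vectors with NumPy.\n",
"import numpy as np\n",
"\n",
"a = np.repeat(1, 6)                            # six ones\n",
"b = np.arange(1, 7)                            # integers 1 to 6\n",
"c = np.random.normal(loc=2, scale=5, size=6)   # normal draws with mean 2, standard deviation 5\n",
"d = np.array(\"Machine Learning for Economics and Finance\".split())"
]
},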
{
"cell_type": "markdown",
"id": "73330b81-0e43-43ac-911f-4086a9f9788f",
"metadata": {},
"source": [
"2. Stack vector $b$ into a matrix $M1$ of dimension $2$ x $3$ where you fill in by column. Stack the same vector into a matrix $M2$ of dimension $3$ x $2$ where you fill in by row."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c658a6a-1c6a-4350-9c4f-6afdd4dbaa7c",
"metadata": {},
"outputs": [],
"source": []
},
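{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-2",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: reshape b; order='F' fills the matrix column by column, order='C' row by row.\n",
"M1 = b.reshape((2, 3), order=\"F\")\n",
"M2 = b.reshape((3, 2), order=\"C\")"
]
},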
{
"cell_type": "markdown",
"id": "80e4160e-374a-43e1-a159-45077703658e",
"metadata": {
"tags": []
},
"source": [
"3. Add the two matrices. You will obtain an error message. Whats going wrong? Solve the problem using the transpose function `np.transpose()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb851b64-3518-406d-be06-46721a6eda01",
"metadata": {},
"outputs": [],
"source": []
},
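{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-3",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: M1 is 2x3 and M2 is 3x2, so their shapes do not match for element-wise addition.\n",
"# Transposing M2 makes the shapes compatible.\n",
"M_sum = M1 + np.transpose(M2)"
]
},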
{
"cell_type": "markdown",
"id": "03d19235-25ee-4c3b-b7bf-97cdf27d41b2",
"metadata": {},
"source": [
"4. Create a vector *train_sample* with $4$ entries by randomly sampling $4$ values from vector $b$ without replacement (that is, you cannot draw the same number twice). For this you can use the function `np.random.choice()`. Run the code that creates the vector multiple times. Explain whats happening. Fix the issue by using the function `np.random.seed()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81aff077-3d61-468c-a872-9006f75af9e6",
"metadata": {},
"outputs": [],
"source": []
},
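{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: without a fixed seed the draw changes on every run; the seed value 123 is arbitrary.\n",
"np.random.seed(123)\n",
"train_sample = np.random.choice(b, size=4, replace=False)"
]
},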
{
"cell_type": "markdown",
"id": "79732a93-d610-4d49-9bf0-a03b3f4edf22",
"metadata": {},
"source": [
"5. Put vectors $a$, $b$, $c$ and $d$ together in a dataframe called *df*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "849fa290-26b8-44de-815e-59095fc3dd61",
"metadata": {},
"outputs": [],
"source": []
},
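{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-5",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming pandas is available; the columns are renamed in the next step.\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({\"a\": a, \"b\": b, \"c\": c, \"d\": d})"
]
},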
{
"cell_type": "markdown",
"id": "919dde6d-4ff0-481a-a0d8-9413abe8f56a",
"metadata": {},
"source": [
"6. Name the columns of *df* *Ones*, *Seq*, *Normal* and *Coursename* respectively (hint: you can use the function `pd.DataFrame()`). Provide a summary of the dataframe using the `describe()`function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55cc73a2-17c7-4e5c-80c3-f9badf83bfce",
"metadata": {},
"outputs": [],
"source": []
},
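{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-6",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: rebuild the DataFrame with the requested column names and summarize it.\n",
"df = pd.DataFrame({\"Ones\": a, \"Seq\": b, \"Normal\": c, \"Coursename\": d})\n",
"df.describe()"
]
},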
{
"cell_type": "markdown",
"id": "ada39fc4-a156-40e6-9281-9754302d2ae7",
"metadata": {
"tags": []
},
"source": [
"7. $\\color{red}{\\text{(D)}}$ Add a column called *Int* to the dataframe which checks whether column *Normal* is larger than $0$. If that is the case *Int* should contain a *TRUE*, if that is not the case *Int* should contain a FALSE. Proceed as follows:\n",
" - Create a new column named *'Int'* in the DataFrame, initializing all elements to True. Use a loop to iterate through each row of the DataFrame. For each row, check if the corresponding value in the *'Normal'* column is greater than $0$. If it is, retain the *TRUE* value in the *'Int'* column; otherwise, replace it with *FALSE*.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59650599-11ed-4be4-8e21-4737642634db",
"metadata": {},
"outputs": [],
"source": []
},
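{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-7",
"metadata": {},
"outputs": [],
"source": [
"# Sketch of the loop-based approach described above.\n",
"df[\"Int\"] = True\n",
"for i in range(len(df)):\n",
"    if df.loc[i, \"Normal\"] <= 0:\n",
"        df.loc[i, \"Int\"] = False"
]
},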
{
"cell_type": "markdown",
"id": "b9f909ae-9a0e-4a69-a5f5-5f1eacb6bc2e",
"metadata": {},
"source": [
"8. $\\color{red}{\\text{(D)}}$ Can you think of an easier way to construct the column *Int* instead of the loop described above? If yes, add this column and call it *Int2*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a37153ee-cee2-4591-84a0-d57292ec4610",
"metadata": {},
"outputs": [],
"source": []
},
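{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-8",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a vectorized comparison gives the same result without a loop.\n",
"df[\"Int2\"] = df[\"Normal\"] > 0"
]
},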
{
"cell_type": "markdown",
"id": "20e52fac-725f-4b85-a6dd-6d70ea890928",
"metadata": {},
"source": [
"9. $\\color{red}{\\text{(D)}}$ Now we use our vector *train_sample* to construct two distinct datasets from *df*. The numbers in *train_sample* refer to the rows of our dataframe *df* that we want to use for the first dataset while all other rows can be used for the second dataset. Construct a new dataframe called *df_train* that only contains the rows in *train_sample*. Note that you can simply use square brackets to extract rows from a dataframe. Make sure that you extract all columns but only the rows that are in *train_sample*. Your object *df_train* should have $4$ rows and as many columns as *df*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dcb74cc8-21d7-4321-acf3-c2ea7ef5356e",
"metadata": {},
"outputs": [],
"source": []
},
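{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-9",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: train_sample holds values from b (1 to 6), while the rows of df are indexed 0 to 5,\n",
"# so subtract 1 to interpret them as row positions.\n",
"df_train = df.iloc[train_sample - 1]"
]
},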
{
"cell_type": "markdown",
"id": "27a77f7f-437c-4d16-b34d-07dda30e2ac7",
"metadata": {},
"source": [
"10. $\\color{red}{\\text{(D)}}$ Construct another dataframe called *df_test* which contains the other two rows of *df* that are not in *df_train*. Note that you can use `~df.index.isin()` to select all rows that are *NOT* in *train_sample*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d519d6f8-ebe6-47e7-b135-7c74c0b1f4f5",
"metadata": {},
"outputs": [],
"source": []
},
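{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-10",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, consistent with the positional interpretation used for df_train above.\n",
"df_test = df[~df.index.isin(train_sample - 1)]"
]
},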
{
"cell_type": "raw",
"id": "3ba17c73-a83f-43fa-8f29-3b773e25887b",
"metadata": {
"tags": []
},
"source": [
"\\newpage"
]
},
{
"cell_type": "markdown",
"id": "df4f7f10-2779-43ab-a7b0-3bd1b3f15b0c",
"metadata": {},
"source": [
"## Task 2: Working data from the *ISLR2* library\n",
"\n",
"1. Install and load the library *ISLP*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "551285b4-ef00-4be0-8000-ceac1ca7742e",
"metadata": {},
"outputs": [],
"source": []
},
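{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-1",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: install the package once (uncomment the line below), then import its data loader.\n",
"# %pip install ISLP\n",
"from ISLP import load_data"
]
},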
{
"cell_type": "markdown",
"id": "45467793-413b-4441-8c43-3e4a613451c9",
"metadata": {},
"source": [
"2. Load the dataset *Auto* and save it into an object called *Auto*. Use the help function to obtain information about the variables in *Auto*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f55378d0-ff39-4533-89ec-59582fdace34",
"metadata": {},
"outputs": [],
"source": []
},
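{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-2",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming the load_data helper imported in the previous step.\n",
"Auto = load_data(\"Auto\")\n",
"help(load_data)   # the variable descriptions are documented in the ISLP package"
]
},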
{
"cell_type": "markdown",
"id": "f3d8420f-8986-4b9e-ac8b-bfd42cd9cd8a",
"metadata": {},
"source": [
"3. Provide a summary of *Auto* using the `describe()` function. Do you think all the variables in *Auto* could be readily used for a linear regression model?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abe5c34d-9f95-49bb-b9bb-0f1c0745a7f1",
"metadata": {},
"outputs": [],
"source": []
},
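{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-3",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: describe() summarizes the numeric columns; a non-numeric column such as the\n",
"# car name would need to be encoded before it could enter a linear regression.\n",
"Auto.describe()"
]
},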
{
"cell_type": "markdown",
"id": "7870bbb9-e5cd-4fcc-bb2d-d33e80b2c8d2",
"metadata": {},
"source": [
"4. The goal of the following exercises is to understand the relation between the variable *mpg* and *horsepower*:\n",
" - Provide a histogram of *mpg* using the function `hist()`. Hint: For creating plots and visualizations, the `matplotlib` package is a common choice.\n",
" - Compute the pearson correlation between *mpg* and *horsepower*. For this, first select the two respective columns using `Auto[\"mpg\",\"horsepower\"]` and then use the function `corr()`. Is there a positive or negative relationship between the two variables?\n",
" - Provide a plot with *horsepower* on the x-axis and *mpg* on the y-axis. Do you think a linear regression model is well suited to predict *mpg* using *horsepower* ?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4934957d-d920-4191-aa41-71fbadbe4b62",
"metadata": {},
"outputs": [],
"source": []
},
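{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming matplotlib is available and Auto was loaded above.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"Auto[\"mpg\"].hist()\n",
"plt.show()\n",
"\n",
"print(Auto[[\"mpg\", \"horsepower\"]].corr())   # Pearson correlation matrix\n",
"\n",
"plt.scatter(Auto[\"horsepower\"], Auto[\"mpg\"])\n",
"plt.xlabel(\"horsepower\")\n",
"plt.ylabel(\"mpg\")\n",
"plt.show()"
]
},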
{
"cell_type": "raw",
"id": "b7289365-b358-470b-b10a-f5ba082a8ab2",
"metadata": {
"tags": []
},
"source": [
"\\newpage"
]
},
{
"cell_type": "markdown",
"id": "02902876-5944-4612-973d-512bbb27fd4e",
"metadata": {},
"source": [
"## Task 3: Working with external data\n",
"\n",
"1. Load the dataset `return_data.csv` which contains historical returns of Apple (*ret_apple*), the index return of the *S\\&P500* which is a broad portfolio of stocks in the US (*ret_index*), as well as the return of a riskless investment in government bonds (*rf*). Make sure that you set the right working director when you try to load in the data. In the dataset, a number of $0.1$ corresponds to a return of $10\\%$."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db6354dd-52e3-462a-bac3-d4cc08d541ca",
"metadata": {},
"outputs": [],
"source": []
},
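{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-1",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming return_data.csv sits in the current working directory;\n",
"# the variable name return_data is only an example.\n",
"import pandas as pd\n",
"\n",
"return_data = pd.read_csv(\"return_data.csv\")\n",
"return_data.head()"
]
},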
{
"cell_type": "markdown",
"id": "039801c1-3a1d-4870-94ba-662f23f762fe",
"metadata": {},
"source": [
"2. To get to know the data, construct three plots each having the date on the x-axis and the respective return time series on the y-axis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0f01a54-1571-409f-bf7b-080f749f874c",
"metadata": {},
"outputs": [],
"source": []
},
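{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-2",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming a date column named 'date' in addition to the three return series.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"for col in [\"ret_apple\", \"ret_index\", \"rf\"]:\n",
"    plt.plot(return_data[\"date\"], return_data[col])\n",
"    plt.xlabel(\"date\")\n",
"    plt.ylabel(col)\n",
"    plt.show()"
]
},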
{
"cell_type": "markdown",
"id": "7b3745c5-4b7b-4118-abec-6d2b87af06d0",
"metadata": {},
"source": [
"3. Compute the means and the standard deviations of the three time series and interpret the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36dde5ec-bc8c-4280-8385-420a06b97d1f",
"metadata": {},
"outputs": [],
"source": []
},
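{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-3",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: mean and standard deviation of each return series.\n",
"print(return_data[[\"ret_apple\", \"ret_index\", \"rf\"]].mean())\n",
"print(return_data[[\"ret_apple\", \"ret_index\", \"rf\"]].std())"
]
},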
{
"cell_type": "markdown",
"id": "c91066a0-28a6-45fe-b036-03fdd2c79362",
"metadata": {},
"source": [
"4. What was the maximum loss in a single month when holding Apple stocks? What are the maximum losses for the *S\\&P500* and the risk-free rate? Interpret."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c4ba196-b0d3-4841-95f9-995f4e127c33",
"metadata": {},
"outputs": [],
"source": []
},
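{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the maximum single-month loss is the minimum of each return series.\n",
"return_data[[\"ret_apple\", \"ret_index\", \"rf\"]].min()"
]
},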
{
"cell_type": "markdown",
"id": "30404916-65d2-40e3-be36-b0edb762db49",
"metadata": {},
"source": [
"5. Compute the pearson correlation between *ret_apple* and *ret_index* using the function `cor()`. Interpret the result."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b0de57e-72cb-46cf-99cf-dd348f59ba55",
"metadata": {},
"outputs": [],
"source": []
}
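,
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-5",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: Pearson correlation between the Apple and index returns.\n",
"return_data[\"ret_apple\"].corr(return_data[\"ret_index\"])"
]
}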
],
"metadata": {
"date": " ",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
},
"title": " ",
"toc-autonumbering": false,
"toc-showcode": false,
"toc-showmarkdowntxt": false,
"toc-showtags": false
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,21 @@
import numpy as np

# Note: default_data is assumed to be a pandas DataFrame that was loaded earlier
# (e.g. with pd.read_csv); this snippet only performs the train/validation split.

# Set seed for reproducibility
np.random.seed(1)
# Number of observations in the dataset
n = len(default_data)
# Randomly shuffle the indices of the dataset
indices = np.random.permutation(n)
# Training sample size (70% of the observations)
nT = int(0.7 * n)
# Split the shuffled indices into training and validation parts
n_train = indices[:nT]  # indices of the first 70% (training)
n_test = indices[nT:]   # indices of the remaining 30% (validation)
# Create training and validation datasets
train_data = default_data.iloc[n_train]
test_data = default_data.iloc[n_test]