Add the second step: python-exercises

This commit is contained in:
2025-12-03 13:14:52 +01:00
parent ee8c81afbd
commit 52552e20cb
5 changed files with 1447 additions and 0 deletions

View File

@@ -0,0 +1,425 @@
{
"cells": [
{
"cell_type": "raw",
"id": "6cbef61b-0897-42bf-b456-c0a409b87c41",
"metadata": {},
"source": [
"\\vspace{-4cm}\n",
"\\begin{center}\n",
" \\LARGE{Machine Learning for Economics and Finance}\\\\[0.5cm]\n",
" \\Large{\\textbf{Python Exercises}}\\\\[1.0cm]\n",
" \\large{Ole Wilms}\\\\[0.5cm]\n",
" \\large{April 24, 2024}\\\\\n",
"\\end{center}"
]
},
{
"cell_type": "raw",
"id": "13be77f3-44f0-4983-b4cb-bd3e4b5dba8b",
"metadata": {},
"source": [
"\\setcounter{secnumdepth}{0}"
]
},
{
"cell_type": "raw",
"id": "a4c564a3-8712-4601-84b4-72b51df8bbbf",
"metadata": {},
"source": [
"\\tableofcontents"
]
},
{
"cell_type": "markdown",
"id": "040dc2a4-910e-4cf5-9d1e-62fe7d0a8efd",
"metadata": {},
"source": [
"## Important Instructions\n",
" - The purpose of these exercises is to get to know Python by solving some basic programming exercises\n",
" - In case you struggle with some problems, please post your questions on the OpenOlat Forum.\n",
" - Particularly difficult questions are marked by $\\color{red}{\\text{(D)}}$. Dont worry if you cannot solve these questions right away. Throughout the course, these programming concepts will become easier to understand.\n",
" - Sample solutions to the exercises will be provided next week. However, I strongly encourage all students to work on the exercises beforehand."
]
},
{
"cell_type": "raw",
"id": "d1a6cda1-d74f-4a81-8c17-cdd83a0dae17",
"metadata": {},
"source": [
"\\newpage"
]
},
{
"cell_type": "markdown",
"id": "87902d82-5336-456b-bec8-403530c75f00",
"metadata": {
"tags": []
},
"source": [
"## Task 1: Constructing a dataset\n",
"\n",
"1. Create different kinds of vectors with $6$ entries each:\n",
" - vector $a$: a vector with only ones (hint: you can use the `np.repeat()` function)\n",
" - vector $b$: a vector of integers that goes from $1$ to $6$ (hint: you can use the `np.arange()` function)\n",
" - vector $c$: a vector where each entry is drawn from a normal distribution with mean $2$ standard deviation $5$.\n",
" - vector $d$: a vector where each entry consists of one of the words in \"*Machine Learning for Economics and Finance*\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1cf1749-9e5b-434a-8f45-5d63db20ee2a",
"metadata": {},
"outputs": [],
"source": []
},
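{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-1",
"metadata": {},
"outputs": [],
"source": [
"# One possible sketch (not the official sample solution): create the four vectors with NumPy.\n",
"import numpy as np\n",
"\n",
"a = np.repeat(1, 6)                            # six ones\n",
"b = np.arange(1, 7)                            # integers 1 to 6\n",
"c = np.random.normal(loc=2, scale=5, size=6)   # normal draws with mean 2, standard deviation 5\n",
"d = np.array(\"Machine Learning for Economics and Finance\".split())"
]
},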
{
"cell_type": "markdown",
"id": "73330b81-0e43-43ac-911f-4086a9f9788f",
"metadata": {},
"source": [
"2. Stack vector $b$ into a matrix $M1$ of dimension $2$ x $3$ where you fill in by column. Stack the same vector into a matrix $M2$ of dimension $3$ x $2$ where you fill in by row."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c658a6a-1c6a-4350-9c4f-6afdd4dbaa7c",
"metadata": {},
"outputs": [],
"source": []
},
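{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-2",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: reshape b; order='F' fills the matrix column by column, order='C' row by row.\n",
"M1 = b.reshape((2, 3), order=\"F\")\n",
"M2 = b.reshape((3, 2), order=\"C\")"
]
},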
{
"cell_type": "markdown",
"id": "80e4160e-374a-43e1-a159-45077703658e",
"metadata": {
"tags": []
},
"source": [
"3. Add the two matrices. You will obtain an error message. Whats going wrong? Solve the problem using the transpose function `np.transpose()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb851b64-3518-406d-be06-46721a6eda01",
"metadata": {},
"outputs": [],
"source": []
},
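{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-3",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: M1 is 2x3 and M2 is 3x2, so their shapes do not match for element-wise addition.\n",
"# Transposing M2 makes the shapes compatible.\n",
"M_sum = M1 + np.transpose(M2)"
]
},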
{
"cell_type": "markdown",
"id": "03d19235-25ee-4c3b-b7bf-97cdf27d41b2",
"metadata": {},
"source": [
"4. Create a vector *train_sample* with $4$ entries by randomly sampling $4$ values from vector $b$ without replacement (that is, you cannot draw the same number twice). For this you can use the function `np.random.choice()`. Run the code that creates the vector multiple times. Explain whats happening. Fix the issue by using the function `np.random.seed()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81aff077-3d61-468c-a872-9006f75af9e6",
"metadata": {},
"outputs": [],
"source": []
},
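{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: without a fixed seed the draw changes on every run; the seed value 123 is arbitrary.\n",
"np.random.seed(123)\n",
"train_sample = np.random.choice(b, size=4, replace=False)"
]
},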
{
"cell_type": "markdown",
"id": "79732a93-d610-4d49-9bf0-a03b3f4edf22",
"metadata": {},
"source": [
"5. Put vectors $a$, $b$, $c$ and $d$ together in a dataframe called *df*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "849fa290-26b8-44de-815e-59095fc3dd61",
"metadata": {},
"outputs": [],
"source": []
},
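{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-5",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming pandas is available; the columns are renamed in the next step.\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({\"a\": a, \"b\": b, \"c\": c, \"d\": d})"
]
},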
{
"cell_type": "markdown",
"id": "919dde6d-4ff0-481a-a0d8-9413abe8f56a",
"metadata": {},
"source": [
"6. Name the columns of *df* *Ones*, *Seq*, *Normal* and *Coursename* respectively (hint: you can use the function `pd.DataFrame()`). Provide a summary of the dataframe using the `describe()`function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55cc73a2-17c7-4e5c-80c3-f9badf83bfce",
"metadata": {},
"outputs": [],
"source": []
},
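{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-6",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: rebuild the DataFrame with the requested column names and summarize it.\n",
"df = pd.DataFrame({\"Ones\": a, \"Seq\": b, \"Normal\": c, \"Coursename\": d})\n",
"df.describe()"
]
},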
{
"cell_type": "markdown",
"id": "ada39fc4-a156-40e6-9281-9754302d2ae7",
"metadata": {
"tags": []
},
"source": [
"7. $\\color{red}{\\text{(D)}}$ Add a column called *Int* to the dataframe which checks whether column *Normal* is larger than $0$. If that is the case *Int* should contain a *TRUE*, if that is not the case *Int* should contain a FALSE. Proceed as follows:\n",
" - Create a new column named *'Int'* in the DataFrame, initializing all elements to True. Use a loop to iterate through each row of the DataFrame. For each row, check if the corresponding value in the *'Normal'* column is greater than $0$. If it is, retain the *TRUE* value in the *'Int'* column; otherwise, replace it with *FALSE*.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59650599-11ed-4be4-8e21-4737642634db",
"metadata": {},
"outputs": [],
"source": []
},
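{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-7",
"metadata": {},
"outputs": [],
"source": [
"# Sketch of the loop-based approach described above.\n",
"df[\"Int\"] = True\n",
"for i in range(len(df)):\n",
"    if df.loc[i, \"Normal\"] <= 0:\n",
"        df.loc[i, \"Int\"] = False"
]
},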
{
"cell_type": "markdown",
"id": "b9f909ae-9a0e-4a69-a5f5-5f1eacb6bc2e",
"metadata": {},
"source": [
"8. $\\color{red}{\\text{(D)}}$ Can you think of an easier way to construct the column *Int* instead of the loop described above? If yes, add this column and call it *Int2*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a37153ee-cee2-4591-84a0-d57292ec4610",
"metadata": {},
"outputs": [],
"source": []
},
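{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-8",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a vectorized comparison gives the same result without a loop.\n",
"df[\"Int2\"] = df[\"Normal\"] > 0"
]
},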
{
"cell_type": "markdown",
"id": "20e52fac-725f-4b85-a6dd-6d70ea890928",
"metadata": {},
"source": [
"9. $\\color{red}{\\text{(D)}}$ Now we use our vector *train_sample* to construct two distinct datasets from *df*. The numbers in *train_sample* refer to the rows of our dataframe *df* that we want to use for the first dataset while all other rows can be used for the second dataset. Construct a new dataframe called *df_train* that only contains the rows in *train_sample*. Note that you can simply use square brackets to extract rows from a dataframe. Make sure that you extract all columns but only the rows that are in *train_sample*. Your object *df_train* should have $4$ rows and as many columns as *df*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dcb74cc8-21d7-4321-acf3-c2ea7ef5356e",
"metadata": {},
"outputs": [],
"source": []
},
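{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-9",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: train_sample holds values from b (1 to 6), while the rows of df are indexed 0 to 5,\n",
"# so subtract 1 to interpret them as row positions.\n",
"df_train = df.iloc[train_sample - 1]"
]
},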
{
"cell_type": "markdown",
"id": "27a77f7f-437c-4d16-b34d-07dda30e2ac7",
"metadata": {},
"source": [
"10. $\\color{red}{\\text{(D)}}$ Construct another dataframe called *df_test* which contains the other two rows of *df* that are not in *df_train*. Note that you can use `~df.index.isin()` to select all rows that are *NOT* in *train_sample*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d519d6f8-ebe6-47e7-b135-7c74c0b1f4f5",
"metadata": {},
"outputs": [],
"source": []
},
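{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task1-10",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, consistent with the positional interpretation used for df_train above.\n",
"df_test = df[~df.index.isin(train_sample - 1)]"
]
},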
{
"cell_type": "raw",
"id": "3ba17c73-a83f-43fa-8f29-3b773e25887b",
"metadata": {
"tags": []
},
"source": [
"\\newpage"
]
},
{
"cell_type": "markdown",
"id": "df4f7f10-2779-43ab-a7b0-3bd1b3f15b0c",
"metadata": {},
"source": [
"## Task 2: Working data from the *ISLR2* library\n",
"\n",
"1. Install and load the library *ISLP*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "551285b4-ef00-4be0-8000-ceac1ca7742e",
"metadata": {},
"outputs": [],
"source": []
},
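{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-1",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: install the package once (uncomment the line below), then import its data loader.\n",
"# %pip install ISLP\n",
"from ISLP import load_data"
]
},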
{
"cell_type": "markdown",
"id": "45467793-413b-4441-8c43-3e4a613451c9",
"metadata": {},
"source": [
"2. Load the dataset *Auto* and save it into an object called *Auto*. Use the help function to obtain information about the variables in *Auto*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f55378d0-ff39-4533-89ec-59582fdace34",
"metadata": {},
"outputs": [],
"source": []
},
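{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-2",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming the load_data helper imported in the previous step.\n",
"Auto = load_data(\"Auto\")\n",
"help(load_data)   # the variable descriptions are documented in the ISLP package"
]
},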
{
"cell_type": "markdown",
"id": "f3d8420f-8986-4b9e-ac8b-bfd42cd9cd8a",
"metadata": {},
"source": [
"3. Provide a summary of *Auto* using the `describe()` function. Do you think all the variables in *Auto* could be readily used for a linear regression model?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abe5c34d-9f95-49bb-b9bb-0f1c0745a7f1",
"metadata": {},
"outputs": [],
"source": []
},
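{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-3",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: describe() summarizes the numeric columns; a non-numeric column such as the\n",
"# car name would need to be encoded before it could enter a linear regression.\n",
"Auto.describe()"
]
},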
{
"cell_type": "markdown",
"id": "7870bbb9-e5cd-4fcc-bb2d-d33e80b2c8d2",
"metadata": {},
"source": [
"4. The goal of the following exercises is to understand the relation between the variable *mpg* and *horsepower*:\n",
" - Provide a histogram of *mpg* using the function `hist()`. Hint: For creating plots and visualizations, the `matplotlib` package is a common choice.\n",
" - Compute the pearson correlation between *mpg* and *horsepower*. For this, first select the two respective columns using `Auto[\"mpg\",\"horsepower\"]` and then use the function `corr()`. Is there a positive or negative relationship between the two variables?\n",
" - Provide a plot with *horsepower* on the x-axis and *mpg* on the y-axis. Do you think a linear regression model is well suited to predict *mpg* using *horsepower* ?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4934957d-d920-4191-aa41-71fbadbe4b62",
"metadata": {},
"outputs": [],
"source": []
},
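{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task2-4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming matplotlib is available and Auto was loaded above.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"Auto[\"mpg\"].hist()\n",
"plt.show()\n",
"\n",
"print(Auto[[\"mpg\", \"horsepower\"]].corr())   # Pearson correlation matrix\n",
"\n",
"plt.scatter(Auto[\"horsepower\"], Auto[\"mpg\"])\n",
"plt.xlabel(\"horsepower\")\n",
"plt.ylabel(\"mpg\")\n",
"plt.show()"
]
},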
{
"cell_type": "raw",
"id": "b7289365-b358-470b-b10a-f5ba082a8ab2",
"metadata": {
"tags": []
},
"source": [
"\\newpage"
]
},
{
"cell_type": "markdown",
"id": "02902876-5944-4612-973d-512bbb27fd4e",
"metadata": {},
"source": [
"## Task 3: Working with external data\n",
"\n",
"1. Load the dataset `return_data.csv` which contains historical returns of Apple (*ret_apple*), the index return of the *S\\&P500* which is a broad portfolio of stocks in the US (*ret_index*), as well as the return of a riskless investment in government bonds (*rf*). Make sure that you set the right working director when you try to load in the data. In the dataset, a number of $0.1$ corresponds to a return of $10\\%$."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db6354dd-52e3-462a-bac3-d4cc08d541ca",
"metadata": {},
"outputs": [],
"source": []
},
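{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-1",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming return_data.csv sits in the current working directory;\n",
"# the variable name return_data is only an example.\n",
"import pandas as pd\n",
"\n",
"return_data = pd.read_csv(\"return_data.csv\")\n",
"return_data.head()"
]
},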
{
"cell_type": "markdown",
"id": "039801c1-3a1d-4870-94ba-662f23f762fe",
"metadata": {},
"source": [
"2. To get to know the data, construct three plots each having the date on the x-axis and the respective return time series on the y-axis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0f01a54-1571-409f-bf7b-080f749f874c",
"metadata": {},
"outputs": [],
"source": []
},
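{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-2",
"metadata": {},
"outputs": [],
"source": [
"# Sketch, assuming a date column named 'date' in addition to the three return series.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"for col in [\"ret_apple\", \"ret_index\", \"rf\"]:\n",
"    plt.plot(return_data[\"date\"], return_data[col])\n",
"    plt.xlabel(\"date\")\n",
"    plt.ylabel(col)\n",
"    plt.show()"
]
},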
{
"cell_type": "markdown",
"id": "7b3745c5-4b7b-4118-abec-6d2b87af06d0",
"metadata": {},
"source": [
"3. Compute the means and the standard deviations of the three time series and interpret the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36dde5ec-bc8c-4280-8385-420a06b97d1f",
"metadata": {},
"outputs": [],
"source": []
},
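{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-3",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: mean and standard deviation of each return series.\n",
"print(return_data[[\"ret_apple\", \"ret_index\", \"rf\"]].mean())\n",
"print(return_data[[\"ret_apple\", \"ret_index\", \"rf\"]].std())"
]
},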
{
"cell_type": "markdown",
"id": "c91066a0-28a6-45fe-b036-03fdd2c79362",
"metadata": {},
"source": [
"4. What was the maximum loss in a single month when holding Apple stocks? What are the maximum losses for the *S\\&P500* and the risk-free rate? Interpret."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c4ba196-b0d3-4841-95f9-995f4e127c33",
"metadata": {},
"outputs": [],
"source": []
},
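{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the maximum single-month loss is the minimum of each return series.\n",
"return_data[[\"ret_apple\", \"ret_index\", \"rf\"]].min()"
]
},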
{
"cell_type": "markdown",
"id": "30404916-65d2-40e3-be36-b0edb762db49",
"metadata": {},
"source": [
"5. Compute the pearson correlation between *ret_apple* and *ret_index* using the function `cor()`. Interpret the result."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b0de57e-72cb-46cf-99cf-dd348f59ba55",
"metadata": {},
"outputs": [],
"source": []
}
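,
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-task3-5",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: Pearson correlation between the Apple and index returns.\n",
"return_data[\"ret_apple\"].corr(return_data[\"ret_index\"])"
]
}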
],
"metadata": {
"date": " ",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
},
"title": " ",
"toc-autonumbering": false,
"toc-showcode": false,
"toc-showmarkdowntxt": false,
"toc-showtags": false
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,21 @@
import numpy as np

# Note: default_data is assumed to be a pandas DataFrame that was loaded earlier
# (e.g. with pd.read_csv); this snippet only performs the train/validation split.

# Set seed for reproducibility
np.random.seed(1)
# Number of observations in the dataset
n = len(default_data)
# Randomly shuffle the indices of the dataset
indices = np.random.permutation(n)
# Training sample size (70% of the observations)
nT = int(0.7 * n)
# Split the shuffled indices into training and validation parts
n_train = indices[:nT]  # indices of the first 70% (training)
n_test = indices[nT:]   # indices of the remaining 30% (validation)
# Create training and validation datasets
train_data = default_data.iloc[n_train]
test_data = default_data.iloc[n_test]