new inclusion of the second step: python-exercises
@@ -0,0 +1,425 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "6cbef61b-0897-42bf-b456-c0a409b87c41",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\\vspace{-4cm}\n",
|
||||
"\\begin{center}\n",
|
||||
" \\LARGE{Machine Learning for Economics and Finance}\\\\[0.5cm]\n",
|
||||
" \\Large{\\textbf{Python Exercises}}\\\\[1.0cm]\n",
|
||||
" \\large{Ole Wilms}\\\\[0.5cm]\n",
|
||||
" \\large{April 24, 2024}\\\\\n",
|
||||
"\\end{center}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "13be77f3-44f0-4983-b4cb-bd3e4b5dba8b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\\setcounter{secnumdepth}{0}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "a4c564a3-8712-4601-84b4-72b51df8bbbf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\\tableofcontents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "040dc2a4-910e-4cf5-9d1e-62fe7d0a8efd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Important Instructions\n",
" - The purpose of these exercises is to get to know Python by solving some basic programming exercises.\n",
|
||||
" - In case you struggle with some problems, please post your questions on the OpenOlat Forum.\n",
|
||||
" - Particularly difficult questions are marked by $\\color{red}{\\text{(D)}}$. Don’t worry if you cannot solve these questions right away. Throughout the course, these programming concepts will become easier to understand.\n",
|
||||
" - Sample solutions to the exercises will be provided next week. However, I strongly encourage all students to work on the exercises beforehand."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "d1a6cda1-d74f-4a81-8c17-cdd83a0dae17",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\\newpage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "87902d82-5336-456b-bec8-403530c75f00",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Task 1: Constructing a dataset\n",
|
||||
"\n",
|
||||
"1. Create different kinds of vectors with $6$ entries each:\n",
|
||||
" - vector $a$: a vector with only ones (hint: you can use the `np.repeat()` function)\n",
|
||||
" - vector $b$: a vector of integers that goes from $1$ to $6$ (hint: you can use the `np.arange()` function)\n",
" - vector $c$: a vector where each entry is drawn from a normal distribution with mean $2$ and standard deviation $5$.\n",
|
||||
" - vector $d$: a vector where each entry consists of one of the words in \"*Machine Learning for Economics and Finance*\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f1cf1749-9e5b-434a-8f45-5d63db20ee2a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73330b81-0e43-43ac-911f-4086a9f9788f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"2. Stack vector $b$ into a matrix $M1$ of dimension $2$ x $3$ where you fill in by column. Stack the same vector into a matrix $M2$ of dimension $3$ x $2$ where you fill in by row."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4c658a6a-1c6a-4350-9c4f-6afdd4dbaa7c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "80e4160e-374a-43e1-a159-45077703658e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"3. Add the two matrices. You will obtain an error message. What’s going wrong? Solve the problem using the transpose function `np.transpose()`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "cb851b64-3518-406d-be06-46721a6eda01",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "03d19235-25ee-4c3b-b7bf-97cdf27d41b2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"4. Create a vector *train_sample* with $4$ entries by randomly sampling $4$ values from vector $b$ without replacement (that is, you cannot draw the same number twice). For this you can use the function `np.random.choice()`. Run the code that creates the vector multiple times. Explain what’s happening. Fix the issue by using the function `np.random.seed()`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "81aff077-3d61-468c-a872-9006f75af9e6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "79732a93-d610-4d49-9bf0-a03b3f4edf22",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"5. Put vectors $a$, $b$, $c$ and $d$ together in a dataframe called *df*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "849fa290-26b8-44de-815e-59095fc3dd61",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "919dde6d-4ff0-481a-a0d8-9413abe8f56a",
|
||||
"metadata": {},
|
||||
"source": [
"6. Name the columns of *df* *’Ones’*, *’Seq’*, *’Normal’* and *’Coursename’*, respectively (hint: you can use the function `pd.DataFrame()`). Provide a summary of the dataframe using the `describe()` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "55cc73a2-17c7-4e5c-80c3-f9badf83bfce",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ada39fc4-a156-40e6-9281-9754302d2ae7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
"7. $\\color{red}{\\text{(D)}}$ Add a column called *’Int’* to the dataframe which checks whether column *’Normal’* is larger than $0$. If that is the case, *’Int’* should contain `True`; otherwise, it should contain `False`. Proceed as follows:\n",
" - Create a new column named *'Int'* in the dataframe, initializing all elements to `True`. Use a loop to iterate through each row of the dataframe. For each row, check whether the corresponding value in the *'Normal'* column is greater than $0$. If it is, retain the `True` value in the *'Int'* column; otherwise, replace it with `False`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "59650599-11ed-4be4-8e21-4737642634db",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b9f909ae-9a0e-4a69-a5f5-5f1eacb6bc2e",
|
||||
"metadata": {},
|
||||
"source": [
"8. $\\color{red}{\\text{(D)}}$ Can you think of an easier way to construct the column *’Int’* instead of the loop described above? If yes, add this column and call it *’Int2’*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a37153ee-cee2-4591-84a0-d57292ec4610",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "20e52fac-725f-4b85-a6dd-6d70ea890928",
|
||||
"metadata": {},
|
||||
"source": [
"9. $\\color{red}{\\text{(D)}}$ Now we use our vector *train_sample* to construct two distinct datasets from *df*. The numbers in *train_sample* refer to the rows of our dataframe *df* that we want to use for the first dataset, while all other rows can be used for the second dataset. Construct a new dataframe called *df_train* that only contains the rows in *train_sample*. Note that you can extract rows from a dataframe with square brackets via `df.loc[]` (by index label) or `df.iloc[]` (by position); keep in mind that pandas labels rows starting from $0$ by default, whereas the values in *train_sample* range from $1$ to $6$. Make sure that you extract all columns but only the rows that are in *train_sample*. Your object *df_train* should have $4$ rows and as many columns as *df*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dcb74cc8-21d7-4321-acf3-c2ea7ef5356e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "27a77f7f-437c-4d16-b34d-07dda30e2ac7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"10. $\\color{red}{\\text{(D)}}$ Construct another dataframe called *df_test* which contains the other two rows of *df* that are not in *df_train*. Note that you can use `~df.index.isin()` to select all rows that are *NOT* in *train_sample*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d519d6f8-ebe6-47e7-b135-7c74c0b1f4f5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "3ba17c73-a83f-43fa-8f29-3b773e25887b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"\\newpage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "df4f7f10-2779-43ab-a7b0-3bd1b3f15b0c",
|
||||
"metadata": {},
|
||||
"source": [
"## Task 2: Working with data from the *ISLP* library\n",
|
||||
"\n",
|
||||
"1. Install and load the library *ISLP*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "551285b4-ef00-4be0-8000-ceac1ca7742e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "45467793-413b-4441-8c43-3e4a613451c9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"2. Load the dataset *Auto* and save it into an object called *Auto*. Use the help function to obtain information about the variables in *Auto*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f55378d0-ff39-4533-89ec-59582fdace34",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f3d8420f-8986-4b9e-ac8b-bfd42cd9cd8a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"3. Provide a summary of *Auto* using the `describe()` function. Do you think all the variables in *Auto* could be readily used for a linear regression model?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "abe5c34d-9f95-49bb-b9bb-0f1c0745a7f1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7870bbb9-e5cd-4fcc-bb2d-d33e80b2c8d2",
|
||||
"metadata": {},
|
||||
"source": [
"4. The goal of the following exercises is to understand the relation between the variables *’mpg’* and *’horsepower’*:\n",
|
||||
" - Provide a histogram of *’mpg’* using the function `hist()`. Hint: For creating plots and visualizations, the `matplotlib` package is a common choice.\n",
" - Compute the Pearson correlation between *’mpg’* and *’horsepower’*. For this, first select the two respective columns using `Auto[[\"mpg\", \"horsepower\"]]` and then use the function `corr()`. Is there a positive or negative relationship between the two variables?\n",
" - Provide a plot with *’horsepower’* on the x-axis and *’mpg’* on the y-axis. Do you think a linear regression model is well suited to predict *’mpg’* using *’horsepower’*?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4934957d-d920-4191-aa41-71fbadbe4b62",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "b7289365-b358-470b-b10a-f5ba082a8ab2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"\\newpage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02902876-5944-4612-973d-512bbb27fd4e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Task 3: Working with external data\n",
|
||||
"\n",
"1. Load the dataset `’return_data.csv’` which contains historical returns of Apple (*’ret_apple’*), the index return of the *S\\&P500*, which is a broad portfolio of stocks in the US (*’ret_index’*), as well as the return of a riskless investment in government bonds (*’rf’*). Make sure that you set the right working directory when you try to load the data. In the dataset, a number of $0.1$ corresponds to a return of $10\\%$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "db6354dd-52e3-462a-bac3-d4cc08d541ca",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "039801c1-3a1d-4870-94ba-662f23f762fe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"2. To get to know the data, construct three plots each having the date on the x-axis and the respective return time series on the y-axis."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f0f01a54-1571-409f-bf7b-080f749f874c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7b3745c5-4b7b-4118-abec-6d2b87af06d0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"3. Compute the means and the standard deviations of the three time series and interpret the results."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "36dde5ec-bc8c-4280-8385-420a06b97d1f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c91066a0-28a6-45fe-b036-03fdd2c79362",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"4. What was the maximum loss in a single month when holding Apple stocks? What are the maximum losses for the *S\\&P500* and the risk-free rate? Interpret."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4c4ba196-b0d3-4841-95f9-995f4e127c33",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "30404916-65d2-40e3-be36-b0edb762db49",
|
||||
"metadata": {},
|
||||
"source": [
"5. Compute the Pearson correlation between *’ret_apple’* and *’ret_index’* using the function `corr()`. Interpret the result."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5b0de57e-72cb-46cf-99cf-dd348f59ba55",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"date": " ",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.8"
|
||||
},
|
||||
"title": " ",
|
||||
"toc-autonumbering": false,
|
||||
"toc-showcode": false,
|
||||
"toc-showmarkdowntxt": false,
|
||||
"toc-showtags": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
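The hints given in the notebook (`np.random.seed()`, `np.random.choice()`, `~df.index.isin()`, and double-bracket column selection followed by `corr()`) can be tried out on a small made-up dataframe. The sketch below is only an illustration under assumptions, not the sample solution: the dataframe `toy` and its columns `x`, `z` and `label` are hypothetical and do not appear in the exercises.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the dataframe from the exercises.
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],
    "label": list("abcde"),
})

# Resetting the seed before each draw makes the random sample repeatable.
np.random.seed(0)
first_draw = np.random.choice(toy.index.to_numpy(), size=3, replace=False)
np.random.seed(0)
second_draw = np.random.choice(toy.index.to_numpy(), size=3, replace=False)

# Rows whose index label was drawn, and the complementary rows.
picked = toy.loc[first_draw]
rest = toy[~toy.index.isin(first_draw)]

# Double brackets pass a list of column names and return a dataframe,
# on which corr() computes the Pearson correlation matrix.
corr_matrix = toy[["x", "z"]].corr()
```

Without the second call to `np.random.seed(0)`, the two draws would generally differ, which is the behaviour the sampling exercise asks you to explain.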
BIN
Machine Learning for Economics and Finance/python-exercises/python-exercises.pdf
Executable file
Binary file not shown.
File diff suppressed because one or more lines are too long
Binary file not shown.
@@ -0,0 +1,21 @@
import numpy as np

# Set the seed so that the shuffle is reproducible
np.random.seed(1)

# Number of observations (default_data is assumed to be a DataFrame defined elsewhere)
n = len(default_data)

# Randomly shuffle the row positions of the dataset
indices = np.random.permutation(n)

# Compute the training sample size (70% of the observations)
nT = int(0.7 * n)

# Split the shuffled row positions into training and test positions
n_train = indices[:nT]  # First 70% for training
n_test = indices[nT:]  # Remaining 30% for testing

# Create training and test datasets
train_data = default_data.iloc[n_train]
test_data = default_data.iloc[n_test]
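
The script above relies on `default_data`, which is not defined in this file. As a minimal self-contained sketch, the same split can be run on a made-up stand-in; the column names `balance` and `default` and all values are assumptions for illustration only. It shows that the two pieces together cover every row exactly once.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for default_data; the real dataset is not part of this diff.
default_data = pd.DataFrame({
    "balance": np.arange(10.0),
    "default": [0, 1] * 5,
})

np.random.seed(1)                    # reproducible shuffle
n = len(default_data)
indices = np.random.permutation(n)   # shuffled row positions 0..n-1
nT = int(0.7 * n)                    # roughly 70% of the rows for training

train_data = default_data.iloc[indices[:nT]]
test_data = default_data.iloc[indices[nT:]]

# Every row lands in exactly one of the two datasets.
covered = set(train_data.index) | set(test_data.index)
```

Because `iloc` selects by position, this works regardless of how the dataframe's index is labelled.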
|
||||