From f5a576b1d51f7ad622971f03fb12704723b100e8 Mon Sep 17 00:00:00 2001 From: Marcel Weschke Date: Sun, 14 Apr 2024 22:17:16 +0200 Subject: [PATCH] Upload files to "Machine Learning for Economics and Finance/Problem Set 1" --- .../Problem Set 1/Problem_Set_1.ipynb | 1563 +++++++++++++++++ .../Problem Set 1/stockmarketdata.rds | Bin 0 -> 19065 bytes 2 files changed, 1563 insertions(+) create mode 100644 Machine Learning for Economics and Finance/Problem Set 1/Problem_Set_1.ipynb create mode 100644 Machine Learning for Economics and Finance/Problem Set 1/stockmarketdata.rds diff --git a/Machine Learning for Economics and Finance/Problem Set 1/Problem_Set_1.ipynb b/Machine Learning for Economics and Finance/Problem Set 1/Problem_Set_1.ipynb new file mode 100644 index 0000000..0bd4c65 --- /dev/null +++ b/Machine Learning for Economics and Finance/Problem Set 1/Problem_Set_1.ipynb @@ -0,0 +1,1563 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5b78890f-9175-49cf-be30-04d1557b9fd0", + "metadata": {}, + "source": [ + "

Machine Learning for Economics and Finance

Universität Hamburg

\n", + "

Problem Set 1

" + ] + }, + { + "cell_type": "markdown", + "id": "66280676-a99a-4e20-98a7-c727bacc0dcd", + "metadata": { + "tags": [] + }, + "source": [ + "## Table of Contents:\n", + "* [Preliminaries](#Vorab)\n", + "* [Structure](#Aufbau)\n", + "* [Task 1](#Aufgabe-1)\n", + "* [Task 2](#Aufgabe-2)\n", + "* [Task 3](#Aufgabe-3)\n", + "* [Appendix](#Anhang)\n", + "* [References](#Literatur)" + ] + }, + { + "cell_type": "markdown", + "id": "040dc2a4-910e-4cf5-9d1e-62fe7d0a8efd", + "metadata": {}, + "source": [ + "### Preliminaries \n", + "- The goal of this tutorial is to practice some of the most important concepts covered in the first weeks of the ML course." + ] + }, + { + "cell_type": "markdown", + "id": "baac6966-d67a-4a66-acec-8ef6411c4f66", + "metadata": {}, + "source": [ + "### Structure \n", + "\n", + "**The main task of this problem set is to predict the return of the US stock market.** To do so, we use the data set ```stockmarketdata.rds``` from Welch and Goyal (2007), which is available on *OpenOlat*. The data set contains quarterly returns of the US stock market ($ret$) as well as several other variables that finance researchers have proposed for predicting stock returns. A list of all variables together with a description can be found in the appendix. For example, for the first quarter of 1999 (*date = 19991*) it contains variables such as the stock market return ($ret_{t}$), the dividend-to-price ratio ($DP_{t}$), the credit spread ($CS_{t}$), and so on. Since the goal is to predict returns in the following quarter, we are interested in models of the form\n", + "\n", + "$ret_{t+1} = f(DP_{t}, CS_{t}, \\ldots) + \\epsilon_{t+1}$\n", + "\n", + "Suppose you are an asset manager and it is the end of 1994, i.e., all data before 1995 are available to you for training and validating your model. 
Your goal is to develop a model that not only works in-sample but can also predict future returns (from 1995 onward)." + ] + }, + { + "cell_type": "markdown", + "id": "87902d82-5336-456b-bec8-403530c75f00", + "metadata": { + "tags": [] + }, + "source": [ + "## Task 1: Preparing and Analyzing the Data \n", + "\n", + "1. First, we need to align the data so that a row containing the features for date $t$ contains the return for date $t + 1$ (instead of the return for date $t$, as is currently the case). This ensures that we actually predict next-quarter returns. To do so, we shift the return time series by one period so that each row holds the return of the following quarter. (Hint: Use the function ```shift()``` to add a new variable to the DataFrame that contains the return shifted by one period. Afterwards, remove the old return series from the DataFrame.)\n", + " \n", + "2. Remove all rows of the data set that contain missing values. There are many different ways to do this. If you have difficulties with this exercise, proceed as follows: apply the combination of functions ```.isna().sum()``` to the DataFrame to count the missing values in each column. Locate the missing values for these variables by visual inspection. Start and end the sample so that the rows with missing values are not included. Afterwards, use ```.isna().sum()``` again to make sure that you have removed all missing values.\n", + "\n", + "3. Split the sample into two parts: data before 1995 for training and validation, and data from 1995 onward (including 1995) for out-of-sample testing.\n", + "\n", + "4. 
Compute the average quarterly return and its standard deviation in the training and test data. Is there anything worth noting?\n", + "\n", + "5. Compute the correlation matrix for the training data (including both the outcome and the features). Is there anything worth noting?" + ] + }, + { + "cell_type": "markdown", + "id": "c53eedac-cd76-4649-aebc-dc0c0d26c63e", + "metadata": {}, + "source": [ + "### Preparation: \n", + " - Load the ```stockmarketdata.rds``` data and apply basic adjustments" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "44ad3d11-abe5-4366-91dc-ac319197b93c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], "text/plain": [ + " date ret DP CS ntis cay TS svar\n", + "0 19291.0 0.050490 -3.367688 0.010357 0.079805 NaN -0.0083 0.007982\n", + "1 19292.0 0.087235 -3.412851 0.011105 0.116197 NaN -0.0113 0.008405\n", + "2 19293.0 0.091067 -3.468392 0.012517 0.121390 NaN -0.0083 0.008056\n", + "3 19294.0 -0.268418 -3.096184 0.012155 0.163522 NaN 0.0037 0.100171\n", + "4 19301.0 0.165884 -3.252345 0.010554 0.145496 NaN 0.0040 0.004662" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import warnings # package for suppressing warnings\n", + "import pyreadr # package for reading RDS files - https://github.com/ofajardo/pyreadr\n", + "\n", + "df = pyreadr.read_r('stockmarketdata.rds')\n", + "df = df[None] # an .rds file holds a single object, stored under the key None; extract it as a pandas DataFrame\n", + "\n", + "df.head() # show the first 5 rows of the DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "6d3ccd77-5c88-4a7a-9225-efd36768d36d", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], "text/plain": [ + " date ret DP CS ntis cay TS \\\n", + "360 2019-Q1 0.137489 -3.943400 0.010258 -0.023230 -0.039336 0.0017 \n", + "361 2019-Q2 0.042688 -3.960033 0.010006 -0.012562 -0.033844 -0.0010 \n", + "362 2019-Q3 0.017042 -3.951689 0.008505 -0.010862 -0.029529 -0.0019 \n", + "363 2019-Q4 0.090143 -4.015896 0.008410 -0.007222 -0.033609 0.0032 \n", + "364 2020-Q1 -0.193794 -3.769992 0.012252 -0.007731 -0.050141 0.0058 \n", + "\n", + " svar \n", + "360 0.004651 \n", + "361 0.003271 \n", + "362 0.005517 \n", + "363 0.002319 \n", + "364 0.079049 " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "# Define a function that converts the numeric date (e.g. 19991.0) into the format 'YYYY-QX'.\n", + "def convert_to_quarterly_date(numeric_date):\n", + "    year = int(numeric_date) // 10     # all digits except the last one: the year\n", + "    quarter = int(numeric_date) % 10   # last digit: the quarter\n", + "    quarter_str = f'Q{quarter}'        # quarter as a string, e.g. 'Q1'\n", + "    return f'{year}-{quarter_str}'     # e.g. '1999-Q1'\n", + "\n", + "# Apply the function to the \"date\" variable.\n", + "df['date'] = df['date'].apply(convert_to_quarterly_date)\n", + "\n", + "df.tail()" + ] + }, + { + "cell_type": "markdown", + "id": "f36b947b-7ab0-46f5-82d6-e8df53ae591f", + "metadata": {}, + "source": [ + "### Task 1.1:\n", + " - Shift the return series to the next quarter" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "f3725bbe-1708-4559-b7b9-fa975a09083f", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], "text/plain": [ + " date DP CS ntis cay TS svar \\\n", + "360 2019-Q1 -3.943400 0.010258 -0.023230 -0.039336 0.0017 0.004651 \n", + "361 2019-Q2 -3.960033 0.010006 -0.012562 -0.033844 -0.0010 0.003271 \n", + "362 2019-Q3 -3.951689 0.008505 -0.010862 -0.029529 -0.0019 0.005517 \n", + "363 2019-Q4 -4.015896 0.008410 -0.007222 -0.033609 0.0032 0.002319 \n", + "364 2020-Q1 -3.769992 0.012252 -0.007731 -0.050141 0.0058 0.079049 \n", + "\n", + " ret_next \n", + "360 0.042688 \n", + "361 0.017042 \n", + "362 0.090143 \n", + "363 -0.193794 \n", + "364 NaN " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Shift the return column by one period: row t now holds the return of quarter t+1.\n", + "df['ret_next'] = df['ret'].shift(-1)\n", + "\n", + "# Remove the original \"ret\" column from the DataFrame.\n", + "df.drop('ret', axis=1, inplace=True)\n", + "\n", + "df.tail()" + ] + }, + { + "cell_type": "markdown", + "id": "73330b81-0e43-43ac-911f-4086a9f9788f", + "metadata": {}, + "source": [ + "### Task 1.2:\n", + " - Handle missing values in the DataFrame" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5c083b5f-f0d4-4fe5-8824-604a073c1215", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "date 0\n", + "DP 0\n", + "CS 0\n", + "ntis 0\n", + "cay 92\n", + "TS 0\n", + "svar 0\n", + "ret_next 1\n", + "dtype: int64" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Count the NaN values per variable (column) of the DataFrame.\n", + "df.isna().sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2c0b17c8-a060-4687-8047-83abcf22ae46", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "date 0\n", + "DP 0\n", + "CS 0\n", + "ntis 0\n", + "cay 0\n", + "TS 0\n", + "svar 0\n", + "ret_next 0\n", + "dtype: int64" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": 
"execute_result" + } + ], + "source": [ + "import numpy as np\n", + "\n", + "# Drop all rows in which at least one variable is NaN.\n", + "df = df.dropna()\n", + "df.isna().sum()" + ] + }, + { + "cell_type": "markdown", + "id": "80e4160e-374a-43e1-a159-45077703658e", + "metadata": { + "tags": [] + }, + "source": [ + "### Task 1.3:\n", + " - Split the data set into two parts: a training data set with the data before 1995 and a test data set with the data from 1995 onward (including 1995) for out-of-sample evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "b27a4ab6-fb98-4d05-ad9e-340731f68d68", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train_data contains 172 observations.\n", + "test_data contains 100 observations.\n" + ] + } + ], + "source": [ + "# Split point: the row index of the last training quarter, \"1994-Q4\".\n", + "split_date = '1994-Q4'\n", + "split_ind = df.index[df['date'] == split_date][0]\n", + "\n", + "# Split the data into \"train_data\" and \"test_data\".\n", + "train_data = df.loc[:split_ind] # for in-sample estimation (training and validation)\n", + "test_data = df.loc[split_ind + 1:] # for out-of-sample tests\n", + "\n", + "print(f\"train_data contains {len(train_data)} observations.\")\n", + "print(f\"test_data contains {len(test_data)} observations.\")" + ] + }, + { + "cell_type": "markdown", + "id": "03d19235-25ee-4c3b-b7bf-97cdf27d41b2", + "metadata": {}, + "source": [ + "### Task 1.4:\n", + " - Compute the average quarterly return and its standard deviation in the training and test data."
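The preparation steps of Tasks 1.1 to 1.4 (shift the return, drop rows with missing values, split by year, summarize) can be sketched end-to-end on a small synthetic DataFrame. This is a minimal sketch, not the notebook's code: the toy values and the helper name `prepare_and_split` are hypothetical, it assumes the original numeric `date` format (e.g. `19991.0`, i.e. before `convert_to_quarterly_date` is applied), and it splits on the parsed year instead of looking up the row index of `'1994-Q4'`.

```python
import numpy as np
import pandas as pd


def prepare_and_split(df, first_test_year=1995):
    """Hypothetical helper: align next-quarter returns, drop NaNs, split by year.

    Assumes a numeric 'date' column in YYYYQ form (e.g. 19991.0 = 1999-Q1)
    and a 'ret' column holding the return of that quarter.
    """
    out = df.copy()
    out["ret_next"] = out["ret"].shift(-1)   # row t now holds the return of t+1
    out = out.drop(columns="ret").dropna()   # drops the last row (no t+1) and rows with missing features
    year = out["date"].astype(int) // 10     # 19991 -> 1999
    train = out[year < first_test_year]
    test = out[year >= first_test_year]
    return train, test


# Synthetic example: eight quarters from 1994-Q1 to 1995-Q4 (made-up values)
toy = pd.DataFrame({
    "date": [19941.0, 19942.0, 19943.0, 19944.0, 19951.0, 19952.0, 19953.0, 19954.0],
    "ret": [0.01, -0.02, 0.03, 0.04, 0.05, -0.01, 0.02, 0.06],
    "DP": [-3.5, -3.4, np.nan, -3.6, -3.7, -3.8, -3.9, -4.0],
})
train, test = prepare_and_split(toy)
print(len(train), len(test))
print(train["ret_next"].agg(["mean", "std"]))
print(test["ret_next"].agg(["mean", "std"]))
```

Note that the last training row (1994-Q4) carries the 1995-Q1 return as its target, exactly as in the notebook's split: the features, not the targets, determine which side of the split a row falls on.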
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a6833298-ab95-4596-85cd-5c4d9666037c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training data:\n", + "Mean return: 0.0306\n", + "Standard deviation of return: 0.0763\n", + "\n", + "Test data:\n", + "Mean return: 0.0252\n", + "Standard deviation of return: 0.0823\n" + ] + } + ], + "source": [ + "train_mean_ret = train_data['ret_next'].mean() # mean quarterly return (train_data)\n", + "train_std_ret = train_data['ret_next'].std() # standard deviation of the quarterly return (train_data)\n", + "\n", + "test_mean_ret = test_data['ret_next'].mean() # mean quarterly return (test_data)\n", + "test_std_ret = test_data['ret_next'].std() # standard deviation of the quarterly return (test_data)\n", + "\n", + "# Print the results\n", + "print(\"Training data:\")\n", + "print(f\"Mean return: {train_mean_ret:.4f}\")\n", + "print(f\"Standard deviation of return: {train_std_ret:.4f}\")\n", + "print(\"\\nTest data:\")\n", + "print(f\"Mean return: {test_mean_ret:.4f}\")\n", + "print(f\"Standard deviation of return: {test_std_ret:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "79732a93-d610-4d49-9bf0-a03b3f4edf22", + "metadata": {}, + "source": [ + "### Task 1.5:\n", + " - Compute the correlation matrix for the training data (including both the outcome and the features)."
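When reading the correlation matrix, two questions are usually of interest: which features co-move most with the target `ret_next`, and which feature pairs are so strongly correlated that they may cause multicollinearity in a joint regression. The sketch below answers both on a synthetic stand-in for `train_data`; the random values, the deliberately correlated `CS` column, and the 0.7 threshold are assumptions for illustration only, so the printed numbers say nothing about the real data.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Synthetic stand-in for train_data (random values; column names only mirror the notebook)
rng = np.random.default_rng(0)
n = 200
dp = rng.normal(size=n)
toy = pd.DataFrame({
    "DP": dp,
    "CS": 0.8 * dp + 0.2 * rng.normal(size=n),  # deliberately strongly correlated with DP
    "TS": rng.normal(size=n),
})
toy["ret_next"] = 0.5 * toy["DP"] + rng.normal(size=n)

corr = toy.corr(method="pearson")

# 1) Correlation of each feature with the target, ordered by absolute size
target_corr = corr["ret_next"].drop("ret_next")
target_corr = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(target_corr.round(2))

# 2) Feature pairs whose |correlation| exceeds a hypothetical threshold of 0.7
feature_names = [c for c in corr.columns if c != "ret_next"]
high_pairs = [(a, b, round(corr.loc[a, b], 2))
              for a, b in combinations(feature_names, 2)
              if abs(corr.loc[a, b]) > 0.7]
print(high_pairs)
```

Applied to the real training data, the same two steps would simply use `train_data.drop(columns='date')` in place of `toy`.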
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1b390010-0b60-4bb0-873f-786c93fc34e5", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Correlation matrix for the training data:\n", + " DP CS ntis cay TS svar ret_next\n", + "DP 1.00 0.38 -0.12 -0.21 -0.14 0.11 0.23\n", + "CS 0.38 1.00 -0.31 -0.02 0.21 0.22 0.18\n", + "ntis -0.12 -0.31 1.00 -0.40 -0.07 -0.12 -0.19\n", + "cay -0.21 -0.02 -0.40 1.00 0.46 0.04 0.17\n", + "TS -0.14 0.21 -0.07 0.46 1.00 0.08 0.16\n", + "svar 0.11 0.22 -0.12 0.04 0.08 1.00 0.13\n", + "ret_next 0.23 0.18 -0.19 0.17 0.16 0.13 1.00\n" + ] + } + ], + "source": [ + "# Compute the correlation matrix for the training data\n", + "train_cor_matrix = train_data.loc[:, train_data.columns != 'date'].corr(method='pearson') # \"date\" column excluded (numeric columns only)\n", + "\n", + "# Print the correlation matrix, rounded to two decimal places.\n", + "print(\"Correlation matrix for the training data:\")\n", + "print(round(train_cor_matrix,2))" + ] + }, + { + "cell_type": "markdown", + "id": "dd530aab-33af-4c70-bfce-193e32d49aed", + "metadata": {}, + "source": [ + " - **Extra**: Heatmap of the correlation matrix" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "e2727ae0-ab97-4ae4-b7cb-8b3e957ccda5", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "fig, ax = plt.subplots(figsize=(10,8))\n", + "s = sns.heatmap(train_data.loc[:, train_data.columns != 'date'].corr(),\n", + "                ax=ax,\n", + "                annot=True,\n", + "                center=0,\n", + "                linewidths=.5,\n", + "                square=True,\n", + "                vmin=-1,\n", + "                vmax=1,\n", + "                xticklabels='auto',\n", + "                yticklabels='auto',\n", + "                fmt='0.2f',\n", + "                cmap=\"coolwarm\")\n", + "s.set_title('Correlation matrix of the training data variables')\n", + "s.set(xlabel='Variables', ylabel='Variables')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e715dd42-7021-466d-a9c1-0c0b4efeee78", + "metadata": {}, + "source": [ + "## Task 2: Predicting Returns \n", + "\n", + "Now that the data have been cleaned, you are ready to build the first model for predicting returns.\n", + "\n", + "1. Use the training data to fit a linear model using all variables (make sure to exclude the date variable). Which features are useful for predicting returns?\n", + "\n", + "2. Compute the in-sample R² as well as the mean squared error. Do you think quarterly returns can be predicted easily?\n", + "\n", + "3. Use 5-fold cross-validation to obtain an estimate of the out-of-sample mean squared error. Compare this estimate with the in-sample mean squared error from \"Task 2.2\".\n", + "\n", + "4. Based on your results from \"Task 2.1\", select only a subset of the features to improve your model. Which features do you choose and why? 
Compute the in-sample R² as well as the mean squared error for this model and use 5-fold cross-validation to obtain an estimate of the out-of-sample mean squared error. Compare your results with the model that uses all features.\n", + "\n", + "5. Suppose you use the two models you built to predict quarterly returns over the next 25 years. Compute the out-of-sample mean squared errors on the test data. Compare these errors with the out-of-sample error estimates obtained from k-fold cross-validation. Interpret the results." + ] + }, + { + "cell_type": "markdown", + "id": "db75b418-3630-4afe-a80f-b1f2893ba0c6", + "metadata": {}, + "source": [ + "### Task 2.1:\n", + " - Multiple linear regression (MLR) using all variables of the training data set" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1d09cd12-bbe1-41a8-8b09-15588090f243", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
OLS Regression Results
Dep. Variable: ret_next R-squared: 0.129
Model: OLS Adj. R-squared: 0.097
Method: Least Squares F-statistic: 4.063
Date: Sun, 14 Apr 2024 Prob (F-statistic): 0.000793
Time: 20:47:06 Log-Likelihood: 210.80
No. Observations: 172 AIC: -407.6
Df Residuals: 165 BIC: -385.6
Df Model: 6
Covariance Type: nonrobust
\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
coef std err t P>|t| [0.025 0.975]
intercept 0.2968 0.097 3.063 0.003 0.106 0.488
DP 0.0839 0.028 3.031 0.003 0.029 0.139
CS 0.4750 1.739 0.273 0.785 -2.958 3.908
ntis -0.3945 0.410 -0.961 0.338 -1.205 0.416
cay 0.4215 0.306 1.379 0.170 -0.182 1.025
TS 0.6310 0.482 1.309 0.192 -0.320 1.583
svar 0.8027 0.828 0.969 0.334 -0.832 2.438
\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
Omnibus: 26.211 Durbin-Watson: 1.810
Prob(Omnibus): 0.000 Jarque-Bera (JB): 40.178
Skew: -0.823 Prob(JB): 1.89e-09
Kurtosis: 4.701 Cond. No. 1.10e+03


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.1e+03. This might indicate that there are
strong multicollinearity or other numerical problems." + ], + "text/latex": [ + "\\begin{center}\n", + "\\begin{tabular}{lclc}\n", + "\\toprule\n", + "\\textbf{Dep. Variable:} & ret\\_next & \\textbf{ R-squared: } & 0.129 \\\\\n", + "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.097 \\\\\n", + "\\textbf{Method:} & Least Squares & \\textbf{ F-statistic: } & 4.063 \\\\\n", + "\\textbf{Date:} & Sun, 14 Apr 2024 & \\textbf{ Prob (F-statistic):} & 0.000793 \\\\\n", + "\\textbf{Time:} & 20:47:06 & \\textbf{ Log-Likelihood: } & 210.80 \\\\\n", + "\\textbf{No. Observations:} & 172 & \\textbf{ AIC: } & -407.6 \\\\\n", + "\\textbf{Df Residuals:} & 165 & \\textbf{ BIC: } & -385.6 \\\\\n", + "\\textbf{Df Model:} & 6 & \\textbf{ } & \\\\\n", + "\\textbf{Covariance Type:} & nonrobust & \\textbf{ } & \\\\\n", + "\\bottomrule\n", + "\\end{tabular}\n", + "\\begin{tabular}{lcccccc}\n", + " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", + "\\midrule\n", + "\\textbf{intercept} & 0.2968 & 0.097 & 3.063 & 0.003 & 0.106 & 0.488 \\\\\n", + "\\textbf{DP} & 0.0839 & 0.028 & 3.031 & 0.003 & 0.029 & 0.139 \\\\\n", + "\\textbf{CS} & 0.4750 & 1.739 & 0.273 & 0.785 & -2.958 & 3.908 \\\\\n", + "\\textbf{ntis} & -0.3945 & 0.410 & -0.961 & 0.338 & -1.205 & 0.416 \\\\\n", + "\\textbf{cay} & 0.4215 & 0.306 & 1.379 & 0.170 & -0.182 & 1.025 \\\\\n", + "\\textbf{TS} & 0.6310 & 0.482 & 1.309 & 0.192 & -0.320 & 1.583 \\\\\n", + "\\textbf{svar} & 0.8027 & 0.828 & 0.969 & 0.334 & -0.832 & 2.438 \\\\\n", + "\\bottomrule\n", + "\\end{tabular}\n", + "\\begin{tabular}{lclc}\n", + "\\textbf{Omnibus:} & 26.211 & \\textbf{ Durbin-Watson: } & 1.810 \\\\\n", + "\\textbf{Prob(Omnibus):} & 0.000 & \\textbf{ Jarque-Bera (JB): } & 40.178 \\\\\n", + "\\textbf{Skew:} & -0.823 & \\textbf{ Prob(JB): } & 1.89e-09 \\\\\n", + "\\textbf{Kurtosis:} & 4.701 & \\textbf{ Cond. No. 
} & 1.10e+03 \\\\\n", + "\\bottomrule\n", + "\\end{tabular}\n", + "%\\caption{OLS Regression Results}\n", + "\\end{center}\n", + "\n", + "Notes: \\newline\n", + " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. \\newline\n", + " [2] The condition number is large, 1.1e+03. This might indicate that there are \\newline\n", + " strong multicollinearity or other numerical problems." + ], + "text/plain": [ + "\n", + "\"\"\"\n", + " OLS Regression Results \n", + "==============================================================================\n", + "Dep. Variable: ret_next R-squared: 0.129\n", + "Model: OLS Adj. R-squared: 0.097\n", + "Method: Least Squares F-statistic: 4.063\n", + "Date: Sun, 14 Apr 2024 Prob (F-statistic): 0.000793\n", + "Time: 20:47:06 Log-Likelihood: 210.80\n", + "No. Observations: 172 AIC: -407.6\n", + "Df Residuals: 165 BIC: -385.6\n", + "Df Model: 6 \n", + "Covariance Type: nonrobust \n", + "==============================================================================\n", + " coef std err t P>|t| [0.025 0.975]\n", + "------------------------------------------------------------------------------\n", + "intercept 0.2968 0.097 3.063 0.003 0.106 0.488\n", + "DP 0.0839 0.028 3.031 0.003 0.029 0.139\n", + "CS 0.4750 1.739 0.273 0.785 -2.958 3.908\n", + "ntis -0.3945 0.410 -0.961 0.338 -1.205 0.416\n", + "cay 0.4215 0.306 1.379 0.170 -0.182 1.025\n", + "TS 0.6310 0.482 1.309 0.192 -0.320 1.583\n", + "svar 0.8027 0.828 0.969 0.334 -0.832 2.438\n", + "==============================================================================\n", + "Omnibus: 26.211 Durbin-Watson: 1.810\n", + "Prob(Omnibus): 0.000 Jarque-Bera (JB): 40.178\n", + "Skew: -0.823 Prob(JB): 1.89e-09\n", + "Kurtosis: 4.701 Cond. No. 
1.10e+03\n", + "==============================================================================\n", + "\n", + "Notes:\n", + "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", + "[2] The condition number is large, 1.1e+03. This might indicate that there are\n", + "strong multicollinearity or other numerical problems.\n", + "\"\"\"" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import statsmodels.api as sm\n", + "# Use all variables except the dependent variable (\"ret_next\") and the non-numeric \"date\" variable.\n", + "\n", + "# Option 1 - problematic without further adjustment:\n", + "#X = train_data.drop(columns=['date','ret_next'])\n", + "# sm.OLS does not add a constant automatically, so the intercept would be missing from the summary!\n", + "\n", + "# Option 2:\n", + "#X = pd.DataFrame({'intercept': np.ones(train_data.shape[0]), \n", + "# 'DP': train_data['DP'],\n", + "# 'CS': train_data['CS'],\n", + "# 'ntis': train_data['ntis'],\n", + "# 'cay': train_data['cay'],\n", + "# 'TS': train_data['TS'],\n", + "# 'svar': train_data['svar']\n", + "# })\n", + "\n", + "# Option 3 (final solution): build the design matrix of exogenous variables\n", + "X = train_data.drop(columns=['date','ret_next']) # drop the dependent and the non-numeric variable\n", + "X.insert(0, 'intercept', np.ones(train_data.shape[0])) # add a column of ones as the intercept\n", + "\n", + "y = train_data['ret_next'] # dependent (endogenous) variable\n", + "model = sm.OLS(y, X) # set up the OLS model\n", + "fit_lm = model.fit() # fit the multiple linear regression model\n", + "fit_lm.summary() # display the model statistics" + ] + }, + { + "cell_type": "markdown", + "id": "3069027d-f53f-4348-8c0c-0885483dc8d9", + "metadata": { + "tags": [] + }, + "source": [ + "### Task 2.2:\n", + " - In-sample $R^{2}$ of the training data set."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "69ae4d7d-16a9-436a-9cfc-1b087a563db8", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The in-sample R^2 of train_data is: 0.1287\n" + ] + } + ], + "source": [ + "print(f\"The in-sample R^2 of train_data is: {fit_lm.rsquared:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d72a3a95-c163-4cbc-8bbb-0dd02b2c8902", + "metadata": {}, + "source": [ + " - Compute the in-sample MSE." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3c0eb59d-dc8a-4d44-bef2-1b2177a7166f", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "in-sample (MSE): 0.0050\n" + ] + } + ], + "source": [ + "#from sklearn.metrics import mean_squared_error # could be used instead\n", + "# or, as here, define the function yourself:\n", + "\n", + "##### Computing the in-sample MSE for the stock market data: #####\n", + "# Compute predicted values y_head for the training data\n", + "y_head = fit_lm.predict(X)\n", + "\n", + "# Function to compute the mean squared error (MSE)\n", + "def MSE(y, y_head):\n", + "    return np.mean((y - y_head)**2)\n", + "\n", + "# Compute the mean squared error in the training data\n", + "MSE_train_data = MSE(train_data['ret_next'], y_head)\n", + "print(f\"in-sample (MSE): {MSE_train_data:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "581f7631-9c99-4143-b87e-11b43c243dd0", + "metadata": { + "tags": [] + }, + "source": [ + "### Task 2.3:\n", + " - Compare the 5-fold cross-validated MSE with the in-sample MSE from Task 2.2 (the cell also reports the MSE on the test data, which is needed for Task 2.5):" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "aebbb2ff-9ddb-4e03-9d25-c9cc6e632d6e", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Out-of-sample MSE: 0.0086\n", + "Cross-validated MSE: 0.0081\n" + ] + } + ], + "source": [ + "from sklearn.linear_model import 
LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.metrics import mean_squared_error\n", + "import numpy as np\n", + "\n", + "# Separate features and target variables\n", + "X_train = train_data.drop(columns=['ret_next','date'])\n", + "y_train = train_data['ret_next']\n", + "X_test = test_data.drop(columns=['ret_next','date'])\n", + "y_test = test_data['ret_next']\n", + "\n", + "# Train the model on the training data\n", + "model = LinearRegression()\n", + "model.fit(X_train, y_train)\n", + "\n", + "# Predict on the test data\n", + "y_pred_test = model.predict(X_test)\n", + "\n", + "# Calculate the mean squared error (MSE) on the test data (out-of-sample)\n", + "mse_test = mean_squared_error(y_test, y_pred_test)\n", + "print(f\"Out-of-sample MSE: {mse_test:.4f}\")\n", + "\n", + "# Perform 5-fold cross-validation on the training data and average the fold MSEs\n", + "# (cv=5 with a regressor means unshuffled KFold, i.e. 5 contiguous blocks of quarters)\n", + "mse_cv = -cross_val_score(model, X_train, y_train, scoring='neg_mean_squared_error', cv=5).mean()\n", + "print(f\"Cross-validated MSE: {mse_cv:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "18a9a179-4226-4734-8bcf-554671ce85e9", + "metadata": { + "tags": [] + }, + "source": [ + "### Task 2.4:\n", + " - Compute the in-sample R² and MSE and, via 5-fold cross-validation, an estimate of the out-of-sample MSE\n", + " - Evaluate model quality for different combinations of variables."
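The two Task 2.4 cells repeat the same steps for each feature subset, so a small function can evaluate any list of subsets in one table. The sketch below is a refactoring under stated assumptions, not the notebook's code: `evaluate_subset` is a hypothetical helper, the data are a synthetic stand-in for `train_data`, and the second run swaps in scikit-learn's `TimeSeriesSplit` (folds that only ever validate on later quarters). That swap is an alternative to the notebook's `cv=5`, which for a regressor means unshuffled `KFold`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def evaluate_subset(data, features, target="ret_next", cv=5):
    """Hypothetical helper: in-sample R^2/MSE and cross-validated MSE for one feature subset."""
    X, y = data[features], data[target]
    model = LinearRegression().fit(X, y)
    y_hat = model.predict(X)
    cv_mse = -cross_val_score(LinearRegression(), X, y,
                              scoring="neg_mean_squared_error", cv=cv).mean()
    return {"features": "+".join(features),
            "r2_in": r2_score(y, y_hat),
            "mse_in": mean_squared_error(y, y_hat),
            "mse_cv": cv_mse}


# Synthetic stand-in for train_data (random values, not the real data)
rng = np.random.default_rng(2)
n = 172
toy = pd.DataFrame(rng.normal(size=(n, 3)), columns=["DP", "cay", "svar"])
toy["ret_next"] = 0.02 * toy["DP"] + 0.01 * toy["cay"] + rng.normal(scale=0.07, size=n)

subsets = [["DP"], ["DP", "cay"], ["DP", "cay", "svar"]]

# Same CV scheme as the notebook (cv=5 -> unshuffled KFold for a regressor)
results = pd.DataFrame([evaluate_subset(toy, s) for s in subsets])
print(results.round(4))

# Alternative (an assumption, not the notebook's method): time-ordered folds
results_ts = pd.DataFrame([evaluate_subset(toy, s, cv=TimeSeriesSplit(n_splits=5))
                           for s in subsets])
print(results_ts.round(4))
```

In-sample R² can only rise as features are added to a nested OLS model, so the comparison between subsets should rest on the cross-validated MSE column.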
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "c1761dc0-3714-457d-89e4-d19d00214aaf", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In-sample R^2 with the variable (DP): 0.0541\n", + "In-sample MSE with the variable (DP): 0.0055\n", + "Out-of-sample MSE from 5-fold cross-validation with the variable (DP): 0.0056\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.metrics import mean_squared_error, r2_score\n", + "\n", + "# Select subset of features\n", + "selected_features = ['DP']\n", + "\n", + "# Separate features and target variables for selected features\n", + "X_train_selected = train_data[selected_features]\n", + "X_test_selected = test_data[selected_features]\n", + "y_train = train_data['ret_next']\n", + "y_test = test_data['ret_next']\n", + "\n", + "# Train the model on the training data using selected features\n", + "model_selected = LinearRegression()\n", + "model_selected.fit(X_train_selected, y_train)\n", + "\n", + "# Predict on the training data\n", + "y_pred_train_selected = model_selected.predict(X_train_selected)\n", + "\n", + "# Compute in-sample R^2\n", + "r2_in_sample_selected = r2_score(y_train, y_pred_train_selected)\n", + "print(f\"In-sample R^2 with the variable (DP): {r2_in_sample_selected:.4f}\")\n", + "\n", + "# Compute mean squared error (MSE) for the training data\n", + "mse_train_selected = mean_squared_error(y_train, y_pred_train_selected)\n", + "print(f\"In-sample MSE with the variable (DP): {mse_train_selected:.4f}\")\n", + "\n", + "# Perform 5-fold cross-validation and compute out-of-sample MSE for the selected features\n", + "mse_cv_selected = -cross_val_score(model_selected, X_train_selected, y_train, scoring='neg_mean_squared_error', cv=5).mean()\n", + "print(f\"Out-of-sample MSE from 5-fold cross-validation with the variable (DP): {mse_cv_selected:.4f}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "d7512923-949c-4733-be03-24cc6c7ce71c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In-sample R^2 with the variables (DP+cay): 0.1040\n", + "In-sample MSE with the variables (DP+cay): 0.0052\n", + "Out-of-sample MSE from 5-fold cross-validation with the variables (DP+cay): 0.0053\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.metrics import mean_squared_error, r2_score\n", + "\n", + "# Select subset of features\n", + "selected_features = ['DP', 'cay']\n", + "\n", + "# Separate features and target variables for selected features\n", + "X_train_selected = train_data[selected_features]\n", + "X_test_selected = test_data[selected_features]\n", + "y_train = train_data['ret_next']\n", + "y_test = test_data['ret_next']\n", + "\n", + "# Train the model on the training data using selected features\n", + "model_selected = LinearRegression()\n", + "model_selected.fit(X_train_selected, y_train)\n", + "\n", + "# Predict on the training data\n", + "y_pred_train_selected = model_selected.predict(X_train_selected)\n", + "\n", + "# Compute in-sample R^2\n", + "r2_in_sample_selected = r2_score(y_train, y_pred_train_selected)\n", + "print(f\"In-sample R^2 with the variables (DP+cay): {r2_in_sample_selected:.4f}\")\n", + "\n", + "# Compute mean squared error (MSE) for the training data\n", + "mse_train_selected = mean_squared_error(y_train, y_pred_train_selected)\n", + "print(f\"In-sample MSE with the variables (DP+cay): {mse_train_selected:.4f}\")\n", + "\n", + "# Perform 5-fold cross-validation and compute out-of-sample MSE for the selected features\n", + "mse_cv_selected = -cross_val_score(model_selected, X_train_selected, y_train, scoring='neg_mean_squared_error', 
cv=5).mean()\n", + "print(f\"Out-of-sample MSE bei 5-fold cross-validation mit den Variablen (DP+cay): {mse_cv_selected:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "df4f7f10-2779-43ab-a7b0-3bd1b3f15b0c", + "metadata": {}, + "source": [ + "## Aufgabe 3: Vorhersage der Richtung des Aktienmarktes \n", + "\n", + "Statt Renditen quantitativ vorherzusagen, nehmen Sie nun an, dass Sie die Richtung des Aktienmarktes vorhersagen möchten, d. h. ob die Aktien steigen oder fallen. Basierend auf diesen Vorhersagen möchten Sie entweder in Aktien investieren oder nicht.\n", + "\n", + "1. Erstellen Sie eine neue Variable sowohl in den Trainings- als auch in den Testdaten, die 1 ist, wenn die Rendite größer als Null ist, und 0 sonst.\n", + "\n", + "2. Berechnen Sie den Anteil positiver Aktienrenditen sowohl in den Trainings- als auch in den Testdaten.\n", + "\n", + "3. Passen Sie eine logistische Regression an die Trainingsdaten an, um die Richtung des Aktienmarktes vorherzusagen (stellen Sie sicher, dass Sie die Datumsvariable und die alte quantitative Renditevariable ausschließen). Welche Merkmale sind nützliche Prädiktoren? Berechnen Sie die In-Sample-Genauigkeit und die Fehlerquote. Glauben Sie, dass Sie ein gutes Modell zur Vorhersage der Richtung des Aktienmarktes erstellt haben?\n", + "\n", + "4. Angenommen, Sie verwenden das von Ihnen erstellte Modell, um die Richtung des Aktienmarktes in den nächsten 25 Jahren vorherzusagen. Berechnen Sie die außerhalb der Stichprobe liegende Genauigkeit und Fehlerquote für die Testdaten. Vergleichen Sie diese Ergebnisse mit den In-Sample-Statistiken. Glauben Sie, dass Ihr Modell gut außerhalb der Stichprobe funktioniert? Interpretieren Sie die Ergebnisse. " + ] + }, + { + "cell_type": "markdown", + "id": "309698e2-9577-43ac-84c8-12edad93f8aa", + "metadata": {}, + "source": [ + "### Aufgabe 3.1:\n", + " - Erstelle eine neue Variable, die 1 ist, wenn die Rendite größer als Null ist, und 0 sonst." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "b28e213d-9ca8-4e33-a15f-feae07d73a18", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " date DP CS ntis cay TS svar \\\n", + "92 1952-Q1 -2.842696 0.005328 0.032094 -0.010595 0.0104 0.002102 \n", + "93 1952-Q2 -2.845711 0.005425 0.027731 0.000055 0.0089 0.001660 \n", + "94 1952-Q3 -2.828741 0.005521 0.031038 -0.000695 0.0106 0.001076 \n", + "95 1952-Q4 -2.936193 0.005231 0.026535 -0.015950 0.0070 0.001753 \n", + "96 1953-Q1 -2.886819 0.004354 0.024013 -0.019021 0.0093 0.001574 \n", + "\n", + " ret_next return_positive \n", + "92 0.038275 1 \n", + "93 -0.004980 0 \n", + "94 0.102295 1 \n", + "95 -0.035683 0 \n", + "96 -0.032102 0 " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import warnings\n", + "warnings.filterwarnings('ignore')  # note: this silences all warnings for the rest of the session\n", + "\n", + "# Create a new variable that is 1 if the return is greater than zero and 0 otherwise.\n", + "# If train_data/test_data were created as slices of the full DataFrame, adding a column raises\n", + "# pandas' SettingWithCopyWarning (switching to .loc does not reliably avoid it), so work on explicit copies.\n", + "train_data = train_data.copy()\n", + "test_data = test_data.copy()\n", + "\n", + "train_data['return_positive'] = (train_data['ret_next'] > 0).astype(int)\n", + "test_data['return_positive'] = (test_data['ret_next'] > 0).astype(int)\n", + "\n", + "train_data.head()" + ] + },
+ { + "cell_type": "markdown", + "id": "e2c9d767-2c2a-4937-85f4-823ff387e11f", + "metadata": {}, + "source": [ + "### Task 3.2: \n", + " - Share of positive stock returns in the training and test data" + ] + },
+ { + "cell_type": "code", + "execution_count": 17, + "id": "340c54c5-8db6-4fce-ab35-8a782ad501c7", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Share of positive stock returns in the training data: 0.69\n", + "Share of positive stock returns in the test data: 0.73\n" + ] + } + ], + "source": [ + "# Compute the share of positive stock returns in the training data\n", + "positive_proportion_train = train_data['return_positive'].mean()\n", + "print(f\"Share of positive stock returns in the training data: {positive_proportion_train:.2f}\")\n", + "\n", + "# Compute the share of positive stock returns in the test data\n", + "positive_proportion_test = test_data['return_positive'].mean()\n", + "print(f\"Share of positive stock returns in the test data: {positive_proportion_test:.2f}\")" + ] + },
+ { + "cell_type": "markdown", + "id": "26f00d13-9110-4011-b896-cb1e0e3edd08", + "metadata": {}, + "source": [ + "### Task 3.3:\n", + " - Fit a logistic regression on the training data to predict the direction of the stock market." + ] + },
+ { + "cell_type": "code", + "execution_count": 18, + "id": "d9b63103-441a-4ecc-b75f-d03d8c948221", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In-sample accuracy: 0.69\n", + "In-sample error rate: 0.31\n", + "Features with nonzero coefficients: Index(['DP', 'CS', 'ntis', 'cay', 'TS', 'svar'], dtype='object')\n" + ] + } + ], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.metrics import accuracy_score, confusion_matrix\n", + "\n", + "# Separate features and target variables\n", + "X_train = train_data.drop(columns=['return_positive', 'date', 'ret_next'])\n", + "y_train = train_data['return_positive']\n", + "\n", + "# Train the logistic regression model\n", + "model = LogisticRegression()\n", + "model.fit(X_train, y_train)\n", + "\n", + "# Predict on the training data\n", + "y_pred_train = model.predict(X_train)\n", + "\n", + "# Compute in-sample accuracy and error rate\n", + "accuracy = accuracy_score(y_train, y_pred_train)\n", + "error_rate = 1 - accuracy\n", + "\n", + "print(f\"In-sample accuracy: {accuracy:.2f}\")\n", + "print(f\"In-sample error rate: {error_rate:.2f}\")\n", + "\n", + "# Inspect the fitted coefficients. With its default L2 penalty, LogisticRegression practically never\n", + "# sets a coefficient exactly to zero, so 'coefficient != 0' lists every feature and does not by itself\n", + "# identify useful predictors (significance tests or coefficients on standardized features are more informative).\n", + "coefficients = model.coef_[0]\n", + "nonzero_coef_features = X_train.columns[coefficients != 0]\n", + "print(\"Features with nonzero coefficients:\", nonzero_coef_features)" + ] + },
+ { + "cell_type": "markdown", + "id": "2c1bbcad-b760-41b5-a455-01f063e6036e", + "metadata": {}, + "source": [ + "### Task 3.4:\n", + "- Analogously, the test data are now used to compute the out-of-sample accuracy and error rate, which are then compared with the results from Task 3.3." + ] + },
+ { + "cell_type": "code", + "execution_count": 19, + "id": "055822b9-d17f-47be-ada6-f3fa45f4554d", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Out-of-sample accuracy: 0.73\n", + "Out-of-sample error rate: 0.27\n", + "\n", + "Comparison with in-sample statistics:\n", + "In-sample accuracy: 0.69\n", + "In-sample error rate: 0.31\n" + ] + } + ], + "source": [ + "# Separate features and target variables for test data\n", + "X_test = test_data.drop(columns=['return_positive', 'date', 'ret_next'])\n", + "y_test = test_data['return_positive']\n", + "\n", + "# Predict on the test data\n", + "y_pred_test = model.predict(X_test)\n", + "\n", + "# Compute out-of-sample accuracy and error rate\n", + "accuracy_test = accuracy_score(y_test, y_pred_test)\n", + "error_rate_test = 1 - accuracy_test\n", + "\n", + "print(f\"Out-of-sample accuracy: {accuracy_test:.2f}\")\n", + "print(f\"Out-of-sample error rate: {error_rate_test:.2f}\")\n", + "\n", + "# Compare with in-sample statistics. Benchmark: a rule that always predicts 'up' has an accuracy\n", + "# equal to the share of positive returns in the respective sample (Task 3.2).\n", + "print(\"\\nComparison with in-sample statistics:\")\n", + "print(f\"In-sample accuracy: {accuracy:.2f}\")\n", + "print(f\"In-sample error rate: {error_rate:.2f}\")" + ] + },
+ { + "cell_type": "markdown", + "id": "81cbfae3-7385-40a2-8d0d-d7db7ae9a9f5", + "metadata": {}, + "source": [ + "## Appendix \n", + "The dataset contains the following variables:\n", + " - **ret**: The quarterly return of the US stock market (a value of 0.01 corresponds to a return of 1% per quarter)\n", + " - **date**: The date in YYYYQ format (19941 denotes the first quarter of 1994)\n", + " - **DP**: The dividend-to-price ratio of the stock market (a valuation measure indicating whether prices are high or low relative to the dividends paid)\n", + " - **CS**: The credit spread, defined as the yield difference between low-rated corporate bonds (firms that may go bankrupt) and high-rated corporate bonds (safe investments). CS measures the additional return investors demand for investing in risky firms compared with established firms with lower risk.\n", + " - **ntis**: A measure of corporate equity issuing activity (IPOs, share repurchases, ...)\n", + " - **cay**: A measure of the consumption-to-wealth ratio (how much is consumed relative to total wealth)\n", + " - **TS**: The term spread, i.e. the difference between the long-term yield on government bonds and short-term yields.\n", + " - **svar**: A measure of stock market variance\n", + "For a complete description of the data, see Welch and Goyal (2007). Google is also very helpful if you want to learn more about the variables.\n" + ] + },
+ { + "cell_type": "markdown", + "id": "db90f03c-18a4-4e7f-a31c-56f206baf5cc", + "metadata": {}, + "source": [ + "## References \n", + "\n", + "Welch, I. and A. Goyal (2007, 03). A Comprehensive Look at The Empirical Performance of Equity\n", + "Premium Prediction. 
The Review of Financial Studies 21 (4), 1455–1508.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Machine Learning for Economics and Finance/Problem Set 1/stockmarketdata.rds b/Machine Learning for Economics and Finance/Problem Set 1/stockmarketdata.rds new file mode 100644 index 0000000000000000000000000000000000000000..a5866be4ae43c9b2a9b32ded38e3114da0873942 GIT binary patch literal 19065 zcmV)JK)b&miwFP!000001MQsoJ5_7g_;;D-jS6Kh$y7>7Dz}P~%u_04o@Jh=ZJy_O z-sT}hh(zl#mO@eqO*E>6N~HncowM}A`~C;t>pj<1%V+J+TF-Rf_w(#?ox_lkKp;>N zs40J_Xed7tX!P|?O36{~aZ-Su@`sf`*x;l%I)?s!syOB@r*VH8_g~@sYMpWaHSWK``7O?GasMsu&)__> z&ba>`_dnqL5$BJ%{}J~;;rwZxaeof?=W$-Zc>(tqaQ_R=U)CA-zvBKP&fjqUhWp=e z{|C-L)*1JgaQ`RHzi|GA`@eAi56*wq8TXfQe+B1NoL6yw^*{f?21We3&j0xjHvZ>7 zpu(9NXKLK1#(i3xY1bL|>2RMOX9k=ZaGwG9nQ&%WXWVDTeHNTqac0GRR@^7zOk8K& zXUBaGoH=pk#C=ZO-;DF-b;kWIxX*<%H_qI+&yD+Aao)PlxW5hed2!~$nGg5*a9;pt zfpx}xLEIO@Sr}(w+!x0E9XRh;XWSRT{hc_A;w*~$qPQ=Hv)DT0zBukn;4F!=B<@S% z{%)LiuQTrN!F_3*_u{-4_xIwy49+s^jQjg>{{YUiILqR`EbhzUEVs_Me+c*GaXyUm zVcb8A`$us;y3V+-fct>6BF>7quZa7{aX!AzxPJoom2g(ZSsC}0asL#~r`8$wRdD|_ z&Z;=8;=U^GtKqD+&bY6R`x-cF;;f1Lnz*lxv-UdUz7Fo|;;e_W9`5Vmz5&h#>x}zn zao-Tx}y@xbKRy8_sUH?}qyx zID4!!?t9|C7tY=|d*i-0?)&2GyUw`phx`6G2jCol`vJHggmch3<9;yilW-2fIRy7Z za6b&^uyw}$aNLi;ITGhc+>gZlXq=aV}nG+%LiXQk=_hF2nsY+^@j7Vx4in68EccuEx0<_p5RLGR~LR8TV^( zzYgbmoa=GF9`_q?ZdhmBZ^ZqpI5*+kg!@gn--2_?I^+H|+;7FX4d*u8Z^QleH74Na zZK$4j@icBx5w7WP9RHx+0IgcvufIM=#_mh?Y%}7nhrYv2tL?@lte-5+8<1*>HEVD4 zEpT+f`V<0&F1}iXw#4hIoypQz`=uP}{EZ~+QRv;}3m0p#J|T;z<_tva+Gx%Elyw_6 
zK6dCt?H*p3%{k-SWV?iQh4JSH$*{W-YFE+lMVWB537GlKCAWUG4wUTBE3pMDeX0@vyIt~T{AV0|Z{ffQ?aC6g+^7#E9Ug&r z+1=ZWJ4sl%_CnpPU^4g$1TiRmB0yEOdH9L3DGW6T0|YM5wC!(suR{rJu6Azo@3 z$?AWvnKXrkv^;X;y@vpqy0A!zL_?X{)|$7!O`)P!#6XsQ3QDtk1ac9PKlcDSIQ9<$UDkwZU8!Zw;0yN(Diq0q-8cS|+I;Ruh@3eL=nn~2aH!O2w3^0?G9O(AY9`U8A9prD}|?z2x2wrb%77k6`?IPqx@S65zwO#SKnDF z!f93dAM8s6C_9l;xI9XLX4}e#7JXV+#{C;xDgsThJGOi09#0NJ3Bl~M&Kw!Kjyt2z z>TPi4=lt&C-3Y2l#xxqWQ&4uNyTRnszEWRVAP`i{GNz|j(;1U7 z)%z}a_f!a2D#N=J76()8mS{>1@0<_xMbGl)EONt*-4%!RZ!N&}E#Aw0+GJ4cu;6xS zBVb-luPz=jq~zV+!ygiF5uncZ_#=S@G6bAFc~ayxWq!q)BcHEQ;v%eN!$h$bRwFDL z_9namtDBl;J8g})Q?YLtxByeySxRl{Ggu&1F8sXGT6Ux^Hv)+vi;0eTggk% zcGzG#t{%aVES-(J#xCe-um2)+h=6tQoE%=zPS=s^P36q;n5*Xp_&p44#3GX<^_?15zHSIw-GswY9WNDjCU5L7KVmwwRZSSmD5l z=fBG+d9?q;0}FjctZQVCOuIKVw2IUyKhmS@ccXWh4uo=JX8Qd%?qyKwj!CdPdPdpb z9A0TZs1(F<>VEXTo;8J3lHoOmToM%Zsi&FPkf7!pwURI&B`*iguGZ!waBD)6<~s;okqv6l6Bsj_gUM?598Mx${mZ zLtBaWx7+WC&|G;!a~pDjdZ~BCMSg-%tMf{;$9xJZIpr$(evzT*H@|S96s4Y>Ys;bj zPJlc{o$%3ZB)W%_ z$oENbH$P!zo4^uOZ;aacnbJ;Uro=|^ql(y-(V#y{3M0_d{m>&YgBx;nnZvK!#NkT6*5B7^ItGWRdYhU6kj0#)}7+=qTecefp=% zOA&64Pcl>8UVzl*r;-8&BuHs=DBxH^Seo1lqZqp&v<5L7Fpo)N=@Usi0<+1Gu=(8J z2SkFB&ehc}< zy)OuG1=T9Y@0o)1Zoj?MlZsf8>&|7{+$pS$pliwWX9~+T>18AE5wK=HE7QZK?JeWnc>;0mjIj0h7Yp1OAI8t5Wnoe4UZ-cr#+ zr@V%jE*&LK+g}77>T!WOUH$mXRa4N^^thb3i!x7Fp6>L>9)ZrGOP9DP??IA0QWq%a zlDzZD$Hq)4{XVv%;tdtTTn|N7{Zt~r4e@(d{n=fhplKlM;(Y`)y*tH)gem!a!uz<@ zVM^SxMQu4qO@JE3gNebf5JY+j4{+QeQR2|G_1Sd<5wTx)eWZ-vRJ>Dx#g>3IsO~-X zRBsntW!N?B)jhppfINYKieDXV#%fZZNGBEMH*Wc|F27(dhQtJ><0nAF2N zh$k_`gejf;c&!o9_x@p&aUVpC)I*nderQ4T=jsiuB%2Uj^7f&$L;i?%gRx6O_(w!X z^>Wbwu^CP;)>n1TpM7cl~gqDx%L&{lKko1~IU^j4pV}Aclxkquj23M1Qly zm0hF)(etR~o8(R)daLDAEHnoZo!>W2_fY}FAW=eVpEQFQC61R%9i&2x_rH|i{#=8Y z#G6k0CS6C2MnQ4Mqu(RO?N9d$I(s4p)7~8CkH(0>B3fV|RR_^2vv~;wm?Ao2MgIXU z8bn=UNH|$Ti>S6WYhPS>jA)iY->2^?qBAPIrSbMMVx%9J*XGPZ^wu_3)95Us-}P#5 z@aM~TKi(mFS`Jp>WA=!yGx_1UR((V>G3L;E)dJB9_iF4eLx`5sV1oVO21MhllPPqU z4bdpovyY8eBAQEWHs8%P5j8{n%k)_?qAJ*XY5G+tqCUnh^mc5J;)U&qn?8(aJ&Ff( 
zCmRvnNb9+Ya}N={Ueck$rk#kM`nmA*DJn#luNLSkV}s~g!;dW}D;Pi4su??5 ze;Lu|Ck7@+_9FUZB9T*^9?_e9_MPfWKy*uaXA;juBf7kM#!7o%Ai9ICuU(kFAi53r z+v3dh5Z&Xe8_kv|^UNTa$9`iMqCGd}&q_OnXr7eH57~c3G)y6l#06(WQ|U>5eQh72 zAuVri_44>jjz@@2N$sV%y#ZxC8BB)GbRhb+gjWX-Fd&8l zg{S#u9wCOKM9%T)pNMfAm)zGD%DkP=S_ynuh#1pLzV2AzM~pR0*{kvch>?54T)zG{ z#8ArHGArSU7`3#nnZ+|BhMDwH4=XRk=$H3#_<#yx@HF~3tEh|UC+7X0(<>r|^NNnI zS|t%9uX(qs%oJk$*uGJz&<8Q)x4)>PazjiBOIi5}!HDVAg1Xq#48)Y)Cf)v(4KckN zWng*VgBXWB9N&;Aalf4QOD1Ee!YdBn7BqP{jJ5HWcf<{ooU zKumdM7VHv~`(soRs?1b~>B~g3=U6Xd-qUw;%=#5#K5cgY>2e%m)_v)3SMiH-e^HiR zMieoNNk1}nr#yE%ptanL(yqjo;0t>~5%d13rPFc~hzZeWDy%pmro98d8wlBmQFhk< zgM|xXtjywMyK@6E1(O$6+`l8HnCwIETP+YXLHXeK7K50hSE;=W+8nE#5A)M$o@$TF-2`dz0c1hCay?D-j77Y_%m}~3Clsm z*rC^vD|8YuFkCH}8QYH-_cZg7^Meo*eU|P6w@}1%herIY7&~I>6=>M~LlrUA=hyZ) zsv>6kM51|c5Mp-BU~849L(C4&(W4=nh`H>uGM%6SVqQMyGs*N0G5e3tm4Bza@BZA& z-eD)iys`1@_aoMb@vZ6oRc9K+==mqaZrdTmkUy^A^W+(#{qwz9Z>bm2rLL^}?m2+y ze?QSt{Q3tmd~-P#=bMA*zlijyF_|HTtmN}A7{4IK0!PmifwPo&yvO|}KLRm&3#}xn zY(9YxfL zPXDp^>+{TrHbi*e^U@qdEAMKs@^c8$&T|g#3ob+SJ6E_glHC#gtxd)jF>Ht-xM+6a z<^*EUTTalm2|x^zZIBt&hv@If9=<-%iRcabHEfHk5&e7aq#8{bL_hTAkibr1#GvwV z4_{S3VyJHu+`jiCVh}U;;IAq|jH>2y=Q;)`c_L&wDBVtp^D9DyMdpaHK} zP&}z9%Y>Mnzx?7c`;3^ghaRh)kweV0JOAu`#EBR$*Q$NGlS|3h4ewsMJV8vn=Ynn~ zm>|aI1Tn`mEr{{1=Wl0C6U0R1Y$@VOM@;794k8(zi0O%Dih!mfVruQ2U;6$DG072+ z@m2C6Cdx&``Q$iRz*r)+=U(d>AY!~Z+csH7#7_O9inHb{VrM-XKHbZY zh^eXKFN)<5ySCw5n*5`P-FZRb@R=UOel_C8t*k=CZnk~*VsHgwf6utnmn{piFRhfE z@7RpkpEKV6WN3@n-<^w;&E0_5Gp=i;eqBZEJM2dND0#ttiQe{%0ud2=qo4UmSR(da zS(SC>LzFAIN;~QhA_h8r{jhQkvBd{8M@2Xy;>NVCNi#nYTP1&@>HR!JwDNk>z(__! 
zGuItA7(EcNgg((i`Y|HLisZG-RU)D{V~B#_DMVE4qph7irBh#9=OIQideg%`Q5fKAXd}opYH6=Ml9Jz zy!`KeQ|gqQ#Jl)N#6%T5zQM;BF;{r(EB~{IShdnMMD95vw&UhAQywmeZB*#!fygb0 zwMSToyZ;Ab^NKo9^jZKBSuXdNsA^H#%krFjAAs1t@^A1H*@#$OX0PAiJAhc zc!JoXS&2cGY>4%bGxJqO6U4Up{1DG}0mPcJsh#Oa3}WM*NC`wOh>hClPH|Q~hHtI}NOvQ+8 zzlbx@LI<(>XE~S8NFdhx4`LpoTEu#%LH-8YF2p*vWBgZn8)99V3W-(dMr=aM&1&|A zh|TrKW%GPF#1^9F*~D*3SL~I3YG7=+{cp#qqqxx?G%bZUqwP>|l-lmXjATO%4z8Z9ZDp^6scHXT};_`T!?tVMecszdQ z?wbtx95Ie{g9}h%x7eiavkA&nj@_6z_#JYT&(Zw2EDrf%#p%~^AXPC7h{=mn^gF}-Ldt2f|-_3YPmY+Gh)XM>> zKGTEBm##u~Q~goaifky{)?P4p=P4A(KU7_H9EGe;$Nav|k?)Cx350u zO`sz94?Jcb7dHZL7W7Q>`cv>Zq`rOAMFfFn!vRKdWe}9@DmeAyDFp2^Ov~Q>2ZG4V z!utBm5Xe*|{Ak8wPi%tup$F z44#i#&ez|!1UD$&O4!e%y`F&}>7RT-SLq?((1kwX10x3n_QrvkpWNz6IEGvKeg zPr7pt0RpcGdy&QDAh_&k`%UI#2t>)lR!mwDs2`>Dqj>~^bqb^VSKC>SXX=e0Gj0vH|=9`JOZ_+Csoj_qeLAXb5l}&Q<-o1OeWsBsa`ULy*4f zi(E>-f>RsnWZEe6@J_>=|LO?%9@?0l!A5zHZ`ilbTc;pIM@c#OGdG0fZS<=;A`O9W zA~>GkBtXDYX5J|+7l;lZca6Z+@;zeV(6#J_Bum| z@Mf=bC%&tEFf=UsBc=pbOoI(&1zf(PDw+kd6zFRIiIu6MnRH?Xw z2#}(@lhOK+3ncznKGuJgazC=2L2j%J5`LzWT+eeu)NsI4jRGQs?UA%l%=3YWP%-s_ z$!G{W{HSrS9)eJzm%l155q{v>CkNrW#H8>!6$tmY4!X4}2hmDiiaYdGAa=~IY5Ktuq{TQ4 z8jh(zVn9ruV4O6hXr6y(^?eai37K7)GTD%-dfMxkiXp_vyiDB_MuhaterEM*7f5?x zN*{}ELu&b|s5!GBWN(dW7dUwj4;V?K}brRehdn^0y(vaojjby=08Au7;h!pe3E zA}?Ie%JZbW?@fJ^TXQr-JiL5?T`(CUejXdS`B54ob~s$8P3DGh#-Z0F@dgOr;vss> zjxyh3qb<3^l(;iL`ptAJ8AxLzXly7NLVhcb^~EawB=BDhTOQdZ|^i1f;iQV`C3`2zeR%iP202NDDud>t`tMIpZAl z%*hn|{H^P?)1@KUSTe(eVFWzOxrFp@k|1D5dz$Q568OgW}wF76^Rv zFBVrZNQ3V)Nq*~RBj6`GA)Vl!41VHYI-?>e{Wz6NcXUt-yl0I&=MpLPb>F@x&nz3j zV<|aN#YhW0oyM5+z7oOf`)z4{HHx?4$b+BSl;{0uO7v(c_0X*LyXlK12(cO3H}-@G zLBzMBWpWD;bW2P0X~h)ysHMMKag-{oI-DD!XPRI>955u$CD%|#iwA-cbYv)kDeV)nUv&aI3?)Xy=& z3w(wU)tAEpNdXXUefW5j9VHHG^nA2W+Cq5I6;)l?MF{;MqjT7f3_)_XXO2yeLtt6& zZJ%pCl=b}fO!-wd`00#%ig+UjzA4H3#Ux!IB*68<*W@D*;(4S+ala#knzb3Kh$%tH zhOr|o-VG44W8bx{e3ZDj)p@0evaTUc2b_gj1R?ZRSZjzRWxZqb1FNj|Q^s>e;pM$_ zh_ve2qGHkjvH4TiioWzhZ1*=^uVi{i`qUJ@XIC_kWyef~Mav;?aj|E@j+SyRnT+r> 
z5`@SnbA{8==@8-nsA*s$rCu1ld*tUR39);FqI(Wf;+#gc@Msq^gfH^BWRq3?^*1n% zbAQM#N17gO1Ub5Oy3(+UBbI)Y7Aa2V_QR^fDVq<^KdoCqInDP))QfnK8 zGMB_XE*pW6KSLMap@@gRr;P7W~xaKUwrr_74fgh{JP4@Ux2h#kI8x{8y(>2P*hLkYa50 z+>dMsVOQrBw~>R83Xh!voQ@FD$^4dTA3de+Tk__+X+e14*ArLHszHbYed+Ez1qeCh z6gYkA4#aSwrUNFH|IXXZmpI&Rb_+r$#w%O2WDXdojDs&eWT5dNQ z0n}S&_tL8#kjhrBISVi76eaJStjU6_3&ylxDhi-M$G^LtwjYXkhM!f_^kT7Ix~^PX zNm$5{oloXAKEcW!ZQFCRG#gHBLWj?s(8E0P7Z?)1=VF<*E#!Map`bF)P7pA~0 z%@D>I0KTSoKf8Syg|J%x$lIDa;1go4bor+!q`e!xa9{N+WN3bBjy5lY=*#hstY`)y zZFokFr3HZGa(q0={wajqbuWq8;}3C(V{IM8(~zOXzdY;j1R3xBUmmTOgPezxuIhzG8;a!JizoSuRf1{?uBhB`4zrhzY;T3VSe`etPxngG3ySpQwQ(kiG!Ef6Ty$` zKsELDM-bN4zyI07eu%kkbZV-f3&L|tBb-;vA=EhW!U+z4h+u1~Yg(#?h~0%1TSxam zcss*|aH?1ce>q{Lb~XikQy;L;_i%%81NR-_{rACSm4R@gT?TZk^EC8!heAA)ZYbYz zE=Y(;c4w+ffwbI55@+(xL27Bh)(HVF2)#gzyZta8(jvAA^){Ct@*nPps1%Q zr$25V@@Iyghp}?haPbdk!u(QHuGmtNpMLyiX z1o>8(%&a~~py;+sexOPe$`hmAom zvN|ENsFw5NKoO)4o{kCCv4a%VEv?>%m>{o=@LPoZ1oBI7R5t&Vf}Ed(dVA|R$c4Ll z_ip}zoQW3Vk#oh6r~mcZRk1qAJza7|a1%Eava_)+c5Z{*{hN>a&$ZUp6`dcVE zUePFf*q*iw0pF;!92H8Vy+>`QteURFeIYh5s3lReRIaeN@hfsqj@$-LJA@xA- zuM-0EkiK%Gp>Q|=(qAXd+I$g&tO~6Sk-PzrmNh?V(sc$>mcHb@YNvt}by_DblTyeq zxOCw4M`uWp{#bcybUUO(*9*vO7lstei)|Z;`yr)p%DY!;6%vzZPhr&Z5Pp~Nj#PFN zyyhPr395F31iK4KhrDDU=6jTgNR~N7cX!l_MifBm^{1PXs7HWotSDVsaSzJmEy*i; zWT2A&gglc^2~_oWFppofgW`C7y+KueC^`4w%^wd>D6bTzwdo0kQYUF$vVR$nw_(=> z)h|LGy;>-h@(;+Hdqa1*s21{b$Nq3IyoO?jP~S^C`+=-LV=`=Q4aK1uThjfsfc%hh z>TOJf;xK+6mOv#axht|9BQFJ|Ka-v4ojy?3|E}$Xa|gth7SU**xd0JrNq6Koegl## z$K}wW6!2tXNU5#TfspSH+m)5dApP6ve)7TZkQJtE;L<(@`I+;PoF>$JN1=3_x(UVpW)%PT@%Qu{=X*- z-ayg*&klm>Dv*D8|E0u3y^#Dd^2>DA6vR@8*tycBL!#XeQx#GjM7_=3v)DZh(XAT0 zq^B04Xbe(N8I)|F|T>;D2tXJ`0s&sadV@v_v8%EOT7i2|eeofkJ-|MML3nuiL5$GIT8{`>3CMiHQITn<7b z(^#0xLyrC=TP$zF*ZutyEiA`QV@og93YG^1dX{;6j7&3f%I85cmNr)zn#dD{C68$A zxoRVZ#YpWl&knqac`~1@7#|+TqSVxGxpsG9=HjRO1b1|Tr>4~QAY*Ru+c>NcTC)Y> z_T0^uxXuPX?=IF9)>~p;yh4FVDzaF(f2}K>q%4;4(ZY&mJ3W?FwfUR#ydRbw@p$-} zNEViB>=x5r=77Z<*HvVDa{`OttZREHupLt@((T(p&;^5_6B_aI{FsS?8jpm~ey}gO 
zJmE-P1PM|D6|rsekTtPcWk8b>->0tydRgs)a`!*zCtD0uunGm5tGGcqapT8*+R{)u zeLgvZ{v4F7=2zP-@j*Gq_7OfWZ76pXIXyRZ3M%cwM6&G%q1+?GfvHgqifx{a^=EZK zK@+2d{%J8tUf~wx9~glI@7^ER5+;H4#pThkY9aUu9f@sJp`7amji-e^ib0ARANP#w z2;@Ca3~n(@ghExj($BkOA-mbe=}F80WK@1{Yu5CJNZCQ-{F)n(u$1z7zIYIlo|V09 zS(=CBmmb|a=t-11EgyL_nsTj{zMPv$iTjv`o)b&IAVZv1_Q@WAG@a8_GM_Uc{ka|Q zVQV)?)xIQ1pV$eh?Z0e4<^F`g-KS1{qL~Lbjg7(X!H+tYv zUT}Ci<}`$~MV{f?WeaiXzKh^Y+!>`Gs~=sIG@LIHs8`|zSV=3in*{)oKnIv$t_j^IeTu-Q-9g> z`Sc!$xSDJg>LLhvwA8PzUAhNphhCXvPv?WfS!-!SX?}=#xOINtn{kLqkZnxfRtVAb z4l=*?v_Wk9aW)>V2+S`(Ih^H3E*39)a{qo02`r7I)bnDt2um4fq7I02#G)!sJ9XFb zVxiC)d!O?q2nh7KP6wWY@auw^f|TzU(a&f0a&?qI@;#5Qr}lY6yz|ll#+FM!GQK}2 z;jQJL@e%=mV`1_x7)0O<+_5ZH_KVGqjt1TqA-3`F}M7@zu zwG%tpf4%riQa+Zfuibs8>I|0VCm`xG{|!r9J~d7bH^EZmN3ZmJqwmf=!3DVWdO#@6LAtoiOG}ZYyMA$Yr5djS%u9MpNTP6NEI- zAH$E~sK;;}4E^ zmYgxN%9kmh)Am^D8_CNiPiL^=g$J5)F7a65b)ux(uR^SZ{qx=DG`v`WTYN&BC_R?G zJA}G(m~xE8`<`}`Gu7peJZiz z`%kP`_tN>@ItEzP#l`P}x|~?$Bmk>?m#CgSU4~U~ zJ~0))5sZ~gTwm%uEsl|8d-Wo#OR?OBR`0ggSFx0D%7I6w(y>&R*}}YfJ1iyWxYVPt z=UB>e>-noW=~#NL;b~n*RV+>FnkVf+87!r4-ri-e5EgS@`>T-lJ}lVjF!@x+0W7)Z z8dgyH21^V36BH|Hjis}gcb=CJ!{T*p?hL^Q7AN=Uzy&*BEcNs`D>cTgSUPR$XPvwE zvGn0j1rym}SWJlg2Ja|7EScxSuimB-EZp>l`%B9h%+rx-l1XtD3k!>_(w%a|;vPJe z&@O(1g)}!D<5)e4g$-N&{Kzehc^zJGsuaz{65WpJ4c&LgBCMWGTbjMcylGd8pVI4L zZi8;yW^Xpa8N-v=GItvxstXPkXPZFkzC4djP7{!!kEn8#0g7eZzZm%HP|op@WAGvl z(pP)T95Z)9#;Ct8;Yt%^6b}hoT^EG3B{HKF?=2|o%`q(07YDM>(qW$Y3sB-ZxZ&tl zZ762kdsABf6cmooifk|9rhH$ex;<6a1esJZPchm^$RNg=mFdt!c95D>iR&`t?er;d zlPrMT^q9qthoX?_nx<`Ya3kcnCNVKpe5QOa=K2*>y#dHoHkXV1ZGc=V%$Ab(9LQTM zFWYi9Ly1(cWai)hZ|QpF>#NZlp+uN*boo~xkQ-R0^*y%%`K9P-8xMadXH{~vte=MR z!y`U9HeaA*#>2tHWIvQFRTaw+pFr`Q99rjRwvcNsCi;x|6=YWti2kOBApL3Rg#Nb- z$fni~*`8$yd54%Zn48U@$Sr@wdejpNhKA9tUzG3r(wR)BA94Y?>~V%`Sp<|;KUltM zb{R@ezS_1;gAvL%U3fx!YbTT%+>L)0_zX(u-uK(ce1LMEhleZ14nWz$>g0uc*P#4+ z@;qwvgYruUJJ64#P>RL_mZX0}X(^5NrPIAo`q}JPk3%pN*XC#y2GKy#E+MzP{FhL~ zn`sKMKOtv|@!jX>d`Q}MEHOEHGbA4_SCV0yh18zYob4~ZL8O@b3&u{rN{(V&@^I_?@`- 
z_hm?Gai{wv{S-1^V1tLJUO>!e&y}h>t&rfOv}0Dl5W=>wUR2Qi3DJu}c4wIBAoQp~ zhusz#@VVfyr7L+3qNa&y4S^RSHly7CM|3JAv8wE4Tq%b%#m$i~ObQ{x>`JTTW-kbH z?C&`idkK>LZulMd`vz%X;vpS?Acolg(=hBKsC4o_$*SrFuMzEEmQoKPR5ATdf8Qd6 zO%zr|U)Tg;TbJzxx7J~i>bc(!W)wnV_(t=&Dme%k%DHN4dl6EF3MGgybih($@Xavw z97JH|_dD$xA-aa9MaGE2P~}X>&!Kp1T5{WpL1>#HG)AXY&mQ}5EMqO0|%UACo{dKz? z%Xc5M4oVWm3iiKZF;%~gV zHa;iZhvc>QBtgPeDOLk&ZXnYpw6))22eQ-@%Qja}C>pQ3T%?}|X*;>PI?IVrSh)6L zC=!zL-ka11Mb4uFUtT9d{;%4J8HE#&&wBGk8N7jlzsP)znyCu3Bd1vfSehHz8!&{!=z5)E#9c#$c={zFKz8{~5 zla%vL_N|wpijZ-yW~eaiHDt;~o>N`?3>hJ`JD%;?3Tf5i()#<%A&oMKEm0DXs%Khb z*}Vj*YpIxY$bN%=s=KC^&rlRA*WEuJHNlNB=5vMmq=e;;%K%e8h$nE2oKDG?~=;Q!iNU(f&Y_zNW6^t#=A$Pkk8eEcuop5x>F^zZW+ zJYQnh`X>(2X&xQ+L4P2U|L+3VL*&BW#O}qp76Jn?mEgPC;8*nb1#biP(SZpooS29p=xe|K{;H z0q+;_U;qD`PlbQ=B^j@qv43aRzX8(m^~heEuV0Xf&rdu)AE9gOt^&bp^>PGL|J#p2 z0`}j0b7TLlKf?6a;tqlNT3iT%-QTKZMgX_9c$J3yUEjKsjCMi(T0+V~CZ130^{?RV z-%3V?ytTZ$19^*4f39yYgM!oAloe-&B77az>*9a;PmTBA5z^Q4i-6ZFQ+$3KFs;9F zRYSns*2bZQ;p>3?m$wTT?qm7=u!-_2yuN$j@#cXQuf^Fc7P>YrX)I-}oS5S4L&EE; z9z_4mx|1%DOsT>#2UVbKtqf^E;ppE&^apa*=3^5+&wn8I@A6!9fkL~w<2?fl|Gd(X z{}SN8dbb26c%G6e#h;SOP=eQ$QoJ7I;Cc05eM0bG{|upMEzVRSdo8|Z@#`EE;rUXs zHlE#3oYY4h;!HVTfAPP%(&q$uXf2WjK^^?Uf)Y){{FKMpP$1}ymtE>WcwbS2sUrS&jILP z9g17qm&fsX+eSG*{vDsezjMNW`<>*!=l?Z5d*#oJ9@+ki>nu@=X+_h`QPmN z)&phuK1rr*e#xuZP(o>qat47?N_FPf^oA0Szloqt`JO@9?RKmFd#-e?jy!>~wTm#m zE@t?-A_g+vPdCJncR?I}ZiuE7g)Ixz|KebMyq;@yARA=xIKkruTZ^ib1n^ml|0R4M z^?}5-JZ;0{RqfyTiM*CaYEX!OUd*$W_)yNoCVFGfES_KLP`P(4fT7}Y3dhUy4=DMv zOs?6(3FKPtCH~kwQ07B(ciXl%cs%`tvbA>o|DAu<&sA&7C``cXUmavpbN}rrer~eC z>z5di|2r^2F}{u^_G_;@q_hP~A?q~rbmw~w#I*>ijyXd&uv)LWkacg|QpxBQp4dT?QFKhXMD z5AnMFZ$J0`uP&~iw}Ety+5C%~AcV-;OLR^`a-0uuM=u>@*(I!I+08fH1b5O#>W1Ahc*z_VA8a2>+pA>*~V}UWD!&uCLj_?{UbB717(^ z6aP~@*r0{-eJ(~LlsW@4+|0}0Wof__mh59wp_)Lx@`N{oi%j`mP8@lXeGdxLgyTVP z38D_|US6p!gS3q6d2hp=A^99@4d0A6PhH44i0zIpeM9$e-2g=r*4g4HLJ!cWf} zq0pq*Nk4o8km$9T_}q2^Nt8eJnx_l+c#_>B7Wu&~Qnbe+)EcaXD(4-Ox*>G6jr*Om 
zEcksCY}cti37#$-ERVhogFV0a(dS{YV8^sMsglnTZ0X#~u!vxAXC8fC^S%+>-RgW+ znnmDJm+KqpLnUz0#YA~5a35TxO(SZ(Pyj1ON1^xH55ZJ)`{YiG7I3uI-f$x}2<*8} za;r#A;fee8$%I+I$ z&VW~3&Y!0zbHU%o`o@e)BsiVt65eoW931W#os2%(3Fa2r&c?Jy!S=-WV@C%Q!0g}) z<$DL~;DU6cT?N(%+C>*QLIfSLz8^SnE}v6Lkaw zORA^mcIbd^=3$%b<4?g%BD?ZjizZldh&uk6sR66Ti7k1PC%|iQpTz37BruJ6VckBq z2SSgXiDpdk2anN~>SLWB!Txhd@a8?=;L^cO;=G0qU|Dh~gZMH7Tvw08die-|`OOzR ze;o9|K=0tIVtq+)JtcI3`RQQ@yMLF_AZ-{-uhG!VD6PWz(Ca~rA0@!pW^kXm^(U}> z@jmXyARk=xZM|YwW(_7BTKRqt)xo}Z+jGWGhH&rmA!F#|j^KTR#kHo@6}v#dvUodb0uwMx+jbI^5JC}}fw2Iq0P zj^`JIK<==Aq^C9qsNHs!x)JdJRN0CQD+{JTrC`f8|66QuiZzK%;EXp|u=MpQPjZ8X zV~*Mr?FCS(zdSE4#sp_i1T#*WyMy)Ez$D}R0r0NfCQWS|1TGREWkW8%hx329`bk+Q z!Fu-4AJ4}}L5+O&l6rz4Trl|<#|2to%{%m`^2-ufI$d~sFdzr?e-;Hi-}oEUE=WD= z4Q&Ub)V?2EZkmCSfsT%4cPzL(v8el`cnB<}vN~UDhJ&5Pi+!&H+92lAY%c41&X1G$q0+2>w?Vley%X!FfJIe3X9~BJ0kDbljkY zfNae)L5myU%kyGL&OZTC79Q(7%JPGhhy6m*)U%XxiPnixni=qN+#`HMq#b;OO;4D- zp9a_5c_;oLRd7>1f7n-B0&J!mJ$+&T%*E(lJM2BafTfnr@8F8l;L0hHvw>$D zI8C-O7QLMYkAq&`yI*_-$A+@|AKv$a{U%86=I11+519s)L=YpMbO1^OS z4RBo)81m%w1g}GMGx^4^|L!lp&u9jMi?-gJM`;iEY_!Ehcp5@PZ!;{FSb>|(p9|eAb>NhMp6Q$j1xHS?q(_vzbksAQ z+}z0zwqg-pEiB#;q&s!4T{{Ro_Q%IuYI+2hwx^IkeO-mns(XXkCQl)_qJlIytP25` zf1Q&%%?0iw(*noEX2GfCshM;9KJc_!Z1|Dp4(5KeygR!h!OkhgsedyUm__QBkCcbN z#qdQo5yP`!=|3#owyX*k>68T9QLyI2=xwKef}VYYWzvK_nDQuvSF(qKfrfn2T=a7=F%KHM^!^LDuGAM@ zGoA;_poIqy4pa6;7T!%S`WeBtV=s$Cb^y5FJC4;?$3j4%HgWd^8w74K)|_>*f`lWw zjsovfz#}kxn&VhlG5w- zGc$;{zV2ir+zo-xHs{8>D1rNmR&yf@FW8&kxIba?2AmXw1*{Soz}A*^z~SH(u=(MF zCK3k0hMg+oQxOR)INdf~|~Nig_)lny%$f2%?2vD4G3C6~d8Rz{b%`XDSH9P18qtK5(P*_at8^goTze6u z(hL57TGcLC%Y%onjFkPjJ|wlTd>k8hgpfU3bESe0g45f+9+$@s;2PGKsql&poNGuq z5ii&vzx09Ph))BgDnBrG_Ui_VmplQiv@YQHBIA3gXE!(;F}zQ9ybR8jVpie%B)7c4i=9>_-X<4Z&TEwo8Dx z1QzdRc1bjAf(h%X5Z!(Y(8;=DfI37e`@&h5q?5~_H<%&A5x5&v?`T|l@f$(dK^cwV zj9=igk(bK-UK$w4nwfeFY=8tFIr-m6?+3h(rvTd)^3y+Q)uP?0SRVDY>8X|0F(n9rvLx| literal 0 HcmV?d00001