diff --git a/Machine Learning for Economics and Finance/python-intro/Jupyter_IDE.png b/Machine Learning for Economics and Finance/python-intro/Jupyter_IDE.png new file mode 100755 index 0000000..4db4942 Binary files /dev/null and b/Machine Learning for Economics and Finance/python-intro/Jupyter_IDE.png differ diff --git a/Machine Learning for Economics and Finance/python-intro/Jupyter_Markdown_ex.png b/Machine Learning for Economics and Finance/python-intro/Jupyter_Markdown_ex.png new file mode 100755 index 0000000..d0e9d24 Binary files /dev/null and b/Machine Learning for Economics and Finance/python-intro/Jupyter_Markdown_ex.png differ diff --git a/Machine Learning for Economics and Finance/python-intro/Spyder_IDE.png b/Machine Learning for Economics and Finance/python-intro/Spyder_IDE.png new file mode 100755 index 0000000..708daa4 Binary files /dev/null and b/Machine Learning for Economics and Finance/python-intro/Spyder_IDE.png differ diff --git a/Machine Learning for Economics and Finance/python-intro/python-intro.ipynb b/Machine Learning for Economics and Finance/python-intro/python-intro.ipynb new file mode 100755 index 0000000..b6520f8 --- /dev/null +++ b/Machine Learning for Economics and Finance/python-intro/python-intro.ipynb @@ -0,0 +1,3883 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "4d38f6de-7234-4844-aac0-22d01ced7c69", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "\\vspace{-4cm}\n", + "\\begin{center}\n", + " \\LARGE{Machine Learning for Economics and Finance}\\\\[0.5cm]\n", + " \\Large{\\textbf{Python Intro}}\\\\[1.0cm]\n", + " \\large{Ole Wilms}\\\\[0.5cm]\n", + " %\\large{July 25, 2024}\\\\\n", + " \\large{\\today}\\\\\n", + "\\end{center}" + ] + }, + { + "cell_type": "raw", + "id": "13be77f3-44f0-4983-b4cb-bd3e4b5dba8b", + "metadata": {}, + "source": [ + "\\setcounter{secnumdepth}{0}" + ] + }, + { + "cell_type": "raw", + "id": "a4c564a3-8712-4601-84b4-72b51df8bbbf", + "metadata": {}, + "source": [ + 
"\\tableofcontents" + ] + }, + { + "cell_type": "markdown", + "id": "040dc2a4-910e-4cf5-9d1e-62fe7d0a8efd", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Important Instructions\n", + " - The purpose of this introduction is to get to know Python by some basic programming exercises.\n", + " - In case you struggle with some problems, please post your questions on the OpenOlat Forum.\n", + " - Don’t worry if you struggle at the beginning. Throughout the course, these programming concepts will become easier to understand.\n", + " - Consistent practice with basic Python tasks is key to mastering the language. I strongly encourage all students to engage in regular exercises and to proactively tackle future tasks to strengthen your skills and build confidence in your abilities." + ] + }, + { + "cell_type": "raw", + "id": "d1a6cda1-d74f-4a81-8c17-cdd83a0dae17", + "metadata": {}, + "source": [ + "\\newpage" + ] + }, + { + "cell_type": "markdown", + "id": "9619670b-219b-4e35-a4ef-2357ed837cb1", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Recommended integrated development environments (IDE's) for Python" + ] + }, + { + "cell_type": "markdown", + "id": "18b26857-4da9-4354-b66a-296fdb52f118", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### **Jupyter** (notebooks) - Recommended for this lecture" + ] + }, + { + "cell_type": "markdown", + "id": "fdaa6581-855c-403e-a0db-931f431c90a0", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "![Spyder_IDE](./Jupyter_IDE.png)\n", + "\n", + "[Jupyterlab direct website-IDE](https://jupyter.org/try) try it out!\n", + "\n", + "or\n", + "\n", + "[Jupyter IDE weblink](https://jupyter.org/) for installation.\n", + "\n", + "**Jupyter** is a graphical user interface in a web browser. 
It is an open-source application for creating and sharing documents that combine code, equations, graphics and text. A notebook can contain different types of cells: **Code** cells (e.g. *Python* code), **Markdown** cells for formatted text, and raw cells. " + ] + }, + { + "cell_type": "markdown", + "id": "b2644b5d-838e-43f1-b77b-ee8311e30c65", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "**Markdown cell (syntax example):**\n", + "\n", + "![Jupyter_Markdown_ex](./Jupyter_Markdown_ex.png)" + ] + }, + { + "cell_type": "markdown", + "id": "93c8fd5f-d08c-43bf-9c38-c39f9dab412b", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "**Note**\n", + "\n", + "Jupyter can be launched from JupyterLab Desktop or from a terminal (e.g. with `jupyter lab` or `jupyter notebook`). When you open an **.ipynb** file, a local server and a Python process (the kernel) are started, and the notebook opens in your default web browser. If the browser does not open automatically, the page can be accessed at the local address printed in the terminal, for example [http://localhost:8890/tree?](http://localhost:8890/tree?) (the port number may differ; Jupyter uses 8888 by default)." + ] + }, + { + "cell_type": "markdown", + "id": "ac824a0f-9331-434b-973d-8b93d1022b7d", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### **Spyder**" + ] + }, + { + "cell_type": "markdown", + "id": "39f5c003-5376-483c-8acc-8bf2c66bc20e", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "![Spyder_IDE](./Spyder_IDE.png)\n", + "\n", + "\n", + "Visit the [Spyder website](https://www.spyder-ide.org/) for installation instructions.\n", + "\n", + "Spyder is an integrated development environment (IDE) that combines an editor and a console in a single window. It also offers many additional features, such as project management, a variable explorer, a file explorer, a command history and a debugger."
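+ ] + }, + { + "cell_type": "markdown", + "id": "3f8e2a10-6c4b-4d7e-9b1a-5c0d2e4f6a01", + "metadata": {}, + "source": [ + "**Which Python interpreter am I using?** The following cell is a minimal sketch that relies only on the standard-library `sys` module. It prints the path and the version of the Python interpreter that runs the current Jupyter kernel (or Spyder console). This can help when a package installed with `python -m pip install` cannot be imported, since it may have been installed for a different interpreter." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f8e2a10-6c4b-4d7e-9b1a-5c0d2e4f6a02", + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "\n", + "# Path of the Python interpreter that runs this kernel\n", + "print(sys.executable)\n", + "\n", + "# Version of that interpreter\n", + "print(sys.version)"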
+ ] + }, + { + "cell_type": "raw", + "id": "3ba17c73-a83f-43fa-8f29-3b773e25887b", + "metadata": { + "tags": [] + }, + "source": [ + "\\newpage" + ] + }, + { + "cell_type": "markdown", + "id": "dd4a52d1-4dce-4415-9824-ab22305a54d4", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "# Exercise 1: Getting started" + ] + }, + { + "cell_type": "markdown", + "id": "87902d82-5336-456b-bec8-403530c75f00", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Defining variables" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "f1cf1749-9e5b-434a-8f45-5d63db20ee2a", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "5\n" + ] + } + ], + "source": [ + "x = 5\n", + "y = 7\n", + "print(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4c658a6a-1c6a-4350-9c4f-6afdd4dbaa7c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "35\n" + ] + } + ], + "source": [ + "z = x * y\n", + "print(z)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "cb851b64-3518-406d-be06-46721a6eda01", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "78125\n" + ] + } + ], + "source": [ + "a = x**y # exponentiation: x to the power of y\n", + "print(a)" + ] + }, + { + "cell_type": "markdown", + "id": "734be15a-cadb-46ae-9657-9650bcc8e24a", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Jupyter Extensions and Python packages" + ] + }, + { + "cell_type": "markdown", + "id": "977bfce5-b072-4ca9-b43b-5e9a44ce26d6", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Recommended Jupyter extensions" + ] + }, + { + "cell_type": "markdown", + "id": "444410ca-99a8-4ab1-8645-972e96c5c493", + "metadata": { + "tags": [], + "user_expressions": [] + }, + 
"source": [ + "**Table of Contents extension**: \n", + "\n", + " - Extension name: `@jupyterlab/toc-extension`\n", + " \n", + " Jupyter extension to display the table of contents." + ] + }, + { + "cell_type": "markdown", + "id": "f8f35f6c-b068-48d7-9aa3-99b6b35e186a", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "**Variable Inspector**: \n", + "\n", + " - Extension name: `@lckr/jupyterlab_variableinspector`\n", + "\n", + "When you use the *variableinspector* it will allow you to get a side-by-side overview of your variables besides your code." + ] + }, + { + "cell_type": "markdown", + "id": "6a9e9e39-2553-43ad-8457-7b216ddc06b4", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Python packages" + ] + }, + { + "cell_type": "markdown", + "id": "bd050ec0-309a-405c-9726-7c7d2bd3622a", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Installing missing packages" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "85514554-b954-41f7-8cb0-c26cddbcee84", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "skipping\n" + ] + } + ], + "source": [ + "%%script echo skipping\n", + "python -m pip install \"package name\"" + ] + }, + { + "cell_type": "markdown", + "id": "2a78a12d-b9fc-45fd-bbf3-00823cba53c8", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Loading packages" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "d232b45d-c602-49bc-ad01-db1fdb26b806", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "skipping\n" + ] + } + ], + "source": [ + "%%script echo skipping\n", + "import \"package name\"" + ] + }, + { + "cell_type": "markdown", + "id": "17fbe6d6-1231-42c6-9901-3a591ea06f2d", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Some basic functions in Python are loaded by 
default. Others require a module to be loaded first. **Modules** are files that contain Python *definitions* (e.g. functions and variables) and *statements*. \n", + "**Packages** are collections of modules that together provide a set of functions. The packages used in these notes include:\n", + "\n", + "- `NumPy`, a fundamental package for scientific computing\n", + "- `pandas`, a package for easy data manipulation and analysis\n", + "- `Matplotlib`, a package for creating graphics.\n", + "\n", + "To load a module (or a package), we use the `import` statement. For example, to load the package `numpy` under its conventional alias `np`:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "5a7ecc14-ee15-4c53-80a1-b0a16c96f943", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "11.266065387038703\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "b = np.log(a) # natural logarithm, using the log() function of the numpy package\n", + "print(b)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "68a714f4-3a75-4f80-b623-41ef94e32336", + "metadata": {}, + "outputs": [], + "source": [ + "# delete the variable x\n", + "del x" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "89022ebb-4cbb-4391-9138-18a5971d6950", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In is \n", + "NamespaceMagics is \n", + "Out is \n", + "a is \n", + "b is \n", + "exit is \n", + "get_ipython is \n", + "json is \n", + "np is \n", + "open is \n", + "quit is \n", + "sys is \n", + "y is \n", + "z is \n" + ] + } + ], + "source": [ + "# dir() lists the names of all objects in the current namespace\n", + "all_variables = dir()\n", + "\n", + "# Iterate over all names returned by dir() 
\n", + "for myvariable in all_variables:\n", + "    # Print the name and type of each object whose name doesn't start with '_'\n", + "    if not myvariable.startswith('_'):\n", + "        myvalue = globals()[myvariable] # look up the object by its name\n", + "        print(myvariable, \"is\", type(myvalue))" + ] + }, + { + "cell_type": "markdown", + "id": "a9ded457-d5e5-41ad-98a3-c6d1e4272850", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## The Help System" + ] + }, + { + "cell_type": "markdown", + "id": "b4611c08-d44f-4a10-ace8-25512ba78b2a", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The help system can be accessed with different syntaxes:\n", + "\n", + "- `?` : provides an introduction to and an overview of the features of IPython, the interactive shell behind Jupyter (close it with the *Esc* key)\n", + "- `object?` : provides details about `object` (for example `x?` or `plt.plot?`)\n", + "- `object??` : provides more details about `object`, including its source code where available\n", + "- `%quickref` : shows a short reference of IPython syntax\n", + "- `help()` : opens the built-in Python help system (e.g. `help(len)`).\n", + "\n", + "Note: the *Tab* key not only provides *autocompletion*, but also lets you *explore the contents* of an object or module (e.g. type `np.` and press *Tab*).\n", + "\n", + "For more complex problems, do not hesitate to search the web and mailing lists, and of course\n", + "the many questions and answers on [Stack Overflow](https://stackoverflow.com/)."
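+ ] + }, + { + "cell_type": "markdown", + "id": "e7a1c3d2-0b4f-4c6e-9a21-3f5d8b7c1e02", + "metadata": {}, + "source": [ + "The help tools above are meant for interactive use. As a minimal sketch (the standard-library `math` module is used purely as an example), the same kind of information can also be retrieved in code: `dir()` lists the names that an object or module contains (similar to what *Tab* completion offers), and the `__doc__` attribute holds the docstring that `help()` and `?` display." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7a1c3d2-0b4f-4c6e-9a21-3f5d8b7c1e03", + "metadata": {}, + "outputs": [], + "source": [ + "import math\n", + "\n", + "# Public names defined in the math module (names starting with '_' are internal)\n", + "public_names = [name for name in dir(math) if not name.startswith('_')]\n", + "print(public_names[:5])\n", + "\n", + "# First line of the docstring of math.sqrt, part of what help(math.sqrt) shows\n", + "print(math.sqrt.__doc__.splitlines()[0])"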
+ ] + }, + { + "cell_type": "markdown", + "id": "b1995776-70fa-4683-bd16-8d580d656597", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Working with vectors" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fbdc8241-9cb3-40cc-b638-f001d2dd0fa9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 4, 5]\n" + ] + } + ], + "source": [ + "x = [1, 4, 5]\n", + "print(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "448ce24c-f126-490e-ad5d-db07a5eac040", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "10" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(x) # sum all elements" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "f0a44893-77a5-4ceb-b6b2-c74f4a524053", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[0;31mSignature:\u001b[0m \u001b[0msum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m/\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstart\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mDocstring:\u001b[0m\n", + "Return the sum of a 'start' value (default: 0) plus an iterable of numbers\n", + "\n", + "When the iterable is empty, return the start value.\n", + "This function is intended specifically for use with numeric values and may\n", + "reject non-numeric types.\n", + "\u001b[0;31mType:\u001b[0m builtin_function_or_method" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Same as the help(sum) function\n", + "?sum" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "602635ae-1935-43d3-85b5-a2e749f07b4f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.3333333333333335" + ] + }, + "execution_count": 12, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(x) # compute mean" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "afbbcf46-7cad-4db8-b6c3-d51cc9d486c8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.699673171197595\n" + ] + } + ], + "source": [ + "sd_x = np.std(x) # compute standard deviation and store in sd_x\n", + "print(sd_x)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "30750f8b-51dc-4fd6-aae8-ae43bbb6a319", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2, 3, 1]\n" + ] + } + ], + "source": [ + "y = [2, 3, 1]\n", + "print(y)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "ea0435ef-f7a5-464f-bce6-1e64d90be698", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 4, 5, 2, 3, 1]\n" + ] + } + ], + "source": [ + "z = x + y\n", + "print(z)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "df17a23d-f44e-40a8-8008-eecdd464519b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2, 12, 5]\n" + ] + } + ], + "source": [ + "#z = x * y # Element by element multiplication\n", + "# Multiplying list 'a' elementwise with list 'b':\n", + "z = list(map(lambda a, b: a * b, x, y))\n", + "print(z)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "bd64ba52-2893-4f8c-a41f-8f15cf920111", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(z)" + ] + }, + { + "cell_type": "markdown", + "id": "3589e7dc-9e7f-498c-8bc4-9f84c462e356", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Special “Numbers”" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": 
"1b7721cf-6f1a-46cb-bf68-0d09b1a64202", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "inf\n", + "-inf\n" + ] + }, + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Example 1:\n", + "# 1/0 <--- This is not allowed since it represents a division by zero # Infinity\n", + "# instead:\n", + "print(float('inf'))\n", + "print(float('-inf'))\n", + "\n", + "1/float('inf')" + ] + }, + { + "cell_type": "markdown", + "id": "8b1e0b2f-1d1d-4404-b264-b217e6929543", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Types of Data class types and vector modes" + ] + }, + { + "cell_type": "markdown", + "id": "0e4a5a82-8e0e-4640-9f60-19d037651270", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "In Python, there are several class types that are used to represent different types of data objects. Some of the commonly used class types include: *Integer* (Numeric), *Float*, *String* (Character), *Factor* and *Logical*." 
+ ] + }, + { + "cell_type": "markdown", + "id": "9e358ba4-4ec9-4f53-9ce4-f0c01c491eba", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Integer: Represents whole numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "99c21848-fc7d-4f53-8327-31f3337c4493", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'list'>\n" + ] + } + ], + "source": [ + "x = [1, 4, 5] # a list whose elements are integers\n", + "print(type(x))" + ] + }, + { + "cell_type": "markdown", + "id": "5d065a70-7efc-404a-a40d-95625887be72", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Logical: Represents the Boolean values `True` or `False`" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "519e3137-845d-4b23-baac-0177d536a7c5", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check whether all elements of x are integers (the result is a Boolean)\n", + "all(isinstance(i, int) for i in x)" + ] + }, + { + "cell_type": "markdown", + "id": "1e532cab-ea7c-4a88-bd57-85280b994845", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Float: Represents floating-point numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "445142d6-3eb7-4411-8045-193f8a22bf71", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x2 = [1.2, 2, 3]\n", + "\n", + "# Check whether all elements of x2 are integers (False, because 1.2 is a float)\n", + "all(isinstance(i, int) for i in x2)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "ce086f3a-9684-4ff6-9094-63fd03f33ab2", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "int64\n" + ] + } + ], + "source": [ + 
"import numpy as np\n", + "\n", + "x3 = np.array([1, 4, 5]) # Example array\n", + "print(x3.dtype) # getting the data type of the array" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "bdb41144-93dd-4600-be27-f005457ffac7", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check if x2 is a vector of floats\n", + "all(isinstance(i, (float)) for i in x2)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "93e5bc23-ca5f-4eb3-afae-a36c504126a4", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check if x2 is a vector of integers and floats\n", + "all(isinstance(i, (int, float)) for i in x2)" + ] + }, + { + "cell_type": "markdown", + "id": "a151cd59-4a0f-4d06-95f3-3c0c79fa2e2f", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Conversions" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "e69eae53-d0c0-456f-9986-7691a20b42be", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "x4 = \"3\"\n", + "x_int = int(x4)\n", + "print(type(x_int))" + ] + }, + { + "cell_type": "markdown", + "id": "3f055fa6-2bfa-4560-892b-01c225b03af9", + "metadata": { + "user_expressions": [] + }, + "source": [ + "`int()`, `float()`, and `str()` convert values to integers, floats, and strings. Example: Converting a float to an integer can be done with `int(1.2)`, which results in $1$." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "e41f438b-bda2-48e7-9a4f-1c5e3c4873fa", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 2, 3]\n" + ] + } + ], + "source": [ + "x2 = [int(i) for i in x2] # Convert all elements in x2 to integers (1.2 is truncated to 1)\n", + "print(x2)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "2ba742c2-4eab-43c0-9c16-b72788cd1000", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check again whether all elements of x2 are ints\n", + "all(isinstance(i, int) for i in x2)" + ] + }, + { + "cell_type": "markdown", + "id": "4fe74f54-162b-4b4d-9536-ca0765b71c7f", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### String: Represents text (character) values" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "559be4b1-105e-43d3-80de-23e5ba8a1de6", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Hello World\n" + ] + } + ], + "source": [ + "t = \"Hello World\" # single-line string\n", + "print(t)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "d07cdb9b-a9c3-425e-b74a-4693f862dd81", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Hello World\n" + ] + } + ], + "source": [ + "t2 = \"\"\"Hello \\\n", + "World\"\"\" # triple-quoted strings can span several lines; the trailing backslash removes the line break here\n", + "print(t2)" + ] + }, + { + "cell_type": "markdown", + "id": "a8b479e3-c2f3-445e-9954-484b1e07570d", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Factor: Represents categorical variables with a fixed number of levels" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": 
"2092c978-2d1e-4ffc-bea8-be9233bf40fc", + "metadata": { + "tags": [] + }, + 
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unique entries: 0 [Male, Female, Diverse]\n", + "dtype: object\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "gender = pd.DataFrame(['Male', 'Female', 'Diverse', 'Male'])\n", + "\n", + "print('Unique entries: ', pd.Series({c: gender[c].unique() for c in gender}))" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "f91bb68a-af9a-42de-bccb-440558943ab7", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Male\n", + "1 Female\n", + "2 Diverse\n", + "3 Male\n", + "dtype: category\n", + "Categories (3, object): ['Diverse', 'Female', 'Male']" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "g = pd.Series(['Male', 'Female', 'Diverse', 'Male'])\n", + "cat_s = g.astype('category')\n", + "\n", + "# Checking the data types and unique entries (levels)\n", + "cat_s\n", + "\n", + "# Alternatively:\n", + "# cat_s.cat.categories" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "398c240a-786b-48cc-aaad-ce43322c3734", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Male\n", + "1 Female\n", + "2 Diverse\n", + "3 Male\n", + "dtype: category\n", + "Categories (4, object): ['Male', 'Female', 'Diverse', 'Other']" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#cat_s.cat.codes\n", + "\n", + "actual_categories = ['Male', 'Female', 'Diverse', 'Other']\n", + "cat_s2 = cat_s.cat.set_categories(actual_categories)\n", + "cat_s2" + ] + }, + { + "cell_type": "markdown", + "id": "45741463-ee0b-4cf5-9aa9-1f7412266544", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Difference between an `none` and `NaN`" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": 
"16f51a30-4f59-4062-b7d1-cdfc2da63808", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "a = None # setting a to an empty object (placeholder)\n", + "b = np.nan # setting 'b' to 'NaN'\n", + "# b = float(\"nan\") # equal to the line above\n", + "\n", + "print(np.isnan(b)) # checking if 'a' equals 'NaN'" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "f7c1db4f-8434-4aa2-ace1-7d8e505aadb9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "None\n" + ] + } + ], + "source": [ + "print(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "9763523b-201f-4dfe-971d-f8c652aaa158", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "print(type(a)) # The None object is a neutral variable, with “null” behavior (an empty placeholder)." + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "b9789d53-da12-4465-b9ac-fa0548b6b3ef", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "print(type(b)) # Representation as a floating point number of 'Not a Number'." 
+ ] + }, + { + "cell_type": "raw", + "id": "cf157c1e-3309-457b-95b3-6cf2d6dbd7a1", + "metadata": {}, + "source": [ + "\\newpage" + ] + }, + { + "cell_type": "markdown", + "id": "bef75ffd-62e1-45d8-af23-466e5c7f8489", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "# Exercise 2: Loading and changing data" + ] + }, + { + "cell_type": "markdown", + "id": "0f520dc7-d5dc-4d86-a2e9-b5b440fbc04d", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Get / Set working directory path" + ] + }, + { + "cell_type": "markdown", + "id": "6c3e13cd-4c00-4497-b021-31bfd9893d7e", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Loading datasets from a package" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "1002f737-6f34-4f74-9437-383a9318ed62", + "metadata": {}, + "outputs": [], + "source": [ + "from ISLP import load_data\n", + "\n", + "# Load the Hitters dataset that ships with the ISLP package\n", + "Hitters = load_data('Hitters')" + ] + }, + { + "cell_type": "markdown", + "id": "7205d9f8-5739-4b1f-84a0-7930929a8c36", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Loading datasets from external files" + ] + }, + { + "cell_type": "markdown", + "id": "7e1c2c2f-bd4f-4b8c-b077-eb228b158869", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Loading data from a CSV file" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "74d34d4d-b6a9-44e0-b9ce-6fd11ef22d7e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "skipping\n" + ] + } + ], + "source": [ + "%%script echo skipping\n", + "import pandas as pd\n", + "df = pd.read_csv(\"PATH/filename.csv\",\n", + " sep=\";\", # columns are separated by semicolons\n", + " index_col=0, # use the first column as the row index\n", + " decimal=\",\") # a comma is used as the decimal separator\n", + "print(df.head())" + ] + }, + { + "cell_type": "markdown", + "id": "3e84803f-4828-4a54-a882-71094c66a729", 
+ "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Creating a 
DataFrame" + ] + }, + { + "cell_type": "markdown", + "id": "1085691c-c124-4ed7-b867-bb131d046c7c", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Group by example" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "743d233c-875b-4830-8d06-d37c2f5f7698", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
workersengineers
departement
Accounting222877762
Finance2517495
I.T.249594931
Marketing198454004
R&D168512754
Sales1911512
\n", + "
" + ], + "text/plain": [ + " workers engineers\n", + "departement \n", + "Accounting 22287 7762\n", + "Finance 2517 495\n", + "I.T. 24959 4931\n", + "Marketing 19845 4004\n", + "R&D 16851 2754\n", + "Sales 1911 512" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# Example data\n", + "unemployment = pd.DataFrame({\n", + " \"departement\" : [\"R&D\", \"I.T.\",\n", + " \"Accounting\", \"Marketing\",\n", + " \"Sales\", \"Finance\"]*2,\n", + " \"workers\" : [8738, 12701, 11390, 10228, 975, 1297,\n", + " 8113, 12258, 10897, 9617, 936, 1220],\n", + " \"engineers\" : [1420, 2530, 3986, 2025, 259, 254,\n", + " 1334, 2401, 3776, 1979, 253, 241]\n", + " })\n", + "\n", + "# unemployed workers (thereof engineers) by department\n", + "unemployment.loc[:,[\"departement\", \"workers\", \"engineers\"]].groupby(\"departement\").sum()" + ] + }, + { + "cell_type": "markdown", + "id": "90fa9614-b6be-4963-a6ea-61e5c2331364", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Data inspection" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "88da9bfd-ee49-43be-abbc-b2e8a6e5c841", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 322 entries, 0 to 321\n", + "Data columns (total 20 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 AtBat 322 non-null int64 \n", + " 1 Hits 322 non-null int64 \n", + " 2 HmRun 322 non-null int64 \n", + " 3 Runs 322 non-null int64 \n", + " 4 RBI 322 non-null int64 \n", + " 5 Walks 322 non-null int64 \n", + " 6 Years 322 non-null int64 \n", + " 7 CAtBat 322 non-null int64 \n", + " 8 CHits 322 non-null int64 \n", + " 9 CHmRun 322 non-null int64 \n", + " 10 CRuns 322 non-null int64 \n", + " 11 CRBI 322 non-null int64 \n", + " 12 CWalks 322 non-null int64 \n", + " 13 League 322 non-null 
category\n", + " 14 Division 322 non-null category\n", + " 15 PutOuts 322 non-null int64 \n", + " 16 Assists 322 non-null int64 \n", + " 17 Errors 322 non-null int64 \n", + " 18 Salary 263 non-null float64 \n", + " 19 NewLeague 322 non-null category\n", + "dtypes: category(3), float64(1), int64(16)\n", + "memory usage: 44.2 KB\n" + ] + } + ], + "source": [ + "Hitters.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "10f4f770-c354-4138-903d-0cf37de5ca0f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns \\\n", + "0 293 66 1 30 29 14 1 293 66 1 30 \n", + "1 315 81 7 24 38 39 14 3449 835 69 321 \n", + "2 479 130 18 66 72 76 3 1624 457 63 224 \n", + "3 496 141 20 65 78 37 11 5628 1575 225 828 \n", + "4 321 87 10 39 42 30 2 396 101 12 48 \n", + "\n", + " CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague \n", + "0 29 14 A E 446 33 20 NaN A \n", + "1 414 375 N W 632 43 10 475.0 N \n", + "2 266 263 A W 880 82 14 480.0 A \n", + "3 838 354 N E 200 11 3 500.0 N \n", + "4 46 33 N E 805 40 4 91.5 N " + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show the first five rows - with all columns\n", + "Hitters.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "f32badc4-6348-4bd7-a59f-f3736a5ea7d1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns \\\n", + "0 293 66 1 30 29 14 1 293 66 1 30 \n", + "1 315 81 7 24 38 39 14 3449 835 69 321 \n", + "2 479 130 18 66 72 76 3 1624 457 63 224 \n", + "3 496 141 20 65 78 37 11 5628 1575 225 828 \n", + "4 321 87 10 39 42 30 2 396 101 12 48 \n", + "\n", + " CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague \n", + "0 29 14 A E 446 33 20 NaN A \n", + "1 414 375 N W 632 43 10 475.0 N \n", + "2 266 263 A W 880 82 14 480.0 A \n", + "3 838 354 N E 200 11 3 500.0 N \n", + "4 46 33 N E 805 40 4 91.5 N " + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show the first five rows - with all columns\n", + "Hitters.iloc[[0, 1, 2, 3, 4]] # iloc[rows, columns]; a list selects specific row positions" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "79dc9a63-4230-4fcb-adaa-cdd51c9bea88", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns \\\n", + "0 293 66 1 30 29 14 1 293 66 1 30 \n", + "1 315 81 7 24 38 39 14 3449 835 69 321 \n", + "2 479 130 18 66 72 76 3 1624 457 63 224 \n", + "3 496 141 20 65 78 37 11 5628 1575 225 828 \n", + "4 321 87 10 39 42 30 2 396 101 12 48 \n", + "\n", + " CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague \n", + "0 29 14 A E 446 33 20 NaN A \n", + "1 414 375 N W 632 43 10 475.0 N \n", + "2 266 263 A W 880 82 14 480.0 A \n", + "3 838 354 N E 200 11 3 500.0 N \n", + "4 46 33 N E 805 40 4 91.5 N " + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show the first five rows - with all columns\n", + "Hitters.iloc[0:5,:]" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "ad677e67-c636-419c-b4b7-28528f0a48d5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " count mean std min 25% 50% 75% \\\n", + "AtBat 322.0 380.928571 153.404981 16.0 255.25 379.5 512.00 \n", + "Hits 322.0 101.024845 46.454741 1.0 64.00 96.0 137.00 \n", + "HmRun 322.0 10.770186 8.709037 0.0 4.00 8.0 16.00 \n", + "Runs 322.0 50.909938 26.024095 0.0 30.25 48.0 69.00 \n", + "RBI 322.0 48.027950 26.166895 0.0 28.00 44.0 64.75 \n", + "Walks 322.0 38.742236 21.639327 0.0 22.00 35.0 53.00 \n", + "Years 322.0 7.444099 4.926087 1.0 4.00 6.0 11.00 \n", + "CAtBat 322.0 2648.683230 2324.205870 19.0 816.75 1928.0 3924.25 \n", + "CHits 322.0 717.571429 654.472627 4.0 209.00 508.0 1059.25 \n", + "CHmRun 322.0 69.490683 86.266061 0.0 14.00 37.5 90.00 \n", + "CRuns 322.0 358.795031 334.105886 1.0 100.25 247.0 526.25 \n", + "CRBI 322.0 330.118012 333.219617 0.0 88.75 220.5 426.25 \n", + "CWalks 322.0 260.239130 267.058085 0.0 67.25 170.5 339.25 \n", + "PutOuts 322.0 288.937888 280.704614 0.0 109.25 212.0 325.00 \n", + "Assists 322.0 106.913043 136.854876 0.0 7.00 39.5 166.00 \n", + "Errors 322.0 8.040373 6.368359 0.0 3.00 6.0 11.00 \n", + "Salary 263.0 535.925882 451.118681 67.5 190.00 425.0 750.00 \n", + "\n", + " max \n", + "AtBat 687.0 \n", + "Hits 238.0 \n", + "HmRun 40.0 \n", + "Runs 130.0 \n", + "RBI 121.0 \n", + "Walks 105.0 \n", + "Years 24.0 \n", + "CAtBat 14053.0 \n", + "CHits 4256.0 \n", + "CHmRun 548.0 \n", + "CRuns 2165.0 \n", + "CRBI 1659.0 \n", + "CWalks 1566.0 \n", + "PutOuts 1378.0 \n", + "Assists 492.0 \n", + "Errors 32.0 \n", + "Salary 2460.0 " + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Hitters.describe().T # Summary statistics of the numeric columns; \".T\" transposes the table" + ] + }, + { + "cell_type": "markdown", + "id": "007b1da0-e486-4ebc-8456-805a9279cb43", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Missing data" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "1a98ddbd-dbf7-4c0f-b6f5-ce7ea35c6758", + "metadata": {}, 
+ "outputs": [ + { + "data": { + "text/plain": [ + "AtBat 0\n", + "Hits 0\n", + "HmRun 0\n", + "Runs 0\n", + "RBI 0\n", + "Walks 0\n", + "Years 0\n", + "CAtBat 0\n", + "CHits 0\n", + "CHmRun 0\n", + "CRuns 0\n", + "CRBI 0\n", + "CWalks 0\n", + "League 0\n", + "Division 0\n", + "PutOuts 0\n", + "Assists 0\n", + "Errors 0\n", + "Salary 59\n", + "NewLeague 0\n", + "dtype: int64" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Hitters.isna().sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "74dbea88-4a5e-46c6-9cd1-e998eaedf713", + "metadata": {}, + "outputs": [], + "source": [ + "# Remove missing values\n", + "Hitters = Hitters.dropna()" + ] + }, + { + "cell_type": "markdown", + "id": "dabc4d47-cb53-4207-9865-09e50e08ad20", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Mixing data" + ] + }, + { + "cell_type": "markdown", + "id": "c228dc64-2be3-40b8-a906-eba43fd91dd9", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Generating further example dataframes:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "867d91d2-e98e-4dba-ab17-9f16cbc781dd", + "metadata": {}, + "outputs": [], + "source": [ + "df1 = pd.DataFrame(\n", + " {\n", + " \"A\": [\"A0\", \"A1\", \"A2\", \"A3\"],\n", + " \"B\": [\"B0\", \"B1\", \"B2\", \"B3\"],\n", + " \"C\": [\"C0\", \"C1\", \"C2\", \"C3\"],\n", + " \"D\": [\"D0\", \"D1\", \"D2\", \"D3\"],\n", + " },\n", + " index=[0, 1, 2, 3],\n", + ")\n", + "\n", + "df2 = pd.DataFrame(\n", + " {\n", + " \"A\": [\"A4\", \"A5\", \"A6\", \"A7\"],\n", + " \"B\": [\"B4\", \"B5\", \"B6\", \"B7\"],\n", + " \"C\": [\"C4\", \"C5\", \"C6\", \"C7\"],\n", + " \"D\": [\"D4\", \"D5\", \"D6\", \"D7\"],\n", + " },\n", + " index=[4, 5, 6, 7],\n", + ")\n", + "\n", + "df3 = pd.DataFrame(\n", + " {\n", + " \"A\": [\"A4\", \"A5\", \"A6\", \"A7\"],\n", + " \"B\": [\"B4\", \"B5\", \"B6\", \"B7\"],\n", + " \"C\": [\"C4\", 
\"C5\", \"C6\", \"C7\"],\n", + " \"D\": [\"D4\", \"D5\", \"D6\", \"D7\"],\n", + " },\n", + " index=[0, 1, 2, 3],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "b9a10a38-c821-47b8-a89a-0518c9a847f2", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Concat" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "f61c1d99-33de-4517-9261-eaf4c77eca00", + "metadata": {}, + "outputs": [], + "source": [ + "df12_concat = pd.concat([df1,df2])" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "0b0b9b10-8987-43c2-8e20-392efe1dce6c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " A B C D\n", + "0 A0 B0 C0 D0\n", + "1 A1 B1 C1 D1\n", + "2 A2 B2 C2 D2\n", + "3 A3 B3 C3 D3\n", + "4 A4 B4 C4 D4\n", + "5 A5 B5 C5 D5\n", + "6 A6 B6 C6 D6\n", + "7 A7 B7 C7 D7" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df12_concat" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "ce61448c-e470-4ed6-909d-3a22eea1a15c", + "metadata": {}, + "outputs": [], + "source": [ + "df12_concat = pd.concat([df1,df3]) # Keeps the original index labels, so labels 0-3 appear twice" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "060f38c5-b1d8-42a6-90ad-f07a2b2bea5b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " A B C D\n", + "0 A0 B0 C0 D0\n", + "1 A1 B1 C1 D1\n", + "2 A2 B2 C2 D2\n", + "3 A3 B3 C3 D3\n", + "0 A4 B4 C4 D4\n", + "1 A5 B5 C5 D5\n", + "2 A6 B6 C6 D6\n", + "3 A7 B7 C7 D7" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df12_concat" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "578c864a-3762-4cc6-994e-8a15fd69385a", + "metadata": {}, + "outputs": [], + "source": [ + "df12_concat = pd.concat([df1,df3], ignore_index=True) # Discards the original index labels and creates a new index 0-7" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "0be1c8bf-9a8c-46a6-a191-897c2373c66d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " A B C D\n", + "0 A0 B0 C0 D0\n", + "1 A1 B1 C1 D1\n", + "2 A2 B2 C2 D2\n", + "3 A3 B3 C3 D3\n", + "4 A4 B4 C4 D4\n", + "5 A5 B5 C5 D5\n", + "6 A6 B6 C6 D6\n", + "7 A7 B7 C7 D7" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df12_concat" + ] + }, + { + "cell_type": "markdown", + "id": "5e5b0d7d-c955-4139-b8bd-16babaf3063f", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Join" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "ac1ff4bc-4de4-4d76-9b41-4f6bca2c06de", + "metadata": {}, + "outputs": [], + "source": [ + "df12_join = pd.concat([df1,df2], axis=1) # Side by side, aligned on the index; the default join='outer' keeps all index labels and fills gaps with NaN" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "id": "e229063e-3cd1-4727-8819-6f05615181b4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " A B C D A B C D\n", + "0 A0 B0 C0 D0 NaN NaN NaN NaN\n", + "1 A1 B1 C1 D1 NaN NaN NaN NaN\n", + "2 A2 B2 C2 D2 NaN NaN NaN NaN\n", + "3 A3 B3 C3 D3 NaN NaN NaN NaN\n", + "4 NaN NaN NaN NaN A4 B4 C4 D4\n", + "5 NaN NaN NaN NaN A5 B5 C5 D5\n", + "6 NaN NaN NaN NaN A6 B6 C6 D6\n", + "7 NaN NaN NaN NaN A7 B7 C7 D7" + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df12_join" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "3883d8f6-3cb7-4f39-957b-637e96b73959", + "metadata": {}, + "outputs": [], + "source": [ + "df13_join = pd.concat([df1,df3], axis=1).reindex(df1.index) # Careful: rows are matched by index label (A0 is paired with A4), and the column names A-D now appear twice" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "id": "841c6579-e5d8-48ad-894c-61727b409608", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " A B C D A B C D\n", + "0 A0 B0 C0 D0 A4 B4 C4 D4\n", + "1 A1 B1 C1 D1 A5 B5 C5 D5\n", + "2 A2 B2 C2 D2 A6 B6 C6 D6\n", + "3 A3 B3 C3 D3 A7 B7 C7 D7" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df13_join" + ] + }, + { + "cell_type": "raw", + "id": "553b8a38-4964-42f8-a9ca-977bfc6d432a", + "metadata": { + "tags": [] + }, + "source": [ + "\\newpage" + ] + }, + { + "cell_type": "markdown", + "id": "9dcd16c6-6048-417d-8a5e-a68d47ddc310", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "# Exercise 3: Creating Basic Variables" + ] + }, + { + "cell_type": "markdown", + "id": "d22cafc7-25de-4eed-909c-b5ef9b801405", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Create Vector" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "id": "f170d09f-15c4-460d-93ce-b358582c7a66", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a vector as a row\n", + "vector_row = np.array([1, 2, 3])\n", + "\n", + "# Create a vector as a column\n", + "vector_column = np.array([[1],\n", + " [2],\n", + " [3]])" + ] + }, + { + "cell_type": "markdown", + "id": "fbe96fc5-6e2d-409e-98b7-73c7a1440e0d", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Create Matrix" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "id": "25d87408-0ca1-47f9-b6ec-7ee8748792d6", + "metadata": {}, + "outputs": [], + "source": [ + "ma = np.array([[1, 0],\n", + " [0, 1]])\n", + "\n", + "mb = np.array([[4, 1],\n", + " [2, 2]])" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "id": "afba2f7c-3d58-4a77-a22c-90980466de93", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[4, 1],\n", + " [2, 2]])" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.matmul(ma, mb)" + ] + }, + { + "cell_type": "markdown", + "id": 
"011a1392-0900-4f26-825a-3e7e10344fa1", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Generating sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "id": "d087fc96-4c38-4f28-a89a-460df815e1ba", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "num_seq = range(5, 11)\n", + "len(num_seq)" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "id": "0024c143-ee04-4534-b472-bdbe9cdfdf69", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "5\n", + "6\n", + "7\n", + "8\n", + "9\n", + "10\n" + ] + } + ], + "source": [ + "for number in num_seq:\n", + " print(number)" + ] + }, + { + "cell_type": "markdown", + "id": "976f9ab1-a71b-4ede-90a1-cf2deff1e3e3", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Simulate random variables" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "id": "c8c7f9f4-c9c7-48a8-baad-7e5fdb23cafe", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[3, 76, 75]" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import random\n", + "n_random = random.sample(range(1, 100), 3)\n", + "n_random" + ] + }, + { + "cell_type": "markdown", + "id": "0fe105b3-6952-4de8-ab69-baa855870bc7", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Writing functions" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "id": "2b1f8c38-a59d-4235-9532-c6a238326b69", + "metadata": {}, + "outputs": [], + "source": [ + "def my_function():\n", + " print(\"Hello from a function\") " + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "69822132-e851-401b-98ee-6427d97afcf4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Hello from a function\n", + 
"None\n" + ] + } + ], + "source": [ + "print(my_function())" + ] + }, + { + "cell_type": "raw", + "id": "3ed9a3a1-aa37-4e50-b3d9-e1a11e002e3d", + "metadata": { + "tags": [] + }, + "source": [ + "\\newpage" + ] + }, + { + "cell_type": "markdown", + "id": "a7fef820-5cbe-4045-b70c-5c20fda1dcd9", + "metadata": { + "user_expressions": [] + }, + "source": [ + "# Exercise 4: Operators, control flow statements and loops" + ] + }, + { + "cell_type": "markdown", + "id": "5563695d-663f-43b4-9446-6a856a2bb3ac", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Comparison of boolean (logical) operators" + ] + }, + { + "cell_type": "markdown", + "id": "1a04d098-e465-4464-85dd-206923bf0912", + "metadata": { + "user_expressions": [] + }, + "source": [ + "In Python, boolean comparisons can only result in two logical values: `True` and `False`. These values represent the outcome of logical operations and comparisons in Python. \\\n", + "\n", + " - `==` means “is equal”. \\\n", + " - The statement x == a framed as a question means: “Does the value of x equal the value\n", + "of a?” \\\n", + " - `!=` means “not equal”. \\ \n", + " - The statement x == b means: “Does the value of x not equal the value of b?” \\\n", + " - `<` means “less than”. \\\n", + " - The statement x < c means: “Is the value of x less than the value of c?” \\\n", + " - `<=` means “less than or equal”. \\\n", + " - The statement x <= d means: “Is the value of x less or equal to the value of d?” \\\n", + " - `>` means “greater than”. \\\n", + " - The statement x > e means: “Is the value of x greater than the value of e?” \\\n", + " - `>=` means “greater than or equal”. 
\\\n", + " - The statement `x >= f` means: “Is the value of x greater than or equal to the value of f?” \\" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "id": "0b39b653-4b15-44af-9552-f93a73e27cfc", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Example: comparison operators\n", + "x = 6\n", + "y = 5\n", + "\n", + "x == y" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "c0579e4b-0bcb-4104-8c10-95283a20404e", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n" + ] + } + ], + "source": [ + "output = x >= y\n", + "\n", + "print(output)" + ] + }, + { + "cell_type": "markdown", + "id": "d1fa310d-49e4-45bd-b1e5-403fa4ed1b4d", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Control flow statements - `if`, `elif` & `else`" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "id": "edf56d58-d554-49ed-828c-865a80aeca73", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "skipping\n" + ] + } + ], + "source": [ + "%%script echo skipping\n", + "# If-Else Statements: Test whether a certain condition is satisfied; then do something\n", + "\n", + "if condition:\n", + " # body of if statement\n", + "else:\n", + " # body of else statement" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "id": "a7bc8fff-76a2-4352-b892-7405b55ef9fb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x is larger than y\n" + ] + } + ], + "source": [ + "# print() returns None, so its result is not stored in a variable\n", + "if x > y:\n", + "    print(\"x is larger than y\")\n", + "elif x == y:\n", + "    print(\"x is equal to y\")\n", + "else:\n", + "    print(\"x is smaller than y\")" + ] + }, + { + "cell_type": "markdown", + "id": 
"c54d7530-f2b6-4e4b-8722-242454880246", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Loops - `for`" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "id": "1b87d51b-f280-4761-97d4-b63b44d2d3d9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]\n" + ] + } + ], + "source": [ + "x = []\n", + "n = 10\n", + "\n", + "# Construct a list with n entries: 1**2, 2**2, 3**2, ..., n**2\n", + "# A for loop repeats the indented block once for every value of i\n", + "for i in range(1, n+1): # range() excludes its stop value, so n+1 is needed to include n\n", + "    x.append(i**2)\n", + "\n", + "print(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "id": "cad6df35-3186-4af3-9af3-e8eb4758b1dd", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ 1 4 9 16 25 36 49 64 81 100]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "# Shortcut (vectorized, no loop needed):\n", + "x2 = np.arange(1, n+1)**2\n", + "\n", + "# Alternative (returns floats instead of integers):\n", + "#x2 = np.linspace(start=1, stop=n, num=n)**2\n", + "\n", + "print(x2)" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "id": "13f870b5-3750-40e4-bf76-3a276abb8249", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ 1.62434536 -0.61175641 -0.52817175 -1.07296862 0.86540763]\n" + ] + } + ], + "source": [ + "# Example: Use loop to find negative values in a vector\n", + "## Simulate random variables\n", + "\n", + "np.random.seed(1) # Fix the random seed for reproducibility\n", + "\n", + "x = np.random.normal(size=5)\n", + "\n", + "print(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "id": "ee59c84d-15ab-4d5a-bb97-7ceb63d8392d", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[False, True, True, True, False]\n" + ] + } + ], + "source": [ + 
"# Initialize an empty list for storing boolean values\n", + "negative = []\n", + "\n", + "# Loop through each element in the array x\n", + "for i in range(len(x)):\n", + " if x[i] < 0:\n", + " negative.append(True)\n", + " else:\n", + " negative.append(False)\n", + "\n", + "print(negative)" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "id": "cf0df6cc-83a9-49c4-a2e4-86713496991c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[False True True True False]\n" + ] + } + ], + "source": [ + "# Shortcut: Vectorized operation to check for negative values\n", + "negative = x < 0\n", + "\n", + "print(negative)" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "id": "c029a4c1-83b7-40a8-8509-cacddb0d421c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1.62434536 0. 0. 0. 0.86540763]\n" + ] + } + ], + "source": [ + "# Example: Loop through each element in the array x and replace negative values with 0\n", + "for i in range(len(x)):\n", + " if x[i] < 0:\n", + " x[i] = 0\n", + "\n", + "print(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "id": "cf28ba7b-7c0f-4d4a-83ab-1518c7148f74", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1.62434536 0. 0. 0. 
0.86540763]\n" + ] + } + ], + "source": [ + "# Shortcut: Replace negative values in x with 0 using a vectorized operation\n", + "x[x < 0] = 0\n", + "\n", + "print(x)" + ] + }, + { + "cell_type": "markdown", + "id": "512d7075-1df8-4342-9cfc-5d0909266ce6", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "# Extra" + ] + }, + { + "cell_type": "markdown", + "id": "ebc4f9f5-18a0-430c-afe6-e02df64f1d57", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Hiding warning messages" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "id": "daa7d41d-a026-467c-8121-4665416f2871", + "metadata": {}, + "outputs": [], + "source": [ + "import warnings # import the built-in warnings module\n", + "\n", + "# To hide all warning messages, run the following line before the code that triggers the warning.\n", + "# The filter stays active for the rest of the kernel session, not just the current cell.\n", + "warnings.filterwarnings('ignore')\n", + "\n", + "# Alternatively, to hide only warnings of a specific category, pass the category parameter.\n", + "warnings.filterwarnings('ignore', category=UserWarning)" + ] + }, + { + "cell_type": "markdown", + "id": "5031ec8a-8c06-49d5-a67f-70f210a115e8", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "**Common warning categories** (built-in classes that can be passed as `category`)\n", + "\n", + "- `DeprecationWarning`\n", + "- `UserWarning`\n", + "- `SyntaxWarning`\n", + "- `RuntimeWarning`\n", + "- ..." + ] + }, + { + "cell_type": "markdown", + "id": "1de88e92-f7e4-42ee-b7f8-aee547006217", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Suppressing warnings can make notebook output easier to read, but it can also hide important information, for example that a function you rely on is deprecated. Use this approach with caution and only when you are confident that the warnings can be safely ignored."
+ ] + } + ], + "metadata": { + "date": " ", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + }, + "title": " ", + "toc-autonumbering": false, + "toc-showcode": false, + "toc-showmarkdowntxt": false, + "toc-showtags": false + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Machine Learning for Economics and Finance/python-intro/python-intro.pdf b/Machine Learning for Economics and Finance/python-intro/python-intro.pdf new file mode 100755 index 0000000..36a8fdc Binary files /dev/null and b/Machine Learning for Economics and Finance/python-intro/python-intro.pdf differ