Obviously, we knew beforehand that we should not try to divide 1 by 0.
The program crashed when we did so anyway - but Python gives us more information about what happened and where. This is called a ***traceback***. The traceback can also span multiple function calls or classes to guide us to where the error occurred.
Python uses special objects called *exceptions*. There are many pre-defined exceptions (and we can define our own) that help the user not only to understand what is happening, but also to handle exceptions in the code.
In our case here, we get the special exception: ```ZeroDivisionError``` with the added explanation: ```division by zero```.
## Intercepting Exceptions
If we suspect that an operation might fail, we can use a try-except clause to prevent a crash.
The general syntax is
```
try:
    # our statement here
except <exception we want to catch>:
    # how to handle the error
except <other exceptions>:
    # other recovery procedures
else:
    # if we need to do something when everything worked
finally:
    # whatever we need to do after the try-except clause completes
```
When we define a try-except block, we should catch at least one exception. If we need more elaborate error handling, we can catch several exceptions of different types and act accordingly.
For the example above, we could try: \
(```\n``` creates a new line)
%% Cell type:code id: tags:
``` python
try:
    print(1/0)
except ZeroDivisionError as e:
    print('Do not try to divide by zero! \n Caught exception: {}'.format(e))
```
%% Output
Do not try to divide by zero!
Caught exception: division by zero
%% Cell type:markdown id: tags:
Here, we intercept the specific exception ```ZeroDivisionError``` and just print out a warning. Note that the program no longer crashes: the exception is raised, we intercept it, and do something with it. In a more elaborate program, we would then do some error handling and either continue with the code, or end the program in a controlled way if we find that we cannot continue in a meaningful way.
Note that we bound the intercepted exception to a variable: ```as e```. \
This allows us to work with and inspect this exception. Here we just print it and obtain the same information we had in the traceback earlier.
If we want to catch several types of exception and treat them all in the same way, we can list them as a tuple in a single ```except``` clause, as in this sketch:
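``` python
try:
    print(1/0)
except (ZeroDivisionError, TypeError) as e:
    # one handler for several exception types
    print('Caught exception: {}'.format(e))
```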
## Raising Exceptions
In many cases, we come to a point where we need to let the user know that a situation occurred that should not have occurred. In the simplest case, we could ```print``` an error message and then terminate the program. \
However, it would be better to pass an exception to the upstream code (e.g. the code calling a function or using a class) so they can inspect the error, handle it - or let the program crash if they do not.
To raise an exception, we use the keyword ```raise```.
Python already has a comprehensive list of [pre-defined exceptions](https://docs.python.org/3/library/exceptions.html#concrete-exceptions). If necessary, we can define our own as well.
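Defining our own is a matter of deriving from an existing exception class; a minimal sketch (the class name here is made up for illustration):
``` python
# a custom exception: simply a class derived from an existing exception type
class NotAnIntegerError(TypeError):
    """Raised when an argument is not an integer."""
    pass
```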
***Example***
We define a function that divides two integer numbers. If the numbers are not integers, we raise an exception.
%% Cell type:code id: tags:
``` python
def my_divide(a, b):
    if type(a) is not int or type(b) is not int:
        raise TypeError('At least one of the arguments is not an integer')
    return a/b

result = my_divide('three', 5)
```
%% Output
---> 11 result = my_divide('three',5)
----> 3     raise TypeError('At least one of the arguments is not an integer')
      4     return a/b
TypeError: At least one of the arguments is not an integer
%% Cell type:markdown id: tags:
Here, we can see a more elaborate traceback.
We note:
* We have augmented the pre-defined exception ```TypeError``` with an optional message that we can pass along to give more information about the exception and why we have raised it. It is a ```TypeError``` because we only want our function to work on specific types, and one of the arguments is not of the required type.
* The traceback shows the line that led to the error
```
---> 11 result = my_divide('three',5)
```
and then the line where the error actually occurred. If more functions or classes were involved, we would have a longer traceback, going through all the steps.
We now use the try-except clause to intercept this case:
%% Cell type:code id: tags:
``` python
def my_divide(a, b):
    if type(a) is not int or type(b) is not int:
        raise TypeError('At least one of the arguments is not an integer')
    return a/b

# Avoid the crash:
try:
    new_result = my_divide('three', 5)
    print('The result is: {}'.format(new_result))
except TypeError as e:
    print('Caught an exception: {}'.format(e))

print('The result is: {}'.format(new_result))
```
%% Output
Caught an exception: At least one of the arguments is not an integer
---> 14 print('The result is: {}'.format(new_result))
NameError: name 'new_result' is not defined
%% Cell type:markdown id: tags:
Now we note that ```new_result``` is not defined in the final ```print```: the exception was raised before the assignment inside the ```try``` block could complete, so the variable was never created.
(Note: if we had used ```result```, it would not have caused an error because we have used the variable ```result``` in an earlier cell of the notebook. Since the notebook keeps its state, we would re-use that variable. As mentioned before, Jupyter notebooks are convenient - but very prone to side-effects and subtle bugs.)
We should either create the variable beforehand (as below), or stop the program gracefully if we intend to continue with the result but cannot obtain it.
%% Cell type:code id: tags:
``` python
result = 0
try:
    result = my_divide('three', 5)
    print('The result is: {}'.format(result))
except TypeError as e:
    print('Caught an exception: {}'.format(e))

print('The result is: {}'.format(result))
```
%% Output
Caught an exception: At least one of the arguments is not an integer
The result is: 0
%% Cell type:markdown id: tags:
Now ```result``` is defined - but we need to make sure that the default value is useful. Since we now intercept the exception, the code no longer stops. If ```result = 0``` is not useful, we need to handle this case as well...
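One common pattern - a sketch, using ```None``` as a sentinel default - is to check afterwards whether we obtained a usable result, and to stop gracefully if not:
``` python
result = None
try:
    result = my_divide('three', 5)
except TypeError as e:
    print('Caught an exception: {}'.format(e))

if result is None:
    # we cannot continue in a meaningful way - stop gracefully
    raise SystemExit('No valid result - stopping.')
```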
"Add a cosinus function to the plot such that the plot looks like below afterwards.\n",
"Add a cosine function to the plot such that the plot looks like below afterwards.\n",
"\n",
""
]
...
...
%% Cell type:markdown id: tags:
# Working with Modules
All the code we have seen so far was self-contained in the sense that all relevant functionality was defined within the file / Jupyter Notebook.
While it is great to have everything in one place - it clearly does not scale.
As soon as we have more complex code, the notebooks become very long and it becomes harder and harder to keep track of what was defined where.
It also means the code can only be re-used with some difficulty - we would have to copy & paste code from one place to another, and so on. Once we find a bug, we would have to fix it everywhere (or, more likely, forget to do so).
Python supports "modules", which are essentially files ending in ```.py``` that hold functions, classes, generators, decorators, etc., which we can then use and re-use in many other pieces of code or in Jupyter Notebooks.
> **Best Practice**: Code-refactoring
>
> When we have developed a useful function, class, etc. it is a good practice to put this into its own separate file.
> This allows us to use it more widely, makes it easier to debug, and keeps the code clutter-free. It also makes it easier to clean up the code and make it more readable, more efficient, ...
>
> This is part of *code refactoring*: Improving and cleaning up the code without adding new functionality.
Once we have defined a module (put the functions, classes, etc. into a file), we can make this usable in other code with the ```import``` statement.
This can be done in multiple ways. If our code is in a file named ```my_module.py```, we can do:
* ```import my_module```: This makes all functions, classes, etc. available in our current code. To access them, we need to prepend ```my_module.``` to any function, etc.
* ```import my_module as m```: The above can be a bit cumbersome, so we can introduce a new name (e.g. an abbreviation) with the keyword ```as```. This changes the name under which we access the contents of the module: instead of ```my_module.```, we prepend the new name ```m.```.
* ```from my_module import my_func```: If we want to make only specific functions or classes available, we can import them explicitly. Functions, classes, etc. imported this way do not need the module name prepended; we say that we have imported them into our current *namespace*.
> **Bad practice** - antipattern
>
> The last way of importing also allows a generic statement that imports *all* functions, classes, etc.:
>
> ```from my_module import *```
>
> This imports everything (except those with a leading underscore).
> However, it then becomes very tricky very quickly to remember where each function or class was originally defined, or, worse, if functions or classes have the same name in different modules, which one we are actually using.
> Therefore, we should refrain from using this construct at all.
***Example***:
Suppose we have moved the code to compute the Fibonacci series into a module called ```fibonacci.py```.
We can then use any of the following to make the code available:
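(Here, ```fibonacci_list``` is just an assumed name for a function the module might contain.)
``` python
import fibonacci                        # access as fibonacci.fibonacci_list(...)
import fibonacci as fib                 # access as fib.fibonacci_list(...)
from fibonacci import fibonacci_list    # access as fibonacci_list(...)
```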
One advantage of Python is its rich ecosystem, with many modules made available publicly as open source. Many packages can be obtained from the central [Python Package Index](https://pypi.org/). At the moment, there are more than 400,000 projects, with modules for scientific computing, machine learning, deep learning, statistical analysis, visualisation of data, image processing, language processing, web development, working with databases, and many many more.
As a data scientist, the most generic and typical "toolbelt" includes (at least):
* [NumPy](https://numpy.org): Efficient computations on large, multi-dimensional matrices, and a large collection of high-level mathematical functions
* [Pandas](https://pandas.pydata.org): Powerful toolkit for the analysis and handling of structured data (i.e. data organised in tables) and time-series
* [Matplotlib](https://matplotlib.org): Visualisation in Python
* [Seaborn](https://seaborn.pydata.org): Statistical data visualisation. Built on top of matplotlib to extend its functionality and to provide a more intuitive interface than matplotlib, with defaults that are visually pleasing.
* [PyTorch](https://pytorch.org) and [TensorFlow](https://www.tensorflow.org): Deep Learning and neural networks, developed by Facebook/Meta (PyTorch) and Google/Alphabet (TensorFlow). Each has a rich (open-source) community with many specialised packages that augment and extend them.
## NumPy
Among many other things, NumPy provides:
* common [statistical functions](https://numpy.org/doc/stable/reference/routines.statistics.html) (e.g. mean, etc.)
* [array and matrix](https://numpy.org/doc/stable/reference/arrays.html) operations
The latter is particularly important when we move to machine or deep learning as this is the foundation of storing and manipulating data.
> **Note**
>
> NumPy is heavily optimized for efficient and fast data handling and bulk operations.
>
> You can often gain significant speed boosts by implementing operations with NumPy functionality rather than building the functionality yourself. It is worth spending some time getting familiar with NumPy and getting used to "offloading" computations to it.
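As an illustration, a minimal (hypothetical) benchmark might look like this; the exact timings will vary from machine to machine:
``` python
import timeit

import numpy as np

values = list(range(1_000_000))
array = np.array(values)

# pure-Python loop: every element is handled by the interpreter
loop_time = timeit.timeit(lambda: sum(v * v for v in values), number=10)

# NumPy bulk ("vectorised") operation: the loop runs in optimised C code
numpy_time = timeit.timeit(lambda: np.sum(array * array), number=10)

print('Python loop: {:.2f}s, NumPy: {:.2f}s'.format(loop_time, numpy_time))
```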
We typically include NumPy in our code with:
```import numpy as np```
> **Note**
>
> Although ```np``` is an arbitrary abbreviation, almost everyone uses it. Whenever you see code with ```np.```, it almost always refers to NumPy functionality.
>
The most fundamental data type in NumPy is the ```array```, which can have one or more dimensions.
The ```rank``` is the number of dimensions; the dimensions are called ```axes``` in NumPy.
The ```shape``` gives the size of the array along each dimension.
*Example* \
A one-dimensional array that we initialise with a list of integer values:
%% Cell type:code id: tags:
``` python
import numpy as np
# one dimensional array
my_array = np.array([1,2,3,4,5])
print('The array is {0}, with rank (#dimensions) {1} and shape {2}'.format(my_array, my_array.ndim, my_array.shape))
# two dimensional array: a list of lists
my_array = np.array([[1,2,3], [4,5,6], [7,8,9]])
print('The array is {0}, with rank (#dimensions) {1} and shape {2}'.format(my_array, my_array.ndim, my_array.shape))
```
%% Output
The array is [1 2 3 4 5], with rank (#dimensions) 1 and shape (5,)
The array is [[1 2 3]
[4 5 6]
[7 8 9]], with rank (#dimensions) 2 and shape (3, 3)
%% Cell type:markdown id: tags:
Here we note:
* If we only have one dimension (or axis), we initialise with a single list, e.g. ```[1,2,3,4,5,6]```
* In multiple dimensions, we initialise with nested lists, one inner list per row, i.e. ```[ [row_1], [row_2], [...] ]```
We can make more complex arrays by nesting the lists further:
%% Cell type:code id: tags:
``` python
my_array = np.array([ [[1,2,3], [4,5,6]],
                      [[1,2,3], [4,5,6]],
                      [[1,2,3], [4,5,6]] ])
print('The array is {0}, with rank (#dimensions) {1} and shape {2}'.format(my_array, my_array.ndim, my_array.shape))
```
%% Output
The array is [[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]], with rank (#dimensions) 3 and shape (3, 2, 3)
%% Cell type:markdown id: tags:
Here, we see that there is now an additional dimension (axis): we repeat the block ```[[1,2,3], [4,5,6]]``` three times, and each block is a matrix with the two rows ```1,2,3``` and ```4,5,6```.
We can add more dimensions and make the "shape" more complex, as the need arises. The same concept is then used as "tensors" in deep learning.
If we need to change the shape, we can use the method ```reshape(new_dim_1, new_dim_2, ...)```:
%% Cell type:code id: tags:
``` python
my_array = my_array.reshape(2,9)
print('The array is {0}, with rank (#dimensions) {1} and shape {2}'.format(my_array, my_array.ndim, my_array.shape))
```
%% Output
The array is [[1 2 3 4 5 6 1 2 3]
[4 5 6 1 2 3 4 5 6]], with rank (#dimensions) 2 and shape (2, 9)
%% Cell type:markdown id: tags:
We note that the number of elements and the elements themselves do not change; only the way they are arranged does. Therefore, we can only reshape the array such that the number of elements fits the new shape.
Access to sub-parts of an array works similarly to the "slicing" we encountered earlier for the native Python data types.
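A small sketch (with arbitrary values) of what such slicing looks like:
``` python
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a[0, :])    # first row: [1 2 3]
print(a[:, 1])    # second column: [2 5 8]
print(a[1:, :2])  # rows 1 onwards, first two columns: [[4 5] [7 8]]
```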
***Exercise***
* print the sub-array with the negative numbers
* change the negative numbers to positive numbers. You can use ```np.abs(...)``` to compute the absolute value
* print the resulting array
%% Cell type:code id: tags:
``` python
my_array = np.array([ [[1,2,3], [4,5,6]],
                      [[-1,-2,-3], [-4,-5,-6]],
                      [[1,2,3], [4,5,6]] ])
# ... your code here ....
```
%% Cell type:markdown id: tags:
## Visualisation
The most fundamental visualisation package is [Matplotlib](https://matplotlib.org). Other packages like [Seaborn](https://seaborn.pydata.org) build on this foundation, but there are also alternatives like [Plotly](https://plotly.com/).
The most common "convention" to include matplotlib and seaborn is:
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
# In Jupyter notebooks we add the following "magic" line to make the plots appear in the notebook cells
%matplotlib inline
```
%% Cell type:markdown id: tags:
Both libraries are very complex and allow us to customise plots and visualisations in a number of ways.
There are helpful resources that summarise the options, e.g. a ["cheat-sheet" for matplotlib](https://matplotlib.org/cheatsheets/) to help remember the most frequently used options.
As an example, to create a plot that shows a simple sine oscillation and save the resulting figure, we need the following steps:
* create the $x$ and $y$ values
* create the figure as a matplotlib object
* add the data to the figure
* add labels to $x$ and $y$ axes, add a legend
* save the figure
As a first step, we need to create the $x$ values at which we want to evaluate and plot the function. We cannot use infinitely many values, so we need to decide on a range and on the number of evaluation points. The latter depends, in particular, on how much the function we want to visualise changes, and on how large the range is over which we want to show it. \
Numpy provides a convenient function to then create these points: ```np.linspace(start, stop, number_points)```
At each point in $x$, we then evaluate the function and create a corresponding $y$ value.
Rather than writing the loop ourselves (which would be slow), NumPy can take the $x$ array, apply our function to each of its values in "bulk", and return a corresponding array with the $y$ values. In NumPy jargon, these are called *vectorised* operations; they are much faster than implementing the loop ourselves.
%% Cell type:code id: tags:
``` python
x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)
```
%% Cell type:markdown id: tags:
In the next step, we set up the figure as a matplotlib object.
Especially when working with Jupyter notebooks, it is helpful to "clear" the figure (even if we think it does not exist yet) to make sure we do not accidentally work on an object that we created previously.
Then, we create a new figure. In the simplest case here, we only have one figure to show, but we can, in principle, add a grid of multiple figures. \
We already note that the matplotlib syntax is not always "immediately obvious", so we add some "boilerplate" code that you can use as a template in your own work.
After this, we plot our data, add the axis labels and a legend, and print the figure to a file.
You can find more details on [plotting options](https://matplotlib.org/2.1.1/api/_as_gen/matplotlib.pyplot.plot.html) in the matplotlib documentation.
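A minimal sketch of such a template, re-using the ```x``` and ```y``` arrays created above (the figure size and output file name are arbitrary choices):
``` python
plt.close('all')                         # clear any previously created figures
fig, ax = plt.subplots(figsize=(8, 5))   # create a new figure with a single set of axes

ax.plot(x, y, label='sin(x)')            # add the data to the figure
ax.set_xlabel('x')                       # label the axes
ax.set_ylabel('y')
ax.legend()                              # add the legend

fig.savefig('sine.png')                  # print the figure to a file
```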
***Exercise***
Add a cosine function to the plot such that the plot looks like below afterwards.
*(figure: the sine curve from above together with the added cosine curve)*
%% Cell type:markdown id: tags:
Now we do the same in Seaborn.
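A sketch of the Seaborn version, re-using ```x``` and the imports from above (the output file name is again arbitrary):
``` python
plt.close('all')
fig, ax = plt.subplots(figsize=(8, 5))

# seaborn draws onto a matplotlib axes object and places the legend automatically
sns.lineplot(x=x, y=np.sin(x), label='sin(x)', ax=ax)

ax.set_xlabel('x')   # adding the labels works exactly as in matplotlib
ax.set_ylabel('y')

fig.savefig('sine_seaborn.png')
```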
For such a relatively simple plot, it does not save us much effort; the legend is placed automatically, and a bit better, so that it does not get in the way of the data.
Adding the labels, however, works the same way.
The advantage becomes more apparent when we create more complex plots a bit later. \
To do that, we first need to look at [pandas](https://pandas.pydata.org) to work with structured data.