Commit 23c6b3ef authored by Ulrich Kerzel's avatar Ulrich Kerzel

add solution to new exercises

parent c706f193
%% Cell type:markdown id: tags:
# Computing Gradients in PyTorch
[PyTorch](https://pytorch.org/) is a comprehensive library that is primarily used for machine learning. However, it is also an effective tool for handling matrix operations and computing gradients.
For the latter in particular, we can exploit the fact that training neural networks relies on computing gradients efficiently, as this is the backbone of the training algorithms.
Therefore, if we can formulate the problem at hand in such a way that PyTorch can be applied, we can use its built-in methods to compute the gradients.
In PyTorch, this is done via [AutoGrad](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html).
In this example, we use a simple sine function: such a simple function is easy for any neural network to learn. Moreover, we can compare the result to the well-known derivative $\frac{d \sin(x)}{d x} = \cos(x)$, which makes it immediately obvious whether we have learned the correct gradient.
%% Cell type:code id: tags:
```
import torch
from torch import nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
```
%% Output
Using cpu device
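%% Cell type:markdown id: tags:
Before we generate training data, here is a minimal sketch of what AutoGrad provides (an illustration only, independent of the network trained below): for a scalar input with ```requires_grad=True```, calling ```backward()``` on $y = \sin(x)$ fills ```x.grad``` with $\frac{dy}{dx} = \cos(x)$.
%% Cell type:code id: tags:
```
# Minimal AutoGrad illustration: d sin(x)/dx = cos(x) at a single point
x = torch.tensor(1.0, requires_grad=True)
y = torch.sin(x)
y.backward()                # populates x.grad with dy/dx evaluated at x = 1.0
print(x.grad.item())        # ~0.5403
print(torch.cos(x).item())  # analytic derivative cos(1.0) for comparison
```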
%% Cell type:markdown id: tags:
## Training Data
In this simple example, we will use $f(x) = \sin(x)$ to generate training data.
First of all, the relationship is very simple, i.e. even small networks will be able to learn it quickly. Additionally, we know what the gradient looks like: $\frac{dy}{dx} = \cos(x)$, i.e. we can tell immediately whether the network has learned the correct gradient.
The function [torch.linspace](https://pytorch.org/docs/stable/generated/torch.linspace.html) is the equivalent of its NumPy counterpart but produces a tensor directly.
The part ```.view(-1,1)``` re-shapes the resulting tensor such that we have one feature: torch.linspace creates a tensor with shape (100,), i.e. a 1D tensor with 100 elements. The ```-1``` is a placeholder that tells PyTorch to infer the length automatically from the number of elements in the original tensor. The ```1``` tells PyTorch to reformat the data such that we have one feature. The resulting tensor has a shape of (100, 1), i.e. 100 rows of 1 feature each.
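As a quick check (an illustration, not part of the exercise), the effect of ```.view(-1, 1)``` on the tensor shape can be inspected directly:
%% Cell type:code id: tags:
```
# shape before and after .view(-1, 1)
t = torch.linspace(0, 2 * torch.pi, steps=100)
print(t.shape)              # torch.Size([100])
print(t.view(-1, 1).shape)  # torch.Size([100, 1])
```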
%% Cell type:markdown id: tags:
**Exercise**
Create training data ```x_train``` and ```y_train``` for a $\sin(x)$ function in the interval $x_{train} \in (0, 2\pi)$.
Plot the resulting training data.
%% Cell type:code id: tags:
```
##
## Your code here
##
```
%% Cell type:markdown id: tags:
**Solution**
%% Cell type:code id: tags:
```
# Generate training data using sin(x)
x_train = torch.linspace(0, 2 * torch.pi, steps=100, device=device).view(-1, 1)
y_train = torch.sin(x_train)
sns.lineplot(x=x_train.cpu().numpy().flatten(),
             y=y_train.cpu().numpy().flatten(), label='sin(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
## Network Definition and Training
We now define a very small neural network, for example a "shallow" network with just three fully connected layers.
- How many input nodes do we need?
- How many output nodes do we need?
Here, we need one input node, since we pass one value $x$ at a time to the network: $y = \sin(x)$.
Similarly, we only need one output node, as we want the network to predict a single number.
%% Cell type:markdown id: tags:
**Exercise**
Write a class for a small neural network with three fully-connected (linear) layers and $\tanh(x)$ as activation function.
Discuss how many input and output nodes the network needs.
%% Cell type:code id: tags:
```
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        ##
        ## your code here
        ##

    def forward(self, x):
        ##
        ## your code here
        ##
        return x

model = NeuralNetwork().to(device)
print(model)
```
%% Cell type:markdown id: tags:
**Solution**
%% Cell type:code id: tags:
```
# A simple Neural Network
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(1, 50)
        self.fc2 = nn.Linear(50, 50)
        self.fc3 = nn.Linear(50, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = F.tanh(x)
        x = self.fc2(x)
        x = F.tanh(x)
        x = self.fc3(x)
        return x

model = NeuralNetwork().to(device)
print(model)
```
%% Output
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=1, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=1, bias=True)
)
%% Cell type:code id: tags:
```
# Train the model
# Define the optimizer and loss function
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 1000
loss_history = []

for epoch in range(num_epochs):
    # Enable gradient tracking for the input points x
    x_train.requires_grad = True

    # Forward pass: predict y for the training points
    predictions = model(x_train)

    # Compute the data loss (difference from sin(x))
    # using the mean squared error as a loss function for regression
    data_loss = torch.mean((predictions - y_train) ** 2)

    # Compute the gradient dy/dx using torch.autograd.grad
    dy_train = torch.autograd.grad(
        outputs=predictions,
        inputs=x_train,
        grad_outputs=torch.ones_like(predictions),
        create_graph=True
    )[0]

    # Physics loss: enforce the known relationship dy/dx = cos(x)
    physics_loss = torch.mean((dy_train - torch.cos(x_train)) ** 2)

    # Total loss: combine data and physics losses
    total_loss = data_loss + physics_loss

    # Backward pass and optimization step
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # Record the loss
    loss_history.append(total_loss.item())

    # Print progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}/{num_epochs}, Total Loss: {total_loss.item():.6f}")
```
%% Output
Epoch 0/1000, Total Loss: 1.043527
Epoch 100/1000, Total Loss: 0.000421
Epoch 200/1000, Total Loss: 0.000122
Epoch 300/1000, Total Loss: 0.001010
Epoch 400/1000, Total Loss: 0.000005
Epoch 500/1000, Total Loss: 0.000433
Epoch 600/1000, Total Loss: 0.000004
Epoch 700/1000, Total Loss: 0.000003
Epoch 800/1000, Total Loss: 0.001428
Epoch 900/1000, Total Loss: 0.000008
%% Cell type:code id: tags:
```
sns.lineplot(loss_history, label='Training loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
# Plot the Gradient
We now check whether the network has learned the correct gradient, i.e. $\frac{dy}{dx} = \cos(x)$.
We generate some independent test points on the same domain, obtain the predictions $\hat{y}$ and plot:
- the ground truth: $y = \sin(x)$,
- the predictions $\hat{y}$,
- the predicted gradient $\frac{d\hat{y}}{dx}$.
%% Cell type:code id: tags:
```
# Prepare test data with requires_grad=True
x_test = torch.linspace(0, 2 * torch.pi, steps=200, device=device, requires_grad=True).view(-1, 1)
y_test = torch.sin(x_test)

# predictions from the trained model
y_hat = model(x_test)

# gradient of the predictions with respect to the inputs
dy_dx = torch.autograd.grad(
    outputs=y_hat,
    inputs=x_test,
    grad_outputs=torch.ones_like(y_hat),
    create_graph=True
)[0]
# detach from GPU and graph
x_test = x_test.detach().cpu().numpy().flatten()
y_test = y_test.detach().cpu().numpy().flatten()
y_hat = y_hat.detach().cpu().numpy().flatten()
dy_dx = dy_dx.detach().cpu().numpy().flatten()
# Plot predictions and gradients
sns.lineplot(x=x_test, y=y_test, label='sin(x) (Ground Truth)')
sns.lineplot(x=x_test, y=y_hat, label='Prediction')
sns.lineplot(x=x_test, y=dy_dx, label='Predicted Gradient')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
# Newton's Method for Root Finding
In many applications, we need to find the root of a function, i.e. the point where the function crosses the $x$-axis: $f(x_r) = 0$.
A variety of methods exist for this problem; in this example we use Newton's method. The general idea is the following:
We start at some point, our initial guess $x_0$. Then, we calculate the value of the function $f(x_n)$ at the point $x_n$ (starting from the initial guess), as well as the derivative $f'(x_n)$. The derivative is the slope of the tangent line to the function $f(x)$ at the point $x_n$:
$$ y = f(x_n) + f'(x_n)(x-x_n)$$
We now want to find the point where the tangent line intersects with the $x$-axis, i.e. we set $y=0$, leading to:
$$f'(x_n)(x-x_n) = -f(x_n)$$
Assuming $f'(x_n) \neq 0$, we can divide both sides by $f'(x_n)$, solve for $x$ and then iterate.
More concisely, the overall approach is:
1. Choose an initial guess $ x_0 $.
2. Iterate using the formula:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
where $ n = 0, 1, 2, \ldots $
The process continues until the difference between successive approximations is less than a predetermined tolerance level or until a maximum number of iterations is reached.
Note that if our initial guess $x_0$ is not suitable, the method may not converge.
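As a minimal, self-contained sketch of this iteration (using the example function $f(x) = \cos(x) - x$ treated below together with its hand-coded derivative $f'(x) = -\sin(x) - 1$; the autograd-based version follows):
%% Cell type:code id: tags:
```
import math

def newton(f, df, x0, tol=1e-6, max_iter=100):
    """Plain Newton iteration with an analytically supplied derivative df."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x = x - step
        if abs(step) < tol:   # stop once successive iterates barely change
            break
    return x

root = newton(lambda x: math.cos(x) - x,
              lambda x: -math.sin(x) - 1.0,
              x0=0.1)
print(root)  # approximately 0.739085
```
%% Cell type:markdown id: tags: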
One of the underrated features of modern deep-learning frameworks is automatic differentiation. In "conventional" deep learning, we use it as a tool behind the scenes to train a neural network and do not really interact with it. However, this method is useful in a range of applications, such as physics-informed neural networks or, indeed, this example of finding the root of a function efficiently.
While we perceive deep-learning frameworks such as [PyTorch](https://pytorch.org/) or [TensorFlow](https://www.tensorflow.org/) primarily as libraries for deep learning (and we do indeed use them for this purpose), they are, essentially, heavily optimised libraries for matrix operations and the numerical handling of equations that can, in addition, leverage the computational power of GPUs.
Note that while we would ideally work with functions whose derivative we can calculate analytically, this is not necessary.
We will use the example of a conic steel vessel discussed in the lecture "Numerical Models in Processing" by [PD Dr. W. Lenz](https://www.iob.rwth-aachen.de/habilitation-von-dr-wolfgang-lenz/). In this example, a numerical solution is derived which we will use as starting point.
First, we will start with a motivating generic example to get familiar with the method and general code structure before then turning to the concrete example.
%% Cell type:code id: tags:
```
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
from datetime import datetime
```
%% Cell type:markdown id: tags:
## General Example
We start with a generic example using the function
$f(x) = \cos(x) -x$.
First, we plot the function.
Note that we directly use [torch.tensor](https://pytorch.org/docs/stable/tensors.html) as we will later on use the automatic differentiation to implement Newton's method for finding roots.
%% Cell type:code id: tags:
```
def f(x):
    return torch.cos(x) - x
```
%% Cell type:markdown id: tags:
Let's first make a plot of this function.
Assuming that we already know that the root of the function is at $x \approx 0.739$, we add a vertical line to indicate this root.
%% Cell type:code id: tags:
```
x_space = np.linspace(-10, 10, 500)
y_space = f(torch.tensor(x_space))
sns.lineplot(x=x_space, y=y_space, label='f(x)')
plt.axhline(y=0, color='black', linestyle='--', label='y=0')
# we use the value found below for illustration.
plt.axvline(x=0.739, color='red', linestyle='--', label='x=0.739')
plt.title('Plot of f(x) = cos(x) - x')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.legend()
plt.show()
```
%% Output
%% Cell type:code id: tags:
```
x = torch.tensor([0.1], requires_grad=True)
tolerance = 1e-6
max_iterations = 100
t_start = datetime.now()
for i in range(max_iterations):
    y = f(x)
    y.backward()
    with torch.no_grad():
        # Newton update, written as an out-of-place operation: x_new = x - f(x)/f'(x)
        x_new = x - y / x.grad
        if torch.abs(x_new - x).item() < tolerance:  # .item() converts the tensor to a Python number
            t_stop = datetime.now()
            print(f'Converged after {i+1} iterations.')
            print(f'Time taken: {t_stop - t_start}')
            break
    x = x_new.clone().detach().requires_grad_(True)  # Create a new tensor with gradient tracking enabled

print(f'Root approximated at x = {x.item()}')
print(f'Function value at root approximation f(x) = {f(x).item()}')
```
%% Output
Converged after 5 iterations.
Time taken: 0:00:00.017358
Root approximated at x = 0.7390851378440857
Function value at root approximation f(x) = 0.0
%% Cell type:markdown id: tags:
## With an Optimiser
In the above code, we have implemented Newton's method directly.
However, modern deep-learning packages include powerful optimisers that perform the calculation of the gradient, as well as the subsequent updates of the parameters.
As an exercise, re-write the code to use the [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) optimiser.
*Hint*: You need to think of a suitable loss function.
*Note*: Depending on the problem at hand, using an optimiser and loss function may (or may not) improve convergence. You may find that the standard approach works sufficiently well for your problem.
%% Cell type:markdown id: tags:
**Exercise**
Modify the above code to use the Adam optimiser
%% Cell type:code id: tags:
```
##
## Your code goes here
##
```
%% Cell type:markdown id: tags:
**Solution**
%% Cell type:code id: tags:
```
# Optimization using the Adam optimizer
x = torch.tensor([0.1], requires_grad=True, dtype=torch.float64)
tolerance = 1e-6
max_iterations = 10000
eps = 0.001  # small value to avoid dividing by zero

# Initialize the Adam optimizer with a small learning rate
optimizer = torch.optim.Adam([x], lr=0.001)

t_start = datetime.now()
for i in range(max_iterations):
    if (i % 10 == 0 or i < 10):
        print(f'Iteration {i+1}, x = {x.item()}')

    optimizer.zero_grad()  # Clear gradients

    # Evaluate the function at the current estimate
    y = f(x)

    # Check for NaN or Inf in y
    if torch.isnan(y) or torch.isinf(y):
        print(f"NaN or Inf detected in y at iteration {i+1}. y = {y.item()}")
        break

    # Check for convergence
    if torch.abs(y).item() < tolerance:
        t_stop = datetime.now()
        print(f'Converged after {i+1} iterations.')
        print(f'Time taken: {t_stop - t_start}')
        break

    # Define a suitable loss function.
    # Here, we aim for y = 0.
    # Instead of using the squared loss directly, we normalise it and add a small
    # contribution eps to the denominator so that the loss is always well behaved.
    loss_1 = (y) ** 2
    loss = loss_1 / (loss_1 + eps)

    # Backpropagation
    loss.backward()

    # Check for NaN or Inf in the gradients
    if torch.isnan(x.grad) or torch.isinf(x.grad):
        print(f"NaN or Inf detected in gradients at iteration {i+1}. x.grad = {x.grad.item()}")
        break

    # Optional: gradient clipping
    torch.nn.utils.clip_grad_norm_([x], max_norm=0.1)

    # Update parameters
    optimizer.step()

    # Keep x within bounds using in-place clamping
    with torch.no_grad():
        x.clamp_(-5.0, 5.0)  # Modify x in-place without breaking the optimizer's reference
```
%% Output
Iteration 1, x = 0.1
Iteration 2, x = 0.10099999673261355
Iteration 3, x = 0.10200011044436759
Iteration 4, x = 0.10300041908334917
Iteration 5, x = 0.10400100037775635
Iteration 6, x = 0.1050019316830876
Iteration 7, x = 0.10600328983345642
Iteration 8, x = 0.10700515099814113
Iteration 9, x = 0.10800759054439886
Iteration 10, x = 0.10901068290747899
Iteration 11, x = 0.11001450146866591
Iteration 21, x = 0.12010782089419586
Iteration 31, x = 0.13034356776889294
Iteration 41, x = 0.1407691440633229
Iteration 51, x = 0.15141798444594942
Iteration 61, x = 0.16231463653195177
Iteration 71, x = 0.17347928741416305
Iteration 81, x = 0.1849307374093407
Iteration 91, x = 0.19668810368985853
Iteration 101, x = 0.20877177835967936
Iteration 111, x = 0.22120402809044454
Iteration 121, x = 0.23400945696152126
Iteration 131, x = 0.24721544794349035
Iteration 141, x = 0.26085264449495554
Iteration 151, x = 0.2749555107729198
Iteration 161, x = 0.289563002076966
Iteration 171, x = 0.30471937852469216
Iteration 181, x = 0.32047520108949706
Iteration 191, x = 0.3368885586694067
Iteration 201, x = 0.3540265872567576
Iteration 211, x = 0.3719673568265278
Iteration 221, x = 0.39080221606839755
Iteration 231, x = 0.4106386932852906
Iteration 241, x = 0.43160403728642205
Iteration 251, x = 0.45384940396289863
Iteration 261, x = 0.47755445160808896
Iteration 271, x = 0.5029314524461943
Iteration 281, x = 0.5302263422793498
Iteration 291, x = 0.5595582586060651
Iteration 301, x = 0.5885031662349146
Iteration 311, x = 0.6151798116487809
Iteration 321, x = 0.6396521603239107
Iteration 331, x = 0.6623189827162203
Iteration 341, x = 0.6835490453049341
Iteration 351, x = 0.7036280924837964
Iteration 361, x = 0.7227696519108118
Iteration 371, x = 0.7407727561124511
Iteration 381, x = 0.7418034202846798
Iteration 391, x = 0.7370450318486789
Iteration 401, x = 0.7395462808085576
Iteration 411, x = 0.7390046213514914
Iteration 421, x = 0.7388322109519164
Iteration 431, x = 0.7384914069776777
Iteration 441, x = 0.7389556408359775
Iteration 451, x = 0.7391372240953635
Iteration 461, x = 0.7389927799168022
Iteration 471, x = 0.738965346523297
Iteration 481, x = 0.7390324072129005
Iteration 491, x = 0.7393768757578784
Converged after 495 iterations.
Time taken: 0:00:00.332195