Commit 23c6b3ef authored by Ulrich Kerzel's avatar Ulrich Kerzel

add solution to new exercises

parent c706f193
%% Cell type:markdown id: tags:
# Computing Gradients in PyTorch
[PyTorch](https://pytorch.org/) is a comprehensive library that is primarily used for machine learning. However, it is also an effective tool for handling matrix operations and computing gradients.
For the latter in particular, we can exploit the fact that training neural networks requires efficient gradient computation, as this is the backbone of the training algorithms.
Therefore, if we can formulate the problem at hand in such a way that PyTorch can handle it, we can use the built-in methods to compute and obtain the gradients.
In PyTorch, this is done via [AutoGrad](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html).
In this example, we use a simple sine function: such a simple function makes it easy for any neural network to learn the functional dependency. Moreover, we can compare the result to the well-known derivative $\frac{d \sin(x)}{d x} = \cos(x)$, which makes it immediately obvious whether we have learned the correct gradient.
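To make the idea concrete, here is a minimal sketch (not part of the exercise below) of how AutoGrad computes a derivative: we mark a tensor with ```requires_grad=True```, build a computation from it, call ```.backward()```, and read the derivative from ```.grad```. For $y = \sin(x)$ the stored gradient matches $\cos(x)$, which is exactly the check we perform with the trained network later on.
```
import torch

# Minimal AutoGrad sketch: derivative of sin(x) at a single point
x = torch.tensor(1.0, requires_grad=True)   # track operations on x
y = torch.sin(x)                            # y = sin(x)
y.backward()                                # compute dy/dx via backpropagation
print(x.grad.item(), torch.cos(x).item())   # both print cos(1.0) ≈ 0.5403
```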
%% Cell type:code id: tags:
```
import torch
from torch import nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
```
%% Output
Using cpu device
%% Cell type:markdown id: tags:
## Training Data
In this simple example, we will use $f(x) = \sin(x)$ to generate training data.
First of all, the relationship is very simple, i.e. even small networks will be able to learn it quickly. Additionally, we know what the gradient looks like: $\frac{dy}{dx} = \cos(x)$, i.e. we know immediately whether the network has learned the correct gradient.
The function [torch.linspace](https://pytorch.org/docs/stable/generated/torch.linspace.html) is the equivalent of the NumPy version but produces a tensor directly.
The part ```.view(-1,1)``` re-shapes the resulting tensor such that we have one feature: torch.linspace creates a tensor with shape (100,), i.e. a 1D tensor with 100 elements. The ```-1``` is a placeholder telling PyTorch to infer that dimension automatically from the number of elements in the original tensor. The ```1``` tells PyTorch to reformat the data such that we have one feature. The resulting tensor has shape (100, 1), i.e. 100 rows with 1 feature each.
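As a quick illustration of this reshaping, a small standalone sketch:
```
import torch

x = torch.linspace(0, 1, steps=100)   # 1D tensor of shape (100,)
print(x.shape)                        # torch.Size([100])
print(x.view(-1, 1).shape)            # torch.Size([100, 1]): 100 rows, 1 feature each
```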
%% Cell type:markdown id: tags:
**Exercise**
Create training data ```x_train``` and ```y_train``` for a $\sin(x)$ function in the interval $x_{\text{train}} \in [0, 2\pi]$.
Plot the resulting training data.
%% Cell type:code id: tags:
```
##
## Your code here
##
```
%% Cell type:markdown id: tags:
**Solution**
%% Cell type:code id: tags:
```
# Generate training data using sin(x)
x_train = torch.linspace(0, 2 * torch.pi, steps=100, device=device).view(-1, 1)
y_train = torch.sin(x_train)

# Plot the training data (move to the CPU first in case a GPU is used)
sns.lineplot(x=x_train.cpu().numpy().flatten(),
             y=y_train.cpu().numpy().flatten(), label='sin(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
## Network Definition and Training
We now define a very small neural network, for example a "shallow" network with just three fully connected layers.
- How many input nodes do we need?
- How many output nodes do we need?
Here, we need one input node, since we pass one value at a time to the network: $y = \sin(x)$.
Similarly, we only need one output node as we want the network to learn a single number.
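As a quick sanity check of the shapes involved (a small sketch, independent of the exercise and solution below): a layer with one input and one output node maps a batch of shape (N, 1) to a batch of shape (N, 1), matching the (100, 1) training tensors created above.
```
import torch
from torch import nn

layer = nn.Linear(in_features=1, out_features=1)   # one input node, one output node
batch = torch.randn(100, 1)                        # 100 samples with 1 feature each
print(layer(batch).shape)                          # torch.Size([100, 1])
```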
%% Cell type:markdown id: tags:
**Exercise**
Write a class for a small neural network with three fully-connected (linear) layers and $\tanh(x)$ as activation function.
Discuss how many input and output nodes the network needs.
%% Cell type:code id: tags:
```
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        ##
        ## your code here
        ##
        pass

    def forward(self, x):
        ##
        ## your code here
        ##
        return x

model = NeuralNetwork().to(device)
print(model)
```
%% Cell type:markdown id: tags:
**Solution**
%% Cell type:code id: tags:
```
# A simple neural network: three fully-connected layers with tanh activations
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        # one input node (a single x value), one output node (a single y value)
        self.fc1 = nn.Linear(1, 50)
        self.fc2 = nn.Linear(50, 50)
        self.fc3 = nn.Linear(50, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.tanh(x)
        x = self.fc2(x)
        x = torch.tanh(x)
        x = self.fc3(x)
        return x

model = NeuralNetwork().to(device)
print(model)
```
%% Output
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(fc1): Linear(in_features=1, out_features=50, bias=True)
(fc2): Linear(in_features=50, out_features=50, bias=True)
(fc3): Linear(in_features=50, out_features=1, bias=True)
)
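%% Cell type:markdown id: tags:
The training cell below minimises the sum of two mean-squared-error terms: a data loss that fits the predictions $\hat{y}$ to $\sin(x)$, and a "physics" loss that pushes the gradient of the network output with respect to its input (obtained via ```torch.autograd.grad```) towards the known derivative $\cos(x)$. Written out, the code computes
$$
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}(x_i) - \sin(x_i)\right)^2 + \frac{1}{N}\sum_{i=1}^{N}\left(\left.\frac{d\hat{y}}{dx}\right|_{x_i} - \cos(x_i)\right)^2 .
$$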
%% Cell type:code id: tags:
```
# Train the model
# Define the optimizer (the loss is computed manually below)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 1000
loss_history = []

for epoch in range(num_epochs):
    # Enable gradient tracking on the input x
    x_train.requires_grad = True

    # Forward pass: predict
    predictions = model(x_train)

    # Compute the data loss (difference from sin(x))
    # using the mean squared error as a loss function for regression
    data_loss = torch.mean((predictions - y_train) ** 2)

    # Compute the gradient dy/dx using torch.autograd.grad
    dy_train = torch.autograd.grad(
        outputs=predictions,
        inputs=x_train,
        grad_outputs=torch.ones_like(predictions),
        create_graph=True
    )[0]

    # Physics loss: enforce the relationship dy/dx = cos(x)
    physics_loss = torch.mean((dy_train - torch.cos(x_train)) ** 2)

    # Total loss: combine data and physics losses
    total_loss = data_loss + physics_loss

    # Backward pass and optimization step
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # Record the loss
    loss_history.append(total_loss.item())

    # Print progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}/{num_epochs}, Total Loss: {total_loss.item():.6f}")
```
%% Output
Epoch 0/1000, Total Loss: 1.043527
Epoch 100/1000, Total Loss: 0.000421
Epoch 200/1000, Total Loss: 0.000122
Epoch 300/1000, Total Loss: 0.001010
Epoch 400/1000, Total Loss: 0.000005
Epoch 500/1000, Total Loss: 0.000433
Epoch 600/1000, Total Loss: 0.000004
Epoch 700/1000, Total Loss: 0.000003
Epoch 800/1000, Total Loss: 0.001428
Epoch 900/1000, Total Loss: 0.000008
%% Cell type:code id: tags:
```
sns.lineplot(loss_history, label='Training loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
## Plot the Gradient
We now check if the network has learned the correct gradient, i.e. $\frac{dy}{dx} = \cos(x)$.
We generate some independent points on the same domain, obtain the predictions $\hat{y}$ and plot:
- the ground truth $y = \sin(x)$,
- the predictions $\hat{y}$,
- the predicted gradient $\frac{d\hat{y}}{dx}$.
%% Cell type:code id: tags:
```
# Prepare test data with requires_grad=True
x_test = torch.linspace(0, 2 * torch.pi, steps=200, device=device, requires_grad=True).view(-1, 1)
y_test = torch.sin(x_test)

# Predictions from the trained model
y_hat = model(x_test)

# Gradient of the predictions with respect to the input
dy_dx = torch.autograd.grad(
    outputs=y_hat,
    inputs=x_test,
    grad_outputs=torch.ones_like(y_hat),
    create_graph=True
)[0]

# Detach from the graph and move to the CPU for plotting
x_test = x_test.detach().cpu().numpy().flatten()
y_test = y_test.detach().cpu().numpy().flatten()
y_hat = y_hat.detach().cpu().numpy().flatten()
dy_dx = dy_dx.detach().cpu().numpy().flatten()

# Plot predictions and gradients
sns.lineplot(x=x_test, y=y_test, label='sin(x) (Ground Truth)')
sns.lineplot(x=x_test, y=y_hat, label='Prediction')
sns.lineplot(x=x_test, y=dy_dx, label='Predicted Gradient')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
```
%% Output