Commit 8ab2d161 authored by Ulf Liebal

add Index.ipynb

parent 0bdefe85
%% Cell type:markdown id: tags:
# Corona Vaccine Recombinant Expression Simulation
***
## Background
The Corona pandemic is paralyzing societies all over the world, and people are waiting for vaccines to ease the situation. A number of vaccination strategies are based on protein subunits ([Jeyanathan et al., 2020](https://doi.org/10.1038/s41577-020-00434-6)), which are produced by bacterial recombinant expression ([He et al., 2017](https://doi.org/10.1139/cjm-2016-0528), [Strizova et al., 2021](https://doi.org/10.1159/000514225)). You are a member of a small biotech company specialized in recombinant expression systems, and your task is to develop a bacterial host that produces the highest possible amount of viral protein subunits. The expression of the viral protein is controlled by important sequences in the promoter, and the final vaccine production rate depends on reaching a high biomass by cultivating at the optimal temperature. Alas, you have only a limited amount of money: you have to acquire the starting material and equipment, and each experiment costs resources.
<figure>
<img src="Figures/Jeyanathan_Covid19-Vaccines_20_Fig1.png" width="500">
<figcaption>The largest share of Corona vaccines use recombinantly expressed protein subunits (<a href='https://doi.org/10.1038/s41577-020-00434-6'>Jeyanathan et al., 2020</a>). </figcaption>
</figure>
In this project, selected steps of a biotechnological project for recombinant expression are simulated. The experimental work in your biotech company is highly automated: you set the parameters of the experiments and focus on computational data analysis. The goal is to optimize the production rate of the protein subunits to be competitive with your peers. To achieve this goal, virtual experiments have to be performed to optimize the growth conditions and the promoter sequence for the production. Finally, you will examine how the GC content of the promoter affects the promoter activity. The data analysis can be performed either separately with Excel, or with some guided coding steps within this script.
You start with a budget of 10.000 Eur. The initial laboratory setup and each subsequent experiment are associated with an investment. To optimize your initial host-promoter combination towards more effective production, 4-6 strains might be necessary. Initially, you decide how much money to spend on the laboratory equipment; investing too little will result in a higher failure rate of experiments. Some steps are difficult to perform. The exact parameters for effective cloning are unknown and depend on various complex factors.
**Initial budget: 10.000 Eur**

| Experiment | Cost in Eur |
| --- | --- |
| Equipment | 10-20% of budget |
| Temperature growth | 100 |
| Cloning | 200 |
| Promoter Strength | 100 |
| Production run | 500 |
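
The cost table can be turned into a quick planning aid before any money is spent. The following is a minimal sketch, not part of the simulation: the function name `remaining_budget` and the example plan are hypothetical, and it assumes that every entry is charged once per experiment at the price listed above.

```python
# Costs in Eur per experiment, taken from the table above.
COSTS = {
    "temperature_growth": 100,
    "cloning": 200,
    "promoter_strength": 100,
    "production_run": 500,
}
INITIAL_BUDGET = 10_000


def remaining_budget(equipment, planned):
    """Return the money left after the equipment and all planned experiments."""
    # The equipment investment must lie within 10-20% of the initial budget.
    if not INITIAL_BUDGET // 10 <= equipment <= INITIAL_BUDGET // 5:
        raise ValueError("equipment investment must be 10-20% of the budget")
    spent = equipment + sum(COSTS[name] * count for name, count in planned.items())
    return INITIAL_BUDGET - spent


# Hypothetical plan: numbers of experiments are illustrative only.
plan = {"temperature_growth": 10, "cloning": 6, "promoter_strength": 6, "production_run": 2}
left = remaining_budget(1500, plan)
print(f"Remaining budget: {left} Eur")
```

Failed experiments have to be repeated and paid for again, so it is sensible to keep a reserve rather than plan down to zero.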
## Introduction to Jupyter Notebooks
This simulation is based on a Jupyter Python Notebook. You should work in this notebook, named *1-Laboratory.ipynb*, throughout; just work through it from top to bottom. Notebooks are becoming popular to distribute data science solutions and to make coding flexible and user friendly. The Notebooks are composed of a sequence of cells that can be either text, like this introduction, or Python code cells to be run. Code cells have a grey background, and after execution the output is shown directly beneath. A blue stripe on the left edge of the screen marks the currently active cell. More information on how to use Jupyter Notebooks is available [here](https://medium.com/edureka/jupyter-notebook-cheat-sheet-88f60d1aca7). In general, two input modes exist:
* Command mode: evaluation of code, `Escape` or click outside cell activates Command mode
* Edit mode: writing into cells, `Enter` or double-click inside cell activates Edit mode
Navigating in Jupyter:
* activate cell above/below: `arrow up/down keys` in Command mode.
* execute cell: `Ctrl + Enter` in Command mode.
* hide/show navigation panel on the left: click on folder symbol or `Ctrl + b`
* here **`None`** designates places for user input
A code cell can also be edited and executed multiple times. Next to the code cells in the upper left corner the status of the cell is displayed in square brackets. If you have not executed a cell yet, the brackets are empty. If the computer is currently executing the code, a small star appears there. If the cell has been executed, a number corresponding to the execution order of the cells is shown.
The following information is important if you get stuck in the workflow or if your changes have broken the code. If the resources run out or you are unsatisfied with your choices, restart the kernel: click on Kernel in the top bar and select restart. All information on your strain and all your executions are reset, and you get a slightly different system whose properties you have to find out again. If you have inserted your own code to the extent that the simulation no longer works, or you have accidentally deleted original code and wish to return to the original state, you have to delete the biolabsim folder. To do so, use the last cell in this notebook. You will have to shut down and reload the Jupyter Notebook instance as described in the last cell.
The additional workbook named *2-Assistance.ipynb* contains guidance for some of the experiments. You can open it by double-clicking on its name in the left navigation panel. The notebook will then open in a new tab, as you know it from a browser. To switch back to the original notebook, just select the other tab again or close the new one with the 'x'.
The workflow starts by preparing the computational system followed by strain characterization cultivations, experiments for promoter sequence selection and finally the experiment to measure the achieved expression rate.
## Laboratory Tasks
In this section all aspects of the laboratory are handled. As in every laboratory you only have a limited amount of resources. This means, for example, that the money available for the experiments and the required personnel, material and space are limited. You have a total of 40 of such resources at your disposal.
## Workflow
**1 Set-up of simulation environment**
**2 Lab setup**
*2.1 Choose your host organism*
*2.2 Choose Equipment investment*
**3 Culture characterization**
*3.1 Experiment set-up*
*3.2 Data analysis growth experiment*
**4 Promoter sequence selection**
*4.1 Promoter and expression experiments*
*4.2 Data analysis of promoter strength*
**5 Evaluation by cross-group integration**
## 1 Set-up of simulation environment
Loading libraries and fixing visualization. No user input necessary.
%% Cell type:code id: tags:
``` python
# Loading of important functionalities for the notebook:
# Loading os, a standard library for file system access:
import os
# Loading numpy, a library for manipulation of numbers:
import numpy as np
# Loading matplotlib, a library for visualization:
import matplotlib.pyplot as plt
%matplotlib inline
# Initialization, loading of all laboratory functionalities and stored models and information of the organisms:
from BioLabSimFun import Mutant
print('System ready')
```
%% Cell type:markdown id: tags:
## 2 Lab setup
In this stage, you decide which host organism to use for your recombinant expression system and how much to invest in the laboratory equipment. You have the choice between two organisms for recombinant expression, namely *E. coli* (abbr. Ecol) and *P. putida* (abbr. Pput). A high investment in the laboratory equipment increases the probability of successful experiments.
*E. coli* is a Gram-negative, facultative anaerobe and nonsporulating bacterium of the genus *Escherichia*. It is commonly found in the lower intestine of warm-blooded organisms. *E. coli* can be grown and cultured easily and inexpensively in a laboratory setting, and has been intensively investigated for over 60 years. The bacterium is the most widely studied prokaryotic model organism, and an important species in the fields of biotechnology and microbiology, where it has served as the host organism for the majority of work with recombinant DNA. Under favorable conditions, it takes as little as 20 minutes to reproduce. You can find more information here: [*Escherichia coli*](https://en.wikipedia.org/wiki/Escherichia_coli)
*Pseudomonas putida* is a Gram-negative, rod-shaped, saprotrophic soil bacterium occurring in various environmental niches, due to its metabolic versatility and low nutritional requirements. Initiated by the pioneering discovery of its high capability to degrade rather recalcitrant and inhibiting xenobiotics, extensive biochemical analysis of this bacterium has been carried out in recent years. In addition, *P. putida* shows a very high robustness against extreme environmental conditions such as high temperature, extreme pH, or the presence of toxins or inhibiting solvents. Additionally, it is genetically accessible and grows fast with simple nutrient demand. Meanwhile, *P. putida* is successfully used for the production of bio-based polymers and a broad range of chemicals, far beyond its initial purpose for the degradation of various toxic compounds. You can find more information here: [DOI: 10.1007/s00253-012-3928-0](https://doi.org/10.1007/s00253-012-3928-0)
To choose the host, type its abbreviation into the `Mutant` command as shown in the example below. All characteristics and models of your organism are then stored under `myhost`. With the help of `myhost`, all experiments are carried out (in the form `myhost.experiment`). In addition, all generated measurement results, stored information and the remaining resources can be displayed.
Example: `myhost = Mutant('Pput')`
**Resource cost:**
* **None**
* **Free**
**Input:**
* **`Mutant`: 'Ecol' or 'Pput' (string)**
* **`BuyEquipment`: 10-20% of total budget (integer)**
%% Cell type:code id: tags:
``` python
# User input is required in the following code lines:
# To choose the host organism replace None in the 'Mutant'-command with the abbreviation:
myhost = Mutant(None)
# Enter here the investment for the equipment, higher investment results in fewer experiment failures
myhost.BuyEquipment(None)
# host organism and remaining resources are displayed:
myhost.show_BiotechSetting()
```
%% Cell type:markdown id: tags:
## 3 Culture characterization
### 3.1 Experiment set-up
You have to identify the optimal growth temperature, the corresponding maximum growth rate and the maximum biomass of your strain by cultivating the cells at different temperatures.
For the optimal temperature and the maximum biomass, a random value from a certain interval was selected during the initialization of the system. These intervals contain all possible values, specific to the organisms, that the parameters can take.
The optimal growth temperature is randomly initiated based on the common temperature boundaries of **mesophilic microorganisms**, see the following website for more information (see page 23): [Schmid, Rolf D., and Claudia Schmidt-Dannert. Biotechnology: An illustrated primer. John Wiley & Sons, 2016.](https://application.wiley-vch.de/books/sample/3527335153_c01.pdf)
As in real biological experiments, you will notice random fluctuations in the biomass measurements. Occasionally, a culture will not grow at all. It might be helpful to have biological replicates for further data processing later on, but be aware that each cultivation costs resources.
The biomass concentrations are only written to a comma-separated value (csv) file called `Strain_characterization_experiments_1.csv`. The file is created automatically after all experiments have been performed. You will find the csv-file in the left navigation panel, in the folder you are in. To view and edit it, you first have to download it. To download the file to your computer, right-click on it and choose 'Download'.
Also pay attention to the maximum biomass concentration that can be reached. This can also be found in the Excel file. It is the second important parameter for the final experiment by which the production rate is determined.
The third important parameter is the maximum growth rate. You have to calculate the corresponding growth rates directly in your Excel file called `Strain_characterization_experiments_1.csv` to determine the maximum value and the optimal growth temperature more accurately. If you get stuck with the calculation or are looking for an example, you will find that in the other notebook *2-Assistance.ipynb*.
If you want to do another set of experiments afterwards, or if you want to repeat individual experiments, make sure that you change the ID of your set of experiments (`experiments_ID`); otherwise, results already generated may be overwritten. By default, `experiments_ID` has the value 1, as shown in the code cell below. For example, suppose you were able to narrow down the temperature range that contains the optimum temperature and now want to do further cultivations in this smaller interval. All you have to do is change the `experiments_ID` variable in the code cell below, for example by replacing 1 with 2.
Example: `temperatures = np.array([35, 35, 42])`
**Resource cost:**
* **1 for each temperature**
* **100 Eur for each Experiment**
**Input:**
* **`temperatures`: Temperature array (integer list)**
* **`experiments_ID`: variable name (integer)**
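
A temperature array with replicates does not have to be typed by hand. The sketch below shows one way to build a coarse scan in duplicate with NumPy; the range of 20 to 44 and the step of 6 are purely illustrative assumptions, not recommended values for your strain.

```python
import numpy as np

# Hypothetical coarse scan: five temperatures, evenly spaced (values are illustrative).
scan = np.arange(20, 45, 6)

# Repeat every temperature twice to obtain biological duplicates.
temperatures = np.repeat(scan, 2)

print(temperatures)
print(len(temperatures), "cultivations in this set")
```

Once the coarse scan hints at the optimum, a second set with a narrower range and a new `experiments_ID` can refine the estimate.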
%% Cell type:code id: tags:
``` python
# User input is required in the following code lines:
# When you have thought about the temperatures you want to test, type them one by one into the following 'temperatures'-vector by replacing 'None' with the vector as shown in the example above.
temperatures = np.array(None)
# If you want to change the experiments_ID, replace 1 with another number. If this is not the case, no user input is required here.
experiments_ID = 1 # Definition of an ID for the set of experiments
# No user input necessary in all subsequent lines of code:
# cultivations of your strain at the different temperatures one by one:
myhost.Make_TempGrowthExp(temperatures, experiments_ID)
# host organism and remaining resources are displayed:
myhost.show_BiotechSetting()
```
%% Cell type:markdown id: tags:
### 3.2 Data analysis growth experiment
The data of the experiment was stored in a comma-separated value (`csv`) file in the local directory. The data has to be analysed to extract the optimal temperature, the growth rate and the maximum biomass. You have the choice to either analyse the data via a spreadsheet application on your local computer, e.g. Excel, or via a programming approach with Python.
In Excel you have to import the csv-file to get the data into separate columns. You then apply a natural logarithm to the data and plot the value versus time. The plot allows to extract the temperature supporting fastest growth with highest slope and the time until which the increase of logarithmic biomass is linear, i.e. it displays exponential growth. Then you apply a linear regression on the linear section of the fastest logarithmic biomass increase to extract the growth rate as the regression slope. You determine the maximum biomass by averaging over a number of measurements on the plateau of the measured biomass (real values, not logarithm).
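The same three steps (logarithm, linear regression on the exponential phase, plateau average) can be illustrated on made-up numbers. The sketch below uses a synthetic logistic growth curve, not output of the simulation; the parameter values, the cut-off time `t_linear_end` and the choice of the last five points as plateau are assumptions for illustration only.

```python
import numpy as np

# Synthetic logistic growth curve (NOT simulation output); time in arbitrary units.
time = np.arange(0, 25)
mu_true, x0, x_max = 0.5, 0.1, 50.0
biomass = x_max / (1 + (x_max / x0 - 1) * np.exp(-mu_true * time))

# Exponential growth appears as a straight line in the logarithm of the biomass.
ln_biomass = np.log(biomass)

# Assumed end of the linear section, as it would be read off the log plot by eye.
t_linear_end = 6
in_linear_phase = time <= t_linear_end

# The slope of the regression line is the growth rate estimate.
slope, intercept = np.polyfit(time[in_linear_phase], ln_biomass[in_linear_phase], 1)

# Maximum biomass: mean of the raw (non-logarithmic) values on the plateau.
max_biomass = np.mean(biomass[-5:])

print(f"growth rate: {slope:.3f}, maximum biomass: {max_biomass:.1f}")
```

Choosing the cut-off too late pulls the slope down, because points from the transition to the plateau enter the regression.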
For the Python approach, there are scrambled lines of code which you have to arrange correctly. Even without programming experience, this should take about as long as the Excel approach. Please record the time it takes you to conduct the data analysis and proceed to the item `Promoter sequence selection`.
#### 3.2.2 Python based growth analysis
In the following, data analysis via Python can be performed. The procedure is separated into two steps: 1. visualization of the results, 2. extracting the growth rate and maximum biomass for the optimal temperature. For each step, the corresponding lines of code are provided in [this link](http://parsons.problemsolving.io/puzzle/72e246b0b1e14325b062625296a5f180), but you have to arrange them in the right order. This is done by dragging and dropping the corresponding lines in the graphic interface. After arranging the lines in the right order, you have to retype the code sequence in the empty code cell beneath to obtain your solution.
##### 3.2.2.1 Visual analysis of exponential growth
The visual indication of exponential growth is a linear slope in the plot of the logarithm of biomass versus time. The following commands are involved (scrambled order):
* assign variables for time and biomass data
* loading of the csv-file into a numpy data array ([genfromtxt](https://stackoverflow.com/questions/3518778/how-do-i-read-csv-data-into-a-record-array-in-numpy))
* plotting biomass versus time ([plt.scatter](https://matplotlib.org/2.0.2/users/pyplot_tutorial.html))
* store natural logarithm of the biomass in new variable ([np.log](https://numpy.org/doc/stable/reference/generated/numpy.log.html))
%% Cell type:code id: tags:
``` python
# Insert the correct code sequence for plotting in this cell.
# %load Snippets/snip_GrowthPlot.py
```
%% Cell type:markdown id: tags:
##### 3.2.2.2 Determine maximum biomass and growth rate
After gaining the graphical insight, select the optimal temperature with the fastest growth using the experiment index number in Python. Be careful, Python starts counting with zero! Thus, the first experiment has the index `0`. Also extract from the plots the time, as the nearest integer number, until which a linear slope, i.e. exponential growth, takes place. This value will be used to estimate the growth rate via linear regression. The process is described in the following scrambled lines:
* extract the mean biomass after the linear slope as max biomass ([np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html))
* identify the latest time (integer number) of linear slope
* conduct linear regression in the linear region ([np.polyfit](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html))
As in the previous coding example, [this link](http://parsons.problemsolving.io/puzzle/23d6118f9a4e4e83b29eb9eac7f8ebfb) contains the scrambled code lines that you have to bring into the right order. Note that there are two correct versions of the command sequence; however, the web interface accepts only one sequence as correct.
**Input:**
* **1st None: Optimum Temperature**
* **2nd None: max time for linear slope**
%% Cell type:code id: tags:
``` python
# Insert the correct code sequence for calculating the growth and biomass parameters here.
# For None enter the corresponding values of experiment index with fastest growth and latest time of linear growth (integer number).
# %load Snippets/snip_GrowthPars.py
```
%% Cell type:markdown id: tags:
## 4 Promoter sequence selection
In bacteria, the initiation of transcription at promoters requires the sigma factor to bind the RNA polymerase core to form the holoenzyme. Sigma factors recognize and open the promoter DNA and perform the initial steps in RNA synthesis. Particularly important DNA recognition sites are the -10 box and -35 box positions. The number of sigma factors varies between bacterial species. Sigma factors are distinguished by their characteristic molecular weights. The primary, housekeeping sigma factor of Gram-negative rod-shaped bacteria is sigma70 and has a molecular weight of 70 kDa. The housekeeping sigma factors direct the bulk of transcription during active growth. The following article introduces the sigma70 factor; focus on extracting the optimal recognition sequences for the -10 and -35 boxes that are responsible for gene expression in this project: [https://doi.org/10.3390/biom5031245](https://doi.org/10.3390/biom5031245).
The total length of the promoters must be 40 nt. The following template serves as an aid for the creation of promoter sequences that meet the conditions for successful promoter design. Replace each **X** with a nucleotide and identify the best sequence; the six **X** at the start and at the end represent the -35 and -10 box, respectively:
##### GCCCA**XXXXXX**A**X**GC**XXX**C**X**CGT**XXX**GG**XXXXXX**TGCACG
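Filling the template by hand invites typing errors, so a small helper can do it and check the length. The following is a minimal sketch; the helper names `fill_template` and `gc_content` are hypothetical and not part of BioLabSim, and the nucleotides used in the example are arbitrary placeholders, not the optimal boxes.

```python
# Promoter template from above; X marks the positions that can be chosen freely.
TEMPLATE = "GCCCAXXXXXXAXGCXXXCXCGTXXXGGXXXXXXTGCACG"


def fill_template(template, nucleotides):
    """Replace every X in the template, left to right, with the given nucleotides."""
    if len(nucleotides) != template.count("X"):
        raise ValueError(f"need exactly {template.count('X')} nucleotides")
    if set(nucleotides) - set("ACGT"):
        raise ValueError("only A, C, G and T are allowed")
    fill = iter(nucleotides)
    return "".join(next(fill) if base == "X" else base for base in template)


def gc_content(sequence):
    """Return the fraction of G and C in the sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Arbitrary placeholder choices: -35 box, four variable spacer parts, -10 box.
candidate = fill_template(TEMPLATE, "ACGTAC" + "A" + "CGT" + "A" + "CGT" + "GTACGT")
print(candidate, len(candidate), round(gc_content(candidate), 2))
```

The helper rejects the letter X itself, which guards against leaving a placeholder in the sequence.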
Create some promoters, and test them, but be aware that each testing costs resources. To test the sequences, you have to clone each of them, introduce the resulting construct into the organism and then perform an expression test by measuring the promoter strength. Finally, you can decide to continue to a production experiment with your clone to quantify the overall vaccine production efficiency.
The data of the promoter and expression experiments are stored in a csv-file. Your additional analysis task is to visualize the correlation between the GC-content of the promoter and the promoter strength. You can perform this task either by importing the file into Excel, or by rearranging scrambled python code.
### 4.1 Promoter and expression experiments
#### 4.1.1 Promoter choice and cloning
In order to perform a successful cloning, you have to design a suitable primer for each promoter. In addition, a melting temperature matching the primer sequence must be used.
First create the primers matching your promoter sequences and the following characteristics, and write them down in your Excel file. The primers should always start at the first nucleotide of the promoter sequences and are composed of complementary bases. You have to identify the optimal primer length of your strain. Like the optimal growth temperature, this parameter was randomly initiated from an interval containing all values for which cloning still works. Possible values for the optimal primer length are between 15 and 30 nucleotides, and successful primers need to be within 20% of the optimal length.
Then calculate the melting temperature for each primer and write it into your Excel sheet. On the following website you will find formulas for calculating the melting temperature. A sodium concentration of 100 mM is assumed. The deviation from the optimal melting temperature should be within 10%. Formulas for calculating the melting temperature: [genelink manual](https://www.genelink.com/Literature/ps/R26-6400-MW.pdf)
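The primer construction and the melting temperature can also be computed instead of worked out by hand. The sketch below rests on two assumptions: "complementary" is read as a base-by-base complement without reversing the sequence, and the melting temperature uses the widely cited salt-adjusted formula Tm = 81.5 + 16.6·log10([Na+]) + 0.41·(%GC) − 675/N, which may differ from the formula you pick from the linked manual or from what the simulation expects. All function names are hypothetical.

```python
import math

# Translation table for base-by-base complementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_primer(promoter, length):
    """Complement of the first `length` nucleotides of the promoter (no reversal)."""
    return promoter[:length].translate(COMPLEMENT)


def melting_temperature(primer, sodium_molar=0.1):
    """Salt-adjusted Tm estimate in degrees Celsius; 0.1 M corresponds to 100 mM sodium."""
    n = len(primer)
    gc_percent = 100 * (primer.count("G") + primer.count("C")) / n
    return 81.5 + 16.6 * math.log10(sodium_molar) + 0.41 * gc_percent - 675 / n


def within_tolerance(value, optimum, rel_tol):
    """True if value deviates from the optimum by at most rel_tol (e.g. 0.2 for 20%)."""
    return abs(value - optimum) <= rel_tol * optimum


# Placeholder promoter (arbitrary nucleotides in the variable positions).
promoter = "GCCCAACGTACAAGCCGTCACGTCGTGGGTACGTTGCACG"
primer = complementary_primer(promoter, 20)
print(primer, round(melting_temperature(primer), 1))
```

Since the optimal primer length is unknown at the start, a failed cloning with a correct complement and a plausible melting temperature suggests trying a different length.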
Finally, perform a cloning with each pair of promoter and primer, followed directly by the measurement of the promoter strength to test the expression. Each cloning experiment gets a specific Clone_ID, for example 'Clone_1'. This automatically designates the generated clone that contains the corresponding promoter sequence. To perform a cloning experiment, all you have to do is execute the `Make_Cloning(Clone_ID, Promoter, Primer, Tm)` command. For the parameters in this command you can enter either the appropriate variable or the respective name/value/sequence.
**Important note/warning:** For successful cloning, the primer length is not allowed to deviate too much from the optimal one specific for the strain, as described above. Furthermore, cloning will not work if the melting temperature deviates too much from the optimal one. This means it is quite common that the entire cloning step is a tricky and time consuming task.
If the cloning fails, because one of the mentioned necessary conditions was not fulfilled, the sequence could not be multiplied and introduced into the organism. Then the melting temperature, the total primer length or the complementarity of bases may need to be adjusted. If you need help with this task or with the subsequent measurement of the promoter strength, you will find a template for the table to be created with an example sequence for a promoter in the other notebook *2-Assistance.ipynb*.
**Resource cost:**
* **1 for each clone**
* **200 Eur each cloning experiment**
**Input:**
* **`Clone_ID`: small identifier for the sequence (string)**
* **`Promoter1`: 40nt from [ACGT] (string)**
* **`Primer1`: 15-30nt complementary to sequence (string)**
* **`Tm`: number (integer)**
%% Cell type:code id: tags:
``` python
# For each cloning, the Clone_ID, the promoter sequence, the corresponding primer and the melting temperature must be given.
# For simplicity, you should define variables for these four parameters before and then put the variable names into the 'Make_Cloning()'-command later on.
# The following lines of code are an example of these variables.
# In this order the parameters must then be entered in the 'Make_Cloning()'-command for the actual cloning like it is shown below.
# User input is required in the following code lines:
# You have to replace None with the corresponding sequences (must be entered as strings) or the melting temperature.
# To be able to clone further sequences, you have to define another set of all variables including the Clone_ID for each cloning.
# To define a further Clone_ID you for example have to replace the number 1 in Clone_ID1 and Clone_1 with a different number.
Clone_ID1 = 'Clone_1'
Promoter1 = None
Primer1 = None
Tm = None # melting temperature
# cloning:
myhost.Make_Cloning(Clone_ID1, Promoter1, Primer1, Tm)
# No user input necessary in all subsequent lines of code:
# displays the generated clones and their properties:
myhost.show_Library()
# host organism and remaining resources are displayed:
myhost.show_BiotechSetting()
```
%% Cell type:markdown id: tags:
#### 4.1.2 Measurement of the promoter strength
The promoter strength represents expression per cell. Later, it is multiplied by the growth rate and the biomass concentration in order to determine the expression rate. The expression rate of the product should be maximized. For further information and an illustration of the context, you can look at the last section of the second *2-Assistance.ipynb*-notebook. After successful cloning determine the promoter strength in a subsequent experiment by entering the ID of your clone into the `Make_MeasurePromoterStrength(Clone_ID)`-command like it is shown in the cell below. Document the promoter strengths together with the automatically determined GC contents in your Excel table. You will need both later when you evaluate the results. If an incorrect sequence is used, for example containing typing errors, an error may occur when measuring the promoter strength and also when measuring the expression rate. The error message is: "Key Error 'X'".
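The multiplication described above can be written out to compare clones before committing to a production run. This is a minimal sketch of that product only; the function name and all numbers are hypothetical, and the simulation may apply additional factors or noise.

```python
def expression_rate(promoter_strength, growth_rate, biomass):
    """Expression rate as the product of the three quantities named in the text."""
    return promoter_strength * growth_rate * biomass


# Hypothetical values: two clones in the same host at the same temperature.
growth_rate, biomass = 0.5, 40
for clone, strength in {"Clone_1": 2.0, "Clone_2": 3.1}.items():
    print(clone, expression_rate(strength, growth_rate, biomass))
```

Because growth rate and biomass are shared by all clones of one host, ranking clones by promoter strength gives the same order as ranking them by this product.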
**Resource cost:**
* **1 each experiment**
* **100 Eur each experiment**
**Input:**
* **In `Make_MeasurePromoterStrength`: Clone_ID variable name (variable, string)**
%% Cell type:code id: tags:
``` python
# User input is required in the following code lines:
# To determine the promoter strength replace None with the Clone_ID variable name (for example Clone_ID1) or the corresponding name of the clone like for example 'Clone_1'.
myhost.Make_MeasurePromoterStrength(None)
# No user input necessary in all subsequent lines of code:
# displays the generated clones,their properties and their performance:
myhost.show_Library()
# host organism and remaining resources are displayed:
myhost.show_BiotechSetting()
```
%% Cell type:markdown id: tags:
#### 4.1.3 Measurement of the final vaccine expression rate
Now that you have tested some promoter sequences, perform the production experiment with the promoter sequence (Clone_ID) and use the determined optimal growth temperature (integer only), the corresponding maximum growth rate and the maximum possible biomass (integer only) that you determined from the strain growth characterization. The values used for the biomass and the growth rate should differ by less than 10% from the maximum values. The expression experiment is started with the function `Make_ProductionExperiment(Clone_ID, Temp, µ, Biomass)`, whose arguments are the clone to use, the optimal cultivation temperature, the growth rate and the maximum biomass. If you have few resources left, use the most promising clones.
Finally, the experimental results are exported to a csv file for further data analysis. The function works without further user input. The output file is `Production_Experiments.csv`.
**Resources:**
* **500 each experiment**
* **500 Eur each experiment**
**Input:**
* **in `Make_ProductionExperiment`: Clone_ID (string), Opt. Temp (int), Opt. Growth rate (float), Opt. Biomass (int)**
%% Cell type:code id: tags:
``` python
# To perform the production experiment replace None with the Clone_ID variable name of your best performing clone, the optimal growth temperature, the corresponding maximum growth rate and the maximum biomass (in this order).
myhost.Make_ProductionExperiment(None, None, None, None)
# No user input necessary in all subsequent lines of code:
# displays the generated clones, their properties and their performance:
myhost.show_Library()
# host organism and remaining resources are displayed:
myhost.show_BiotechSetting()
# Export data to csv file for analysis
myhost.ExportExperiments()
```
%% Cell type:markdown id: tags:
### 4.2 Data analysis of promoter strength
The biotechnological goal is the construction of a host strain with high productivity. Scientifically, we would also like to investigate the relationship between promoter strength and GC content: does the GC content predict the promoter strength? First, you will examine your own data in a plot for correlation. Subsequently, all groups will enter their results for the species-specific promoter strength versus GC content in an online plot. The online plot shows the results of all groups and allows a more solid conclusion.
#### 4.2.1 Visualization of the results
Summarize your results from the laboratory workflow in a graph by plotting the final relative expression rates against the respective GC contents of the promoter sequences. You can perform the data analysis in Excel by importing the data file `Production_Experiments.csv` and generating a scatter plot of relative expression rate versus GC-content. Alternatively, you can use the Python code with scrambled lines [in this link](http://parsons.problemsolving.io/puzzle/69d446760d214adfb32cb55d215bf7f3) (alternative puzzle link: [here](http://parsons.problemsolving.io/puzzle/2eca8d526b8e428ea944b9caf54e772b)).
Enter the correct code sequence from the quiz in the cell below. The code that extracts the data columns for GC content and relative expression rate is missing the column numbers for the corresponding data. Add these column numbers in the fields labeled with `None` and note that Python starts counting from zero; thus, the first column has index `0`.
**Input:**
* **`my_data`: GC-content column number (integer)**
* **`my_data`: expression rate column number (integer)**
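
Besides the scatter plot, a correlation coefficient gives a single number for how well GC content predicts expression. The sketch below uses a made-up table in place of `Production_Experiments.csv`; the column names, the column order and all values are assumptions, so the column indices will differ for the real file.

```python
import io

import numpy as np

# Made-up stand-in for Production_Experiments.csv (layout and values are assumptions).
csv_text = """Clone_ID,GC_content,Expression_Rate
Clone_1,0.40,1.8
Clone_2,0.45,2.1
Clone_3,0.50,1.2
Clone_4,0.55,0.9
"""

# Read only the two numeric columns; the header line is skipped.
my_data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1, usecols=(1, 2))
gc_content = my_data[:, 0]
expression = my_data[:, 1]

# Pearson correlation coefficient between GC content and expression rate.
r = np.corrcoef(gc_content, expression)[0, 1]
print(f"Pearson r = {r:.2f}")
```

With only a handful of clones the coefficient is very uncertain, which is one reason for pooling the results of all groups.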
%% Cell type:code id: tags:
``` python
# %load Snippets/rev_ExprPlot.py
# Insert the correct code sequence for plotting in this cell.
# %load Snippets/snip_ExprPlot.py
```
%% Cell type:markdown id: tags:
## 5 Evaluation by cross-group integration
To get a statistically sounder result on the correlation between promoter strength and GC content, all groups enter their results in an online feedback system:
https://arsnova.eu/mobile/#id/20126509
After following the link, click on 'Presentation' to enter. The feedback system has two questions, one for the results in *E. coli* and one for *P. putida*. Enter your results in the question for your organism and click on 'Abstention' in the other question. You enter your values by clicking on the corresponding fields in the displayed plot and then saving the input. Once all groups have entered their results, you can check the distribution of all promoter strength versus GC content measurements by opening the questions again and clicking on the orange bar chart button on the upper right.
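Once the pooled values of all groups are available, the strength of the relationship can be quantified, for instance with the Pearson correlation coefficient. The sketch below is one possible way to do this and is not part of BioLabSim or the feedback system: the function `pearson_r` is a hypothetical helper and the numbers are made-up illustration values, not measured results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration values, NOT real measurements.
gc_content = [0.40, 0.50, 0.60, 0.70]
expression = [0.20, 0.35, 0.50, 0.65]
r = pearson_r(gc_content, expression)
print(f"Pearson r = {r:.2f}")
```

A value of `r` close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates none. With only a few data points per group, pooling the results of all groups gives a more reliable estimate.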
%% Cell type:markdown id: tags:
## Feedback
It would be great if you could help us improve the simulation for the next generation of students. Please provide feedback using the following form:
[https://forms.gle/b3dTEu4LsJh8w7eX7](https://forms.gle/b3dTEu4LsJh8w7eX7)
%% Cell type:markdown id: tags:
## Delete local BioLabSim instance
Execute the code below to delete a local BioLabSim instance. Do this if you have changed the code and want to get back the original version. You will then have to load a new BioLabSim simulation. The code is commented out by default; activate it by removing the `# ` in the first line. To reload the initial BioLabSim version, go through the following sequence:
1. click on `File/Datei` in the Jupyter Notebook Window
2. click on `Hub Control Panel` (second from below)
3. click on `Stop My Server` and wait a little for the process to finish
4. click on `Start My Server`
5. click on `Launch Server`
6. select `[BioLabSim] Blockpraktikum Mikrobengenetik` (page bottom) and click on start
6. select `[BioLabSim] Biotechnologie` (slightly below page middle)
7. click on start on page bottom
%% Cell type:code id: tags:
``` python
# !cd ..; rm -rf biolabsim
import os

# Check whether the notebook file still exists in the working directory.
mylist = os.listdir()
if '1-Laboratory.ipynb' not in mylist:
    print('BioLabSim deleted')
else:
    print('BioLabSim available')
```
%% Cell type:markdown id: tags:
#### Package dependencies
%% Cell type:code id: tags:
``` python
%load_ext watermark
%watermark -v -m -p IPython,ipywidgets,matplotlib,numpy,pandas,openpyxl,sklearn,scipy,joblib,watermark
```
Figures/iAMB-rwth-logo.png

43.9 KiB

%% Cell type:markdown id:bulgarian-church tags:
<img src='Figures/iAMB-rwth-logo.png' width=600>
<h1 align="center"> BioLabSim Powered Simulation Notebooks</h1>
<h2 align="center"> Ulf Liebal, Rafael Schimassek, Lars Blank </h2>
<h3 align="center"> <a href='mailto:ulf.liebal@rwth-aachen.de'> ulf.liebal@rwth-aachen.de </a> </h3>
---
## Introduction
The following list shows educational notebooks that run the BioLabSim Python environment for cellular simulations. All workflows run in the Jupyter Notebook environment; an introduction is provided [here](./JupyterIntro_Palkovits.ipynb).
### RecExpSim
RecExpSim is a simulation of selected steps in recombinant protein expression and the associated data analyses. The simulated experiments are strain characterization of growth parameters (optimal temperature, growth rate and maximum biomass), selection of promoter sequences, cloning steps, and evaluation of final expression strength.
- Internal link: [RecExpSim](./1-Laboratory.ipynb)
- RWTHjupyter link: [![](https://jupyter.pages.rwth-aachen.de/documentation/images/badge-launch-rwth-jupyter.svg)](https://jupyter.rwth-aachen.de/hub/spawn?profile=biolabsim&next=/user-redirect/lab/tree/biolabsim/1-Laboratory.ipynb)
- Duration: 3h
**Additional material**
* Why Jupyter is data scientists’ computational notebook of choice ([Nature 563, 145-146, 2018](https://doi.org/10.1038/d41586-018-07196-1))
* Jupyter easy technical [description](https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/jupyter-python/)
* Python basics [cheat sheet](https://i.redd.it/ahetz5jtbzq11.jpg)
* Properties of mesophilic organisms [overview](https://application.wiley-vch.de/books/sample/3527335153_c01.pdf) (esp. p. 18 of the pdf)
* Bacterial promoter architecture [review](https://doi.org/10.3390/biom5031245) (relevant are -10 and -35 box)
* GC-content [calculations](https://www.genelink.com/Literature/ps/R26-6400-MW.pdf)
### Genome scale model analysis with cobrapy
This workflow is an introduction to constraint-based reconstruction and analysis (COBRA) of genome-scale models. The content covers loading existing models and examining their properties, simulating fluxes with flux balance analysis, and resetting the objective to new metabolic objectives.
Link:
Duration: 2h
### FermProSim (planned)
To be added.
### MetEngSim (planned)
To be added.
DataFile = 'Production_Experiments.csv'
my_data = np.genfromtxt(DataFile, delimiter=',', skip_header=1)
my_data = np.array([np.genfromtxt(DataFile, delimiter=',', skip_header=1)])
GCcont, Express = my_data[:,2], my_data[:,6]
plt.plot(GCcont,Express, linestyle = '--', marker = 'x', color = 'grey')
plt.gca().set(xlabel='GC-cont', ylabel='rel. expression', xlim=(.4,.8), ylim=(0,1))