Howto: EnKF run
This is a step-wise explanation of using SHEMAT-Suite for permeability estimation by the ensemble Kalman filter (EnKF).
Reference for the EnKF: Evensen, G., The ensemble Kalman filter: theoretical formulation and practical implementation, Ocean Dynamics, 53(4), 343–367 (2003). http://dx.doi.org/10.1007/s10236-003-0036-9
There following steps of an EnKF-run with SHEMAT-Suite will be discussed
- Compilation
- Extra Input needed
- Output and Results
Compilation
For the compilation, it is important to compile the code including the
stochastic module, i.e. from the branches master-sm
or
master-all
.
Additionally, please check that the compilation mode is set to
sm
. Usually, the property module for EnKF-updates will be const
,
so that there is no interference between physically and stochastically
changed parameter updates.
An minimal example compilation script could look like this:
#!/bin/zsh
# Compile SHEMAT-Suite executable with cmake and copy it to model_dir
#------------------------------------------------------------------
#-----------------------Variables ---------------------------------
#------------------------------------------------------------------
model_dir="${HOME}/SHEMAT-Suite_Models/sm_const_wavereal_enkf/" # "${HOME}/SHEMAT-Suite_Models/fw_const_Example"
make_dir="${HOME}/SHEMAT-Suite" # "${HOME}/SHEMAT-Suite"
shem_type="sm" # "sm", "fw"
mode="head" # "head", "pres"
props="const"
user="none"
compiler_name="gfortran" # "gfortran", "ifort"
flags="-Dhdf=ON -Domp=OFF -j16" # Flags: "-Domp=ON","-Dhdf=ON","-Ddetails=ON"
# Go to make_dir
pushd ${make_dir}
#New make-directory
mkdir build_${props}
pushd build_${props}
#CMake command
FC=${compiler_name} cmake -DPROPS=${props} -DUSER=${user} -Dphys_base=${mode} "${flags}" ..
#Compilation command
gmake ${shem_type} -j16
# Copy executable
cp shem_${shem_type}.x ${model_dir}
For general information on compilation, see Compilation.
Input
For an EnKF run the following additional input is needed compared to a forward run:
- Additional input in the general input file
WAVEREAL
- Input files or sequential Gaussian simulation (SGSim):
sgsim_k_wavereal.par
,logk_wavereal.dat
- An EnKF input file
WAVEREAL.enkf
(EnKF Input File) - Tecplot files of the (possibly synthetic) true parameter values
(
TrueWAVEREAL.plt
TrueWAVEREAL_chem.plt
) - Observation data in
observations_WAVEREAL.dat
Additional input in general input file
The stochastic mode is invoked in the input file under # runmode
,
EnKF has the runmode 3
:
# runmode
3
Additionally, some general EnKF input is given under the following keywords
!=========>>>>> <<<<<==========!
!=========>>>>> SIMULATION <<<<<==========!
!=========>>>>> <<<<<==========!
# simulate
10
# enkf iteration
1
!never more than 3
# enkf postcompute none
!none mean samples
Under # simulate
the size of the ensemble is specified. In
WAVEREAL
this size is 10. This is usually a too small ensemble size
for meaningful updates. In the testmodel this ensemble size is to
reduce the computational effort. Later, we will see some results for
the same model and ensemble size 1000.
The keyword # enkf iteration
specifies the number of full EnKF
runs. This should normally be 1
as it is here. Only change this to a
larger number if you really want to start the EnKF over with the
updates of a previous EnKF run.
Finally under # enkf postcompute
a final forward run can be
simulated with updated parameter values (either for the mean
or for
the realizations/samples
). Here, we are not interested in this, so
the value is set to none
.
Next to these general EnKF-inputs, we specify some input for the geostatistical simulation of parameter values.
!==========>>>>> <<<<<==========!
!==========>>>>> ROCK MATRIX <<<<<==========!
!==========>>>>> <<<<<==========!
# parameter group, records=1
1 parfile=sgsim_k_wavereal.par
1 4 log
# standard deviation
0.00000d+000 0.00000d+000 0.00000d+000 2.302585093d+000 0.00000d+000 0.00000d+000 0.00000d+000 1.00000d+000 0.00000d+000 0.00000d+000 0.00000d+000 0.00000d+000 0.00000d+000 0.00000d+000 0.00000d+000 0.00000d+000 0.000000000d+000
# units
0.1d0 1.d0 1.d0 3.1632d-10 1.d-9 1.d0 1.d0 3.35d0 0.0d-06 2300000.d0 5.0d0 0.d0 0.d0 0.0d0 0.0d0 0.0d0 0.0d0
# split units, records=1
1
# uindex
961*1
The first entry (# parameter group
) specifies a SGSim input file
(sgsim_k_wavereal.par
, see next
section, the 4
means
that the fourth parameter of the SHEMAT-Suite parameter order in # units
is simulated (permeability, see Input
File).
# standard deviation
is a dummy input still existing for historical
reasons. It is important that this value is correct for the updated
parameter, in our case the fourth value is set to 2.302585093
, this
is \log_{10}(e)
and turns the input (exponents of 10) into natural
exponents.
# units
is the same as for forward computation, only the updated
parameter is not read in, since it is specified by # parameter group
.
# split units
splits unit 1
into as many units as there are cells
in this unit. This is needed for Gaussian simulation since in Gaussian
simulation every cell will get its own parameter value.
# uindex
is the same as in a forward computation, but the change by
# split units
have to be kept in mind.
Input for sequential Gaussian simulation
The input file for the sequential Gaussian simulation
sgsim_k_wavereal.par
is specified under # parameter group
in the
general input file WAVEREAL
.
sgsim_k_wavereal.par
look as follows
Parameters for SGSIM
********************
START OF PARAMETERS:
well_frac_data.gslib \file with data
1 2 3 4 0 0 \ columns for X,Y,Z,vr,wt,sec.var.
-1.0e21 1.0e21 \ trimming limits
1 \transform the data (0=no, 1=yes)
sgsim.trn \ file for output trans table
1 \ consider ref. dist (0=no, 1=yes)
logk_wavereal.dat \ file with ref. dist distribution
1 0 \ columns for vr and wt
-16.0 -8.0 \ zmin,zmax(tail extrapolation)
1 -16.0 \ lower tail option, parameter
1 -8.0 \ upper tail option, parameter
0 \debugging level: 0,1,2,3
sgsim.dbg \file for debugging output
sgsim.out \file for simulation output
1 \number of realizations to generate
31 0.0 2.0 \nx,xmn,xsiz
31 0.0 2.0 \ny,ymn,ysiz
1 0.0 2.0 \nz,zmn,zsiz
1111111 \random number seed
0 12 \min and max original data for sim
12 \number of simulated nodes to use
1 \assign data to nodes (0=no, 1=yes)
0 3 \multiple grid search (0=no, 1=yes),num
0 \maximum data per octant (0=not used)
1000.0 1000.0 50.0 \maximum search radii (hmax,hmin,vert)
90.0 0.0 0.0 \angles for search ellipsoid
1 0.60 1.0 \ktype: 0=SK,1=OK,2=LVM,3=EXDR,4=COLC
../data/ydata.dat \ file with LVM, EXDR, or COLC variable
4 \ column for secondary variable
1 0.0 \nst, nugget effect
1 1.0 0.0 0.0 0.0 \it,cc,ang1,ang2,ang3
50.0 50.0 5.0 \a_hmax, a_hmin, a_vert
For general information about these inputs, see the SGSim Input file wikipage.
Here, we look at the most important inputs including the file
logk_wavereal.dat
that contains the reference
distribution. logk_wavereal.dat
looks like this
Target Histogram
1
log permeability, -1.25e+01 +- 5.00e-01
-1.34591909e+01
-1.25655339e+01
-1.28843158e+01
-1.13050241e+01
-1.24613772e+01
-1.23121963e+01
-1.23022179e+01
-1.25562495e+01
-1.34145184e+01
-1.14542739e+01
...
The first three lines are comments, then there is a large-enough
number of values from a probability distribution, in this case -12.5 +- 0.5
.
Additionally to this input, cut-off values for the distribution are specified (-16.0 and -8.0).
The number and size of the cells (31, 2.0) has to be compatible with the input file, otherwise the correlation lengths will be wrong.
The random number seed is 1111111
.
The search radii (1000.0 and 1000.0) are chosen so large that they do not affect the field.
The correlation lengths in horizontal directions are 50.0.
EnKF input file
For general information on keywords for EnKF input, see EnKF Input
File, the Doxygen documentation of the
variables or directly in the source code
(/simul/mod_enkf.f90
).
The first general section of the EnKF input file WAVEREAL.enkf
looks
as follows
!================================================================================================!
!================================ EnKF Inputfile ==================================!
!================================================================================================!
!------------------------------------------!
! Head ! Temp ! Conc ! Perm ! Tcon ! Poro !
!------------------------------------------!
! 1 ! 2 ! 3 ! 4 ! 5 ! 6 !
!------------------------------------------!
# nrobs_int
1
# state vector activity
1 0 1 1 0 0
# observation variances
0.0d+0 0.0d+0 50.0d-6 0.0d+0 0.0d0 0.0d0
# system variances
0.0d+0 0.0d+0 0.0d+0 0.0d+0 0.0d+0 0.0d+0
# assimod
sequ
# generalseed
T 50 75
Under # nrobs_int
, we specify that only one update is
executed. Later we will look at results after 10 updates, since one
update is usually not enough to significantly change prior probability
distributions.
The # state vector activity
is specified for Head
, Conc
and
Perm
. So we update the dynamic variables hydraulic head and tracer
concentration, and we estimate the parameter permeability. These
activities have to be compatible with the computed variables specified
in the general input file.
Then, the uncertainties of the measurements (# observation variances
, in this case, we have concentration measurements) and of
the forward computation (# system variances
, here none) are
specified.
Assimilation type is sequential (# assimod
), so the updates will be
run one after the other for head, concentration and permeability. This
often has to be done for large models.
Finally a random seed is specified under # generalseed
.
The next section of the EnKF input file WAVEREAL.enkf
contains
special EnKF-methods, but in this example we run the normal EnKF, so
all specialized input is turned off:
!=========>>>>> <<<<<==========!
!=========>>>>> ENKF METHODS <<<<<==========!
!=========>>>>> <<<<<==========!
# damping factors
0.0d0 1.0d0
# normal score
F 3.0d0
### covariance localisation
F 150.0d0 150.0d0 150.0d0
# dualenkf
F
# hybridenkf
F 0.5d0
# iterativeenkf
F 1 F
# stochbc, records=0
45 7
0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0
# pilot point, records=51
F get 4
4 4 1
8 4 1
12 4 1
16 4 1
20 4 1
24 4 1
...
Next the output switches are specified:
!=========>>>>> <<<<<==========!
!=========>>>>> OUTPUT SWITCHES <<<<<==========!
!=========>>>>> <<<<<==========!
# err_cov
F
# assimstp output
F
# enkf log output
F
# analysis matrix output
F
# compute realisation output
F
# compute mean
T
# vtk output enkf
T
# vtk output stddev
T
# vtk output resid
T
# vtk output covs
F
# vtk output realisations
T 10
In particular, we output full mean-fields before and after the EnKF
update (# compute mean
and # vtk output enkf
). Additionally, there
is overall standard deviation output (# vtk output stddev
) and
residual output comparing to the true (# vtk output resid
for input
of the true values see Section True parameter
values). Finally, we output realization and
restrict their number to 10 (# vtk output realizations
) which is not
really a restriction in this case since the ensemble size is 10.
The rest of the output is left at standard values for this example.
The input section follows:
!=========>>>>> <<<<<==========!
!=========>>>>> INPUT <<<<<==========!
!=========>>>>> <<<<<==========!
# checktrue
T
# true file names
TrueWAVEREAL.plt
TrueWAVEREAL_chem.plt
# ex_unit
0
# observation file names
observations_WAVEREAL.dat
# reference distribution, records=0
Here, it is most interesting that the input files for true parameter
values (# true file names
) and observations (# observation file names
) are specified.
All other inputs in WAVEREAL.enkf
are specialized update step
manipulations documented in the code.
True parameter values
The true parameter values are specified in the files
TrueWAVEREAL.plt
and TrueWAVEREAL_chem.plt
. These Tecplot files
look as follows:
title = "samples_output/WAVEREAL_TRUE_E0_1"
variables = "i", "j", "k", "x", "y", "z", "uindex", "obsz","obst","cindex", "head", "temp", "pres", "satn", "epot", "vx", "vy", "vz", "por", "kx", "ky", "kz", "lx", "ly", "lz", "rhof", "visf", "qxc", "qyc", "qzc", "sigma", "lc", "src", "jxc", "jyc", "jzc"
zone i= 31, j= 31, k= 1, f=point
1 1 1 1.00000 1.00000 1.00000 3 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.000000E+00 -0.000000E+00 0.167741E-06 0.000000E+00 0.100000E+00 -0.120809E+02 -0.120809E+02 -0.120809E+02 0.284336E+01 0.284336E+01 0.284336E+01 0.999700E+03 0.130800E-02 -0.000000E+00 -0.000000E+00 0.000000E+00 0.681000E-02 0.000000E+00 0.800000E-01 -0.000000E+00 -0.000000E+00 0.000000E+00
2 1 1 3.00000 1.00000 1.00000 4 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.000000E+00 -0.000000E+00 0.102553E-06 0.000000E+00 0.100000E+00 -0.123369E+02 -0.123369E+02 -0.123369E+02 0.284336E+01 0.284336E+01 0.284336E+01 0.999700E+03 0.130800E-02 -0.000000E+00 -0.000000E+00 0.000000E+00 0.681000E-02 0.000000E+00 0.800000E-01 -0.000000E+00 -0.000000E+00 0.000000E+00
3 1 1 5.00000 1.00000 1.00000 5 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.000000E+00 -0.000000E+00 0.100291E-06 0.000000E+00 0.100000E+00 -0.123969E+02 -0.123969E+02 -0.123969E+02 0.284336E+01 0.284336E+01 0.284336E+01 0.999700E+03 0.130800E-02 -0.000000E+00 -0.000000E+00 0.000000E+00 0.681000E-02 0.000000E+00 0.800000E-01 -0.000000E+00 -0.000000E+00 0.000000E+00
4 1 1 7.00000 1.00000 1.00000 6 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.000000E+00 -0.000000E+00 0.689801E-07 0.000000E+00 0.100000E+00 -0.126104E+02 -0.126104E+02 -0.126104E+02 0.284336E+01 0.284336E+01 0.284336E+01 0.999700E+03 0.130800E-02 -0.000000E+00 -0.000000E+00 0.000000E+00 0.681000E-02 0.000000E+00 0.800000E-01 -0.000000E+00 -0.000000E+00 0.000000E+00
5 1 1 9.00000 1.00000 1.00000 7 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.000000E+00 -0.000000E+00 0.572896E-07 0.000000E+00 0.100000E+00 -0.126995E+02 -0.126995E+02 -0.126995E+02 0.284336E+01 0.284336E+01 0.284336E+01 0.999700E+03 0.130800E-02 -0.000000E+00 -0.000000E+00 0.000000E+00 0.681000E-02 0.000000E+00 0.800000E-01 -0.000000E+00 -0.000000E+00 0.000000E+00
...
They can for example be generated by a synthetic true run using the
parameter configuration that is supposed to be estimated by the EnKF
run. If the true involves concentration, then there has to be a
TrueWAVEREAL_chem.plt
including tracer output:
title = "samples_output/WAVEREAL_TRUE_E0_1"
variables = "x", "y", "z","uindex", "i", "j", "k", "obsz","obst","cindex", "head", "temp", "pres", "satn", "epot", "v_abs", "tsal", "conc0001"
zone i= 31, j= 31, k= 1, f=point
1.00000 1.00000 1.00000 3 1 1 1 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.00000000E+00 0.16774068E-06 0.80000000E-01 0.80000000E-01
3.00000 1.00000 1.00000 4 2 1 1 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.00000000E+00 0.10255282E-06 0.80000000E-01 0.80000000E-01
5.00000 1.00000 1.00000 5 3 1 1 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.00000000E+00 0.10029102E-06 0.80000000E-01 0.80000000E-01
7.00000 1.00000 1.00000 6 4 1 1 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.00000000E+00 0.68980115E-07 0.80000000E-01 0.80000000E-01
9.00000 1.00000 1.00000 7 5 1 1 0 0 0 0.11000000E+02 0.10000000E+02 0.19807057E+00 0.00000000E+00 0.00000000E+00 0.57289632E-07 0.80000000E-01 0.80000000E-01
...
Observation data
The observation file observations_WAVEREAL.dat
looks as follows (it
has ten observation time entries, but only the first will be used if
# nrobs_int
specified only one update):
i j k h T conc kz lz por
2 1.0000000000E-01 2 0 0 1 0 0 0
22 16 1 1.0527612000E+01 1.0000000000E+01 6.0005844000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432877000E+01 1.0000000000E+01 5.9997001000E-02 8.5719887000E-13 3.E+00 1.E-01
4 3.0000000000E-01 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.0105278000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 5.9996787000E-02 8.5719887000E-13 3.E+00 1.E-01
6 5.0000000000E-01 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.0471824000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 5.9997090000E-02 8.5719887000E-13 3.E+00 1.E-01
8 7.0000000000E-01 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.1204739000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 5.9998213000E-02 8.5719887000E-13 3.E+00 1.E-01
10 9.0000000000E-01 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.2271343000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 6.0001763000E-02 8.5719887000E-13 3.E+00 1.E-01
12 1.1000000000E+00 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.3563976000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 6.0010948000E-02 8.5719887000E-13 3.E+00 1.E-01
14 1.3000000000E+00 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.4965145000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 6.0030634000E-02 8.5719887000E-13 3.E+00 1.E-01
16 1.5000000000E+00 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.6380268000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 6.0066915000E-02 8.5719887000E-13 3.E+00 1.E-01
18 1.7000000000E+00 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.7745039000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 6.0126336000E-02 8.5719887000E-13 3.E+00 1.E-01
20 1.9000000000E+00 2 0 0 1 0 0 0
22 16 1 1.0527492000E+01 1.0000000000E+01 6.9021348000E-02 2.4612355000E-12 3.E+00 1.E-01
10 16 1 1.0432577000E+01 1.0000000000E+01 6.0215055000E-02 8.5719887000E-13 3.E+00 1.E-01
This can be generated from a synthetic true forward computation
run. In this example the first columns and the conc
column are
important. The first two columns specify the observation times (every
second time step of the full simulation in our case), the conc
column is so important, since we are using concentration measurements
as specified by 0 0 1 0 0 0
(the third place is for concentration).
Output and Results
The main output is divided up into two directories:
-
enkf_output
: This is output of the parameter fields before and after given updates (as specified inWAVEREAL.enkf
, name:assim_variables_E1_aft_0001.vtk
), additionally residuals between updated parameters (residual_E1.vtk
) and the true values and standard deviations of the updated parameter fields (stddev_E1.vtk
) can be output. -
samples_output
: InitialWAVEREAL_E0_init_101.h5
or finalWAVEREAL_E1_101.h5
completely updated realization fields (one ensemble member of the EnKF).
When we run the above example, but with 10 observations instead of one
we can look at permeability distributions collected from the final
realization fields in samples_output
:
In residual_E1.vtk
, we can check that the updated permeability field
is closer to the reference as before the update.
SCALARS rms_kz_aft float 1
LOOKUP_TABLE default:mean
0.623002E+00 0.621193E+00 0.622028E+00 0.618521E+00 0.628792E+00 0.614868E+00 0.558618E+00 0.538802E+00 0.507718E+00 0.440239E+00
The RMSE (normalized to the units of the distributions, in this case
the exponent of the permeability) went from 0.623002
to 0.440239
,
reduced by almost a third.