Skip to content
Snippets Groups Projects
Code owners
Assign users and groups as approvers for specific file changes. Learn more.
title: "r2ogs6 Ensemble Guide"
#output: pdf_document
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{r2ogs6 Ensemble Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
knitr::opts_chunk$set(
  eval = nzchar(Sys.getenv("r2ogs6_ensemble_guide_eval")),
  fig.width = 6,
  fig.align = "center",
  collapse = TRUE,
  comment = "#>",
  message = FALSE
)

devtools::load_all(".")
library(r2ogs6)
library(ggplot2)
library(dplyr)

Introduction

Hi there! This is a practical guide on the OGS6_Ensemble class from the r2ogs6 package. I will show you how you can use this class to set up ensemble runs for OpenGeoSys 6, extract the results and plot them.

If you want to follow this tutorial, you'll need to have r2ogs6 installed and loaded. The prerequisites for r2ogs6 are described in detail in the r2ogs6 User Guide vignette.

Additionally, you'll need the following benchmark files:

General instructions on how to download OpenGeoSys 6 benchmark files can be found here.

Theis' problem

We will consider the following parameters for our sensitivity analysis:

  • permeability

  • porosity

  • storage

Setup

First, we create a simulation object to base our ensemble on and read in the .prj file.

# Change sim_path to fit your system
ogs6_obj <- OGS6$new(sim_name = "axisym_theis",
                     sim_path = "D:/ogs6_theis/axisym_theis_sim")

# Change prj_path to fit your system
prj_path <- system.file("examples/Theis_problem/benchmark_files/",
                        "axisym_theis.prj", package = "r2ogs6")

read_in_prj(ogs6_obj, prj_path)

Single-parameter sequential run

Let's create a small ensemble where we only alter the value of storage. Say we don't want to hardcode the values, but instead examine the effects of changing storage by 1%, 10% and 50%. We can use the percentages_mode parameter of OGS6_Ensemble for this. It already defaults to TRUE, below I'm merely being explicit for demonstration purposes.

# Assign percentages
percentages <- c(-50, -10, -1, 0, 1, 10, 50)

# Define an ensemble object
ogs6_ens <- 
  OGS6_Ensemble$new(
    ogs6_obj = ogs6_obj,
    parameters = list(list(ogs6_obj$media[[1]]$properties[[4]]$value,
                           percentages)),
    percentages_mode = TRUE)

Now you can start the simulation.

ogs6_ens$run_simulation()
lapply(ogs6_ens$ensemble, ogs6_read_output_files)

Plot results for single simulation

Our simulations (hopefully) ran successfully - great! Now it'd be nice to see some results. Say we're interested in the pressure data.

# This will get a combined dataframe
storage_tbl <- 
  ogs6_ens$get_point_data(point_ids = c(0, 1, 2),
                          keys = c("pressure"))

You may leave out the point_ids parameter. It defaults to all points in the dataset - I specify it here because there's about 500 which would slow down building this vignette.

# Let's look at the first 10 rows of the dataset
head(storage_tbl, 10)

You can see there's one row per timestep for each point ID. Say we want to plot the average pressure for each timestep, that is, the mean of the pressure values where the rows have the same timestep. We'll consider only the first simulation for now.

# Get average pressure from first simulation
avg_pr_first_sim <- storage_tbl[storage_tbl$sim_id == 1,] %>% 
  group_by(timestep) %>%
  summarize(avg_pressure = mean(pressure))

# Plot pressure over time for first simulation
ggplot(data = avg_pr_first_sim) + 
  geom_point(mapping = aes(x = as.numeric(row.names(avg_pr_first_sim)),
                           y = avg_pressure)) + xlab("Timestep")

Plot results for multiple simulations

# Get average pressure for all simulations
avg_pr_df <- storage_tbl %>%
  group_by(sim_id, timestep) %>%
  summarise(avg_pressure = mean(pressure))

# Plot pressure over time for all simulations
ggplot(avg_pr_df,
       aes(x = as.numeric(as.factor(timestep)),
           y = avg_pressure)) +
  geom_point(aes(color = as.factor(sim_id))) +
  geom_line(aes(color = as.factor(sim_id))) + 
  xlab("Timestep") + 
  labs(color = "sim id") +
  facet_grid(rows = vars(sim_id))

Now we have the average pressure over time for each of the simulations (rows). Since they're pretty similar, let's put them in the same plot for a better comparison.

ggplot(avg_pr_df, aes(x = as.numeric(as.factor(timestep)),
                      y = avg_pressure,
                      group = sim_id)) +
  geom_point(aes(color = as.factor(sim_id))) +
  geom_line(aes(color = as.factor(sim_id))) + 
  labs(color = "sim id") +
  xlab("Timestep")

Multi-parameter sequential run

So far we've only considered the storage parameter. Now we want to have a look at how the other two parameters influence the simulation, so let's put them into an ensemble together. For permeability we can reuse the percentages we already have. For porosity, this doesn't work - the original value is 1 and its values can only range between 0 and 1, so we'll supply a shorter vector.

This time it's important we set the OGS6_Ensemble parameter sequential_mode to TRUE as this will change the supplied parameters sequentially which means in the end we have 18 (7 + 7 + 4) simulations which is equal to the sum of elements in the value vectors you supply (I've named them values below for clarity, naming them is optional though).

The default FALSE would give an error message because our value vectors do not have the same length and even if they had, it wouldn't do what we want - the number of simulations would equal the length of one value vector (thus requiring them to be of the same length). Generally, set sequential_mode to TRUE if you want to examine the influence of parameters on a simulation independently. If you want to examine how the parameters influence each other as in wanting to test parameter combinations, the default mode is the way to go.

# Change sim_path to fit your system
ogs6_obj$sim_path <- "D:/ogs6_theis/axisym_theis_sim_big"

ogs6_ens_big <-
  OGS6_Ensemble$new(
    ogs6_obj = ogs6_obj,
    parameters = list(per = list(ogs6_obj$media[[1]]$properties[[1]]$value, 
                                 values = percentages),
                      por = list(ogs6_obj$media[[1]]$properties[[3]]$value, 
                                 c(-50, -10, -1, 0)),
                      sto = list(ogs6_obj$media[[1]]$properties[[4]]$value, 
                                 values = percentages)),
    sequential_mode = TRUE)

Now you can start the simulation.

ogs6_ens_big$run_simulation()
lapply(ogs6_ens_big$ensemble, ogs6_read_output_files)

This will take a short time. As soon as the simulations are done, we can extract the point data much like we did before. This time we want to plot the point x coodinates on the x axis so we're leaving out point_ids to get all points. Also we just want the data from the last timestep.

# Get combined dataframe
per_por_sto_df <- 
  ogs6_ens_big$get_point_data(
    keys = c("pressure"),
    start_at_timestep = ogs6_ens_big$ensemble[[1]]$pvds[[1]]$last_timestep)

Plotting time! Since we set sequential_mode to TRUE, the dataframe we just created contains a name column which allows us to group by parameters. Because we've also set percentages_mode to TRUE, it also has a column perc which allows us to group by percentages. Now we can simply use a facet grid to plot.

# Make plot
ggplot(per_por_sto_df,
       aes(x = x,
           y = pressure,
           group = perc)) +
  geom_point(aes(color = as.factor(perc))) +
  xlab("x point coordinate") +
  labs(color = "%") +
  facet_grid(cols = vars(name))

Ta-Daa! We can see permeability has the most influence on the pressure. Though they may seem suspicious, porosity and storage are being plotted correctly - the points are just being placed right on top of each other. Since porosity can't go over the value 1 which was the original value, our value vector only went from -50% to 0% which is why the line colors of porosity and storage differ.

Theis solution for well pumping

We will consider the following parameters for our sensitivity analysis:

  • permeability

  • porosity

  • slope

Setup

First, we create a simulation object to base our ensemble on and read in the .prj file. This time we want to specify that an output file only gets written at the last timestep.

# Change sim_path to fit your system
ogs6_obj <- OGS6$new(sim_name = "theis",
                     sim_path = "D:/ogs6_theis/theis_sim")

# Change prj_path to fit your system
prj_path <- system.file("examples/Theis_well_pumping/benchmark_files/",
                        "theis.prj", package = "r2ogs6")

read_in_prj(ogs6_obj, prj_path)

# Increase each_steps
ogs6_obj$time_loop$output$timesteps$pair$each_steps <- 200

Multi-parameter sequential run

# Assign percentages
percentages <- c(-50, -10, -1, 0, 1, 10, 50)

ogs6_ens_theis_2 <-
  OGS6_Ensemble$new(
    ogs6_obj = ogs6_obj,
    parameters =
      list(
        per = list(ogs6_obj$parameters[[3]]$values,
                   values = percentages),
        por = list(ogs6_obj$parameters[[2]]$value,
                   values = percentages),
        slo = list(
          ogs6_obj$media[[1]]$phases[[1]]$properties[[1]]$independent_variable[[2]]$slope,
          values = percentages)
      ),
    sequential_mode = TRUE
  )

Now you can start the simulation.

ogs6_ens_theis_2$run_simulation()
lapply(ogs6_ens_theis_2$ensemble, ogs6_read_output_files)

When the simulations have run, we can extract and plot the results like before. To avoid cluttering the plot, we only extract the pressure values for a single line. For this, we get the IDs of all points on the x axis.

# Extract point ids
get_point_ids_x <- function(points){
  x_axis_ids <- numeric()
  
  for(i in seq_len(dim(points)[[1]])) {
    if (points[i, ][[2]] == 0 && points[i, ][[3]] == 0) {
      x_axis_ids <- c(x_axis_ids, (i - 1))
    }
  }
  
  return(x_axis_ids)
}

point_ids_x <- get_point_ids_x(
  ogs6_ens_theis_2$ensemble[[1]]$pvds[[1]]$OGS6_vtus[[1]]$points)

# Get combined dataframe
per_por_slo_df <- 
  ogs6_ens_theis_2$get_point_data(
    point_ids = point_ids_x,
    keys = c("pressure"),
    start_at_timestep = ogs6_ens_theis_2$ensemble[[1]]$pvds[[1]]$last_timestep)
# Make plot
ggplot(per_por_slo_df,
       aes(x = x,
           y = pressure,
           group = perc)) +
  geom_point(aes(color = as.factor(perc))) +
  xlab("x point coordinate") +
  labs(color = "%") +
  facet_grid(cols = vars(name))

Let's take a closer look at permeability.

per_df <- subset(per_por_slo_df, name == "per")

# Make plot
ggplot(per_df,
       aes(x = x,
           y = pressure)) +
  geom_point(aes(color = as.factor(perc))) +
  xlab("x point coordinate") +
  labs(color = "%")

Maybe we want to try and use a logarithmic approach for slope. This won't work with the built-in functionality of OGS6_Ensemble so we'll set up our Ensemble a little differently.

# Calculate log value
log_val <- log(as.numeric(
  ogs6_obj$media[[1]]$phases[[1]]$properties[[1]]$independent_variable[[2]]$slope),
  base = 10)

# Apply changes to log value
log_vals <- vapply(percentages, function(x){
    log_val + (log_val * (x / 100))
}, FUN.VALUE = numeric(1))

# Transfer back to non-logarithmic scale
back_transf_vals <- 10^log_vals

# Change sim_path to fit your system
ogs6_obj$sim_path <- "D:/ogs6_theis/theis_sim_log_slope"

# Set up new ensemble
ogs6_ens_slo <-
    OGS6_Ensemble$new(
        ogs6_obj = ogs6_obj,
        parameters =
            list(
                slo = list(
                    ogs6_obj$media[[1]]$phases[[1]]$properties[[1]]$independent_variable[[2]]$slope,
                    values = back_transf_vals)
            ),
        percentages_mode = FALSE,
        sequential_mode = TRUE
    )

As before, we can run the simulation right away.

ogs6_ens_slo$run_simulation()
lapply(ogs6_ens_slo$ensemble, ogs6_read_output_files)

Let's check if we can observe any influence of slope on pressure now.

# Filter point ids
point_ids_x <- get_point_ids_x(
  ogs6_ens_slo$ensemble[[1]]$pvds[[1]]$OGS6_vtus[[1]]$points)

# Get combined dataframe
slo_df <- 
    ogs6_ens_slo$get_point_data(
        point_ids = point_ids_x,
        keys = c("pressure"),
        start_at_timestep = ogs6_ens_slo$ensemble[[1]]$pvds[[1]]$last_timestep)

# Supply percentages manually since we couldn't use `percentages_mode`
percs <- vapply(slo_df$sim_id,
                function(x){percentages[[x]]},
                FUN.VALUE = numeric(1))

ggplot(slo_df,
       aes(x = x,
           y = pressure)) +
    geom_point(aes(color = as.factor(percs))) +
    xlab("x point coordinate") +
    labs(color = "%")

Summary

The OGS6_Ensemble class is a useful tool to set up ensemble runs for sensitivity analyses. In this vignette, we learned how to create OGS6_Ensemble objects. We looked at how the parameters sequential_mode and percentages_mode influence how our ensemble object is initialised. We started simulations via OGS6_Ensemble$run_simulation() and extracted information from the output files to plot them.