---
title: "r2ogs6 User Guide"
author: "Anna Heinrich"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{r2ogs6 User Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

devtools::load_all(".")
```

```{r setup}
library(r2ogs6)
```

## Prerequisites

After loading `r2ogs6`, we can set the package options so it knows where to look for OpenGeoSys 6 and Python.

```r
# Set path for OpenGeoSys 6
options("r2ogs6.default_ogs6_bin_path" = "your_ogs6_bin_path")

# Set path for Python
options("r2ogs6.use_python" = "your_python_path")
```
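
To confirm that an option was picked up, we can read it back with base R's `getOption()`. This is a minimal sketch that reuses the placeholder path from above:

```r
# Store the placeholder path (replace it with your real OpenGeoSys 6 bin directory)
options("r2ogs6.default_ogs6_bin_path" = "your_ogs6_bin_path")

# Read the option back; getOption() returns NULL if the option was never set
ogs6_bin_path <- getOption("r2ogs6.default_ogs6_bin_path")
print(ogs6_bin_path)
```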


## Creating your simulation object
...
To represent a simulation object, `r2ogs6` uses an `R6` class called `OGS6`. If you're new to `R6` objects, don't worry. Creating a simulation object is easy. We call the class constructor and provide it with some parameters:

* `sim_name`: The name of your simulation

* `sim_id`: A simulation ID, used for chaining simulations (defaults to 1)

* `sim_path`: The directory that will hold all files relevant to your simulation

Usually, you will only need to define `sim_name` and `sim_path`.

```{r}
ogs6_obj <- OGS6$new(sim_name = "my_simulation",
                     sim_path = "my_sim_path")
```

And that's it, we now have a simulation object. 
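
If `R6` is unfamiliar, the following toy example may help. It is only a sketch: `ToySim` is a hypothetical class made up for illustration and is not part of `r2ogs6`. It shows the two things that matter for this guide: objects are created with `$new()`, and methods are called on the object itself with `$`.

```r
library(R6)

# Hypothetical toy class, not part of r2ogs6
ToySim <- R6Class("ToySim",
  public = list(
    sim_name = NULL,
    initialize = function(sim_name) {
      self$sim_name <- sim_name
    },
    describe = function() {
      paste("Simulation:", self$sim_name)
    }
  )
)

# Create an object via the constructor and call a method on it
toy_sim <- ToySim$new(sim_name = "my_simulation")
toy_sim$describe()

# R6 objects have reference semantics: `toy_alias` points to the same object,
# so changing it also changes what `toy_sim` sees
toy_alias <- toy_sim
toy_alias$sim_name <- "renamed"
toy_sim$sim_name
```

Because of these reference semantics, functions that receive an `R6` object can modify it in place, which is why there is no need to reassign the object after calling a method on it.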


## Loading an OpenGeoSys 6 simulation from a project file
The quickest and easiest way to load a simulation is to use an existing benchmark. If you take a look at the [OpenGeoSys documentation](https://www.opengeosys.org/docs/benchmarks/elliptic/elliptic-dirichlet/), you'll find plenty of benchmarks to choose from, each with a link to its project file on GitLab at the top of the respective page.

For demonstration purposes, I will use a project from the `HydroMechanics` benchmarks, which can be found [here](https://gitlab.opengeosys.org/ogs/ogs/-/tree/master/Tests/Data/HydroMechanics/IdealGas/flow_free_expansion).

NOTE: `r2ogs6` has not been tested with every existing benchmark. Due to the large number of input parameters, you might encounter cases where the import fails.
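
As a rough sketch, loading a downloaded project file could look like the following. Two things here are assumptions rather than statements from this guide: that your installed version of `r2ogs6` exports a function `read_in_prj()` taking the simulation object and the path to the `.prj` file, and the path itself, which is a hypothetical placeholder. Please check the package help pages before relying on it.

```r
# Hypothetical local path to a downloaded benchmark project file
prj_path <- file.path("path", "to", "benchmark.prj")

# Assumption: r2ogs6 exports read_in_prj(ogs6_obj, prj_path).
# The guard skips the import if the package or the file is not available.
if (requireNamespace("r2ogs6", quietly = TRUE) && file.exists(prj_path)) {
  ogs6_bench <- r2ogs6::OGS6$new(sim_name = "my_benchmark",
                                 sim_path = "my_sim_path")
  r2ogs6::read_in_prj(ogs6_bench, prj_path)
}
```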


## Setting up your own OpenGeoSys 6 simulation
(...)


### Check the status of your OGS6 simulation object
Since there are plenty of required and optional input parameters, you might get lost while setting up your simulation. To get a brief overview of your simulation, you can use the `OGS6` function `get_status()`. This tells you which input parameters are still missing before you can run a simulation.

```{r}
# Call on the OGS6 object (note the R6 style)
ogs6_obj$get_status()
```

Since we haven't defined anything so far, you'll see a lot of red there. 

### Knowing what kind of data to add to your OGS6 simulation object

The results of `get_status()` already gave us a hint as to what we can add. We'll go from there and try to find out more about the possible input data. Say we want to find out more about `process` objects.

```r
# To take a look at the documentation, use ? followed by the name of a class
?prj_process
```

As a rule of thumb, classes are named with the prefix `prj_` followed by their XML tag name in the `.prj` file. The only exceptions to this rule are subclasses where this would lead to duplicate class names. For example, the class `prj_time_loop` contains a subclass representing a `process` child element. This is not to be confused with the `process` children of the first-level `processes` node directly under the root node of the `.prj` file, so that subclass is named `prj_tl_process`.
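
The naming rule can be written down as a small function. Note that `prj_class_name()` is a hypothetical helper for illustration only and is not part of `r2ogs6`. It covers just the one exception mentioned above, and treating `time_loop` as the direct parent is a simplification.

```r
# Hypothetical helper that mirrors the naming rule of thumb
prj_class_name <- function(tag, parent = NULL) {
  # The one exception named in the text: `process` inside `time_loop`
  if (identical(tag, "process") && identical(parent, "time_loop")) {
    return("prj_tl_process")
  }
  # Default rule: prefix the XML tag name with "prj_"
  paste0("prj_", tag)
}

prj_class_name("parameter")
prj_class_name("process")
prj_class_name("process", parent = "time_loop")
```

The resulting name is what you would pass to `?` to open the corresponding help page.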

(...)


Let's try adding something now.

### Adding input data via OGS6$add()

To add data to our simulation object, we use `OGS6$add()`. 

```{r}
# Add a constant parameter named "pressure0" to the simulation object
ogs6_obj$add(prj_parameter(
  name = "pressure0",
  type = "Constant",
  value = 1
))
```


## Running the simulation

As soon as we've added all necessary parameters, we can try starting our simulation by calling `ogs6_run_simulation(ogs6_obj, write_logfile = TRUE)`. This will run a few additional checks and then start OpenGeoSys 6. If `write_logfile` is set to `FALSE`, the output from OpenGeoSys 6 will be shown on the console. 

## Running multiple simulations

If we want to run not one but multiple simulations, we can use the simulation object we just created as a blueprint for an ensemble or chain run.

### Ensemble runs

To set up an ensemble run, we first need a base simulation object. Conveniently, we already have `ogs6_obj`, so we can go from there. We will pass this object to another one, namely an object of class `OGS6_Ensemble`. Additionally, we have to define which parameters should vary between the different simulations and provide their respective values. The syntax for this is as follows:

```{r}
# Vary the value of the first parameter across the ensemble
ogs6_ens <- OGS6_Ensemble$new(
  ogs6_obj = ogs6_obj,
  parameters = list(list(ogs6_obj$parameters[[1]]$value, c(2, 3, 4)))
)
```

Internally, the `OGS6_Ensemble` object clones the `OGS6` object provided to it and for these clones, it overwrites the parameters we defined with the values we provided. The parameters we define must belong to the same `OGS6` object we passed to the ensemble object as a blueprint via the `ogs6_obj` argument. 
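
The clone-and-overwrite idea can be illustrated with plain `R6`. This is a sketch of the concept as described in this guide, not the package's actual implementation, and `ToyParamSim` is a hypothetical stand-in for a simulation object with a single parameter value.

```r
library(R6)

# Hypothetical stand-in for a simulation object, not part of r2ogs6
ToyParamSim <- R6Class("ToyParamSim",
  public = list(
    value = NULL,
    initialize = function(value) {
      self$value <- value
    }
  )
)

base_sim <- ToyParamSim$new(value = 1)
new_values <- c(2, 3, 4)

# One independent clone per value; the base object is left untouched
clones <- lapply(new_values, function(v) {
  sim_clone <- base_sim$clone(deep = TRUE)
  sim_clone$value <- v
  sim_clone
})

# Original first, then the clones
toy_ensemble <- c(list(base_sim), clones)
length(toy_ensemble)
toy_ensemble[[2]]$value
base_sim$value
```

Cloning matters here because `R6` objects have reference semantics: without `$clone()`, every list entry would point to the same object and each overwrite would affect all of them.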

Note that for our example, I'm altering the first object in `ogs6_obj$parameters` because so far, one `prj_parameter` is the only thing we have added to our simulation. Don't let the `parameters` argument of the `OGS6_Ensemble` constructor confuse you though - you can define all kinds of parameters here and aren't limited to variables of `prj_parameter`. If we had defined a `prj_process` already, we could have passed the variable `ogs6_obj$processes[[1]]$reference_temperature` along with the value vector `c(20, 30, 40)` to the `OGS6_Ensemble` constructor.

We can check if the initialization of our ensemble object worked like this:

```{r}
# Inspect the parameter value of the second simulation object in the ensemble
ogs6_ens$ensemble[[2]]$parameters[[1]]$value
```

`ensemble` returns the list of all `OGS6` objects we want to run the simulation on. Since we provided the vector `c(2, 3, 4)` when initializing the ensemble object, `ensemble` will have a length of four (since it contains the original `OGS6` object plus three almost-identical clones). So when we reference the second `OGS6` object in the list and inspect the `value` variable of its first `parameter` object, the return value is `2` because that's the value we defined for this parameter in our vector. 

Note that the class variables of `OGS6_Ensemble` objects are read-only, so be sure to define all parameters during initialization.
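
In `R6`, read-only variables are commonly built with active bindings. Whether `OGS6_Ensemble` does it exactly this way is an assumption on my part. The sketch below uses a hypothetical class `ToyReadOnly` to show what happens when you try to assign to such a variable.

```r
library(R6)

# Hypothetical class with a read-only variable, not part of r2ogs6
ToyReadOnly <- R6Class("ToyReadOnly",
  public = list(
    initialize = function(ensemble) {
      private$.ensemble <- ensemble
    }
  ),
  active = list(
    # Called without `value` on read access and with `value` on assignment
    ensemble = function(value) {
      if (missing(value)) {
        private$.ensemble
      } else {
        stop("`ensemble` is read-only", call. = FALSE)
      }
    }
  ),
  private = list(
    .ensemble = NULL
  )
)

ro_obj <- ToyReadOnly$new(ensemble = list("a", "b"))
length(ro_obj$ensemble)

# The assignment fails; tryCatch() captures the error message instead of stopping
msg <- tryCatch(
  {
    ro_obj$ensemble <- list()
    "assignment succeeded"
  },
  error = function(e) conditionMessage(e)
)
msg
```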

To start an ensemble run, we call `ogs6_ens$ogs6_run_simulation(parallel = FALSE)`. This calls `ogs6_run_simulation()` on each of the simulation objects in `ogs6_ens$ensemble`. Depending on the size of our ensemble and the available system resources, it might make sense to set the `parallel` parameter to `TRUE`. 

NOTE: Parallelization depends on the OS: a fork cluster is used on UNIX-like systems, while a socket cluster is used on Windows systems. Parallelization hasn't been tested on Windows yet.
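
To make the difference between the two cluster types more concrete, here is a small sketch using base R's `parallel` package. It is not taken from the `r2ogs6` source code. The workload is a placeholder function standing in for one simulation per ensemble member, and the worker count is an arbitrary choice.

```r
library(parallel)

n_workers <- 2
values <- c(2, 3, 4)

# Fork cluster on UNIX-like systems, socket cluster otherwise (e.g. Windows)
cl <- if (.Platform$OS.type == "unix") {
  makeForkCluster(n_workers)
} else {
  makePSOCKcluster(n_workers)
}

# Placeholder workload standing in for one simulation per ensemble member
results <- parLapply(cl, values, function(v) v * 10)

# Always release the workers when done
stopCluster(cl)

unlist(results)
```

Forked workers share the parent session's state, whereas socket workers start as fresh R sessions, so on Windows any required packages and objects have to be made available to the workers explicitly.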


### Chain runs

Chaining simulations works much like creating ensembles. The main difference is how we define the relevant parameters. As with an ensemble, we pass our base simulation object to a special class object, only this time the class is called `OGS6_Chain`. And while we define parameters along with their values for `OGS6_Ensemble` objects, the `parameter` argument of `OGS6_Chain` only refers to the parameter definitions, not their values (which will be calculated along the chain).

...

To start a chain run, we call `ogs6_chain$ogs6_run_simulation()`. This calls `ogs6_run_simulation()` on the base object and then reads in the information required to start the next simulation from the output files produced by OpenGeoSys 6 (based on the parameters we defined). Since chain runs can't be parallelized, this might take a while.
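
The reason chain runs can't be parallelized is easiest to see in a toy loop. The sketch below is purely conceptual: `run_step()` is a hypothetical stand-in for "run one simulation and read the value for the next one from its output", and the doubling is an arbitrary placeholder calculation.

```r
# Hypothetical stand-in for one simulation run: takes an input value and
# returns the value that the next simulation in the chain should start from
run_step <- function(input_value) {
  input_value * 2
}

n_steps <- 3
current_value <- 1
history <- numeric(n_steps)

# Each step depends on the previous result, so the loop must run sequentially
for (i in seq_len(n_steps)) {
  current_value <- run_step(current_value)
  history[i] <- current_value
}

history
```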