Your R Custom Science application can be created in multiple ways, as described below. There are no known limitations to the architecture of your R code. We recommend using our library, which provides useful functions for working with our environment. Please note:
Always write output CSV files with the row.names = FALSE option (otherwise KBC cannot read the file because it contains an unnamed column).
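For instance, a minimal sketch (myData and result.csv are illustrative names):

```r
# write an output table; row.names = FALSE prevents the unnamed
# row-name column that KBC cannot read
write.csv(myData, file = '/data/out/tables/result.csv', row.names = FALSE)
```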
To install a package, use
install.packages('packageName'). It is not necessary to specify the repository. If you wish to install a package from source, use
devtools::install_github() (and friends). The R version is the same as for R transformations.
Here is our current
list of pre-installed packages.
You can load them with the
library() command. If you know of another useful standard package to pre-install,
we would like to hear about it.
The KBC R extension package provides functions to:

- read and parse the configuration file and parameters,
- list input files and tables and work with their manifests,
- list expected outputs.
The library is a standard R package that is available by default in the production environment.
It is available on Github, so it can be installed locally with
devtools::install_github('keboola/r-docker-application', ref = 'master').
To use the library to read the user-supplied configuration parameter ‘myParameter’:
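A minimal sketch of such code (the package name keboola.r.docker.application and the getParameters() accessor follow the sample code in the library repository; treat them as assumptions):

```r
library('keboola.r.docker.application')

# initialize the application; the constructor takes the data directory path
app <- DockerApplication$new('/data/')

# read and parse the configuration file
app$readConfig()

# access the value of the 'myParameter' parameter
app$getParameters()$myParameter
```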
The library contains a single RC class
DockerApplication; a parameter of the constructor is the path to the data directory.
Call readConfig() to actually read and parse the configuration file. The above code reads the
myParameter parameter from the user-supplied configuration.
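Such a configuration might look like this (an illustrative value):

```json
{
    "myParameter": "myValue"
}
```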
You can obtain inline help and the list of library functions by running the ?DockerApplication command.
In the Quick start tutorial, we have shown applications whose input/output table names were hard-coded. This example shows how to read the input and output mapping specified by the end user, which is accessible in the configuration file. It demonstrates how to read and write tables and table manifests. File manifests are handled the same way. For a full, authoritative list of items returned in the table list and manifest contents, see the specification.
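A sketch of such a script, assuming the library exposes the table-listing helpers getInputTables() and getExpectedOutputTables() returning the mapping fields as data frame columns:

```r
library('keboola.r.docker.application')

app <- DockerApplication$new('/data/')
app$readConfig()

# tables in the input and output mapping (assumed accessors)
inputTables <- app$getInputTables()
outputTables <- app$getExpectedOutputTables()

for (i in 1:nrow(inputTables)) {
    # the input mapping 'destination' is the CSV file our script reads
    data <- read.csv(file.path('/data/in/tables', inputTables[i, 'destination']))

    # ... process the data ...

    # the output mapping 'source' is the CSV file our script must write
    write.csv(data,
              file = file.path('/data/out/tables', outputTables[i, 'source']),
              row.names = FALSE)
}
```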
Note that the destination label in the script refers to the destination from the mapper's perspective. The input mapper takes source tables from the user's Storage and produces destination tables that become the input of the extension. The output tables of the extension are consumed by the output mapper, whose destination is the resulting table in Storage.
The above code is located in a sample repository, so you can use it with the runtime settings. Supply any number of input tables.
To test the code, set an arbitrary number of input/output mapping tables, but keep the number of inputs and outputs the same. The names of the CSV files are arbitrary.
In the simplest case, you can use the code from an R transformation to create a simple R script. It must be named main.R and placed in the root of your repository.
To see a sample R script, go to our repository.
Although this approach is the simplest and quickest to set up, it offers limited options for testing and is generally suitable only for
one-liners (i.e., an existing library does all the work and you only need to execute it).
In the example below, we supply the value /data/ to the constructor as the data directory, as that will always be true in our production environment.
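A minimal sketch of such a script (the table and column names are illustrative):

```r
library('keboola.r.docker.application')

# /data/ is always the data directory in the production environment
app <- DockerApplication$new('/data/')
app$readConfig()

# illustrative logic: read an input table, add a column, write the result
data <- read.csv('/data/in/tables/source.csv')
data$doubled <- data$number * 2
write.csv(data, file = '/data/out/tables/result.csv', row.names = FALSE)
```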
This example shows how an R package can be made to interact with our environment; the code is available in a git repository. We strongly recommend this approach over the previous simple example.
Wrapping the application logic into an R package makes testing and portability much easier.
The application entry point is
main.R in the package root folder.
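A sketch of what such a main.R might contain (myPackage and doSomething() are the names used in the sample repository):

```r
library('devtools')

# the application repository is cloned into /home/ in production,
# so install the package from there
install('/home/')
library('myPackage')

# run the application logic
doSomething()
```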
This installs the package from the /home/ directory, where the application repository is cloned in production. The package is defined by the DESCRIPTION file and provides the doSomething() function. The package name is arbitrary, but it must match the name defined in the DESCRIPTION file.
The availability of the
doSomething() function is determined by the contents of the
NAMESPACE file. The
NAMESPACE file is generated
automatically by Roxygen when you Check the
package in RStudio.
With this approach, you can organize your code and name your functions as you please. In the sample repository, the
actual code is contained in the
doSomething() function in
the R/myPackage.R file. The code
itself is identical to the previous example.
Test the sample code with this runtime setting:
Tests are organized in the /tests/ directory, which contains:

- data/, which contains pregenerated sample data from the sandbox.
- config.R, a file which can be used to set the environment for running the tests; it can be created by copying config_template.R.
- test_that/, which contains the actual testthat tests (a sketch of one follows below).
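For illustration, a hypothetical test (the result.csv output file is an assumption about the sample code):

```r
library('testthat')
library('myPackage')

test_that('doSomething produces an output table', {
    doSomething()
    # KBC_DATADIR is set in config.R or by the CI environment
    outFile <- file.path(Sys.getenv('KBC_DATADIR'), 'out', 'tables', 'result.csv')
    expect_true(file.exists(outFile))
})
```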
You can run the tests locally from RStudio,
or you can set them to run automatically using the Travis continuous integration server every time you push into your git repository. For that, use the provided .travis.yml file. See below for more information about continuous integration.
For a more thorough tutorial on developing R packages, see the R packages book.
This example defines a subclass of the
DockerApplication RC class from the KBC R package.
RC classes are a type of class in R. This approach is fully comparable with the
previous package example; there are no major differences or (dis)advantages. The repository, again, has
to have the file
main.R in its root. The difference is that we create the RC class
CustomApplicationExample and call its methods.
The name of the class
CustomApplicationExample is completely arbitrary and is defined in
R/myApp.R. The application
code itself is formally different, as all the methods live in the class; the application logic runs within the body of a class method rather than directly in the script.
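A sketch of both pieces (the run() method name is illustrative; the class and file names come from the sample repository):

```r
library('methods')
library('keboola.r.docker.application')

# R/myApp.R -- define the subclass of DockerApplication
CustomApplicationExample <- setRefClass(
    'CustomApplicationExample',
    contains = c('DockerApplication'),
    methods = list(
        run = function() {
            readConfig()
            # ... the application logic, as in the previous examples ...
        }
    )
)

# main.R -- create an instance and execute it
app <- CustomApplicationExample$new('/data/')
app$run()
```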
Test the sample code with this runtime setting:
When using the Package or Subclass approach, you can use standard R testing methods.
We like the
testthat package. Since it is important to run tests automatically,
set them up to run every time you push a commit into your repository.
Travis offers an easy way of setting up continuous integration with Github repositories. To set up the integration, create a
.travis.yml file in the root of your repository, and
then link the repository to Travis.
Travis offers native R support. You only need to add the KBC package (if you are using it) and set the data directory using the
KBC_DATADIR environment variable, which will be automatically picked up by the KBC package:
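A sketch of such a .travis.yml, using Travis's native R support:

```yaml
language: r
# install the KBC package from GitHub before running the tests
r_github_packages:
  - keboola/r-docker-application
# the KBC package picks the data directory up from this variable
env:
  - KBC_DATADIR=./tests/data/
```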
The above option is easy to set up, but it has its disadvantages. To fix them, take advantage of the fact that we run your application code in a Docker container. By using Docker Compose, you can set up the testing environment in exactly the same way as the production environment. Take a look at the sample repository described below.
To run your tests in our Docker container, create a
docker-compose.yml file in the root of your repository:
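A minimal sketch of that file, matching the options described below:

```yaml
version: "2"
services:
  tests:
    image: quay.io/keboola/docker-custom-r:1.0.4
    volumes:
      - ".:/src"
    command: /src/tests.sh
```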
- The image option defines what Docker image is used for running the tests: quay.io/keboola/docker-custom-r:1.0.4 refers to the image we use to run Custom Science extensions on our production servers. The 1.0.4 part is an image tag, which changes from time to time; you should generally use the highest version.
- The volumes option maps the current directory to the /src/ directory inside the image.
- The command option defines the command for running the tests. It is run inside the Docker image, so you do not need to have a shell available on your own machine.
This leads us to the tests.sh file, which should also be created in the root of your repository:
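A sketch of such a script, assuming the package approach:

```sh
#!/bin/sh
set -e

# work from the repository root mapped into the image
cd /src/

# build the package and run R CMD check, which also executes the tests
R CMD build .
R CMD check *.tar.gz
```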
The above simple shell script will first build your package using R CMD build and then check it (i.e., run the tests) with R CMD check. This assumes you are using the package approach; if you are using another approach, modify these commands to run your tests.
Don’t forget that the /src/ directory maps to the root directory of your repository (we have defined this in docker-compose.yml).
To run the tests in the Docker container, have Docker installed on your machine, and execute the following command line (in the root of your repository):
docker-compose run --rm -e KBC_DATADIR=/src/tests/data/ tests
Docker Compose will process the docker-compose.yml file and execute the tests service defined in it.
This service will take our docker-custom-r image, map the current directory into the /src/ directory inside the image, and then execute the shell script /src/tests.sh inside that image; /src/tests.sh refers to the tests.sh script in the root of your repository.
This will build and check the R package. The option
-e KBC_DATADIR=/src/tests/data/ sets the environment variable
KBC_DATADIR to the data directory,
so that it refers to the
tests/data/ directory in the root of your repository.
To run the tests in a Docker container automatically, set them up, again, to run on every push to your git repository. You are now not limited to CI services with R support; any CI service with Docker support will do.
You can also use Travis, as shown in the following example.
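A sketch of the corresponding .travis.yml (the Docker setup follows Travis's documented Docker support; the two script commands are the ones described below):

```yaml
sudo: required
services:
  - docker
script:
  - docker-compose build tests
  - docker-compose run --rm -e KBC_DATADIR=/src/tests/data/ tests
```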
Most of the configuration is related to setting up Docker; the only important part is the last two lines.
The docker-compose build tests command builds the Docker image; this step is skipped if you are not using your own Dockerfile. The docker-compose run --rm -e KBC_DATADIR=/src/tests/data/ tests command is the most important one: it actually runs docker-compose and, subsequently, all the tests. It is the same command you can use locally.
Also, create a .Rbuildignore file to avoid warnings about unrecognized files in the root of your package repository:
```
^.*\.Rproj$
^\.Rproj\.user$
^main\.R$
^\.travis\.yml$
^docker-compose\.yml$
^tests\.sh$
^\.git$
^\.gitignore$
```
All the above configuration is available in the sample repository.