Configuration files are one of the possible channels for exchanging data between components and Keboola Connection (KBC).
To create a sample configuration file (together with the data directory), use the Debug API call via the Docker Runner API. You will get a zip archive containing all the resources you need in your component.
All configuration files are always stored in
Each configuration file has the following root nodes:
storage: Contains both the input and output mapping for both files and tables. This section is important if your component uses a dynamic input/output mapping. Simple components can be created with a static input/output mapping. They do not use this configuration section at all (see Tutorial).
parameters: Contains arbitrary parameters passed from the UI to the component. This section can be used in any way you wish. Your component should validate the contents of this section. For passing sensitive data, use encryption. This section is not available in Transformations.
image_parameters: Configured in the component settings. Contains arbitrary parameters passed to the component. They cannot be modified by the end-user. The typical use for this section is global component parameters (such as a token, URL, or version of your API).
authorization: Contains OAuth2 authorization contents.
action: Name of the action to execute; defaults to run. All actions except run have a strict execution time limit of 30 seconds. See actions for more details.
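Put together, the configuration file is a single JSON object with these root nodes; the values below are placeholders only:

```json
{
    "storage": {
        "input": {},
        "output": {}
    },
    "parameters": {},
    "image_parameters": {},
    "authorization": {},
    "action": "run"
}
```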
Your application should implement validation of the parameters section, which is passed without modification from the UI. Your application might also implement validation of the storage section if you have specific requirements on the input mapping or output mapping setting (e.g. a certain number of tables, certain names). If you choose to do any validation beyond the parameters section, it must always be forward compatible, i.e. benevolent. While we maintain backward compatibility very carefully, it is possible for new keys to appear in the configuration structure as we introduce new features.
The state file is used to store the component state for the next run. It provides two-way communication between the KBC configuration state storage and the component. The state file only works if the API call references a stored configuration (config is used, not an inline configuration).
The location of the state file is:
/data/in/state.json: loaded from a configuration state storage
/data/out/state.json: saved to a configuration state storage
The component reads the input state file and writes any content (valid JSON) to the output state file; that content will be available to the next API call. A missing or empty file will remove the state value. A state object is saved to configuration storage only when actually running the app (not in sandbox API calls). The state must be a valid JSON file.
Because the state is stored as part of a Component configuration, the size of the state object is somewhat limited (it should generally not exceed 1MB). It should not be used to store large amounts of data. Also, the end-user cannot easily access the data through the UI. The data can, however, be modified outside of the component itself using the Component configuration API calls.
Important: The state file is not thread-safe. If multiple instances of the same configuration are run simultaneously in the same project, the one writing data later wins. Use the state file more as an HTTP Cookie than as a Database. A typical use for the state file would be saving the last record loaded from some API to enable incremental loads.
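For example, a component that loads data from an API incrementally might write something like this to /data/out/state.json (the key names are entirely up to the component and are only illustrative here):

```json
{
    "lastImportedId": 12345,
    "lastImportedAt": "2016-07-04T11:00:00+00:00"
}
```

On the next run of the same configuration, this object will be available in /data/in/state.json.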
Unlike the state file, the usage file is one way only and has a pre-defined structure. The usage file is used to pass information from the component to Keboola Connection. The stored metrics are used to determine how many resources the job consumed and to translate the usage into KBC credits; this is very useful when you need to charge your customers for using your component or service.
The usage file is located at /data/out/usage.json. It should contain an array of objects keeping information about the consumed resources. Each object has to contain only two keys: the name of the metric and its value, as in the example below:
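A minimal sketch of such a file, assuming the two keys are named metric and value (the metric names themselves are illustrative):

```json
[
    {
        "metric": "API calls",
        "value": 150
    },
    {
        "metric": "Rows processed",
        "value": 20000
    }
]
```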
This structure is processed and stored within a job, so it can be analyzed, processed and aggregated later.
To keep track of the consumed resources in the case of a component failure, it is recommended to write the usage file regularly during the component run, not only at its end.
Note: As the structure of the usage file is pre-defined, its content is strictly validated and a wrong format will cause a component failure.
To create an example configuration, use the Debug API call. You will get a stage_0.zip archive in File uploads in your Storage, which will contain the configuration file together with the prepared data directory. You can also use this configuration structure to create an API request for actually running a component.
If you want to manually pass configuration options in the API request, be sure to wrap them in the appropriate attribute of the request body (see the request sketch further below).
A sample configuration file might look like this:
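The following is only a sketch; the table, file, and parameter names are illustrative:

```json
{
    "storage": {
        "input": {
            "tables": [
                {
                    "source": "in.c-main.source",
                    "destination": "source.csv"
                }
            ]
        },
        "output": {
            "tables": [
                {
                    "source": "result.csv",
                    "destination": "out.c-main.result"
                }
            ]
        }
    },
    "parameters": {
        "period": "day"
    }
}
```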
Tables from the input mapping are mounted to /data/in/tables/. Input mapping parameters are similar to the Storage API export table options. If destination is not set, the CSV file will have the same name as the table (without adding a .csv suffix).
The tables element in a configuration of the input mapping is an array and supports the following attributes:
days (internally converted to changed_since)
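A sketch of one such input table mapping; the destination file name is illustrative:

```json
{
    "source": "in.c-storage.StoredData",
    "destination": "StoredData.csv",
    "days": 2
}
```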
The output mapping parameters are similar to the Transformation API output mapping. destination is the only required parameter. If source is not set, the CSV file is expected to have the same name as the destination table.
The tables element in a configuration of the output mapping is an array and supports the following attributes:
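A sketch of one such output table mapping; the destination table, column names, and delete filter are illustrative, and only a subset of attributes is shown:

```json
{
    "source": "data.csv",
    "destination": "out.c-main.StoredData",
    "incremental": true,
    "primary_key": ["id", "date"],
    "columns": ["id", "date", "value"],
    "delete_where_column": "date",
    "delete_where_values": ["2016-07-04"],
    "delete_where_operator": "eq"
}
```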
In an API request, this would be passed as:
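A sketch of such a request body, assuming the configuration is wrapped in a configData attribute of the run call (the attribute name and the exact values are assumptions for illustration):

```json
{
    "configData": {
        "storage": {
            "input": {
                "tables": [
                    {
                        "source": "in.c-storage.StoredData",
                        "days": 2
                    }
                ]
            },
            "output": {
                "tables": [
                    {
                        "source": "data.csv",
                        "destination": "out.c-main.StoredData",
                        "incremental": true,
                        "primary_key": ["id", "date"],
                        "columns": ["id", "date", "value"],
                        "delete_where_column": "date",
                        "delete_where_values": ["2016-07-04"],
                        "delete_where_operator": "eq"
                    }
                ]
            }
        }
    }
}
```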
The above request will:
Download 2 days of data from the in.c-storage.StoredData table to a CSV file in /data/in/tables/ (named after the table, since no destination is set).
Upload the CSV file /data/out/tables/data.csv, which does not have headers on the first line, to the destination table with a compound primary key set on the listed columns.
Delete data from the destination table before uploading the CSV file (this only makes sense together with an incremental load).
Another way of downloading files from file uploads is to use an Elasticsearch query or filtering with tags. Note that the results of a file mapping are limited to 10 files (to prevent accidental downloads). If you need more files, use multiple file mappings.
All files matching the search will be downloaded to the /data/in/files folder. The name of each file has the fileId_fileName format. Each file will also be accompanied by a manifest with all information about the file.
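A sketch of a file input mapping using both filters; the tag and the query are illustrative, with the query written as an Elasticsearch query string:

```json
{
    "storage": {
        "input": {
            "files": [
                {
                    "tags": ["docker-demo"],
                    "query": "name:*.zip"
                }
            ]
        }
    }
}
```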
This will download files whose names match .zip and which have the docker-demo tag. Depending on the contents of your File uploads in Storage, this may produce something like:
/data/in/files/75807542_fooBar.zip
/data/in/files/75807542_fooBar.zip.manifest
/data/in/files/75807657_fooBarBaz.zip
/data/in/files/75807657_fooBarBaz.zip.manifest
Use the filter_by_run_id option to select only files which are related to the job currently being executed. If filter_by_run_id is specified, we will download only files which satisfy the filter (either tags or query) and were uploaded by a parent job (a job with the same runId or parent runId). This allows you to further limit the downloaded files to only those related to the current chain of jobs.
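A sketch of a file input mapping using this option (the tag is illustrative):

```json
{
    "tags": ["fooBar"],
    "filter_by_run_id": true
}
```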
This will download only files with the fooBar tag that were produced by a parent job of the currently running container.
Define additional properties for uploaded files in the output mapping configuration.
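A sketch of such an output file mapping; the file name and tags are illustrative, and is_permanent stands in for any such additional property:

```json
{
    "storage": {
        "output": {
            "files": [
                {
                    "source": "report.zip",
                    "tags": ["docker-demo", "processed"],
                    "is_permanent": true
                }
            ]
        }
    }
}
```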
If a file defined in the output mapping is not present in the /data/out/files folder, an error will be thrown.
Docker containers may be used to process unknown files incrementally. This means that when a container is run,
it will download any files not yet downloaded, and process them. To achieve this behavior, it is necessary
to select only the files which have not been processed yet and tag the processed files.
To achieve the former, use a proper query. The latter is achieved using the processed_tags setting. The processed_tags setting is an array of tags which will be added to the input files once they are downloaded. A sample of such a file input mapping is shown below.
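This sketch assumes the two tags are combined with an Elasticsearch query string:

```json
{
    "query": "tags:toprocess AND NOT tags:downloaded",
    "processed_tags": ["downloaded"]
}
```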
The above request will download every file with the
toprocess tag except for the files having the
downloaded tag. It will mark each such file with the
downloaded tag; therefore the query will exclude them on the next run.
This allows you to set up an incremental file processing pipeline.