Processors are additional components that can run before or after an arbitrary component
(extractor, writer, …). When Docker Runner runs a Docker image, a processor
can pre-process the inputs (files or tables) supplied to that image, or post-process
the image outputs. For example, if an extractor produces CSV data in a non-UTF-8 encoding, you can use the
iconv processor as a post-processor to
convert the CSV to UTF-8, as expected by Storage.
Processors are technically supported in any configuration. However, the option may not always be available in the UI. To manually configure processors, you have to use the Component Configuration API. By running the Get Configuration Detail request for a specific component ID and configuration ID, you obtain the actual configuration contents, for example:
From this, the actual configuration is the contents of the
configuration node. Therefore:
Processors are configured in the
processors section in the
before array or the
after array (rarely both).
The above configuration defines that a
keboola.processor-headers processor (the headers processor fills missing
columns in a CSV file) will run after this particular configuration of an FTP extractor is finished,
but before its results are loaded into Storage. After the processor is finished, its outputs are loaded
into Storage as if they were the outputs of the extractor itself.
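As a rough illustration of the structure described here, the following Python sketch builds a configuration with a single post-processor and prints it as JSON. Only the `processors`, `before`, and `after` names and the `keboola.processor-headers` ID come from the text; the `definition`/`component` nesting of a processor entry and the empty `parameters` node are assumptions, and `processor_ids` is a hypothetical helper.

```python
import json

# Hypothetical extractor configuration. The "parameters" node is a placeholder,
# and the "definition"/"component" layout of a processor entry is an assumption.
configuration = {
    "parameters": {},
    "processors": {
        "before": [],
        "after": [
            {"definition": {"component": "keboola.processor-headers"}},
        ],
    },
}


def processor_ids(config, stage):
    """Return the component IDs configured for the 'before' or 'after' stage."""
    entries = config.get("processors", {}).get(stage, [])
    return [entry["definition"]["component"] for entry in entries]


print(json.dumps(configuration, indent=2))
print(processor_ids(configuration, "after"))
```

The helper makes it easy to check which processors a configuration will run at each stage before the configuration is saved.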
To save the configuration, you need to use the Update Configuration API call.
It is also advisable to minify the JSON to avoid whitespace issues.
Also note that if the configuration contains a literal
+, it has to be URL-encoded as %2B.
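The minification and URL-encoding steps can be sketched in Python with the standard library alone. `encode_configuration` is a hypothetical helper; how the resulting string is attached to the Update Configuration request is not shown here and depends on the API.

```python
import json
from urllib.parse import quote


def encode_configuration(configuration):
    """Minify a configuration to JSON and percent-encode it for a request body."""
    # Compact separators remove all optional whitespace from the JSON output.
    minified = json.dumps(configuration, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, so "+" becomes "%2B".
    return quote(minified, safe="")


print(encode_configuration({"a": "1+1"}))
```

Using `quote` with `safe=""` rather than `quote_plus` keeps the behaviour explicit: spaces become `%20` and a literal `+` always becomes `%2B`.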
You can obtain a list of available processors using the
Developer Portal UI or using the List apps public API
of the Developer Portal. By sending a
GET request to
https://apps-api.keboola.com/apps, you’ll obtain a list of all
public KBC components. Processors are components with the type
processor, for example:
The important parts are
id, which is required for configuration, and
documentationUrl, which links to the documentation describing
any additional parameters of the processor.
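A minimal Python sketch of picking processors out of the apps list might look as follows. It assumes the response body is a JSON array of component objects carrying the `type`, `id`, and `documentationUrl` fields mentioned above; any paging of the real API is ignored, and both function names are hypothetical.

```python
import json
from urllib.request import urlopen

APPS_URL = "https://apps-api.keboola.com/apps"


def select_processors(apps):
    """Keep only components of type 'processor', with the two fields of interest."""
    return [
        {"id": app["id"], "documentationUrl": app.get("documentationUrl")}
        for app in apps
        if app.get("type") == "processor"
    ]


def fetch_processors(url=APPS_URL):
    """Download the public component list and filter it.

    Assumption: the response is a single JSON array; paging is not handled.
    """
    with urlopen(url) as response:
        return select_processors(json.load(response))
```

Keeping the filtering separate from the download means `select_processors` can be tried on a saved response without any network access.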
A processor may allow (or require) parameters. These are entered in the parameters section.
The below configuration sets the value for two parameters —
The names and allowed values of parameters are entirely up to the processor, which is responsible for interpreting and validating them.
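To show where parameters might sit, here is a small Python sketch that builds a processor entry with an optional `parameters` node. The component ID `example.processor` and the parameter names `first_option` and `second_option` are purely hypothetical, as is the `definition`/`component` layout of the entry.

```python
def processor_entry(component_id, parameters=None):
    """Build one entry for the 'before' or 'after' array.

    Assumption: an entry holds the component ID under "definition" and
    its settings under "parameters".
    """
    entry = {"definition": {"component": component_id}}
    if parameters:
        # Copy so later changes to the caller's dict do not leak into the entry.
        entry["parameters"] = dict(parameters)
    return entry


entry = processor_entry(
    "example.processor",
    {"first_option": ",", "second_option": '"'},
)
print(entry)
```

Because the processor itself validates the parameters, the helper deliberately performs no checks on their names or values.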
Implementing a processor is in principle the same as implementing any other Docker extension. However, processors are designed to be Single Responsibility components. This means, for example, that processors should require little or no configuration, should not communicate over a network, and should be fast. To keep the implementation of processors as simple as possible, simple scalar parameters can be injected into environment variables. For instance, the parameters:
will be available in the processor as the environment variables
KBC_PARAMETER_ENCLOSURE. This simplifies the implementation because the processor does not need to parse the
configuration file. Parameter
injection works only if the values of the parameters are scalar. If you need non-scalar values, you have to pass them through the config file (and disable the
injectEnvironment component setting).
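Inside a processor, reading the injected values could look like the following Python sketch. The `KBC_PARAMETER_` prefix is taken from the variable name above; lower-casing the remainder to recover the parameter name is an assumption, and `injected_parameters` is a hypothetical helper.

```python
import os

PREFIX = "KBC_PARAMETER_"


def injected_parameters(environ=None):
    """Collect parameters injected as KBC_PARAMETER_* environment variables.

    Values are always strings, because that is all an environment can carry.
    Assumption: the parameter name is the lower-cased rest of the variable name.
    """
    environ = os.environ if environ is None else environ
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


parameters = injected_parameters({"KBC_PARAMETER_ENCLOSURE": '"', "PATH": "/usr/bin"})
print(parameters)
```

Passing the environment as an argument keeps the function easy to try locally; called without arguments, it reads the real process environment.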
The process of processor registration is the same as the registration of any other component. However, many of the fields do not apply. The following fields are important: