Processors are additional components that may be used before or after running an arbitrary component
(extractor, writer, …). When Docker Runner runs a Docker image, a processor
may be used to pre-process the inputs (files or tables) supplied to that image, or to post-process
the image outputs. For example, if an extractor extracts CSV data in a non-UTF-8 encoding, you can use the
iconv processor as a post-processor to
convert the CSV to UTF-8 as expected by Storage.
Processors are technically supported in any configuration. However, the option may not always be available in
the UI. To manually configure processors, you have to use the Component Configuration API. By running the
Get Configuration Detail
request for a specific component ID and configuration ID, you obtain the actual configuration contents, for example:
From this, the actual configuration is the contents of the configuration node. Therefore:
Processors are configured in the processors section in the before array or the after array (rarely both).
The above configuration defines that a keboola.processor-headers processor (the headers processor fills missing
columns in a CSV file) will run after this particular configuration of an FTP extractor is finished,
but before its results are loaded into Storage. After the processor is finished, its outputs are loaded
into Storage as if they were the outputs of the extractor itself.
To obtain a list of available processors, use the List apps public API
of the Developer portal. By sending a GET request to https://apps-api.keboola.com/apps, you’ll obtain a list of all
public KBC components. Processors are components with the type processor, for example:
The important parts are id, which is required for configuration, and documentationUrl, which links to the
documentation describing additional parameters of the processor.
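Filtering the component list for processors might look like the sketch below. It assumes the response body is a JSON array of objects carrying the id, type, and documentationUrl fields mentioned above; pagination and error handling are not covered, and the sample records (including the example.com URLs and the `example.ex-ftp` ID) are made up for illustration.

```python
import json
from urllib.request import urlopen

APPS_URL = "https://apps-api.keboola.com/apps"


def fetch_apps(url=APPS_URL):
    """Send a GET request and decode the JSON body (assumed to be a list of apps)."""
    with urlopen(url) as response:
        return json.load(response)


def select_processors(apps):
    """Keep only components of type 'processor', reduced to the fields needed later."""
    return [
        {"id": app["id"], "documentationUrl": app.get("documentationUrl")}
        for app in apps
        if app.get("type") == "processor"
    ]


# Made-up sample records standing in for a real response.
sample_apps = [
    {"id": "example.ex-ftp", "type": "extractor"},
    {
        "id": "keboola.processor-headers",
        "type": "processor",
        "documentationUrl": "https://example.com/headers-docs",
    },
]

for processor in select_processors(sample_apps):
    print(processor["id"], processor["documentationUrl"])
```

To run it against the live API, replace `sample_apps` with the result of `fetch_apps()`.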
A processor may allow (or require) parameters. These are entered in the parameters section.
The configuration below sets values for two parameters, delimiter and enclosure:
The names and allowed values of parameters are entirely up to the processor, which interprets and validates them.
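A processor entry with a parameters section could be sketched like this. The component ID is borrowed from the earlier text purely for illustration (whether that processor accepts these parameters is not confirmed here), the parameter values are arbitrary, and the `definition` key is an assumed part of the entry shape.

```python
import json

# Assumed shape of one processor entry with its own parameters section.
processor_entry = {
    "definition": {"component": "keboola.processor-headers"},
    "parameters": {
        "delimiter": "|",   # illustrative value
        "enclosure": "'",   # illustrative value
    },
}

# The entry is placed in the `after` array of the processors section.
configuration = {"processors": {"after": [processor_entry]}}

print(json.dumps(configuration, indent=4))
```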
Implementing a processor is in principle the same as implementing any other
Docker extension. However, processors are designed to be
Single Responsibility components. This
means, for example, that processors should require little or no configuration, should not communicate
over a network, and should be fast. To keep the implementation of processors as simple as possible,
simple scalar parameters can be injected into the environment variables. For instance, the parameters:
will be available in the processor as the environment variables KBC_PARAMETER_DELIMITER and
KBC_PARAMETER_ENCLOSURE. This simplifies the implementation in that it is not necessary to process the
configuration file. This parameter
injection works only if the values of the parameters are scalar. If you need non-scalar values, you have to pass them through the config file (and disable the injectEnvironment component setting).
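Inside a processor, reading the injected values can be as simple as the sketch below. The helper `read_parameter` is hypothetical, and the rule it applies (prefix `KBC_PARAMETER_` plus the upper-cased parameter name) is inferred from the two variable names given above, not from a full specification.

```python
import os


def read_parameter(name, default=None, environ=os.environ):
    """Look up an injected scalar parameter by its original (lower-case) name."""
    return environ.get("KBC_PARAMETER_" + name.upper(), default)


# Simulated environment for illustration; in a running processor, omit `environ`
# so the values are read from the real process environment.
fake_environ = {"KBC_PARAMETER_DELIMITER": "|", "KBC_PARAMETER_ENCLOSURE": "'"}

delimiter = read_parameter("delimiter", default=",", environ=fake_environ)
enclosure = read_parameter("enclosure", default='"', environ=fake_environ)
escape = read_parameter("escape", default="\\", environ=fake_environ)  # not set, falls back

print(delimiter, enclosure, escape)
```

Environment variables are always strings, so a processor that expects a number or a boolean has to convert the value itself.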