Generic Extractor is normally run from within the Keboola user interface. It can be found in the Extractors section and all you need to do is provide its configuration JSON. No other settings are necessary.
Because creating the configuration JSON can be a non-trivial task, several tools described below can help you develop it.
Debug mode can be turned on by setting "debug": true in the config section of the configuration.
In debug mode, the extractor displays all API requests it sends, helping you understand what is really happening, why something is skipped, etc.
Warning: If the API sends sensitive data (e.g. authorization token) in the URL, these may become visible in the events. Also, debug mode considerably slows the extraction. Therefore it should never be turned on in production configurations.
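Because debug mode should stay out of production configurations, a quick check before publishing a configuration can help. The following is a minimal shell sketch, not part of Generic Extractor: the function name has_debug_enabled is hypothetical, and the pattern only looks for a literal "debug": true pair on a single line, so it is a rough text match rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper (not part of Generic Extractor): succeeds when the
# given file contains a literal "debug": true pair on one line.
# This is a plain text match, not a JSON parser.
has_debug_enabled() {
    grep -Eq '"debug"[[:space:]]*:[[:space:]]*true' "$1"
}
```

For example, has_debug_enabled config.json && echo "debug mode is on" prints the message only when the flag is set.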
If you are working on a complicated configuration, or developing a new component based on Generic Extractor, running every configuration from the Keboola UI may be slow and tedious. You may run Generic Extractor locally, provided that you have access to Docker. The following is not necessary to run or configure Generic Extractor in Keboola.
Create an empty directory somewhere and in it create a config.json file with the configuration you want to execute.
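As an illustration, the following shell sketch creates a working directory and a small config.json in it. The directory name ge-local and the outputBucket value are arbitrary placeholders. The api.baseUrl and jobs values are assumptions chosen to match the GitHub request that appears in the sample log output in this section, and debug is switched on so that DEBUG lines are printed.

```shell
#!/bin/sh
# Create a scratch directory (the name ge-local is an arbitrary placeholder).
mkdir -p ge-local
cd ge-local || exit 1

# Write an illustrative configuration; the values are assumptions, adjust
# them to the API you actually want to extract from.
cat > config.json <<'EOF'
{
    "parameters": {
        "api": {
            "baseUrl": "https://api.github.com"
        },
        "config": {
            "debug": true,
            "outputBucket": "ge-local-test",
            "jobs": [
                {
                    "endpoint": "/orgs/keboola/members",
                    "dataType": "members"
                }
            ]
        }
    }
}
EOF
```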
Then run Generic Extractor in the current directory by executing the following command on *nix systems:
docker run -v "$(pwd)":/data quay.io/keboola/generic-extractor:latest
or on Windows:
docker run -v %cd%:/data quay.io/keboola/generic-extractor:latest
You should see:
DEBUG: Using NO Auth [] []
DEBUG: Using automatic conversion of single values to arrays where required. [] []
DEBUG: GET /orgs/keboola/members HTTP/1.1 Host: api.github.com User-Agent: Guzzle/5.3.1 curl/7.38.0 PHP/7.0.17 [] []
DEBUG: Analyzing members {"rowsAnalyzed":[],"rowsToAnalyze":7} []
DEBUG: Processing results for __kbc_default. [] []
INFO: Extractor finished successfully. [] []
along with the output tables created in the out/tables sub-directory of the current directory.
It is recommended to remove the contents of the out/tables directory before running the extractor again.
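Clearing the previous output can be scripted. The sketch below defines a hypothetical helper, clean_tables, that empties out/tables under a given working directory (the current directory by default) while keeping the directory itself. It assumes the files are writable by your user; files written from inside the container may be owned by another user on some systems.

```shell
#!/bin/sh
# Hypothetical helper: empty <dir>/out/tables but keep the directory.
# <dir> defaults to the current directory.
clean_tables() {
    tables_dir="${1:-.}/out/tables"
    # Nothing to do if the extractor has not produced any output yet.
    [ -d "$tables_dir" ] || return 0
    # Remove the directory with its contents, then recreate it empty.
    rm -rf "$tables_dir"
    mkdir -p "$tables_dir"
}
```

Calling clean_tables in the working directory right before the docker run command gives each run a clean start.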
Important: Generic Extractor itself is not able to decrypt encrypted values. That means that when you supply the configuration directly in the config.json file, you must always provide decrypted (plain-text) values.
When you store such a configuration in the Keboola UI, its sensitive values are encrypted automatically. The stored, encrypted configuration then cannot be run locally. Read more about encryption.
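Before running a configuration locally, it can be useful to check whether it still contains encrypted values, for example after copying it out of the UI. The sketch below is a heuristic that rests on the assumption that encrypted values are strings beginning with KBC:: (verify this against the encryption documentation for your stack). The function name has_encrypted_values is hypothetical, and the match is a plain text search.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the file contains a string value that
# starts with KBC:: (assumed here to be the marker of an encrypted value).
has_encrypted_values() {
    grep -q '"KBC::' "$1"
}
```

If has_encrypted_values config.json succeeds, replace the affected values with their plain-text originals before running the extractor locally.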
To build the container from source, run the following commands:
git clone https://github.com/keboola/generic-extractor.git
cd generic-extractor
docker compose build
docker compose run --rm extractor composer install
mkdir data
To run the built container, place config.json in the data folder and execute:
docker compose run --rm extractor
The output tables are created in the out/tables sub-directory of the data folder.
Before running the extractor again, it is recommended to clear the out directory by running:
docker compose run --rm extractor rm -rf data/out
All examples referenced in this documentation are actually runnable. Because it is difficult to find (and gain access to) a real API that fits each specific case, you can test these configurations against a mock server.
Each example contains a set of requests (*.request files) and responses (*.response files), and optionally their headers (*.requestHeaders and *.responseHeaders files).
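When preparing or inspecting an example, a small check that every request has a response can catch incomplete samples. The sketch below assumes that a request and its response share the same base name (for instance a hypothetical list.request and list.response); the function name check_example_pairs is also hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every *.request file in <dir> that has no
# *.response file with the same base name; return non-zero if any is found.
check_example_pairs() {
    dir="$1"
    missing=0
    for request in "$dir"/*.request; do
        # With no matching files the glob stays literal; skip that case.
        [ -e "$request" ] || continue
        response="${request%.request}.response"
        if [ ! -e "$response" ]; then
            echo "missing response for $request"
            missing=1
        fi
    done
    return "$missing"
}
```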
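To run the examples, clone the repository and switch to its doc directory:
git clone https://github.com/keboola/generic-extractor.git
cd generic-extractor/doc
Then run a single example, for instance 001-simple-job:
docker compose run -e "KBC_EXAMPLE_NAME=001-simple-job" extractor
The output tables are created in examples/001-simple-job/out/tables. To run all examples, execute:
./run-samples.sh
If you want to create your own example, follow the instructions in the mock server repository.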
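To run a hand-picked subset of examples rather than one or all of them, the single-example command can be wrapped in a loop. The sketch below is an assumption-laden convenience, not part of the repository: run_examples and the DRY_RUN switch are hypothetical names, and the function is meant to be called from the generic-extractor/doc directory.

```shell
#!/bin/sh
# Hypothetical helper: run the named examples one after another.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_examples() {
    for name in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "docker compose run -e KBC_EXAMPLE_NAME=$name extractor"
        else
            # Stop at the first example that fails.
            docker compose run -e "KBC_EXAMPLE_NAME=$name" extractor || return 1
        fi
    done
}
```

For example, DRY_RUN=1 run_examples 001-simple-job prints the command for that example without starting any container.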