We advise you to follow the guidelines for the Python transformation.
The built-in CSV functions in Python work well except when the data in the CSV file contains a null character. This is
usually fixed by
lazy_lines = (line.replace('\0', '') for line in in_file). The expression
is a generator which makes sure that
null characters are handled properly.
It is also important to use
encoding='utf-8' when reading and writing files.
Note that we open both the input and output files simultaneously; as soon as a row is processed, it is immediately written to the output file. This approach keeps only a single row of data in memory and is generally very efficient. It is recommended to implement the processing in this way because data files coming from KBC can be quite large (e.g., dozens of gigabytes).
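The points above can be combined into a minimal sketch: both files are open at once, rows are streamed one at a time, null characters are stripped lazily, and utf-8 encoding is explicit. The file names and the added column are illustrative only.

```python
import csv
import os

# Create a tiny sample input so the sketch runs end to end; in a real
# component the file would be placed by KBC into in/tables/.
os.makedirs('in/tables', exist_ok=True)
os.makedirs('out/tables', exist_ok=True)
with open('in/tables/source.csv', 'w', encoding='utf-8', newline='') as f:
    f.write('id,number\n1,3\n2,5\n')

# Both files are open at the same time; each row is written as soon as
# it is read, so only a single row is held in memory.
with open('in/tables/source.csv', encoding='utf-8') as in_file, \
        open('out/tables/destination.csv', 'w', encoding='utf-8', newline='') as out_file:
    # Generator that strips null characters before the csv module sees them.
    lazy_lines = (line.replace('\0', '') for line in in_file)
    reader = csv.DictReader(lazy_lines)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + ['doubled'])
    writer.writeheader()
    for row in reader:
        row['doubled'] = str(int(row['number']) * 2)  # example transformation
        writer.writerow(row)
```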
The KBC Python component package provides functions for working with the configuration.
Additionally, it defines KBC’s CSV dialect
to shorten the CSV manipulation code.
The library is a standard Python package that is available by default in the production environment.
It is also available on GitHub, so it can be installed locally with
pip3 install https://github.com/keboola/python-docker-application/zipball/master.
Generated documentation
is available for the package, and an actual working example can be found in our
Also note that the library does no special magic; it is just a means to simplify things a bit for you.
To read the user-supplied configuration parameter ‘myParameter’, use the following code:
The library contains a single class
Config; the optional parameter of the constructor is the path to the data directory.
If not provided, the
KBC_DATADIR environment variable will be used.
The above would read the
myParameter parameter from the user-supplied configuration:
The following piece of code shows how to read parameters:
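Since the code sample was not reproduced here, the following is a runnable sketch without the library: KBC places the configuration in config.json inside the data directory (pointed to by KBC_DATADIR), and the parameters live under its parameters key. With the KBC package you would instead instantiate its Config class and read the parameters from it; the sample value below is fabricated for illustration.

```python
import json
import os

# Locate the data directory the same way the library does: from the
# KBC_DATADIR environment variable, falling back to a local directory.
datadir = os.environ.get('KBC_DATADIR', 'data')
os.makedirs(datadir, exist_ok=True)

# Fabricated sample configuration, for illustration only.
with open(os.path.join(datadir, 'config.json'), 'w', encoding='utf-8') as f:
    json.dump({'parameters': {'myParameter': 'someValue'}}, f)

# Read the user-supplied 'myParameter' from the configuration.
with open(os.path.join(datadir, 'config.json'), encoding='utf-8') as f:
    config = json.load(f)

my_parameter = config['parameters']['myParameter']
print(my_parameter)  # someValue
```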
Note that we have also simplified reading and writing CSV files using the
dialect='kbc' option. The dialect is
registered automatically when the
Config class is initialized.
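If you are not using the library, an equivalent dialect can be registered manually. The exact settings below are an assumption mirroring KBC's CSV format (comma-delimited, double-quoted, '\n' line endings):

```python
import csv

# Register a dialect roughly equivalent to the one the Config class
# registers automatically (settings are an assumption).
csv.register_dialect('kbc', lineterminator='\n', delimiter=',', quotechar='"')

# Use it the same way as the library's dialect.
with open('demo.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, dialect='kbc')
    writer.writerow(['id', 'name'])
    writer.writerow(['1', 'first'])
```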
In the tutorial and the above examples, we show applications which have names of their input/output tables hard-coded. The following example shows how to read an input and output mapping specified by the end user, which is accessible in the configuration file. It demonstrates how to read and write tables and table manifests. File manifests are handled the same way. For a full authoritative list of items returned in table list and manifest contents, see the specification.
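A sketch of reading the input/output mapping and writing a table manifest follows. The layout of the storage section matches the KBC configuration format; the table names and the manifest contents are fabricated for illustration.

```python
import json
import os

datadir = os.environ.get('KBC_DATADIR', 'data')
os.makedirs(os.path.join(datadir, 'out', 'tables'), exist_ok=True)

# Fabricated sample of the 'storage' section of config.json.
config = {
    'storage': {
        'input': {'tables': [
            {'source': 'in.c-main.source', 'destination': 'source.csv'}
        ]},
        'output': {'tables': [
            {'source': 'result.csv', 'destination': 'out.c-main.result'}
        ]},
    }
}

# Input tables are available as files named by their 'destination',
# placed in in/tables/ of the data directory.
input_files = [t['destination'] for t in config['storage']['input']['tables']]

# For each output table, write a manifest next to the CSV file so that
# KBC knows which Storage table to load it into.
for table in config['storage']['output']['tables']:
    manifest_path = os.path.join(datadir, 'out', 'tables',
                                 table['source'] + '.manifest')
    with open(manifest_path, 'w', encoding='utf-8') as f:
        json.dump({'destination': table['destination']}, f)

print(input_files)  # ['source.csv']
```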
Note that the
destination label in the script refers to the destination from the mapper’s perspective.
The input mapper takes
source tables from the user’s Storage and produces
destination tables that become
the input of your component. The output tables of your component are consumed by the output mapper, whose
destination tables are the resulting tables in Storage.
In Python components, the output is buffered, but the buffering may be switched off. The easiest solution is to run your script with the
-u option: you would use
CMD python -u ./main.py in your Dockerfile.
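For illustration, the tail of such a Dockerfile might look like this (the base image and paths are assumptions):

```dockerfile
FROM python:3
COPY . /code/
WORKDIR /code/
# The -u option disables output buffering.
CMD python -u ./main.py
```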
See a dedicated article if you want to
implement a GELF logger.
The following piece of code is a good entry point:
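A sketch of such an entry point is shown below. The run function stands in for your component logic (my_component.run in the text) and deliberately fails with a user error so the handling is visible; the exit codes follow KBC's convention of 1 for user errors and 2 for application errors.

```python
import sys
import traceback


def run():
    # Placeholder for the component logic; fails with a user error here.
    raise ValueError('Invalid value in column "amount".')


def main():
    try:
        run()
    except ValueError as err:
        # Anything derived from ValueError is a user error: show the
        # message to the end user and exit with code 1.
        print(str(err), file=sys.stderr)
        return 1
    except Exception:
        # Everything else is an application error: full details go to the
        # log for developers; the user sees only a generic message.
        traceback.print_exc()
        print('An internal error occurred; please contact support.',
              file=sys.stderr)
        return 2
    return 0


# In main.py you would finish with sys.exit(main()).
exit_code = main()
```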
In this case, we consider everything derived from
ValueError to be an error which should be shown to the end user.
Every other error will lead to a generic message, and only developers will see the details.
If you maintain that any user error is a
ValueError, then whatever happens in
my_component.run will follow
the general error handling rules.
You can, of course, modify this logic to your liking.