A manifest file contains additional information about tables and files injected into the /data/in folders.
It also provides a way to specify options for tables and files transferred back to Storage from the /data/out folders.
A manifest file is named after the original file, with the .manifest suffix appended.
All files in /data/in have a manifest file generated by us. For files generated by your code in /data/out, the manifest file is optional. Also, keep in mind that all manifests have a lower priority than input and output mapping.
The format of the manifest file is always JSON. The manifest
file always has the .manifest
extension. This applies to files with multiple extensions as well, so the following
filenames are expected:
Data File Name | Manifest File Name |
---|---|
myfile | myfile.manifest |
myfile.csv | myfile.csv.manifest |
myfile.csv.gz | myfile.csv.gz.manifest |
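
The naming rule in the table above can be sketched in a few lines of Python. The helper name `manifest_path` is hypothetical and used only for illustration; the one rule it encodes is that .manifest is appended to the complete file name, so multi-part extensions are preserved.

```python
from pathlib import Path


def manifest_path(data_file: str) -> Path:
    """Return the manifest path for a data file (hypothetical helper)."""
    path = Path(data_file)
    # Append to the full name rather than replacing the last suffix,
    # so "myfile.csv.gz" becomes "myfile.csv.gz.manifest".
    return path.with_name(path.name + ".manifest")


for name in ("myfile", "myfile.csv", "myfile.csv.gz"):
    print(name, "->", manifest_path(name).name)
```

Using `Path.with_suffix` here would be a mistake, because it would replace the last extension instead of appending to it.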
/data/in/tables manifests

An input table manifest stores metadata about a downloaded table. For example, a table with the ID in.c-docker-demo.data will be downloaded into /in/tables/in.c-docker-demo.data.csv (unless stated otherwise in the input mapping), and a manifest file /in/tables/in.c-docker-demo.data.csv.manifest will be created with the following contents:
The name
node refers to the name of the component configuration.
The metadata and column_metadata fields contain metadata for the table and its columns.
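
As a minimal sketch of how a component might read an input table manifest, the Python snippet below loads the JSON file that sits next to the downloaded CSV and picks out the three fields named above. The function name `read_table_manifest` is hypothetical, and the empty defaults for missing fields are an assumption made for illustration, not documented behavior.

```python
import json
from pathlib import Path


def read_table_manifest(csv_path):
    """Load the manifest stored next to an input table (hypothetical helper)."""
    manifest_file = Path(str(csv_path) + ".manifest")
    with manifest_file.open(encoding="utf-8") as handle:
        manifest = json.load(handle)
    # Only the fields mentioned in the text are extracted; the fallback
    # values for absent fields are an assumption of this sketch.
    return {
        "name": manifest.get("name"),
        "metadata": manifest.get("metadata", []),
        "column_metadata": manifest.get("column_metadata", {}),
    }
```

A call such as `read_table_manifest("/data/in/tables/in.c-docker-demo.data.csv")` would then return a plain dictionary with those three keys.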
/data/out/tables manifests

An output table manifest sets options for transferring a table to Storage. The following examples list available manifest fields; all of them are optional. The destination field overrides the table name generated from the file name; it can be (and commonly is) overridden by the end-user configuration. The columns option defines the columns of the imported table. If the columns option is provided, the CSV files are assumed to be headless. If the component produces sliced tables, they are always assumed to be headless, and you have to use the columns option.
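
The following Python sketch shows one way a component could write an output table manifest with the destination and columns fields described above. The helper name `write_table_manifest` is hypothetical, and the sketch assumes that destination is a plain string and columns is a JSON array of column names; treat it as an illustration rather than a reference implementation.

```python
import json
from pathlib import Path


def write_table_manifest(csv_path, destination=None, columns=None):
    """Write a manifest next to an output table (hypothetical helper)."""
    manifest = {}
    # Every field is optional, so only the supplied ones are written.
    if destination is not None:
        manifest["destination"] = destination
    if columns is not None:
        # Assumption: columns is serialized as a list of column names.
        manifest["columns"] = list(columns)
    manifest_file = Path(str(csv_path) + ".manifest")
    manifest_file.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_file
```

Because providing columns makes the CSV headless, the component should not write a header row into the data file when it passes that argument.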
Additionally, the following options can be specified:
The options will cause the specified rows to be deleted from the source table before the new table is imported. See an example. Using this option makes sense only with incremental loads.
The metadata and column_metadata fields allow you to set metadata for the table and its columns.
The metadata field corresponds to the Table Metadata API call.
The column_metadata field corresponds to the Column Metadata API call.
In both cases, the key and value are passed directly to the API; the provider value is filled with the ID of the running component (e.g., keboola.ex-db-snowflake).
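
To make the key and value pairing concrete, here is a small Python sketch that builds a manifest fragment with both fields. The exact JSON shape is an assumption of this sketch: it supposes that metadata is a list of key/value objects and that column_metadata maps each column name to such a list. The helper name and the example keys are hypothetical. No provider is set, since the text states that it is filled in from the running component.

```python
def to_metadata_entries(pairs):
    """Turn a dict into a list of key/value objects (assumed manifest shape)."""
    return [{"key": key, "value": value} for key, value in pairs.items()]


manifest_fragment = {
    # Table-level metadata (hypothetical key).
    "metadata": to_metadata_entries({"example.table.key": "table value"}),
    # Column-level metadata, keyed by column name (assumed structure).
    "column_metadata": {
        "id": to_metadata_entries({"example.column.key": "column value"}),
    },
}
```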
/data/in/files manifests

An input file manifest stores metadata about a downloaded file.
/data/out/files manifests

An output file manifest sets options for transferring a file to Storage. The following example lists available manifest fields; all of them are optional.
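
As a rough illustration of writing such a manifest from code, the Python sketch below sets only the is_permanent and notify flags covered in this section. The helper name `write_file_manifest` is hypothetical, and storing both flags as JSON booleans is an assumption of the sketch.

```python
import json
from pathlib import Path


def write_file_manifest(file_path, is_permanent=False, notify=False):
    """Write a manifest next to an output file (hypothetical helper)."""
    manifest = {
        # Assumption: both flags are stored as JSON booleans.
        "is_permanent": bool(is_permanent),
        "notify": bool(notify),
    }
    manifest_file = Path(str(file_path) + ".manifest")
    manifest_file.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_file
```

Since the manifest is optional for files in /data/out, a component only needs to call a helper like this when it wants to set one of the options.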
These parameters can be used (taken from Storage API File Import):
- If is_permanent is false, the file will be automatically deleted after 15 days.
- If notify is true, the members of the project will be notified that a file has been uploaded to the project.

When using AWS S3 for direct data exchange, the manifest files will contain an additional s3 section with credentials for downloading the actual file data.
If the file is sliced and you need to merge it into a single file, read through the guide to
working with sliced files.
In that case, the key
points to another manifest, which contains a list of sliced files.
Note: Exchanging data via AWS S3 is currently available only for input mapping.
When using Azure Blob Storage for direct data exchange,
the manifest files will contain an additional abs
section with
credentials for downloading the actual file data.
If the file is sliced and you need to merge it into a single file, read through the guide to
working with sliced files.
In that case, the name
points to another manifest, which contains a list of sliced files.
Note: Exchanging data via Azure Blob Storage is currently available only for input mapping.