Note: This is a preview feature and as such may change considerably in the future. The project must have an artifacts
feature enabled.
Artifacts are additional files that can be produced or consumed by a component.
See Tutorial for step-by-step example.
In some cases it’s useful if a component not only extracts, transforms or uploads data, but also generate some other output, metadata or other runtime-discovered data. These could be for example:
These additional information can be stored in artifacts and processed by another component or 3rd party tool.
Artifacts are stored in Keboola File Storage.
There are three types of artifacts for now runs
, custom
and shared
.
The type specifies which components will have access to the artifact or which artifacts to download for the component to process.
Types are used in a configuration of a consumer component to specify which artifacts to download.
runs - artifacts from previous runs of the same configuration
custom - artifacts from previous runs of a different configuration. The configuration which produced the artifacts will be defined in the consumer configuration (configurationId, componentId, branchId)
shared - artifacts shared within an orchestration
runs
and custom
types are the same from the producer point of view. To produce a shared
artifact, it has to be written into a shared
folder. Read more in File structure section.
Artifact is a unique set of files associated with a successful job, component and configuration. A component can either produce or consume artifacts or both.
To produce an artifact, store one or more files in the following output
directories. Subdirectories are also supported.
/data/artifacts/out/current
to create an artifact of type runs
/ custom
./data/artifacts/out/shared
to create an artifact of type shared
, which can be accessed by any component within the same orchestration.After the component job is finished all files and directories inside current
and shared
folders will be compressed into an archive and uploaded to File Storage with corresponding tags as a artifact
.
To consume created artifacts you have to specify, in the configuration of a component, which artifacts (type) to download.
runs
to download artifacts produced by the same configuration and component. These will be stored in /data/artifacts/in/runs/jobs/job-%job_id%
directory.custom
to download artifacts produced by another configuration or component. These will be stored in /data/artifacts/in/custom/jobs/job-%job_id%
directory.shared
to download artifacts created within the same orchestration by any artifact producing component that has already finished. These will be stored in /data/artifacts/in/shared/jobs/job-%job_id%
directory.Each type of artifact has a separate node in configuration. All the types can be used simultaneously. Each type node has an attribute “enabled”, which enables or disables download of the corresponding artifact type.
enabled [true | false] - enable or disable download of this artifact type |
enabled [true | false] - enable or disable download of this artifact type |
enabled [true | false] - enable or disable download of this artifact type |
Full configuration example with all artifact types:
{
"parameters": {},
"artifacts": {
"runs": {
"enabled": true,
"filter": {
"date_since": "-7 days",
"limit": 5
}
},
"custom": {
"enabled": true,
"filter": {
"component_id": "keboola.python-transformation",
"config_id": "12345",
"branch_id": "default",
"date_since": "-7 days",
"limit": 5
}
},
"orchestration": {
"enabled": true
}
}
}
Job runner checks if the project has enabled artifacts
feature.
Job runner checks the configuration of the component.
If artifacts are enabled, it downloads artifacts to corresponding folders as configured (i.e. runs
, custom
, shared
) and unzips them.
Component process start and the component can:
access and process the downloaded artifacts in shared or custom directory
write artifacts to current
or shared
directory
Component finishes and job runner does:
gzip the content of runs/current
tag the gzipped file with jobId, componentId, configId, runId, branchId and other tags if needed
upload the file to File Storage
All the artifacts produced by a job shouldn’t be bigger than 1 GB.