Most operations, such as extracting data or running an application, are executed in KBC as background, asynchronous jobs. When an operation is triggered (for example, when you run an extractor), a job is created and pushed into a queue. The job waits in the queue until it is picked up by a worker server, which actually executes it. Job queuing and execution are fully automatic. So, if you are working with asynchronous parts of the API, you need to create a job and then poll for its status until it finishes.
Components differ in their upper limits on how long a job can run, from a couple of seconds to several hours.
When a job is created, a JobId is assigned to it. When the job is put into a queue, it gets its own RunId. An executing job can spawn child jobs (sub-jobs) and become their parent job. Usually, a parent job waits until all its child jobs have finished.
A JobId refers to the job definition, that is, to what should be done. A RunId refers to the actual job execution. That is why one JobId may, though very rarely, have multiple RunIds.
Jobs can be hierarchically organized. In such a case, a child job's RunId contains its parent's RunId as a prefix. For example, assume that a job with ID 123 is executed and assigned RunId 789. When it spawns a child job, that child job will have its own JobId, for instance, 234, and its RunId will have 789. as a prefix, for instance, 789.876. Jobs may be nested without limits, but in practice they do not go beyond three levels.
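The prefix rule can be illustrated with a few string helpers. This is only a sketch: the function names are hypothetical, and it assumes that RunIds are dot-separated strings exactly as in the 789.876 example above.

```python
from typing import Optional


def parent_run_id(run_id: str) -> Optional[str]:
    """Return the parent's RunId, or None for a top-level job."""
    head, sep, _ = run_id.rpartition(".")
    return head if sep else None


def is_descendant(run_id: str, ancestor_run_id: str) -> bool:
    """True if run_id is nested (at any depth) under ancestor_run_id."""
    # Include the trailing dot so that "7890.1" is not matched by "789".
    return run_id.startswith(ancestor_run_id + ".")


def nesting_level(run_id: str) -> int:
    """0 for a top-level job, 1 for its children, and so on."""
    return run_id.count(".")


print(parent_run_id("789.876"))         # parent of the child job from the example
print(is_descendant("789.876", "789"))  # the child belongs to run 789
print(nesting_level("789.876"))         # one level below the top
```

Comparing against the prefix with its trailing dot matters: a plain prefix check would wrongly treat an unrelated RunId that merely starts with the same digits as a child.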
A job can have different statuses:
To create a Job, use our Docker Runner API described on Apiary.io. Docker Runner has API calls to
You also need the Syrup Queue API to poll the job status.
The first API requires a component parameter; use the Component API to get a list of components. The second API is generic for all components. To work with the API, use our Syrup PHP Client. If you want to implement things yourself, copy the Job Polling part.
Note that there are other special cases of asynchronous operations which are in principle the same, but may differ in minor details. The most common ones are:
Apart from running predefined configurations with a run action, each component may provide additional options to create an asynchronous background job, or it may also support synchronous actions.
The following diagram shows a typical flow of creating a job. Note that it is also possible to create a job without an existing configuration, using the
You need to know the component ID and configuration ID to create a job. To obtain a list of all components available in the project, and their configurations, you can use the corresponding API call. See an example. A snippet of the response is below:
From there, the important parts are the id field and the configurations.id field. For instance, in the snippet above, there is a database extractor with the id keboola.ex-db-snowflake and a configuration with its own id.
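Picking out these two fields can be sketched as follows. The sample response is a hypothetical, heavily trimmed stand-in: it assumes the API returns a JSON array of components, each with an id and a configurations array whose items have an id. The configuration ID example-config-id and the function name are made up for illustration.

```python
import json

# Hypothetical, trimmed response; a real one carries many more fields.
SAMPLE_RESPONSE = """
[
  {
    "id": "keboola.ex-db-snowflake",
    "configurations": [
      {"id": "example-config-id"}
    ]
  }
]
"""


def list_configurations(response_text):
    """Return (component id, configuration id) pairs found in the response."""
    pairs = []
    for component in json.loads(response_text):
        # A component without any configuration contributes no pairs.
        for configuration in component.get("configurations", []):
            pairs.append((component["id"], configuration["id"]))
    return pairs


for component_id, configuration_id in list_configurations(SAMPLE_RESPONSE):
    print(component_id, configuration_id)
```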
Then use the create a job API call and pass the configuration ID in the request body:
See an example. When a job is created, you will obtain a response similar to this:
This means that the job was created (and is waiting in the queue) and will automatically start executing.
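As a rough sketch of the job creation step, the snippet below only builds the HTTP request and does not send it. Everything specific in it is an assumption and should be checked against the Docker Runner API documentation: the base URL is a placeholder, and the /run path, the config body key, the token header name and the function name are illustrative.

```python
import json
import urllib.request

# Assumptions for illustration only: placeholder host, "/run" path,
# "config" body key and token header name are not confirmed by this text.
BASE_URL = "https://docker-runner.example.com/docker"


def build_create_job_request(component_id, configuration_id, token):
    """Build (but do not send) a POST request that would create a job."""
    body = json.dumps({"config": configuration_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{component_id}/run",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-StorageApi-Token": token,
        },
    )


request = build_create_job_request(
    "keboola.ex-db-snowflake", "example-config-id", "your-token"
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before any job is created.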
From the above response, the most important part is url, which gives you the URL of the resource for Job status polling.
You will receive a response similar to this:
From the above response, the most important part is the status field (processing, in this case), which shows the current state of the job. To obtain the job result, repeat the above API call until the job status changes to one of the finished states or until isFinished is true.
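The polling loop can be sketched as below. It is a minimal illustration, not the Syrup PHP Client's implementation: wait_for_job and its parameters are hypothetical, the HTTP call is injected as fetch_status so the loop can run without a network, and the status values and URL in the usage part are placeholders. Only the status and isFinished field names come from the text above.

```python
import time


def wait_for_job(fetch_status, job_url, interval=1.0, max_attempts=60,
                 sleep=time.sleep):
    """Poll job_url until the job reports isFinished; return the last response.

    fetch_status is any callable that takes the URL and returns the decoded
    JSON response as a dict.
    """
    for _ in range(max_attempts):
        job = fetch_status(job_url)
        if job.get("isFinished"):
            return job
        sleep(interval)  # wait before asking again
    raise TimeoutError(
        f"Job at {job_url} did not finish after {max_attempts} polls"
    )


# Usage with a fake fetcher standing in for real HTTP calls.
responses = iter([
    {"status": "waiting", "isFinished": False},
    {"status": "processing", "isFinished": False},
    {"status": "success", "isFinished": True},  # placeholder final status
])
result = wait_for_job(
    lambda url: next(responses),
    "https://queue.example.com/job/123",  # placeholder URL
    sleep=lambda seconds: None,           # skip real waiting in the demo
)
print(result["status"])
```

Capping the number of attempts keeps a client from polling forever if a job never reaches a finished state; the interval and cap should be tuned to the component's job time limit.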