Keboola Overview

Keboola is an open system of many components orchestrated together through (mostly REST) APIs. Although quite complex, it is modular and therefore you rarely need to work with more than a few components.

Note: Initially, the Keboola platform was referred to as Keboola Connection (KBC). While it is now simply known as Keboola, references to “Connection” or the abbreviation “KBC” might still appear in various places.

Keboola Architecture

The following chart shows how Keboola is structured. All Keboola parts are briefly described here.

Overview of Keboola Components

Working with Keboola

Everything you can do in the Keboola UI can be done programmatically using the API of the corresponding component. All of our components have API documentation on Apiary, and most of them have a public GitHub repository. Our Docker components are built on Docker Hub, on Quay, or privately on AWS ECR.

This means that there are virtually endless possibilities of what can be done with Keboola programmatically.

Important Components

Some components are probably more important than others:

  • Storage component which is used to store all data in your Keboola projects (data in tables, file uploads, configurations and logs)
  • Docker Runner component which is used internally to run almost all components; therefore all extractors, writers and applications share its features
  • Transformations component which encapsulates all types of transformations (SQL with various backends, R, Python)
  • Orchestrator component which takes care of grouping different tasks together and running them regularly at scheduled times

Component Common Features

All components share some common behaviour, such as Component Configuration and Running Jobs, which allows each component to be run in Orchestrations. This means that once you have worked your way through one component, you have seen them all. Most of our components are open source. If you are interested in their code, have a look at our repositories. Apart from these common features, some components define additional synchronous actions. This (and much other information) can be retrieved using the Developer Portal API (specifically the Get app detail call), which lists all components available in Keboola.

Running Jobs

What each component does is defined purely by that component, and so is the content of its configuration. Each component has a /run API call that accepts either a reference to a component configuration (config field) or a full component configuration (configData field) in the JSON body, and queues an asynchronous job.

For more details, see the full API description.
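
The two forms of the /run body described above can be sketched as follows. This is a minimal illustration, not the official client: the function names, the example URL, and the X-StorageApi-Token header name are assumptions made for the example, and the sketch treats config and configData as mutually exclusive for simplicity. The request is only prepared, never sent.

```python
import json
import urllib.request


def build_run_body(config_id=None, config_data=None):
    """Return the JSON body for a component /run call.

    Exactly one argument is expected in this sketch:
    - config_id: reference to a stored configuration (sent as "config")
    - config_data: full inline configuration (sent as "configData")
    """
    if (config_id is None) == (config_data is None):
        raise ValueError("pass exactly one of config_id or config_data")
    if config_id is not None:
        return {"config": config_id}
    return {"configData": config_data}


def build_run_request(run_url, token, body):
    """Prepare (but do not send) a POST request to a component's /run URL.

    run_url and the token header name are assumptions for illustration.
    """
    return urllib.request.Request(
        run_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-StorageApi-Token": token,  # assumed header name
        },
        method="POST",
    )


# Hypothetical usage: reference a stored configuration by its ID.
request = build_run_request(
    "https://example.invalid/my-component/run",  # placeholder URL
    "my-token",
    build_run_body(config_id="123"),
)
```

Sending the prepared request (for example with urllib.request.urlopen) would queue the job; because the job is asynchronous, the response is expected to describe the queued job rather than its result.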

Components Configuration

All components store their configuration in Storage. Configurations are managed through the Storage Components Configurations API. Stored configurations can be referenced in /run API calls.

Configuration can be defined with a JSON schema stored within the Component detail. Docker Components without their own schemas can use a generic Docker Component schema.
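
To make the idea of a schema-defined configuration concrete, here is a small sketch with a hypothetical schema for an imaginary component. The schema, the field names, and the check_config helper are all invented for illustration; the helper is a hand-rolled check of two JSON Schema keywords only, not a real JSON Schema validator and not how Keboola validates configurations.

```python
# Hypothetical schema for an imaginary component; not taken from any real one.
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["url"],
    "properties": {
        "url": {"type": "string"},
        "limit": {"type": "integer"},
    },
}

# Map JSON Schema type names to Python types (small subset only).
_TYPES = {"string": str, "integer": int, "object": dict, "boolean": bool}


def check_config(config, schema):
    """Return a list of problems found; an empty list means the config passed.

    Only the top-level "required" keyword and the "type" of each listed
    property are checked, far less than a real JSON Schema validator does.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in config:
            problems.append(f"missing required field: {key}")
    for key, rule in schema.get("properties", {}).items():
        expected = _TYPES.get(rule.get("type"))
        if key in config and expected and not isinstance(config[key], expected):
            problems.append(f"field {key} should be {rule['type']}")
    return problems


# A configuration that satisfies the hypothetical schema.
print(check_config({"url": "https://example.com", "limit": 10}, EXAMPLE_SCHEMA))
```

In practice a full validator library would replace check_config; the point here is only that the schema describes which fields a configuration must contain and what types they take.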

Specific Components

Apart from the above common API, some components offer other API calls: