Component Registration

As described in the architecture overview, Keboola Connection (KBC) consists of many different components. Only components registered in our Component List are generally available in KBC. The list is provided by our Storage Component API in the dedicated Components section and is managed using the Keboola Developer Portal.

While a Custom Science extension requires registration only when offered to all KBC users, registering a Docker extension is mandatory at all times (although the application may still be hidden).

That being said, any KBC user can use any registered component, unless

  • the KBC user (or their token) has limited access to the component.
  • the component itself limits where it can run (in what projects and for which users).

Obtaining Account

To register your application, you need to have an account in the Keboola Developer Portal, which manages the list of components available in KBC.

The portal uses different credentials than KBC. Creating an account is free and quick; it requires a working email address (to which a confirmation email will be sent) and a mobile phone for mandatory two-factor authentication.

When you log in to the developer portal, you have to join a vendor — an organization of developers. Every KBC application has to have a vendor assigned. If you join an existing vendor, an administrator of that vendor has to approve your request. If you do not work for a company, create a vendor with your name (even a single component has to be assigned to a vendor).

Creating a new vendor requires approval by a Keboola Administrator; you will also receive a development project in KBC. In addition, you need to provide us with a channel for receiving internal errors from your applications. Anything supported by Papertrail notifications is available, though email or a Slack channel is most commonly used.

Screenshot -- Join a vendor

Once you are confirmed as a member of a vendor, you can proceed to create your own applications.

Creating Application

To add an application, use the Create App button, and fill in the application name and ID:

Screenshot -- Create application

Important: Do not use the words ‘extractor’, ‘writer’ or ‘application’ in the application name.

When creating an application, you will get a Component ID (in the form vendor.app-id, for instance, ujovlado.ex-wuzzzup). Once you have the Component ID, you can create configurations of the application in KBC. You can also review the application in KBC by visiting the following URL:

https://connection.keboola.com/admin/projects/{PROJECT_ID}/extractors/{COMPONENT_ID}
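As a minimal sketch, the review URL can be assembled from a project ID and a Component ID. The regular expression below is only an assumption about what a vendor.app-id value looks like (the portal's real validation rules may differ), and the project ID 1234 is a made-up placeholder.

```python
import re

# Assumed shape of a Component ID (vendor.app-id): lowercase letters, digits
# and dashes on both sides of a single dot. Not the portal's official rule.
COMPONENT_ID_PATTERN = re.compile(r"^[a-z0-9][a-z0-9-]*\.[a-z0-9][a-z0-9-]*$")

BASE_URL = "https://connection.keboola.com/admin/projects"


def component_review_url(project_id: int, component_id: str) -> str:
    """Build the URL for reviewing a component inside a given project."""
    if not COMPONENT_ID_PATTERN.match(component_id):
        raise ValueError(f"Unexpected component ID format: {component_id!r}")
    return f"{BASE_URL}/{project_id}/extractors/{component_id}"


# 1234 is a hypothetical project ID used only for illustration.
print(component_review_url(1234, "ujovlado.ex-wuzzzup"))
```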

Note that the configuration will not be runnable until you configure the Repository section of the application. A newly registered application is assigned a memory limit of 64 MB and a run timeout of 1 hour. If you need to change these limits, please contact our support.

Important: Changes made in the Developer Portal take up to 5 minutes to propagate to all Keboola Connection instances in all regions.

Application Repository

The Application Repository is a crucial part of the application registration because it defines which Docker image will be used when running the application. We offer free hosting of your Docker images in the Amazon Elastic Container Registry (AWS ECR) under our own account. All repositories in AWS ECR are private. When registering your component, you will receive credentials for deployment to the repository, and you can either push the images manually or use an automated script to do it.

We also support the DockerHub and Quay.io registries, both public and private. However, we recommend that you use AWS ECR unless you require DockerHub or Quay for some reason (e.g., you want the image to be public). The main benefit of our AWS ECR is its reliability, as Quay.io and DockerHub are more prone to outages and are beyond our control.

Generic Extractor

For registering a component based on the Generic Extractor, use the following repository:

147946154733.dkr.ecr.us-east-1.amazonaws.com/developer-portal-v2/ex-generic-v2

For a list of available tags, see the Generic Extractor GitHub repository or the Generic Extractor Quay repository, both of which contain the same tags as the above AWS ECR repository. It is also possible to use the latest tag, which points to the highest available tag. However, we recommend that you register your component with a specific tag and update it manually to avoid problems with breaking changes in future Generic Extractor releases. For more details on registering components based on Generic Extractor, see the dedicated page.
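The recommendation to pin a specific tag can be illustrated with a small helper that refuses floating tags when composing the image reference. This is only a sketch: the function name is hypothetical, and the tag 1.2.3 is a placeholder rather than a real Generic Extractor release.

```python
REPOSITORY = (
    "147946154733.dkr.ecr.us-east-1.amazonaws.com/"
    "developer-portal-v2/ex-generic-v2"
)

# Tags that move over time and therefore do not pin a release.
FLOATING_TAGS = {"latest"}


def pinned_image_reference(repository: str, tag: str) -> str:
    """Return repository:tag, rejecting empty or floating tags."""
    tag = tag.strip()
    if not tag or tag.lower() in FLOATING_TAGS:
        raise ValueError(f"Use a specific tag instead of {tag!r}")
    return f"{repository}:{tag}"


# "1.2.3" is a hypothetical tag used only for illustration.
print(pinned_image_reference(REPOSITORY, "1.2.3"))
```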

Custom Science

When registering Custom Science applications, one of our images should be used. The registration of Custom Science applications is not yet supported in the Developer Portal, so please contact our support. If you are registering a Custom Science extension and want to use a private git repository, provide us with encrypted credentials to the git repository.

UI Options

Each extension needs to specify how its user interface (UI) will look. Without any configuration, the component cannot be configured via the UI (it can still be configured using the API though). The most basic configuration is genericDockerUI. The generic UI will always show a text field for entering the component configuration in JSON format. Other parts of the UI are turned on using other flags (for example, genericDockerUI-tableInput, genericDockerUI-tableOutput). All of the flags may be combined freely.
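Because the flags are plain strings that may be combined freely, a typo in a flag name is easy to miss. The following sketch checks a flag list against the UI flags named on this page; the helper name is hypothetical, and the set covers only the flags mentioned here, so it may not be complete.

```python
from typing import Iterable, List

# UI flags described on this page (other flags may exist).
KNOWN_UI_FLAGS = frozenset({
    "genericDockerUI",
    "genericDockerUI-tableInput",
    "genericDockerUI-tableOutput",
    "genericDockerUI-processors",
    "genericDockerUI-fileInput",
    "genericDockerUI-fileOutput",
    "genericDockerUI-authorization",
    "genericDockerUI-runtime",
    "genericTemplatesUI",
})


def unknown_flags(flags: Iterable[str]) -> List[str]:
    """Return the flags that are not among the known UI flags."""
    return [flag for flag in flags if flag not in KNOWN_UI_FLAGS]


selected = ["genericDockerUI", "genericDockerUI-tableInput", "genericDockerUI-tableinput"]
print(unknown_flags(selected))  # the last entry has the wrong letter case
```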

genericDockerUI

This flag provides a basic text area for setting extension parameters as JSON; the text area has JSON validation and syntax highlighting.

Generic configuration screenshot

Defining a configuration schema will replace the JSON text area with a form.
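A configuration schema is a JSON Schema document. The sketch below shows what a small one might look like and checks two of the review guidelines listed later on this page (every property is listed in required, and every property has a propertyOrder). The property names api_url and page_size are made up for illustration, and the checker function is hypothetical.

```python
from typing import List

# Hypothetical schema for a component with two parameters.
EXAMPLE_SCHEMA = {
    "type": "object",
    "title": "Parameters",
    "required": ["api_url", "page_size"],
    "properties": {
        "api_url": {
            "type": "string",
            "title": "API URL",
            "description": "Base URL of the service to extract data from.",
            "propertyOrder": 1,
        },
        "page_size": {
            "type": "integer",
            "title": "Page size",
            "description": "Number of records requested per call.",
            "propertyOrder": 2,
        },
    },
}


def schema_problems(schema: dict) -> List[str]:
    """List properties that are not required or lack propertyOrder."""
    problems = []
    required = set(schema.get("required", []))
    for name, definition in schema.get("properties", {}).items():
        if name not in required:
            problems.append(f"{name}: not listed in 'required'")
        if "propertyOrder" not in definition:
            problems.append(f"{name}: missing 'propertyOrder'")
    return problems


print(schema_problems(EXAMPLE_SCHEMA))
```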

genericDockerUI-tableInput

This flag provides a UI for setting the table input mapping. With this UI, you can set:

  • Source — the name of the table in Storage
  • Destination file name — the name of the .csv file passed to the application
  • Columns — select only some columns of the source table
  • Days — load only rows modified in the specified number of days; useful for incremental loads; set to 0 to load all data
  • Data filter — a simple filter for selecting specified rows only
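The settings above can be modelled as a small data structure, which may help when reasoning about what a single input mapping entry carries. This is only an illustrative sketch: the class and field names are hypothetical and are not claimed to match the keys KBC stores in the configuration, and the table and file names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableInput:
    """One table input mapping entry, mirroring the UI fields above."""
    source: str                      # name of the table in Storage
    destination: str                 # name of the .csv file passed to the application
    columns: List[str] = field(default_factory=list)  # assumed: empty means all columns
    days: int = 0                    # 0 loads all data
    filter_column: Optional[str] = None              # hypothetical data filter fields
    filter_values: List[str] = field(default_factory=list)

    def loads_all_rows(self) -> bool:
        """True when neither the days limit nor a data filter restricts rows."""
        return self.days == 0 and self.filter_column is None


# Placeholder names for illustration only.
mapping = TableInput(
    source="in.c-main.orders",
    destination="orders.csv",
    columns=["id", "amount"],
    days=7,
)
print(mapping.loads_all_rows())
```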

Table input screenshot

Table input detail screenshot

Table input result screenshot

genericDockerUI-tableOutput

This flag provides a UI for setting the table output mapping. This UI part should not be used if the component is using the default bucket setting.

With this UI, you can set:

  • Source — the name of the .csv file retrieved from the application
  • Destination — the name of the table in Storage; the destination bucket should already exist
  • Incremental — if checked, the loaded data will be appended to the contents of the destination table
  • Primary key — set the primary key for your destination table — multiple columns are allowed
  • Delete rows — delete some rows from the destination table using a simple filter

Table output screenshot

Table output detail screenshot

Table output result screenshot

genericDockerUI-processors

This flag provides a UI for the Processor configuration. It provides a basic text area for setting the processors and their parameters as JSON; the text area has JSON validation and syntax highlighting.

Processors screenshot

genericDockerUI-fileInput

This flag provides a UI for setting the file input mapping.

File input screenshot

File input detail screenshot

File input result screenshot

genericDockerUI-fileOutput

This flag provides a UI for setting the file output mapping. With this UI, you can set:

  • Source — the name of the file produced by the application
  • File tags — the file tags assigned to the produced file
  • Is public — the file is accessible to anyone knowing its URL
  • Is permanent — the file will not be deleted after 180 days
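The effect of the Is permanent option can be read as simple date arithmetic. The sketch below assumes the 180 days are counted from the day the file is created; that starting point, and the function name, are assumptions and not something this page states.

```python
from datetime import date, timedelta
from typing import Optional

RETENTION_DAYS = 180  # from the "Is permanent" description above


def expected_deletion_date(created: date, is_permanent: bool) -> Optional[date]:
    """Return the assumed deletion date, or None for permanent files."""
    if is_permanent:
        return None
    return created + timedelta(days=RETENTION_DAYS)


print(expected_deletion_date(date(2024, 1, 1), is_permanent=False))
print(expected_deletion_date(date(2024, 1, 1), is_permanent=True))
```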

File output screenshot

File output detail screenshot

File output result screenshot

genericDockerUI-authorization

This flag provides a UI for setting OAuth2 Authorization. However, to actually activate OAuth for your component, you have to contact our support.

Authorization screenshot

Authorization detail screenshot

genericTemplatesUI

This flag is used to provide a UI for components based on the Generic Extractor. It allows the end-user to select a Generic Extractor template.

genericDockerUI-runtime

This flag provides a UI for setting parameters for Custom Science. We recommend that you contact our support when registering a Custom Science application.

Runtime configuration screenshot

Publishing the Extension

When you register an extension (a Docker extension, a Custom Science extension, or a component based on the Generic Extractor), it is not published. A non-published component can be used without limitations, but it is not offered in the KBC UI; it can only be used by directly visiting a link with the specific component ID or via the API. Non-published components are also not part of the Public Component list. An existing configuration of a non-published component is accessible the same way as a configuration of any other component.

Before your application can be published, it must be approved by Keboola. Request the approval from the application list in the Keboola Developer Portal. A member of our staff will review your application and either publish it or contact you with the required changes.

Approval screenshot

Application Review

The goal of the application review is to maintain a reasonable end-user experience and application reliability. Before applying for application registration, make sure that the same application does not exist already. If there is a similar one (e.g., an extractor for the same service), clearly state the differences in the new application’s description. During the review, we check that the application follows the best practices described in the following sections.

Application name and description

  • Names should not contain words like extractor, application, and writer.
    OK: Cloudera Impala
    WRONG: Cloudera Extractor
  • The short description describes the service (helping the user find it) rather than the component.
    OK: Native analytic database for Apache Hadoop
    WRONG: This extractor extracts data from Cloudera Impala
  • The long description provides additional information about the extracted/written data: What will the end-user get? What must the end-user provide? Configuration instructions should not be included, because the long description is displayed before the end-user starts configuring the component. However, if there are any special requirements (external approval, specific account setting), they should be stated.
    OK: This component allows you to extract currency exchange rates as published by the European Central Bank (ECB). The exchange rates are available from a base currency (USD, EUR) to 30 destination currencies (AUD, BGN, BRL, CAD, CNY, CZK, EUR, GBP, HKD, HRK, HUF, CHF, IDR, ILS, INR, JPY, KRW, MXN, MYR, NOK, NZD, PHP, PLN, RON, RUB, SEK, SGD, THB, TRY, ZAR). The rates are available for all working days from 4 January 1999 up to present.
  • Application icons must be representative and of reasonable quality. Make sure the icon license allows you to use it.
  • Applications must correctly state the data flow — UI flags appInfo.dataOut (typically writers), appInfo.dataIn (typically extractors).
  • Licensing information must be valid, and the vendor description must be current.
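The naming rule at the top of this list is easy to check automatically before submitting an application. A minimal sketch, assuming whole-word, case-insensitive matching is what the rule intends (the helper name is hypothetical):

```python
import re
from typing import List

FORBIDDEN_WORDS = ("extractor", "writer", "application")


def forbidden_words_in(name: str) -> List[str]:
    """Return the forbidden words that appear as whole words in the name."""
    words = set(re.findall(r"[a-z]+", name.lower()))
    return [word for word in FORBIDDEN_WORDS if word in words]


print(forbidden_words_in("Cloudera Impala"))     # the OK example above
print(forbidden_words_in("Cloudera Extractor"))  # the WRONG example above
```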

Application Configuration

  • Use only the necessary flags (e.g., if there are no output files, do not use genericDockerUI-fileOutput).
  • For extractors, always use the default bucket — do not use the genericDockerUI-tableOutput flag.
  • Use encryption to store sensitive values. No plain-text passwords!
  • Use a Configuration Schema.
    • List all properties in the required field.
    • Always use propertyOrder to explicitly define the order of the fields in the form.
    • Use a short title without a trailing colon, period, etc.
    • Use a description to provide an explanatory sentence if needed.
      OK: Good Schema
      WRONG: Bad Schema
  • Use a configuration description only if the configuration is not trivial or self-explanatory. Provide links to resources (for instance, when creating an Elastic extractor, not everyone is familiar with the ElasticSearch query syntax). The configuration description supports Markdown. Your Markdown should not start with a header and should use only level 3 and level 4 headers (a level 2 header is prepended before the configuration description).
    OK:
    some introduction text

    ### Input Description
    description of input tables

    #### First Table
    some other text
    WRONG:
    ## Configuration Description
    some introduction text

    #### Input Description
    description of input tables
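The header rules for the configuration description can be sketched as a simple line-based check. This is an approximation: it only looks at lines starting with # characters and does not handle headers inside fenced code blocks or other Markdown header styles. The function name is hypothetical.

```python
import re
from typing import List

# A Markdown ATX header: one to six '#' characters followed by text.
HEADER = re.compile(r"^(#{1,6})\s+\S")


def description_problems(markdown: str) -> List[str]:
    """Report a leading header and any header that is not level 3 or 4."""
    problems = []
    lines = [line.strip() for line in markdown.splitlines() if line.strip()]
    if lines and HEADER.match(lines[0]):
        problems.append("description starts with a header")
    for line in lines:
        match = HEADER.match(line)
        if match and len(match.group(1)) not in (3, 4):
            problems.append(f"level {len(match.group(1))} header: {line}")
    return problems


ok_text = (
    "some introduction text\n\n"
    "### Input Description\ndescription of input tables\n\n"
    "#### First Table\nsome other text"
)
wrong_text = (
    "## Configuration Description\nsome introduction text\n\n"
    "#### Input Description\ndescription of input tables"
)
print(description_problems(ok_text))
print(description_problems(wrong_text))
```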

Application Internals

  • Make sure that the amount of consumed memory does not depend on the amount of processed data. Use streaming or processing in chunks to keep memory consumption bounded. If this is not possible, state the expected usage in the Application Limits.
  • The application must distinguish between User and Application errors.
  • The application must validate its parameters; an invalid configuration must result in a user error.
  • The events produced must be reasonable. Provide status messages if possible and with a reasonable frequency. Avoid internal messages with no meaning to the end-user. Also avoid flooding the event log or sending data files in the event log.
  • Set up Continuous Deployment so that you can keep the application up to date.
  • Use semantic versioning to mark and deploy versions of your application. Using other tags (e.g., latest, master) in production is not allowed.
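The first three points above can be sketched together: the input is streamed row by row so memory use stays flat, parameters are validated up front, and user errors are reported separately from application errors. The exit codes (1 for a user error, 2 for an application error) and the parameter name column are assumptions made for this example and are not stated on this page.

```python
import csv
import io
import sys
from typing import TextIO

# Assumed exit-code convention; check it against the platform documentation.
EXIT_USER_ERROR = 1
EXIT_APPLICATION_ERROR = 2


class UserError(Exception):
    """A problem the end-user can fix, such as an invalid configuration."""


def validate_parameters(parameters: dict) -> str:
    """Return the name of the column to sum, or raise UserError."""
    column = parameters.get("column")  # hypothetical parameter name
    if not isinstance(column, str) or not column:
        raise UserError("Parameter 'column' must be a non-empty string.")
    return column


def sum_column(source: TextIO, column: str) -> float:
    """Read the CSV one row at a time instead of loading it whole."""
    reader = csv.DictReader(source)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise UserError(f"Column '{column}' not found in the input table.")
    total = 0.0
    for row in reader:
        total += float(row[column])
    return total


def run(parameters: dict, source: TextIO) -> int:
    """Return a process exit code that separates the two error kinds."""
    try:
        column = validate_parameters(parameters)
        print(f"Sum of {column}: {sum_column(source, column)}")
        return 0
    except UserError as error:
        print(f"User error: {error}", file=sys.stderr)
        return EXIT_USER_ERROR
    except Exception as error:  # anything unexpected is an application error
        print(f"Application error: {error}", file=sys.stderr)
        return EXIT_APPLICATION_ERROR


exit_code = run({"column": "amount"}, io.StringIO("id,amount\n1,2.5\n2,3.5\n"))
```

In a real application, the value returned by run would be passed to sys.exit, and the source would be an open file from the input mapping rather than an in-memory string.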