Use the init command to set up your local directory. The command initializes the directory and pulls configurations from the project.
The Storage API token for your project is stored in the .env.local file under the KBC_STORAGE_API_TOKEN directive.
Currently, you must use a master token.
To maintain security, .env.local is automatically included in the .gitignore file to prevent it from being committed to your Git repository.
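To illustrate how a script of your own could pick up the token, here is a minimal sketch. It assumes that .env.local uses plain KEY=VALUE lines, which is the common dotenv convention but is not spelled out above. The helper name read_env_file and the placeholder token are hypothetical.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes (an assumption about the file format).
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo with a throwaway file and a placeholder token (not a real one).
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env.local"
    env_path.write_text(
        '# local secrets\nKBC_STORAGE_API_TOKEN="1234-placeholder"\n',
        encoding="utf-8",
    )
    env = read_env_file(env_path)
    token = env["KBC_STORAGE_API_TOKEN"]
    print(token)
```

Reading the token from the file rather than hardcoding it keeps the secret out of version control, in line with the .gitignore entry.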
The naming section of the manifest defines directory names. Typically, this setting does not need to be changed. Each object (branch, configuration, row) is guaranteed to have a unique directory, even if objects share the same name.
Below is an example of a default project directory structure. Some files and directories are specific to the component type. For example, transformations are represented by native files. A more detailed description can be found in the chapters below.
🟫 .gitignore - excludes ".env.local" from the Git repository
🟫 .env.local - contains the Storage API token
🟫 .env.dist - template for ".env.local"
📂 .keboola - project metadata directory
┣ 🟦 manifest.json - contains object IDs, paths, naming and other configuration details
┣ 🟦 project.json - project cache for local commands, including backends and features
┗ 🟫 .kbcignore - optional file listing paths to configurations to exclude from CLI sync
🟩 description.md - project description
📂 [branch-name] - branch directory (e.g., "main")
┣ 🟦 meta.json
┣ 🟩 description.md
┣ 📂 _shared - shared code directory
┃ ┗ 📂 [target-component] - target component (e.g., "keboola.python-transformation")
┃   ┗ 📂 codes
┃     ┗ 📂 [code-name] - shared code directory
┃       ┣ 🟫 code.[ext] - native file (e.g., ".sql" or ".py")
┃       ┣ 🟦 config.json
┃       ┣ 🟦 meta.json
┃       ┗ 🟩 description.md
┗ 📂 [component-type] - e.g., extractor, app, ...
  ┗ 📂 [component-id] - e.g., keboola.ex-db-oracle
    ┗ 📂 [config-name] - configuration directory (e.g., "raw-data")
      ┣ 🟦 config.json
      ┣ 🟦 meta.json
      ┣ 🟩 description.md
      ┣ 📂 rows - only if the configuration has some rows
      ┃ ┗ 📂 [row-name] - configuration row directory (e.g., "prod-fact-table")
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┣ 📂 blocks - only if the configuration is a transformation
      ┃ ┗ 📂 001-block-1 - block directory
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 📂 001-code-1 - code directory
      ┃     ┣ 🟫 code.[ext] - native file (e.g., ".sql" or ".py")
      ┃     ┗ 🟦 meta.json
      ┣ 📂 phases - only if the configuration is an orchestration
      ┃ ┗ 📂 001-phase - phase directory
      ┃   ┣ 🟦 phase.json
      ┃   ┗ 📂 001-task - task directory
      ┃     ┗ 🟦 task.json
      ┣ 📂 schedules - only if the configuration has some schedules
      ┃ ┗ 📂 [schedule-name] - schedule directory
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┗ 📂 variables - only if the configuration has some variables defined
        ┣ 🟦 config.json - variable definition, name, and type
        ┣ 🟦 meta.json
        ┣ 🟩 description.md
        ┗ 📂 values - multiple sets of values can be defined
          ┗ 📂 default - default values directory
            ┣ 🟦 config.json - default values
            ┣ 🟦 meta.json
            ┗ 🟩 description.md
The tool works with development branches by default. You can specify which branches from the project you want to work with locally during the init command. Alternatively, you can ignore the development branches concept and work exclusively with the main branch. However, note that all configurations will then be stored in the main directory.
The main branch directory is simply named main and does not include the branch ID. This makes it easily distinguishable from the other branches.
Each branch directory contains:
- description.md: Use this file to write a branch description formatted in Markdown.
- meta.json: Contains the name of the branch and a flag indicating whether it is the default branch.
Example of meta.json:
{
  "name": "Main",
  "isDefault": true
}
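As a small illustration of how the isDefault flag can be used, the sketch below scans the top-level directories of a project for meta.json files and reports the default branch. The function name find_default_branch and the second branch (my-feature) are hypothetical; only the name and isDefault fields come from the example above.

```python
import json
import tempfile
from pathlib import Path


def find_default_branch(project_dir):
    """Return the directory name of the branch whose meta.json has isDefault set."""
    for meta_path in sorted(Path(project_dir).glob("*/meta.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        if meta.get("isDefault") is True:
            return meta_path.parent.name
    return None


# Demo: build a throwaway project with two branch directories.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    branches = {
        "main": {"name": "Main", "isDefault": True},
        "my-feature": {"name": "My Feature", "isDefault": False},  # hypothetical
    }
    for directory, meta in branches.items():
        (root / directory).mkdir()
        (root / directory / "meta.json").write_text(json.dumps(meta), encoding="utf-8")
    default_dir = find_default_branch(root)
    print(default_dir)
```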
Within the branch directory, configurations are organized into thematic directories: extractor, other, transformation, and writer.
Example of a branch folder with component configurations:
Each configuration directory contains the following files:
- config.json: Includes parameters specific to the component.
- description.md: A description file formatted in Markdown.
- meta.json: Contains the name of the configuration.
Example of config.json for the Generic extractor:
{
  "parameters": {
    "api": {
      "baseUrl": "https://wikipedia.org"
    }
  }
}
Example of meta.json:
{
  "name": "Wikipedia"
}
Configuration directories can be copied freely within the project and between projects. Their IDs are stored in the manifest. After copying, run the persist command to generate a new ID for the configuration and update it in the manifest.
The directory structure for configuration rows is identical to that of configurations. The component configuration includes a rows directory, which contains a subdirectory for each row. Each row directory includes config.json, description.md, and meta.json.
Example of meta.json:
{
  "name": "share/cities2",
  "isDisabled": false
}
Example of a Google Drive extractor configuration:
In addition to standard configurations, transformation directories include a blocks directory containing a list of codes.
Codes are stored as native files corresponding to the transformation type. For example, Snowflake transformations store codes in .sql files.
Example of a Snowflake transformation configuration:
The variables directory, in addition to the standard configuration layout, contains a values subdirectory.
For example, suppose you have the following two variables in your transformation:
When you pull them to the local directory, the structure will look like this:
Variables configuration in variables/config.json:
{
  "variables": [
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}
Default values configuration in variables/values/default/config.json:
{
  "values": [
    {
      "name": "state",
      "value": "NY"
    },
    {
      "name": "city",
      "value": "Boston"
    }
  ]
}
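The relationship between the two files can be sketched as follows: the definitions list which variables exist, and the values file supplies a value for each name. This is only an illustration of how the files fit together, not the way Keboola itself resolves variables; the function name resolve_variables is hypothetical.

```python
import json

# Contents of variables/config.json from the example above.
variables_config = json.loads("""
{"variables": [{"name": "state", "type": "string"},
               {"name": "city", "type": "string"}]}
""")

# Contents of variables/values/default/config.json from the example above.
default_values = json.loads("""
{"values": [{"name": "state", "value": "NY"},
            {"name": "city", "value": "Boston"}]}
""")


def resolve_variables(definitions, values):
    """Pair every defined variable with its value; fail if one is missing."""
    by_name = {item["name"]: item["value"] for item in values["values"]}
    resolved = {}
    for variable in definitions["variables"]:
        name = variable["name"]
        if name not in by_name:
            raise KeyError(f"no value provided for variable {name!r}")
        resolved[name] = by_name[name]
    return resolved


resolved = resolve_variables(variables_config, default_values)
print(resolved)  # {'state': 'NY', 'city': 'Boston'}
```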
Shared code blocks are stored in the branch directory under the _shared subdirectory, enabling reuse across different configurations.
If you create shared code from a block:
It will move to the _shared directory:
The code in the transformation file blocks/block-1/join/code.sql will then be replaced with:
The Orchestrator or any other component can have a schedule to run automatically and periodically. The schedule configuration is stored within a specific directory.
The config.json file for the schedule contains the schedule in crontab format, the timezone, and a flag indicating whether the schedule is enabled.
For example, the following configuration runs at the 40th minute of every hour:
{
  "schedule": {
    "cronTab": "40 */1 * * *",
    "timezone": "UTC",
    "state": "enabled"
  },
  "target": {
    "mode": "run"
  }
}
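To make the crontab expression concrete, the sketch below checks whether a given moment matches the minute and hour fields of the example. It is a deliberately small matcher written for illustration, not the scheduler's implementation: it handles only "*", "*/n", and comma-separated numbers, it assumes the moment is already expressed in the configured timezone, and it assumes any state other than "enabled" means the schedule is off.

```python
from datetime import datetime

# The schedule configuration from the example above.
schedule_config = {
    "schedule": {"cronTab": "40 */1 * * *", "timezone": "UTC", "state": "enabled"},
    "target": {"mode": "run"},
}


def field_matches(field, value):
    """Match one cron field; supports '*', '*/n', and comma-separated integers."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def fires_at(config, moment):
    """Return True if an enabled schedule would fire at the given minute."""
    schedule = config["schedule"]
    if schedule["state"] != "enabled":
        return False
    minute, hour, day, month, weekday = schedule["cronTab"].split()
    if (day, month, weekday) != ("*", "*", "*"):
        raise NotImplementedError("this sketch only checks the minute and hour fields")
    return field_matches(minute, moment.minute) and field_matches(hour, moment.hour)


print(fires_at(schedule_config, datetime(2024, 1, 1, 13, 40)))  # True
print(fires_at(schedule_config, datetime(2024, 1, 1, 13, 41)))  # False
```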
Orchestrator directories include the phases directory, which contains a list of tasks for execution.
Example:
Example phase.json:
{
  "name": "Transformation",
  "dependsOn": [
    "001-extraction"
  ]
}
Example task.json:
{
  "name": "keboola.snowflake-transformation-7241628",
  "task": {
    "mode": "run",
    "configPath": "transformation/keboola.snowflake-transformation/address-completion"
  },
  "continueOnFailure": false,
  "enabled": true
}
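The dependsOn field implies an execution order between phases. A minimal sketch of deriving that order with a topological sort is shown below. It assumes that dependsOn entries refer to phase directory names, as the "001-extraction" value in the example suggests. The directory name 002-transformation and the contents of the extraction phase are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Parsed phase.json files keyed by phase directory name.
phases = {
    "002-transformation": {"name": "Transformation", "dependsOn": ["001-extraction"]},
    "001-extraction": {"name": "Extraction", "dependsOn": []},  # hypothetical phase
}

# TopologicalSorter expects a mapping of node -> set of predecessors.
graph = {key: set(phase.get("dependsOn", [])) for key, phase in phases.items()}

# static_order() yields every phase after all the phases it depends on.
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['001-extraction', '002-transformation']
```

A cyclic dependsOn chain would make static_order() raise graphlib.CycleError, which is a convenient way to detect a misconfigured orchestration locally.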
Using kbcdir.jsonnet for different orchestration phases:
The kbcdir.jsonnet file can be used to specify which directories in the phases folder should be ignored for different project backends. By setting the isIgnored value to true in the file, you can exclude specific directories.
Example kbcdir.jsonnet:
{
  "isIgnored": false
}
The local state of the project is stored in the .keboola/manifest.json file. It is not recommended to modify this file manually.
- version: Current major version (e.g., 2)
- project: Information about the project
  - id: ID of the project
  - apiHost: URL of the Keboola instance (e.g., connection.keboola.com)
- allowTargetEnv: Boolean (default: false)
  - If true, allows the environment variables KBC_PROJECT_ID and KBC_BRANCH_ID to temporarily override the target project and branch without modifying the manifest.
- sortBy: Property name used for sorting configurations (default: id)
- naming: Rules for directory naming (see details below)
- allowedBranches: Array of branches to work with
- ignoredComponents: Array of components to exclude
- templates:
  - repositories (array):
    - type = dir
      - name: Repository name
      - url: Absolute or relative path to a local directory
    - type = git
      - name: Repository name
      - url: URL of the Git repository (e.g., https://github.com/keboola/keboola-as-code-templates.git)
      - ref: Git branch or tag (e.g., main or v1.2.3)
- branches: List of used branches
  - id: Branch ID
  - path: Directory name (e.g., main)
- configurations: List of component configurations
  - branchId: Branch ID
  - componentId: Component ID (e.g., keboola.ex-aws-s3)
  - id: Configuration ID
  - path: Path to the configuration in the local directory (e.g., extractor/keboola.ex-aws-s3/7241111/my-aws-s3-data-source)
  - rows: List of configuration rows (if the component supports rows)
    - id: Row ID
    - path: Path to the row from the configuration directory (e.g., rows/cities)

Directory names for configurations follow the rules in the manifest under the naming section.
These are the default values:
{
  "branch": "{branch_name}",
  "config": "{component_type}/{component_id}/{config_name}",
  "configRow": "rows/{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}
To include object IDs in directory names, use these values:
{
  "branch": "{branch_id}-{branch_name}",
  "config": "{component_type}/{component_id}/{config_id}-{config_name}",
  "configRow": "rows/{config_row_id}-{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}
Use the fix-paths command to rebuild the directory structure with updated naming rules.
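To see what the two sets of naming rules produce, the following sketch fills in the placeholders with Python's str.format. This is only a stand-in for the CLI's own template handling, which may also normalize names. The component and configuration values reuse examples from the manifest description, while the row ID 101 is hypothetical.

```python
# Two of the naming templates, copied from the default rules and the ID-based rules.
DEFAULT_NAMING = {
    "config": "{component_type}/{component_id}/{config_name}",
    "configRow": "rows/{config_row_name}",
}
NAMING_WITH_IDS = {
    "config": "{component_type}/{component_id}/{config_id}-{config_name}",
    "configRow": "rows/{config_row_id}-{config_row_name}",
}

values = {
    "component_type": "extractor",
    "component_id": "keboola.ex-aws-s3",
    "config_id": "7241111",
    "config_name": "my-aws-s3-data-source",
    "config_row_id": "101",  # hypothetical row ID
    "config_row_name": "cities",
}


def render(template, placeholders):
    """Fill {placeholders}; unused keys are ignored by str.format."""
    return template.format(**placeholders)


default_path = render(DEFAULT_NAMING["config"], values)
id_path = render(NAMING_WITH_IDS["config"], values)
id_row_path = render(NAMING_WITH_IDS["configRow"], values)

print(default_path)  # extractor/keboola.ex-aws-s3/my-aws-s3-data-source
print(id_path)       # extractor/keboola.ex-aws-s3/7241111-my-aws-s3-data-source
print(id_row_path)   # rows/101-cities
```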
The project cache is stored in .keboola/project.json and is used by local commands without making authorized requests to the Storage API.
This is its basic structure:
- backends: List of project backends
- features: List of project features
- defaultBranchId: ID of the default branch
Example:
{
  "backends": [
    "snowflake"
  ],
  "features": [
    "workspace-snowflake-dynamic-backend-size",
    "input-mapping-read-only-storage",
    "syrup-jobs-limit-10",
    "oauth-v3"
  ],
  "defaultBranchId": 123
}
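Because the cache is plain JSON, a local script can consult it without calling the Storage API. The short sketch below shows the idea; the helper names uses_backend and has_feature are hypothetical and the data is the example above.

```python
import json

# Contents of .keboola/project.json from the example above.
project_cache = json.loads("""
{
  "backends": ["snowflake"],
  "features": [
    "workspace-snowflake-dynamic-backend-size",
    "input-mapping-read-only-storage",
    "syrup-jobs-limit-10",
    "oauth-v3"
  ],
  "defaultBranchId": 123
}
""")


def uses_backend(cache, backend):
    """Check whether the cached project lists the given backend."""
    return backend in cache.get("backends", [])


def has_feature(cache, feature):
    """Check whether the cached project lists the given feature."""
    return feature in cache.get("features", [])


print(uses_backend(project_cache, "snowflake"))  # True
print(has_feature(project_cache, "oauth-v3"))    # True
```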
You can exclude specific configurations from the sync process by creating a .kbcignore file in the .keboola directory.
It is a plain text file where each line specifies a path to a configuration or configuration row in the format {component_id}/{configuration_id}/{row_id}. The row_id is optional for row-based configurations.
Example .kbcignore file:
keboola.python-transformation-v2/1197618481
keboola.wr-db-snowflake/1196309603/1196309605
This excludes:
- The Python transformation configuration (keboola.python-transformation-v2) with the ID 1197618481.
- The row 1196309605 in the configuration of the Snowflake writer (keboola.wr-db-snowflake) with the ID 1196309603.
As a result, the kbc sync pull and kbc sync push commands will not synchronize these configurations.
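The line format can be illustrated with a small parser. This is a sketch of the matching idea, not the CLI's implementation. It assumes that a rule without a row_id covers the whole configuration including its rows, and that a rule with a row_id covers only that row. The names IgnoreRule, parse_kbcignore, and is_ignored, as well as the second rule (example.component/100/200), are hypothetical.

```python
from typing import NamedTuple, Optional


class IgnoreRule(NamedTuple):
    component_id: str
    config_id: str
    row_id: Optional[str] = None


def parse_kbcignore(text):
    """Turn each non-empty line into a rule of two or three path segments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split("/")
        if len(parts) not in (2, 3):
            raise ValueError(f"unexpected .kbcignore line: {line!r}")
        rules.append(IgnoreRule(*parts))
    return rules


def is_ignored(rules, component_id, config_id, row_id=None):
    """A rule without row_id matches the whole configuration (assumption)."""
    for rule in rules:
        if (rule.component_id, rule.config_id) != (component_id, config_id):
            continue
        if rule.row_id is None or rule.row_id == row_id:
            return True
    return False


rules = parse_kbcignore(
    "keboola.python-transformation-v2/1197618481\n"
    "example.component/100/200\n"  # hypothetical row-level rule
)
print(is_ignored(rules, "keboola.python-transformation-v2", "1197618481"))  # True
print(is_ignored(rules, "example.component", "100", "200"))                 # True
print(is_ignored(rules, "example.component", "100", "201"))                 # False
```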
kbc push operation
The kbc push command will skip the excluded configurations and will not push them back to the project, even if they exist or have been modified in the local folder structure.
The log will display the following message:
➜ kbc push
Plan for "push" operation:
  × main/transformation/keboola.python-transformation-v2/dev-l0-sample-data - IGNORED
Skipped remote objects deletion, use "--force" to delete them.
Push done.
The log clearly identifies configurations that were ignored, even if they are absent in the local folder structure.
kbc pull operation
The kbc pull command will exclude the matched configurations and not pull them from the project.
Warning:
If the matched configuration is already present locally, it will be deleted from both the filesystem and manifest.json.
If the configuration was already present locally, the log will indicate its deletion as shown below:
➜ kbc pull
Plan for "pull" operation:
  × C main/writer/keboola.wr-db-snowflake/my-snowflake-data-destination
  × R main/writer/keboola.wr-db-snowflake/my-snowflake-data-destination/rows/test-sheet1
Pull done.