Use the init command for the initial configuration of your local directory. It initializes the directory and pulls configurations from the project.
The Storage API token for your project is stored in the .env.local file under the KBC_STORAGE_API_TOKEN variable.
Currently, you must use a master token.
Your token must be kept secret, so the .env.local file is listed in the .gitignore file.
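A minimal Python sketch of how a local script could read the token from .env.local, assuming the file uses simple dotenv-style KEY=value lines (values optionally quoted). The helper names read_env_file and storage_api_token are hypothetical and not part of the tool.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict:
    """Parse dotenv-style KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def storage_api_token(project_dir: Path) -> str:
    """Return KBC_STORAGE_API_TOKEN from <project_dir>/.env.local."""
    token = read_env_file(project_dir / ".env.local").get("KBC_STORAGE_API_TOKEN", "")
    if not token:
        raise KeyError("KBC_STORAGE_API_TOKEN is missing from .env.local")
    return token
```

Because .env.local is ignored by git, a script like this only works in a checkout where init has already created the file.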
Directory names are defined by the naming rules in the manifest (see Manifest - Naming). Usually, you do not need to change them. Each object (branch, configuration, row) is guaranteed to have its own unique directory, even if several objects have the same name.
The following is an example of a default project directory structure. Some files and directories are specific to the component type. For example, transformations are represented by native files. A more detailed description can be found in the chapters below.
🟫 .gitignore - excludes ".env.local" from the git repository
🟫 .env.local - contains the Storage API token
🟫 .env.dist - template for .env.local
📂 .keboola - project metadata directory
┣ 🟦 manifest.json - object IDs, paths, naming and other configuration
┗ 🟦 project.json - project cache for local commands, contains backends, features, etc.
🟩 description.md - project description
📂 [branch-name] - branch directory, e.g., main
┣ 🟦 meta.json
┣ 🟩 description.md
┣ 📂 _shared - shared codes directory
┃ ┗ 📂 [target-component] - target, e.g., keboola.python-transformation
┃   ┗ 📂 codes
┃     ┗ 📂 [code-name] - shared code directory
┃       ┣ 🟫 code.[ext] - native file, e.g., ".sql" or ".py"
┃       ┣ 🟦 config.json
┃       ┣ 🟦 meta.json
┃       ┗ 🟩 description.md
┗ 📂 [component-type] - e.g., extractor, app, ...
  ┗ 📂 [component-id] - e.g., keboola.ex-db-oracle
    ┗ 📂 [config-name] - configuration directory, e.g., raw-data
      ┣ 🟦 config.json
      ┣ 🟦 meta.json
      ┣ 🟩 description.md
      ┣ 📂 rows - only if the configuration has some rows
      ┃ ┗ 📂 [row-name] - configuration row directory, e.g., prod-fact-table
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┣ 📂 blocks - only if the configuration is a transformation
      ┃ ┗ 📂 001-block-1 - block directory
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 📂 001-code-1 - code directory
      ┃     ┣ 🟫 code.[ext] - native file, e.g., ".sql" or ".py"
      ┃     ┗ 🟦 meta.json
      ┣ 📂 phases - only if the configuration is an orchestration
      ┃ ┗ 📂 001-phase - phase directory
      ┃   ┣ 🟦 phase.json
      ┃   ┗ 📂 001-task - task directory
      ┃     ┗ 🟦 task.json
      ┣ 📂 schedules - only if the configuration has some schedules
      ┃ ┗ 📂 [schedule-name] - schedule directory
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┗ 📂 variables - only if the configuration has defined some variables
        ┣ 🟦 config.json - variables definition, name and type
        ┣ 🟦 meta.json
        ┣ 🟩 description.md
        ┗ 📂 values - multiple sets of values can be defined
          ┗ 📂 default - default values directory
            ┣ 🟦 config.json - default values
            ┣ 🟦 meta.json
            ┗ 🟩 description.md
The tool works with dev branches by default. In the init command, you can choose which branches of the project you want to work with locally. You can also ignore the dev branches concept and work with the main branch only; in that case, all its configurations are stored in the main directory.
The directory of the main branch is simply called main and does not contain the branch ID, so it is easy to distinguish from the other branches.
The branch directory contains description.md, where you can write a description formatted in Markdown, and meta.json, which contains the name of the branch and a flag indicating whether it is the default branch.
Example of meta.json:
{
  "name": "Main",
  "isDefault": true
}
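As an illustration, the following Python sketch finds the default branch directory by reading each branch's meta.json. It assumes branch directories sit at the project root next to the hidden .keboola directory; the function name find_default_branch_dir is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def find_default_branch_dir(project_dir: Path) -> Optional[Path]:
    """Return the branch directory whose meta.json has "isDefault": true."""
    for child in sorted(project_dir.iterdir()):
        meta_path = child / "meta.json"
        # Skip hidden entries such as .keboola and anything without meta.json.
        if child.name.startswith(".") or not meta_path.is_file():
            continue
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        if meta.get("isDefault") is True:
            return child
    return None
```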
The branch directory also contains directories that group components by type: extractor, other, transformation, and writer.
Example of a branch directory with component configurations:
The directory of each configuration contains config.json with the component-specific parameters, description.md, where you can write a description formatted in Markdown, and meta.json, which contains the name of the configuration.
Example of config.json for Generic Extractor:
{
  "parameters": {
    "api": {
      "baseUrl": "https://wikipedia.org"
    }
  }
}
Example of meta.json:
{
  "name": "Wikipedia"
}
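A minimal sketch that loads one configuration directory (config.json, meta.json and the optional description.md) into a Python object. The LocalConfig class and load_config_dir function are hypothetical helpers for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LocalConfig:
    name: str         # "name" from meta.json
    content: dict     # full content of config.json
    description: str  # Markdown text of description.md ("" if the file is missing)


def load_config_dir(config_dir: Path) -> LocalConfig:
    meta = json.loads((config_dir / "meta.json").read_text(encoding="utf-8"))
    content = json.loads((config_dir / "config.json").read_text(encoding="utf-8"))
    description_path = config_dir / "description.md"
    description = (
        description_path.read_text(encoding="utf-8") if description_path.is_file() else ""
    )
    return LocalConfig(name=meta["name"], content=content, description=description)
```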
Configuration directories can be freely copied within a project and between projects. Because configuration IDs are stored in the manifest, run the persist command after copying; it generates a new ID for the copied configuration and saves it in the manifest.
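To illustrate why persist is needed, the sketch below lists configuration directories that exist on disk but are not yet recorded in the manifest, such as a freshly copied directory. It is not how the persist command is implemented. It assumes the default naming rule (configurations three levels below the branch directory) and that each configuration path in the manifest is relative to its branch directory; both function names are hypothetical.

```python
import json
from pathlib import Path


def tracked_config_dirs(project_dir: Path) -> set:
    """Configuration directories recorded in .keboola/manifest.json."""
    manifest_path = project_dir / ".keboola" / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Assumption: configuration paths are relative to their branch directory.
    branch_paths = {branch["id"]: branch["path"] for branch in manifest.get("branches", [])}
    return {
        project_dir / branch_paths[config["branchId"]] / config["path"]
        for config in manifest.get("configurations", [])
    }


def untracked_config_dirs(project_dir: Path, branch_dir_name: str) -> list:
    """Configuration directories on disk that the manifest does not know yet."""
    tracked = tracked_config_dirs(project_dir)
    branch_dir = project_dir / branch_dir_name
    # Assumption: default naming, i.e. <type>/<component>/<config>/config.json.
    on_disk = {p.parent for p in branch_dir.glob("*/*/*/config.json")}
    return sorted(on_disk - tracked)
```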
Configuration rows have the same directory structure as the configuration itself. The component configuration contains a rows directory with one directory for each row. Each row directory contains config.json, description.md, and meta.json.
Example of meta.json:
{
  "name": "share/cities2",
  "isDisabled": false
}
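For example, a script could collect the names of the rows that are not disabled. This Python sketch assumes every row directory under rows contains a meta.json like the one above; enabled_rows is a hypothetical name.

```python
import json
from pathlib import Path


def enabled_rows(config_dir: Path) -> list:
    """Names of the rows under <config_dir>/rows whose meta.json is not disabled."""
    rows_dir = config_dir / "rows"
    if not rows_dir.is_dir():
        return []  # the configuration has no rows
    names = []
    for meta_path in sorted(rows_dir.glob("*/meta.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        if not meta.get("isDisabled", False):
            names.append(meta["name"])
    return names
```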
Example of a Google Drive extractor configuration:
In addition to the standard configuration files, transformation directories contain a blocks directory with the code blocks and their codes.
Codes are stored in native files according to the type of transformation. For example, Snowflake transformations store their codes in .sql files.
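A short sketch that gathers the code files of a transformation in execution order. It assumes that the zero-padded numeric prefixes (001-, 002-, ...) seen in the directory structure determine the order, so sorting directory names lexicographically is enough; ordered_codes is a hypothetical helper.

```python
from pathlib import Path


def ordered_codes(config_dir: Path) -> list:
    """Return (relative path, code text) pairs for every code file, in order.

    Assumption: each code directory holds one native file named code.<ext>.
    """
    blocks_dir = config_dir / "blocks"
    if not blocks_dir.is_dir():
        return []  # not a transformation, or no blocks yet
    result = []
    for block_dir in sorted(p for p in blocks_dir.iterdir() if p.is_dir()):
        for code_dir in sorted(p for p in block_dir.iterdir() if p.is_dir()):
            for code_file in sorted(code_dir.glob("code.*")):
                rel = code_file.relative_to(config_dir).as_posix()
                result.append((rel, code_file.read_text(encoding="utf-8")))
    return result
```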
Example of a Snowflake transformation configuration:
In addition to the standard configuration layout, the variables directory contains a values directory.
Let's say you have these two variables in your transformation:
When you pull them to the local directory, it will look like this:
Variables configuration in variables/config.json:
{
  "variables": [
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}
Default values configuration in variables/values/default/config.json:
{
  "values": [
    {
      "name": "state",
      "value": "NY"
    },
    {
      "name": "city",
      "value": "Boston"
    }
  ]
}
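The two files above can be combined into a simple name-to-value map. This Python sketch, with the hypothetical function resolve_variables, also checks that every value refers to a defined variable and that every variable has a value; the tool's own validation may differ.

```python
def resolve_variables(definition: dict, values: dict) -> dict:
    """Combine variables/config.json with one set of values into {name: value}.

    Raises ValueError if a value refers to an undefined variable or a defined
    variable has no value.
    """
    declared = {variable["name"] for variable in definition["variables"]}
    resolved = {item["name"]: item["value"] for item in values["values"]}
    resolved_names = set(resolved)
    unknown = resolved_names - declared
    if unknown:
        raise ValueError(f"values for undefined variables: {sorted(unknown)}")
    missing = declared - resolved_names
    if missing:
        raise ValueError(f"variables without a value: {sorted(missing)}")
    return resolved
```

With the example files above, the result is {"state": "NY", "city": "Boston"}.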
Shared code blocks are stored in the branch directory under the _shared subdirectory so that they can be reused across different configurations.
If you create shared code from your block, it moves to the _shared directory, and the code in the transformation file blocks/block-1/join/code.sql is replaced with a reference to the shared code.
An Orchestrator or any other component can have a schedule that runs it automatically and periodically. The schedule resides in the configuration directory.
The schedule's config.json contains the schedule in crontab format, the timezone, and a state indicating whether the schedule is enabled.
This example shows a schedule that runs at minute 40 of every hour:
{
  "schedule": {
    "cronTab": "40 */1 * * *",
    "timezone": "UTC",
    "state": "enabled"
  },
  "target": {
    "mode": "run"
  }
}
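To see what "40 */1 * * *" means in practice, the sketch below expands the minute and hour fields of a crontab expression into the times of day at which the schedule fires. It is a simplified parser written for illustration, not the scheduler's implementation: it ignores the day, month and weekday fields as well as the timezone, and it reads "n/step" as "n-max/step", as many cron implementations do.

```python
def expand_field(field: str, low: int, high: int) -> list:
    """Expand one cron field into the sorted values it matches.

    Supports "*", "n", "a-b", comma lists, and an optional "/step" suffix.
    """
    values = set()
    for part in field.split(","):
        has_step = "/" in part
        step = 1
        if has_step:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(part)
            # "n/step" is read as "n-high/step".
            end = high if has_step else start
        if step < 1 or not (low <= start <= end <= high):
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(cron_tab: str) -> list:
    """HH:MM times at which the schedule fires on any day it runs.

    Only the minute and hour fields are evaluated in this sketch.
    """
    fields = cron_tab.split()
    if len(fields) != 5:
        raise ValueError("expected five crontab fields")
    minutes = expand_field(fields[0], 0, 59)
    hours = expand_field(fields[1], 0, 23)
    return [f"{hour:02d}:{minute:02d}" for hour in hours for minute in minutes]
```

For the example above, daily_run_times returns 24 entries, from "00:40" to "23:40".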
Orchestrator configuration directories contain a phases directory, which holds the phases and their tasks.
Example:
Example of phase.json:
{
  "name": "Transformation",
  "dependsOn": [
    "001-extraction"
  ]
}
Example of task.json:
{
  "name": "keboola.snowflake-transformation-7241628",
  "task": {
    "mode": "run",
    "configPath": "transformation/keboola.snowflake-transformation/address-completion"
  },
  "continueOnFailure": false,
  "enabled": true
}
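The dependsOn lists form a dependency graph between phases. As a sketch, Python's standard graphlib module can derive an order in which every phase follows the phases it depends on. The mapping from phase keys to parsed phase.json content, and the assumption that dependsOn entries refer to those keys (phase directory names such as "001-extraction"), are illustrative; phase_order is a hypothetical name.

```python
from graphlib import TopologicalSorter


def phase_order(phases: dict) -> list:
    """Return phase keys so that each phase comes after those it depends on.

    `phases` maps a phase key (e.g. "002-transformation") to the parsed
    content of its phase.json. Raises graphlib.CycleError (a ValueError
    subclass) if the dependencies form a cycle.
    """
    graph = {key: set(phase.get("dependsOn", [])) for key, phase in phases.items()}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"dependsOn refers to unknown phases: {sorted(unknown)}")
    return list(TopologicalSorter(graph).static_order())
```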
The local state of the project is stored in the manifest file .keboola/manifest.json. Modifying this file manually is not recommended.
This is its basic structure:
- version - current major version, now 2
- project - information about the project
  - id - ID of the project
  - apiHost - URL of the Keboola instance (e.g., connection.keboola.com)
- allowTargetEnv - boolean, default false; if true, the environment variables KBC_PROJECT_ID and KBC_BRANCH_ID can be used to temporarily override the target project and branch
- sortBy - name of the configuration property used for sorting (default id)
- naming - rules for directory names (see the details below)
- allowedBranches - array of branches to work with
- ignoredComponents - array of components to ignore
- templates
  - repositories (array):
    - type = dir
      - name - repository name
      - url - absolute or relative path to a local directory
    - type = git
      - name - repository name
      - url - URL of the git repository, e.g., https://github.com/keboola/keboola-as-code-templates.git
      - ref - git branch or tag, e.g., main or v1.2.3
- branches - array of used branches
  - id - ID of the branch
  - path - name of the directory containing the branch configuration (e.g., main)
- configurations - array of component configurations
  - branchId - ID of the branch the configuration belongs to
  - componentId - ID of the component (e.g., keboola.ex-aws-s3)
  - id - ID of the configuration
  - path - path to the configuration in the local directory (e.g., extractor/keboola.ex-aws-s3/7241111/my-aws-s3-data-source)
  - rows - array of configuration rows (if the component supports rows)
    - id - ID of the row
    - path - path to the row from the configuration directory (e.g., rows/cities)
Directory names for the different object types follow the rules defined in the naming section of the manifest. These are the default values:
{
  "branch": "{branch_name}",
  "config": "{component_type}/{component_id}/{config_name}",
  "configRow": "rows/{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}
If you want to include object IDs in directory names, use these values:
{
  "branch": "{branch_id}-{branch_name}",
  "config": "{component_type}/{component_id}/{config_id}-{config_name}",
  "configRow": "rows/{config_row_id}-{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}
You can change these rules as needed and then rebuild the project directory with the fix-paths command.
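A minimal sketch of how a naming rule could be expanded into a directory path. The normalization of *_name values (lowercasing and replacing runs of other characters with "-") is an assumption inferred from example directory names such as raw-data, not the tool's documented behavior; normalize and render_path are hypothetical helpers.

```python
import re

PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase, runs of other characters become "-"."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def render_path(template: str, values: dict) -> str:
    """Fill the {placeholders} of a naming rule; *_name values are normalized."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        value = str(values[key])  # KeyError if a required value is missing
        return normalize(value) if key.endswith("_name") else value

    return PLACEHOLDER.sub(substitute, template)
```

For example, rendering "{component_type}/{component_id}/{config_id}-{config_name}" with a configuration named "Raw Data" yields a path ending in "-raw-data".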
The project cache is stored in the project file .keboola/project.json. Local commands use it because they do not make authorized requests to the Storage API.
This is its basic structure:
- backends - list of the project backends
- features - list of the project features
- defaultBranchId - ID of the default branch
Example:
{
  "backends": [
    "snowflake"
  ],
  "features": [
    "workspace-snowflake-dynamic-backend-size",
    "input-mapping-read-only-storage",
    "syrup-jobs-limit-10",
    "oauth-v3"
  ],
  "defaultBranchId": 123
}
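Because project.json is plain JSON, a local script can read it without calling the Storage API. A minimal sketch with hypothetical helper names parse_project_cache and has_feature:

```python
import json


def parse_project_cache(text: str) -> dict:
    """Parse the content of .keboola/project.json into a small summary."""
    cache = json.loads(text)
    return {
        "backends": list(cache.get("backends", [])),
        "features": set(cache.get("features", [])),  # set for fast membership tests
        "defaultBranchId": cache.get("defaultBranchId"),
    }


def has_feature(cache: dict, feature: str) -> bool:
    """True if the project has the given feature enabled."""
    return feature in cache["features"]
```

In a project directory, the input would typically be the text of .keboola/project.json, e.g. read with pathlib's Path.read_text.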