Project Directory Structure

The initial configuration of your local directory can be done using the init command. This command initializes the directory and pulls configurations from the project.

The Storage API token for your project is stored in the .env.local file under the KBC_STORAGE_API_TOKEN directive. Currently, you must use a master token. To maintain security, .env.local is automatically included in the .gitignore file to prevent it from being committed to your Git repository.
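
A script that runs next to the CLI may need the same token. The following is a minimal Python sketch, assuming .env.local uses plain KEY=VALUE lines. The parse_env helper and the sample token value are illustrative and not part of the CLI:

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding double quotes from the value.
        values[key.strip()] = value.strip().strip('"')
    return values


# Hypothetical file content; the token value is a placeholder.
sample = 'KBC_STORAGE_API_TOKEN="1234-placeholder-token"\n'
env = parse_env(sample)
print("token present:", "KBC_STORAGE_API_TOKEN" in env)
```

In practice you would pass the contents of .env.local to parse_env and avoid printing the token itself.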

The naming section of the manifest defines directory names. Typically, this setting does not need to be changed. Each object (branch, configuration, row) is guaranteed to have a unique directory, even if objects share the same name.

Below is an example of a default project directory structure. Some files and directories are specific to the component type. For example, transformations are represented by native files. A more detailed description can be found in the chapters below.


🟫 .gitignore                   - excludes ".env.local" from the Git repository
🟫 .env.local                   - contains the Storage API token
🟫 .env.dist                    - template for ".env.local"
📂 .keboola                     - project metadata directory
┣ 🟦 manifest.json              - contains object IDs, paths, naming and other configuration details
┣ 🟦 project.json               - project cache for local commands, including backends and features
┗ 🟫 .kbcignore                 - optional file listing paths to configurations to exclude from CLI sync
🟩 description.md               - project description
📂 [branch-name]                - branch directory (e.g., "main")
┣ 🟦 meta.json
┣ 🟩 description.md
┣ 📂 _shared                    - shared code directory
┃ ┗ 📂 [target-component]       - target component (e.g., "keboola.python-transformation")
┃   ┗ 📂 codes
┃     ┗ 📂 [code-name]          - shared code directory
┃       ┣ 🟫 code.[ext]         - native file (e.g., ".sql" or ".py")
┃       ┣ 🟦 config.json
┃       ┣ 🟦 meta.json
┃       ┗ 🟩 description.md
┗ 📂 [component-type]           - e.g., extractor, app, ...
  ┗ 📂 [component-id]           - e.g., keboola.ex-db-oracle
    ┗ 📂 [config-name]          - configuration directory (e.g., "raw-data")
      ┣ 🟦 config.json
      ┣ 🟦 meta.json
      ┣ 🟩 description.md
      ┣ 📂 rows                 - only if the configuration has some rows
      ┃ ┗ 📂 [row-name]         - configuration row directory (e.g., "prod-fact-table")
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┣ 📂 blocks               - only if the configuration is a transformation
      ┃ ┗ 📂 001-block-1        - block directory
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 📂 001-code-1       - code directory
      ┃     ┣ 🟫 code.[ext]     - native file (e.g., ".sql" or ".py")
      ┃     ┗ 🟦 meta.json
      ┣ 📂 phases               - only if the configuration is an orchestration
      ┃ ┗ 📂 001-phase          - phase directory
      ┃   ┣ 🟦 phase.json
      ┃   ┗ 📂 001-task         - task directory
      ┃     ┗ 🟦 task.json
      ┣ 📂 schedules            - only if the configuration has some schedules
      ┃ ┗ 📂 [schedule-name]    - schedule directory
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┗ 📂 variables            - only if the configuration has some variables defined
        ┣ 🟦 config.json        - variable definition, name, and type
        ┣ 🟦 meta.json
        ┣ 🟩 description.md
        ┗ 📂 values             - multiple sets of values can be defined
          ┗ 📂 default          - default values directory
            ┣ 🟦 config.json    - default values
            ┣ 🟦 meta.json
            ┗ 🟩 description.md

Branches

The tool works with development branches by default. You can specify which branches from the project you want to work with locally during the init command. Alternatively, you can ignore the development branches concept and work exclusively with the main branch. However, note that all configurations will then be stored in the main directory.

The main branch directory is simply named main and does not include the branch ID. This makes it easily distinguishable from the other branches.

Each branch directory contains:

  • description.md: Use this file to write a branch description formatted in Markdown.
  • meta.json: Contains the name of the branch and a flag indicating whether it is the default branch.

Example of meta.json:

{
  "name": "Main",
  "isDefault": true
}

Within the branch directory, configurations are organized into thematic directories: extractor, other, transformation, and writer.

Example of a branch folder with component configurations:

Screenshot -- A configuration directory example

Configurations

Each configuration directory contains the following files:

  • config.json: Includes parameters specific to the component.
  • description.md: A description file formatted in Markdown.
  • meta.json: Contains the name of the configuration.

Example of config.json for the Generic extractor:

{
  "parameters": {
    "api": {
      "baseUrl": "https://wikipedia.org"
    } 
  }
}

Example of meta.json:

{
  "name": "Wikipedia"
}

Configuration directories can be copied freely within the project and between projects. Their IDs are stored in the manifest. After copying, run the persist command to generate a new ID for the configuration and update it in the manifest.

Configuration Rows

The directory structure for configuration rows is identical to that of configurations. The component configuration includes a rows directory, which contains a subdirectory for each row. Each row directory includes config.json, description.md, and meta.json.

Example of meta.json:

{
  "name": "share/cities2",
  "isDisabled": false
}

Example of a Google Drive extractor configuration:

Screenshot -- A configuration rows directory example

Transformations

In addition to standard configurations, transformation directories include a blocks directory containing a list of codes. Codes are stored as native files corresponding to the transformation type. For example, Snowflake transformations store codes in .sql files.

Example of a Snowflake transformation configuration:

Screenshot -- A transformation directory example

Variables

The variables directory, in addition to the standard configuration layout, contains a values subdirectory.

For example, suppose you have the following two variables in your transformation:

Screenshot -- Variables in the UI

When you pull them to the local directory, the structure will look like this:

Screenshot -- Configuration directory with the variables

Variables configuration in variables/config.json:

{
  "variables": [
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}

Default values configuration in variables/values/default/config.json:

{
  "values": [
    {
      "name": "state",
      "value": "NY"
    },
    {
      "name": "city",
      "value": "Boston"
    }
  ]
}
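
As a quick consistency check between the two files above, the following Python sketch compares the variable definitions with the default values. The missing_defaults helper is hypothetical and not part of the CLI; the JSON is copied from the examples above:

```python
import json

# Contents of variables/config.json from the example above.
definitions = json.loads("""
{"variables": [{"name": "state", "type": "string"},
               {"name": "city", "type": "string"}]}
""")

# Contents of variables/values/default/config.json from the example above.
defaults = json.loads("""
{"values": [{"name": "state", "value": "NY"},
            {"name": "city", "value": "Boston"}]}
""")


def missing_defaults(definitions: dict, values: dict) -> list:
    """Return names of defined variables that have no value in the given set."""
    defined = {v["name"] for v in definitions["variables"]}
    provided = {v["name"] for v in values["values"]}
    return sorted(defined - provided)


print(missing_defaults(definitions, defaults))  # prints []
```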

Shared Code

Shared code blocks are stored in the branch directory under the _shared subdirectory, enabling reuse across different configurations.

If you create shared code from a block:

Screenshot -- Shared code directory

It will move to the _shared directory:

Screenshot -- Shared code directory

The code in the transformation file blocks/block-1/join/code.sql will then be replaced with:

Screenshot -- Shared code code

Schedules

The Orchestrator or any other component can have a schedule to run automatically and periodically. The schedule configuration is stored within a specific directory.

Screenshot -- Scheduler directory

The config.json file for the schedule contains the schedule in crontab format, the timezone, and a flag indicating whether the schedule is enabled.

For example, the following configuration runs at the 40th minute of every hour:

{
  "schedule": {
    "cronTab": "40 */1 * * *",
    "timezone": "UTC",
    "state": "enabled"
  },
  "target": {
    "mode": "run"
  }
}
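
To see why "40 */1 * * *" fires at minute 40 of every hour, here is a deliberately small Python sketch of crontab field matching. It is not the scheduler's implementation: it supports only "*", "*/n", and plain numbers in the minute and hour fields, and the function names are illustrative:

```python
def field_matches(expr: str, value: int) -> bool:
    """Match one cron field; supports '*', '*/n', and a plain number only."""
    if expr == "*":
        return True
    if expr.startswith("*/"):
        return value % int(expr[2:]) == 0
    return int(expr) == value


def runs_at(cron_tab: str, minute: int, hour: int) -> bool:
    """Check the minute and hour fields; the other three fields must be '*'."""
    fields = cron_tab.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError("this sketch handles only schedules of the form 'M H * * *'")
    return field_matches(fields[0], minute) and field_matches(fields[1], hour)


print(runs_at("40 */1 * * *", minute=40, hour=13))  # True
print(runs_at("40 */1 * * *", minute=41, hour=13))  # False
```

The hour field "*/1" matches every hour, so it behaves the same as "*" here.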

Orchestrations

Orchestrator directories include the phases directory. It contains one directory per phase, and each phase directory contains the tasks to execute.

Example:

Screenshot -- An orchestration directory

Example phase.json:

{
  "name": "Transformation",
  "dependsOn": [
    "001-extraction"
  ]
}

Example task.json:

{
  "name": "keboola.snowflake-transformation-7241628",
  "task": {
    "mode": "run",
    "configPath": "transformation/keboola.snowflake-transformation/address-completion"
  },
  "continueOnFailure": false,
  "enabled": true
}
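
The dependsOn field in phase.json implies an execution order between phases. The following Python sketch derives one such order with the standard library's graphlib module (Python 3.9+). The phase directory names and the contents of the first phase are assumptions made for illustration; the orchestrator's own scheduling logic is not shown here:

```python
from graphlib import TopologicalSorter

# Hypothetical phases keyed by directory name; values mimic phase.json.
phases = {
    "001-extraction": {"name": "Extraction", "dependsOn": []},
    "002-transformation": {
        "name": "Transformation",
        "dependsOn": ["001-extraction"],
    },
}


def execution_order(phases: dict) -> list:
    """Return phase directories so that every phase follows its dependencies."""
    graph = {path: set(p.get("dependsOn", [])) for path, p in phases.items()}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(phases))  # ['001-extraction', '002-transformation']
```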

Using kbcdir.jsonnet for different orchestration phases:

The kbcdir.jsonnet file can be used to specify which directories in the phases folder should be ignored for different project backends. By setting the isIgnored value to true in the file, you can exclude specific directories.

Example kbcdir.jsonnet:

{
  "isIgnored": false
}

Manifest

The local state of the project is stored in the .keboola/manifest.json file. It is not recommended to modify this file manually.

Basic Manifest Structure

  • version: Current major version (e.g., 2)
  • project: Information about the project
    • id: ID of the project
    • apiHost: URL of the Keboola instance (e.g., connection.keboola.com)
  • allowTargetEnv: Boolean (default: false)
    • If true, allows environment variables KBC_PROJECT_ID and KBC_BRANCH_ID to temporarily override the target project and branch without modifying the manifest.
    • The mapping is bidirectional and occurs during the manifest's save and load operations.
    • For more information, see the --allow-target-env option in the kbc sync init command.
  • sortBy: Property name used for sorting configurations (default: id)
  • naming: Rules for directory naming (see details)
  • allowedBranches: Array of branches to work with
  • ignoredComponents: Array of components to exclude
  • templates:
    • repositories (array):
      • Local repository:
        • type = dir
        • name: Repository name
        • url: Absolute or relative path to a local directory
          • Relative path must be relative to the project directory.
      • Git-based repository:
        • type = git
        • name: Repository name
        • url: URL of the Git repository
          • E.g., https://github.com/keboola/keboola-as-code-templates.git
        • ref: Git branch or tag (e.g., main or v1.2.3)
  • branches: List of used branches
    • id: Branch ID
    • path: Directory name (e.g., main)
  • configurations: List of component configurations
    • branchId: Branch ID
    • componentId: Component ID (e.g., keboola.ex-aws-s3)
    • id: Configuration ID
    • path: Path to the configuration in the local directory (e.g., extractor/keboola.ex-aws-s3/7241111/my-aws-s3-data-source)
    • rows: List of configuration rows (if the component supports rows)
      • id: Row ID
      • path: Path to the row from the configuration directory (e.g., rows/cities)
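
To make the relationship between these fields concrete, here is a Python sketch that builds full local paths of configuration rows from a manifest-like dictionary. All IDs are made up, only a subset of the fields is shown, and the sketch assumes that a configuration path is relative to its branch directory:

```python
# Hypothetical, trimmed-down manifest content; IDs are illustrative only.
manifest = {
    "version": 2,
    "project": {"id": 12345, "apiHost": "connection.keboola.com"},
    "branches": [{"id": 1, "path": "main"}],
    "configurations": [
        {
            "branchId": 1,
            "componentId": "keboola.ex-aws-s3",
            "id": "7241111",
            "path": "extractor/keboola.ex-aws-s3/7241111/my-aws-s3-data-source",
            "rows": [{"id": "7241222", "path": "rows/cities"}],
        }
    ],
}


def row_paths(manifest: dict) -> list:
    """Join branch, configuration, and row paths into full local paths."""
    branch_dirs = {b["id"]: b["path"] for b in manifest["branches"]}
    paths = []
    for config in manifest["configurations"]:
        # Assumption: the configuration path is relative to the branch directory.
        base = branch_dirs[config["branchId"]] + "/" + config["path"]
        for row in config.get("rows", []):
            paths.append(base + "/" + row["path"])
    return paths


print(row_paths(manifest))
```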

Naming

Directory names for configurations follow the rules in the manifest under the naming section.
These are the default values:

{
  "branch": "{branch_name}",
  "config": "{component_type}/{component_id}/{config_name}",
  "configRow": "rows/{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}

To include object IDs in directory names, use these values:

{
  "branch": "{branch_id}-{branch_name}",
  "config": "{component_type}/{component_id}/{config_id}-{config_name}",
  "configRow": "rows/{config_row_id}-{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}

Use the fix-paths command to rebuild the directory structure with updated naming rules.
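
The placeholders in the naming rules behave like simple string templates. The Python sketch below fills in the config rule from both variants with str.format. It only illustrates the substitution: the CLI may also normalize names when it builds paths, which is not attempted here, and the sample values are made up:

```python
# The "config" rule from the default naming and from the variant with IDs.
DEFAULT_NAMING = {"config": "{component_type}/{component_id}/{config_name}"}
NAMING_WITH_IDS = {
    "config": "{component_type}/{component_id}/{config_id}-{config_name}"
}


def config_path(naming: dict, **values: str) -> str:
    """Fill in the placeholders of the 'config' naming rule."""
    # str.format ignores keyword arguments the template does not use.
    return naming["config"].format(**values)


# Hypothetical configuration values.
values = {
    "component_type": "extractor",
    "component_id": "keboola.ex-aws-s3",
    "config_id": "7241111",
    "config_name": "my-aws-s3-data-source",
}

print(config_path(DEFAULT_NAMING, **values))
# extractor/keboola.ex-aws-s3/my-aws-s3-data-source
print(config_path(NAMING_WITH_IDS, **values))
# extractor/keboola.ex-aws-s3/7241111-my-aws-s3-data-source
```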

Project Cache

The project cache is stored in .keboola/project.json and is used by local commands without making authorized requests to the Storage API.

This is its basic structure:

  • backends: List of project backends
  • features: List of project features
  • defaultBranchId: ID of the default branch

Example:

{
  "backends": [
    "snowflake"
  ],
  "features": [
    "workspace-snowflake-dynamic-backend-size",
    "input-mapping-read-only-storage",
    "syrup-jobs-limit-10",
    "oauth-v3"
  ],
  "defaultBranchId": 123
}

.kbcignore

You can exclude specific configurations from the sync process by creating a .kbcignore file in the .keboola directory.

It is a plain text file where each line specifies a path to a configuration or configuration row in the format {component_id}/{configuration_id}/{row_id}. The row_id is optional for row-based configurations.

Example .kbcignore file:

keboola.python-transformation-v2/1197618481
keboola.wr-db-snowflake/1196309603/1196309605

This excludes:

  • The configuration of the Python transformation (keboola.python-transformation-v2) with the ID 1197618481.
  • Row ID 1196309605 in the configuration of the Snowflake writer (keboola.wr-db-snowflake) with the ID 1196309603.

As a result, the kbc sync pull and kbc sync push commands will not synchronize these configurations.
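
The line format described above can be illustrated with a short Python sketch that splits each line into a component ID, a configuration ID, and an optional row ID. The IgnoreEntry type and parse_kbcignore function are hypothetical names, and the CLI's own parser may treat edge cases differently:

```python
from typing import NamedTuple, Optional


class IgnoreEntry(NamedTuple):
    component_id: str
    config_id: str
    row_id: Optional[str]  # None when the line targets a whole configuration


def parse_kbcignore(text: str) -> list:
    """Parse {component_id}/{configuration_id}[/{row_id}] lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("/")
        if len(parts) not in (2, 3):
            raise ValueError(f"unexpected pattern: {line!r}")
        row_id = parts[2] if len(parts) == 3 else None
        entries.append(IgnoreEntry(parts[0], parts[1], row_id))
    return entries


sample = """keboola.python-transformation-v2/1197618481
keboola.wr-db-snowflake/1196309603/1196309605
"""
for entry in parse_kbcignore(sample):
    print(entry)
```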

kbc push operation

The kbc push command will skip the excluded configurations and will not push them back to the project, even if they exist or have been modified in the local folder structure. The log will display the following message:

➜ kbc push
Plan for "push" operation:
  × main/transformation/keboola.python-transformation-v2/dev-l0-sample-data - IGNORED
Skipped remote objects deletion, use "--force" to delete them.
Push done.

The log clearly identifies configurations that were ignored, even if they are absent in the local folder structure.

kbc pull operation

The kbc pull command will exclude the matched configurations and not pull them from the project.

Warning:
If the matched configuration is already present locally, it will be deleted from both the filesystem and manifest.json.

If the configuration was already present locally, the log will indicate its deletion as shown below:

➜ kbc pull
Plan for "pull" operation:
  × C main/writer/keboola.wr-db-snowflake/my-snowflake-data-destination
  × R main/writer/keboola.wr-db-snowflake/my-snowflake-data-destination/rows/test-sheet1
Pull done.

Next Steps