Project Directory Structure

You can set up your local directory with the init command. It initializes the directory and pulls configurations from the project.

The Storage API token for your project is stored in the file .env.local under the KBC_STORAGE_API_TOKEN directive. Currently, a master token is required. The token must be kept secret, so the file .env.local is listed in the .gitignore file.
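As a rough illustration of how a script might read the token, the following Python sketch parses .env.local. It assumes the file uses plain KEY=VALUE lines (optionally quoted). The helper names parse_env_file and read_storage_token are hypothetical and not part of the CLI.

```python
from __future__ import annotations

from pathlib import Path


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: values may be wrapped in single or double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_storage_token(project_dir: str) -> str | None:
    """Return the token from .env.local, or None if the file or key is missing."""
    env_path = Path(project_dir) / ".env.local"
    if not env_path.is_file():
        return None
    content = env_path.read_text(encoding="utf-8")
    return parse_env_file(content).get("KBC_STORAGE_API_TOKEN")
```

A script using this helper should avoid printing or logging the returned value, since the token must stay secret.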

The Naming section of the manifest defines directory names. It is usually not necessary to change this setting. Each object (branch, config, row) is guaranteed to have its own unique directory, even if several objects share the same name.

The following is an example of a default project directory structure. Some files and directories are specific to the component type. For example, transformations are represented by native files. A more detailed description can be found in the chapters below.


🟫 .gitignore                   - excludes ".env.local" from git repository
🟫 .env.local                   - contains Storage API token
🟫 .env.dist                    - template for .env.local
📂 .keboola                     - project metadata directory
┣ 🟦 manifest.json              - object IDs, paths, naming and other configuration
┗ 🟦 project.json               - project cache for local commands which contains backends, features, etc.
🟩 description.md               - project description
📂 [branch-name]                - branch directory, e.g., main
┣ 🟦 meta.json
┣ 🟩 description.md
┣ 📂 _shared                    - shared codes directory
┃ ┗ 📂 [target-component]       - target, e.g., keboola.python-transformation
┃   ┗ 📂 codes
┃     ┗ 📂 [code-name]          - shared code directory
┃       ┣ 🟫 code.[ext]         - native file, e.g., ".sql" or ".py"
┃       ┣ 🟦 config.json
┃       ┣ 🟦 meta.json
┃       ┗ 🟩 description.md
┗ 📂 [component-type]           - e.g., extractor, app, ...
  ┗ 📂 [component-id]           - e.g., keboola.ex-db-oracle
    ┗ 📂 [config-name]          - configuration directory, e.g., raw-data
      ┣ 🟦 config.json
      ┣ 🟦 meta.json
      ┣ 🟩 description.md
      ┣ 📂 rows                 - only if the configuration has some rows
      ┃ ┗ 📂 [row-name]         - configuration row directory, e.g., prod-fact-table
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┣ 📂 blocks               - only if the configuration is a transformation
      ┃ ┗ 📂 001-block-1        - block directory
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 📂 001-code-1       - code directory
      ┃     ┣ 🟫 code.[ext]     - native file, e.g., ".sql" or ".py"
      ┃     ┗ 🟦 meta.json
      ┣ 📂 phases               - only if the configuration is an orchestration
      ┃ ┗ 📂 001-phase          - phase directory
      ┃   ┣ 🟦 phase.json
      ┃   ┗ 📂 001-task         - task directory
      ┃     ┗ 🟦 task.json
      ┣ 📂 schedules            - only if the configuration has some schedules
      ┃ ┗ 📂 [schedule-name]    - schedule directory
      ┃   ┣ 🟦 config.json
      ┃   ┣ 🟦 meta.json
      ┃   ┗ 🟩 description.md
      ┗ 📂 variables            - only if the configuration has defined some variables
        ┣ 🟦 config.json        - variables definition, name and type
        ┣ 🟦 meta.json
        ┣ 🟩 description.md
        ┗ 📂 values             - multiple sets of values can be defined
          ┗ 📂 default          - default values directory
            ┣ 🟦 config.json    - default values
            ┣ 🟦 meta.json
            ┗ 🟩 description.md

Branches

The tool works with dev branches by default. In the init command, you can choose which branches from the project you want to work with locally. You can also ignore the dev branches concept and work with the main branch only. In that case, all configurations are stored in the directory main.

The directory of the main branch is called simply main and does not contain the branch ID. This way, it is easily distinguishable from the other branches.

The branch directory contains description.md, where you can write a description formatted in Markdown, and meta.json, which contains the name of the branch and a flag indicating whether it is the default branch.

Example of meta.json:

{
  "name": "Main",
  "isDefault": true
}
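To show how the isDefault flag could be used, here is a minimal Python sketch that scans the top-level directories of a project for meta.json and returns the directory of the default branch. The function name find_default_branch is hypothetical. The sketch assumes each branch directory sits directly under the project root, as in the structure above.

```python
from __future__ import annotations

import json
from pathlib import Path


def find_default_branch(project_dir: str) -> str | None:
    """Return the directory name of the branch whose meta.json has isDefault set."""
    for meta_path in sorted(Path(project_dir).glob("*/meta.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        if meta.get("isDefault"):
            return meta_path.parent.name
    return None
```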

The branch directory also contains directories that group components by type: extractor, other, transformation, writer.

Example of a branch folder with components configurations:

Screenshot -- A configuration directory example

Configurations

The directory of each configuration contains config.json with parameters specific to the component, description.md, where you can write a description formatted in Markdown, and meta.json containing the name of the configuration.

Example of config.json for Generic Extractor:

{
  "api": {
      "baseUrl": "https://wikipedia.org"
  }
}

Example of meta.json:

{
  "name": "Wikipedia"
}

Configuration directories can be copied freely within the project and between projects. Their IDs are stored in the manifest, so after copying and pasting a directory, make sure to run the persist command, which generates a new ID for the configuration and saves it in the manifest.

Configuration Rows

The directory structure of a configuration row is the same as that of the configuration itself. The component configuration contains a directory rows that includes a directory for each row. Each row directory contains config.json, description.md and meta.json.

Example of meta.json:

{
  "name": "share/cities2",
  "isDisabled": false
}

Example of a Google Drive extractor configuration:

Screenshot -- A configuration rows directory example

Transformations

In addition to the standard configuration files, transformation directories contain a blocks directory, and each block in it contains a list of codes. Codes are stored in native files according to the type of transformation. For example, Snowflake transformations store the codes in .sql files.

Example of a Snowflake transformation configuration:

Screenshot -- A transformation directory example

Variables

In addition to the standard configuration layout, the variables directory contains the directory values.

Let’s say you have these two variables in your transformation:

Screenshot -- Variables in the UI

When you pull them to the local directory, it will look like this:

Screenshot -- Configuration directory with the variables

Variables configuration in variables/config.json:

{
  "variables": [
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}

Default values configuration in variables/values/default/config.json:

{
  "values": [
    {
      "name": "state",
      "value": "NY"
    },
    {
      "name": "city",
      "value": "Boston"
    }
  ]
}
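The relationship between the two files can be sketched in Python: the definitions list which variables exist, and the values file supplies a value for each. The function name resolve_variables is hypothetical, and the sketch assumes both files have exactly the shape shown above.

```python
def resolve_variables(definitions: dict, values: dict) -> dict:
    """Pair each defined variable with its value from a values config."""
    provided = {item["name"]: item["value"] for item in values.get("values", [])}
    resolved = {}
    for variable in definitions.get("variables", []):
        name = variable["name"]
        if name not in provided:
            # A defined variable without a value is treated as an error here.
            raise KeyError(f"no value for variable {name!r}")
        resolved[name] = provided[name]
    return resolved


definitions = {
    "variables": [
        {"name": "state", "type": "string"},
        {"name": "city", "type": "string"},
    ]
}
defaults = {
    "values": [
        {"name": "state", "value": "NY"},
        {"name": "city", "value": "Boston"},
    ]
}
resolved = resolve_variables(definitions, defaults)
```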

Shared Code

Shared code blocks are stored in the branch directory under the _shared subdirectory so that they can be reused between different configurations.

If you create shared code from your block:

Screenshot -- Shared code directory

It will move to the _shared directory:

Screenshot -- Shared code directory

And the code in the transformation file blocks/block-1/join/code.sql will be changed to:

Screenshot -- Shared code code

Schedules

An orchestrator or any other component can have a schedule so that it runs automatically and periodically. The schedule resides in the configuration directory.

Screenshot -- Scheduler directory

The schedule’s config.json contains the schedule in crontab format, the timezone, and a flag indicating whether the schedule is enabled.

This example shows a schedule to be run at minute 40 past every hour:

{
  "schedule": {
    "cronTab": "40 */1 * * *",
    "timezone": "UTC",
    "state": "enabled"
  },
  "target": {
    "mode": "run"
  }
}
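To make the crontab expression concrete, the following Python sketch expands the minute and hour fields of cronTab into explicit values. It is a simplified illustration that handles only "*", "*/n", single numbers, ranges and comma lists, and it ignores the day and month fields. The function names are hypothetical, and treating any state other than "enabled" as disabled is an assumption.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one crontab field into the sorted list of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_size = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step_size))
    return sorted(values)


def describe_schedule(config: dict) -> dict:
    """Summarize when a schedule fires, looking at minutes and hours only."""
    schedule = config["schedule"]
    minute, hour, *_ = schedule["cronTab"].split()
    return {
        "minutes": expand_field(minute, 0, 59),
        "hours": expand_field(hour, 0, 23),
        "timezone": schedule["timezone"],
        "enabled": schedule["state"] == "enabled",
    }


example = {
    "schedule": {"cronTab": "40 */1 * * *", "timezone": "UTC", "state": "enabled"},
    "target": {"mode": "run"},
}
summary = describe_schedule(example)
```

For the example above, the summary lists minute 40 and all hours from 0 to 23, which corresponds to a run at minute 40 past every hour.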

Orchestrations

The orchestration directories contain the phases directory. Each phase directory in it contains a list of tasks.

Example:

Screenshot -- An orchestration directory

A phase.json example:

{
  "name": "Transformation",
  "dependsOn": [
    "001-extraction"
  ]
}

A task.json example:

{
  "name": "keboola.snowflake-transformation-7241628",
  "task": {
    "mode": "run",
    "configPath": "transformation/keboola.snowflake-transformation/address-completion"
  },
  "continueOnFailure": false,
  "enabled": true
}
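The dependsOn field implies an ordering of phases. One way to derive that order is a topological sort, sketched below with Python's standard graphlib module. The sketch assumes that dependsOn entries refer to phase directory names, as the example "001-extraction" suggests. The function name phase_order and the directory name "002-transformation" are hypothetical.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def phase_order(phases: dict[str, dict]) -> list[str]:
    """Order phases so that each one comes after the phases it depends on.

    `phases` maps a phase directory name to the parsed content of its phase.json.
    A dependency cycle raises graphlib.CycleError.
    """
    graph = {name: set(meta.get("dependsOn", [])) for name, meta in phases.items()}
    return list(TopologicalSorter(graph).static_order())


phases = {
    "002-transformation": {"name": "Transformation", "dependsOn": ["001-extraction"]},
    "001-extraction": {"name": "Extraction", "dependsOn": []},
}
order = phase_order(phases)
```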

Manifest

The local state of the project is stored in the manifest file .keboola/manifest.json. It is not recommended to modify this file manually.

This is its basic structure:

  • version - current major version, now 2
  • project - information about the project
    • id - ID of the project
    • apiHost - URL of the Keboola instance (e.g., connection.keboola.com)
  • allowTargetEnv - boolean, default false
    • If true, environment variables KBC_PROJECT_ID and KBC_BRANCH_ID can be used to temporarily override the target project and branch.
    • The IDs in the manifest will remain unchanged.
    • Mapping is bidirectional; it is performed when the manifest is saved and loaded.
    • See also the --allow-target-env option of the kbc sync init command.
  • sortBy - name of the configuration property used for sorting (default id)
  • naming - rules for directory names, see the details
  • allowedBranches - array of branches to work with
  • ignoredComponents - array of components to not work with
  • templates
    • repositories (array):
      • local repository:
        • type = dir
        • name - repository name
        • url - absolute or relative path to a local directory
          • relative path must be relative to the project directory
      • git repository:
        • type = git
        • name - repository name
        • url - URL of the git repository
          • e.g. https://github.com/keboola/keboola-as-code-templates.git
        • ref - git branch or tag, e.g. main or v1.2.3
  • branches - array of used branches
    • id - ID of the branch
    • path - name of the directory containing the branch configuration (e.g., main)
  • configurations - array of component configurations
    • branchId - ID of the branch the configuration belongs to
    • componentId - ID of the component (e.g., keboola.ex-aws-s3)
    • id - ID of the configuration
    • path - path to the configuration in the local directory (e.g., extractor/keboola.ex-aws-s3/7241111/my-aws-s3-data-source)
    • rows - array of configuration rows (if the component supports rows)
      • id - ID of the row
      • path - path to the row from the configuration directory (e.g., rows/cities)
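The way the path fields nest can be sketched in Python. The sketch assumes that a configuration path is relative to its branch directory and that a row path is relative to its configuration directory, as the examples above suggest. The function name resolve_paths and the IDs in the sample manifest are made up for illustration.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def resolve_paths(manifest: dict) -> list[str]:
    """List project-relative paths of all configurations and rows in a manifest."""
    branch_dirs = {branch["id"]: branch["path"] for branch in manifest.get("branches", [])}
    paths = []
    for config in manifest.get("configurations", []):
        config_dir = PurePosixPath(branch_dirs[config["branchId"]]) / config["path"]
        paths.append(str(config_dir))
        for row in config.get("rows", []):
            paths.append(str(config_dir / row["path"]))
    return paths


sample_manifest = {
    "branches": [{"id": 123, "path": "main"}],  # hypothetical branch ID
    "configurations": [
        {
            "branchId": 123,
            "componentId": "keboola.ex-aws-s3",
            "id": "456",  # hypothetical configuration ID
            "path": "extractor/keboola.ex-aws-s3/my-aws-s3-data-source",
            "rows": [{"id": "789", "path": "rows/cities"}],
        }
    ],
}
paths = resolve_paths(sample_manifest)
```

The sketch only reads the manifest. As noted above, editing the file by hand is not recommended.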

Naming

Names of the directories of different configuration types are subject to the rules defined in the manifest under the naming section. These are the default values:

{
  "branch": "{branch_name}",
  "config": "{component_type}/{component_id}/{config_name}",
  "configRow": "rows/{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}

If you want to include object IDs in directory names, use these values:

{
  "branch": "{branch_id}-{branch_name}",
  "config": "{component_type}/{component_id}/{config_id}-{config_name}",
  "configRow": "rows/{config_row_id}-{config_row_name}",
  "schedulerConfig": "schedules/{config_name}",
  "sharedCodeConfig": "_shared/{target_component_id}",
  "sharedCodeConfigRow": "codes/{config_row_name}",
  "variablesConfig": "variables",
  "variablesValuesRow": "values/{config_row_name}"
}

You can change these rules as needed and then rebuild the project directory with the fix-paths command.
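The placeholders in the naming rules can be illustrated with plain string substitution in Python. This is only a sketch of the substitution step: it does not normalize names or resolve collisions between objects with the same name, which the tool itself takes care of. The function name render_path and the ID 7241111 in the usage are illustrative.

```python
NAMING = {
    "config": "{component_type}/{component_id}/{config_name}",
    "configRow": "rows/{config_row_name}",
}
NAMING_WITH_IDS = {
    "config": "{component_type}/{component_id}/{config_id}-{config_name}",
    "configRow": "rows/{config_row_id}-{config_row_name}",
}


def render_path(template: str, **values: str) -> str:
    """Fill the placeholders of one naming rule; a missing value raises KeyError."""
    return template.format(**values)


default_path = render_path(
    NAMING["config"],
    component_type="extractor",
    component_id="keboola.ex-db-oracle",
    config_name="raw-data",
)
path_with_id = render_path(
    NAMING_WITH_IDS["config"],
    component_type="extractor",
    component_id="keboola.ex-db-oracle",
    config_id="7241111",
    config_name="raw-data",
)
```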

Project Cache

The project cache is stored in the file .keboola/project.json. Local commands rely on it because they do not make authorized requests to the Storage API.

This is its basic structure:

  • backends - list of the project backends
  • features - list of the project features
  • defaultBranchId - ID of the default branch

Example:

{
  "backends": [
    "snowflake"
  ],
  "features": [
    "workspace-snowflake-dynamic-backend-size",
    "input-mapping-read-only-storage",
    "syrup-jobs-limit-10",
    "oauth-v3"
  ],
  "defaultBranchId": 123
}
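A local script could consult the cache in a similar way, without contacting the Storage API. The following Python sketch assumes the file has the shape shown above. The helper names load_project_cache, has_backend and has_feature are hypothetical.

```python
import json
from pathlib import Path


def load_project_cache(project_dir: str) -> dict:
    """Read .keboola/project.json from the given project directory."""
    cache_path = Path(project_dir) / ".keboola" / "project.json"
    return json.loads(cache_path.read_text(encoding="utf-8"))


def has_backend(cache: dict, backend: str) -> bool:
    """Check whether the project lists the given backend."""
    return backend in cache.get("backends", [])


def has_feature(cache: dict, feature: str) -> bool:
    """Check whether the project lists the given feature."""
    return feature in cache.get("features", [])
```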

Next Steps