Manifest Files Specification

A manifest file contains additional information about tables and files injected into the /data/in folders. It also lets you specify options for tables and files transferred back to Storage from the /data/out folders. A manifest file is named by appending the .manifest suffix to the name of the original file.

Every file in /data/in is always accompanied by an injected manifest file. For files generated by your code in /data/out, the manifest file is optional. Also keep in mind that manifests have lower priority than the input and output mapping: when both set the same option, the mapping wins.
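To picture the priority rule, here is a minimal Python sketch that combines options from a manifest with options from the mapping. The helper name effective_options and the flat-dictionary representation are assumptions made for illustration; this is not how the Keboola runner is actually implemented.

```python
def effective_options(manifest: dict, mapping: dict) -> dict:
    """Combine manifest options with mapping options (hypothetical helper).

    Keys set in the mapping override the manifest, because manifests
    have lower priority than the input and output mapping.
    """
    merged = dict(manifest)  # start from the manifest values
    merged.update(mapping)   # mapping values replace any overlapping keys
    return merged


manifest = {"destination": "out.c-main.Leads", "incremental": True}
mapping = {"destination": "out.c-main.Customers"}
print(effective_options(manifest, mapping))
# {'destination': 'out.c-main.Customers', 'incremental': True}
```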

Format

The format of the manifest file is always JSON. The manifest file always has the .manifest extension. This applies to files with multiple extensions as well, so the following filenames are expected:

Data File Name    Manifest File Name
myfile            myfile.manifest
myfile.csv        myfile.csv.manifest
myfile.csv.gz     myfile.csv.gz.manifest
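The naming rule can be expressed as a small Python helper, sketched below under the assumption that you work with pathlib paths; the function name manifest_path is hypothetical. It appends the suffix to the full file name instead of using Path.with_suffix, which would replace the last extension (.gz) rather than keep it.

```python
from pathlib import Path


def manifest_path(data_file: str) -> Path:
    """Return the manifest path for a data file (hypothetical helper).

    The .manifest suffix is appended to the full file name, so every
    existing extension, such as .csv.gz, is preserved.
    """
    path = Path(data_file)
    return path.with_name(path.name + ".manifest")


print(manifest_path("/data/out/tables/myfile.csv.gz"))
# /data/out/tables/myfile.csv.gz.manifest
```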

Examples

Tables

/data/in/tables manifests

An input table manifest stores metadata about a downloaded table. For example, a table with the ID in.c-docker-demo.data will be downloaded into /in/tables/in.c-docker-demo.data.csv (unless stated otherwise in the input mapping), and a manifest file /in/tables/in.c-docker-demo.data.csv.manifest will be created with the following contents:

{
    "id": "in.c-docker-demo.data",
    "uri": "https://connection.keboola.com/v2/storage/tables/in.c-docker-demo.data",
    "name": "data",
    "primary_key": [],
    "indexed_columns": [],
    "created": "2015-01-25T01:35:14+0100",
    "last_change_date": "2015-01-25T01:35:14+0100",
    "last_import_date": "2015-01-25T01:35:14+0100",
    "rows_count": 2,
    "data_size_bytes": 32768,
    "is_alias": false,
    "columns": [
        "id",
        "name",
        "text"
    ],
    "attributes": [],
    "metadata": [
        {
            "id": "228956",
            "key": "KBC.createdBy.component.id",
            "value": "keboola.python-transformation",
            "provider": "system",
            "timestamp": "2017-05-26 00:39:07"
        }
    ],
    "column_metadata": {
        "id": [],
        "name": [],
        "text": []
    }
}
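A component typically loads this manifest with a plain JSON parser. The following Python sketch is one way to do it; read_table_manifest and describe_table are hypothetical helpers, and the demo uses a trimmed copy of the manifest above so it needs no files on disk. Because rows_count and data_size_bytes are only estimates, the sketch deliberately does not rely on them.

```python
import json
from pathlib import Path


def read_table_manifest(csv_path: str) -> dict:
    """Load the manifest that sits next to an input table CSV."""
    manifest_file = Path(csv_path + ".manifest")
    with manifest_file.open(encoding="utf-8") as fh:
        return json.load(fh)


def describe_table(manifest: dict) -> str:
    """Summarize the table ID, columns and primary key from a manifest."""
    columns = ", ".join(manifest.get("columns", []))
    primary_key = manifest.get("primary_key") or ["(none)"]
    return f"{manifest['id']}: columns [{columns}], primary key {', '.join(primary_key)}"


# A trimmed-down copy of the manifest above, so the demo needs no files.
example = {
    "id": "in.c-docker-demo.data",
    "primary_key": [],
    "columns": ["id", "name", "text"],
}
print(describe_table(example))
# in.c-docker-demo.data: columns [id, name, text], primary key (none)
```

In a running component you would call read_table_manifest with the path of the downloaded CSV file and pass the result to describe_table.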

The name node contains the name of the table (the last part of its ID). Note that the data_size_bytes and rows_count values are estimated by the database server and may be significantly off, especially right after the table is created. The metadata and column_metadata fields contain Metadata for the table and its columns. The attributes node contains additional table attributes; if used, it has the following structure:

{
    ...
    "attributes": [
        {
            "name": "attributeName",
            "value": "attributeValue",
            "protected": false
        }
    ]
}
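Since attributes arrive as a list of objects, it is often convenient to turn them into a lookup table. The Python sketch below does that; the helper name attributes_to_dict is hypothetical, and it drops the protected flag, which you may want to keep if your component needs it.

```python
def attributes_to_dict(manifest: dict) -> dict:
    """Turn the list-of-objects "attributes" node into a name -> value mapping.

    The "protected" flag is ignored in this sketch.
    """
    return {attr["name"]: attr["value"] for attr in manifest.get("attributes", [])}


example = {
    "attributes": [
        {"name": "attributeName", "value": "attributeValue", "protected": False}
    ]
}
print(attributes_to_dict(example))  # {'attributeName': 'attributeValue'}
```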

/data/out/tables manifests

An output table manifest sets options for transferring a table to Storage. The following example lists the available manifest fields; all of them are optional. The destination field overrides the table name generated from the file name; it can be (and commonly is) overridden by the end-user configuration.

{
    "destination": "out.c-main.Leads",
    "columns": ["column1", "column2", "column3"],
    "incremental": true,
    "primary_key": ["column1", "column2"],
    "delimiter": "\t",
    "enclosure": "\"",
    "metadata": ...,
    "column_metadata": ...
}
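Writing such a manifest from a component is a matter of dumping a dictionary next to the CSV file. The Python sketch below assumes a hypothetical helper write_table_manifest that writes only the options you pass; the demo uses a temporary directory in place of the real /data/out/tables folder.

```python
import json
import tempfile
from pathlib import Path


def write_table_manifest(csv_path: Path, **options) -> Path:
    """Write <csv_path>.manifest containing the given output options.

    Only the keys you pass are written, since every field is optional.
    """
    manifest_file = csv_path.with_name(csv_path.name + ".manifest")
    manifest_file.write_text(json.dumps(options, indent=4), encoding="utf-8")
    return manifest_file


# Demo in a temporary directory instead of the real /data/out/tables.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "leads.csv"
    table.write_text('"column1","column2"\n"a","b"\n', encoding="utf-8")
    written = write_table_manifest(
        table,
        destination="out.c-main.Leads",
        incremental=True,
        primary_key=["column1", "column2"],
    )
    print(json.loads(written.read_text(encoding="utf-8"))["destination"])
    # out.c-main.Leads
```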

Additionally, the following options can be specified:

{
    ...
    "delete_where_column": "column name",
    "delete_where_values": ["value1", "value2"],
    "delete_where_operator": "eq"
}

These options cause the matching rows (with the eq operator, rows whose value in delete_where_column equals one of delete_where_values) to be deleted from the destination table in Storage before the new data is imported.
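To make the row selection concrete, the Python sketch below filters an in-memory list of rows the way the eq operator is described above. It is a local illustration only: Storage performs the actual deletion. The helper rows_to_delete and the status column are hypothetical, and the sketch models only the eq operator shown in the example.

```python
def rows_to_delete(rows, column, values, operator="eq"):
    """Return the rows that delete_where_* options would select (local illustration).

    Only the "eq" operator from the example is modelled: a row matches when
    its value in `column` equals any entry of `values`.
    """
    if operator != "eq":
        raise ValueError(f"this sketch only models 'eq', got {operator!r}")
    wanted = set(values)
    return [row for row in rows if row[column] in wanted]


options = {
    "delete_where_column": "status",
    "delete_where_values": ["value1", "value2"],
    "delete_where_operator": "eq",
}
rows = [
    {"id": "1", "status": "value1"},
    {"id": "2", "status": "other"},
    {"id": "3", "status": "value2"},
]
deleted = rows_to_delete(
    rows,
    options["delete_where_column"],
    options["delete_where_values"],
    options["delete_where_operator"],
)
print([row["id"] for row in deleted])  # ['1', '3']
```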

The metadata and column_metadata fields allow you to set Metadata for the table and its columns. The metadata field corresponds to the Table Metadata API call; the column_metadata field corresponds to the Column Metadata API call. In both cases, the key and value are passed directly to the API, and the provider value is filled with the ID of the running component (e.g., keboola.ex-db-snowflake).

{
    ...,
    "metadata": [
        {
            "key": "an.arbitrary.key",
            "value": "Some value"
        },
        {
            "key": "another.arbitrary.key",
            "value": "A different value"
        }
    ],
    "column_metadata": {
        "column1": [
            {
                "key": "yet.another.key",
                "value": "Some other value"
            }
        ]
    }
}
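If your code keeps metadata in ordinary dictionaries, the Python sketch below converts them into the two nodes shown above. The helper build_metadata is hypothetical; it intentionally omits provider, because that value is filled in with the component ID.

```python
import json


def build_metadata(table_meta: dict, column_meta: dict) -> dict:
    """Convert plain dicts into the metadata / column_metadata manifest nodes.

    No "provider" is set; it is filled with the ID of the running component.
    """
    return {
        "metadata": [{"key": k, "value": v} for k, v in table_meta.items()],
        "column_metadata": {
            column: [{"key": k, "value": v} for k, v in entries.items()]
            for column, entries in column_meta.items()
        },
    }


table_meta = {
    "an.arbitrary.key": "Some value",
    "another.arbitrary.key": "A different value",
}
column_meta = {"column1": {"yet.another.key": "Some other value"}}
result = build_metadata(table_meta, column_meta)
print(json.dumps(result, indent=4))
```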

Files

/data/in/files manifests

An input file manifest stores metadata about a downloaded file.

{
    "id": 75807657,
    "created": "2015-01-14T00:47:00+0100",
    "is_public": false,
    "is_sliced": false,
    "is_encrypted": true,
    "name": "fooBar.jpg",
    "size_bytes": 563416,
    "tags": [
        "tag1",
        "tag2"
    ],
    "max_age_days": 180
}
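Input file manifests are handy for picking files by tag. The Python sketch below scans a folder for *.manifest files and returns the data files whose tags include a given value. The helper files_with_tag is hypothetical, and it derives the data file name from the manifest file name rather than from the name field, so it makes no assumption about how downloaded files are named on disk. The demo uses a temporary folder in place of /data/in/files.

```python
import json
import tempfile
from pathlib import Path


def files_with_tag(folder, tag):
    """Return data file names in `folder` whose manifest lists `tag` in its tags."""
    matches = []
    for manifest_file in sorted(Path(folder).glob("*.manifest")):
        manifest = json.loads(manifest_file.read_text(encoding="utf-8"))
        if tag in manifest.get("tags", []):
            # The data file sits next to its manifest, without the suffix.
            matches.append(manifest_file.name[: -len(".manifest")])
    return matches


# Demo against a temporary folder standing in for /data/in/files.
with tempfile.TemporaryDirectory() as tmp:
    manifest = {"id": 75807657, "name": "fooBar.jpg", "tags": ["tag1", "tag2"]}
    (Path(tmp) / "fooBar.jpg.manifest").write_text(json.dumps(manifest), encoding="utf-8")
    print(files_with_tag(tmp, "tag1"))  # ['fooBar.jpg']
    print(files_with_tag(tmp, "tag3"))  # []
```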

/data/out/files manifests

An output file manifest sets options for transferring a file to Storage. The following example lists available manifest fields; all of them are optional.

{
    "is_public": true,
    "is_permanent": true,
    "is_encrypted": true,
    "notify": false,
    "tags": [
        "image",
        "pie-chart"
    ]
}

The parameters are taken from the Storage API File Import call. Note in particular:

  • If is_permanent is false, the file will be automatically deleted after 180 days.
  • When notify is true, the members of the project will be notified that a file has been uploaded to the project.
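The Python sketch below writes an output file manifest next to a generated file. The helper write_file_manifest is hypothetical and covers only is_permanent, notify and tags; its defaults are explicit choices of this sketch, so with is_permanent=False the file is subject to the 180-day deletion described above. The demo uses a temporary directory in place of /data/out/files.

```python
import json
import tempfile
from pathlib import Path


def write_file_manifest(data_file: Path, tags, is_permanent=False, notify=False) -> Path:
    """Write <data_file>.manifest with a subset of the output file options.

    With is_permanent=False, Storage deletes the file after 180 days.
    """
    manifest = {"is_permanent": is_permanent, "notify": notify, "tags": list(tags)}
    target = data_file.with_name(data_file.name + ".manifest")
    target.write_text(json.dumps(manifest, indent=4), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    chart = Path(tmp) / "chart.png"  # stands in for a file in /data/out/files
    chart.write_bytes(b"")           # placeholder content for the demo
    written = write_file_manifest(chart, ["image", "pie-chart"], is_permanent=True)
    print(written.name)  # chart.png.manifest
```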

S3 Section

When using Amazon S3 for data exchange, the manifest files will contain an additional s3 section with credentials for downloading the actual file data.

{
    "id": "in.c-docker-demo.data",
    ...
    "s3": {
        "isSliced": true,
        "region": "us-east-1",
        "bucket": "kbc-sapi-files",
        "key": "exp-30/1581/table-exports/in/c-docker-test/test/243100072.csv.gzmanifest",
        "credentials": {
            "access_key_id": "ASI...CDQ",
            "secret_access_key": "tCE..I+T",
            "session_token": "Ago...POP"
        }
    }
}
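The s3 section maps naturally onto the keyword arguments accepted by boto3.client("s3", ...). The Python sketch below performs only that mapping, so it runs without boto3 installed; the helper s3_client_kwargs and the object key path/to/file.csv.gz are hypothetical. When isSliced is true, the object at key appears to be a list of slices rather than the data itself (note the manifest ending in the example above), so the commented download call applies to non-sliced files only.

```python
def s3_client_kwargs(manifest: dict) -> dict:
    """Map the manifest's s3 section to boto3.client("s3", ...) keyword arguments."""
    s3 = manifest["s3"]
    creds = s3["credentials"]
    return {
        "region_name": s3["region"],
        "aws_access_key_id": creds["access_key_id"],
        "aws_secret_access_key": creds["secret_access_key"],
        "aws_session_token": creds["session_token"],
    }


example = {
    "id": "in.c-docker-demo.data",
    "s3": {
        "isSliced": False,
        "region": "us-east-1",
        "bucket": "kbc-sapi-files",
        "key": "path/to/file.csv.gz",  # hypothetical key for a non-sliced file
        "credentials": {
            "access_key_id": "ASI...CDQ",
            "secret_access_key": "tCE..I+T",
            "session_token": "Ago...POP",
        },
    },
}
kwargs = s3_client_kwargs(example)
print(kwargs["region_name"])  # us-east-1

# With boto3 installed, a non-sliced object could then be fetched with:
#   import boto3
#   client = boto3.client("s3", **kwargs)
#   client.download_file(example["s3"]["bucket"], example["s3"]["key"], "data.csv.gz")
```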

If the file is sliced and you need to merge it into a single file, read through the guide to working with sliced files.
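As a rough illustration of merging, the Python sketch below concatenates slice files into a single CSV. It rests on assumptions you should check against the sliced-files guide: all slices are already downloaded and decompressed into one folder, they have no header row, they share the same column order, and the column names come from the manifest's columns field. The helper merge_slices is hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def merge_slices(slice_dir: Path, columns: list, target: Path) -> int:
    """Concatenate header-less CSV slices into one CSV with a header row.

    Assumes every file in `slice_dir` is a slice with the same column order.
    Returns the number of data rows written.
    """
    if target.resolve().parent == slice_dir.resolve():
        raise ValueError("write the merged file outside the slice folder")
    rows = 0
    with target.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(columns)  # header taken from the manifest
        for slice_file in sorted(p for p in slice_dir.iterdir() if p.is_file()):
            with slice_file.open(newline="", encoding="utf-8") as fh:
                for row in csv.reader(fh):
                    writer.writerow(row)
                    rows += 1
    return rows


with tempfile.TemporaryDirectory() as tmp:
    slices = Path(tmp) / "slices"
    slices.mkdir()
    (slices / "part1.csv").write_text('"1","a"\n', encoding="utf-8")
    (slices / "part2.csv").write_text('"2","b"\n', encoding="utf-8")
    print(merge_slices(slices, ["id", "name"], Path(tmp) / "merged.csv"))  # 2
```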

Note: Exchanging data via S3 is currently available only for input mapping.