/data/out/tables manifests with Native Types

Native Types provide a structured way for components to define their handling of data types, referred to as “Native Types.”

The level of type handling is specified by the dataTypeSupport property, which can take one of three values:

  • Authoritative: The component reliably enforces specific data types.
  • Hints: The component provides type suggestions that may not always be reliable.
  • None: Represents the legacy state with no explicit type handling.

This design overcomes limitations in current settings, where all components automatically produce typed tables when a project switches to Native Types. For instance, Data Sources that output unreliable type hints (e.g., an int column containing values like N/A) can now explicitly signal their limitations, reducing downstream issues.

An output table manifest sets options for transferring a table to Storage. The following examples list available manifest fields; all of them are optional. The destination field overrides the table name generated from the file name; it can (and commonly is) overridden by the end-user configuration.

{
    "destination": "out.c-main.Leads",
    "incremental": true,
    "delimiter": "\t",
    "enclosure": "\"",
    "manifest_type": "output",
    "has_header": true,
    "description": "Best table",
    "table_metadata": ...
    "schema": ...
}

The table_metadata fields allow you to set Metadata for the table. The table_metadata field corresponds to the Table Metadata API call. The key and value of the object are passed directly to the API; the provider value is filled by the Id of the running component (e.g., keboola.ex-db-snowflake).

{
    ...
    "table_metadata": {
        "something else": "a value"
    }
}

Additionally, the following options will cause the specified rows to be deleted from the source table before the new table is imported. See an example. Using this option makes sense only with incremental loads.

{
    ...
    "delete_where_column": "column name",
    "delete_where_values": ["value1", "value2"],
    "delete_where_operator": "eq"
}

The schema [optional] field allow you to create a table with Native Data Types columns. Each object in the schema array represents one column:

  • The name [required] field specifies the column name.
  • The data_type [optional] field defines the data type for different storage backends, referred to as “Native Types”.
    • The base [required] type is always required and can have values specified in the Base Types documentation.
    • Other types like Snowflake and BigQuery are optional and allows you to specify settings for a particular database backend.
  • The nullable [optional] field indicates if the column can be null.
  • The primary_key [optional] field specifies if the column is a primary key.
  • The description [optional] field provides a description of the column.
  • The metadata [optional] field allows setting additional metadata for the column.
{
    "schema": [
        {
            "name": "id",
            "data_type": {
                "base": {
                    "type": "INTEGER",
                    "length": "11",
                    "default": "123"
                },
                "snowflake": {
                    "type": "GEOMETRY",
                    "length": "123,123,4455",
                    "default": "POINT(1 1)"
                },
                "bigquery": {
                    "type": "VARCHAR",
                    "length": "123",
                    "default": null
                }
            },
            "nullable": false,
            "primary_key": true,
            "description": "Optional description of the column",
            "metadata": {
                "KBC.someColumnMetadata": "value1"
                "KBC.someOther": "value2"
            }
        }
    ]
}

Base Types

Source data types are mapped to a destination using a base type. The current base types are STRING, INTEGER, NUMERIC, FLOAT, BOOLEAN, DATE, and TIMESTAMP. This means that, for example, a MySQL extractor may store the value BIGINT as a type of column; that type maps to the INTEGER base type. When the Snowflake writer consumes this value, it will read the base type INTEGER and choose a corresponding type for Snowflake, which happens to be also INTEGER. This ensures high interoperability between the components. Please take a look at the conversion table below.

View the extracted data types in the storage tables detail:

Screenshot - View Column Data Type

You can also override the data type:

Screenshot - Set Column Data Type

When you use the table (e.g., in the Snowflake writer), you’ll see the data type you have configured:

Screenshot - Set Column Data Type

The data is converted only when writing or copying (e.g., to a transformation or a writer). That means that you can extract an integer column, mark it as a timestamp in storage and write it as an integer into a target database (though you’ll be offered to write it as a timestamp).

You access both the source and base data type through the corresponding API.

Nullable Conversion

Nullable conversion, which transforms an empty string originating from data into a null value, refers to the process where a textual value consisting solely of an empty string "" is replaced with the value null.

Data Type Conversions

As described above, the source data type is converted to a base data type stored in metadata storage. The base type is then converted to the target data type. The following tables show mappings for each base type. The mapping causes possible information loss (e.g., assigning SMALLINT to INTEGER). To minimize this, we also keep track of the data type size and transfer that if possible. For example, a SMALLINT column would be stored as base type INTEGER with size 2. If the target database supports integer sizes, you will be offered to set the type in the target database as INTEGER(2).

STRING

Base type STRING represents any textual type; both CHARACTER VARYING (or VARCHAR) and TEXT types are included. Also, the string base type is used for any other unrecognized type on input. It means that the source type column is not an exhaustive list in the following table. It’s a list of suitable string types converted to a string. All other unknown types are converted to a string as well.

Source Source Type Base Type Target Type Target
Generic char STRING N/A
character varying
text
varchar
STRING Hive
STRING Impala
TEXT MS SQL Server
MySQL CHAR VARCHAR MySQL
TEXT
VARCHAR
VARCHAR2 Oracle
VARCHAR PostgreSQL
Redshift BPCHAR VARCHAR Redshift
CHAR
CHARACTER
CHARACTER VARYING
NCHAR
NVARCHAR
TEXT
VARCHAR
VARCHAR SiSense
Snowflake BINARY VARCHAR Snowflake
CHAR
CHARACTER
STRING
TEXT
VARBINARY
VARCHAR
Synapse BINARY NVARCHAR Synapse
CHAR
NCHAR
NVARCHAR
VARBINARY
VARCHAR
VARCHAR Thoughtspot
Source Source Type Base Type Target Type Target

INTEGER

The INTEGER base type represents data types for whole numbers.

Source Source Type Base Type Target Type Target
Generic bigint INTEGER
bigserial
mediumint
smallint
int
int2
int4
int64
int8
integer
serial8
tinyint
INT Hive
INT Impala
BIGINT MS SQL
MySQL BIGINT INTEGER MySQL
INT
INTEGER
MEDIUMINT
SMALLINT
TINYINT
N/A Oracle
INTEGER Postgres
Redshift BIGINT INTEGER Redshift
INT
INT2
INT4
INT8
INTEGER
SMALLINT
BIGINT SiSense
Snowflake BIGINT INTEGER Snowflake
BYTEINT
INT
INTEGER
SMALLINT
TINYINT
Synapse BIGINT INT Synapse
INT
SMALLINT
TINYINT
INT Thoughtspot
Source Source Type Base Type Target Type Target

NUMERIC

The NUMERIC base type represents fixed-point fractional numbers (real, numeric or decimal data types).

Source Source Type Base Type Target Type Target
Generic dec NUMERIC
decimal
fixed
money
number
numeric
smallmoney
DECIMAL Hive
DECIMAL Impala
DECIMAL MS SQL Server
MySQL DEC NUMERIC MySQL
DECIMAL
FIXED
NUMERIC
NUMBER Oracle
NUMERIC PostgreSQL
Redshift DECIMAL NUMERIC Redshift
NUMERIC
DECIMAL SiSense
Snowflake DECIMAL NUMBER Snowflake
NUMBER
NUMERIC
Synapse NUMERIC NUMERIC Synapse
DECIMAL
N/A Thoughtspot
Source Source Type Base Type Target Type Target

FLOAT

The FLOAT base type represents floating-point fractional numbers (float or double data types).

Source Source Type Base Type Target Type Target
Generic binary_double FLOAT
binary_float
double
double precision
d_float
float
float4
float8
quad
real
FLOAT Hive
FLOAT Impala
FLOAT MS SQL Server
MySQL DOUBLE FLOAT MySQL
DOUBLE PRECISION
FLOAT
REAL
N/A Oracle
REAL PostgreSQL
Redshift DOUBLE PRECISION FLOAT Redshift
FLOAT
FLOAT4
FLOAT8
REAL
FLOAT SiSense
Snowflake FLOAT DOUBLE Snowflake
DOUBLE PRECISION
FLOAT
FLOAT4
FLOAT8
REAL
Synapse FLOAT FLOAT Synapse
REAL
FLOAT Thoughtspot
Source Source Type Base Type Target Type Target

BOOLEAN

The BOOLEAN base type represents a true or false value.

Source Source Type Base Type Target Type Target
Generic bool BOOLEAN
boolean
BOOLEAN Hive
BOOLEAN Impala
BIT MS SQL Server
N/A MySQL
N/A Oracle
BOOLEAN PostgreSQL
Redshift BOOL BOOLEAN Redshift
BOOLEAN
BIT SiSense
Snowflake BOOLEAN BOOLEAN Snowflake
Synapse BIT BIT Synapse
BOOL Thoughtspot
Source Source Type Base Type Target Type Target

DATE

The DATE base type represents a date value without a time portion.

Source Source Type Base Type Target Type Target
Generic date DATE DATE
N/A Hive
N/A Impala
DATE MS SQL Server
MySQL DATE DATE MySQL
DATE Oracle
DATE PostgreSQL
Redshift DATE DATE Redshift
DATE SiSense
Snowflake DATE DATE Snowflake
Synapse DATE DATE Synapse
DATE Thoughtspot
Source Source Type Base Type Target Type Target

TIMESTAMP

The TIMESTAMP base type represents a date value with a time portion.

Source Source Type Base Type Target Type Target
Generic datetime TIMESTAMP
datetime2
datetimeoffset
smalldatetime
timestamp
timestamptz
timestamp_LTZ
timestamp_NTZ
TIMESTAMP_TZ
timestamp with local time zone
timestamp with time zone
timestamp without time zone
TIMESTAMP Hive
TIMESTAMP Impala
DATETIME2 MS SQL Server
MySQL DATETIME TIMESTAMP MySQL
TIMESTAMP
TIMESTAMP Oracle
TIMESTAMP PostgreSQL
Redshift TIMESTAMP TIMESTAMP Redshift
TIMESTAMPTZ
TIMESTAMP WITH TIME ZONE
TIMESTAMP WITHOUT TIME ZONE
N/A SiSense
Snowflake DATETIME TIMESTAMP Snowflake
TIMESTAMP
TIMESTAMP_NTZ
TIMESTAMP_LTZ
TIMESTAMP_TZ
Synapse DATETIMEOFFSET DATETIMEOFFSET Synapse
DATETIME
DATETIME2
SMALLDATETIME
TIME
TIMESTAMP Thoughtspot
Source Source Type Base Type Target Type Target