Generic Extractor Configuration

To configure your first Generic Extractor, follow our tutorial.

For an overall idea of what configuring Generic Extractor involves, start with the overview of configuration sections below.

Then go through a sample configuration featuring all configuration options and their nesting. The configuration map is also available as a separate article.

Configuration Sections

Click on the section names if you want to learn more.

  • parameters
    • api — sets the basic properties of the API.
      • baseUrl — defines the URL to which the API requests should be sent.
      • pagination — breaks a result with a large number of items into separate pages.
      • authentication — needs to be configured for any API which is not public.
      • retryConfig — automatically, and repeatedly, retries failed HTTP requests.
      • http — sets the default headers and parameters sent with each API call.
    • config — describes the actual extraction.
      • debug — shows all HTTP requests sent by Generic Extractor.
      • outputBucket — defines the name of a Storage Bucket in which the extracted tables will be stored.
      • http — sets the HTTP headers sent with every request.
      • jobs — describes the API endpoints (resources) to be extracted.
      • mappings — describes how the JSON response is converted into CSV files that will be imported into Storage.
      • incrementalOutput — loads the extracted data into Storage incrementally.
      • userData — adds arbitrary data to extracted records.
  • iterations — executes a configuration multiple times, each time with different values.
  • authorization — allows injecting OAuth authentication.
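
As a rough orientation, the nesting in the list above can be sketched as a small skeleton configuration. The URL, bucket name and endpoint are placeholders reused from the sample further down, and the choice of keys shown here is an assumption about what a bare-bones setup might contain, not a statement of which keys are strictly required.

```json
{
    "parameters": {
        "api": {
            "baseUrl": "https://example.com/v3.0/"
        },
        "config": {
            "outputBucket": "ge-tutorial",
            "jobs": [
                {
                    "endpoint": "users"
                }
            ]
        }
    }
}
```

The optional sections (pagination, authentication, mappings, and so on) slot in as siblings of the keys shown, at the levels given in the list.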

There are also simple pre-defined functions available, adding extra flexibility when needed.
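
To give a feel for what a function looks like, here is a minimal sketch of a job that builds a request parameter with the concat function. The parameter name label and the "type-" prefix are hypothetical, and userType is assumed to be an attribute defined elsewhere in the configuration (as it is via iterations in the sample below). Check the function reference for the functions and contexts that are actually supported.

```json
{
    "endpoint": "users",
    "params": {
        "label": {
            "function": "concat",
            "args": [
                "type-",
                {
                    "attr": "userType"
                }
            ]
        }
    }
}
```

A function is written as an object wherever a plain value would otherwise appear, so the surrounding configuration keeps its structure.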

Generic Extractor can be run from the KBC user interface, where only the configuration JSON is needed, or locally, which requires Docker.

Configuration Map

The following sample configuration shows various configuration options and their nesting. You can use the map to navigate between them. The configuration map is also available separately, and we recommend pinning it to your toolbar for quick reference.

{
    "parameters": {
        "api": {
            "baseUrl": "https://example.com/v3.0/",
            "pagination": {
                "method": "multiple",
                "scrollers": {
                    "offset_scroll": {
                        "method": "offset",
                        "offsetParam": "offset",
                        "limitParam": "count"
                    }
                }
            },
            "authentication": {
                "type": "basic"
            },
            "retryConfig": {
                "maxRetries": 3
            },
            "http": {
                "headers": {
                    "Accept": "application/json"
                },
                "defaultOptions": {
                    "params": {
                        "company": 123
                    }
                },
                "requiredHeaders": ["X-AppKey"]
            }
        },
        "config": {
            "debug": true,
            "username": "dummy",
            "#password": "secret",
            "outputBucket": "ge-tutorial",
            "incrementalOutput": true,
            "compatLevel": 2,
            "http": {
                "headers": {
                    "X-AppKey": "ThisIsSecret"
                }
            },
            "jobs": [
                {
                    "endpoint": "users",
                    "method": "get",
                    "dataField": "items",
                    "dataType": "users",
                    "params": {
                        "type": {
                            "attr": "userType"
                        }
                    },
                    "responseFilter": "additional.address/details",
                    "responseFilterDelimiter": "/",
                    "scroller": "offset_scroll",
                    "children": [
                        {
                            "endpoint": "users/{user_id}/orders",
                            "dataField": "items",
                            "recursionFilter": "id>20",
                            "placeholders": {
                                "user_id": "id"
                            }
                        }
                    ]
                }
            ],
            "mappings": {
                "content": {
                    "parent_id": {
                        "type": "user",
                        "mapping": {
                            "destination": "campaign_id",
                            "primaryKey": true
                        }
                    },
                    "name": {
                        "type": "column",
                        "mapping": {
                            "destination": "text"
                        }
                    },
                    "address": {
                        "type": "table",
                        "destination": "addresses",
                        "tableMapping": {
                            "street": {
                                "type": "column",
                                "mapping": {
                                    "destination": "streetName"
                                }
                            }
                        }
                    }
                }
            },
            "userData": {
                "tag": "development"
            }
        }
    },
    "iterations": [
        {
            "userType": "active"
        },
        {
            "userType": "inactive"
        }
    ],
    "authorization": {
        "oauth_api": {
            "credentials": {
                "#data": "{\"status\": \"ok\",\"refresh_token\": \"1234abcd5678efgh\"}",
                "appKey": "someId",
                "#appSecret": "clientSecret"
            }
        }
    }
}
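
One way to read the sample is to follow the userType attribute. Assuming each entry in iterations is applied to the config section before a run, the configuration would execute twice, and the `"type": {"attr": "userType"}` parameter of the users job would resolve to a different value each time. The fragment below is only an illustration of those resolved values under that assumption; it is not something you would write in a configuration.

```json
[
    {
        "endpoint": "users",
        "params": {
            "type": "active"
        }
    },
    {
        "endpoint": "users",
        "params": {
            "type": "inactive"
        }
    }
]
```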