Generic Extractor Configuration
To configure your first Generic Extractor, follow our tutorial.
To get an overall idea of what to expect when configuring Generic Extractor, take a look at the following overview of various configuration sections.
Then go through a sample configuration featuring all configuration options and their
nesting. The configuration map is also available as a separate article.
Configuration Sections
Click on the section names if you want to learn more.
parameters
  api: sets the basic properties of the API.
    baseUrl: defines the URL to which the API requests should be sent.
    caCertificate: defines a custom certificate authority bundle in crt/pem format.
    #clientCertificate: defines the client certificate and private key in crt/pem format.
    pagination: breaks a result with a large number of items into separate pages.
    authentication: needs to be configured for any API which is not public.
    retryConfig: automatically and repeatedly retries failed HTTP requests.
    http: sets the timeouts, default headers and parameters sent with each API call.
  aws
    signature: defines the AWS credentials used to sign requests.
  config: describes the actual extraction.
    debug: shows all HTTP requests sent by Generic Extractor.
    outputBucket: defines the name of a Storage Bucket in which the extracted tables will be stored.
    http: sets the HTTP headers sent with every request.
    jobs: describes the API endpoints (resources) to be extracted.
    mappings: describes how the JSON response is converted into CSV files that will be imported into Storage.
    incrementalOutput: loads the extracted data into Storage incrementally.
    userData: adds arbitrary data to extracted records.
  sshProxy: securely accesses HTTP(S) endpoints inside your private network.
  iterations: executes a configuration multiple times, each time with different values.
authorization: allows injecting OAuth authentication.
There are also simple pre-defined functions available, adding extra
flexibility when needed.
Generic Extractor can be run from within the Keboola user interface (only
configuration JSON needed), or locally
(Docker needed).
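As a rough illustration of the "only configuration JSON needed" point, the sketch below assembles a small configuration as a Python dictionary and serializes it to JSON. The keys and values are borrowed from the sample configuration in this article. The choice of which sections to include is an assumption made for brevity, not a statement of which options are mandatory.

```python
import json

# Hypothetical minimal configuration; keys and values are borrowed from the
# sample configuration in this article. Which sections are strictly required
# is not covered here, so treat this selection as an assumption.
config = {
    "parameters": {
        "api": {
            "baseUrl": "https://example.com/v3.0/",
        },
        "config": {
            "outputBucket": "ge-tutorial",
            "jobs": [
                {"endpoint": "users", "dataField": "items"},
            ],
        },
    },
}

# Serialize the dictionary to the JSON text of the configuration.
config_json = json.dumps(config, indent=4)
print(config_json)
```

Building the configuration in code and dumping it with `json.dumps` is only one convenient way to avoid syntax slips such as a missing comma; writing the JSON by hand works just as well.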
Configuration Map
The following sample configuration shows various configuration options and their nesting.
You can use the map to navigate between them. The parameter map is also available
separately and we recommend pinning it to your toolbar for quick reference.
{
    "parameters": {
        "api": {
            "baseUrl": "https://example.com/v3.0/",
            "caCertificate": "-----BEGIN CERTIFICATE-----\nMIIFaz....",
            "pagination": {
                "method": "multiple",
                "scrollers": {
                    "offset_scroll": {
                        "method": "offset",
                        "offsetParam": "offset",
                        "limitParam": "count"
                    }
                }
            },
            "authentication": {
                "type": "basic"
            },
            "retryConfig": {
                "maxRetries": 3
            },
            "http": {
                "headers": {
                    "Accept": "application/json"
                },
                "defaultOptions": {
                    "params": {
                        "company": 123
                    }
                },
                "requiredHeaders": ["X-AppKey"],
                "ignoreErrors": [405],
                "connectTimeout": 30,
                "requestTimeout": 300
            }
        },
        "aws": {
            "signature": {
                "credentials": {
                    "accessKeyId": "testAccessKey",
                    "#secretKey": "testSecretKey",
                    "serviceName": "testService",
                    "regionName": "testRegion"
                }
            }
        },
        "config": {
            "debug": true,
            "username": "dummy",
            "#password": "secret",
            "outputBucket": "ge-tutorial",
            "incrementalOutput": true,
            "compatLevel": 2,
            "http": {
                "headers": {
                    "X-AppKey": "ThisIsSecret"
                }
            },
            "jobs": [
                {
                    "endpoint": "users",
                    "method": "get",
                    "dataField": "items",
                    "dataType": "users",
                    "params": {
                        "type": {
                            "attr": "userType"
                        }
                    },
                    "responseFilter": "additional.address/details",
                    "responseFilterDelimiter": "/",
                    "scroller": "offset_scroll",
                    "children": [
                        {
                            "endpoint": "users/{user_id}/orders",
                            "dataField": "items",
                            "recursionFilter": "id>20",
                            "placeholders": {
                                "user_id": "id"
                            }
                        }
                    ]
                }
            ],
            "mappings": {
                "content": {
                    "parent_id": {
                        "type": "user",
                        "mapping": {
                            "destination": "campaign_id",
                            "primaryKey": true
                        }
                    },
                    "name": {
                        "type": "column",
                        "mapping": {
                            "destination": "text"
                        }
                    },
                    "address": {
                        "type": "table",
                        "destination": "addresses",
                        "tableMapping": {
                            "street": {
                                "type": "column",
                                "mapping": {
                                    "destination": "streetName"
                                }
                            }
                        }
                    },
                    "created.date": {
                        "delimiter": "/",
                        "type": "column",
                        "mapping": {
                            "destination": "createdDate"
                        }
                    }
                }
            },
            "userData": {
                "tag": "development"
            }
        },
        "iterations": [
            {
                "userType": "active"
            },
            {
                "userType": "inactive"
            }
        ],
        "sshProxy": {
            "host": "proxy.example.com",
            "user": "proxy",
            "port": 22,
            "#privateKey": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
        }
    },
    "authorization": {
        "oauth_api": {
            "credentials": {
                "#data": "{\"status\": \"ok\", \"refresh_token\": \"1234abcd5678efgh\"}",
                "appKey": "someId",
                "#appSecret": "clientSecret"
            }
        }
    }
}
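
To illustrate how `iterations` and the `attr` reference in the job `params` interact in the sample above, here is a simplified Python sketch. It assumes that each iteration's values are laid over the `config` section and that `{"attr": "userType"}` is then looked up in the resulting section. The helper names `expand_iterations` and `resolve_params` are hypothetical and are not part of Generic Extractor; the real component may handle more cases than this model does.

```python
import copy


def expand_iterations(parameters):
    """Yield one copy of the config section per iteration (simplified model)."""
    base = parameters["config"]
    # With no iterations, the configuration is assumed to run once as-is.
    for overrides in parameters.get("iterations") or [{}]:
        merged = copy.deepcopy(base)
        merged.update(overrides)  # iteration values take precedence
        yield merged


def resolve_params(params, config):
    """Replace {"attr": name} references with the value of config[name]."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, dict) and set(value) == {"attr"}:
            resolved[key] = config[value["attr"]]
        else:
            resolved[key] = value
    return resolved


# A trimmed-down version of the sample configuration above.
parameters = {
    "config": {
        "outputBucket": "ge-tutorial",
        "jobs": [{"endpoint": "users", "params": {"type": {"attr": "userType"}}}],
    },
    "iterations": [{"userType": "active"}, {"userType": "inactive"}],
}

query_params = [
    resolve_params(job["params"], config)
    for config in expand_iterations(parameters)
    for job in config["jobs"]
]
print(query_params)
# [{'type': 'active'}, {'type': 'inactive'}]
```

Under these assumptions, the `users` endpoint would be requested once per iteration, first with `type=active` and then with `type=inactive`.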