If new to Generic Extractor, learn about jobs in our tutorial first. Use Parameter Map to help you navigate among various configuration options.
The jobs section of the extraction configuration contains descriptions of the API resources to be
extracted. The jobs configuration property is an array of processed API endpoints. A
single job represents a single API resource.
{
...,
"config": {
"jobs": [
{
"endpoint": "users",
"method": "get",
"dataField": "items",
"dataType": "users",
"params": {
"type": "active"
},
"responseFilter": "additional.address/details",
"responseFilterDelimiter": "/",
"children": [
{
"endpoint": "users/{user_id}/orders",
"dataField": "items",
"recursionFilter": "id>20",
"placeholders": {
"user_id": "id"
}
}
]
}
]
}
}
Generic Extractor reads and processes the responses from the API endpoints in a pretty complex
way. Each response is processed in the following steps:
dataField property or default rules.The first two steps are the responsibility of Jobs resulting in an array of objects. Generic Extractor then tries to find a common super-set of properties of all objects, for example, with the following response:
[
{
"id": 123,
"name": "foo",
"color": "green"
},
{
"id": 321,
"name": "bar",
"size": "large"
}
]The super-set of object properties consists of id, name, color and size. In the Generic Extractor
configuration, this is referred to as dataType. If the dataType configuration is not set, a
name is automatically generated. Merging the object structure requires that the objects are compatible.
The responses are merged into type-less tables. This means that values 42 and apples are perfectly compatible
because they get converted to a string. Also, the scalar and array values are compatible because the
scalar is upgraded to an array. The following are incompatible combinations:
For example, this would not be allowed:
[
{
"id": 123,
"name": "foo",
"color": "green"
},
{
"id": 321,
"name": "bar",
"color": {
"items": ["red", "blue"]
}
}
]If you want to process the above response, use the
responseFilter setting.
The endpoint property is required and represents the URL of the resource. It can be either of the following:
baseURL property of the API definitionbaseURL property of the API definitionAssume the following API definition:
"api": {
"baseURL": "https://example.com/3.0/"
}The relative endpoint must not start with a slash; so, with
endpoint set to campaign, the final resource URL would be
https://example.com/3.0/campaign.
The absolute endpoint must start with a slash. So, with /endpoint
set to campaign, the final resource URL would be https://example.com/campaign.
This means that the path part specified in the baseURL is ignored and fully
replaced by the value specified in endpoint.
The full absolute URL must start with a protocol. So, with the endpoint set to
https://eu.example.com/campaign, this would be the final resource URL
and the path specified in the baseURL is completely ignored.
The following table summarizes possible outcomes:
baseURL |
endpoint |
actual URL |
|---|---|---|
https://example.com/3.0/ |
campaign |
https://example.com/3.0/campaign |
https://example.com/3.0/ |
campaign/ |
https://example.com/3.0/campaign/ |
https://example.com/3.0/ |
/1.0/campaign |
https://example.com/1.0/campaign |
https://example.com/3.0/ |
https://eu.example.com/3.0/ |
https://eu.example.com/3.0/campaign |
https://example.com/ |
campaign |
https://example.com/campaign |
https://example.com |
campaign |
https://example.comcampaign |
https://example.com/ |
https://elpmaxe.com/endpoint |
https://elpmaxe.com/endpoint |
It is highly recommended to use the relative URL fragments. This means that the
baseURL property of the api section must end with a slash.
Use the other two options for handling exceptions in the API extraction (for instance, falling back to an older API version). Note that using a different domain (or even a base path) may interfere with the authentication — depending on the specification of the target API.
Also, closely follow the target API specification regarding trailing slashes. For some APIs,
both https://example.com/3.0/campaign and https://example.com/3.0/campaign/ URLs may
be accepted and valid. For other APIs, however, only one version may be supported.
The params section defines request parameters. They
may be optional or required, depending on the target API specification. The params section is
an object with arbitrary properties (or, more precisely, parameters understood by the target
API). It is also allowed to use function calls.
Assume that api.baseUrl is set to https://example.com/3.0/, jobs[].endpoint
is set to mock-api and that the param parameters are set as follows:
"params": {
"startDate": "2016-01-20",
"types": ["new", "active", "finished"],
"filter": {
"query": "q=user:johnDoe",
"tags": {
"first": true,
"second": false
}
}
}See our examples.
The method parameter defines the HTTP request method.
The following are the allowed values:
GET (default)POSTFORMThe HTTP method encodes the parameters in the URL. Therefore the above params definition gets transformed
in the following URL:
https://example.com/3.0/mock-api?startDate=2016-01-20&types%5B0%5D=new&types%5B1%5D=active&types%5B2%5D=finished&filter%5Bquery%5D=q%3Duser%3AjohnDoe&filter%5Btags%5D%5Bfirst%5D=1&filter%5Btags%5D%5Bsecond%5D=0
or, in a more readable URLDecoded form:
https://example.com/3.0/mock-api?/mock-server/web/users/12/orders/2/tickets/000/comments?startDate=2016-01-20&types[0]=new&types[1]=active&types[2]=finished&filter[query]=q=user:johnDoe&filter[tags][first]=1&filter[tags][second]=0
The HTTP POST method sends the parameters in the request body. They are sent as a JSON object in the same form
as entered in the configuration. For the above defined params property, the request body would be:
{
"startDate": "2016-01-20",
"types": ["new", "active", "finished"],
"filter": {
"query": "q=user:johnDoe",
"tags": {
"first": true,
"second": false
}
}
}With the POST method, also pagination parameters are sent in the request body, and not as URL query parameters. If your API requires pagination parameters in the query, use the GET method or Custom Python component instead.
This method does not support function evaluation when param is a list of objects.
Also, the Content-Type: application/json HTTP header is added to the request.
See our examples.
The FORM method type sends the request the same way the HTTP POST method does. However,
the parameters from the param object are encoded as form data, mimicking the request being sent by
a web form. This method does not support nested objects in the param object.
For example, the following params field:
"params": {
"startDate": "2016-01-20",
"types": ["new", "active", "finished"]
}will be sent as the following POST request body:
startDate=2016-01-20&types%5B0%5D=new&types%5B1%5D=active&types%5B2%5D=finished
or, in a more readable URLDecoded form:
startDate=2016-01-20&types[0]=new&types[1]=active&types[2]=finished
Also, the Content-Type: application/x-www-form-urlencoded HTTP header will be added to the request.
The dataType parameter assigns a name to the object(s) obtained from the endpoint.
Setting it is optional. If not set, a name will be generated automatically from the endpoint
value and parent jobs.
Data types are used in mappings and for naming output tables within their output buckets.
Note that you can use the same dataType for multiple resources, provided that the result objects may
be merged into a single one. This can be used,
for example, in a situation where two API endpoints return the same resource:
"jobs": [
{
"endpoint": "solved-tickets/",
"dataType": "tickets"
},
{
"endpoint": "unsolved-tickets/",
"dataType": "tickets"
}
]In the above case, only a single tickets table will be produced in the output bucket. It
will contain records from both API endpoints.
The dataField parameter is used to determine what part of the API response will be
extracted. The following rules apply by default:
dataField is configured.Apart from cases where required, the dataField configuration may also be set to override the
above default behaviour. The dataField parameter contains a
dot separated path to the response property you want to
extract. The dataField parameter may be written in two ways — either as a simple string or
as an object with the path property. For instance, these two configurations are equivalent:
"jobs": [
{
"endpoint": "solved-tickets/",
"dataField": "tickets"
}
] "jobs": [
{
"endpoint": "solved-tickets/",
"dataField": {
"path": "tickets"
}
}
]The path to the response property is by default expected to be dot separated. That is — a path
members.active refers to the property active nested inside the property members. If you need to refer to a
property containing a dot, you have to change the data field path delimiter to some other character. This can be
done using the delimiter property:
"jobs": [
{
"endpoint": "solved-tickets/",
"dataField": {
"path": "members.active",
"delimiter": "|"
}
}
]The above configuration refers to the property named members.active. To refer to the property items nested
inside the property members.active you have to use:
"jobs": [
{
"endpoint": "solved-tickets/",
"dataField": {
"path": "members.active|items",
"delimiter": "|"
}
}
]The delimiter character is completely arbitrary but must be something that is not used in the property names in the response.
See example [EX120].
The responseFilter option allows you to skip parts of the API response from processing. This can
be useful in these cases:
The value of the responseFilter property is either a path to a property in the response, or
an array of such paths. The path is dot-separated unless set otherwise in the responseFilterDelimiter configuration.
If you want to refer to the items of an array, use [] — see an example below.
The same result can be achieved using forceType parameter in
column mapping.
The children configuration allows you to retrieve sub-resources of the processes API resource.
These child jobs (nested jobs) are executed for each object retrieved from the
parent response. The definition of child jobs is the same as the definition of parent jobs,
except for placeholders. The children configuration is described in a
separate article.
The scroller parameter assigns a predefined scroller when
multiple pagination is used,
and is pointless when the multiple pagination method is not used.
If scroller is not set, the pagination method specified in the api configuration
is used. If there is no pagination method specified, the job has no pagination.
The following examples show how simple objects are extracted from different objects.
To extract data from the following API response:
[
{
"id": 123,
"name": "John Doe",
"married": true
},
{
"id": 234,
"name": "Jane Doe",
"married": false
}
]do not set the dataField parameter at all, or set it to an empty string ("dataField": "").
The following table will be extracted:
| id | name | married |
|---|---|---|
| 123 | John Doe | 1 |
| 234 | Jane Doe |
Notice that the boolean value married is converted
to 1 when true and left empty otherwise (false and null).
See example [EX001].
To extract data from the following API response:
{
"users": [
{
"id": 123,
"name": "John Doe"
},
{
"id": 234,
"name": "Jane Doe"
}
]
}do not set the dataField parameter at all, or set it to an empty string or to the value users
("dataField": "" or "dataField": "users").
The following table will be extracted:
| id | name |
|---|---|
| 123 | John Doe |
| 234 | Jane Doe |
See example [EX002].
To extract data from the following API response:
{
"users": [
{
"id": 123,
"name": "John Doe"
},
{
"id": 234,
"name": "Jane Doe"
}
],
"userTypes": [
"member",
"guest"
]
}set the dataField parameter to the value users ("dataField": "users"). Not setting the
dataField parameter would result in an error
(More than one array found in the response! Use the 'dataField' parameter to specify a key to the data array.).
The following table will be extracted:
| id | name |
|---|---|
| 123 | John Doe |
| 234 | Jane Doe |
See example [EX003].
To extract data from the following API response:
{
"members": {
"active": [
{
"id": 123,
"name": "John Doe"
},
{
"id": 234,
"name": "Jane Doe"
}
],
"inactive": [
{
"id": 345,
"name": "Jimmy Doe"
}
]
}
}set the dataField parameter to the value members.active ("dataField": "members.active"). Not setting the
dataField parameter would result in a warning (No data array found in the response!).
The following table will be extracted:
| id | name |
|---|---|
| 123 | John Doe |
| 234 | Jane Doe |
See example [EX004].
To extract both active and inactive arrays from the above API response, use two jobs:
{
"members": {
"active": [
{
"id": 123,
"name": "John Doe"
},
{
"id": 234,
"name": "Jane Doe"
}
],
"inactive": [
{
"id": 345,
"name": "Jimmy Doe"
}
]
}
}In the first job, set the dataField parameter to the value members.active. In the second job, set
the dataField parameter to the value members.inactive. The entire jobs section will look like this:
"jobs": [
{
"endpoint": "users-5",
"dataField": "members.active"
},
{
"endpoint": "users-5",
"dataField": "members.inactive"
}
]The following table will be extracted:
| id | name |
|---|---|
| 123 | John Doe |
| 234 | Jane Doe |
| 345 | Jimmy Doe |
See example [EX005].
You may encounter an API response like this:
{
"id": 123,
"name": "John Doe"
}You have to set the dataField parameter to the value . ("dataField": "."). Not setting the
dataField parameter would result in a warning (No data array found in the response!) and no data extracted.
The following table will be extracted:
| id | name |
|---|---|
| 123 | John Doe |
See example [EX006].
You may encounter an API response like this:
{
"user": {
"id": 123,
"name": "John Doe"
}
}Set the dataField parameter to the value user ("dataField": "user"). Not setting the
dataField parameter would result in a warning (No data array found in the response!) and no data extracted.
The following table will be extracted:
| id | name |
|---|---|
| 123 | John Doe |
See example [EX007].
You may encounter an API response like this:
{
"member": {
"history": [
{
"id": 123,
"name": "John Doe",
"version": 2
},
{
"id": 123,
"name": "Jonh Doe",
"version": 1
}
]
}
}To extract the first item from the history array, set the dataField parameter to the value member.history.0.
The following table will be extracted:
| id | name | version |
|---|---|---|
| 123 | John Doe | 2 |
See example [EX008].
You may encounter an API response like this:
{
"members": [
{
"type": "active",
"items": [
{
"id": 123,
"name": "John Doe"
},
{
"id": 234,
"name": "Jane Doe"
}
]
},
{
"type": "inactive",
"items": [
{
"id": 345,
"name": "Jimmy Doe"
}
]
}
]
}To extract the items from the members array, set the dataField parameter to the value members.0.items.
The following table will be extracted:
| id | name |
|---|---|
| 123 | John Doe |
| 234 | Jane Doe |
See example [EX009].
The above examples show how simple objects are extracted from different objects. Generic Extractor can also extract objects with non-scalar properties. The default JSON to CSV mapping flattens nested objects and produces secondary tables from nested arrays.
You may encounter an API response like this:
{
"members": [
{
"id": 123,
"name": "John Doe",
"tags": ["active", "admin"]
},
{
"id": 234,
"name": "Jane Doe",
"tags": ["active"]
}
]
}To extract the members array, set the dataField parameter to the value members or to an empty value.
The following tables will be extracted:
Users:
| id | name | tags |
|---|---|---|
| 123 | John Doe | users-10_3ca896f39b257a4f2d2f4784e7680c87 |
| 234 | Jane Doe | users-10_a15f4be71e739e1b2ea32bd4209d756e |
Tags:
| data | JSON_parentId |
|---|---|
| active | users-10_3ca896f39b257a4f2d2f4784e7680c87 |
| admin | users-10_3ca896f39b257a4f2d2f4784e7680c87 |
| active | users-10_a15f4be71e739e1b2ea32bd4209d756e |
Each member contains a nested array of tags that cannot be serialized into a single
database (CSV) column. Therefore the [JSON-CSV mapper] creates another table for the
tags with tag values. It also generates a unique member identifier, puts it
in the tags column and uses it in a new JSON_parentId column. This
way, the 1:N relationship between Members and Tags is represented.
See example [EX010].
If the response contains an array nested in an array, Generic extractor is not able to process it. In such case the, contents of the array are extracted as a JSON encoded string. See example [EX130]. You will also see a warning in the extraction events, e.g.:
Converting nested array 'rows.[]' to JSON string.
You may encounter the following API response:
{
"members": [
{
"id": 123,
"name": "John Doe",
"tags": "active"
},
{
"id": 234,
"name": "Jane Doe",
"tags": ["active", "admin"]
}
]
}When you extract the members array (set the dataField parameter to the value members or to an empty value),
the following tables will be extracted:
Users:
| id | name | tags |
|---|---|---|
| 123 | John Doe | users-17_c6f3e32262682b6efd6c85ad97d2d503 |
| 234 | Jane Doe | users-17_92df9d5b9af8821316172285b196318e |
Tags:
| data | JSON_parentId |
|---|---|
| active | users-17_c6f3e32262682b6efd6c85ad97d2d503 |
| active | users-17_92df9d5b9af8821316172285b196318e |
| admin | users-17_92df9d5b9af8821316172285b196318e |
As you can see, the scalar value tags in the first member object was automatically upgraded to
a single-element array because the tags property is an array elsewhere (second member) in the response.
See example [EX017].
You may encounter an API response like this:
{
"members": [
{
"id": 123,
"name": "John Doe",
"address": {
"street": "Elm Street",
"city": "New York"
}
},
{
"id": 234,
"name": "Jane Doe",
"address": {
"street": "Bates Street",
"city": "Chicago",
"state": "USA"
}
}
]
}To extract the members array, set the dataField parameter to the value members or to an empty value.
The following table will be extracted:
| id | name | address_street | address_city | address_state |
|---|---|---|---|---|
| 123 | John Doe | Elm Street | New York | |
| 234 | Jane Doe | Bates Street | Chicago | USA |
The properties of nested address objects are automatically flattened into the parent object. Therefore
the address.city property is flattened into the address_city column.
See example [EX011].
The above two examples show the basic principles of the JSON-CSV mapping used by Generic Extractor. They are applied to all child properties. So, when you encounter an API response like this:
{
"members": [
{
"id": 123,
"name": "John Doe",
"contacts": [
{
"type": "address",
"properties": {
"street": "Elm Street",
"city": "New York"
}
},
{
"type": "email",
"primary": true,
"properties": {
"address": "john.doe@example.com"
}
}
]
},
{
"id": 234,
"name": "Jane Doe",
"contacts": [
{
"type": "address",
"primary": false,
"properties": {
"street": "Bates Street",
"city": "Chicago",
"state": "USA"
}
},
{
"type": "phone",
"primary": true,
"properties": {
"number": "123 456 789"
}
}
]
}
]
}the following two tables will be extracted:
Users:
| id | name | contacts |
|---|---|---|
| 123 | John Doe | users-12_8505d6585e28c00d461ba64f085d1055 |
| 234 | Jane Doe | users-12_ec8c48efecb10334072f03a860113ea2 |
Contacts:
| type | properties_street | properties_city | properties_address | properties_state | properties_number | primary | JSON_parentId |
|---|---|---|---|---|---|---|---|
| address | Elm Street | New York | users-12_8505d6585e28c00d461ba64f085d1055 | ||||
| john.doe@example.com | 1 | users-12_8505d6585e28c00d461ba64f085d1055 | |||||
| address | Bates Street | Chicago | USA | users-12_ec8c48efecb10334072f03a860113ea2 | |||
| phone | 123 456 789 | 1 | users-12_ec8c48efecb10334072f03a860113ea2 |
The obtained table is rather sparse because the properties of the nested contacts
objects do not match exactly. For example, the properties_number column was created
as a result of flattening the properties.number object that is contained in the response
only once. Therefore the column has a single value.
The rows in the Contacts table are again linked through an
auto-generated key to the parent Users table. Also notice that the
Boolean value
primary is converted to 1 when true and left empty otherwise.
See example [EX012].
If you have an API response like this:
{
"members": [
{
"id": 123,
"name": "John Doe",
"tags": ["active", "admin"]
},
{
"id": 234,
"name": "Jane Doe",
"tags": ["active"]
}
]
}and extract the members array with the
default settings, two tables will be
produced. If you set the response filter to "responseFilter": "tags", then the tags property of the members
items will not be processed and will be stored as a serialized
JSON string. The following table will be extracted:
| id | name | tags |
|---|---|---|
| 123 | John Doe | [“active”,”admin”] |
| 234 | Jane Doe | [“active”] |
The tags column contains serialized JSON fragments which can be processed by
the JSON capable database (e.g., Snowflake).
See example [EX013].
If you have the following API response:
{
"members": [
{
"id": 123,
"name": "John Doe",
"contacts": [
{
"type": "address",
"properties": {
"street": "Elm Street",
"city": "New York"
}
},
{
"type": "email",
"primary": true,
"properties": {
"address": "john.doe@example.com"
}
}
]
},
{
"id": 234,
"name": "Jane Doe",
"contacts": [
{
"type": "address",
"primary": false,
"properties": {
"street": "Bates Street",
"city": "Chicago",
"state": "USA"
}
},
{
"type": "phone",
"primary": true,
"properties": {
"number": "123 456 789"
}
}
]
}
]
}and extract the members array with the
default settings,
two tables will be produced and the properties object will be flattened into a sparse table.
To avoid that, set the response filter to "responseFilter": "contacts[].properties". This will
leave the properties child of the contacts array of the members array unprocessed.
The following two tables will be produced:
Users:
| id | name | contacts |
|---|---|---|
| 123 | John Doe | users-12_0b9650e0f68b0c6738843d5b4ff0a961 |
| 234 | Jane Doe | users-12_cf76fb6794380244946d2bc4fa3aa04a |
Contacts:
| type | properties | primary | JSON_parentId |
|---|---|---|---|
| address | {“street”:”Elm Street”,”city”:”New York”} | users-12_0b9650e0f68b0c6738843d5b4ff0a961 | |
| {“address”:”john.doe@example.com”} | 1 | users-12_0b9650e0f68b0c6738843d5b4ff0a961 | |
| address | {“street”:”Bates Street”,”city”:”Chicago”,”state”:”USA”} | users-12_cf76fb6794380244946d2bc4fa3aa04a | |
| phone | {“number”:”123 456 789”} | 1 | users-12_cf76fb6794380244946d2bc4fa3aa04a |
The properties column contains JSON serialized objects. When setting the responseFilter parameter,
remember to use the correct path to the properties you wish to skip from processing. That is to say that
setting responseFilter to
contacts skips the entire contacts property and does not create the Contacts: table at all.properties does nothing because there is no properties property under the members array items.contacts.properties does nothing because there is no properties property under the contacts array.The last two options might seem inconsistent. This is because the responseFilter path is set relative to
the objects of the processed array (not to the array itself, not to the JSON root). Thus the only correct
setting in this case is contacts[].properties.
See example [EX014].
If you have an API response like this:
[
{
"id": 123,
"name": "foo",
"color": "green"
},
{
"id": 321,
"name": "bar",
"color": {
"items": ["red", "blue"]
}
}
]you will receive an error similar to Error parsing response JSON: Unhandled type change from "scalar" to "object" in 'users-16.color'. This means that the objects returned in the response are incompatible and cannot
be merged into a table by Generic Extractor.
To avoid the error and still retrieve the data,
use the responseFilter to skip the color property. When you set "responseFilter": "color", you
will obtain the following table:
| id | name | color |
|---|---|---|
| 123 | foo | “green” |
| 321 | bar | {“items”:[“red”,”blue”]} |
See example [EX016].
You might have a complex API response like this:
{
"members": [
{
"id": 123,
"name": "John Doe",
"tags": {
"items": ["active", "admin"]
},
"contacts": [
{
"type": "address",
"properties": {
"street": "Elm Street",
"city": "New York"
}
},
{
"type": "email",
"primary": true,
"properties": "john.doe@example.com"
}
]
},
{
"id": 234,
"name": "Jane Doe",
"tags": "none",
"contacts": [
{
"type": "address",
"primary": false,
"properties": {
"street": "Bates Street",
"city": "Chicago",
"state": "USA"
}
},
{
"type": "phone",
"primary": true,
"properties": "123 456 789"
}
]
}
]
}Because both tags and contacts.properties properties are inconsistent (sometimes using an object,
sometimes using a scalar value), you have to define multiple response filters. This can be done by using
an array of paths:
"responseFilter": [
"contacts[].properties",
"tags"
]Then you will obtain the following tables:
Users:
| id | name | tags | contacts |
|---|---|---|---|
| 123 | John Doe | {“items”:[“active”,”admin”]} | users-18_19318ac6aa76a92c8d90e603f69e02f6 |
| 234 | Jane Doe | “none” | users-18_3fdf6b12b11f85cb4eb9c34ce0322ecd |
Contacts:
| type | properties | primary | JSON_parentId |
|---|---|---|---|
| address | {“street”:”Elm Street”,”city”:”New York”} | users-18_19318ac6aa76a92c8d90e603f69e02f6 | |
| “john.doe@example.com” | 1 | users-18_19318ac6aa76a92c8d90e603f69e02f6 | |
| address | {“street”:”Bates Street”,”city”:”Chicago”,”state”:”USA”} | users-18_3fdf6b12b11f85cb4eb9c34ce0322ecd | |
| phone | “123 456 789” | 1 | users-18_3fdf6b12b11f85cb4eb9c34ce0322ecd |
See example [EX018].
The default delimiter used for referencing nested properties is a dot .. If the names of
properties in the API response contain dots, it might be necessary to change the default delimiter.
The API response might look like this:
{
"members": [
{
"id": 123,
"name": "John Doe",
"primary.address": {
"street": "Elm Street",
"city": "New York"
},
"secondary.address": {
"street": "Cemetery Ridge",
"city": "New York"
}
},
{
"id": 234,
"name": "Jane Doe",
"primary.address": {
"street": " Blossom Avenue",
"state": "U.K."
},
"secondary.address": {
"street": "1313 Webfoot Walk",
"city": "Duckburg",
"state": "Calisota"
}
}
]
}If you want to filter the secondary.address field, you cannot set the responseFilter setting to
secondary.address because it would be interpreted as an address property of the secondary property.
If you set "responseFilter": "secondary.address, the extraction will work as if you did not set the
filter at all; it will be filtering the non-existent address property.
For the filter to work correctly, set the responseFilterDelimiter to an arbitrary character not
used in the response property names. The following would be a valid configuration:
{
...
"responseFilter": "secondary.address",
"responseFilterDelimiter": "#"
}It might by tempting to change the response filter to secondary#address. However, this would be
incorrect as it would again mean that we are referring to an address property nested in the secondary
object. With the above settings you will obtain a table like this:
| id | name | primary_address_street | primary_address_city | primary_address_state | secondary_address |
|---|---|---|---|---|---|
| 123 | John Doe | Elm Street | New York | {“street”:”Cemetery Ridge”,”city”:”New York”} | |
| 234 | Jane Doe | Blossom Avenue | U.K. | {“street”:”1313 Webfoot Walk”,”city”:”Duckburg”,”state”:”Calisota”} |
See example [EX019].
For the custom set delimiter in the response filter, you need to have a complex API response. For example:
{
"members": [
{
"id": 123,
"name": "John Doe",
"primary.address": {
"street": "Elm Street",
"city": "New York",
"tags": []
},
"secondary.address": {
"street": "Cemetery Ridge",
"city": "New York",
"tags": ["work", "usaddress"]
}
},
{
"id": 234,
"name": "Jane Doe",
"primary.address": {
"street": " Blossom Avenue",
"state": "U.K.",
"tags": ["home"]
},
"secondary.address": {
"street": "1313 Webfoot Walk",
"city": "Duckburg",
"state": "Calisota"
}
}
]
}To filter out all the tags properties, you need to set the following:
{
"responseFilter": [
"secondary.address#tags",
"primary.address#tags"
],
"responseFilterDelimiter": "#"
}You will obtain a table similar to the one below:
| id | name | primary_address_street | primary_address_city | primary_address_tags | primary_address_state | secondary_address_street | secondary_address_city | secondary_address_tags | secondary_address_state |
| 123 | John Doe | Elm Street | New York | Cemetery Ridge | New York | [“work”,”usaddress”] | |||
| 234 | Jane Doe | Blossom Avenue | [“home”] | U.K. | 1313 Webfoot Walk | Duckburg | Calisota |
See example [EX020].
Assume that you have an API with the endpoint users which requires the
GET parameter type to specify which
users are to be retrieved. For example, a request to /users?type=active returns a response
with active users:
[
{
"id": 123,
"name": "John Doe",
"married": true
},
{
"id": 234,
"name": "Jane Doe",
"married": false
}
]To retrieve inactive users, send a request to /users?type=inactive. This
can be solved using the following jobs configuration:
"jobs": [
{
"endpoint": "users",
"params": {
"type": "active"
}
},
{
"endpoint": "users",
"params": {
"type": "inactive"
}
}
]The params configuration option specifies the
parameters to be sent to the API. Therefore the type property is the name defined by the API itself.
The above configuration produces the following table:
| id | name | married |
|---|---|---|
| 123 | John Doe | 1 |
| 234 | Jane Doe | |
| 345 | Jimmy Doe |
See example [EX033] or example [EX136] which is also using functions.
You may encounter an API which is not exactly RESTful
and has to be queried using the HTTP POST method.
Assume that you have an API with the endpoint getUsers that expects an empty HTTP POST request. The endpoint
then returns the following response:
[
{
"id": 123,
"name": "John Doe",
"married": true
},
{
"id": 234,
"name": "Jane Doe",
"married": false
}
]Generic Extractor can handle this too, using the method configuration:
"jobs": [
{
"endpoint": "getUsers",
"method": "POST"
}
]The above configuration produces the following table:
| id | name | married |
|---|---|---|
| 123 | John Doe | 1 |
| 234 | Jane Doe |
See example [EX034].
A not-exactly-RESTful API (see above) may require some JSON
parameters in the request. Let’s say you have the getUsers endpoint which requires an HTTP POST request with
the following body:
{
"filter": {
"type": "active"
},
"return": {
"fields": ["id", "name"]
}
}The request returns the following JSON:
[
{
"id": 123,
"name": "John Doe"
},
{
"id": 234,
"name": "Jane Doe"
}
]The above situation can be handled by passing the entire request JSON to the
params configuration.
"jobs": [
{
"endpoint": "getUsers",
"method": "POST",
"params": {
"filter": {
"type": "active"
},
"return": {
"fields": ["id", "name"]
}
}
}
]The above configuration produces the following table:
| id | name |
|---|---|
| 123 | John Doe |
| 234 | Jane Doe |
See example [EX035].
Sometimes even the HTTP GET requests require complex parameters. Suppose the API
endpoint /users requires the filter and return definitions. The API may describe
the configuration in many different ways, for instance:
| filter | name | example value |
|---|---|---|
| Name of property for filtering | field | type |
| Filtering operator | operator | equal |
| Value to use in filter | value | active |
| return | name | example value |
|---|---|---|
| Names of properties to return in response | fields | id,name |
In the HTTP protocol, this would be encoded in the following query string:
filter[field]=type&filter[operator]=equal&filter[value]=active&return[fields][0]=id&return[fields][1]=name
or, in the URL Encoded form:
filter%5Bfield%5D%3Dtype%26filter%5Boperator%5D%3Dequal%26filter%5Bvalue%5D%3Dactive%26return%5Bfields%5D%5B0%5D%3Did%26return%5Bfields%5D%5B1%5D%3Dname
The following JSON is returned:
[
{
"id": 123,
"name": "John Doe"
},
{
"id": 234,
"name": "Jane Doe"
}
]The above situation can be handled by encoding the parameters in a JSON into the
params configuration.
"jobs": [
{
"endpoint": "getUsers",
"method": "POST",
"params": {
"filter": {
"field": "type",
"operator": "equal",
"value": "active"
},
"return": {
"fields": ["id", "name"]
}
}
}
]The above configuration produces the following table:
| id | name |
|---|---|
| 123 | John Doe |
| 234 | Jane Doe |
See example [EX036].