If new to Generic Extractor, learn about jobs in our tutorial first. Use Parameter Map to help you navigate among various configuration options.
Child jobs allow you to iterate/traverse over sub-resources of an API resource. Because child jobs may contain other child jobs, you may query for sub-sub-resources in a virtually unlimited depth.
For instance, when downloading a list of users, you can download details of each user or a list of orders for each user. See the Generic Extractor tutorial for a basic example of using child jobs.
Apart from two additional fields, placeholders
and recursionFilter
, configuring a child job is no different than
configuring any other job.
In a child job, the endpoint
configuration must contain a placeholder enclosed in curly braces {}
.
For example, the following endpoint defines the placeholder user-id:
The placeholder name is rather arbitrary (it should not contain any special characters though). To assign it
a value, use the placeholders
configuration. It is an object whose properties are placeholder names. The value
of each placeholders
object property is a property path in the parent job response.
The placeholder in the child endpoint
will be replaced by the value of that parent property. The property
path is configured relative to the extracted object (see an example). The child
endpoint
is configured relative to the api.baseUrl
configuration,
not relative to the parent endpoint.
The following configuration:
means that Generic Extractor sends as many requests to the /user/XXX
endpoint as there
are result objects in the parent API response. The XXX
will be replaced by the userId
value
of each individual response. Placeholders must be used in child jobs so that each child job sends a different API request.
Note: It is technically possible to define a child job without using placeholders
configuration
or without having a placeholder in the endpoint
. But then all the child requests would be the same and
that is usually not what you intend to do.
Optionally, the placeholder name may be prefixed by a nesting level. Nesting allows you to
refer to properties in other objects than the direct parent. The level is written as the
placeholder name prefix, delimited by a colon :
. For example, 2:user-id
.
The default level is 1, meaning that the placeholder user-id
is equivalent to 1:user-id
and
that the property path will be searched in the direct parent of the child job. The level
is counted from the child ‘upwards’. Therefore a placeholder 2:user-id
means that
the property path will be searched in the parent of the child parent (two levels up).
See the corresponding examples.
The configuration option recursionFilter
allows you to skip some child jobs. This can be
useful in these cases:
The responseFilter
configuration contains a string expression with a filter condition composed of the following:
<
, >
, <=
, >=
, ==
(equal), !=
(not equal), ~~
(like), !~
(unlike)|
(or), &
(and); optionally, they may be used to join multiple conditions.An example response filter may be type!=employee
or product.value>150
. To test for an empty value (null
, false
, ""
, 0
) do not use any value – e.g. type!=
filter matches an empty value.
Important: The expression is whitespace sensitive. Therefore type != employee
filters the "type "
property
to not contain the value " employee"
(which is probably not what you intended to do). String comparisons are always
case sensitive.
This section contains a number of examples using child jobs.
Let’s say that you have an API with two endpoints:
/users/
— Returns a list of users./user/?
— Returns user details with a given user ID.The users
endpoint returns a response like this:
The user/123
endpoint returns a response like this:
Now use the following configuration, retrieving the user list and user details for each user:
The jobs
section defines a single job for the users
resource. This job has child jobs for
the users/{user-id}
resource. The user-id
placeholder in the endpoint URL is
replaced by the value of the id
property of each user in the parent job response. This means that
Generic Extractor makes three API calls:
users
users/123
users/234
The dataField
is set to a dot to retrieve the
entire response as a single object. Running Generic Extractor produces the following tables:
users:
id | name |
---|---|
123 | John Doe |
234 | Jane Doe |
user__user-id:
id | name | address_city | address_country | address_street | parent_id |
---|---|---|---|---|---|
123 | John Doe | London | UK | Whitehaven Mansions | 123 |
234 | Jane Doe | St Mary Mead | UK | High Street | 234 |
Notice that the table representing child resources contains all the responses merged into a single table; the usual merging rules apply.
Also notice that a new column — parent_id
was added, containing the placeholder value used
to retrieve the resource. The parent_id
column is not always named parent_id
.
Its name is created by joining the parent_
prefix to the placeholder path.
To create a friendly name for the table, it is good to use the dataType property (see the next example). The auto-generated name is rather ugly.
See example [EX021].
To avoid automatic table names, it is advisable to always use the dataType
property for
child jobs:
In the above configuration, dataType
is set to user-detail
, hence you will obtain the
following tables:
users:
id | name |
---|---|
123 | John Doe |
234 | Jane Doe |
user-detail:
id | name | address_city | address_country | address_street | parent_id |
---|---|---|---|---|---|
123 | John Doe | London | UK | Whitehaven Mansions | 123 |
234 | Jane Doe | St Mary Mead | UK | High Street | 234 |
See example [EX022].
It is also possible that the main job returns objects which contain direct references to the children:
The following configuration is the same as in the previous example:
The child jobs will iterate both over the returned array of objects and
the array of each children
. Therefore the following tables will be extracted:
users:
id | name |
---|---|
123 | John Doe |
234 | Jane Doe |
user-child:
id | name | address_city | address_country | address_street | parent_children |
---|---|---|---|---|---|
a1 | John Doe | London | UK | Whitehaven Mansions | a1 |
a2 | Jane Doe | St Mary Mead | UK | High Street | a2 |
a3 | Jimmy Doe | Scaryville | Nowhere | Cemetery Lane | a3 |
See example [EX135].
If the placeholder value is nested within the response object, you can use dot notation to access child properties of the response object. For instance, if the parent response with a list of users returns a response similar to this:
you have to modify the placeholders
definition:
Setting the placeholder to "user-id": "user-info.id"
means that the user-id
placeholder
will be replaced by the value of the id
property inside the user-info
object in the parent response.
If you fail to set a correct path for the placeholder, you will receive an error:
`No value found for user-id in the parent result. (level: 1)`
When you set the correct path, you will get the following tables:
users:
name | user-info_id | user-info_active |
---|---|---|
John Doe | 123 | 1 |
Jane Doe | 234 |
user detail:
id | name | address_city | address_country | address_street | parent_user-info_id |
---|---|---|---|---|---|
123 | John Doe | London | UK | Whitehaven Mansions | 123 |
234 | Jane Doe | St Mary Mead | UK | High Street | 234 |
Notice that the parent reference column name is the concatenation of the parent
prefix and
user-info_id
placeholder path (with special characters replaced by the underscore _
).
See example [EX023].
The placeholder path is configured relative to the extracted object. Assume that the parent endpoint returns a complicated response like this:
The following job definition extracts the active-users
array together with the details for each user:
Notice that the placeholder path remains set to user-info.id
because it is relative to
the parent object, which itself is located at the path active-users.items
. This
may be confusing because the endpoint property in that child job is set relative to the
api.baseUrl
and not to the parent URL.
See example [EX024].
Because a new column is added to the table representing child properties, it is possible that you run into a naming conflict. That is, if the child response with user details looks like this:
and you use the following job configuration:
the output for the child job will contain the column parent_id
. At the same time, Generic Extractor will attempt
to create the column parent_id
with the placeholder value, overwriting the original column. That column will be lost.
See example [EX025].
By default, the placeholder value is taken from the object retrieved in the parent job. As long as the child jobs are nested only one level deep, there is no other option anyway. Let’s see what happens with a deeper nesting.
Assume that you have an API with the following endpoints:
/users/
— Returns a list of users./user/?
— Returns user details with given user ID./user/?/orders
— Returns a list of user orders./user/?/orders/?
— Returns order detail with given user and order ID.The users
endpoint returns a response like this:
The user/123
endpoint returns a response like this:
The user/123/orders
endpoint returns a response like this:
The user/123/order/1234
endpoint returns a response like this:
Then you can create a job configuration with three nested children to retrieve all the API resources:
The jobs
configuration retrieves all users from the users
API endpoint. The first child retrieves
details for each user (from user/?
endpoint) and stores them in the user-detail
table. The second
child retrieves each user orders (from user/?/orders
endpoint) and stores them in the orders
table.
Finally, the deepest nested child returns details of each order (for each user) from the
user/?/order/?
endpoint and stores them in the order-detail
table. Therefore the following four tables
will be produced:
users:
userId | name |
---|---|
123 | John Doe |
234 | Jane Doe |
user-detail:
userId | name | description | parent_userId |
---|---|---|---|
123 | John Doe | Good ol’ father John | 123 |
234 | Jane Doe | Good young mommy Jenny | 234 |
orders:
orderId | price | parent_userId |
---|---|---|
1234 | $12 | 123 |
1345 | $1212 | 123 |
2345 | $42 | 234 |
order-detail:
orderId | price | timestamp | state | parent_userId | parent_orderId |
---|---|---|---|---|---|
1234 | $12 | 2017-05-06 8:21:45 | cancelled | 123 | 1234 |
1345 | $1212 | 2017-12-24 12:30:53 | delivered | 123 | 1345 |
2345 | $42 | 2017-01-12 2:12:43 | cancelled | 234 | 2345 |
Notice that each table contains additional columns with the placeholder property path prefixed with parent_
.
See example [EX026].
Because the required user and order IDs are present in multiple requests (in the list and in the detail), there are multiple ways how the jobs may be configured. For example, the following configuration produces the exact same result as the above configuration:
Even though the above configuration is less explicit and not really recommended, it is still acceptable.
Placeholders are defined globally, which means that the second nested child job to user/{user-id}/orders
does
not define any because it relies on those defined by its parent job (which happen to be correct). Also the
deepest child defines only the order-id
placeholder because, again, the user-id
placeholder was defined in
some of its parents.
Although the placeholders are defined globally, the ones defined in child jobs override the ones in the parent
jobs. For example, in the following (probably very incorrect) configuration, the 1:user-id
placeholder in
the deepest child will really contain the orderId
value.
See example [EX027].
Let’s look at how to retrieve more nested API resources:
The above configuration assumes that all API resources simply have an id
property (unlike in the
previous example, where the users had userId
and the orders had orderId
). This makes the configuration look
rather cryptic. Read the deepest child placeholder configuration
"5:user-id": "id",
"3:order-id": "id",
"1:item-id": "id"
as:
id
property from the response and put it in place of the user-id
in the endpoint URL.id
property from the response and put it in place of the order-id
in the endpoint URL.id
property from the response and put it in place of the item-id
in the endpoint URL.Important: Once you run into using placeholders with the same property path, their order becomes important.
This is because the property path is used as the name of an additional column in the extracted table. Because
the property path is id
in all cases, it will lead to the column parent_id
in all cases, and therefore it
will get overwritten. With the above configuration, the following item-detail
table will be produced:
id | code | name | parent_id |
---|---|---|---|
345 | PA10 | Pick Axe | 345 |
456 | TB20 | Tooth Brush | 456 |
where the parent_id
column refers to the 1:item-id
placeholder. If you used this placeholder configuration:
you would obtain the following item-detail
table:
id | code | name | parent_id |
---|---|---|---|
345 | PA10 | Pick Axe | 123 |
456 | TB20 | Tooth Brush | 123 |
where the parent_id
column refers the 5:user-id
placeholder.
See example [EX028].
Suppose now that the endpoint /users
returns a more complicated response:
The API also has an endpoint /user/{userId}
which returns details about a specific user. If
you want to obtain all the fields from the above response and also the details about each user,
you have to create a rather tricky configuration. Even though you may be tempted to start with
the following job configuration:
this is not possible because the root of the response in the members
field is not
an array and therefore it cannot create child jobs. For that reason the job configuration must start with:
The members.items
is an array which now can be used as a source for child jobs:
Notice that the placeholder path (user-info.id
) is entered relative to the dataField
setting
(members.items
). Now, to extract the other fields from the /users
response (other than member.items
),
create another job:
Note that the dataType
must be different than in the first job because the structure of the response is different.
You will receive a number of tables:
users (first job):
name | user-info_id | user-info_active |
---|---|---|
John Doe | 123 | 1 |
Jane Doe | 234 |
user-detail (first job children):
id | name | address_city | address_country | address_street | parent_user-info_id |
---|---|---|---|---|---|
123 | John Doe | London | UK | Whitehaven Mansions | 123 |
234 | Jane Doe | St Mary Mead | UK | High Street | 234 |
users-2 (second job):
members_description | members_tags | members_count | members_items |
---|---|---|---|
Active System Members | users-2.members_c6eb0647a7f2fb2cbe02ba62d56e3312 | 2 | users-2.members_c6eb0647a7f2fb2cbe02ba62d56e3312 |
users-2_members_items (second job, generated from array node items
):
name | user-info_id | user-info_active | JSON_parentId |
---|---|---|---|
John Doe | 123 | 1 | users-2.members_c6eb0647a7f2fb2cbe02ba62d56e3312 |
Jane Doe | 234 | users-2.members_c6eb0647a7f2fb2cbe02ba62d56e3312 |
The users-2\_members\_items
contains the same results as the users
table, but it also contains the
JSON\_parentId
column which allows you to link the user list to the list description in the users-2
table.
This makes the response in the users
table quite useless, but the job is still required to generate
the child jobs to obtain the user-detail
table. See example [EX106].
Let’s assume that you have an API which has two endpoints:
users
— Returns a list of users.users/?
— Returns a user detail.The users
endpoint returns a response like this:
The user/123
endpoint returns a response like this:
A simple child filter can be then set up using the following jobs
configuration:
The recursionFilter
setting will cause Generic Extractor to query only the sub-resources for which the
filter evaluates to true. The filter property name type
refers to the parent response, but it
filters only the children. So, the following tables will be returned:
users:
id | name | role | type |
---|---|---|---|
123 | John Doe | parent | admin |
234 | Jane Doe | parent | administrator |
345 | Jimmy Doe | child | user |
456 | Janet Doe | child | user |
user-detail:
id | name | userRole | userType | description | parent_id |
---|---|---|---|---|---|
123 | John Doe | parent | admin | Father John | 123 |
234 | Jane Doe | parent | administrator | Mother Jane | 234 |
You can see from the above tables that the filter is applied to the child results only so that the details are retrieved only for the desired users.
See example [EX029].
Apart from the standard comparison operators, the recursive filter allows to use
a like comparison operator ~
. It expects that the value contains a placeholder %
,
which matches any number of characters. The following configuration:
filters out all child resources not containing the string min
in their parent type property.
The expression %min%
matches any string which contains any number of characters (including none)
before and after the string min
. The operator !~
is negative like, therefore the
following user-detail
table will be extracted:
id | name | userRole | userType | description | parent_id |
---|---|---|---|---|---|
345 | Jimmy Doe | child | user | Sonny Jimmy | 345 |
456 | Janet Doe | child | user | Missy Jennie | 456 |
See example [EX030].
Multiple filters can be combined using the
logical &
(and) and |
(or) operators.
For example, the following configuration retrieves details for users who have
both id < 400
and role = child
:
The following user-detail
will be produced:
id | name | userRole | userType | description | parent_id |
---|---|---|---|---|---|
345 | Jimmy Doe | child | user | Sonny Jimmy | 345 |
See example [EX031].
Although you can join a multiple filter expression with logical operators as in the above example, there is no support for parentheses. The following configuration combines multiple filters:
The precedence of logical operators is defined so that the first operator occurring in the
expression takes precedence over the second. That is to say that the condition role=parent|id>300&id<400
is interpreted as role=parent|(id>300&id<400)
because the operator |
takes precedence
over the &
operator. The condition id>300&id<400|role==parent
is interpreted as
id>300&(id<400|role==parent)
because the &
operator takes precedence over the |
operator.
With the above configuration, the following user-detail
table will be produced:
id | name | userRole | userType | description | parent_id |
---|---|---|---|---|---|
123 | John Doe | parent | admin | Father John | 123 |
234 | Jane Doe | parent | administrator | Mother Jane | 234 |
345 | Jimmy Doe | child | user | Sonny Jimmy | 345 |
Because the described system of operator precedence may lead to rather unusual behaviour, we recommend that you keep the recursive filter simple.
See example [EX032].