In the previous part of the tutorial, you extracted the content of a MailChimp campaign. Now it’s time to clean up the response.
This is the initial configuration:
It extracts MailChimp campaigns together with the send-checklist items and campaign content.
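As a rough, hypothetical sketch, the config part of such a configuration might look like this when written as a Python dict (json.dumps turns it into the equivalent JSON). The endpoint paths, the dataField values, and the placeholder name id are assumptions, not necessarily the exact values used in the previous part.

```python
import json

# Hypothetical "config" section. Endpoints, dataField values, and the
# placeholder name "id" are assumptions made for illustration.
config = {
    "jobs": [
        {
            "endpoint": "campaigns",
            "dataField": "campaigns",
            "children": [
                {
                    # One request per campaign; {id} is filled from the parent row
                    "endpoint": "campaigns/{id}/send-checklist",
                    "dataField": "items",
                    "placeholders": {"id": "id"},
                },
                {
                    # Campaign content; "." is assumed to select the whole response
                    "endpoint": "campaigns/{id}/content",
                    "dataField": ".",
                    "placeholders": {"id": "id"},
                },
            ],
        }
    ]
}

print(json.dumps({"config": config}, indent=2))
```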
However, there are some parts of the content resource you are probably not
interested in. Also, the table contains duplicates.
Technical note on duplicates: If you examine the job events, you will see that the
request GET /3.0/campaigns/f7ed43aaea/content?count=1&offset=0 was sent. In other words,
pagination applies to all API requests, so Generic Extractor also tries to page the
/content resource. Because paging of that resource stops only after it returns the same
response twice, this can lead to duplicates.
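The following toy model (not Generic Extractor's actual code) illustrates how paging a resource that ignores the offset can produce duplicates. It assumes that every page is stored before it is compared with the previous one; page_until_repeat and fetch_content are hypothetical names.

```python
def page_until_repeat(fetch, max_pages=10):
    """Toy model of paging that stops only when a response repeats.

    Assumption: each page is stored before it is compared with the previous
    one, so the repeated page also ends up in the output.
    """
    stored = []
    previous = None
    for offset in range(max_pages):  # hard cap so the toy can never loop forever
        response = fetch(offset)
        stored.append(response)
        if response == previous:
            break
        previous = response
    return stored


def fetch_content(offset):
    # Like /content, this hypothetical resource ignores the offset
    # and returns the same body every time.
    return {"plain_text": "Hello", "html": "<p>Hello</p>"}


rows = page_until_repeat(fetch_content)
print(len(rows))  # 2: the same content stored twice
```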
A mapping defines the shape of Generic Extractor outputs. It is stored in the
config.mappings property and is identified by the resource data type.
When a resource is assigned an internal dataType, a mapping can be created for it.
To use a mapping, first define the dataType property in the job.
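A minimal sketch of a content job with a dataType, again as a Python dict mirroring the JSON; the endpoint, dataField, and placeholder name are assumptions carried over for illustration, and the value content matches the data type name used in the rest of this section.

```python
import json

# Hypothetical child job for the /content resource. Endpoint, dataField,
# and the placeholder name are assumptions; "dataType" is the relevant part.
content_job = {
    "endpoint": "campaigns/{id}/content",
    "dataField": ".",
    "dataType": "content",  # arbitrary name; also used as the output table name
    "placeholders": {"id": "id"},
}

print(json.dumps(content_job, indent=2))
```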
The value of the dataType property is an arbitrary name. Apart from identifying
the resource type, it is also used as the output table name. If you run
the job, the content will be stored in a table named after the data type (here, content).
Each mapping item is identified by the property name of the resource and must contain
mapping.destination with the target column name in the output table. For example:
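A minimal sketch of such a mapping, written as a Python dict that serializes to the JSON stored under config.mappings, assuming the job's dataType is content:

```python
import json

# Mapping for the "content" data type: the key ("plain_text") is the property
# name in the resource; mapping.destination is the output column name.
mappings = {
    "content": {
        "plain_text": {
            "mapping": {"destination": "text"},
        }
    }
}

print(json.dumps({"mappings": mappings}, indent=2))
```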
The above mapping setting defines that for the content data type, the plain_text
property will be stored in the text column of the output table. No other
properties of the content resource will be imported. In other words, the mapping defines
all columns of the output table.
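The effect of a column mapping can be pictured with a small toy model. It is not Generic Extractor's implementation; apply_column_mapping and the sample response are hypothetical. Only mapped properties survive, renamed to their destination columns:

```python
def apply_column_mapping(record, type_mapping):
    """Toy model: keep only mapped properties, renamed to their destination columns."""
    return {
        spec["mapping"]["destination"]: record.get(key)
        for key, spec in type_mapping.items()
    }


content_mapping = {"plain_text": {"mapping": {"destination": "text"}}}
# Made-up sample of a content response with more properties than are mapped
response = {"plain_text": "Hello", "html": "<p>Hello</p>", "archive_html": "<html>Hello</html>"}

row = apply_column_mapping(response, content_mapping)
print(row)  # {'text': 'Hello'}
```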
For example, if you are also interested in the html version of the
campaign content, use a mapping like this:
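A sketch of a mapping that keeps both the plain text and the HTML version, again as a Python dict mirroring the JSON; the destination names text and html are arbitrary choices for illustration.

```python
import json

# Two column mappings for the "content" data type; the destination names
# ("text", "html") are arbitrary but must be valid column names.
mappings = {
    "content": {
        "plain_text": {"mapping": {"destination": "text"}},
        "html": {"mapping": {"destination": "html"}},
    }
}

print(json.dumps({"mappings": mappings}, indent=2))
```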
Note that the destination value is arbitrary, but it must be a valid column name.
The data type name (content) must match the value of the dataType property
as defined in one of the jobs.
The above mapping works, but it is missing the campaign id, so you would not be able to
match the content to the corresponding campaign records. Therefore, you need to extract the
campaign id from the context (i.e., from the job parameters). This can be done using a special
mapping type.
When the mapping type is set to user, use the special prefix parent_ to refer to
a placeholder defined in the job. You can create the following mapping:
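A sketch of such a mapping as a Python dict mirroring the JSON, assuming the content job defines a placeholder named id (so the user-type key is parent_id):

```python
import json

# "parent_id" with type "user" refers to the job placeholder named "id"
# (the key minus the "parent_" prefix); its value goes to column campaign_id.
mappings = {
    "content": {
        "parent_id": {
            "type": "user",
            "mapping": {"destination": "campaign_id"},
        },
        "plain_text": {"mapping": {"destination": "text"}},
    }
}

print(json.dumps({"mappings": mappings}, indent=2))
```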
The above configuration defines a mapping for the content data type.
In the result table named content, the column campaign_id will be created.
Its content will be the value of the id placeholder (the name of the mapping property
parent_id minus the parent_ prefix) in the respective job.
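How a user-type key resolves to a placeholder value can be pictured with another toy model, not Generic Extractor's code. apply_mapping is a hypothetical name, and the placeholder value reuses the campaign id from the job events mentioned earlier:

```python
def apply_mapping(record, placeholders, type_mapping):
    """Toy model of "column" and "user" mapping types as described in the text."""
    row = {}
    for key, spec in type_mapping.items():
        column = spec["mapping"]["destination"]
        if spec.get("type") == "user" and key.startswith("parent_"):
            # parent_id -> placeholder "id" of the job that produced this record
            row[column] = placeholders[key[len("parent_"):]]
        else:
            # Default (column) mapping: read the property from the record
            row[column] = record.get(key)
    return row


mapping = {
    "parent_id": {"type": "user", "mapping": {"destination": "campaign_id"}},
    "plain_text": {"mapping": {"destination": "text"}},
}
row = apply_mapping({"plain_text": "Hello"}, {"id": "f7ed43aaea"}, mapping)
print(row)  # {'campaign_id': 'f7ed43aaea', 'text': 'Hello'}
```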
Apart from specifying what columns should be present in the output table, the mapping allows you to set a column to be part of a primary key. The entire configuration would then look like this:
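A hypothetical sketch of the config part with a primary key, as a Python dict mirroring the JSON. The api section from the previous part is assumed unchanged and omitted, the send-checklist child is left out for brevity, and the endpoints and placeholder name are assumptions.

```python
import json

# Hypothetical "config" section; the "api" section is assumed unchanged
# and omitted. Endpoints and the placeholder name are assumptions.
config = {
    "jobs": [
        {
            "endpoint": "campaigns",
            "dataField": "campaigns",
            "children": [
                {
                    "endpoint": "campaigns/{id}/content",
                    "dataField": ".",
                    "dataType": "content",
                    "placeholders": {"id": "id"},
                }
            ],
        }
    ],
    "mappings": {
        "content": {
            "parent_id": {
                "type": "user",
                # primaryKey marks campaign_id as part of the table's primary key
                "mapping": {"destination": "campaign_id", "primaryKey": True},
            },
            "plain_text": {"mapping": {"destination": "text"}},
        }
    },
}

print(json.dumps({"config": config}, indent=2))
```

Note that the dataType value in the job and the key under mappings are the same string (content); that is what ties the mapping to the job.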
Now, let’s review what parts are connected and how. Note that the values in blue were chosen arbitrarily when the configuration was created:
Mapping lets you define precisely what the extraction output will look like; it also defines primary keys.
If you are doing a one-time ad hoc extraction, you may skip setting up the mapping and clean the extracted data later in Transformations. However, if you intend to use your configuration regularly, or want to turn it into its own component, setting up a mapping is recommended.
The key of the mapping supports dot notation for traversing into nested properties. So if a property name itself contains a dot, you need to change the delimiter. See the following example:
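A hedged sketch of such a mapping as a Python dict mirroring the JSON. The property name created.date is invented for illustration, and both the option name delimiter and its placement next to type and mapping are assumptions; check the Generic Extractor mapping reference before relying on them.

```python
import json

# "created.date" is a hypothetical property whose name contains a dot.
# Assumption: the option is called "delimiter" and sits next to "mapping";
# setting it to "/" stops the dot from being read as a path separator.
mappings = {
    "content": {
        "created.date": {
            "type": "column",
            "delimiter": "/",
            "mapping": {"destination": "created_date"},
        }
    }
}

print(json.dumps({"mappings": mappings}, indent=2))
```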
As you changed the delimiter from the default dot (.) to a slash (/), the key is
no longer parsed as two separate keys (a parent key and its child date), but rather
as a single key that contains the dot.
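The difference can be shown with a toy key lookup (not Generic Extractor's code; get_by_key and the sample record are hypothetical). The key is split on the delimiter and each part is looked up one level deeper:

```python
def get_by_key(record, key, delimiter="."):
    """Toy model: split the mapping key on the delimiter and walk nested dicts."""
    value = record
    for part in key.split(delimiter):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


# Made-up record with both a dotted property name and a nested object
record = {"created.date": "2024-01-01", "created": {"date": "nested"}}

print(get_by_key(record, "created.date"))       # nested (two keys: created -> date)
print(get_by_key(record, "created.date", "/"))  # 2024-01-01 (one key containing a dot)
```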