Generic Extractor

Generic Extractor is a KBC component that acts as a customizable HTTP REST client. It can be configured to extract data from virtually any reasonably well-behaved web API.

Because APIs in the wild vary widely, Generic Extractor offers a large number of configuration options. These may seem abstract and hard to understand at first, but they become much clearer once you have configured your first extractor. With Generic Extractor, you can build an entirely new extractor for KBC in less than an hour.

To get started quickly, follow our Generic Extractor tutorial.

Generic Extractor Requirements

Generic Extractor allows you to extract data from an API into KBC purely through configuration. No programming skills or additional tools are required. You only need to do two things before you start:

  • Learn how to write JSON.
  • Have the documentation of your chosen API at hand. The API should be RESTful and broadly follow the HTTP specification.
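
To illustrate what "extracting purely through configuration" means, here is a minimal Python sketch of a client that is driven entirely by a JSON document. It is not Generic Extractor itself. The key names (`api`, `baseUrl`, `jobs`, `endpoint`), the URL, and the function `planned_requests` are assumptions made for this illustration; the tutorial defines the actual configuration schema.

```python
import json
from urllib.parse import urljoin

# Hypothetical configuration. The key names are illustrative assumptions,
# not the authoritative Generic Extractor schema.
CONFIG_JSON = """
{
    "api": {"baseUrl": "https://api.example.com/v1/"},
    "jobs": [
        {"endpoint": "users"},
        {"endpoint": "orders"}
    ]
}
"""


def planned_requests(config_text):
    """Return the full URL for every job described in the JSON text."""
    config = json.loads(config_text)
    base_url = config["api"]["baseUrl"]
    # Each job contributes one endpoint, resolved against the base URL.
    return [urljoin(base_url, job["endpoint"]) for job in config["jobs"]]


urls = planned_requests(CONFIG_JSON)
for url in urls:
    print(url)
```

Changing which resources are extracted means editing the JSON only; the code that interprets it stays the same. This is the reason JSON is the one skill required.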

Configuration & Development

Again, if you are new to Generic Extractor, we strongly suggest you go through the Generic Extractor tutorial. It shows the basic principles, as well as the most important features.

If you intend to develop a more complicated configuration, check out how to run Generic Extractor locally. A number of examples accompanying the documentation can be run locally as well. In addition, several working configuration snippets that have not yet been turned into complete extractors are available in our Wiki.

Registering Generic Extractor

Each configuration of Generic Extractor can be registered as a new standalone component. To be registered, the configuration has to be converted to a template.

Registering your Generic Extractor configuration is not required. However, when registered, it can be easily used in multiple projects. A great advantage of using templates is that they do not limit the configuration at all. You can always switch to JSON free-form configuration when necessary.

Also note that templates can be used only with registered Generic Extractor configurations.

Generic Extractor Source

As with any other KBC component, the Generic Extractor source is available on GitHub. Apart from the main repository, it relies on several key libraries that partially define its capabilities:

  • Juicer: component responsible for processing HTTP JSON responses
  • CSV Map: library that converts JSON data into CSV tables
  • Filter: library that allows values to be matched against each other
  • JSON Parser: JSON parser that produces CSV tables while maintaining relations
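
As a rough illustration of what "produces CSV tables while maintaining relations" can mean, the following Python sketch splits a nested JSON response into a parent table and a child table linked by a key column. It is a conceptual sketch only, not the algorithm used by the JSON Parser or CSV Map libraries. The sample data, the `user_id` column, and the functions `flatten` and `to_csv` are hypothetical.

```python
import csv
import io


def flatten(records, child_field="orders", parent_key="user_id"):
    """Split nested records into parent rows and child rows.

    Each child row gets a parent_key column holding the parent's "id",
    so the relation survives the move to flat tables.
    """
    parents, children = [], []
    for record in records:
        parents.append({k: v for k, v in record.items() if k != child_field})
        for child in record.get(child_field, []):
            children.append({**child, parent_key: record["id"]})
    return parents, children


def to_csv(rows):
    """Serialize a list of dicts with identical keys to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical API response with a nested array.
response = [
    {"id": 1, "name": "Alice", "orders": [{"item": "book"}, {"item": "pen"}]},
    {"id": 2, "name": "Bob", "orders": []},
]

users, orders = flatten(response)
users_csv = to_csv(users)
orders_csv = to_csv(orders)
print(users_csv)
print(orders_csv)
```

The nested `orders` array becomes its own table, and the `user_id` column in that table plays the role of a foreign key back to the parent rows.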