takumakanari/embulk-input-http

Embulk::Input::Http

HTTP input plugin for Embulk. It fetches data over HTTP.

Installation

Run this command with your embulk binary.

$ embulk gem install embulk-input-http

Usage

Specify the plugin in your config.yml file:

in:
  type: http
  url: http://express.heartrails.com/api/json
  params:
    - {name: method, value: getStations}
    - {name: x, value: 135.0}
    - {name: y, value: "{30..35}.0", expand: true}
  method: get

  • type: specify this plugin as http
  • url: base URL of the endpoint, such as an API (required)
  • params: list of name/value pairs to send as query parameters (optional)
  • pager: configuration for parameterized paging (optional)
  • method: HTTP method; get is used by default (optional)
  • user_agent: value of the User-Agent request header (optional)
  • request_headers: extra request headers as key-value pairs (optional)
  • request_body: request body content; used only if method is post and params is empty (optional)
  • charset: charset to specify in the request header (optional, default: utf8)
  • basic_auth: username/password for basic authentication (optional)
  • open_timeout: timeout in milliseconds to open a connection (optional, default: 2000)
  • read_timeout: timeout in milliseconds to read content over HTTP (optional, default: 10000)
  • max_retries: maximum number of retries when a request fails (optional, default: 5)
  • retry_interval: interval in milliseconds to wait before retrying (optional, default: 10000)
  • request_interval: wait time in milliseconds before each request (optional, default: 0)
  • interval_includes_response_time: yes/no; if yes and request_interval is set, the response time counts toward the interval before the next request (optional, default: no)
  • input_direct: if false, the content is dumped to a temporary file first, which avoids read timeouts caused by processing large data while it is still downloading from the remote (optional, default: true)
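
As a rough sketch of how several of these options might be combined, the following config sends a POST request with a body, custom headers, and tuned timeouts. The URL, header, body, and all numeric values are hypothetical, and passing request_body as a plain string is an assumption. A parser section is still needed, as in the examples further down.

```yaml
in:
  type: http
  # Hypothetical endpoint, for illustration only
  url: http://example.com/api/search
  method: post
  # No params here: request_body is used only if method is post and params is empty
  # (assumption: the body is given as a plain string)
  request_body: '{"keyword": "station"}'
  user_agent: my-embulk-client
  request_headers: {Content-Type: application/json}
  open_timeout: 5000      # milliseconds to open the connection
  read_timeout: 30000     # milliseconds to read the response
  max_retries: 3
  retry_interval: 5000    # milliseconds
  request_interval: 1000  # wait 1000 msec before each request
  interval_includes_response_time: yes
  input_direct: false     # dump the content to a temp file first
```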

Defining multiple requests in params

To define multiple requests in params, use a values array, or use brace expansion by setting expand: true.

A plain values array looks like this:

params:
  - {name: id, values: [5, 4, 3, 2, 1]}
  - {name: name, values: ["John", "Paul", "George", "Ringo"], expand: true}

The same values can also be written with brace expansion, as follows:

params:
  - {name: id, value: "{5,4,3,2,1}", expand: true}
  - {name: name, value: "{John,Paul,George,Ringo}", expand: true}
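
Text outside the braces can be combined with the expansion, as in the y parameter of the usage example. The two fragments below are a sketch of what is presumably the same set of values. This assumes the suffix after the closing brace is appended to every expanded item, as in shell brace expansion, which this README does not state explicitly.

```yaml
# Explicit list of values
params:
  - {name: y, values: ["35.0", "34.0", "33.0", "32.0", "31.0"]}
---
# Brace expansion with a shared ".0" suffix (assumed equivalent to the list above)
params:
  - {name: y, value: "{35,34,33,32,31}.0", expand: true}
```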

Basic authentication

Configure the username and password for basic authentication as follows:

basic_auth:
 user: MyUser
 password: MyPassword
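
For context, here is a minimal sketch of where this block might sit in a full input section, alongside the other plugin options. The URL is hypothetical, and a parser section would still be needed.

```yaml
in:
  type: http
  # Hypothetical endpoint that requires basic authentication
  url: http://example.com/private/data.json
  basic_auth:
    user: MyUser
    password: MyPassword
```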

Paginate by pager

To paginate a request, configure pager as follows:

in:
  type: http
  url: http://express.heartrails.com/api/json
  pager: {from_param: from, to_param: to, start: 1, step: 1000, pages: 10}

The properties of pager are as follows:

  • from_param: parameter name of 'from' index
  • to_param: parameter name of 'to' index (optional)
  • pages: total number of pages to request
  • start: first index number (optional, 0 is used by default)
  • step: size to increment (optional, 1 is used by default)

Examples of using pager

  1. Combination of from and to

    pager: {from_param: from, to_param: to, pages: 4, start: 1, step: 10}
    1. ?from=1&to=10
    2. ?from=11&to=20
    3. ?from=21&to=30
    4. ?from=31&to=40
  2. Increment page parameter

    params:
      - {name: size, value: 100}
    pager: {from_param: page, pages: 4, start: 1, step: 1}
    1. ?page=1&size=100
    2. ?page=2&size=100
    3. ?page=3&size=100
    4. ?page=4&size=100
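
The same arithmetic can presumably be used for offset-based APIs. The sketch below relies on start defaulting to 0 and on step being added once per page. The URL and the offset and limit parameter names are hypothetical, and the listed requests are inferred by analogy with the examples above rather than taken from actual output.

```yaml
in:
  type: http
  # Hypothetical endpoint, for illustration only
  url: http://example.com/api/items
  params:
    - {name: limit, value: 100}
  pager: {from_param: offset, pages: 3, start: 0, step: 100}
  # Expected requests (assumed):
  #   1. ?offset=0&limit=100
  #   2. ?offset=100&limit=100
  #   3. ?offset=200&limit=100
```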

Example

Fetch JSON via an HTTP API

in:
  type: http
  url: http://express.heartrails.com/api/json
  params:
    - {name: method, value: getStations}
    - {name: x, value: 135.0}
    - {name: y, value: "{35,34,33,32,31}.0", expand: true}
  request_headers: {X-Some-Key1: some-value1, X-Some-key2: some-value2}
  parser:
    type: json
    root: $.response.station
    schema:
      - {name: name, type: string}
      - {name: next, type: string}
      - {name: prev, type: string}
      - {name: distance, type: string}
      - {name: lat, type: double, path: x}
      - {name: lng, type: double, path: y}
      - {name: line, type: string}
      - {name: postal, type: string}

Fetch CSV

in:
  type: http
  url: http://192.168.33.10:8085/sample.csv
  params:
    - {name: y, value: "{35,34,33,32,31}.0", expand: true}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: ''
    skip_header_lines: 1
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}

TODO

  • HTTP-proxy
  • Guess

Patch

Welcome!