Skip to content

jaysnm/dremio-arrow

Repository files navigation

Dremio SQL Lakehouse Arrow Flight Client

Python3.9 Python3.10 Python3.11

Arrow Flight is a high-speed, distributed protocol designed to handle big data, providing increase in throughput between client applications and Dremio. This Dremio Arrow Flight Client is based on python official examples.

Disclaimer: This project is not affliated to dremio in any way. It is a tool that I developed while at CIFOR-ICRAF and now we have decided to open source it for wider community use. While I may not have enough time to actively maintain it, the tool is stable enough to sustain future use cases. Besides, community contribution is warmly welcome in form of PRs and forks.

Flight Basics

The Arrow Flight libraries provide a development framework for implementing a service that can send and receive data streams. A Flight server supports several basic kinds of requests:

  • Handshake: a simple request to determine whether the client is authorized and, in some cases, to establish an implementation-defined session token to use for future requests
  • ListFlights: return a list of available data streams
  • GetSchema: return the schema for a data stream
  • GetFlightInfo: return an “access plan” for a dataset of interest, possibly requiring consuming multiple data streams. This request can accept custom serialized commands containing, for example, your specific application parameters.
  • DoGet: send a data stream to a client
  • DoPut: receive a data stream from a client
  • DoAction: perform an implementation-specific action and return any results, i.e. a generalized function call
  • ListActions: return a list of available action types

More details can be found here

Illustration

Installation

Please see installation notes here