Warning: The MATLAB interface is under active development and should be considered experimental.
This is a very early stage MATLAB interface to the Apache Arrow C++ libraries.
Currently, the MATLAB interface supports:
- Converting between a subset of Arrow `Array` types and MATLAB array types (see table below)
- Converting between MATLAB `table`s and `arrow.tabular.RecordBatch`s
- Creating Arrow `Field`s, `Schema`s, and `Type`s
- Reading and writing Feather V1 files
Supported `arrow.array.Array` types are included in the table below.
NOTE: All Arrow `Array` classes listed below are part of the `arrow.array` package (e.g. `arrow.array.Float64Array`).
| MATLAB Array Type | Arrow Array Type |
|---|---|
| `uint8` | `UInt8Array` |
| `uint16` | `UInt16Array` |
| `uint32` | `UInt32Array` |
| `uint64` | `UInt64Array` |
| `int8` | `Int8Array` |
| `int16` | `Int16Array` |
| `int32` | `Int32Array` |
| `int64` | `Int64Array` |
| `single` | `Float32Array` |
| `double` | `Float64Array` |
| `logical` | `BooleanArray` |
| `string` | `StringArray` |
| `datetime` | `TimestampArray` |
| `duration` | `Time32Array` |
| `duration` | `Time64Array` |
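As a quick way to check the mapping in the table, the sketch below converts a few MATLAB arrays with `arrow.array` and prints the class of each result. It assumes the interface is installed and on the MATLAB Search Path; the variable names are illustrative only. According to the table, the printed class names should be `arrow.array.Float32Array`, `arrow.array.BooleanArray`, and `arrow.array.StringArray`.

```matlab
% Convert MATLAB arrays of different types to Arrow arrays.
singleArray  = arrow.array(single([1.5, 2.5, 3.5]));
logicalArray = arrow.array([true, false, true]);
stringArray  = arrow.array(["A", "B", "C"]);

% class() reports the fully qualified class name of each Arrow array.
disp(class(singleArray))
disp(class(logicalArray))
disp(class(stringArray))
```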
To build the MATLAB Interface to Apache Arrow from source, the following software must be installed on the target machine:
- MATLAB
- CMake
- C++ compiler which supports C++17 (e.g. `gcc` on Linux, `Xcode` on macOS, or `Visual Studio` on Windows)
- Git
To set up a local working copy of the source code, start by cloning the `apache/arrow` GitHub repository using Git:
$ git clone https://github.com/apache/arrow.git
After cloning, change the working directory to the `matlab` subdirectory:
$ cd arrow/matlab
To build the MATLAB interface, use CMake:
$ cmake -S . -B build
$ cmake --build build --config Release
To install the MATLAB interface to the default software installation location for the target machine (e.g. `/usr/local` on Linux or `C:\Program Files` on Windows), pass the `--target install` flag to CMake.
$ cmake --build build --config Release --target install
As part of the install step, the installation directory is added to the MATLAB Search Path.
Note: This step may fail if the current user lacks the necessary filesystem permissions. If the install step fails, the installation directory can be added to the MATLAB Search Path manually using the `addpath` command.
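If the path does need to be added by hand, a minimal sketch could look like the following. The directory name `installDir` is a hypothetical placeholder, not the actual install location; substitute the directory reported by the CMake install step on your machine.

```matlab
% Hypothetical install location (an assumption for illustration only);
% replace it with the directory reported by the CMake install step.
installDir = fullfile("/usr/local", "arrow_matlab");

if isfolder(installDir)
    % Add the directory to the MATLAB Search Path for the current session.
    addpath(installDir);
else
    warning("Directory not found: %s", installDir);
end
```

`addpath` only affects the current MATLAB session. Calling `savepath` afterwards keeps the change for future sessions, although that may also require write permissions.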
To run the MATLAB tests, start MATLAB in the `arrow/matlab` directory and call the `runtests` command on the `test` directory with `IncludeSubFolders=true`:
>> runtests("test", IncludeSubFolders=true);
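To inspect the outcome programmatically rather than reading the console output, the results returned by `runtests` can be captured and summarized. This is a sketch that assumes MATLAB was started in the `arrow/matlab` directory, as described above.

```matlab
% Run the tests and keep the array of test results.
results = runtests("test", IncludeSubFolders=true);

% Count the failed tests and print a one-line summary.
numFailed = nnz([results.Failed]);
fprintf("%d of %d tests failed\n", numFailed, numel(results));

% Show per-test details (name, pass/fail status, duration) as a table.
disp(table(results))
```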
Included below are some example code snippets that illustrate how to use the MATLAB interface.
>> matlabArray = double([1, 2, 3])
matlabArray =
1 2 3
>> arrowArray = arrow.array(matlabArray)
arrowArray =
[
1,
2,
3
]
>> arrowArray = arrow.array([true, false, true])
arrowArray =
[
true,
false,
true
]
>> matlabArray = toMATLAB(arrowArray)
matlabArray =
3×1 logical array
1
0
1
>> matlabArray = int8([122, -1, 456, -10, 789])
matlabArray =
1×5 int8 row vector
122 -1 127 -10 127
% Treat all negative array elements as Null
>> validElements = matlabArray > 0
validElements =
1×5 logical array
1 0 1 0 1
% Specify which values are Null/Valid by supplying a logical validity "mask"
>> arrowArray = arrow.array(matlabArray, Valid=validElements)
arrowArray =
[
122,
null,
127,
null,
127
]
>> matlabTable = table(["A"; "B"; "C"], [1; 2; 3], [true; false; true])
matlabTable =
3x3 table
Var1 Var2 Var3
____ ____ _____
"A" 1 true
"B" 2 false
"C" 3 true
>> arrowRecordBatch = arrow.recordBatch(matlabTable)
arrowRecordBatch =
Var1: [
"A",
"B",
"C"
]
Var2: [
1,
2,
3
]
Var3: [
true,
false,
true
]
>> arrowRecordBatch
arrowRecordBatch =
Var1: [
"A",
"B",
"C"
]
Var2: [
1,
2,
3
]
Var3: [
true,
false,
true
]
>> matlabTable = table(arrowRecordBatch)
matlabTable =
3x3 table
Var1 Var2 Var3
____ ____ _____
"A" 1 true
"B" 2 false
"C" 3 true
>> stringArray = arrow.array(["A", "B", "C"])
stringArray =
[
"A",
"B",
"C"
]
>> timestampArray = arrow.array([datetime(1997, 01, 01), datetime(1998, 01, 01), datetime(1999, 01, 01)])
timestampArray =
[
1997-01-01 00:00:00.000000,
1998-01-01 00:00:00.000000,
1999-01-01 00:00:00.000000
]
>> booleanArray = arrow.array([true, false, true])
booleanArray =
[
true,
false,
true
]
>> arrowRecordBatch = arrow.tabular.RecordBatch.fromArrays(stringArray, timestampArray, booleanArray)
arrowRecordBatch =
Column1: [
"A",
"B",
"C"
]
Column2: [
1997-01-01 00:00:00.000000,
1998-01-01 00:00:00.000000,
1999-01-01 00:00:00.000000
]
Column3: [
true,
false,
true
]
>> arrowRecordBatch = arrow.tabular.RecordBatch.fromArrays(stringArray, timestampArray, booleanArray)
arrowRecordBatch =
Column1: [
"A",
"B",
"C"
]
Column2: [
1997-01-01 00:00:00.000000,
1998-01-01 00:00:00.000000,
1999-01-01 00:00:00.000000
]
Column3: [
true,
false,
true
]
>> timestampArray = arrowRecordBatch.column(2)
timestampArray =
[
1997-01-01 00:00:00.000000,
1998-01-01 00:00:00.000000,
1999-01-01 00:00:00.000000
]
>> type = arrow.int8()
type =
Int8Type with properties:
ID: Int8
>> type = arrow.timestamp(TimeUnit="Second", TimeZone="Asia/Kolkata")
type =
TimestampType with properties:
ID: Timestamp
TimeUnit: Second
TimeZone: "Asia/Kolkata"
>> type.ID
ans =
ID enumeration
Timestamp
>> type = arrow.string()
type =
StringType with properties:
ID: String
>> type.ID
ans =
ID enumeration
String
>> field = arrow.field("Number", arrow.int8())
field =
Number: int8
>> field.Name
ans =
"Number"
>> field.Type
ans =
Int8Type with properties:
ID: Int8
>> field = arrow.field("Letter", arrow.string())
field =
Letter: string
>> field.Name
ans =
"Letter"
>> field.Type
ans =
StringType with properties:
ID: String
% arrowSchema is assumed to already exist in the workspace (e.g. the Schema of a RecordBatch)
>> arrowSchema
arrowSchema =
Letter: string
Number: double
% Specify the field to extract by its index (i.e. 2)
>> field = arrowSchema.field(2)
field =
Number: double
>> arrowSchema
arrowSchema =
Letter: string
Number: double
% Specify the field to extract by its name (i.e. "Letter")
>> field = arrowSchema.field("Letter")
field =
Letter: string
>> letter = arrow.field("Letter", arrow.string())
letter =
Letter: string
>> number = arrow.field("Number", arrow.int8())
number =
Number: int8
>> schema = arrow.schema([letter, number])
schema =
Letter: string
Number: int8
>> matlabTable = table(["A"; "B"; "C"], [1; 2; 3], VariableNames=["Letter", "Number"])
matlabTable =
3x2 table
Letter Number
______ ______
"A" 1
"B" 2
"C" 3
>> arrowRecordBatch = arrow.recordBatch(matlabTable)
arrowRecordBatch =
Letter: [
"A",
"B",
"C"
]
Number: [
1,
2,
3
]
>> arrowSchema = arrowRecordBatch.Schema
arrowSchema =
Letter: string
Number: double
>> t = table(["A"; "B"; "C"], [1; 2; 3], [true; false; true])
t =
3×3 table
Var1 Var2 Var3
____ ____ _____
"A" 1 true
"B" 2 false
"C" 3 true
>> filename = "table.feather";
>> featherwrite(filename, t)
>> filename = "table.feather";
>> t = featherread(filename)
t =
3×3 table
Var1 Var2 Var3
____ ____ _____
"A" 1 true
"B" 2 false
"C" 3 true
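The write and read steps above can be combined into a small round-trip check. This sketch writes the table to a temporary file (the file name is an arbitrary choice for illustration), reads it back, and compares the two tables with `isequal`. Whether the comparison reports an exact match depends on how the interface maps each column type back to MATLAB, so treat the printed value as a diagnostic rather than a guarantee.

```matlab
% Build the same example table used above.
t = table(["A"; "B"; "C"], [1; 2; 3], [true; false; true]);

% Write to a temporary Feather V1 file; the file name is arbitrary.
filename = fullfile(tempdir, "roundtrip_example.feather");
featherwrite(filename, t);

% Read the file back and compare it with the original table.
roundTripped = featherread(filename);
disp(isequal(t, roundTripped))

% Remove the temporary file.
delete(filename);
```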