sparrow
C++20 idiomatic APIs for the Apache Arrow Columnar Format
Introduction
sparrow is an implementation of the Apache Arrow Columnar Format in C++. It provides array structures with idiomatic APIs and convenient conversions to and from the Arrow C data interface.
sparrow requires a modern C++ compiler supporting C++20.
Installation
Package managers
We provide a package for the mamba (or conda) package manager:
mamba install -c conda-forge sparrow
Install from sources
sparrow has a few dependencies that you can install in a mamba environment:
mamba env create -f environment-dev.yml
mamba activate sparrow
You can then create a build directory, and configure, build, and install the project with CMake:
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_TESTS=ON \
    -DBUILD_DOCS=ON \
    ..
make install
Usage
Requirements
Compilers:
Clang 18 or higher
GCC 11.2 or higher
Apple Clang 16 or higher
MSVC 19.41 or higher
Initialize data with sparrow and extract C data structures
#include <utility>

#include "sparrow/sparrow.hpp"

namespace sp = sparrow;

sp::primitive_array<int> ar = {1, 3, 5, 7, 9};
auto [arrow_array, arrow_schema] = sp::extract_arrow_structures(std::move(ar));
// Use arrow_array and arrow_schema as you need (serialization, passing them to
// a third-party library)
// ...
// You are responsible for releasing the structures in the end
arrow_array.release(&arrow_array);
arrow_schema.release(&arrow_schema);
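The release calls above follow the Arrow C data interface contract: whoever ends up holding the structure calls its `release` callback once, and the callback marks the structure as released by setting `release` to null. The sketch below illustrates that contract without depending on sparrow. The `ArrowArray` layout follows the Arrow C data interface specification; `int32_payload`, `make_int32_array`, and `sum_and_release` are hypothetical names introduced only for this illustration and are not part of sparrow.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Layout taken from the Arrow C data interface specification. In real code,
// use the definition provided by the library you link against.
struct ArrowArray
{
    std::int64_t length;
    std::int64_t null_count;
    std::int64_t offset;
    std::int64_t n_buffers;
    std::int64_t n_children;
    const void** buffers;
    ArrowArray** children;
    ArrowArray* dictionary;
    void (*release)(ArrowArray*);
    void* private_data;
};

// Hypothetical producer-side state, reachable through private_data.
struct int32_payload
{
    std::vector<std::int32_t> values;
    const void* buffers[2];  // [0] validity bitmap (absent here), [1] values
};

// Release callback: frees the producer's memory and marks the struct released.
void release_int32_array(ArrowArray* array)
{
    delete static_cast<int32_payload*>(array->private_data);
    array->release = nullptr;
}

// Hypothetical producer: exports a non-nullable int32 array.
ArrowArray make_int32_array(std::vector<std::int32_t> values)
{
    auto* payload = new int32_payload{std::move(values), {nullptr, nullptr}};
    payload->buffers[1] = payload->values.data();

    ArrowArray array{};  // zero-initializes every field
    array.length = static_cast<std::int64_t>(payload->values.size());
    array.n_buffers = 2;
    array.buffers = payload->buffers;
    array.release = &release_int32_array;
    array.private_data = payload;
    return array;
}

// Consumer: reads the values, then releases the structure exactly once.
std::int64_t sum_and_release(ArrowArray& array)
{
    assert(array.release != nullptr);  // a released struct must not be read
    const auto* data = static_cast<const std::int32_t*>(array.buffers[1]);
    std::int64_t total = 0;
    for (std::int64_t i = 0; i < array.length; ++i)
    {
        total += data[array.offset + i];
    }
    array.release(&array);
    return total;
}
```

Calling `sum_and_release` on the result of `make_int32_array({1, 3, 5, 7, 9})` reads the five values and leaves the structure with a null `release` pointer, which is how a consumer can tell that it must not be used or released again.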
Initialize data with sparrow and use C data structures
#include "sparrow/sparrow.hpp"

namespace sp = sparrow;

sp::primitive_array<int> ar = {1, 3, 5, 7, 9};
// Caution: get_arrow_structures returns pointers, not values
auto [arrow_array, arrow_schema] = sp::get_arrow_structures(ar);
// Use arrow_array and arrow_schema as you need (serialization, passing it to
// a third party library)
// ...
// do NOT release the C structures in the end, the "ar" variable will do it for you
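The caution about pointers matters because structured bindings simply name the elements of whatever the function returns. When those elements are pointers, the bound names alias structures that are still owned by the array object, so they must not outlive it. The following sketch mimics that situation with stand-in types: `fake_arrow_array`, `fake_arrow_schema`, `holder`, and `get_structures` are all hypothetical and only illustrate the aliasing; they are not sparrow's types or functions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Trimmed stand-ins for the C structures (hypothetical; the real ones have
// more fields, including the release callback).
struct fake_arrow_array
{
    std::int64_t length = 0;
};

struct fake_arrow_schema
{
    const char* format = "i";  // "i" is the Arrow format string for int32
};

// Hypothetical owner: keeps the structures alive for its whole lifetime.
class holder
{
public:
    explicit holder(std::int64_t length) { m_array.length = length; }

    // Returns pointers into the holder, not copies.
    std::pair<fake_arrow_array*, fake_arrow_schema*> structures() noexcept
    {
        return {&m_array, &m_schema};
    }

    std::int64_t length() const noexcept { return m_array.length; }

private:
    fake_arrow_array m_array;
    fake_arrow_schema m_schema;
};

// Free function shaped like the call in the snippet above (an assumption).
std::pair<fake_arrow_array*, fake_arrow_schema*> get_structures(holder& h)
{
    return h.structures();
}

// Returns true when the bound pointers refer to the holder's own members.
bool bindings_alias_owner(holder& h)
{
    auto [array_ptr, schema_ptr] = get_structures(h);
    const auto again = h.structures();
    return array_ptr == again.first && schema_ptr == again.second
           && array_ptr->length == h.length();
}
```

Because the pointers refer to members of the owner, the owner stays responsible for cleanup, which matches the instruction not to release the structures manually in this scenario.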
Read data from somewhere and pass it to sparrow
#include "sparrow/sparrow.hpp"
#include "third-party-lib.hpp"

namespace sp = sparrow;
namespace tpl = third_party_library;

ArrowArray array;
ArrowSchema schema;
tpl::read_arrow_structures(&array, &schema);

sp::array ar(&array, &schema);
// Use ar as you need
// ...
// You are responsible for releasing the structures in the end
array.release(&array);
schema.release(&schema);
Read data from somewhere and move it into sparrow
#include <utility>

#include "sparrow/sparrow.hpp"
#include "third-party-lib.hpp"

namespace sp = sparrow;
namespace tpl = third_party_library;

ArrowArray array;
ArrowSchema schema;
tpl::read_arrow_structures(&array, &schema);

sp::array ar(std::move(array), std::move(schema));
// Use ar as you need
// ...
// do NOT release the C structures in the end, the "ar" variable will do it for you
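To make the ownership transfer concrete, here is a minimal sketch of an owning wrapper, assuming the "moving an array" rule of the Arrow C data interface: the structure is copied member-wise and the source is marked released by nulling its `release` pointer, without invoking the callback. `owning_array`, `counting_release`, and `make_empty_array` are hypothetical names for this illustration; this is not how sparrow is implemented, only one way such a wrapper could behave.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Layout taken from the Arrow C data interface specification.
struct ArrowArray
{
    std::int64_t length;
    std::int64_t null_count;
    std::int64_t offset;
    std::int64_t n_buffers;
    std::int64_t n_children;
    const void** buffers;
    ArrowArray** children;
    ArrowArray* dictionary;
    void (*release)(ArrowArray*);
    void* private_data;
};

// Hypothetical RAII wrapper that takes ownership of a C structure.
class owning_array
{
public:
    explicit owning_array(ArrowArray&& source) noexcept
        : m_array(source)  // member-wise copy of the C struct
    {
        source.release = nullptr;  // mark the source as moved-from
    }

    owning_array(const owning_array&) = delete;
    owning_array& operator=(const owning_array&) = delete;

    ~owning_array()
    {
        // The wrapper, not the caller, releases the structure.
        if (m_array.release != nullptr)
        {
            m_array.release(&m_array);
        }
    }

    std::int64_t size() const noexcept { return m_array.length; }

private:
    ArrowArray m_array;
};

// Illustration helper: counts release calls through private_data.
void counting_release(ArrowArray* array)
{
    ++*static_cast<int*>(array->private_data);
    array->release = nullptr;
}

// Illustration helper: an empty array whose release bumps the given counter.
ArrowArray make_empty_array(int* release_counter)
{
    ArrowArray array{};
    array.release = &counting_release;
    array.private_data = release_counter;
    return array;
}
```

With this design, a caller that moves a structure into the wrapper is left with a source whose `release` is null, and the callback runs once when the wrapper is destroyed. That is why releasing the source manually after the move would be a mistake.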
Documentation
The documentation (currently being written) can be found at https://man-group.github.io/sparrow/index.html
Acknowledgements
This development has been funded as part of a collaboration between ArcticDB, Bloomberg, and QuantStack.
License
This software is licensed under the Apache License 2.0. See the LICENSE file for details.