# pollux

**Repository Path**: kumo-pub/pollux

## Basic Information

- **Project Name**: pollux
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: AGPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-11-02
- **Last Updated**: 2025-05-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Pollux is a composable execution engine distributed as an open source C++ library. It provides reusable, extensible, and high-performance data processing components that can be (re-)used to build data management systems focused on different analytical workloads, including batch, interactive, stream processing, and AI/ML. Pollux was created by Meta and is currently developed in partnership with IBM/Ahana, Intel, Voltron Data, Microsoft, ByteDance, and many other companies.

In common usage scenarios, Pollux takes a fully optimized query plan as input and performs the described computation. Because Pollux does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end users; rather, it is mostly used by developers integrating and optimizing their compute engines.

Pollux provides the following high-level components:

* **Type**: a generic typing system that supports scalar, complex, and nested types, such as structs, maps, arrays, etc.
* **Vector**: an [Arrow-compatible columnar memory layout module](https://facebookincubator.github.io/velox/develop/vectors.html), providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in addition to a lazy materialization pattern and support for out-of-order writes.
* **Expression Eval**: a [fully vectorized expression evaluation engine](https://facebookincubator.github.io/velox/develop/expression-evaluation.html) that allows expressions to be efficiently executed on top of Vector/Arrow encoded data.
* **Functions**: sets of vectorized scalar, aggregate, and window function implementations following Presto and Spark semantics.
* **Operators**: implementations of relational operators such as scans, writes, projections, filtering, grouping, ordering, shuffle/exchange, [hash, merge, and nested loop joins](https://facebookincubator.github.io/velox/develop/joins.html), unnest, and more.
* **I/O**: a connector interface for extensible data sources and sinks, supporting different file formats (ORC/DWRF, Parquet, Nimble) and storage adapters (S3, HDFS, GCS, ABFS, local files).
* **Network Serializers**: an interface where different wire protocols can be implemented, used for network communication, supporting [PrestoPage](https://prestodb.io/docs/current/develop/serialized-page.html) and Spark's UnsafeRow.
* **Resource Management**: a collection of primitives for handling computational resources, such as [memory arenas](https://facebookincubator.github.io/velox/develop/arena.html) and buffer management, tasks, drivers, and thread pools for CPU and thread execution, spilling, and caching.

Pollux is extensible and allows developers to define their own engine-specific specializations, including:

1. Custom types
2. [Simple and vectorized functions](https://facebookincubator.github.io/velox/develop/scalar-functions.html)
3. [Aggregate functions](https://facebookincubator.github.io/velox/develop/aggregate-functions.html)
4. Window functions
5. Operators
6. File formats
7. Storage adapters
8. Network serializers

## Examples

Examples of extensibility and integration with different component APIs [can be found here](velox/examples).

## Documentation

Developer guides detailing many aspects of the library, in addition to the list of available functions, [can be found here](https://facebookincubator.github.io/velox).

Blog posts are available [here](https://velox-lib.io/blog).
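As a companion to the component overview, the encodings listed under **Vector** can be illustrated with a small, self-contained sketch. The types and functions below (`FlatColumn`, `DictionaryColumn`, `ConstantColumn`, `flatten`) are hypothetical names invented for this example and are not Pollux classes; the actual vector module is described in the developer guides linked above. The sketch only shows the general idea that dictionary and constant encodings store less data than a flat column and can be decoded into one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration types; these are NOT part of the Pollux API.

// Flat encoding: one stored value per row.
struct FlatColumn {
  std::vector<std::int64_t> values;
};

// Dictionary encoding: each row stores an index into a shared base of values.
struct DictionaryColumn {
  std::vector<std::int64_t> base;
  std::vector<std::uint32_t> indices;
};

// Constant encoding: a single value logically repeated for `size` rows.
struct ConstantColumn {
  std::int64_t value;
  std::size_t size;
};

// Decode a dictionary-encoded column into a flat column.
inline FlatColumn flatten(const DictionaryColumn& column) {
  FlatColumn out;
  out.values.reserve(column.indices.size());
  for (std::uint32_t index : column.indices) {
    // at() throws std::out_of_range if an index points outside the base.
    out.values.push_back(column.base.at(index));
  }
  return out;
}

// Decode a constant-encoded column into a flat column.
inline FlatColumn flatten(const ConstantColumn& column) {
  return FlatColumn{std::vector<std::int64_t>(column.size, column.value)};
}
```

For example, a dictionary column with base `{10, 20, 30}` and indices `{2, 0, 0, 1}` flattens to `{30, 10, 10, 20}`, and a constant column with value `7` and size `3` flattens to `{7, 7, 7}`.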
## Community

Pollux is an open source project supported by a community of individual contributors and organizations. The project's technical governance mechanics are described [in this document](https://velox-lib.io/docs/community/technical-governance). Project maintainers [are listed here](https://velox-lib.io/docs/community/components-and-maintainers).

The main communication channels with the Pollux OSS community are [the Pollux-OSS Slack workspace](http://velox-oss.slack.com), GitHub Issues, and Discussions.

## Contributing

Check our [contributing guide](CONTRIBUTING.md) to learn how to contribute to the project.

## License

Pollux is licensed under the Apache 2.0 License. A copy of the license [can be found here](LICENSE).

## Getting Started

Install `kmpkg` first.

### Get the Pollux Source

```
git clone https://gitee.com/kumo-pub/pollux.git
cd pollux
cmake --preset=default
cmake --build build -j 2
```

Once Pollux is checked out, the first step is to install the dependencies. Details on the dependencies and how Pollux manages some of them for you [can be found here](CMake/resolve_dependency_modules/README.md).

Pollux also provides scripts to help developers set up and install Pollux dependencies for a given platform.

### Building Pollux

Run `make` in the root directory to compile the sources. For development, use `make debug` to build a non-optimized debug version, or `make release` to build an optimized version. Use `make unittest` to build and run tests.

Note that:

* Pollux requires at least GCC 11.0 or Clang 15.0.
* Pollux requires the CPU to support the following instruction sets:
  * bmi
  * bmi2
  * f16c
* Pollux tries to use the following (or equivalent) instruction sets where available:
  * On Intel CPUs
    * avx
    * avx2
    * sse
  * On ARM
    * Neon
    * Neon64
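
Before building, it can help to confirm that your toolchain matches the requirements above. The following is a minimal sketch, not part of Pollux: `compiledIsaFlags` and `compilerMeetsMinimum` are hypothetical helper names, and the checks rely on the standard predefined macros of GCC and Clang (`__BMI__`, `__BMI2__`, `__F16C__`, `__AVX__`, `__AVX2__`, `__SSE__`, `__ARM_NEON`). These macros reflect the instruction sets the compiler is targeting (for example through `-march`), which is not necessarily what the CPU running the binary supports.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers for illustration; they are NOT part of Pollux.

struct IsaFlag {
  std::string name;
  bool enabled;  // true if the compiler targets this instruction set
};

// Reports, for each instruction set named in the README, whether the
// corresponding compiler-predefined macro is set for this build.
inline std::vector<IsaFlag> compiledIsaFlags() {
  std::vector<IsaFlag> flags;
#ifdef __BMI__
  flags.push_back({"bmi", true});
#else
  flags.push_back({"bmi", false});
#endif
#ifdef __BMI2__
  flags.push_back({"bmi2", true});
#else
  flags.push_back({"bmi2", false});
#endif
#ifdef __F16C__
  flags.push_back({"f16c", true});
#else
  flags.push_back({"f16c", false});
#endif
#ifdef __AVX__
  flags.push_back({"avx", true});
#else
  flags.push_back({"avx", false});
#endif
#ifdef __AVX2__
  flags.push_back({"avx2", true});
#else
  flags.push_back({"avx2", false});
#endif
#ifdef __SSE__
  flags.push_back({"sse", true});
#else
  flags.push_back({"sse", false});
#endif
#ifdef __ARM_NEON
  flags.push_back({"neon", true});
#else
  flags.push_back({"neon", false});
#endif
  return flags;
}

// Checks the stated minimum compiler versions (GCC 11 or Clang 15).
// Clang is tested first because it also defines __GNUC__.
// Vendor builds of Clang may use a different version numbering.
inline bool compilerMeetsMinimum() {
#if defined(__clang__)
  return __clang_major__ >= 15;
#elif defined(__GNUC__)
  return __GNUC__ >= 11;
#else
  return false;
#endif
}
```

Calling `compiledIsaFlags()` from a small test program compiled with the same flags as the Pollux build and printing each `name`/`enabled` pair gives a quick view of what the compiler is targeting; it does not replace a runtime check of the host CPU.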