Offloading Index Traversal to the Network Card

Internship, Prof. Gustavo Alonso, Systems Group, ETH Zurich

B+-trees are the dominant data structure for indexing in databases. They can answer lookup queries (single entry) and range queries (multiple entries). The goal of this project is to enable direct index traversal and retrieval of data objects from the network card. While querying the B+-tree is offloaded entirely to the network card, inserts can be done by the CPU or implemented in a hybrid fashion. To improve performance, the implementation of the B+-tree takes into account the characteristics of DMA and the FPGA to minimize the overhead of traversing the tree. In addition, on-chip memory on the FPGA can be used to cache tree nodes. Access to the B+-tree is exposed to other machines through specific RDMA verbs to either query or insert tuples.
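The two query types mentioned above can be illustrated with a minimal software sketch. It is not the FPGA implementation: the node layout, the names `Leaf`, `Inner`, `find_leaf`, `lookup` and `range_query`, and the hand-built example tree are all assumptions made for illustration, and DMA transfers, node caching and the RDMA interface are left out.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]
    next: Optional["Leaf"] = None  # sibling pointer followed by range scans


@dataclass
class Inner:
    keys: List[int]   # separator keys
    children: list    # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf that may hold `key`."""
    while isinstance(node, Inner):
        # children[i] covers keys k with keys[i-1] <= k < keys[i]
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    """Point query: return the value stored under `key`, or None."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_query(root, lo, hi):
    """Range query: return all (key, value) pairs with lo <= key <= hi."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next  # continue in the next leaf without re-descending
    return out


# Hypothetical two-level tree built by hand (inserts are out of scope here).
l1 = Leaf([1, 3], ["a", "b"])
l2 = Leaf([5, 7], ["c", "d"])
l3 = Leaf([9, 11], ["e", "f"])
l1.next, l2.next = l2, l3
root = Inner(keys=[5, 9], children=[l1, l2, l3])

print(lookup(root, 7))          # d
print(range_query(root, 3, 9))  # [(3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')]
```

Each step of the descent depends on the node fetched in the previous step, which is why the cost of every node access (here a pointer dereference, on the network card a DMA read or a cache hit in on-chip memory) dominates traversal time.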

MineLiB: A Library of Sub-Operators for Data Mining Algorithms

Internship, Prof. Gustavo Alonso, Systems Group, ETH Zurich

Accelerating data mining algorithms on modern computing hardware is critical for making efficient use of the data deluge available in the cloud today. In this project we want to develop a library of algorithmic building blocks (sub-operators), optimize them, and expose them through an application programming interface (API) for data mining application developers. This API allows the integration of different optimized implementations of these sub-operators targeting different platforms (multicore CPUs, FPGAs, GPUs, etc.).
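One way to picture such an API is a registry that maps a sub-operator name and a target platform to an implementation. The sketch below is purely illustrative and is not the MineLiB interface: the names `register`, `dispatch`, `sq_dist` and `nearest_centroid`, the platform labels, and the fallback-to-CPU policy are all assumptions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (sub-operator name, platform) -> implementation
_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register(op: str, platform: str):
    """Decorator that files an implementation under (op, platform)."""
    def wrap(fn):
        _REGISTRY[(op, platform)] = fn
        return fn
    return wrap


def dispatch(op: str, platform: str = "cpu") -> Callable:
    """Return the implementation for `platform`, falling back to the CPU one."""
    impl = _REGISTRY.get((op, platform)) or _REGISTRY.get((op, "cpu"))
    if impl is None:
        raise KeyError(f"no implementation registered for {op!r}")
    return impl


@register("sq_dist", "cpu")
def sq_dist_cpu(a, b):
    """Reference implementation of squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@register("sq_dist", "accel")
def sq_dist_accel(a, b):
    """Stand-in for a platform-specific kernel; same result, different code path."""
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
    return total


def nearest_centroid(point, centroids, platform="cpu"):
    """Assignment step of k-means, written only against the sub-operator."""
    sq_dist = dispatch("sq_dist", platform)
    dists = [sq_dist(point, c) for c in centroids]
    return min(range(len(dists)), key=dists.__getitem__)


centroids = [[0.0, 0.0], [5.0, 5.0]]
print(nearest_centroid([1.0, 1.0], centroids))                    # 0
print(nearest_centroid([4.0, 4.0], centroids, platform="accel"))  # 1
```

The application code (`nearest_centroid`) never names a concrete implementation, so a tuned kernel for another platform can be registered later without changing it.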

FastTree: Efficient Distributed Gradient Boosting Tree Framework

Final Year Project @ Systems Research Group, Microsoft Research Asia

FastTree is a distributed machine learning framework based on Gradient Boosting Decision Tree (also known as GBDT, GBRT, GBM, or MART). The current version is built on top of ChaNa, the RDMA-optimized distributed computing engine developed by the MSRA Systems Group. Through both system-level and algorithm-level optimizations, it reduces memory usage and communication cost and outperforms popular comparable tools (e.g. XGBoost).
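For readers unfamiliar with the underlying algorithm, the sketch below shows plain gradient boosting on a single machine with squared-error loss and depth-1 trees (stumps) over one feature. It is a textbook illustration and not FastTree code: the function names, the learning rate, the number of rounds and the toy data are assumptions, and none of the distributed or RDMA-related optimizations are represented.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    mean = sum(residuals) / len(residuals)
    # Fallback when no split exists: threshold -inf, so every x gets `mean`.
    best = (float("inf"), float("-inf"), mean, mean)
    for t in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def fit_gbdt(xs, ys, n_rounds=50, lr=0.1):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x < t else rm) for x, p in zip(xs, preds)]
    return base, stumps, lr


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, lr = model
    return base + lr * sum(lm if x < t else rm for t, lm, rm in stumps)


# Toy step function: the target jumps from 1 to 5 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_gbdt(xs, ys)
print(round(predict(model, 2), 2), round(predict(model, 5), 2))  # 1.01 4.99
```

In a distributed setting the expensive part is the split search inside `fit_stump`, since the statistics needed to evaluate candidate splits must be aggregated across machines; this is where communication cost arises.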