Alumni Lecture Series: Chapter 3 (Sidharth Kumar)
Data creation consistently outpaces the infrastructure for its storage and utilization. With the growing size of scientific simulations, increasing resolution of sensors and the advent of big data, this chasm between production and utilization is wider than ever. Deriving insight from massive scientific data requires injecting data management techniques directly into the analysis and visualization pipeline, to build tools that allow effective, interactive exploration. Sidharth Kumar has developed scalable algorithms for high-performance data movement that can facilitate key requirements of data management including parallel I/O, out-of-core processing, streaming, and paging. Overall, the research provides a unified solution for large-scale data management, impacting all three key stages of the data life cycle: generation, storage, and exploration (analysis and visualization).
As part of the third DA-IICT Alumni Lecture Series, he presented the first parallel framework for the high-performance movement of multiresolution data suited to analysis and visualization. The framework gives users flexibility over the scale at which they process data, making it possible to interact with large volumes of data with low latency on a variety of platforms, ranging from standard workstations to the largest supercomputers.
In his talk, he presented an HPC framework for scalable relational algebra, which forms a basis of primitive operations suitable for applications in graphs and networks, program analysis, deductive databases, scientific data, and constraint logic programming. Despite its expressive power, relational algebra has not received the same attention in high-performance computing research as more common primitives like stencil computations, floating-point operations, numerical integration, and sparse linear algebra. Furthermore, specific challenges in representing a relation and communicating among its distributed portions, especially for inherently imbalanced relations, have previously thwarted successful scaling of relational algebra applications to supercomputers. Sidharth presented a set of efficient algorithms to parallelize and scale key relational algebra primitives. He also introduced a hybrid hash-tree approach for representing distributed imbalanced relations that permits efficient communication.
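To make the idea of distributing a relation concrete, here is a minimal sketch of a hash-partitioned join, one of the basic relational algebra primitives. It is an illustration only, not the framework or the hybrid hash-tree approach from the talk: it uses plain hash partitioning, simulates ranks as Python lists in a single process, and all function names (`partition`, `local_join`, `distributed_join`) are hypothetical.

```python
from collections import defaultdict


def partition(relation, key_index, num_ranks):
    """Assign each tuple to a simulated rank by hashing its join-key column."""
    buckets = [[] for _ in range(num_ranks)]
    for tup in relation:
        buckets[hash(tup[key_index]) % num_ranks].append(tup)
    return buckets


def local_join(left, right):
    """Join (a, b) tuples with (b, c) tuples held by one rank, giving (a, c)."""
    index = defaultdict(list)
    for b, c in right:
        index[b].append(c)
    return {(a, c) for a, b in left for c in index.get(b, ())}


def distributed_join(left, right, num_ranks=4):
    """Hash both relations on the shared column, then join rank by rank.

    Tuples with equal join keys land on the same rank, so each rank can
    join its portion without looking at any other rank's data.
    """
    left_parts = partition(left, 1, num_ranks)    # key is the second column
    right_parts = partition(right, 0, num_ranks)  # key is the first column
    result = set()
    for rank in range(num_ranks):
        result |= local_join(left_parts[rank], right_parts[rank])
    return result


# Joining an edge relation with itself yields all paths of length two.
edges = {(1, 2), (2, 3), (3, 4)}
print(sorted(distributed_join(edges, edges)))  # [(1, 3), (2, 4)]
```

With plain hashing, a key that appears in very many tuples sends all of them to one rank, which is the kind of imbalance the abstract describes as an obstacle to scaling.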
Sidharth Kumar is an Assistant Professor in the Department of Computer Science at the University of Alabama at Birmingham. He received his B.Tech degree from DAIICT in 2009 and a PhD from the University of Utah in 2015. His research areas are high-performance computing and large-scale data management. His research has led to the development of a highly scalable parallel I/O system (PIDX), which has the unique ability to write data in a hierarchical format inherently suitable for analysis and visualization tasks. The technology has been deployed at several DOE supercomputing facilities, on systems such as Theta and Mira (ANL) and Titan (ORNL). Dr. Kumar has also developed a parallel framework for performing relational algebra at scale. This work received the best paper award at HiPC 2019 and won the Hans Meuer best research paper award at the International Supercomputing Conference 2020.