Alibaba's 2021 and 2022 microservice datasets are the only publicly
available sources of request-workflow traces from a large-scale
microservice deployment. They have the potential to strongly
influence future research as they provide much-needed visibility into
industrial microservices' characteristics. We conduct the first
systematic analyses of both datasets to help facilitate their use by
the community. We find that the 2021 dataset contains numerous
inconsistencies preventing accurate reconstruction of full trace
topologies. The 2022 dataset also suffers from inconsistencies, but
at a much lower rate. Tools that strictly follow Alibaba's specs for
constructing traces from these datasets will silently ignore these
inconsistencies, misinforming researchers by creating traces of the
wrong sizes and shapes. Tools that discard traces with inconsistencies
will discard many traces. We present Casper, a construction method
that uses redundancies in the datasets to sidestep the
inconsistencies. Compared to an approach that discards traces with
inconsistencies, Casper accurately reconstructs an additional
25.5% of traces in the 2021 dataset (going from 58.32% to 83.82%)
and an additional 12.18% in the 2022 dataset (going from 86.42% to
98.6%).
2023
ATC
Lifting the veil on Meta's microservice architecture: Analyses of topology and request workflows
Huye, Darby,
Shkuro, Yuri,
and Sambasivan, Raja R.
The microservice architecture is a novel paradigm for building and operating distributed applications in many organizations. This paradigm changes many aspects of how distributed applications are built, managed, and operated in contrast to monolithic applications. It introduces new challenges to solve and requires changing assumptions about previously well-known ones. But, today, the characteristics of large-scale microservice architectures are invisible outside their organizations, depressing opportunities for research. Recent studies provide only partial glimpses and represent only single design points. This paper enriches our understanding of large-scale microservices by characterizing Meta’s microservice architecture. It focuses on previously unreported (or underreported) aspects important to developing and researching tools that use the microservice topology or traces of request workflows. We find that the topology is extremely heterogeneous, is in constant flux, and includes software entities that do not cleanly fit in the microservice architecture. Request workflows are highly dynamic, but local properties can be predicted using service and endpoint names. We quantify the impact of obfuscating factors in microservice measurement and conclude with implications for tools and future-work opportunities.
2022
JSys
[SoK] Identifying Mismatches Between Microservice Testbeds and Industrial Perceptions of Microservices
Seshagiri, Vishwanath,
Huye, Darby,
Liu, Lan,
Wildani, Avani,
and Sambasivan, Raja R.
Industrial microservice architectures vary so wildly in their characteristics, such as size or communication method, that comparing systems is difficult and often leads to confusion and misinterpretation. In contrast, the academic testbeds used to conduct microservices research employ a very constrained set of design choices. This lack of systemization in these key design choices when developing microservice architectures has led to uncertainty over how to use experiments from testbeds to inform practical deployments and indeed whether this should be done at all. We conduct semi-structured interviews with industry participants to understand the representativeness of existing testbeds' design choices. Surprising results included the presence of cycles in industry deployments, as well as a lack of clarity about the presence of hierarchies. We then systematize the possible design choices we learned about from the interviews, and identify important mismatches between our interview results and testbeds' designs that will inform future, more representative testbeds.
Developers use logs to diagnose performance problems in distributed applications. But, it is difficult to know a priori where logs are needed and what information in them is needed to help diagnose problems that may occur in the future. We summarize our work on the Variance-driven Automated Instrumentation Framework (VAIF), which runs alongside distributed applications. In response to newly-observed performance problems, VAIF automatically searches the space of possible instrumentation choices to enable the logs needed to help diagnose them. To work, VAIF combines distributed tracing (an enhanced form of logging) with insights about how response-time variance can be decomposed on the critical-path portions of requests’ traces.
2021
SoCC
Automating instrumentation choices for performance problems in distributed applications with VAIF
Toslali, Mert,
Ates, Emre,
Ellis, Alex,
Zhang, Zhaoqi,
Huye, Darby,
Liu, Lan,
Puterman, Samantha,
Coskun, Ayse K.,
and Sambasivan, Raja R.
Developers use logs to diagnose performance problems in distributed applications. However, it is difficult to know a priori where logs are needed and what information in them is needed to help diagnose problems that may occur in the future. We present the Variance-driven Automated Instrumentation Framework (VAIF), which runs alongside distributed applications. In response to newly-observed performance problems, VAIF automatically searches the space of possible instrumentation choices to enable the logs needed to help diagnose them. To work, VAIF combines distributed tracing (an enhanced form of logging) with insights about how response-time variance can be decomposed on the critical-path portions of requests’ traces. We evaluate VAIF by using it to localize performance problems in OpenStack and HDFS. We show that VAIF can localize problems related to slow code paths, resource contention, and problematic third-party code while enabling only 3-34% of the total tracing instrumentation.
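A minimal sketch of VAIF-style variance-driven instrumentation, under simplifying assumptions: each trace exposes its critical path as (span, latency) pairs, and instrumentation is modeled as a per-span switch. VAIF's real search over instrumentation choices is more sophisticated.

```python
import statistics

def high_variance_spans(traces, cv_threshold=0.5):
    """Flag spans whose critical-path latency varies most across requests
    expected to perform similarly."""
    by_span = {}
    for trace in traces:
        for span, latency in trace["critical_path"]:
            by_span.setdefault(span, []).append(latency)
    flagged = []
    for span, samples in by_span.items():
        if len(samples) < 2:
            continue
        cv = statistics.stdev(samples) / max(statistics.mean(samples), 1e-9)
        if cv > cv_threshold:            # coefficient of variation
            flagged.append(span)
    return flagged

def control_loop(traces, enabled):
    """Enable finer-grained tracepoints inside high-variance spans so the
    next round of traces localizes the problem further."""
    for span in high_variance_spans(traces):
        enabled.add(span + "/fine-grained")
    return enabled
```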
2019
BigData
D3N: A multi-layer cache for the rest of us
Kaynar, Emine Ugur,
Abdi, Mania,
Hajkazemi, Mohammad Hossein,
Turk, Ata,
Sambasivan, Raja R,
Cohen, David,
Rudolph, Larry,
Desnoyers, Peter,
and Krieger, Orran
In IEEE International Conference on Big Data (Big Data)
2019
Current caching methods for improving the performance of big-data jobs assume high (e.g., full bi-section) bandwidth; however many enterprise data centers and co-location facilities have large network imbalances due to over-subscription and incremental networking upgrades. We describe D3N, a multi-layer cooperative caching architecture that mitigates network imbalances by caching data on the access side of each layer of a hierarchical network topology, adaptively adjusting cache sizes of each layer based on observed workload patterns and network congestion. We have added (and submitted upstream) a 2-layer D3N cache to the Ceph RADOS Gateway; read bandwidth achieves the 5GB/s speed of our SSDs, and we show that it substantially improves big-data job performance while reducing network traffic.
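The following hypothetical sketch illustrates the adaptive-sizing idea: split a fixed cache budget between an access-side layer (L1) and a higher layer (L2), weighting L2 hits by observed congestion on the oversubscribed links. The weighting is invented for illustration and is not D3N's actual algorithm.

```python
def rebalance(budget_gb, l1_hits, l2_hits, congestion):
    """Split a cache budget across two layers. `congestion` in [0, 1] scales
    the value of L2 hits, which spare the oversubscribed core links."""
    l2_value = l2_hits * (1.0 + congestion)
    total = (l1_hits + l2_value) or 1.0
    l1_share = l1_hits / total
    return l1_share * budget_gb, (1.0 - l1_share) * budget_gb

# e.g., periodically: l1_gb, l2_gb = rebalance(512, l1_hits, l2_hits, 0.6)
```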
SoCC
An automated, cross-layer instrumentation framework for diagnosing performance problems in distributed applications
Ates, Emre,
Sturmann, Lily,
Toslali, Mert,
Krieger, Orran,
Megginson, Richard,
Coskun, Ayse K,
and Sambasivan, Raja R.
Diagnosing performance problems in distributed applications is extremely challenging. A significant reason is that it is hard to know where to place instrumentation a priori to help diagnose problems that may occur in the future. We present the vision of an automated instrumentation framework, Pythia, that runs alongside deployed distributed applications. In response to a newly-observed performance problem, Pythia searches the space of possible instrumentation choices to enable the instrumentation needed to help diagnose it. Our vision for Pythia builds on workflow-centric tracing, which records the order and timing of how requests are processed within and among a distributed application’s nodes (i.e., records their workflows). It uses the key insight that localizing the sources of high performance variation within the workflows of requests that are expected to perform similarly gives insight into where additional instrumentation is needed.
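The key insight lends itself to a short sketch: group requests whose workflows look alike (here, by the sequence of components visited; a simplifying assumption) and rank groups by response-time variance. High-variance groups mark where instrumentation is missing.

```python
from collections import defaultdict
import statistics

def rank_groups(traces):
    """traces: dicts with 'components' (visited, in order) and 'response_time'."""
    groups = defaultdict(list)
    for t in traces:
        skeleton = tuple(t["components"])  # requests expected to be similar
        groups[skeleton].append(t["response_time"])
    ranked = [(statistics.variance(rts), skel)
              for skel, rts in groups.items() if len(rts) >= 2]
    return sorted(ranked, reverse=True)    # highest variance first
```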
2017
SIGCOMM
Bootstrapping evolvability for inter-domain routing with D-BGP
Sambasivan, Raja R.,
Tran-Lam, David,
Akella, Aditya,
and Steenkiste, Peter
In ACM Special Interest Group on Data Communication (SIGCOMM)
2017
The Internet’s inter-domain routing infrastructure, provided today by BGP, is extremely rigid and does not facilitate the introduction of new inter-domain routing protocols. This rigidity has made it incredibly difficult to widely deploy critical fixes to BGP. It has also depressed ASes’ ability to sell value-added services or replace BGP entirely with a more sophisticated protocol. Even if operators undertook the significant effort needed to fix or replace BGP, it is likely the next protocol will be just as difficult to change or evolve. To help, this paper identifies two features needed in the routing infrastructure (i.e., within any inter-domain routing protocol) to facilitate evolution to new protocols. To understand their utility, it presents D-BGP, a version of BGP that incorporates them.
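As a rough illustration of the kind of feature the paper argues for, the sketch below models an advertisement that carries a baseline path every AS understands plus opaque per-protocol information that unsupporting ASes pass through unmodified. Field names and the handler interface are invented; see the paper for D-BGP's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class IntegratedAdvertisement:
    prefix: str
    baseline_path: list                         # classic BGP AS path
    extra: dict = field(default_factory=dict)   # protocol name -> opaque info

def process(ad, my_asn, handlers):
    """handlers: protocol name -> update function, for protocols this AS
    speaks. Everything else is passed through unmodified."""
    ad.baseline_path = ad.baseline_path + [my_asn]
    for proto, info in ad.extra.items():
        if proto in handlers:
            ad.extra[proto] = handlers[proto](info, my_asn)
        # else: pass-through, so partial deployments still propagate info
    return ad
```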
2016
SoCC
Principled workflow-centric tracing of distributed systems
Sambasivan, Raja R.,
Shafer, Ilari,
Mace, Jonathan,
Sigelman, Benjamin H.,
Fonseca, Rodrigo,
and Ganger, Gregory R.
Workflow-centric tracing captures the workflow of causally-related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure’s utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.
TechReport
Bootstrapping evolvability for inter-domain routing with D-BGP
Sambasivan, Raja R.,
Tran-Lam, David,
Akella, Aditya,
and Steenkiste, Peter
It is extremely difficult to utilize new routing protocols in today’s Internet. As a result, the Internet’s baseline inter-domain protocol for connectivity (BGP) has remained largely unchanged, despite known significant flaws. The difficulty of using new protocols has also depressed opportunities for (currently commoditized) transit providers to provide value-added routing services. To help, this paper proposes Darwin’s BGP (D-BGP), a modified version of BGP that can support evolvability to new protocols. D-BGP modifies BGP’s advertisements and advertisement processing based on requirements imposed by key evolvability scenarios, which we identified via analyses of recently-proposed routing protocols.
2015
HotNets
Bootstrapping evolvability for inter-domain routing
Sambasivan, Raja R.,
Tran-Lam, David,
Akella, Aditya,
and Steenkiste, Peter
In ACM Workshop on Hot Topics in Networks (HotNets)
2015
It is extremely difficult to deploy new inter-domain routing protocols in today’s Internet. As a result, the Internet’s baseline protocol for connectivity, BGP, has remained largely unchanged, despite known significant flaws. The difficulty of deploying new protocols has also depressed opportunities for (currently commoditized) transit providers to provide value-added routing services. To help, we identify the key deployment models under which new protocols are introduced and the requirements each poses for enabling their usage goals. Based on these requirements, we argue for two modifications to BGP that will greatly improve support for new routing protocols.
2014
TechReport
So, you want to trace your distributed system? Key design insights from years of practical experience
Sambasivan, Raja R.,
Fonseca, Rodrigo,
Shafer, Ilari,
and Ganger, Gregory R.
End-to-end tracing captures the workflow of causally-related activity (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for management tasks like diagnosis and resource accounting. Drawing upon our experiences building and using end-to-end tracing infrastructures, this paper distills the key design axes that dictate trace utility for important use cases. Developing tracing infrastructures without explicitly understanding these axes and choices for them will likely result in infrastructures that are not useful for their intended purposes. In addition to identifying the design axes, this paper identifies good design choices for various tracing use cases, contrasts them to choices made by previous tracing implementations, and shows where prior implementations fall short. It also identifies remaining challenges on the path to making tracing an integral part of distributed system design.
2013
HotStorage
Specialized storage for big numeric time series
Shafer, Ilari,
Sambasivan, Raja R,
Rowe, Anthony,
and Ganger, Gregory R
In USENIX Workshop on Hot Topics in Storage (HotStorage)
2013
Numeric time series data has unique storage requirements and access patterns that can benefit from specialized support, given its importance in Big Data analyses. Popular frameworks and databases focus on addressing other needs, making them a suboptimal fit. This paper describes the support needed for numeric time series, suggests an architecture for efficient time series storage, and illustrates its potential for satisfying key requirements.
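As a generic illustration of the specialization argument (not this paper's proposed architecture), fixed-width columnar records with delta-encoded timestamps already beat general-purpose row stores for numeric series:

```python
import struct

def encode(points):
    """points: [(timestamp_ms, value)]. Regular sampling makes the deltas
    tiny and highly compressible; a real encoder would varint/bit-pack them
    instead of using fixed 8-byte fields as done here for brevity."""
    out, prev_ts = bytearray(), None
    for ts, val in points:
        delta = ts if prev_ts is None else ts - prev_ts
        out += struct.pack("<qd", delta, val)
        prev_ts = ts
    return bytes(out)
```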
TechReport
Visualizing request-flow comparison to aid performance diagnosis in distributed systems
Sambasivan, Raja R.,
Shafer, Ilari,
Mazurek, Michelle L,
and Ganger, Gregory R.
Distributed systems are complex to develop and administer, and performance problem diagnosis is particularly challenging. When performance degrades, the problem might be in any of the system’s many components or could be a result of poor interactions among them. Recent research efforts have created tools that automatically localize the problem to a small number of potential culprits, but effective visualizations are needed to help developers understand and explore their results. This paper compares side-by-side, diff, and animation-based approaches for visualizing the results of one proven automated localization technique called request-flow comparison. Via a 26-person user study, which included real distributed systems developers, we identify the unique benefits that each approach provides for different usage modes and problem types.
InfoVis
Visualizing request-flow comparison to aid performance diagnosis in distributed systems
Sambasivan, Raja R,
Shafer, Ilari,
Mazurek, Michelle L,
and Ganger, Gregory R
IEEE Transactions on Visualization and Computer Graphics
2013
Distributed systems are complex to develop and administer, and performance problem diagnosis is particularly challenging. When performance degrades, the problem might be in any of the system’s many components or could be a result of poor interactions among them. Recent research efforts have created tools that automatically localize the problem to a small number of potential culprits, but research is needed to understand what visualization techniques work best for helping distributed systems developers understand and explore their results. This paper compares the relative merits of three well-known visualization approaches (side-by-side, diff, and animation) in the context of presenting the results of one proven automated localization technique called request-flow comparison. Via a 26-person user study, which included real distributed systems developers, we identify the unique benefits that each approach provides for different problem types and usage modes.
Dissertation
Diagnosing performance changes in distributed systems by comparing request flows
Diagnosing performance problems in modern datacenters and distributed systems is challenging, as the root cause could be contained in any one of the system’s numerous components or, worse, could be a result of interactions among them. As distributed systems continue to increase in complexity, diagnosis tasks will only become more challenging. There is a need for a new class of diagnosis techniques capable of helping developers address problems in these distributed environments. As a step toward satisfying this need, this dissertation proposes a novel technique, called request-flow comparison, for automatically localizing the sources of performance changes from the myriad potential culprits in a distributed system to just a few potential ones. Request-flow comparison works by contrasting the workflow of how individual requests are serviced within and among every component of the distributed system between two periods: a non-problem period and a problem period. By identifying and ranking performance-affecting changes, request-flow comparison provides developers with promising starting points for their diagnosis efforts. Request workflows are obtained with less than 1% overhead via use of recently developed end-to-end tracing techniques. To demonstrate the utility of request-flow comparison in various distributed systems, this dissertation describes its implementation in a tool called Spectroscope and describes how Spectroscope was used to diagnose real, previously unsolved problems in the Ursa Minor distributed storage service and in select Google services. It also explores request-flow comparison’s applicability to the Hadoop File System. Via a 26-person user study, it identifies effective visualizations for presenting request-flow comparison’s results and further demonstrates that request-flow comparison helps developers quickly identify starting points for diagnosis. This dissertation also distills design choices that will maximize an end-to-end tracing infrastructure’s utility for diagnosis tasks and other use cases.
2012
TechReport
Visualizing request-flow comparison to aid performance diagnosis in distributed systems
Sambasivan, Raja R.,
Mazurek, Michelle L,
and Shafer, Ilari
Distributed systems are complex to develop and administer, and performance problem diagnosis is particularly challenging. When performance decreases, the problem might be in any of the system’s many components or could be a result of poor interactions among them. Recent research has provided the ability to automatically identify a small set of most likely problem locations, leaving the diagnoser with the task of exploring just that set. This paper describes and evaluates three approaches for visualizing the results of a proven technique called “request-flow comparison” for identifying likely causes of performance decreases in a distributed system. Our user study provides a number of insights useful in guiding visualization tool design for distributed system diagnosis. For example, we find that both an overlay-based approach (e.g., diff) and a side-by-side approach are effective, with tradeoffs for different users (e.g., expert vs. not) and different problem types. We also find that an animation-based approach is confusing and difficult to use.
HotCloud
Automated diagnosis without predictability is a recipe for failure
Sambasivan, Raja R,
and Ganger, Gregory R
In USENIX Workshop on Hot Topics in Cloud Computing (HotCloud)
2012
Automated management is critical to the success of cloud computing, given its scale and complexity. But, most systems do not satisfy one of the key properties required for automation: predictability, which in turn relies upon low variance. Most automation tools are not effective when variance is consistently high. Using automated performance diagnosis as a concrete example, this position paper argues that for automation to become a reality, system builders must treat variance as an important metric and make conscious decisions about where to reduce it. To help with this task, we describe a framework for reasoning about sources of variance in distributed systems and describe an example tool for helping identify them.
2011
NSDI
Diagnosing performance changes by comparing request flows
Sambasivan, Raja R,
Zheng, Alice X,
De Rosa, Michael,
Krevat, Elie,
Whitman, Spencer,
Stroucken, Michael,
Wang, William,
Xu, Lianghong,
and Ganger, Gregory R
In USENIX Conference on Networked Systems Design and Implementation (NSDI)
2011
The causes of performance changes in a distributed system often elude even its developers. This paper develops a new technique for gaining insight into such changes: comparing request flows from two executions (e.g., of two system versions or time periods). Building on end-to-end request-flow tracing within and across components, algorithms are described for identifying and ranking changes in the flow and/or timing of request processing. The implementation of these algorithms in a tool called Spectroscope is evaluated. Six case studies are presented of using Spectroscope to diagnose performance changes in a distributed storage service caused by code changes, configuration modifications, and component degradations, demonstrating the value and efficacy of comparing request flows. Preliminary experiences of using Spectroscope to diagnose performance changes within select Google services are also presented.
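The comparison can be sketched in a few lines: bin traces from the two executions by workflow structure, then surface bins whose frequency shifts (structural changes) and bins whose latency shifts (timing changes). Spectroscope ranks candidates with statistical tests; this sketch only shows the shape of the computation, with hypothetical field names.

```python
from collections import Counter
import statistics

def compare(before, after):
    """before/after: lists of traces, each with a 'structure' signature and 'rt'."""
    f_before = Counter(t["structure"] for t in before)
    f_after = Counter(t["structure"] for t in after)
    # Structural changes: workflow shapes whose frequency shifted.
    structural = {s: f_after[s] - f_before[s]
                  for s in f_before.keys() | f_after.keys()}
    # Timing changes: same shape, different latency.
    def mean_rt(traces, s):
        rts = [t["rt"] for t in traces if t["structure"] == s]
        return statistics.mean(rts) if rts else 0.0
    timing = {s: mean_rt(after, s) - mean_rt(before, s)
              for s in f_before if s in f_after}
    return structural, timing  # rank both by magnitude to find culprits
```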
TechReport
Automation without predictability is a recipe for failure
Automated management seems a must, as distributed systems and datacenters continue to grow in scale and complexity. But, automation of performance problem diagnosis and tuning relies upon predictability, which in turn relies upon low variance—most automation tools aren’t effective when variance is regularly high. This paper argues that, for automation to become a reality, system builders must treat variance as an important metric and make conscious decisions about where to reduce it. To help with this task, we describe a framework for understanding sources of variance and describe an example tool for helping identify them.
2010
ATC
A transparently-scalable metadata service for the Ursa Minor storage system
Sinnamohideen, Shafeeq,
Sambasivan, Raja R,
Hendricks, James,
Liu, Likun,
and Ganger, Gregory R
The metadata service of the Ursa Minor distributed storage system scales metadata throughput as metadata servers are added. While doing so, it correctly handles metadata operations that involve items served by different metadata servers, consistently and atomically updating the items. Unlike previous systems, it does so by reusing existing metadata migration functionality to avoid complex distributed transaction protocols. It also assigns item IDs to minimize the occurrence of multi-server operations. Ursa Minor’s approach allows one to implement a desired feature with less complexity than alternative methods and with minimal performance penalty (under 1% in non-pathological cases).
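A sketch of the migration-based idea, with hypothetical helper names: rather than running a distributed transaction, collocate the items a multi-server operation touches on one server and execute it there as an ordinary single-server operation.

```python
def execute(op, items, owner_of, migrate):
    """op spans `items`; owner_of maps item -> server; migrate reuses the
    system's existing redistribution machinery. All names are hypothetical."""
    owners = {owner_of(item) for item in items}
    target = owners.pop()                  # pick any involved server
    for item in items:
        if owner_of(item) != target:
            migrate(item, target)          # collocate items on `target`
    return op.run_locally(target, items)   # now a single-server operation
```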
Patent
Managing execution of database queries
Krompass, Stepan,
Kuno, Harumi Anne,
Dayal, Umeshwar,
Wiener, Janet,
and Sambasivan, Raja R.
One embodiment is a method to manage queries in a database. The method identifies a query that executes on the database for an elapsed time that is greater than a threshold and then implements a remedial action when the query executes on the database for an execution time that is greater than an estimated execution time.
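A minimal sketch of the claimed method, with placeholder names for the threshold, estimate, and remedial action:

```python
def manage_query(query, now, threshold_s, remediate):
    """Flag a query running longer than a threshold, then act once it also
    exceeds its estimated execution time (both placeholders)."""
    elapsed = now - query.start_time
    if elapsed > threshold_s and elapsed > query.estimated_runtime_s:
        remediate(query)  # e.g., abort, throttle, or deprioritize
```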
TechReport
Diagnosing performance problems by visualizing and comparing system behaviours
Sambasivan, Raja R.,
Zheng, Alice X.,
Krevat, Elie,
Whitman, Spencer,
and Ganger, Gregory R.
Spectroscope is a new toolset aimed at assisting developers with the long-standing challenge of performance debugging in distributed systems. To do so, it mines end-to-end traces of request processing within and across components. Using Spectroscope, developers can visualize and compare system behaviours between two periods or system versions, identifying and ranking various changes in the flow or timing of request processing. Examples of how Spectroscope has been used to diagnose real performance problems seen in a distributed storage system are presented, and Spectroscope’s primary assumptions and algorithms are evaluated.
TechReport
Diagnosing performance changes by comparing system behaviours
Sambasivan, Raja R.,
Zheng, Alice X,
Krevat, Elie,
Whitman, Spencer,
Stroucken, Michael,
Wang, William,
Xu, Lianghong,
and Ganger, Gregory R.
The causes of performance changes in a distributed system often elude even its developers. This paper develops a new technique for gaining insight into such changes: comparing system behaviours from two executions (e.g., of two system versions or time periods). Building on end-to-end request flow tracing within and across components, algorithms are described for identifying and ranking changes in the flow and/or timing of request processing. The implementation of these algorithms in a tool called Spectroscope is described and evaluated. Five case studies are presented of using Spectroscope to diagnose performance changes in a distributed storage system caused by code changes and configuration modifications, demonstrating the value and efficacy of comparing system behaviours.
2009
CACM
Relative fitness modeling
Mesnier, Michael P.,
Wachs, Matthew,
Sambasivan, Raja R.,
Zheng, Alice X.,
and Ganger, Gregory R.
Relative fitness is a new approach to modeling the performance of storage devices (e.g., disks and RAID arrays). In contrast to a conventional model, which predicts the performance of an application’s I/O on a given device, a relative fitness model predicts performance differences between devices. The result is significantly more accurate predictions.
2007
SIGMETRICS
Modeling the relative fitness of storage
Mesnier, Michael P.,
Wachs, Matthew,
Sambasivan, Raja R.,
Zheng, Alice X.,
and Ganger, Gregory R.
In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS)
2007
Relative fitness is a new black-box approach to modeling the performance of storage devices. In contrast with an absolute model that predicts the performance of a workload on a given storage device, a relative fitness model predicts performance differences between a pair of devices. There are two primary advantages to this approach. First, because a relative fitness model is constructed for a device pair, the application-device feedback of a closed workload can be captured (e.g., how the I/O arrival rate changes as the workload moves from device A to device B). Second, a relative fitness model allows performance and resource utilization to be used in place of workload characteristics. This is beneficial when workload characteristics are difficult to obtain or concisely express (e.g., rather than describe the spatio-temporal characteristics of a workload, one could use the observed cache behavior of device A to help predict the performance of B). This paper describes the steps necessary to build a relative fitness model, with an approach that is general enough to be used with any black-box modeling technique. We compare relative fitness models and absolute models across a variety of workloads and storage devices. On average, relative fitness models predict bandwidth and throughput within 10–20% and can reduce prediction error by as much as a factor of two when compared to absolute models.
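In sketch form, a relative fitness model is an ordinary regressor whose features include performance and utilization observed on device A and whose target is the workload's performance on device B. The paper uses CART-style regression trees; the data layout below is assumed for illustration.

```python
from sklearn.tree import DecisionTreeRegressor

def fit_relative_model(samples):
    """samples: dicts with numeric lists 'workload_features' and
    'observed_on_A' (performance/utilization on device A), plus the
    measured 'bandwidth_on_B' for the same workload."""
    X = [s["workload_features"] + s["observed_on_A"] for s in samples]
    y = [s["bandwidth_on_B"] for s in samples]
    return DecisionTreeRegressor(min_samples_leaf=5).fit(X, y)
```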
HotAC
Categorizing and differencing system behaviours
Sambasivan, Raja R,
Zheng, Alice X,
Thereska, Eno,
and Ganger, Gregory R
In USENIX Workshop on Hot Topics in Autonomic Computing
2007
Making request flow tracing an integral part of software systems creates the potential to better understand their operation. The resulting traces can be converted to per-request graphs of the work performed by a service, representing the flow and timing of each request’s processing. Collectively, these graphs contain detailed and comprehensive data about the system’s behavior and the workload that induced it, leaving the challenge of extracting insights. Categorizing and differencing such graphs should greatly improve our ability to understand the runtime behavior of complex distributed services and diagnose problems. Clustering the set of graphs can identify common request processing paths and expose outliers. Moreover, clustering two sets of graphs can expose differences between the two; for example, a programmer could diagnose a problem that arises by comparing current request processing with that of an earlier non-problem period and focusing on the aspects that change. Such categorizing and differencing of system behavior can be a big step in the direction of automated problem diagnosis.
FAST
//TRACE: Parallel trace replay with approximate causal events
Mesnier, Michael P,
Wachs, Matthew,
Sambasivan, Raja R,
Lopez, Julio,
Hendricks, James,
Ganger, Gregory R,
and O’Hallaron, David R
In USENIX Conference on File and Storage Technologies
2007
//TRACE is a new approach for extracting and replaying traces of parallel applications to recreate their I/O behavior. Its tracing engine automatically discovers inter-node data dependencies and inter-I/O compute times for each node (process) in an application. This information is reflected in per-node annotated I/O traces. Such annotation allows a parallel replayer to closely mimic the behavior of a traced application across a variety of storage systems. When compared to other replay mechanisms, //TRACE offers significant gains in replay accuracy. Overall, the average replay error for the parallel applications evaluated in this paper is below 6%.
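A sketch of annotated-trace replay: each node thread replays its I/Os in order, sleeping the recorded inter-I/O compute time and blocking on inter-node data dependencies before issuing dependent I/Os. The record format is invented for illustration.

```python
import threading, time
from collections import defaultdict

def replay_node(records, barriers, do_io):
    """records: one node's annotated I/O trace, in order. barriers maps a
    dependency name to a threading.Event shared by all node threads."""
    for rec in records:
        for dep in rec["waits_for"]:     # inter-node data dependencies
            barriers[dep].wait()         # block until the producer signals
        time.sleep(rec["compute_s"])     # recorded inter-I/O compute time
        do_io(rec["io"])                 # reissue the recorded I/O
        for sig in rec.get("signals", ()):
            barriers[sig].set()          # release dependent nodes

barriers = defaultdict(threading.Event)  # pre-populate before starting threads
```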
2006
IEEE Bulletin
Early experiences on the journey towards self-* storage
Abd-El-Malek, Michael,
Courtright II, William V,
Cranor, Chuck,
Ganger, Gregory R,
Hendricks, James,
Klosterman, Andrew J,
Mesnier, Michael P,
Prasad, Manish,
Salmon, Brandon,
and Sambasivan, Raja R
Self-* systems are self-organizing, self-configuring, self-healing, self-tuning and, in general, self-managing. Ursa Minor is a large-scale storage infrastructure being designed and deployed at Carnegie Mellon University, with the goal of taking steps towards the self-* ideal. This paper discusses our early experiences with one specific aspect of storage management: performance tuning and projection. Ursa Minor uses self-monitoring and rudimentary system modeling to support analysis of how system changes would affect performance, exposing simple What-if query interfaces to administrators and tuning agents. We find that most performance predictions are sufficiently accurate (within 10-20%) and that the associated performance overhead is less than 6%. Such embedded support for What-if queries simplifies tuning automation and reduces the administrator expertise needed to make acquisition decisions.
TechReport
Improving small file performance in object-based storage
Hendricks, James,
Sambasivan, Raja R.,
Sinnamohideen, Shafeeq,
and Ganger, Gregory R.
This paper proposes architectural refinements, server-driven metadata prefetching and namespace flattening, for improving the efficiency of small file workloads in object-based storage systems. Server-driven metadata prefetching consists of having the metadata server provide information and capabilities for multiple objects, rather than just one, in response to each lookup. Doing so allows clients to access the contents of many small files for each metadata server interaction, reducing access latency and metadata server load. Namespace flattening encodes the directory hierarchy into object IDs such that namespace locality translates to object ID similarity. Doing so exposes namespace relationships among objects (e.g., as hints to storage devices), improves locality in metadata indices, and enables use of ranges for exploiting them. Trace-driven simulations and experiments with a prototype implementation show significant performance benefits for small file workloads.
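Namespace flattening can be sketched directly: derive each object's ID from its path components so that objects in the same directory share high-order ID bits and can be fetched or prefetched as an ID range. Bit widths and the hash are arbitrary choices here, not the paper's.

```python
import hashlib

def flat_object_id(path: str, bits_per_dir: int = 12) -> int:
    """Encode the directory hierarchy into the object ID so that namespace
    locality translates to object ID similarity."""
    oid = 0
    for comp in path.strip("/").split("/"):
        h = int.from_bytes(hashlib.sha1(comp.encode()).digest()[:4], "big")
        oid = (oid << bits_per_dir) | (h & ((1 << bits_per_dir) - 1))
    return oid  # siblings share high-order bits, so ranges exploit locality
```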
TechReport
Eliminating cross-server operations in scalable file systems
Hendricks, James,
Sinnamohideen, Shafeeq,
Sambasivan, Raja R.,
and Ganger, Gregory R.
Distributed file systems that scale by partitioning files and directories among a collection of servers inevitably encounter cross-server operations. A common example is a RENAME that moves a file from a directory managed by one server to a directory managed by another. Systems that provide the same semantics for cross-server operations as for those that do not span servers traditionally implement dedicated protocols for these rare operations. This paper suggests an alternate approach that exploits the existence of dynamic redistribution functionality (e.g., for load balancing, incorporation of new servers, and so on). When a client request would involve files on multiple servers, the system can redistribute those files onto one server and have it service the request. Although such redistribution is more expensive than a dedicated cross-server protocol, the rareness of such operations makes the overall performance impact minimal. Analysis of NFS traces indicates that cross-server operations make up fewer than 0.001% of client requests, and experiments with a prototype implementation show that the performance impact is negligible when such operations make up as much as 0.01% of operations. Thus, when dynamic redistribution functionality exists in the system, cross-server operations can be handled with little additional implementation complexity.
No single data encoding scheme or fault model is right for all data. A versatile storage system allows these to be data-specific, so that they can be matched to access patterns, reliability requirements, and cost goals. Ursa Minor is a cluster-based storage system that allows data-specific selection of and on-line changes to encoding schemes and fault models. Thus, different data types can share a scalable storage infrastructure and still enjoy customized choices, rather than suffering from “one size fits all.” Experiments with Ursa Minor show performance penalties as high as 2–3× for workloads using poorly-matched choices. Experiments also show that a single cluster supporting multiple workloads is much more efficient when the choices are specialized rather than forced to use a “one size fits all” configuration.
FAST
Ursa Minor: Versatile cluster-based storage
Abd-El-Malek, Michael,
Courtright II, William V,
Cranor, Chuck,
Ganger, Gregory R,
Hendricks, James,
Klosterman, Andrew J,
Mesnier, Michael P,
Prasad, Manish,
Salmon, Brandon,
and Sambasivan, Raja R
In USENIX Conference on File and Storage Technologies
2005
No single encoding scheme or fault model is optimal for all data. A versatile storage system allows them to be matched to access patterns, reliability requirements, and cost goals on a per-data item basis. Ursa Minor is a cluster-based storage system that allows data-specific selection of, and on-line changes to, encoding schemes and fault models. Thus, different data types can share a scalable storage infrastructure and still enjoy specialized choices, rather than suffering from “one size fits all.” Experiments with Ursa Minor show performance benefits of 2–3× when using specialized choices as opposed to a single, more general, configuration. Experiments also show that a single cluster supporting multiple workloads simultaneously is much more efficient when the choices are specialized for each distribution rather than forced to use a “one size fits all” configuration. When using the specialized distributions, aggregate cluster throughput nearly doubled.
MASCOTS
Replication policies for layered clustering of NFS servers
Sambasivan, Raja R.,
Klosterman, Andrew J,
and Ganger, Gregory R.
In IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
2005
Layered clustering offers cluster-like load balancing for unmodified NFS or CIFS servers. Read requests sent to a busy server can be offloaded to other servers holding replicas of the accessed files. This paper explores a key design question for this approach: which files should be replicated? We find that the popular policy of replicating read-only files offers little benefit. A policy that replicates read-only portions of read-mostly files, however, implicitly coordinates with client cache invalidations and thereby allows almost all read operations to be offloaded. In a read-heavy trace, 75% of all operations and 52% of all data transfers can be offloaded.
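The winning policy is easy to sketch: track which (file, block) pairs of read-mostly files have been read, offload reads of replicated blocks, and invalidate a block's replicas on any write, mirroring client cache invalidation. Structures are simplified for illustration.

```python
class ReplicationPolicy:
    """Replicate read-only portions of read-mostly files."""

    def __init__(self):
        self.replicated = set()          # (file, block) pairs with replicas

    def on_read(self, file, block, is_read_mostly):
        if is_read_mostly(file):
            self.replicated.add((file, block))
        return (file, block) in self.replicated  # True => offloadable

    def on_write(self, file, block):
        self.replicated.discard((file, block))    # invalidate replicas
```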
TechReport
Selected project reports, Spring 2005 Advanced OS & Distributed Systems (15-712)
This technical report contains six final project reports contributed by participants in CMU’s Spring 2005 Advanced Operating Systems and Distributed Systems course (15-712), offered by professor Garth Gibson. This course examines the design and analysis of various aspects of operating systems and distributed systems through a series of background lectures, paper readings, and group projects. Projects were done in groups of two or three and required some kind of implementation and evaluation pertaining to the classroom material, with the topic of each project left up to the group. Final reports were held to the standard of a systems conference paper submission, a standard well met by the majority of completed projects. Some of the projects will be extended for future submissions to major systems conferences.
The reports that follow cover a broad range of topics. These reports present a characterization of synchronization behavior and overhead in commercial databases, and a hardware-based lock predictor based on the characterization; design and implementation of a partitioned protocol offload architecture that provides Direct Data Placement (DDP) functionality and better utilizes both the network interface and the host CPU; design and implementation of file indexing inside file systems for fast content searching support; comparison-based server verification techniques for stateful and semi-deterministic protocols such as NFSv4; data-plane protection techniques for link-state routing protocols such as OSPF, which is resilient to the existence of compromised routers; and performance comparison of in-band and out-of-band data access strategies in file systems.
While not all of these reports conclude definitively and positively, all are worth reading because they involve novelty in the systems explored and bring forth interesting research questions.