Graphs & Systems
This is was a reading group open to the University community. The goal is to provide a place for discussion and learning about topics related to graph theory, applications, and system design. The discussions are based on articles on the following topics:
- High performance graph processing platforms;
- Parallel graph algorithms
- Characterization, modelling, and supporting infrastructure for online social systems (e.g., Facebook, Google+, del.icio.us, Twitter, YouTube, MMPG, etc)
- Mobile networks
Location: KAIS 4018
Schedule / Reading List:
2012
- 2 PM, 30 Jul 2012
Soman, J. Fast Community Detection Algorithm with GPUs and Multicore Architectures. In IPDPS’2012.
Abstract. In this paper, we present the design of a novel scalable parallel algorithm for community detection optimized for multi-core and GPU architectures. Our algorithm is based on label propagation, which works solely on local information, thus giving it the scalability advantage over conventional approaches. We also show that weighted label propagation can overcome typical quality issues in communities detected with label propagation. Experimental results on well known massive scale graphs such as Wikipedia (100M edges) and also on RMAT graphs with 10M 40M edges, demonstrate the superior performance and scalability of our algorithm compared to the well known approaches for community detection. On the hep-th graph (352K edges) and the wikipedia graph (100M edges), using Power 6 architecture with 32 cores, our algorithm achieves one to two orders of magnitude better performance compared to the best known prior results on parallel architectures with similar number of CPUs. Further, our GPGPU based algorithm achieves 8× improvement over the Power 6 performance on 40M edge R-MAT graph. Alongside, we achieve high quality (modularity) of communities detected, with experimental evidence from well-known graphs such as Zachary karate club, Dolphin network and Football club, where we achieve modularity that is close to the best known alternatives. To the best of our knowledge these are best known results for community detection on massive graphs (100M edges) in terms of performance and also quality vs. performance trade-off. This is also a unique work on community detection on GPGPUs with scalable performance.
- 10:30 AM, 16 Jul 2012
Marc Najork, Dennis Fetterly, Alan Halverson, Krishnaram Kenthapadi, and Sreenivas Gollapudi. Of Hammers and Nails: An Empirical Comparison of Three Paradigms for Processing Large Graphs. In WSDM’2012.
Abstract. Many phenomena and artifacts such as road networks, social networks and the web can be modeled as large graphs and analyzed using graph algorithms. However, given the size of the underlying graphs, efficient implementation of basic operations such as connected component analysis, approximate shortest paths, and link-based ranking (e.g. PageRank) becomes challenging.
This paper presents an empirical study of computations on such large graphs in three well-studied platform models, viz., a relational model, a data-parallel model, and a special-purpose in-memory model. We choose a prototypical member of each platform model and analyze the computational efficiencies and requirements for five basic graph operations used in the analysis of real-world graphs viz., PageRank, SALSA, Strongly Connected Components (SCC), Weakly Connected Components (WCC), and Approximate Shortest Paths (ASP). Further, we characterize each platform in terms of these computations using model-specific implementations of these algorithms on a large web graph. Our experiments show that there is no single platform that performs best across different classes of operations on large graphs. While relational databases are powerful and flexible tools that support a wide variety of computations, there are computations that benefit from using special-purpose storage systems and others that can exploit data-parallel platforms.
- 10:30 AM, 17 Feb 2012
Sebastiaan J. van Schaik and Oege de Moor. A memory efficient reachability data structure through bit vector compression. In SIGMOD’2011.
Abstract. When answering many reachability queries on a large graph, the principal challenge is to represent the transitive closure of the graph compactly, while still allowing fast membership tests on that transitive closure. Recent attempts to address this problem are complex data structures and algorithms such as Path-Tree and 3-HOP. We propose a simple alternative based on a novel form of bit-vector compression. Our starting point is the observation that when computing the transitive closure, reachable vertices tend to cluster together. We adapt the well-known scheme of word-aligned hybrid compression (WAH) to work more efficiently by introducing word partitions. We prove that the resulting scheme leads to a more compact data structure than its closest competitor, namely interval lists. In extensive and detailed experiments, this is confirmed in practice. We also demonstrate that the new technique can handle much larger graphs than alternative algorithms.
- 10:30 AM, 27 Jan 2012
Zhao et al. Efficient Shortest Paths on Massive Social Graphs. In CollaborateCom’2011.
Abstract. Analysis of large networks is a critical component of many of today’s application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first-search (BFS).
In this paper, we study ways to enable scalable graph processing for today’s massive networks. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs up to 43 million nodes. Finally, we show that Rigel’s functionality can be easily extended to locate (near-) shortest paths between node pairs. After a one-time preprocessing cost, Rigel answers node-distance queries in 10’s of microseconds, and also produces shortest path results up to 18 times faster than prior shortest-path systems with similar levels of accuracy.
2011
- 10:30 AM, 16 Dec 2011
Sewall et al. PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors. In VLDB’2011.
Abstract. Concurrency control on B+ trees is primarily achieved with latches, but serialization and contention can hinder scalability. As core counts on current processors increase, it is imperative to develop scalable latch-free techniques for concurrency control. We present PALM, a novel technique for performing multiple concurrent queries on in-memory B+ trees. PALM is based on the Bulk Synchronous Parallel model, which guarantees freedom from deadlocks and race conditions. Input queries are grouped and processed in atomic batches, and work proceeds in stages that preclude contention. Transitions between stages are accomplished with scalable point-to-point communication. PALM exploits data and thread-level parallelism on modern many-core architectures, and performs 40M1 updates/second on trees with 128M keys, and 128M updates/second on trees with 512K keys on the latest CPU architectures. Our throughput is 2.3X–19X that of state-of-theart concurrent update algorithms on in-memory B+ trees. PALM obtains close to peak throughput at very low response times of less than 350s, even for large trees. We also evaluate PALM on the Intel Many Integrated Core (Intel MIC) architecture, and demonstrate a speedup of 1.5–2.1X for out-of-cache tree sizes on an Intel Knights Ferry over a pair of Intel Xeon processors DP X5680 (Westmere-EP) in a dual-socket configuration
- 10:30 AM, 2 Dec 2011
Gupte et al. Finding hierarchy in directed online social networks. In WWW’2011.
Abstract. Social hierarchy and stratification among humans is a well studied concept in sociology. The popularity of online social networks presents an opportunity to study social hierarchy for different types of networks and at different scales. We adopt the premise that people form connections in a social network based on their perceived social hierarchy; as a result, the edge directions in directed social networks can be leveraged to infer hierarchy. In this paper, we define a measure of hierarchy in a directed online social network, and present an efficient algorithm to compute this measure. We validate our measure using ground truth including Wikipedia notability score. We use this measure to study hierarchy in several directed online social networks including Twitter, Delicious, YouTube, Flickr, LiveJournal, and curated lists of several categories of people based on different occupations, and different organizations. Our experiments on different online social networks show how hierarchy emerges as we increase the size of the network. This is in contrast to random graphs, where the hierarchy decreases as the network size increases. Further, we show that the degree of stratification in a network increases very slowly as we increase the size of the graph.
- 10:30 AM, 18 Nov 2011
Abou-Rjeili and Karypis. Multilevel Algorithms for Partitioning Power-Law Graphs. In IPDPS’2006.
Abstract. Graph partitioning is an enabling technology for parallel processing as it allows for the effective decomposition of unstructured computations whose data dependencies correspond to a large sparse and irregular graph. Even though the problem of computing high-quality partitionings of graphs arising in scientific computations is to a large extent well understood, this is far from being true for emerging HPC applications whose underlying computation involves graphs whose degree distribution follows a power-law curve. This paper presents new multilevel graph partitioning algorithms that are specifically designed for partitioning such graphs. It presents new clustering-based coarsening schemes that identify and collapse together groups of vertices that are highly connected. An experimental evaluation of these schemes on 10 different graphs show that the proposed algorithms consistently and significantly outperform existing state-of-the-art approaches.
- 10:30 AM, 4 Nov 2011
Broder et al. Efficiently evaluating graph constraints in content-based publish/subscribe. In WWW ’11.
Abstract. We introduce the problem of evaluating graph constraints in content-based publish/subscribe (pub/sub) systems. This problem formulation extends traditional content-based pub/sub systems in the following manner: publishers and subscribers are connected via a (logical) directed graph G with node and edge constraints, which limits the set of valid paths between them. Such graph constraints can be used to model a Web advertising exchange (where there may be restrictions on how advertising networks can connect advertisers and publishers) and content delivery problems in social networks (where there may be restrictions on how information can be shared via the social graph). In this context, we develop efficient algorithms for evaluating graph constraints over arbitrary directed graphs G. We also present experimental results that demonstrate the effectiveness and scalability of the proposed algorithms using a realistic dataset from Yahoo!’s Web advertising exchange.
- 10:30 AM, 21 Oct 2011
Aydin Buluc, Kamesh Madduri. Parallel Breadth-First Search on Distributed Memory Systems. In Supercomputing’2011.
Abstract. Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
- 10:30 AM, 7 Oct 2011
Papgelis et al. Individual behavior and social influence in online social systems. In HT’2011.
Abstract. The capacity to collect and analyze the actions of individuals in online social systems at minute-by-minute time granularity offers new perspectives on collective human behavior research. Macroscopic analysis of massive datasets raises interesting observations of patterns in online social processes. But working at a large scale has its own limitations, since it typically doesn’t allow for interpretations on a microscopic level. We examine how different types of individual behavior affect the decisions of friends in a network. We begin with the problem of detecting social influence in a social system. Then we investigate the causality between individual behavior and social influence by observing the diffusion of an innovation among social peers. Are more active users more influential? Are more credible users more influential? Bridging this gap and finding points where the macroscopic and microscopic worlds converge contributes to better interpretations of the mechanisms of spreading of ideas and behaviors in networks and offer design opportunities for online social systems.
- 10:30 AM, 23 Sep 2011 — first meeting
Boldi et al. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In WWW’2011.
Abstract. We continue the line of research on graph compression started with WebGraph, but we move our focus to the compression of social networks in a proper sense (e.g., LiveJournal): the approaches that have been used for a long time to compress web graphs rely on a specific ordering of the nodes (lexicographical URL ordering) whose extension to general social networks is not trivial. In this paper, we propose a solution that mixes clusterings and orders, and devise a new algorithm, called Layered Label Propagation, that builds on previous work on scalable clustering and can be used to reorder very large graphs (billions of nodes). Our implementation uses task decomposition to perform aggressively on multi-core architecture, making it possible to reorder graphs of more than 600 millions nodes in a few hours.
Experiments performed on a wide array of web graphs and social networks show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse in main memory significantly larger graphs.