Interacting with Large Distributed Datasets Using Sketch

It is my pleasure to announce that my recent collaboration with Microsoft Research Silicon Valley on interacting with big data has resulted in a paper to be published in the proceedings of the Eurographics Symposium on Parallel Graphics and Visualization (EGPGV’16), which will be held in Groningen, the Netherlands, on June 4-6, 2016.

In this work, we present Sketch, a library and distributed runtime for building interactive tools that explore large datasets distributed across multiple machines. We have built several sophisticated applications using this framework; in this paper we describe two of them: a billion-row spreadsheet and a distributed-systems performance analyzer. Sketch applications allow interactive and responsive exploration of complex distributed datasets, scaling effectively to take advantage of large computational resources.


Thwarting Fake OSN Accounts by Predicting their Victims

Our recent work on fighting automated fake accounts by predicting their victims has been accepted for publication at the 8th ACM Workshop on Artificial Intelligence and Security (AI-Sec’15), which is co-located with the 22nd ACM Conference on Computer and Communications Security (CCS) in Denver, Colorado, USA.

In this work, we start with the observation that traditional defense mechanisms against automated fake accounts in online social networks are victim-agnostic. Even though the victims of fake accounts play an important role in the viability of subsequent attacks, no prior work uses this insight to improve the status quo. We take the first step and propose incorporating predictions about the victims of unknown fakes into the workflows of existing defense mechanisms. In particular, we investigate how such an integration could lead to more robust fake account defense mechanisms. We also use real-world datasets from Facebook and Tuenti to evaluate the feasibility of predicting victims of fake accounts using supervised machine learning.
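
To give a rough feel for what "predicting victims using supervised machine learning" can look like, here is a minimal sketch in Python. It is not the classifier, the feature set, or the data from the paper: the two features (a normalized friend count and a normalized account age), the synthetic labeling rule, and all function names are assumptions made up for illustration. It trains a plain logistic regression on synthetic accounts and returns a per-account victim probability, the kind of score that a defense mechanism could take as an extra input.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_accounts(n, seed=0):
    """Generate hypothetical accounts with two features in [0, 1].

    The labeling rule is arbitrary and exists only so the example has
    something to learn; it is not a finding from any real dataset.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        friends = rng.random()       # hypothetical: normalized friend count
        account_age = rng.random()   # hypothetical: normalized account age
        p_victim = sigmoid(6.0 * (friends - 0.5))
        rows.append([friends, account_age])
        labels.append(1 if rng.random() < p_victim else 0)
    return rows, labels


def train_logistic(rows, labels, epochs=300, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def victim_probability(weights, bias, x):
    """Predicted probability that the account with features x is a victim."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


if __name__ == "__main__":
    rows, labels = make_synthetic_accounts(400)
    weights, bias = train_logistic(rows, labels)
    print("weights:", weights, "bias:", bias)
    print("score for [0.9, 0.5]:", victim_probability(weights, bias, [0.9, 0.5]))
```

The output of `victim_probability` is a score rather than a hard decision, so it can be passed along to an existing defense mechanism instead of replacing it. How best to combine the two is the question the paper investigates; this sketch does not attempt to answer it.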