Kumseok Jung, Julien Gascon-Samson, Sathish Gopalakrishnan, and Karthik Pattabiraman, IEEE Transactions on Parallel and Distributed Systems (TPDS). [ PDF ]
This paper is an extended version of the following conference paper.
Abstract: Application developers often need to employ a combination of software such as communication middleware and cloud-based services to deal with the challenges of heterogeneity and network dynamism in the edge-to-cloud continuum. Consequently, developers write extra glue code peripheral to the application’s core business logic, to provide interoperability between interacting software frameworks. Each software framework comes with its own framework-specific API, and as technology evolves, the developer must keep up with the changing APIs by updating the glue code in their application. Thus, framework-specific APIs hinder interoperability and cause technology fragmentation.
We propose a design of a middleware-based distributed operating system (OS) called OneOS to realize a computing paradigm that alleviates such interoperability challenges. OneOS provides a single system image of the distributed computing platform, and transparently provides interoperability between software components through the standard POSIX API. Using OneOS’s domain-specific language, users can compose complex distributed applications from legacy POSIX programs. OneOS tolerates failures by adopting a distributed checkpoint-restore algorithm. We evaluate the performance of OneOS against an open-source IoT platform, ThingsJS, using an IoT stream processing benchmark suite and a video processing application. OneOS executes the programs about 3x faster than ThingsJS, reduces the code size by about 22%, and recovers the state of failed applications within 1 second upon detecting their failure.