Efficient sharing of opaque data files among the nodes of a distributed computing system is not a solved problem. Connecting geographically distributed computers across proxies, firewalls and NAT systems is still a challenge, and most current technologies do not fully address it.

Overview

In a Grid, geographically distributed machines are constantly entering and leaving short-term federations. These groupings may last anywhere from seconds to weeks, or even longer, but they are never permanent. Although SOAP and XML can be used to transmit a wide range of simple and complex data, there are cases in which converting data into an XML format is not desirable, or not even reasonable. This is the case, for example, with multimedia data or massive binary streams.

In this scenario, sharing opaque data among the members of the group can become a hard problem: data must pass through firewalls, proxies, NAT systems and the like; internet connections are generally slower than intranet ones; most file transfer protocols lack the level of security required in a grid; and opaque data sets tend to be on the order of gigabytes, and growing.

Current approaches focus on creating transfer protocols for the efficient sharing of individual (and independent) resources. These resources would be addressable through a standard mechanism such as a URL, or a similar one.

TVFS represents a different, higher-level approach. Through this framework, federated machines would be able to access a shared virtual file system. In this file system, files can not only be transferred among nodes, but also read or written at arbitrary offsets. Moreover, other operations such as getting a directory listing, obtaining file metadata, or locking a file for writing could be performed. This means data will not be referenced through generic URLs, but through file paths.
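To make the operations described above more concrete, the following is a minimal sketch of what path-based access could look like from the programmer's side. It is only an in-memory stand-in: the class `InMemoryVFS` and its methods (`write_at`, `read_at`, `listdir`, `stat`, `write_lock`) are hypothetical names chosen for illustration, not the TVFS API, and nothing here is distributed.

```python
import posixpath
import threading
from contextlib import contextmanager


class InMemoryVFS:
    """Hypothetical stand-in for a shared virtual file system (not the TVFS API)."""

    def __init__(self):
        self._files = {}   # path -> bytearray holding the file content
        self._locks = {}   # path -> threading.Lock guarding writers

    def write_at(self, path, offset, data):
        """Random write: place data at offset, zero-filling any gap."""
        buf = self._files.setdefault(path, bytearray())
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data

    def read_at(self, path, offset, size):
        """Random read: return up to size bytes starting at offset."""
        return bytes(self._files[path][offset:offset + size])

    def listdir(self, directory):
        """Names of the files stored directly under directory."""
        directory = directory.rstrip("/") or "/"
        return sorted(posixpath.basename(p) for p in self._files
                      if posixpath.dirname(p) == directory)

    def stat(self, path):
        """Minimal metadata: only the size in bytes."""
        return {"size": len(self._files[path])}

    @contextmanager
    def write_lock(self, path):
        """Hold an exclusive write lock on path for the duration of a block."""
        lock = self._locks.setdefault(path, threading.Lock())
        with lock:
            yield


# Usage: data is addressed by file path, not by URL.
vfs = InMemoryVFS()
with vfs.write_lock("/data/video.bin"):
    vfs.write_at("/data/video.bin", 0, b"hello world")
    vfs.write_at("/data/video.bin", 6, b"GRID!")   # overwrite in place
print(vfs.read_at("/data/video.bin", 6, 4))
print(vfs.listdir("/data"), vfs.stat("/data/video.bin"))
```

In a real federation the same calls would presumably trigger network transfers; the point of the sketch is only that the caller sees paths, offsets and locks rather than transfer protocols.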

Actual transfers would be done behind the scenes, isolating the programmer from low-level details. These details include decisions on whether to use direct or P2P BitTorrent-like downloads, data caching policies, whether to join the federation as a passive or active node, and mirroring strategies, among others.
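As an illustration of the kind of decision that would be hidden from the programmer, here is a small sketch of a transfer-strategy selector. Everything in it is an assumption made for the example: the names `Strategy`, `TransferRequest` and `choose_strategy`, the inputs considered, and the 64 MiB threshold are hypothetical, and the project text does not specify how such a choice would actually be made.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    DIRECT = "direct"   # fetch the whole file from a single node
    SWARM = "swarm"     # fetch pieces from several nodes, BitTorrent-style


@dataclass
class TransferRequest:
    size_bytes: int     # size of the requested file
    holders: int        # number of federation nodes that hold a copy


# Illustrative threshold only; the source does not define one.
SWARM_MIN_BYTES = 64 * 1024 * 1024


def choose_strategy(req: TransferRequest) -> Strategy:
    """Pick swarm downloads only for large files with several holders."""
    if req.holders >= 2 and req.size_bytes >= SWARM_MIN_BYTES:
        return Strategy.SWARM
    return Strategy.DIRECT


# Usage: the caller would simply open a path; a policy like this runs underneath.
print(choose_strategy(TransferRequest(size_bytes=10**9, holders=3)))
print(choose_strategy(TransferRequest(size_bytes=10**9, holders=1)))
```

A real policy would likely also weigh caching, mirroring and node role (passive or active), which this sketch leaves out.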

Project Goals

The ultimate goal of this project is to develop a framework for data sharing in distributed applications, and specifically, in grid federations. This framework must include a reference implementation, and all the necessary documentation to allow compatible third-party implementations.

As this is an ambitious goal, a divide-and-conquer approach will be used. The project will be split into several areas, possibly sub-projects in the future, each with its own goals.