Communication Module Proposal

From Giews


This is a proposal for a new GIEWS communication module. The solution I suggest is a hybrid P2P and web-services system, because we need to share and transfer resources that are either very small (e.g. datasets) or huge (e.g. satellite images).


Small Files

The small files we need to transfer are datasets and .torrent files. A .torrent is a file that describes a huge file and how to obtain it; it is discussed below. For these files we can use a web-service architecture over HTTPS combined with encrypted files. We mainly need just two kinds of service:

  • Find resources: this service retrieves a list of the resources available on the network, mainly datasets and .torrent files.
  • Get resource: this service is in charge of downloading the selected encrypted resources via HTTPS.
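The two services can be pictured as a small contract. The sketch below is only illustrative: the names ResourceService, InMemoryResourceService, publish and decrypt are hypothetical, there is no real HTTPS endpoint (an in-memory map stands in for the server), and AES-GCM with the IV prepended to the ciphertext is an assumed choice for the "encrypted files" layer that sits on top of HTTPS.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical contract for the two small-file services.
interface ResourceService {
    List<String> findResources();     // "Find resources": names of datasets and .torrent files
    byte[] getResource(String name);  // "Get resource": the selected resource, encrypted
}

// In-memory stand-in for the HTTPS service (no networking): stores plaintext and
// serves AES-GCM ciphertext laid out as IV || ciphertext+tag (an assumed format).
final class InMemoryResourceService implements ResourceService {
    private static final int IV_BYTES = 12;   // common GCM nonce length
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private final Map<String, byte[]> store = new LinkedHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final SecretKey key;

    InMemoryResourceService(SecretKey key) { this.key = key; }

    void publish(String name, byte[] content) { store.put(name, content.clone()); }

    @Override
    public List<String> findResources() { return new ArrayList<>(store.keySet()); }

    @Override
    public byte[] getResource(String name) {
        byte[] plain = store.get(name);
        if (plain == null) throw new IllegalArgumentException("unknown resource: " + name);
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);  // fresh nonce for every encryption
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plain);
            byte[] message = Arrays.copyOf(iv, IV_BYTES + sealed.length);
            System.arraycopy(sealed, 0, message, IV_BYTES, sealed.length);
            return message;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client side: split off the IV and decrypt; fails if the message was altered.
    static byte[] decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

How the key is shared between server and Workstation is outside the scope of this sketch.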

Huge Files

These could be satellite images or large compressed data archives. We should use a P2P architecture for these files, based on BitTorrent and DHT technologies.

Split the resource

To optimize a P2P network, the shared resource must be split into small parts of the same size (only the last part may be shorter). A SHA1 digest must be calculated for every part; this prevents the end user from rebuilding the original file out of corrupted parts. A SHA1 digest is a 160-bit value calculated from a file's content: if even one byte of the file changes, its SHA1 digest changes too, so a corrupted part is detected by recomputing its digest and comparing it with the expected one. Once a file has been divided, its parts can be obtained from multiple sources, which optimizes download speed and network bandwidth.
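A minimal sketch of the splitting step, using the standard java.security.MessageDigest API. The class name PieceSplitter and the hex string representation of digests are illustrative choices, not part of the proposal.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PieceSplitter {
    private PieceSplitter() {}

    // SHA1 of a byte array as a 40-character lowercase hex string (160 bits).
    static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Splits data into parts of partSize bytes; only the last part may be shorter.
    static List<byte[]> split(byte[] data, int partSize) {
        if (partSize <= 0) throw new IllegalArgumentException("partSize must be positive");
        List<byte[]> parts = new ArrayList<>();
        for (int from = 0; from < data.length; from += partSize) {
            parts.add(Arrays.copyOfRange(data, from, Math.min(from + partSize, data.length)));
        }
        return parts;
    }

    // One digest per part, in part order (these go into the .torrent file).
    static List<String> partDigests(List<byte[]> parts) {
        List<String> digests = new ArrayList<>();
        for (byte[] part : parts) digests.add(sha1Hex(part));
        return digests;
    }
}
```

For example, a 10-byte file split with partSize = 4 yields parts of 4, 4 and 2 bytes, each with its own digest.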

The .torrent file

Any kind of file (jpg, mpeg, zip) can be described by a .torrent file so that it can be shared on a P2P network. A .torrent is just a metadata file (possibly written in bencode) which contains:

  • Original file name
  • File size
  • SHA1 digest of the whole file
  • Number of parts
  • SHA1 digest of each part
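A sketch of how these fields could be serialized with bencode (byte strings as `<length>:<bytes>`, integers as `i<n>e`, lists as `l...e`, dictionaries as `d...e` with keys in sorted order). The dictionary keys used in torrentMetadata ("name", "length", "sha1", "parts", "part sha1") are hypothetical names mirroring the list above; they are not the keys of the standard BitTorrent "info" dictionary.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Bencode {
    private Bencode() {}

    static String encode(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, out);
        return out.toString();
    }

    private static void write(Object value, StringBuilder out) {
        if (value instanceof String) {
            String s = (String) value;
            // The length prefix counts UTF-8 bytes, not Java chars.
            out.append(s.getBytes(StandardCharsets.UTF_8).length).append(':').append(s);
        } else if (value instanceof Integer || value instanceof Long) {
            out.append('i').append(((Number) value).longValue()).append('e');
        } else if (value instanceof List) {
            out.append('l');
            for (Object item : (List<?>) value) write(item, out);
            out.append('e');
        } else if (value instanceof Map) {
            @SuppressWarnings("unchecked")
            Map<String, ?> map = (Map<String, ?>) value;
            out.append('d');
            // Keys must be sorted; TreeMap order equals byte order for ASCII keys.
            for (Map.Entry<String, Object> e : new TreeMap<String, Object>(map).entrySet()) {
                write(e.getKey(), out);
                write(e.getValue(), out);
            }
            out.append('e');
        } else {
            throw new IllegalArgumentException("cannot bencode " + value);
        }
    }

    // Hypothetical .torrent metadata built from the fields listed above.
    static String torrentMetadata(String name, long length, String fileSha1, List<String> partSha1s) {
        Map<String, Object> meta = new TreeMap<>();
        meta.put("name", name);
        meta.put("length", length);
        meta.put("sha1", fileSha1);
        meta.put("parts", (long) partSha1s.size());
        meta.put("part sha1", partSha1s);
        return encode(meta);
    }
}
```

For instance, a dictionary with name "a.jpg" and length 42 encodes as `d6:lengthi42e4:name5:a.jpge`.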

To obtain a huge resource, a user only needs to download its .torrent file; the communication module takes care of downloading each part described in the .torrent from multiple sources and recombining the parts into the original file. Once the resource has been rebuilt, the system calculates its SHA1 digest and compares it with the one in the .torrent: if the two digests are equal, the transferred resource is not corrupted.
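A minimal sketch of the rebuild-and-verify step. It assumes, for illustration only, that downloaded parts are kept in a map keyed by part index, since parts fetched from multiple sources can arrive in any order; the class name Reassembler is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

final class Reassembler {
    private Reassembler() {}

    static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Concatenates parts 0..partCount-1 and checks the whole-file digest from the .torrent.
    static byte[] rebuild(Map<Integer, byte[]> partsByIndex, int partCount, String expectedFileSha1) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < partCount; i++) {
            byte[] part = partsByIndex.get(i);
            if (part == null) throw new IllegalStateException("missing part " + i);
            out.write(part, 0, part.length);
        }
        byte[] file = out.toByteArray();
        if (!sha1Hex(file).equals(expectedFileSha1)) {
            throw new IllegalStateException("file digest mismatch: resource is corrupted");
        }
        return file;
    }
}
```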

DHT - Distributed Hash Table

To implement a completely decentralized P2P network we need a DHT, or something similar. The DHT maps every resource and file part shared on the network to the peers that hold it. The DHT should be a list of strings, each formatted like this:

SHA1 | GAUL_CODE | IP:PORT
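One way to handle such rows is sketched below (Java 16+ for the record). DhtEntry, DhtTable and ownersOf are hypothetical names; the parser assumes an IPv4-style "IP:PORT" and treats GAUL_CODE as an opaque string.

```java
import java.util.ArrayList;
import java.util.List;

// One DHT row in the proposed "SHA1 | GAUL_CODE | IP:PORT" format.
record DhtEntry(String sha1, String gaulCode, String host, int port) {

    static DhtEntry parse(String line) {
        String[] fields = line.split("\\|");
        if (fields.length != 3) throw new IllegalArgumentException("expected 3 fields: " + line);
        String address = fields[2].trim();
        int colon = address.lastIndexOf(':');  // IPv4-style "IP:PORT" assumed
        if (colon < 0) throw new IllegalArgumentException("expected IP:PORT: " + address);
        return new DhtEntry(fields[0].trim(), fields[1].trim(),
                address.substring(0, colon), Integer.parseInt(address.substring(colon + 1)));
    }

    String format() { return sha1 + " | " + gaulCode + " | " + host + ":" + port; }
}

// This peer's copy of the DHT, kept as a list of strings as proposed.
final class DhtTable {
    private final List<String> rows = new ArrayList<>();

    void add(DhtEntry entry) {
        String row = entry.format();
        if (!rows.contains(row)) rows.add(row);  // ignore duplicate announcements
    }

    // All peers ("IP:PORT") holding the file or part with the given digest.
    List<String> ownersOf(String sha1) {
        List<String> owners = new ArrayList<>();
        for (String row : rows) {
            DhtEntry e = DhtEntry.parse(row);
            if (e.sha1().equals(sha1)) owners.add(e.host() + ":" + e.port());
        }
        return owners;
    }
}
```

A linear scan keeps the sketch close to the "list of strings" idea; a real table would more likely index rows by SHA1 in a map.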

Every peer in the network holds the same DHT. When a peer shares a new file, creates a new file part, or obtains a new file part, it has to announce this event so that the DHT held by the other peers is updated. The most efficient approach is for a peer to update only the peers that are "close" to it in the network; these peers then update their own "neighbours", and so on.
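The neighbour-to-neighbour update can be sketched as a simple flood with duplicate suppression. This is an in-memory simulation under stated assumptions: GossipPeer, announce and receive are hypothetical names, and direct method calls stand in for network messages.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class GossipPeer {
    final String address;
    private final Set<String> dht = new LinkedHashSet<>();  // this peer's copy of the DHT rows
    private final List<GossipPeer> neighbours = new ArrayList<>();

    GossipPeer(String address) { this.address = address; }

    void connect(GossipPeer other) {
        neighbours.add(other);
        other.neighbours.add(this);
    }

    // Called when this peer shares a file or creates/obtains a part.
    void announce(String entry) { receive(entry, null); }

    // Apply the update, then forward it to every neighbour except the sender.
    // Already-known entries are dropped, so the flood stops even when the topology has cycles.
    void receive(String entry, GossipPeer from) {
        if (!dht.add(entry)) return;
        for (GossipPeer n : neighbours) {
            if (n != from) n.receive(entry, this);
        }
    }

    boolean knows(String entry) { return dht.contains(entry); }
}
```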

Example: How to Download a "Huge" Resource

  • The user retrieves a list of available resources, e.g. through the Resource Explorer in the Workstation. An HTTPS web service is used for this.
  • The user selects the resources to download, and the system downloads the corresponding .torrent files through another HTTPS web service.
  • Once a .torrent file has been downloaded, the communication module reads it and looks up each part's SHA1 digest in the DHT to find an owner for that part.
  • Every time a part is downloaded, the communication module recalculates its SHA1 digest and compares it with the one in the .torrent, so that corrupted parts are rejected. The same check is done at the end on the entire file.
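The last two steps can be sketched for a single part as follows. The fetch function is a hypothetical stand-in for the real transport (it returns null when an owner is unreachable), and trying the next owner after a digest mismatch is an assumed policy rather than something the proposal specifies.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

final class PartDownloader {
    private final Map<String, List<String>> dht;             // part SHA1 -> owners ("IP:PORT")
    private final BiFunction<String, String, byte[]> fetch;  // hypothetical: (owner, sha1) -> bytes or null

    PartDownloader(Map<String, List<String>> dht, BiFunction<String, String, byte[]> fetch) {
        this.dht = dht;
        this.fetch = fetch;
    }

    // Try each owner in turn until one returns bytes whose SHA1 matches the .torrent entry.
    Optional<byte[]> download(String partSha1) {
        for (String owner : dht.getOrDefault(partSha1, List.of())) {
            byte[] data = fetch.apply(owner, partSha1);
            if (data != null && sha1Hex(data).equals(partSha1)) {
                return Optional.of(data);
            }
            // null (owner unreachable) or digest mismatch (corrupted part): try the next owner
        }
        return Optional.empty();
    }

    static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```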

Forced Sharing

To optimize network performance, every time a peer finishes downloading a file part, it sends that part to its "neighbours". This way more sources become available for the same resource; the DHT must be updated accordingly, and network performance improves in terms of bandwidth and download speed.
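A sketch of forced sharing, under a simplifying assumption: a single shared map stands in for the replicated DHT (part SHA1 to source addresses), so every new copy immediately shows up as an extra source. SharingPeer, onPartDownloaded and holds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SharingPeer {
    final String address;
    private final Map<String, byte[]> storage = new HashMap<>();  // part SHA1 -> bytes held locally
    private final List<SharingPeer> neighbours = new ArrayList<>();
    private final Map<String, Set<String>> dht;  // simplified shared view: part SHA1 -> sources

    SharingPeer(String address, Map<String, Set<String>> dht) {
        this.address = address;
        this.dht = dht;
    }

    void addNeighbour(SharingPeer other) { neighbours.add(other); }

    boolean holds(String partSha1) { return storage.containsKey(partSha1); }

    // Keep a (verified) part and register this peer as a source for it.
    private void store(String partSha1, byte[] data) {
        storage.put(partSha1, data);
        dht.computeIfAbsent(partSha1, k -> new TreeSet<>()).add(address);
    }

    // Forced sharing: after finishing a part, push a copy to every neighbour that lacks it.
    void onPartDownloaded(String partSha1, byte[] data) {
        store(partSha1, data);
        for (SharingPeer n : neighbours) {
            if (!n.holds(partSha1)) n.store(partSha1, data.clone());
        }
    }
}
```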

Issue 1: Network Topology

This is the main problem of a completely decentralized system: there are no central servers that track all the peers in the network. One solution could be to maintain a second DHT containing the IP:PORT of each peer. Every time a peer joins or leaves the network, this DHT must be updated and propagated to all peers.
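Because join and leave events propagate hop by hop, an old event can arrive after a newer one. One possible way to handle this, assumed here and not part of the proposal, is for each peer to number its own join/leave events with an increasing counter and for receivers to ignore anything not newer than what they already have. Membership, apply and livePeers are hypothetical names (Java 16+ for the nested record).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class Membership {
    // Latest known state of a peer: version of its last event and whether it is in the network.
    private record State(long version, boolean alive) {}

    private final Map<String, State> peers = new HashMap<>();  // IP:PORT -> latest state

    // Applies a join (joined = true) or leave event. Returns true if local state changed,
    // i.e. the event is new and should be forwarded to neighbours.
    boolean apply(String address, long version, boolean joined) {
        State current = peers.get(address);
        if (current != null && current.version() >= version) return false;  // stale or duplicate
        peers.put(address, new State(version, joined));
        return true;
    }

    Set<String> livePeers() {
        Set<String> live = new TreeSet<>();
        peers.forEach((address, state) -> {
            if (state.alive()) live.add(address);
        });
        return live;
    }
}
```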

Issue 2: How to Implement the DHT?

The DHT is the core element for keeping track of peers and resources, so how should it be implemented? One solution is a Java class containing a collection of DHT entries, such as an ArrayList<String>. Every time a peer leaves the network, it saves this ArrayList to a file, such as CSV or XML. Every time a peer wants to join the network, it rebuilds the ArrayList from its previously saved CSV or XML file and then looks for updates on the network.
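A minimal sketch of the save/load cycle using java.nio.file.Files. It stores one "SHA1 | GAUL_CODE | IP:PORT" entry per line (effectively a single-column CSV); DhtStore and merge are hypothetical names, and merging updates by dropping duplicates is an assumed policy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class DhtStore {
    private DhtStore() {}

    // On leaving: write one DHT entry per line.
    static void save(List<String> entries, Path file) {
        try {
            Files.write(file, entries, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On joining: rebuild the ArrayList from the saved file (empty if none exists yet).
    static ArrayList<String> load(Path file) {
        if (!Files.exists(file)) return new ArrayList<>();
        try {
            return new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Merge updates fetched from the network, dropping duplicates and keeping first-seen order.
    static ArrayList<String> merge(List<String> local, List<String> updates) {
        LinkedHashSet<String> all = new LinkedHashSet<>(local);
        all.addAll(updates);
        return new ArrayList<>(all);
    }
}
```

An XML file would work equally well; a line-per-entry file is simply the least code.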
