Kademlia Documentation

Note

This library assumes you have a working familiarity with asyncio.

This library is an asynchronous Python implementation of the Kademlia distributed hash table. It uses asyncio to provide asynchronous communication. The nodes communicate using RPC over UDP to communiate, meaning that it is capable of working behind a NAT.

This library aims to be as close to a reference implementation of the Kademlia paper as possible.

Installation

The easiest (and best) way to install kademlia is through pip:

$ pip install kademlia

Usage

To start a new network, create the first node. Future nodes will connect to this first node (and any other nodes you know about) to create the network.

import logging
import asyncio

from kademlia.network import Server

handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
log = logging.getLogger('kademlia')
log.addHandler(handler)
log.setLevel(logging.DEBUG)


loop = asyncio.get_event_loop()
loop.set_debug(True)

server = Server()
loop.run_until_complete(server.listen(8468))

try:
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    server.stop()
    loop.close()

Here’s an example of bootstrapping a new node against a known node and then setting a value:

import logging
import asyncio
import sys

from kademlia.network import Server

if len(sys.argv) != 5:
    print("Usage: python set.py <bootstrap node> <bootstrap port> <key> <value>")
    sys.exit(1)

handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
log = logging.getLogger('kademlia')
log.addHandler(handler)
log.setLevel(logging.DEBUG)

async def run():
    server = Server()
    await server.listen(8469)
    bootstrap_node = (sys.argv[1], int(sys.argv[2]))
    await server.bootstrap([bootstrap_node])
    await server.set(sys.argv[3], sys.argv[4])
    server.stop()

asyncio.run(run())

Note

You must have at least two nodes running to store values. If a node tries to store a value and there are no other nodes to provide redundancy, then it is an exception state.

Running Tests

To run tests:

$ pip install -r dev-requirements.txt
$ pytest

Fidelity to Original Paper

The current implementation should be an accurate implementation of all aspects of the paper except one - in Section 2.3 there is the requirement that the original publisher of a key/value republish it every 24 hours. This library does not do this (though you can easily do this manually).

Querying the DHT

If you just want to query the network, you can use the example query script. For instance:

$ python examples/get.py 1.2.3.4 8468 SpecialKey

The query script is simple:

import logging
import asyncio
import sys

from kademlia.network import Server

if len(sys.argv) != 4:
    print("Usage: python get.py <bootstrap node> <bootstrap port> <key>")
    sys.exit(1)

handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
log = logging.getLogger('kademlia')
log.addHandler(handler)
log.setLevel(logging.DEBUG)

async def run():
    server = Server()
    await server.listen(8469)
    bootstrap_node = (sys.argv[1], int(sys.argv[2]))
    await server.bootstrap([bootstrap_node])

    result = await server.get(sys.argv[3])
    print("Get result:", result)
    server.stop()

asyncio.run(run())

Check out the examples folder for other examples.

Kademlia API

Kademlia is a Python implementation of the Kademlia protocol which utilizes the asyncio library.

The best place to start is the examples folder before diving into the API.

kademlia package

The best place to start is the examples folder before diving into the API.

kademlia.crawling module

class kademlia.crawling.NodeSpiderCrawl(protocol, node, peers, ksize, alpha)[source]

Bases: kademlia.crawling.SpiderCrawl

Create a new C{SpiderCrawl}er.

Parameters
  • protocol – A KademliaProtocol instance.

  • node – A Node representing the key we’re looking for

  • peers – A list of Node instances that provide the entry point for the network

  • ksize – The value for k based on the paper

  • alpha – The value for alpha based on the paper

async find()[source]

Find the closest nodes.

class kademlia.crawling.RPCFindResponse(response)[source]

Bases: object

A wrapper for the result of a RPC find.

Parameters

response – This will be a tuple of (<response received>, <value>) where <value> will be a list of tuples if not found or a dictionary of {‘value’: v} where v is the value desired

get_node_list()[source]

Get the node list in the response. If there’s no value, this should be set.

get_value()[source]
happened()[source]

Did the other host actually respond?

has_value()[source]
class kademlia.crawling.SpiderCrawl(protocol, node, peers, ksize, alpha)[source]

Bases: object

Crawl the network and look for given 160-bit keys.

Create a new C{SpiderCrawl}er.

Parameters
  • protocol – A KademliaProtocol instance.

  • node – A Node representing the key we’re looking for

  • peers – A list of Node instances that provide the entry point for the network

  • ksize – The value for k based on the paper

  • alpha – The value for alpha based on the paper

class kademlia.crawling.ValueSpiderCrawl(protocol, node, peers, ksize, alpha)[source]

Bases: kademlia.crawling.SpiderCrawl

Create a new C{SpiderCrawl}er.

Parameters
  • protocol – A KademliaProtocol instance.

  • node – A Node representing the key we’re looking for

  • peers – A list of Node instances that provide the entry point for the network

  • ksize – The value for k based on the paper

  • alpha – The value for alpha based on the paper

async find()[source]

Find either the closest nodes or the value requested.

kademlia.network module

kademlia.node module

class kademlia.node.Node(node_id, ip=None, port=None)[source]

Bases: object

Simple object to encapsulate the concept of a Node (minimally an ID, but also possibly an IP and port if this represents a node on the network). This class should generally not be instantiated directly, as it is a low level construct mostly used by the router.

Create a Node instance.

Parameters
  • node_id (int) – A value between 0 and 2^160

  • ip (string) – Optional IP address where this Node lives

  • port (int) – Optional port for this Node (set when IP is set)

distance_to(node)[source]

Get the distance between this node and another.

same_home_as(node)[source]
class kademlia.node.NodeHeap(node, maxsize)[source]

Bases: object

A heap of nodes ordered by distance to a given node.

Constructor.

@param node: The node to measure all distnaces from. @param maxsize: The maximum size that this heap can grow to.

get_ids()[source]
get_node(node_id)[source]
get_uncontacted()[source]
have_contacted_all()[source]
mark_contacted(node)[source]
popleft()[source]
push(nodes)[source]

Push nodes onto heap.

@param nodes: This can be a single item or a C{list}.

remove(peers)[source]

Remove a list of peer ids from this heap. Note that while this heap retains a constant visible size (based on the iterator), it’s actual size may be quite a bit larger than what’s exposed. Therefore, removal of nodes may not change the visible size as previously added nodes suddenly become visible.

kademlia.protocol module

kademlia.routing module

class kademlia.routing.KBucket(rangeLower, rangeUpper, ksize, replacementNodeFactor=5)[source]

Bases: object

add_node(node)[source]

Add a C{Node} to the C{KBucket}. Return True if successful, False if the bucket is full.

If the bucket is full, keep track of node in a replacement list, per section 4.1 of the paper.

depth()[source]
get_nodes()[source]
has_in_range(node)[source]
head()[source]
is_new_node(node)[source]
remove_node(node)[source]
split()[source]
touch_last_updated()[source]
class kademlia.routing.RoutingTable(protocol, ksize, node)[source]

Bases: object

@param node: The node that represents this server. It won’t be added to the routing table, but will be needed later to determine which buckets to split or not.

add_contact(node)[source]
find_neighbors(node, k=None, exclude=None)[source]
flush()[source]
get_bucket_for(node)[source]

Get the index of the bucket that the given node would fall into.

is_new_node(node)[source]
lonely_buckets()[source]

Get all of the buckets that haven’t been updated in over an hour.

remove_contact(node)[source]
split_bucket(index)[source]
class kademlia.routing.TableTraverser(table, startNode)[source]

Bases: object

kademlia.storage module

class kademlia.storage.ForgetfulStorage(ttl=604800)[source]

Bases: kademlia.storage.IStorage

By default, max age is a week.

__getitem__(key)[source]

Get the given key. If item doesn’t exist, raises C{KeyError}

__setitem__(key, value)[source]

Set a key to the given value.

cull()[source]
get(key, default=None)[source]

Get given key. If not found, return default.

iter_older_than(seconds_old)[source]

Return the an iterator over (key, value) tuples for items older than the given secondsOld.

class kademlia.storage.IStorage[source]

Bases: abc.ABC

Local storage for this node. IStorage implementations of get must return the same type as put in by set

abstract __getitem__(key)[source]

Get the given key. If item doesn’t exist, raises C{KeyError}

abstract __setitem__(key, value)[source]

Set a key to the given value.

abstract get(key, default=None)[source]

Get given key. If not found, return default.

abstract iter_older_than(seconds_old)[source]

Return the an iterator over (key, value) tuples for items older than the given secondsOld.

kademlia.utils module

General catchall for functions that don’t make sense as methods.

kademlia.utils.bytes_to_bit_string(bites)[source]
kademlia.utils.digest(string)[source]
async kademlia.utils.gather_dict(dic)[source]
kademlia.utils.shared_prefix(args)[source]

Find the shared prefix between the strings.

For instance:

sharedPrefix([‘blahblah’, ‘blahwhat’])

returns ‘blah’.

Indices and tables