Migrated over discussion documents

This commit is contained in:
str4d
2012-12-11 03:09:07 +00:00
parent 2e4c4ed746
commit ceed5eb5bc
3 changed files with 7 additions and 7 deletions


@@ -0,0 +1,230 @@
{% extends "global/layout.html" %}
{% block title %}Naming discussion{% endblock %}
{% block content %}
<p>
NOTE: The following is a discussion of the reasons behind the I2P naming system,
common arguments and possible alternatives.
See <a href="naming.html">the naming page</a> for current documentation.
</p>
<h2>Discarded alternatives</h2>
<p>
Naming within I2P has been an oft-debated topic since the very beginning with
advocates across the spectrum of possibilities. However, given I2P's inherent
demand for secure communication and decentralized operation, the traditional
DNS-style naming system is clearly out, as are "majority rules" voting systems.
</p>
<p>
I2P does not promote the use of DNS-like services, as the damage done
by hijacking a site can be tremendous - and insecure destinations have no
value. DNSSEC itself still falls back on registrars and certificate authorities,
while with I2P, requests sent to a destination cannot be intercepted or the reply
spoofed, as they are encrypted to the destination's public keys, and a destination
itself is just a pair of public keys and a certificate. DNS-style systems on the
other hand allow any of the name servers on the lookup path to mount simple denial
of service and spoofing attacks. Adding on a certificate authenticating the
responses as signed by some centralized certificate authority would address many of
the hostile nameserver issues but would leave open replay attacks as well as
hostile certificate authority attacks.
</p>
<p>
Voting style naming is dangerous as well, especially given the effectiveness of
Sybil attacks in anonymous systems - the attacker can simply create an arbitrarily
high number of peers and "vote" with each to take over a given name. Proof-of-work
methods can be used to make identity non-free, but as the network grows the load
required to contact everyone to conduct online voting is implausible, or if the
full network is not queried, different sets of answers may be reachable.
</p>
<p>
As with the Internet however, I2P is keeping the design and operation of a
naming system out of the (IP-like) communication layer. The bundled naming library
includes a simple service provider interface which <a href="#alternatives">alternate naming systems</a> can
plug into, allowing end users to drive what sort of naming tradeoffs they prefer.
</p>
<h2>Discussion</h2>
<p>
See also <a href="https://zooko.com/distnames.html">Names: Decentralized, Secure, Human-Meaningful: Choose Two</a>.
</p>
<h3>Comments by jrandom</h3>
<p>(adapted from a post in the old Syndie, November 26, 2005)</p>
<p>
Q:
What do you do if some hosts do not agree on one address, or if some addresses
are working while others are not?
Who is the right source of a name?
</p><p>
A:
You don't. This is actually a critical difference between names on I2P and how
DNS works - names in I2P are human readable, secure, but <b>not globally
unique</b>. This is by design, and an inherent part of our need for security.
</p><p>
If I could somehow convince you to change the destination associated with some
name, I'd successfully "take over" the site, and under no circumstances is that
acceptable. Instead, what we do is make names <b>locally unique</b>: they are
what <i>you</i> use to call a site, just as how you can call things whatever
you want when you add them to your browser's bookmarks, or your IM client's
buddy list. Who you call "Boss" may be who someone else calls "Sally".
</p><p>
Names will not, ever, be securely human readable and globally unique.
</p>
<h3>Comments by zzz</h3>
<p>The following from zzz is a review of several common
complaints about I2P's naming system.
<ul>
<li>Inefficiency<br/>
The whole hosts.txt is downloaded (if it has changed, since eepget uses the etag and last-modified headers).
It's about 400K right now for almost 800 hosts.
<p>
True, but this isn't a lot of traffic in the context of i2p, which is itself wildly inefficient
(floodfill databases, huge encryption overhead and padding, garlic routing, etc.).
If you downloaded a hosts.txt file from someone every 12 hours it averages out to about 10 bytes/sec.
<p>
As is usually the case in i2p, there is a fundamental tradeoff here between anonymity and efficiency.
Some would say that using the etag and last-modified headers is hazardous because it exposes when you
last requested the data.
Others have suggested asking for specific keys only (similar to what jump services do, but
in a more automated fashion), possibly at a further cost in anonymity.
<p>
Possible improvements would be a replacement or supplement to addressbook (see <a href="http://i2host.i2p/">i2host.i2p</a>),
or something simple like subscribing to http://example.i2p/cgi-bin/recenthosts.cgi rather than http://example.i2p/hosts.txt.
If a hypothetical recenthosts.cgi distributed all hosts from the last 24 hours, for example,
that could be both more efficient and more anonymous than the current hosts.txt with last-modified and etag.
<p>
A sample implementation is on stats.i2p at
<a href="http://stats.i2p/cgi-bin/newhosts.txt">http://stats.i2p/cgi-bin/newhosts.txt</a>.
This script returns an Etag with a timestamp.
When a request comes in with the If-None-Match etag,
the script ONLY returns new hosts since that timestamp, or 304 Not Modified if there are none.
In this way, the script efficiently returns only the hosts the subscriber
does not know about, in an addressbook-compatible manner.
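<p>
A minimal sketch of that server-side filtering (illustrative only - the actual newhosts.txt
script is not Java, and all names below are made up):
<pre>
/** Illustrative sketch: return only the hosts added since the subscriber's Etag timestamp. */
class RecentHostsSketch {
    /**
     * @param names      host names known to the server
     * @param dests      the matching Base64 destinations, in the same order as names
     * @param addedTimes when each host was added, in milliseconds
     * @param etagTime   timestamp parsed from the If-None-Match Etag, or 0 if the header was absent
     * @return hosts.txt-format lines for hosts newer than the Etag, or null to signal 304 Not Modified
     */
    static String newHostsSince(String[] names, String[] dests, long[] addedTimes, long etagTime) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i != names.length; i++) {
            if (addedTimes[i] > etagTime)
                buf.append(names[i]).append('=').append(dests[i]).append('\n');
        }
        return buf.length() == 0 ? null : buf.toString();
    }
}
</pre>
The Etag sent back with each response would simply be the current timestamp, so the next
request only fetches hosts added after it.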
<p>
So the inefficiency is not a big issue and there are several ways to improve things without
radical change.
<li>Not Scalable<br/>
The 400K hosts.txt (with linear search) isn't that big at the moment and
we can probably grow by 10x or 100x before it's a problem.
<p>
As far as network traffic see above.
But unless you're going to do a slow real-time query over the network for
a key, you need to have the whole set of keys stored locally, at a cost of about 500 bytes per key.
<li>Requires configuration and "trust"<br/>
Out of the box, the addressbook is only subscribed to http://www.i2p2.i2p/hosts.txt, which is rarely updated,
leading to a poor new-user experience.
<p>
This is very much intentional. jrandom wants a user to "trust" a hosts.txt
provider, and as he likes to say, "trust is not a boolean".
The configuration step attempts to force users to think about issues of trust in an anonymous network.
<p>
As another example, the "Eepsite Unknown" error page in the HTTP Proxy
lists some jump services, but doesn't "recommend" any one in particular,
and it's up to the user to pick one (or not).
jrandom would say we trust the listed providers enough to list them but not enough to
automatically go fetch the key from them.
<p>
How successful this is, I'm not sure.
But there must be some sort of hierarchy of trust for the naming system.
To treat everyone equally may increase the risk of hijacking.
<li>It isn't DNS<br/>
Unfortunately real-time lookups over i2p would significantly slow down web browsing.
<p>
Also, DNS is based on lookups with limited caching and time-to-live, while i2p
keys are permanent.
<p>
Sure, we could make it work, but why? It's a bad fit.
<li>Not reliable<br/>
It depends on specific servers for addressbook subscriptions.
<p>
Yes it depends on a few servers that you have configured.
Within i2p, servers and services come and go.
Any other centralized system (for example DNS root servers) would
have the same problem. A completely decentralized system (everybody is authoritative)
is possible by implementing an "everybody is a root DNS server" solution, or by
something even simpler, like a script that adds everybody in your hosts.txt to your addressbook.
<p>
People advocating all-authoritative solutions generally haven't thought through
the issues of conflicts and hijacking, however.
<li>Awkward, not real-time<br/>
It's a patchwork of hosts.txt providers, key-add web form providers, jump service providers,
eepsite status reporters.
Jump servers and subscriptions are a pain, it should just work like DNS.
<p>
See the reliability and trust sections.
</p>
</ul>
<p>So, in summary, the current system is not horribly broken, inefficient, or un-scalable,
and proposals to "just use DNS" aren't well thought-through.
</p>
<h2 id="alternatives">Alternatives</h2>
<p>The I2P source contains several pluggable naming systems and supports configuration options
to enable experimentation with naming systems.
<ul>
<li><b>Meta</b> - calls two or more other naming systems in order.
By default, calls PetName then HostsTxt.
<li><b>PetName</b> - Looks up in a petnames.txt file.
The format for this file is NOT the same as hosts.txt.
<li><b>HostsTxt</b> - Looks up in the following files, in order:
<ol>
<li>privatehosts.txt
<li>userhosts.txt
<li>hosts.txt
</ol>
<li><b>AddressDB</b> - Each host is listed in a separate file in an addressDb/ directory.
<li>
<b>Eepget</b> - does an HTTP lookup request from an external
server - must be stacked after the HostsTxt lookup with Meta.
This could augment or replace the jump system.
Includes in-memory caching.
<li>
<b>Exec</b> - calls an external program for lookup, allows
additional experimentation in lookup schemes, independent of java.
Can be used after HostsTxt or as the sole naming system.
Includes in-memory caching.
<li><b>Dummy</b> - used as a fallback for Base64 names, otherwise fails.
</ul>
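<p>
The interface these implementations plug into is, roughly, a single lookup call.
A simplified sketch (not the actual I2P API - see the source directory referenced below):
<pre>
/** Simplified sketch of a pluggable naming service. */
abstract class NamingServiceSketch {
    /** @return the Base64 destination for this host name, or null if unknown */
    abstract String lookup(String hostname);

    /** Optional reverse mapping; a minimal implementation may simply return null. */
    String reverseLookup(String base64Destination) {
        return null;
    }
}
</pre>
A stacking implementation like Meta would call each configured service in order and
return the first answer it gets.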
<p>
The current naming system can be changed with the advanced config option 'i2p.naming.impl'
(restart required).
See core/java/src/net/i2p/client/naming for details.
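<p>
For example, to experiment with the Exec naming system, one might add something like the
following to the router configuration and restart (the exact class name is an assumption -
check the source directory above for the current implementations):
<pre>
i2p.naming.impl=net.i2p.client.naming.ExecNamingService
</pre>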
<p>
Any new system should be stacked with HostsTxt, or should
implement local storage and/or the addressbook subscription functions, since addressbook
only knows about the hosts.txt files and format.
<h2 id="certificates">Certificates</h2>
<p>
I2P destinations contain a certificate; however, at the moment that certificate
is always null.
With a null certificate, base64 destinations are always 516 bytes ending in "AAAA",
and this is checked in the addressbook merge mechanism, and possibly other places.
Also, there is no method available to generate a certificate or add it to a
destination. So these will have to be updated to implement certificates.
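A rough illustration of the existing null-certificate check (the helper below is hypothetical,
not the actual addressbook code):
<pre>
/** Hypothetical helper for the null-certificate check described above. */
class DestinationCheckSketch {
    /** Does this Base64 string look like a destination with a null certificate? */
    static boolean looksLikeNullCertDestination(String base64Dest) {
        if (base64Dest == null || base64Dest.length() != 516)
            return false;
        // a null certificate encodes to 516 Base64 characters ending in "AAAA"
        return base64Dest.endsWith("AAAA");
    }
}
</pre>
Any certificate support would have to relax this check and define how the extra bytes are handled.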
</p><p>
One possible use of certificates is for <a href="todo.html#hashcash">proof of work</a>.
</p><p>
Another is for "subdomains" (in quotes because there is really no such thing,
i2p uses a flat naming system) to be signed by the 2nd level domain's keys.
</p><p>
With any certificate implementation must come the method for verifying the
certificates.
Presumably this would happen in the addressbook merge code.
Is there a method for multiple types of certificates, or multiple certificates?
</p><p>
Adding on a certificate authenticating the
responses as signed by some centralized certificate authority would address many of
the hostile nameserver issues but would leave open replay attacks as well as
hostile certificate authority attacks.
</p>
{% endblock %}


@@ -0,0 +1,428 @@
{% extends "global/layout.html" %}
{% block title %}Network Database Discussion{% endblock %}
{% block content %}
<p>
NOTE: The following is a discussion of the history of netdb implementation and is not current information.
See <a href="{{ site_url('docs/how/networkdatabase') }}">the main netdb page</a> for current documentation.
<h2><a name="status">History</a></h2>
<p>
The netDb is distributed with a simple technique called "floodfill".
Long ago, the netDb also used the Kademlia DHT as a fallback algorithm. However,
it did not work well in our application, and it was completely disabled
in release 0.6.1.20.
<p>
(Adapted from a post by jrandom in the old Syndie, Nov. 26, 2005)
<br />
The floodfill netDb is really just a simple and perhaps temporary measure,
using the simplest possible algorithm - send the data to a peer in the
floodfill netDb, wait 10 seconds, pick a random peer in the netDb and ask them
for the entry to be sent, verifying its proper insertion / distribution. If the
verification peer doesn't reply, or they don't have the entry, the sender
repeats the process. When the peer in the floodfill netDb receives a netDb
store from a peer not in the floodfill netDb, they send it to all of the peers
in the floodfill netDb.
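A minimal sketch of that store-and-verify loop (all names below are placeholders, not the
actual router code):
<pre>
/** Placeholder view of a floodfill peer, for illustration only. */
interface FloodfillPeer {
    void store(String key, byte[] entry);   // send a netDb store; the peer re-floods it to the other floodfills
    byte[] lookup(String key);              // ask for an entry; null if the peer does not have it
}

class FloodfillStoreSketch {
    /** Store an entry, then verify it through a random floodfill peer; repeat until verified. */
    static void storeAndVerify(String key, byte[] entry, java.util.List peers) throws InterruptedException {
        java.util.Random rnd = new java.util.Random();
        while (true) {
            FloodfillPeer target = (FloodfillPeer) peers.get(rnd.nextInt(peers.size()));
            target.store(key, entry);
            Thread.sleep(10 * 1000);        // wait 10 seconds, as described above
            FloodfillPeer verifier = (FloodfillPeer) peers.get(rnd.nextInt(peers.size()));
            byte[] found = verifier.lookup(key);
            if (found != null)
                return;                     // the entry was properly distributed
            // no reply or no entry: the sender repeats the process
        }
    }
}
</pre>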
</p><p>
At one point, the Kademlia
search/store functionality was still in place. The peers
considered the floodfill peers as always being 'closer' to every key than any
peer not participating in the netDb. We fell back on the Kademlia
netDb if the floodfill peers failed for some reason or another.
However, Kademlia was then disabled completely (see below).
<p>
More recently, Kademlia was partially reintroduced in late 2009, as a way
to limit the size of the netdb each floodfill router must store.
<h3>The Introduction of the Floodfill Algorithm</h3>
<p>
Floodfill was introduced in release 0.6.0.4, keeping Kademlia as a backup algorithm.
</p>
<p>
(Adapted from posts by jrandom in the old Syndie, Nov. 26, 2005)
<br />
As I've often said, I'm not particularly bound to any specific technology -
what matters to me is what will get results. While I've been working through
various netDb ideas over the last few years, the issues we've faced in the last
few weeks have brought some of them to a head. On the live net,
with the netDb redundancy factor set to 4 peers (meaning we keep sending an
entry to new peers until 4 of them confirm that they've got it) and the
per-peer timeout set to 4 times that peer's average reply time, we're
<b>still</b> getting an average of 40-60 peers sent to before 4 ACK the store.
That means sending 36-56 more messages than should go out, each using
tunnels and thereby crossing 2-4 links. Even further, that value is heavily
skewed, as the average number of peers sent to in a 'failed' store (meaning
less than 4 people ACKed the message after 60 seconds of sending messages out)
was in the 130-160 peers range.
</p><p>
This is insane, especially for a network with only perhaps 250 peers on it.
</p><p>
The simplest answer is to say "well, duh jrandom, it's broken. fix it", but
that doesn't quite get to the core of the issue. In line with another current
effort, it's likely that we have a substantial number of network issues due to
restricted routes - peers who cannot talk with some other peers, often due to
NAT or firewall issues. If, say, the K peers closest to a particular netDb
entry are behind a 'restricted route' such that the netDb store message could
reach them but some other peer's netDb lookup message could not, that entry
would be essentially unreachable. Following down those lines a bit further and
taking into consideration the fact that some restricted routes will be created
with hostile intent, it's clear that we're going to have to look closer into a
long term netDb solution.
</p><p>
There are a few alternatives, but two worth mentioning in particular. The
first is to simply run the netDb as a Kademlia DHT using a subset of the full
network, where all of those peers are externally reachable. Peers who are not
participating in the netDb still query those peers but they don't receive
unsolicited netDb store or lookup messages. Participation in the netDb would
be both self-selecting and user-eliminating - routers would choose whether to
publish a flag in their routerInfo stating whether they want to participate
while each router chooses which peers it wants to treat as part of the netDb
(peers who publish that flag but who never give any useful data would be
ignored, essentially eliminating them from the netDb).
</p><p>
Another alternative is a blast from the past, going back to the DTSTTCPW
(Do The Simplest Thing That Could Possibly Work)
mentality - a floodfill netDb, but like the alternative above, using only a
subset of the full network. When a user wants to publish an entry into the
floodfill netDb, they simply send it to one of the participating routers, wait
for an ACK, and then 30 seconds later, query another random participant in the
floodfill netDb to verify that it was properly distributed. If it was, great,
and if it wasn't, just repeat the process. When a floodfill router receives a
netDb store, it ACKs immediately and queues off the netDb store to all of its
known netDb peers. When a floodfill router receives a netDb lookup, if it
has the data, it replies with it, but if it doesn't, it replies with the
hashes for, say, 20 other peers in the floodfill netDb.
</p><p>
Looking at it from a network economics perspective, the floodfill netDb is
quite similar to the original broadcast netDb, except the cost for publishing
an entry is borne mostly by peers in the netDb, rather than by the publisher.
Fleshing this out a bit further and treating the netDb like a blackbox, we can
see the total bandwidth required by the netDb to be:<pre>
recvKBps = N * (L + 1) * (1 + F) * (1 + R) * S / T
</pre>where<pre>
N = number of routers in the entire network
L = average number of client destinations on each router
(+1 for the routerInfo)
F = tunnel failure percentage
R = tunnel rebuild period, as a fraction of the tunnel lifetime
S = average netDb entry size
T = tunnel lifetime
</pre>Plugging in a few values:<pre>
recvKBps = 1000 * (5 + 1) * (1 + 0.05) * (1 + 0.2) * 2KB / 10m
= 25.2KBps
</pre>That, in turn, scales linearly with N (at 100,000 peers, the netDb must
be able to handle netDb store messages totaling 2.5MBps, or, at 300 peers,
7.6KBps).
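The same estimate as a small helper, using the variables defined above (a sketch for
convenience, not router code):
<pre>
class NetDbBandwidthSketch {
    /** recvKBps = N * (L + 1) * (1 + F) * (1 + R) * S / T */
    static double recvKBps(int routers, double destsPerRouter, double failureFrac,
                           double rebuildFrac, double entryKB, double tunnelSeconds) {
        return routers * (destsPerRouter + 1) * (1 + failureFrac) * (1 + rebuildFrac)
               * entryKB / tunnelSeconds;
    }

    public static void main(String[] args) {
        // the example above: 1000 routers, 5 destinations each, 5% tunnel failure,
        // 0.2 rebuild fraction, 2KB entries, 10 minute tunnels
        System.out.println(recvKBps(1000, 5, 0.05, 0.2, 2, 600));   // ~25.2 KBps
    }
}
</pre>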
</p><p>
While the floodfill netDb would have each netDb participant receiving only a
small fraction of the client generated netDb stores directly, they would all
receive all entries eventually, so all of their links should be capable of
handling the full recvKBps. In turn, they'll all need to send
<tt>(recvKBps/sizeof(netDb)) * (sizeof(netDb)-1)</tt> to keep the other
peers in sync.
</p><p>
A floodfill netDb would not require either tunnel routing for netDb operation
or any special selection as to which entries it can answer 'safely', as the
basic assumption is that they are all storing everything. Oh, and with regards
to the netDb disk usage required, it's still fairly trivial for any modern
machine, requiring around 11MB for every 1000 peers <tt>(N * (L + 1) *
S)</tt>.
</p><p>
The Kademlia netDb would cut down on these numbers, ideally bringing them to K
over M times their value, with K = the redundancy factor and M being the number
of routers in the netDb (e.g. 5/100, giving a recvKBps of 126KBps and 536MB at
100,000 routers). The downside of the Kademlia netDb though is the increased
complexity of safe operation in a hostile environment.
</p><p>
What I'm thinking about now is to simply implement and deploy a floodfill netDb
in our existing live network, letting peers who want to use it pick out other
peers who are flagged as members and query them instead of querying the
traditional Kademlia netDb peers. The bandwidth and disk requirements at this
stage are trivial enough (7.6KBps and 3MB disk space) and it will remove the
netDb entirely from the debugging plan - issues that remain to be addressed
will be caused by something unrelated to the netDb.
</p><p>
How would peers be chosen to publish that flag saying they are a part of the
floodfill netDb? At the beginning, it could be done manually as an advanced
config option (ignored if the router is not able to verify its external
reachability). If too many peers set that flag, how do the netDb participants
pick which ones to eject? Again, at the beginning it could be done manually as
an advanced config option (after dropping peers which are unreachable). How do
we avoid netDb partitioning? By having the routers verify that the netDb is
doing the flood fill properly by querying K random netDb peers. How do routers
not participating in the netDb discover new routers to tunnel through? Perhaps
this could be done by sending a particular netDb lookup so that the netDb
router would respond not with peers in the netDb, but with random peers outside
the netDb.
</p><p>
I2P's netDb is very different from traditional load bearing DHTs - it only
carries network metadata, not any actual payload, which is why even a netDb
using a floodfill algorithm will be able to sustain an arbitrary amount of
eepsite/IRC/bt/mail/syndie/etc data. We can even do some optimizations as I2P
grows to distribute that load a bit further (perhaps passing bloom filters
between the netDb participants to see what they need to share), but it seems we
can get by with a much simpler solution for now.
</p><p>
One fact may be worth digging
into - not all leaseSets need to be published in the netDb! In fact, most
don't need to be - only those for destinations which will be receiving
unsolicited messages (aka servers). This is because the garlic wrapped
messages sent from one destination to another already bundle the sender's
leaseSet, so that any subsequent send/recv between those two destinations
(within a short period of time) works without any netDb activity.
</p><p>
So, back at those equations, we can change L from 5 to something like 0.1
(assuming only 1 out of every 50 destinations is a server). The previous
equations also brushed over the network load required to answer queries from
clients, but while that is highly variable (based on the user activity), it's
also very likely to be quite insignificant as compared to the publishing
frequency.
</p><p>
Anyway, still no magic, but a nice reduction to nearly 1/5th of the bandwidth/disk
space required (perhaps more later, depending upon whether the routerInfo
distribution goes directly as part of the peer establishment or only through
the netDb).
</p>
<h3>The Disabling of the Kademlia Algorithm</h3>
<p>
Kademlia was completely disabled in release 0.6.1.20.
</p><p>
(this is adapted from an IRC conversation with jrandom 11/07)
<br />
Kademlia requires a minimum level of service that the baseline could not offer (bandwidth, cpu),
even after adding in tiers (pure kad is absurd on that point).
Kademlia just wouldn't work. It was a nice idea, but not for a hostile and fluid environment.
</p>
<h3>Current Status</h3>
<p>The netDb plays a very specific role in the I2P network, and the algorithms
have been tuned towards our needs. This also means that it hasn't been tuned
to address the needs we have yet to run into. I2P is currently
fairly small (a few hundred routers).
There were some calculations that 3-5 floodfill routers should be able to handle
10,000 nodes in the network.
The netDb implementation more than adequately meets our
needs at the moment, but there will likely be further tuning and bugfixing as
the network grows.</p>
<h3>Update of Calculations 03-2008</h3>
<p>Current numbers:
<pre>
recvKBps = N * (L + 1) * (1 + F) * (1 + R) * S / T
</pre>where<pre>
N = number of routers in the entire network
L = average number of client destinations on each router
(+1 for the routerInfo)
F = tunnel failure percentage
R = tunnel rebuild period, as a fraction of the tunnel lifetime
S = average netDb entry size
T = tunnel lifetime
</pre>
Changes in assumptions:
<ul>
<li>L is now about .5, compared to .1 above, due to the popularity of i2psnark
and other apps.
<li>F is about .33, but bugs in tunnel testing are fixed in 0.6.1.33, so it will get much better.
<li>Since netDb is about 2/3 5K routerInfos and 1/3 2K leaseSets, S = 4K.
RouterInfo size is shrinking in 0.6.1.32 and 0.6.1.33 as we remove unnecessary stats.
<li>R = tunnel build period: 0.2 was very low - it was maybe 0.7 -
but build algorithm improvements in 0.6.1.32 should bring it down to about 0.2
as the network upgrades. Call it 0.5 now with half the network at .30 or earlier.
</ul>
<pre> recvKBps = 700 * (0.5 + 1) * (1 + 0.33) * (1 + 0.5) * 4KB / 10m
~= 28KBps
</pre>
This just accounts for the stores - what about the queries?
<h3>The Return of the Kademlia Algorithm?</h3>
<p>
(this is adapted from <a href="{{ url_for('meetings_show', id=195) }}">the I2P meeting Jan. 2, 2007</a>)
<br />
The Kademlia netDb just wasn't working properly.
Is it dead forever or will it be coming back?
If it comes back, the peers in the Kademlia netDb would be a very limited subset
of the routers in the network (basically an expanded number of floodfill peers, if/when the floodfill peers
cannot handle the load).
But until the floodfill peers cannot handle the load (and other peers cannot be added that can), it's unnecessary.
</p>
<h3>The Future of Floodfill</h3>
<p>
(this is adapted from an IRC conversation with jrandom 11/07)
<br />
Here's a proposal: Capacity class O is automatically floodfill.
Hmm.
Unless we're careful, we might end up with a fancy way of DDoS'ing all O class routers.
That is exactly the concern: we want to make sure the number of floodfill peers is as small as possible while providing sufficient reachability.
If/when netDb requests fail, then we need to increase the number of floodfill peers, but atm, I'm not aware of a netDb fetch problem.
There are 33 "O" class peers according to my records.
33 is a /lot/ to floodfill to.
</p><p>
So floodfill works best when the number of peers in that pool is firmly limited?
And the size of the floodfill pool shouldn't grow much, even if the network itself gradually would?
3-5 floodfill peers can handle 10K routers iirc (I posted a bunch of numbers on that explaining the details in the old syndie).
Sounds like a difficult requirement to fill with automatic opt-in,
especially if nodes opting in cannot trust data from others.
e.g. "let's see if I'm among the top 5",
and can only trust data about themselves (e.g. "I am definitely O class, and moving 150 KB/s, and up for 123 days").
And top 5 is hostile as well. Basically, it's the same as the tor directory servers - chosen by trusted people (aka devs).
Yeah, right now it could be exploited by opt-in, but that'd be trivial to detect and deal with.
Seems like in the end, we might need something more useful than Kademlia, and have only reasonably capable peers join that scheme.
N class and above should be a big enough quantity to suppress risk of an adversary causing denial of service, I'd hope.
But it would have to be different from floodfill then, in the sense that it wouldn't cause humongous traffic.
Large quantity? For a DHT based netDb?
Not necessarily DHT-based.
</p>
<h3 id="todo">Floodfill TODO List</h3>
<p>
NOTE: The following is not current information.
See <a href="{{ site_url('docs/how/networkdatabase') }}">the main netdb page</a> for the current status and a list of future work.
<p>
The network was down to only one floodfill for a couple of hours on March 13, 2008
(approx. 18:00 - 20:00 UTC),
and it caused a lot of trouble.
<p>
Two changes implemented in 0.6.1.33 should reduce the disruption caused
by floodfill peer removal or churn:
<ol>
<li>Randomize the floodfill peers used for search each time.
This will get you past the failing ones eventually.
This change also fixed a nasty bug that would sometimes drive the ff search code insane.
<li>Prefer the floodfill peers that are up.
The code now avoids peers that are shitlisted, failing, or not heard from in
half an hour, if possible (see the sketch below).
</ol>
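<p>
A sketch of the kind of "prefer peers that are up" filter described in the second item
(field and method names are illustrative, not the actual profile code):
<pre>
class FloodfillSelectionSketch {
    static final long MAX_SILENCE = 30 * 60 * 1000;   // half an hour, in milliseconds

    /** Placeholder for what the router tracks about a floodfill peer. */
    static class PeerState {
        boolean shitlisted;
        boolean failing;
        long lastHeardFrom;   // milliseconds since epoch
    }

    /** True if this floodfill peer should be preferred for searches. */
    static boolean looksUp(PeerState peer, long now) {
        return !(peer.shitlisted
                 || peer.failing
                 || now - peer.lastHeardFrom > MAX_SILENCE);
    }
}
</pre>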
<p>
One benefit is faster first contact to an eepsite (i.e. when you had to fetch
the leaseset first). The lookup timeout is 10s, so if you don't start out by
asking a peer that is down, you can save 10s.
<p>
There <i>may</i> be anonymity implications in these changes.
For example, in the floodfill <b>store</b> code, there are comments that
shitlisted peers are not avoided, since a peer could be "shitty" and then
see what happens.
Searches are much less vulnerable than stores -
they're much less frequent, and give less away.
So maybe we don't think we need to worry about it?
But if we want to tweak the changes, it would be easy to
send to a peer listed as "down" or shitlisted anyway, just not
count it as part of the 2 we are sending to
(since we don't really expect a reply).
<p>
There are several places where a floodfill peer is selected - this fix addresses only one -
who a regular peer searches from [2 at a time].
Other places where better floodfill selection should be implemented:
<ol>
<li>Who a regular peer stores to [1 at a time]
(random - need to add qualification, because timeouts are long)
<li>Who a regular peer searches to verify a store [1 at a time]
(random - need to add qualification, because timeouts are long)
<li>Who a ff peer sends in reply to a failed search (3 closest to the search)
<li>Who a ff peer floods to (all other ff peers)
<li>The list of ff peers sent in the NTCP every-6-hour "whisper"
(although this may no longer be necessary due to other ff improvements)
</ol>
<p>
Lots more that could and should be done -
<ul>
<li>
Use the "dbHistory" stats to better rate a floodfill peer's integration
<li>
Use the "dbHistory" stats to immediately react to floodfill peers that don't respond
<li>
Be smarter on retries - retries are handled by an upper layer, not in
FloodOnlySearchJob, so it does another random sort and tries again,
rather than purposefully skipping the ff peers we just tried.
<li>
Improve integration stats more
<li>
Actually use integration stats rather than just floodfill indication in netDb
<li>
Use latency stats too?
<li>
More improvement on recognizing failing floodfill peers
</ul>
<p>
Recently completed -
<ul>
<li>
[In Release 0.6.3]
Implement automatic opt-in
to floodfill for some percentage of class O peers, based on analysis of the network.
<li>
[In Release 0.6.3]
Continue to reduce netDb entry size to reduce floodfill traffic -
we are now at the minimum number of stats required to monitor the network.
<li>
[In Release 0.6.3]
Manual list of floodfill peers to exclude?
(<a href="{{ site_url('docs/how/threatmodel') }}#blocklist">blocklists</a> by router ident)
<li>
[In Release 0.6.3]
Better floodfill peer selection for stores:
Avoid peers whose netDb is old, or have a recent failed store,
or are forever-shitlisted.
<li>
[In Release 0.6.4]
Prefer already-connected floodfill peers for RouterInfo stores, to
reduce number of direct connections to floodfill peers.
<li>
[In Release 0.6.5]
Peers who are no longer floodfill send their routerInfo in response
to a query, so that the router doing the query will know it
is no longer floodfill.
<li>
[In Release 0.6.5]
Further tuning of the requirements to automatically become floodfill
<li>
[In Release 0.6.5]
Fix response time profiling in preparation for favoring fast floodfills
<li>
[In Release 0.6.5]
Improve blocklisting
<li>
[In Release 0.7]
Fix netDb exploration
<li>
[In Release 0.7]
Turn blocklisting on by default, block the known troublemakers
<li>
[Several improvements in recent releases, a continuing effort]
Reduce the resource demands on high-bandwidth and floodfill routers
</ul>
<p>
That's a long list but it will take that much work to
have a network that's resistant to DOS from lots of peers turning the floodfill switch on and off.
Or pretending to be a floodfill router.
None of this was a problem when we had only two ff routers, and they were both up
24/7. Again, jrandom's absence has pointed us to places that need improvement.
</p><p>
To assist in this effort, additional profile data for floodfill peers are
now (as of release 0.6.1.33) displayed on the "Profiles" page in
the router console.
We will use this to analyze which data are appropriate for
rating floodfill peers.
</p>
<p>
The network is currently quite resilient, however
we will continue to enhance our algorithms for measuring and reacting to the performance and reliability
of floodfill peers. While we are not, at the moment, fully hardened to the potential threats of
malicious floodfills or a floodfill DDOS, most of the infrastructure is in place,
and we are well-positioned to react quickly
should the need arise.
</p>
{% endblock %}


@@ -0,0 +1,245 @@
{% extends "global/layout.html" %}
{% block title %}Tunnel Discussion{% endblock %}
{% block content %}
Note: This document contains older information about alternatives to the
current tunnel implementation in I2P,
and speculation on future possibilities. For current information see
<a href="tunnel-alt.html">the tunnel page</a>.
<p>
That page documents the current tunnel build implementation as of release 0.6.1.10.
The older tunnel build method, used prior to release 0.6.1.10, is documented on
<a href="tunnel.html">the old tunnel page</a>.
<h3 id="config">Configuration Alternatives</h3>
<p>Beyond their length, there may be additional configurable parameters
for each tunnel that can be used, such as a throttle on the frequency of
messages delivered, how padding should be used, how long a tunnel should be
in operation, whether to inject chaff messages, and what, if any, batching
strategies should be employed.
None of these are currently implemented.
</p>
<h3><a name="tunnel.padding">Padding Alternatives</a></h3>
<p>Several tunnel padding strategies are possible, each with their own merits:</p>
<ul>
<li>No padding</li>
<li>Padding to a random size</li>
<li>Padding to a fixed size</li>
<li>Padding to the closest KB</li>
<li>Padding to the closest exponential size (2^n bytes)</li>
</ul>
<p>These padding strategies can be used on a variety of levels, addressing the
exposure of message size information to different adversaries. After gathering
and reviewing some <a href="http://dev.i2p.net/~jrandom/messageSizes/">statistics</a>
from the 0.4 network, as well as exploring the anonymity tradeoffs, we're starting
with a fixed tunnel message size of 1024 bytes. Within this however, the fragmented
messages themselves are not padded by the tunnel at all (though for end to end
messages, they may be padded as part of the garlic wrapping).</p>
<h3><a name="tunnel.fragmentation">Fragmentation Alternatives</a></h3>
<p>To prevent adversaries from tagging the messages along the path by adjusting
the message size, all tunnel messages are a fixed 1024 bytes in size. To accommodate
larger I2NP messages as well as to support smaller ones more efficiently, the
gateway splits up the larger I2NP messages into fragments contained within each
tunnel message. The endpoint will attempt to rebuild the I2NP message from the
fragments for a short period of time, but will discard them as necessary.</p>
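<p>A rough sketch of the splitting step, ignoring the real tunnel message header layout,
fragment headers, and padding rules (the sizes below are placeholders, not the actual overhead):</p>
<pre>
class FragmentationSketch {
    static final int TUNNEL_MSG_SIZE = 1024;                 // fixed tunnel message size
    static final int OVERHEAD = 64;                          // placeholder for per-message overhead
    static final int PAYLOAD = TUNNEL_MSG_SIZE - OVERHEAD;

    /** Split an I2NP message into fixed-size tunnel messages; the last one is zero-padded here. */
    static byte[][] fragment(byte[] i2npMessage) {
        int count = (i2npMessage.length + PAYLOAD - 1) / PAYLOAD;
        byte[][] tunnelMessages = new byte[count][TUNNEL_MSG_SIZE];
        for (int i = 0; i != count; i++) {
            int offset = i * PAYLOAD;
            int len = Math.min(PAYLOAD, i2npMessage.length - offset);
            // the first OVERHEAD bytes are left for header data; the real format is in the tunnel spec
            System.arraycopy(i2npMessage, offset, tunnelMessages[i], OVERHEAD, len);
        }
        return tunnelMessages;
    }
}
</pre>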
<p>Routers have a lot of leeway as to how the fragments are arranged, whether
they are stuffed inefficiently as discrete units, batched for a brief period to
fit more payload into the 1024 byte tunnel messages, or opportunistically padded
with other messages that the gateway wanted to send out.</p>
<h3><a name="tunnel.alternatives">More Alternatives</a></h3>
<h4><a name="tunnel.reroute">Adjust tunnel processing midstream</a></h4>
<p>While the simple tunnel routing algorithm should be sufficient for most cases,
there are three alternatives that can be explored:</p>
<ul>
<li>Have a peer other than the endpoint temporarily act as the termination
point for a tunnel by adjusting the encryption used at the gateway to give them
the plaintext of the preprocessed I2NP messages. Each peer could check to see
whether they had the plaintext, processing the message when received as if they
did.</li>
<li>Allow routers participating in a tunnel to remix the message before
forwarding it on - bouncing it through one of that peer's own outbound tunnels,
bearing instructions for delivery to the next hop.</li>
<li>Implement code for the tunnel creator to redefine a peer's "next hop" in
the tunnel, allowing further dynamic redirection.</li>
</ul>
<h4><a name="tunnel.bidirectional">Use bidirectional tunnels</a></h4>
<p>The current strategy of using two separate tunnels for inbound and outbound
communication is not the only technique available, and it does have anonymity
implications. On the positive side, by using separate tunnels it lessens the
traffic data exposed for analysis to participants in a tunnel - for instance,
peers in an outbound tunnel from a web browser would only see the traffic of
an HTTP GET, while the peers in an inbound tunnel would see the payload
delivered along the tunnel. With bidirectional tunnels, all participants would
have access to the fact that e.g. 1KB was sent in one direction, then 100KB
in the other. On the negative side, using unidirectional tunnels means that
there are two sets of peers which need to be profiled and accounted for, and
additional care must be taken to address the increased speed of predecessor
attacks. The tunnel pooling and building process outlined below should
minimize the worries of the predecessor attack, though if it were desired,
it wouldn't be much trouble to build both the inbound and outbound tunnels
along the same peers.</p>
<h4><a name="tunnel.backchannel">Backchannel communication</a></h4>
<p>At the moment, the IV values used are random values. However, it is
possible for that 16 byte value to be used to send control messages from the
gateway to the endpoint, or on outbound tunnels, from the gateway to any of the
peers. The inbound gateway could encode certain values in the IV once, which
the endpoint would be able to recover (since it knows the endpoint is also the
creator). For outbound tunnels, the creator could deliver certain values to the
participants during the tunnel creation (e.g. "if you see 0x0 as the IV, that
means X", "0x1 means Y", etc). Since the gateway on the outbound tunnel is also
the creator, they can build an IV so that any of the peers will receive the
correct value. The tunnel creator could even give the inbound tunnel gateway
a series of IV values which that gateway could use to communicate with
individual participants exactly one time (though this would have issues regarding
collusion detection).</p>
<p>This technique could later be used to deliver messages mid-stream, or to allow the
inbound gateway to tell the endpoint that it is being DoS'ed or otherwise soon
to fail. At the moment, there are no plans to exploit this backchannel.</p>
<h4><a name="tunnel.variablesize">Variable size tunnel messages</a></h4>
<p>While the transport layer may have its own fixed or variable message size,
using its own fragmentation, the tunnel layer may instead use variable size
tunnel messages. The difference is an issue of threat models - a fixed size
at the transport layer helps reduce the information exposed to external
adversaries (though overall flow analysis still works), but for internal
adversaries (aka tunnel participants) the message size is exposed. Fixed size
tunnel messages help reduce the information exposed to tunnel participants, but
do not hide the information exposed to tunnel endpoints and gateways. Fixed
size end to end messages hide the information exposed to all peers in the
network.</p>
<p>As always, it's a question of who I2P is trying to protect against. Variable
sized tunnel messages are dangerous, as they allow participants to use the
message size itself as a backchannel to other participants - e.g. if you see a
1337 byte message, you're on the same tunnel as another colluding peer. Even
with a fixed set of allowable sizes (1024, 2048, 4096, etc), that backchannel
still exists as peers could use the frequency of each size as the carrier (e.g.
two 1024 byte messages followed by an 8192). Smaller messages do incur the
overhead of the headers (IV, tunnel ID, hash portion, etc), but larger fixed size
messages either increase latency (due to batching) or dramatically increase
overhead (due to padding). Fragmentation helps amortize the overhead, at the
cost of potential message loss due to lost fragments.</p>
<p>Timing attacks are also relevant when reviewing the effectiveness of fixed
size messages, though they require a substantial view of network activity
patterns to be effective. Excessive artificial delays in the tunnel will be
detected by the tunnel's creator, due to periodic testing, causing that entire
tunnel to be scrapped and the profiles for peers within it to be adjusted.</p>
<h3><a name="tunnel.building.alternatives">Alternatives</a></h3>
Reference:
<a href="http://www-users.cs.umn.edu/~hopper/hashing_it_out.pdf">Hashing it out in Public</a>
<h4 id="tunnel.building.old">Old tunnel build method</h4>
The old tunnel build method, used prior to release 0.6.1.10, is documented on
<a href="tunnel.html">the old tunnel page</a>.
This was an "all at once" or "parallel" method,
where messages were sent in parallel to each of the participants.
<h4><a name="tunnel.building.telescoping">One-Shot Telescopic building</a></h4>
NOTE: This is the current method.
<p>One question that arose regarding the use of the exploratory tunnels for
sending and receiving tunnel creation messages is how that impacts the tunnel's
vulnerability to predecessor attacks. While the endpoints and gateways of
those tunnels will be randomly distributed across the network (perhaps even
including the tunnel creator in that set), another alternative is to use the
tunnel pathways themselves to pass along the request and response, as is done
in <a href="http://www.torproject.org/">Tor</a>. This, however, may lead to leaks
during tunnel creation, allowing peers to discover how many hops there are later
on in the tunnel by monitoring the timing or <a
href="http://dev.i2p.net/pipermail/2005-October/001057.html">packet count</a> as
the tunnel is built.</p>
<h4><a name="tunnel.building.telescoping">"Interactive" Telescopic building</a></h4>
Build the hops one at a time, with a message sent through the existing part of the tunnel for each new hop.
This has major issues, as the peers can count the messages to determine their location in the tunnel.
<h4><a name="tunnel.building.nonexploratory">Non-exploratory tunnels for management</a></h4>
<p>A second alternative to the tunnel building process is to give the router
an additional set of non-exploratory inbound and outbound pools, using those for
the tunnel request and response. Assuming the router has a well integrated view
of the network, this should not be necessary, but if the router was partitioned
in some way, using non-exploratory pools for tunnel management would reduce the
leakage of information about what peers are in the router's partition.</p>
<h4><a name="tunnel.building.exploratory">Exploratory request delivery</a></h4>
<p>A third alternative, used until I2P 0.6.1.10, garlic encrypts individual tunnel
request messages and delivers them to the hops individually, transmitting them
through exploratory tunnels with their reply coming back in a separate
exploratory tunnel. This strategy has been dropped in favor of the one outlined
above.</p>
<h4 id="history">More History and Discussion</a></h4>
Before the introduction of the Variable Tunnel Build Message,
there were at least two problems:
<ol>
<li>
The size of the messages (caused by an 8-hop maximum, when the typical tunnel length is 2 or 3 hops...
and current research indicates that more than 3 hops does not enhance anonymity);
<li>
The high build failure rate, especially for long (and exploratory) tunnels, since all hops must agree or the tunnel is discarded.
</ol>
The VTBM has fixed #1 and improved #2.
<p>
Welterde has proposed modifications to the parallel method to allow for reconfiguration.
Sponge has proposed using 'tokens' of some sort.
<p>
Any students of tunnel building must study the historical record leading up to the current method,
especially the various anonymity vulnerabilities that may exist in various methods.
The mail archives from October 2005 at <a href="http://zzz.i2p/archives/2005-10/">zzz.i2p</a> or
<a href="http://osdir.com/ml/network.i2p/2005-10/">osdir.com</a> are particularly helpful.
As stated on <a href="tunnel-alt-creation.html">the tunnel creation specification</a>,
the current strategy came about during a discussion on the I2P mailing list between
Michael Rogers, Matthew Toseland (toad), and jrandom regarding the predecessor attack.
See:<a href="http://osdir.com/ml/network.i2p/2005-10/msg00138.html">Summary</a> and
<a href="http://osdir.com/ml/network.i2p/2005-10/msg00129.html">Reasoning</a>.
<p>
The build changes in 0.6.1.10, released February 2006, were the last incompatible change in i2p;
i.e., all releases since 0.6.1.10 are backward compatible. Any tunnel build change would cause a similar 'flag day',
unless we implemented code so that the build originator would only use the new method if all participants
(and build/reply endpoints/gateways) supported it.
<h4><a name="ordering">Peer ordering alternatives</a></h4>
<p>A less strict ordering is also possible, ensuring that while
the hop after A may be B, B may never be before A. Other configuration options
include the ability for just the inbound tunnel gateways and outbound tunnel
endpoints to be fixed, or rotated on an MTBF rate.</p>
<h2><a name="tunnel.mixing">Mixing/batching</a></h2>
<p>What strategies should be used at the gateway and at each hop for delaying,
reordering, rerouting, or padding messages? To what extent should this be done
automatically, how much should be configured as a per tunnel or per hop setting,
and how should the tunnel's creator (and in turn, user) control this operation?
All of this is left as unknown, to be worked out for
<a href="http://www.i2p.net/roadmap#3.0">I2P 3.0</a></p>
{% endblock %}