new blockfile spec and dev guidelines; other typos

This commit is contained in:
zzz
2012-01-28 18:33:49 +00:00
parent 4e5170f29d
commit e74bf19f10
6 changed files with 331 additions and 3 deletions

@@ -0,0 +1,210 @@
{% extends "_layout.html" %}
{% block title %}I2P Blockfile Specification{% endblock %}
{% block content %}
<h2>
Blockfile and Hosts Database Specification
</h2>
<p>
Page last updated January 2012, current as of router version 0.8.12.
</p>
<h3>Overview</h3>
<p>
This document specifies
the I2P blockfile file format
and the tables in the hostsdb.blockfile used by the Blockfile <a href="naming.html">Naming Service</a>.
</p><p>
The blockfile provides fast Destination lookup in a compact format. While the blockfile page overhead is substantial,
the destinations are stored in binary rather than in Base 64 as in the hosts.txt format.
In addition, the blockfile provides the capability of arbitrary metadata storage
(such as added date, source, and comments) for each entry.
The metadata may be used in the future to provide advanced addressbook features.
The blockfile storage requirement is a modest increase over the hosts.txt format, and the blockfile provides
approximately 10x reduction in lookup times.
</p><p>
A blockfile is simply on-disk storage of multiple sorted maps (key-value pairs),
implemented as skiplists.
The blockfile format is adapted from the
<a href="http://www.metanotion.net/software/sandbox/block.html">Metanotion Blockfile Database</a>.
First we define the file format, then the use of that format by the BlockfileNamingService.
</p>
<h3>Blockfile Format</h3>
<p>
The original blockfile spec was modified to add magic numbers to each page.
The file is structured in 1024-byte pages. Pages are numbered starting from 1.
The "superblock" is always at page 1, i.e. starting at byte 0 in the file.
The metaindex skiplist is always at page 2, i.e. starting at byte 1024 in the file.
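All 2-byte integer values are unsigned. All 4-byte integer values are signed, and negative values are illegal.
All integer values are stored in network byte order (big endian).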
</p>
<p>
Superblock format:
</p>
<pre>
Byte       Contents
0-5        Magic number    0x3141de493250 "1A" 0xde "I2P"
6          Major version   0x01
7          Minor version   0x01
8-15       File length     Total length in bytes
16-19      First free list page
20-21      Mounted flag    0x01 = yes
22-23      Span size       max number of key/value pairs per span (16 for hostsdb)
24-1023    unused
</pre>
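To make the layout concrete, here is a minimal Java sketch that checks the magic number and reads the superblock fields, together with the page-to-offset rule stated above (page 1 at byte 0, page 2 at byte 1024). `SuperblockSketch` and its members are hypothetical names, not I2P's BlockFile API, and the code assumes big-endian integers, which matches the byte order in which the magic numbers are written.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch for reading the superblock (page 1); not the I2P BlockFile class. */
class SuperblockSketch {
    static final int PAGE_SIZE = 1024;
    /** "1A" 0xde "I2P" */
    static final byte[] MAGIC = {0x31, 0x41, (byte) 0xde, 0x49, 0x32, 0x50};

    /** Byte offset of a 1-based page number: page 1 is at 0, page 2 at 1024. */
    static long pageToOffset(int page) {
        if (page < 1)
            throw new IllegalArgumentException("pages are numbered from 1");
        return (long) (page - 1) * PAGE_SIZE;
    }

    static final class Superblock {
        final int majorVersion;
        final int minorVersion;
        final long fileLength;
        final int firstFreeListPage;
        final boolean mounted;
        final int spanSize;

        Superblock(int major, int minor, long len, int freeList, boolean mounted, int spanSize) {
            this.majorVersion = major;
            this.minorVersion = minor;
            this.fileLength = len;
            this.firstFreeListPage = freeList;
            this.mounted = mounted;
            this.spanSize = spanSize;
        }
    }

    /** Parses page 1; ByteBuffer reads big-endian by default (an assumption about the format). */
    static Superblock parse(byte[] page) {
        if (page.length != PAGE_SIZE)
            throw new IllegalArgumentException("expected one 1024-byte page");
        ByteBuffer buf = ByteBuffer.wrap(page);
        byte[] magic = new byte[MAGIC.length];
        buf.get(magic);                                   // bytes 0-5
        if (!Arrays.equals(magic, MAGIC))
            throw new IllegalArgumentException("not a blockfile: bad magic");
        int major = buf.get() & 0xff;                     // byte 6
        int minor = buf.get() & 0xff;                     // byte 7
        long fileLength = buf.getLong();                  // bytes 8-15
        int freeList = buf.getInt();                      // bytes 16-19
        boolean mounted = buf.getShort() == 0x01;         // bytes 20-21
        int spanSize = buf.getShort() & 0xffff;           // bytes 22-23
        return new Superblock(major, minor, fileLength, freeList, mounted, spanSize);
    }

    public static void main(String[] args) {
        // Build a synthetic superblock for a hostsdb-like file, then parse it back.
        ByteBuffer b = ByteBuffer.allocate(PAGE_SIZE);
        b.put(MAGIC).put((byte) 1).put((byte) 1).putLong(3 * PAGE_SIZE)
         .putInt(0).putShort((short) 0).putShort((short) 16);
        Superblock sb = parse(b.array());
        System.out.println(sb.spanSize + " " + pageToOffset(2)); // prints "16 1024"
    }
}
```

In a real reader, the 1024 bytes passed to `parse` would be read from the file at `pageToOffset(1)`.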
<p>
Skip list block page format:
</p>
<pre>
Byte       Contents
0-7        Magic number    0x536b69704c697374 "SkipList"
8-11       First span page
12-15      First level page
16-19      Size (total number of keys - may only be valid at startup)
20-23      Spans (total number of spans - may only be valid at startup)
24-27      Levels (total number of levels - may only be valid at startup)
28-1023    unused
</pre>
<p>
Skip level block page format is as follows.
All levels have a span. Not all spans have levels.
</p>
<pre>
Byte       Contents
0-7        Magic number    0x42534c6576656c73 "BSLevels"
8-9        Max height
10-11      Current height
12-15      Span page
16-        Next level pages ('height' pages, 4 bytes each)
</pre>
<p>
Skip span block page format is as follows.
Key/value structures are sorted by key within each span and across all spans.
Spans other than the first span must not be empty.
</p>
<pre>
Byte       Contents
0-3        Magic number    0x5370616e "Span"
4-7        First continuation page or 0
8-11       Previous span page or 0
12-15      Next span page or 0
16-17      Max keys (16 for hostsdb)
18-19      Size (current number of keys)
20-1023    key/value structures
</pre>
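A similar hypothetical sketch for the span page header. It reads only bytes 0-19 and enforces the two rules stated above: the key count cannot exceed the span's maximum, and only the first span may be empty. Treating a span with no previous page as the first span is an assumption of this sketch, as is big-endian byte order; `SpanHeaderSketch` is not I2P's BSkipSpan code.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch for reading a span page header; not the I2P BSkipSpan code. */
class SpanHeaderSketch {
    static final int SPAN_MAGIC = 0x5370616e;   // ASCII "Span", big-endian

    static final class SpanHeader {
        final int firstContinuationPage;  // 0 = none
        final int prevSpanPage;           // 0 = no previous span
        final int nextSpanPage;           // 0 = no next span
        final int maxKeys;
        final int size;

        SpanHeader(int cont, int prev, int next, int maxKeys, int size) {
            this.firstContinuationPage = cont;
            this.prevSpanPage = prev;
            this.nextSpanPage = next;
            this.maxKeys = maxKeys;
            this.size = size;
        }
    }

    /** Reads bytes 0-19 of a span page, assuming big-endian integers. */
    static SpanHeader parse(byte[] page) {
        ByteBuffer buf = ByteBuffer.wrap(page);
        if (buf.getInt() != SPAN_MAGIC)                       // bytes 0-3
            throw new IllegalArgumentException("not a span page");
        int cont = buf.getInt();                              // bytes 4-7
        int prev = buf.getInt();                              // bytes 8-11
        int next = buf.getInt();                              // bytes 12-15
        int maxKeys = buf.getShort() & 0xffff;                // bytes 16-17
        int size = buf.getShort() & 0xffff;                   // bytes 18-19
        if (cont < 0 || prev < 0 || next < 0)
            throw new IllegalArgumentException("negative page number");
        if (size > maxKeys)
            throw new IllegalArgumentException("more keys than the span allows");
        if (size == 0 && prev != 0)
            throw new IllegalArgumentException("only the first span may be empty");
        return new SpanHeader(cont, prev, next, maxKeys, size);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(1024);
        b.putInt(SPAN_MAGIC).putInt(0).putInt(0).putInt(7).putShort((short) 16).putShort((short) 3);
        SpanHeader h = parse(b.array());
        System.out.println("next=" + h.nextSpanPage + " keys=" + h.size + "/" + h.maxKeys); // prints "next=7 keys=3/16"
    }
}
```

Walking all spans in key order would follow `nextSpanPage` until it is 0.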
<p>
Span Continuation block page format:
</p>
<pre>
Byte       Contents
0-3        Magic number    0x434f4e54 "CONT"
4-7        Next continuation page or 0
8-1023     key/value structures
</pre>
<p>
Key/value structure format is as follows.
Key and value lengths must not be split across pages, i.e. all 4 bytes must be on the same page.
If there is not enough room, the last 1-3 bytes of a page are left unused and the lengths are
written at offset 8 in the continuation page.
Key and value data may be split across pages.
Max key and value lengths are 65535 bytes.
</p>
<pre>
Byte       Contents
0-1        key length in bytes
2-3        value length in bytes
4-         key data
           value data
</pre>
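The rule that the 4 length bytes never straddle a page, while key and value data may, reduces to position arithmetic. This hypothetical sketch tracks positions as (page index in the span's chain, byte offset), with continuation-page data starting at byte 8 as specified above; the caller supplies the starting offset on the span page itself. It computes layout only and does no I/O.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of where key/value structures land in a span's page chain.
 * Positions are {pageIndex, offset}: pageIndex 0 is the span page and 1, 2, ...
 * are its continuation pages, whose data area starts at byte 8.
 */
class KeyValueLayoutSketch {
    static final int PAGE_SIZE = 1024;
    static final int CONT_DATA_START = 8;

    /** The 4 length bytes must not straddle a page; skip 1-3 leftover bytes if needed. */
    static int[] placeLengths(int pageIndex, int offset) {
        if (PAGE_SIZE - offset < 4)
            return new int[] {pageIndex + 1, CONT_DATA_START};
        return new int[] {pageIndex, offset};
    }

    /** Advances past n bytes of key/value data, which may spill into continuation pages. */
    static int[] skipData(int pageIndex, int offset, int n) {
        while (n > 0) {
            if (offset == PAGE_SIZE) {          // page full: continue on the next continuation page
                pageIndex++;
                offset = CONT_DATA_START;
            }
            int step = Math.min(PAGE_SIZE - offset, n);
            offset += step;
            n -= step;
        }
        return new int[] {pageIndex, offset};
    }

    /** Returns the position just after one key/value structure starting at (pageIndex, offset). */
    static int[] afterEntry(int pageIndex, int offset, int keyLen, int valLen) {
        if (keyLen < 0 || keyLen > 0xffff || valLen < 0 || valLen > 0xffff)
            throw new IllegalArgumentException("lengths must fit in 2 bytes");
        int[] p = placeLengths(pageIndex, offset);
        return skipData(p[0], p[1] + 4, keyLen + valLen);
    }

    public static void main(String[] args) {
        // Only 3 bytes left on the page: the lengths move to offset 8 of the next page.
        System.out.println(Arrays.toString(afterEntry(0, 1021, 2, 3)));  // prints "[1, 17]"
        // Lengths fit at 1000-1003; the 30 data bytes spill over into the next page.
        System.out.println(Arrays.toString(afterEntry(0, 1000, 10, 20))); // prints "[1, 18]"
    }
}
```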
<p>
Free list block page format:
</p>
<pre>
Byte       Contents
0-7        Magic number    0x2366724c69737423 "#frList#"
8-11       Next free list block or 0 if none
12-15      Number of valid free pages in this block (0 - 252)
16-1023    Free pages (4 bytes each); only the first N entries are valid, where N is the count in bytes 12-15
</pre>
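The 0-252 range follows from the page size: (1024 - 16) / 4 = 252 four-byte page numbers fit after the 16-byte header. This hypothetical Java sketch reads one free list block and ignores any stale entries past the count; the class name and the big-endian byte order are assumptions, not I2P's code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for reading one free list block page. */
class FreeListSketch {
    static final int PAGE_SIZE = 1024;
    static final long MAGIC = 0x2366724c69737423L;              // ASCII "#frList#"
    static final int HEADER_LEN = 16;
    static final int CAPACITY = (PAGE_SIZE - HEADER_LEN) / 4;   // 252 page numbers per block

    static final class FreeListBlock {
        final int nextBlock;             // 0 = last block in the chain
        final List<Integer> freePages;

        FreeListBlock(int nextBlock, List<Integer> freePages) {
            this.nextBlock = nextBlock;
            this.freePages = freePages;
        }
    }

    static FreeListBlock parse(byte[] page) {
        ByteBuffer buf = ByteBuffer.wrap(page);                 // big-endian by default
        if (buf.getLong() != MAGIC)                             // bytes 0-7
            throw new IllegalArgumentException("not a free list block");
        int next = buf.getInt();                                // bytes 8-11
        int count = buf.getInt();                               // bytes 12-15
        if (count < 0 || count > CAPACITY)
            throw new IllegalArgumentException("bad free page count: " + count);
        List<Integer> pages = new ArrayList<Integer>(count);
        for (int i = 0; i < count; i++)                         // entries past 'count' are ignored
            pages.add(Integer.valueOf(buf.getInt()));
        return new FreeListBlock(next, pages);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(PAGE_SIZE);
        b.putLong(MAGIC).putInt(0).putInt(2).putInt(5).putInt(9).putInt(77); // 77 is a stale entry
        FreeListBlock blk = parse(b.array());
        System.out.println(CAPACITY + " " + blk.freePages); // prints "252 [5, 9]"
    }
}
```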
<p>
Free page block format:
</p>
<pre>
Byte       Contents
0-7        Magic number    0x7e2146524545217e "~!FREE!~"
8-1023     unused
</pre>
<p>
The metaindex (located at page 2) is a mapping of US-ASCII strings to 4-byte integers.
The key is the name of the skiplist and the value is the page index of the skiplist.
</p>
<h3>Blockfile Naming Service Tables</h3>
<p>
The tables created and used by the BlockfileNamingService are as follows.
The maximum number of entries per span is 16.
</p>
<h4>Properties Skiplist</h4>
<p>
"%%__INFO__%%" is the master database skiplist, with String/Properties key/value entries. It contains only one entry:
</p>
<pre>
"info": a Properties (UTF-8 String/String Map), serialized as a <a href="common_structures_spec#type_Mapping">Mapping</a>:
"version": "2"
"created": Java long time (ms)
"upgraded": Java long time (ms) (as of database version 2)
"lists": Comma-separated list of host databases, to be
searched in order for lookups. Almost always "privatehosts.txt,userhosts.txt,hosts.txt".
</pre>
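A minimal sketch of serializing the "info" Properties as a Mapping, assuming the layout from the common structures specification: a 2-byte size (the number of bytes that follow), then for each entry a key String, '=', a value String and ';', where each String is a 1-byte length followed by UTF-8 bytes. The sketch sorts keys with a TreeMap, as required for signed Mappings; whether BlockfileNamingService sorts them is not stated here. Storing "created" as a decimal string is also an assumption, and `MappingSketch` is a hypothetical name, not net.i2p.data.DataHelper.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of Mapping serialization. */
class MappingSketch {
    /** Writes an I2P String: 1 length byte (0-255), then that many UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length > 255)
            throw new IllegalArgumentException("string too long: " + b.length + " bytes");
        out.write(b.length);
        out.write(b, 0, b.length);
    }

    /** Serializes props as a Mapping, with keys in sorted order. */
    static byte[] toMapping(Map<String, String> props) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : new TreeMap<String, String>(props).entrySet()) {
            writeString(body, e.getKey());
            body.write('=');
            writeString(body, e.getValue());
            body.write(';');
        }
        int size = body.size();
        if (size > 0xffff)
            throw new IllegalArgumentException("mapping too large: " + size + " bytes");
        byte[] rv = new byte[2 + size];
        rv[0] = (byte) (size >> 8);              // 2-byte size, big-endian
        rv[1] = (byte) size;
        System.arraycopy(body.toByteArray(), 0, rv, 2, size);
        return rv;
    }

    public static void main(String[] args) {
        Map<String, String> info = new TreeMap<String, String>();
        info.put("version", "2");
        info.put("created", Long.toString(System.currentTimeMillis()));
        info.put("lists", "privatehosts.txt,userhosts.txt,hosts.txt");
        byte[] value = toMapping(info);
        System.out.println("info Mapping is " + value.length + " bytes");
    }
}
```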
<h4>Reverse Lookup Skiplist</h4>
<p>
"%%__REVERSE__%%" is the reverse lookup skiplist with Integer/Properties key/value entries
(as of database version 2):
</p>
<pre>
The skiplist keys are 4-byte Integers, the first 4 bytes of the hash of the Destination.
The skiplist values are each a Properties (a UTF-8 String/String Map) serialized as a <a href="common_structures_spec#type_Mapping">Mapping</a>.
There may be multiple entries in the properties; each one is a reverse mapping,
as there may be more than one hostname for a given destination,
or there could be collisions with the same first 4 bytes of the hash.
Each property key is a hostname.
Each property value is the empty string.
</pre>
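The reverse table can be sketched as follows. This assumes the Destination hash is the SHA-256 digest of the serialized Destination and that its first 4 bytes are read as a big-endian integer; `ReverseLookupSketch` is hypothetical, and an in-memory map stands in for the skiplist. Two hostnames for the same Destination end up in one Properties under the same Integer key, each with an empty value.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the reverse lookup table; not the BlockfileNamingService code. */
class ReverseLookupSketch {
    /** Skiplist key: first 4 bytes of the SHA-256 Destination hash, read big-endian (assumed). */
    static Integer reverseKey(byte[] destinationBytes) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(destinationBytes);
            return Integer.valueOf(ByteBuffer.wrap(hash, 0, 4).getInt());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Records hostname under key; each property value is the empty string. */
    static void addReverse(Map<Integer, Map<String, String>> table, Integer key, String hostname) {
        Map<String, String> props = table.get(key);
        if (props == null) {
            props = new TreeMap<String, String>();
            table.put(key, props);
        }
        props.put(hostname, "");
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> table = new HashMap<Integer, Map<String, String>>();
        byte[] dest = new byte[387];                  // placeholder bytes, not a real Destination
        Integer key = reverseKey(dest);
        addReverse(table, key, "example.i2p");
        addReverse(table, key, "alias.i2p");          // a second hostname for the same Destination
        System.out.println(table.get(key).keySet()); // prints "[alias.i2p, example.i2p]"
    }
}
```

Because only 4 hash bytes are kept, a lookup must still verify each candidate hostname against the full Destination.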
<h4>hosts.txt, userhosts.txt, and privatehosts.txt Skiplists</h4>
<p>
For each host database, there is a skiplist containing
the hosts for that database.
The keys/values in these skiplists are as follows:
</p>
<pre>
key: a UTF-8 String (the hostname)
value: a DestEntry, which is a Properties (a UTF-8 String/String Map) serialized as a <a href="common_structures_spec#type_Mapping">Mapping</a>
followed by a binary Destination (serialized <a href="common_structures_spec#struct_Destination">as usual</a>).
</pre>
<p>
The DestEntry Properties typically contains:
</p>
<pre>
"a": The time added (Java long time in ms)
"s": The original source of the entry (typically a file name or subscription URL)
others: TBD
</pre>
<p>
Hostname keys are stored in lower-case and always end in ".i2p".
</p>
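Normalizing a hostname into a skiplist key might look like this hypothetical helper. It uses `Locale.US` so that `toLowerCase` maps "I" to "i" regardless of the JVM's default locale (a Turkish default locale would otherwise produce a dotless "ı"); rejecting names that do not end in ".i2p" is this sketch's own choice.

```java
import java.util.Locale;

/** Hypothetical helper; not the BlockfileNamingService code. */
class HostnameKeySketch {
    static String toKey(String hostname) {
        // Locale.US keeps 'I' -> 'i' even under a Turkish default locale.
        String key = hostname.toLowerCase(Locale.US);
        if (!key.endsWith(".i2p") || key.length() <= ".i2p".length())
            throw new IllegalArgumentException("not an .i2p hostname: " + hostname);
        return key;
    }

    public static void main(String[] args) {
        System.out.println(toKey("Forum.I2P")); // prints "forum.i2p"
    }
}
```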
{% endblock %}

@@ -39,7 +39,7 @@ Updated January 2012, current as of router version 0.8.12
</p>
<h4>Contents</h4>
<p>
-1 or more bytes where the first byte is the number of bytes(not characters!) in the string and the remaining 0-255 bytes are the non-null terminated UTF-8 encoded character array
+1 or more bytes where the first byte is the number of bytes (not characters!) in the string and the remaining 0-255 bytes are the non-null terminated UTF-8 encoded character array
</p>
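The bytes-versus-characters distinction matters for any non-ASCII text. In this hypothetical Java sketch, "café" is 4 characters but 5 UTF-8 bytes, so its length byte is 5, and a string of 128 "é" characters is rejected because it needs 256 bytes. `I2PStringSketch` is not the DataHelper API.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the I2P String encoding. */
class I2PStringSketch {
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 255)
            throw new IllegalArgumentException(utf8.length + " bytes is too long");
        byte[] rv = new byte[1 + utf8.length];
        rv[0] = (byte) utf8.length;              // length in bytes, not characters
        System.arraycopy(utf8, 0, rv, 1, utf8.length);
        return rv;                               // no null terminator
    }

    static String decode(byte[] data, int offset) {
        int len = data[offset] & 0xff;           // 0-255
        return new String(data, offset + 1, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";                  // 4 characters, 5 UTF-8 bytes
        byte[] enc = encode(s);
        System.out.println(s.length() + " chars, length byte " + enc[0]); // prints "4 chars, length byte 5"
    }
}
```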
<h2 id="type_Boolean">Boolean</h2>
@@ -256,6 +256,9 @@ Some documentation says that the strings may not include '=' or ';' but this enc
Strings are defined to be UTF-8, but in the current implementation I2CP uses UTF-8 while I2NP does not.
For example,
UTF-8 strings in a RouterInfo options mapping in an I2NP Database Store Message will be corrupted.
<li>
Mappings contained in I2NP messages (i.e. in a RouterAddress or RouterInfo)
must be sorted by key so that the signature will be invariant.
</ul>
<h4><a href="http://docs.i2p-projekt.de/javadoc/net/i2p/data/DataHelper.html">Javadoc</a></h4>

@@ -0,0 +1,111 @@
{% extends "_layout.html" %}
{% block title %}Developer Guidelines and Coding Style{% endblock %}
{% block content %}
<p>
Read the <a href="newdevelopers.html">new developers guide</a> first.
</p>
<h2>Basic Guidelines and Coding Style</h2>
<p>
Most of the following should be common sense for anybody who has worked on open source or in a commercial
programming environment.
The following applies mostly to the main development branch i2p.i2p.
Guidelines for other branches, plugins, and external apps may be substantially different;
check with the appropriate developer for guidance.
</p>
<ul>
<li>
Please don't just "write code". If you can, participate in other development activities, including:
development discussions and support on IRC, zzz.i2p, and forum.i2p; testing;
bug reporting and responses; documentation; code reviews; etc.
</li><li>
The coding style throughout most of the code uses 4 spaces for indentation. Do not use tabs.
Do not reformat code. If your IDE or editor wants to reformat everything, get control of it.
Yes, we know 4 spaces is a pain, but perhaps you can configure your editor appropriately.
In some places, the coding style is different.
Use common sense. Emulate the style in the file you are modifying.
</li><li>
Active devs should be available periodically on IRC #i2p-dev.
Be aware of the current release cycle.
Adhere to release milestones such as feature freeze, tag freeze, and
the checkin deadline for a release.
Do not check in major changes into the main i2p.i2p branch late in the release cycle.
If a project will take you more than a couple days, create your own branch in monotone
and do the development there so you do not block releases.
</li><li>
Have a basic understanding of distributed source control systems, even if you haven't
used monotone before. Ask for help if you need it.
Once pushed, checkins are forever; there is no undo. Please be careful.
If you have not used monotone before, start with baby steps.
Check in some small changes and see how it goes.
</li><li>
Test your changes before checking them in.
If you prefer the checkin-before-test development model,
use your own development branch (e.g. i2p.i2p.yourname.test)
and propagate back to i2p.i2p once it is working well.
Do not break the build. Do not cause regressions.
If you do (it happens), please do not disappear for a long period after
you push your change.
</li><li>
If your change is non-trivial, or you want people to test it and need good test reports
to know whether your change was tested or not, add a checkin comment to history.txt
and increment the build revision in RouterVersion.java.
</li><li>
Ensure that you have the latest monotonerc file in _MTN.
Do not check in on top of untrusted revisions.
</li><li>
Ensure that you pull the latest revision before you check in.
If you inadvertently diverge, merge and push as soon as possible.
Don't routinely make others merge for you.
Yes, we know that monotone says you should push and then merge,
but in our experience, in-workspace merge works just as well as in-database merge,
without creating a merge revision.
</li><li>
Only check in code that you wrote yourself.
Before checking in any code or library jars from other sources,
justify why it is necessary,
verify the license is compatible,
and obtain approval from the lead developer.
</li><li>
For any images checked in from external sources,
it is your responsibility to first verify the license is compatible.
Include the license and source information in the checkin comment.
</li><li>
New classes and methods require at least brief javadocs. Add @since release-number.
</li><li>
Classes in core/ (i2p.jar) and portions of i2ptunnel are part of our official API.
There are several out-of-tree plugins and other applications that rely on this API.
Be careful not to make any changes that break compatibility.
Don't add methods to the API unless they are of general utility.
Javadocs for API methods should be clear and complete.
If you add or change the API, also update the documentation on the website (i2p.www branch).
</li><li>
Tag strings for translation where appropriate.
Don't change existing tagged strings unless really necessary, as it will break existing translations.
Do not add or change tagged strings after the "tag freeze" in the release cycle so that
translators have a chance to update before the release.
</li><li>
Use generics and concurrent classes where possible. I2P is a highly multi-threaded application.
</li><li>
We require Java 6 to build but only Java 5 to run I2P.
Do not use Java 6 classes or methods without handling the class not found exceptions
and providing alternate Java 5 code. See classes in net.i2p.util for examples.
</li><li>
Explicitly convert between primitive types and classes;
don't rely on autoboxing/unboxing.
</li><li>
Managing Trac tickets is everybody's job; please help.
Monitor trac.i2p2.i2p for tickets you have been assigned or can help with.
Assign, categorize, comment on, fix, or close tickets if you can.
</li><li>
Close a ticket when you think you've fixed it.
We don't have a test department to verify and close tickets.
If you aren't sure you fixed it, close it and add a note saying
"I think I fixed it, please test and reopen if it's still broken".
Add a comment with the dev build number or revision and set
the milestone to the next release.
</li>
</ul>
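A hypothetical Java sketch of a small new class that follows several of the guidelines above: javadocs with `@since` (the release number shown is made up), generics with concurrent classes, explicit boxing with `Integer.valueOf()` and `intValue()`, and a Java 5 fallback when a Java 6 class is missing. The reflection-based fallback is just one possible pattern; the classes in net.i2p.util that the guidelines point to may do it differently.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical example class illustrating the guidelines.
 *
 * @since 0.8.13 (hypothetical release number)
 */
class DevGuidelinesSketch {

    /**
     * Thread-safe per-hostname counter, using generics and concurrent classes.
     *
     * @since 0.8.13 (hypothetical)
     */
    static final class LookupCounter {
        private final ConcurrentHashMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<String, AtomicInteger>();

        /** Increments the count for host; safe to call from many threads. */
        public void increment(String host) {
            AtomicInteger c = counts.get(host);
            if (c == null) {
                AtomicInteger fresh = new AtomicInteger();
                c = counts.putIfAbsent(host, fresh);   // another thread may have won the race
                if (c == null)
                    c = fresh;
            }
            c.incrementAndGet();
        }

        /** @return the current count, 0 if never seen */
        public int get(String host) {
            AtomicInteger c = counts.get(host);
            return c == null ? 0 : c.get();
        }

        /** @return a snapshot; boxes explicitly with Integer.valueOf() rather than autoboxing */
        public Map<String, Integer> snapshot() {
            Map<String, Integer> rv = new HashMap<String, Integer>();
            for (Map.Entry<String, AtomicInteger> e : counts.entrySet())
                rv.put(e.getKey(), Integer.valueOf(e.getValue().get()));
            return rv;
        }
    }

    /**
     * Uses the Java 6 ArrayDeque when present, else the Java 5 LinkedList,
     * without a compile-time reference to the Java 6 class.
     */
    static <E> Queue<E> newQueue() {
        try {
            Class<?> cls = Class.forName("java.util.ArrayDeque");
            @SuppressWarnings("unchecked")
            Queue<E> q = (Queue<E>) cls.getDeclaredConstructor().newInstance();
            return q;
        } catch (Exception e) {
            // ClassNotFoundException on Java 5, or any other reflection failure
            return new LinkedList<E>();
        }
    }

    public static void main(String[] args) {
        LookupCounter lc = new LookupCounter();
        lc.increment("forum.i2p");
        lc.increment("forum.i2p");
        int n = lc.snapshot().get("forum.i2p").intValue();   // explicit unboxing
        Queue<String> q = newQueue();
        q.add("ok");
        System.out.println(n + " " + q.poll()); // prints "2 ok"
    }
}
```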
{% endblock %}

@@ -5,7 +5,7 @@
<p>
Following is an index to the technical documentation for I2P.
-This page was last updated in January 2012 and is accurate for router version 0.8.11.
+This page was last updated in January 2012 and is accurate for router version 0.8.12.
</p><p>
This index is ordered from the highest to lowest layers.
The higher layers are for "clients" or applications;
@@ -38,6 +38,7 @@ If you find any inaccuracies in the documents linked below, please
<li><a href="updates.html">Router software updates</a></li>
<li><a href="bittorrent.html">Bittorrent over I2P</a></li>
<li><a href="i2pcontrol.html">I2PControl Plugin API</a></li>
<li><a href="blockfile.html">hostsdb.blockfile Format</a></li>
</ul>
<h3>Application Layer API and Protocols</h3>
@@ -181,6 +182,8 @@ Time synchronization and NTP
</li><li>
<a href="monotone.html">Monotone Guide</a>
</li><li>
<a href="dev-guidelines.html">Developer Guidelines</a>
</li><li>
<a href="http://docs.i2p-projekt.de/javadoc/">Javadocs</a> (standard internet)
Note: always verify that javadocs are current by checking the release number.
</li><li>

@@ -399,7 +399,7 @@ checking.
<p>
See <a href="udp.html#keys">the SSU specification</a> for details.
<p>
-WARNING - I2P's HMAC-HD5-128 used in SSU is apparently non-standard.
+WARNING - I2P's HMAC-MD5-128 used in SSU is apparently non-standard.
Apparently, an early version of SSU used HMAC-SHA256, and then it was switched
to MD5-128 for performance reasons, but left the 32-byte buffer size intact.
See HMACGenerator.java and

@@ -152,6 +152,7 @@
<h2 id="get-to-know-us">Get to know us!</h2>
<p>
The developers hang around on IRC. They can be reached on the Freenode network, and on the I2P internal networks. The usual place to look is #i2p. Join the channel and say hi!
We also have <a href="dev-guidelines.html">additional guidelines for regular developers</a>.
</p>
<h2 id="translations">Translations</h2>