
Proposal for a Host-Aware HTTP Proxy Tunnel Type
------------------------------------------------

This is a proposal to resolve the “Shared Identity Problem” in
conventional HTTP-over-I2P usage by introducing a new HTTP proxy tunnel
type. This tunnel type has supplemental behavior intended to prevent, or
limit the utility of, tracking conducted by server operators against
user-agents (browsers) and the I2P Client Application itself.

What is the “Shared Identity” problem?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The “Shared Identity” problem occurs when a user-agent on a
cryptographically addressed overlay network shares a cryptographic
identity with another user-agent. This occurs, for instance, when
Firefox and GNU Wget are both configured to use the same HTTP Proxy. In
this scenario, the server can collect and store the cryptographic
address (Destination) used to reply to the activity. It can treat this
as a “Fingerprint” which is always 100% unique, because it is
cryptographic in origin. This means that the linkability produced by the
Shared Identity problem is perfect.

But is it a problem?
^^^^^^^^^^^^^^^^^^^^

The shared identity problem is a problem when user-agents that speak the
same protocol desire unlinkability. `It was first mentioned in the
context of HTTP in this Reddit
Thread <https://old.reddit.com/r/i2p/comments/579idi/warning_i2p_is_linkablefingerprintable/>`__,
with the deleted comments accessible courtesy of
`pullpush.io <https://api.pullpush.io/reddit/search/comment/?link_id=579idi>`__.
*At the time* I was one of the most active respondents, and *at the
time* I believed the issue was small. In the past 8 years, the situation
and my opinion of it have changed. With the emergence of Mastodon and
Matrix servers inside of I2P, the threat posed by malicious destination
correlation has grown considerably, as these sites are in a position to
“profile” specific users. `An example implementation of the Shared
Identity attack on HTTP
User-Agents <https://github.com/eyedeekay/colluding_sites_attack/>`__
is available.

The Shared Identity problem is not useful against a user who is using
I2P to obfuscate geolocation. It also cannot be used to break I2P’s
routing.

-  It is impossible to use the Shared Identity problem to geolocate an
   I2P user.
-  It is impossible to use the Shared Identity problem to link I2P
   sessions if they are not contemporaneous.

However, it is possible to use it to degrade the anonymity of an I2P
user in circumstances which are probably very common. One reason they
are common is because we encourage the use of Firefox, a web browser
which supports “Tabbed” operation.

-  It is *always* possible to produce a fingerprint from the Shared
   Identity problem in *any* web browser which supports requesting
   third-party resources.
-  Disabling Javascript accomplishes **nothing** against the Shared
   Identity problem.

How you view the severity of the Shared Identity problem as it applies
to the I2P HTTP proxy depends on where you (or more to the point, a
“user” with potentially uninformed expectations) think the “contextual
identity” for the application lies. There are several possibilities:

1. HTTP is both the Application and the Contextual Identity - This is
   how it works now. All HTTP Applications share an identity.
2. The Process is the Application and the Contextual Identity - This is
   how it works when an application uses an API like SAMv3 or I2CP,
   where an application creates its identity and controls its
   lifetime.
3. HTTP is the Application, but the Contextual Identity is controlled
   with the “Authentication Hack” - An interesting possibility detailed
   at the end of this proposal, but not the object of this proposal.
4. HTTP is the Application, but the Host is the Contextual Identity -
   This is the object of this proposal, which treats each Host as a
   potential “Web Application” and treats the threat surface as such.

It also depends on who you think your attackers are and what you would
like to prevent. Someone in a position to carry out this attack would be
a person in a position to have multiple sites “collude” in order to
collect the destinations of I2P Clients, in order to correlate activity
on one site with activity on another. This is a fairly basic form of
profile-building on the clear web, where organizations can correlate
interactions on their site with interactions on networks they control.
On I2P, because the cryptographic destination is unique, this technique
can sometimes be even more reliable, albeit without the additional power
of geolocation. Any service which hosts user accounts would be able to
correlate them with activity across any sites they control using the
Shared Identity problem. Mastodon, Gitlab, or even simple Forums could
be attackers in disguise as long as they operate more than one service
and have an interest in creating a profile for a user. This surveillance
could be conducted for stalking, financial gain, or intelligence-related
reasons.
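
As a rough illustration of the correlation described above, the sketch
below joins the access logs of two colluding sites on the client
Destination. The site names, Destination labels, account names, and the
``correlate`` helper are all hypothetical and exist only for this
example; it is not taken from the linked attack implementation.

.. code:: python

   from collections import defaultdict

   def correlate(logs_by_site):
       """Group (site, account) pairs by the client Destination they used."""
       seen = defaultdict(set)
       for site, entries in logs_by_site.items():
           for destination, account in entries:
               seen[destination].add((site, account))
       # Only Destinations observed on more than one site link identities.
       return {
           dest: sorted(pairs)
           for dest, pairs in seen.items()
           if len({site for site, _ in pairs}) > 1
       }

   # Hypothetical logs kept by two services run by the same operator.
   logs = {
       "forum.example.i2p": [("destAAA", "alice"), ("destBBB", "bob")],
       "git.example.i2p": [("destAAA", "anon1337")],
   }
   linked = correlate(logs)

With a single shared Destination, ``alice`` and ``anon1337`` end up in
the same entry. If each host saw a different Destination, the join would
produce nothing.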

Is it Solvable?
^^^^^^^^^^^^^^^

It is probably not possible to make a proxy which intelligently responds
to every possible case in which its operation could weaken the
anonymity of an application. However, it is possible to build a proxy
which intelligently responds to a specific application which behaves in
a predictable way. For instance, in modern Web Browsers, it is expected
that users will have multiple tabs open, where they will be interacting
with multiple web sites, which will be distinguished by hostname. This
allows us to improve upon the behavior of the HTTP Proxy for this type
of HTTP user-agent: the proxy can match the behavior of the user-agent
by giving each host its own Destination. This change makes it
impossible to use the Shared Identity problem to derive a fingerprint
which can be used to correlate client activity across 2 hosts, because
the 2 hosts will simply no longer share a return identity.

Description:
^^^^^^^^^^^^

A new HTTP Proxy will be created and added to Hidden Services
Manager (I2PTunnel). The new HTTP Proxy will operate as a “multiplexer”
of HTTP Proxies. The multiplexer itself has no destination. Each
individual HTTP Proxy which becomes part of the multiplex has its own
local destination, random local port, and its own tunnel pool. HTTP
proxies are created on-demand by the multiplexer, where the “demand” is
the first visit to the new host. It is possible to optimize the creation
of the HTTP proxies before inserting them into the multiplexer by
creating one or more in advance and storing them outside the
multiplexer.
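
The on-demand behavior can be modeled in a few lines. The following is
a minimal sketch, not the Java implementation: the class name
``HostAwareMultiplexer``, the ``spares`` parameter, and the ``dest-N``
labels are assumptions made for illustration, and a dictionary stands in
for a real child proxy with its own Destination, port, and tunnel pool.

.. code:: python

   import itertools

   class HostAwareMultiplexer:
       """Hypothetical model: maps each requested host to its own child proxy."""

       def __init__(self, spares=1):
           self._ids = itertools.count(1)
           self._by_host = {}
           self._spare_target = spares
           # Child proxies built in advance and held outside the multiplex.
           self._spares = [self._build() for _ in range(spares)]

       def _build(self):
           # Stand-in for creating a real proxy with a fresh Destination,
           # a random local port, and its own tunnel pool.
           return {"destination": f"dest-{next(self._ids)}"}

       def proxy_for(self, host):
           host = host.lower()
           if host not in self._by_host:
               # First visit: use a pre-built proxy if one is waiting.
               child = self._spares.pop() if self._spares else self._build()
               self._by_host[host] = child
               # Refill the spare pool (a real proxy would do this in
               # the background rather than inline).
               if len(self._spares) < self._spare_target:
                   self._spares.append(self._build())
           return self._by_host[host]

   mux = HostAwareMultiplexer()
   first = mux.proxy_for("idk.i2p")
   second = mux.proxy_for("git.idk.i2p")

Here ``first`` and ``second`` carry different Destinations, while a
repeat visit to ``idk.i2p`` returns the same child proxy as before.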

An additional HTTP proxy, with its own destination, is set up as the
carrier of an “Outproxy” for any site which does *not* have an I2P
Destination, for example any Clearnet site. This effectively makes all
Outproxy usage a single Contextual Identity, with the caveat that
configuring multiple Outproxies for the tunnel will cause the normal
"Sticky" outproxy rotation, where each outproxy only gets requests for a
single site. This is *almost* equivalent to isolating HTTP-over-I2P
proxies by destination on the clear internet.
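
One way to picture the routing decision is shown below. This is a
sketch under stated assumptions: ``identity_key``, ``sticky_outproxy``,
and ``OUTPROXY_KEY`` are hypothetical names, hosts are classified only
by their ``.i2p`` suffix, and the hash-based selection merely stands in
for I2P's existing sticky outproxy logic, which may work differently.

.. code:: python

   import hashlib

   OUTPROXY_KEY = "__outproxy__"  # hypothetical key for the shared identity

   def identity_key(host):
       """Return the contextual identity a request for `host` belongs to."""
       host = host.lower().rstrip(".")
       # I2P hosts each get their own identity; everything else shares one.
       return host if host.endswith(".i2p") else OUTPROXY_KEY

   def sticky_outproxy(host, outproxies):
       """Pick the same outproxy every time for a given clearnet host.

       Assumption: a stable hash of the hostname, for illustration only.
       """
       digest = hashlib.sha256(host.lower().encode()).digest()
       return outproxies[int.from_bytes(digest[:4], "big") % len(outproxies)]

Under this model ``idk.i2p`` and ``git.idk.i2p`` map to distinct keys,
while ``i2pgit.org`` and any other Clearnet host map to the single
outproxy identity.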

Resource Considerations:
''''''''''''''''''''''''

The new HTTP proxy requires additional resources compared to the
existing HTTP proxy. It will:

-  Potentially build more tunnels
-  Build tunnels more often
-  Occupy more ports

Each of these requires:

-  Local computing resources
-  Network resources from peers

Settings:
'''''''''

In order to minimize the impact of the increased resource usage, the
proxy should be configured to use as little as possible. Proxies which
are part of the multiplexer (not the parent proxy) should be configured
to:

-  Build 1 tunnel in and 1 tunnel out in their tunnel pools
-  Take 3 hops by default
-  Close tunnels after 10 minutes of inactivity
-  Share the lifespan of the Multiplexer: I2PTunnels started by the
   Multiplexer are not “Destructed” until the parent Multiplexer is.
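
Expressed as tunnel options, the first three settings might look like
the sketch below. The option names follow standard I2CP client option
naming as I understand it; whether the implementation sets exactly
these keys is an assumption, and ``MULTIPLEXED_TUNNEL_OPTIONS`` and
``to_properties`` are hypothetical names used only here.

.. code:: python

   # Assumed I2CP-style options for each multiplexed child proxy.
   MULTIPLEXED_TUNNEL_OPTIONS = {
       "inbound.quantity": "1",    # 1 tunnel in
       "outbound.quantity": "1",   # 1 tunnel out
       "inbound.length": "3",      # 3 hops
       "outbound.length": "3",
       "i2cp.closeOnIdle": "true",
       "i2cp.closeIdleTime": str(10 * 60 * 1000),  # 10 minutes, in ms
   }

   def to_properties(options):
       """Render the options as key=value lines, sorted by key."""
       return "\n".join(f"{key}={value}" for key, value in sorted(options.items()))

The lifespan rule in the last bullet is a property of the multiplexer
itself rather than a tunnel option, so it does not appear here.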

Diagrams:
^^^^^^^^^

The diagram below represents the current operation of the HTTP proxy,
which corresponds to “Possibility 1.” under the “But is it a problem?”
section. As you can see, the HTTP proxy interacts with I2P sites
directly using only one destination. In this scenario, HTTP is both the
application and the contextual identity.

.. code:: md

   **Current Situation: HTTP is the Application, HTTP is the Contextual Identity**
                                             __-> Outproxy <-> i2pgit.org
                                            /
   Browser <-> HTTP Proxy(one Destination) <---> idk.i2p
                                            \__-> translate.idk.i2p
                                             \__-> git.idk.i2p

The diagram below represents the operation of a host-aware HTTP proxy,
which corresponds to “Possibility 4.” under the “But is it a problem?”
section. In this scenario, HTTP is the application, but the Host
defines the contextual identity: each I2P site interacts with a
different HTTP proxy, which has a unique destination per host. This
prevents operators of multiple sites from being able to distinguish when
the same person is visiting multiple sites which they operate.

.. code:: md

   **After the Change: HTTP is the Application, Host is the Contextual Identity**
                                                        __-> HTTP Proxy(Destination A - Outproxies Only) <--> i2pgit.org
                                                       /
   Browser <-> HTTP Proxy Multiplexer(No Destination) <---> HTTP Proxy(Destination B) <--> idk.i2p
                                                       \__-> HTTP Proxy(Destination C) <--> translate.idk.i2p
                                                        \__-> HTTP Proxy(Destination D) <--> git.idk.i2p

Status:
^^^^^^^

A working Java implementation of the host-aware proxy which conforms to
this proposal is available in idk's fork, under the branch
``i2p.i2p.2.6.0-browser-proxy-post-keepalive`` (link in the citations).
Implementations with varying capabilities have also been written in Go
using the SAMv3 library. They may be useful for embedding in other Go
applications or for go-i2p, but are unsuitable for Java I2P.
Additionally, they lack good support for working interactively with
encrypted leaseSets.

Addendum: SOCKS
^^^^^^^^^^^^^^^

A similar shared identity problem exists in the SOCKS proxy as well.
There, however, it is harder to solve, in part due to the reasons
described on the “SOCKS Tips” page on the I2P site. In particular, it
requires much more effort to determine internal destinations and
outgoing hostnames. However, there is an approach which works well, and
which has the additional value that it can be implemented in an HTTP
proxy as well. This could allow an HTTP Proxy and a SOCKS proxy to work
in unison, providing clients with the same identity on a per-host basis.
This in turn could allow for efficient, unlinkable WebRTC inside of I2P.

The drawback, however, is that it requires some basic cooperation on the
part of the client. In lieu of isolating by-host, the client should send
an “Isolation String” as if it were a part of the username and password
sent to the SOCKS proxy server. For instance, if the SOCKS proxy
required username and password, then the isolation string would be
appended after the password as a third component. The username and
password would be authenticated first, and upon success, the isolation
string would be used to add a SOCKS proxy to the multiplex. If the SOCKS
proxy server required no username and password, *any* string would be a
valid “Isolation String.”
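
The authenticate-then-isolate order described above can be sketched as
follows. The sketch deliberately leaves out how the isolation string is
carried on the wire, since this proposal does not specify it. The
``SocksMultiplexer`` class, its ``proxy_for`` method, and the ``dest-N``
labels are hypothetical.

.. code:: python

   class SocksMultiplexer:
       """Hypothetical model of per-isolation-string SOCKS proxies."""

       def __init__(self, credentials=None):
           # credentials: dict of username -> password, or None when the
           # proxy requires no authentication.
           self._credentials = credentials
           self._by_isolation = {}
           self._count = 0

       def proxy_for(self, isolation, username=None, password=None):
           # Authenticate first; the isolation string is only used on success.
           if self._credentials is not None:
               if self._credentials.get(username, object()) != password:
                   raise PermissionError("bad username or password")
           if isolation not in self._by_isolation:
               # First use of this isolation string: add a proxy to the multiplex.
               self._count += 1
               self._by_isolation[isolation] = f"dest-{self._count}"
           return self._by_isolation[isolation]

When ``credentials`` is ``None``, any string is accepted as an isolation
string, which mirrors the no-authentication case in the text.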

This could allow for better and more sophisticated isolation in some
circumstances, because the isolation string need not consist of only a
hostname or destination. A wrapper modeled on ``torsocks``, here called
``i2psocks``, could be created to pass this isolation string to the
SOCKS proxy it uses. It would be aware of its own arguments, giving it
the ability to generate the isolation string on the fly based on the
input. ``i2psocks curl http://idk.i2p`` could produce an isolation
string like ``curlhttpidk``, giving it a destination which exists only
for the time it takes to run the application. ``curl`` is merely an
example; this approach would work for applications with longer
lifetimes too.
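
A wrapper could derive the string from its own arguments roughly as
sketched below. The derivation rule is an assumption chosen to
reproduce the ``curlhttpidk`` example: URL arguments contribute their
scheme and their hostname without the ``.i2p`` suffix, and all other
characters outside ``a-z0-9`` are dropped. ``isolation_string`` is a
hypothetical helper, not part of any existing tool.

.. code:: python

   import re
   from urllib.parse import urlsplit

   def isolation_string(argv):
       """Build an isolation string from a wrapped command line."""
       parts = []
       for arg in argv:
           url = urlsplit(arg)
           if url.scheme and url.hostname:
               host = url.hostname
               if host.endswith(".i2p"):
                   host = host[: -len(".i2p")]
               parts.extend([url.scheme, host])
           else:
               # Not a URL: keep the argument (e.g. the command name) as-is.
               parts.append(arg)
       # Keep only lowercase letters and digits.
       return re.sub(r"[^a-z0-9]", "", "".join(parts).lower())

   example = isolation_string(["curl", "http://idk.i2p"])

Here ``example`` is ``curlhttpidk``. A real wrapper might prefer to
hash the arguments or add a per-run random component, depending on how
long the identity is meant to live.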

.. code:: md

   **Hypothetical Future: SOCKS is the Application, Contextual Identity is decided by the app or perhaps a wrapper**
                                                                              __-> SOCKS Proxy(Isolation String firefoxi2pgitorg) <--> i2pgit.org
                                                                             /
   Browser <-> SOCKS Proxy Multiplexer(No Destination, No Isolation String) <---> SOCKS Proxy(Isolation String curlidk) <--> idk.i2p
                                                                             \__-> SOCKS Proxy(Isolation String firefoxtranslateidk) <--> translate.idk.i2p
                                                                              \__-> SOCKS Proxy(Isolation String firefoxgitidk) <--> git.idk.i2p

Citations:
''''''''''

-  https://old.reddit.com/r/i2p/comments/579idi/warning_i2p_is_linkablefingerprintable/
-  https://api.pullpush.io/reddit/search/comment/?link_id=579idi
-  https://github.com/eyedeekay/colluding_sites_attack/
-  https://en.wikipedia.org/wiki/Shadow_profile
-  https://github.com/eyedeekay/si-i2p-plugin/
-  https://github.com/eyedeekay/eeproxy/
-  https://geti2p.net/en/docs/api/socks
-  https://i2pgit.org/idk/i2p.www/-/compare/master...166-identity-aware-proxies?from_project_id=17
-  https://i2pgit.org/idk/i2p.i2p/-/tree/i2p.i2p.2.6.0-browser-proxy-post-keepalive?ref_type=heads