Proposal for a Host-Aware HTTP Proxy Tunnel Type
------------------------------------------------

This is a proposal to resolve the “Shared Identity Problem” in
conventional HTTP-over-I2P usage by introducing a new HTTP proxy tunnel
type. This tunnel type has supplemental behavior intended to prevent, or
limit the utility of, tracking conducted by server operators against
user-agents (browsers) and the I2P Client Application itself.

What is the “Shared Identity” problem?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The “Shared Identity” problem occurs when a user-agent on a
cryptographically addressed overlay network shares a cryptographic
identity with another user-agent. This occurs, for instance, when
Firefox and GNU Wget are both configured to use the same HTTP Proxy. In
this scenario, it is possible for the server to collect and store the
cryptographic address (Destination) used to reply to the activity. It
can treat this as a “Fingerprint” which is always unique, because it is
cryptographic in origin. This means that the linkability created by the
Shared Identity problem is perfect.

But is it a problem?
^^^^^^^^^^^^^^^^^^^^

The shared identity problem is a problem when user-agents that speak the
same protocol desire unlinkability. `It was first mentioned in the
context of HTTP in this Reddit
Thread <https://old.reddit.com/r/i2p/comments/579idi/warning_i2p_is_linkablefingerprintable/>`__,
with the deleted comments accessible courtesy of
`pullpush.io <https://api.pullpush.io/reddit/search/comment/?link_id=579idi>`__.
*At the time* I was one of the most active respondents, and *at the
time* I believed the issue was small. In the past 8 years, the situation
and my opinion of it have changed. With the emergence of Mastodon and
Matrix servers inside of I2P, the threat posed by malicious destination
correlation has grown considerably, as these sites are in a position to
“profile” specific users. `An example implementation of the Shared
Identity attack on HTTP
User-Agents <https://github.com/eyedeekay/colluding_sites_attack/>`__
is available.

The Shared Identity problem is not useful against a user who is using
I2P to obfuscate geolocation. It also cannot be used to break I2P’s
routing.

- It is impossible to use the Shared Identity problem to geolocate an
  I2P user.
- It is impossible to use the Shared Identity problem to link I2P
  sessions if they are not contemporaneous.

However, it is possible to use it to degrade the anonymity of an I2P
user in circumstances which are probably very common. One reason they
are common is because we encourage the use of Firefox, a web browser
which supports “Tabbed” operation.

- It is *always* possible to produce a fingerprint from the Shared
  Identity problem in *any* web browser which supports requesting
  third-party resources.
- Disabling Javascript accomplishes **nothing** against the Shared
  Identity problem.

How you view the severity of the Shared Identity problem as it applies
to the I2P HTTP proxy depends on where you (or more to the point, a
“user” with potentially uninformed expectations) think the “contextual
identity” for the application lies. There are several possibilities:

1. HTTP is both the Application and the Contextual Identity - This is
   how it works now. All HTTP Applications share an identity.
2. The Process is the Application and the Contextual Identity - This is
   how it works when an application uses an API like SAMv3 or I2CP,
   where an application creates its identity and controls its
   lifetime.
3. HTTP is the Application, but the Contextual Identity is controlled
   with the “Authentication Hack” - An interesting possibility detailed
   at the end of this proposal, but not the object of this proposal.
4. HTTP is the Application, but the Host is the Contextual Identity -
   This is the object of this proposal, which treats each Host as a
   potential “Web Application” and treats the threat surface as such.

It also depends on who you think your attackers are and what you would
like to prevent. An attacker able to carry out this attack is anyone in
a position to have multiple sites “collude” to collect the destinations
of I2P Clients, in order to correlate activity on one site with activity
on another. This is a fairly basic form of profile-building on the clear
web, where organizations can correlate interactions on their site with
interactions on networks they control. On I2P, because the cryptographic
destination is unique, this technique can sometimes be even more
reliable, albeit without the additional power of geolocation. Any
service which hosts user accounts would be able to correlate them with
activity across any sites they control using the Shared Identity
problem. Mastodon, Gitlab, or even simple Forums could be attackers in
disguise as long as they operate more than one service and have an
interest in creating a profile for a user. This surveillance could be
conducted for stalking, financial gain, or intelligence-related reasons.

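The correlation step itself is trivial, which is part of why the problem
matters. The following is a minimal sketch, not the linked attack
implementation: the log format, account names, and shortened Destination
labels are all hypothetical.

```python
# Hypothetical access logs kept by two sites run by the same operator.
# Each entry pairs a local account with the client Destination that the
# reply to its request was sent to.
forum_log = [
    ("alice_on_forum", "dest-AAAA"),
    ("bob_on_forum", "dest-BBBB"),
]
git_log = [
    ("anon-committer", "dest-AAAA"),
    ("carol_on_git", "dest-CCCC"),
]


def correlate(log_a, log_b):
    """Return (account_a, account_b) pairs that share a Destination."""
    by_destination = {}
    for account, destination in log_a:
        by_destination.setdefault(destination, []).append(account)
    return [
        (account_a, account_b)
        for account_b, destination in log_b
        for account_a in by_destination.get(destination, [])
    ]


linked = correlate(forum_log, git_log)
```

With a single shared HTTP proxy, ``linked`` pairs the forum account with
the otherwise anonymous git account. If each site saw a different
Destination, the same function would return an empty list.
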
Is it Solvable?
^^^^^^^^^^^^^^^

It is probably not possible to make a proxy which intelligently responds
to every possible case in which its operation could weaken the
anonymity of an application. However, it is possible to build a proxy
which intelligently responds to a specific application which behaves in
a predictable way. For instance, in modern Web Browsers, it is expected
that users will have multiple tabs open, where they will be interacting
with multiple web sites, which will be distinguished by hostname. This
allows us to improve upon the behavior of the HTTP Proxy for this type
of HTTP user-agent: the proxy can match the behavior of the user-agent
by giving each host its own Destination when used with the HTTP Proxy.
This change makes it impossible to use the Shared Identity problem to
derive a fingerprint which can be used to correlate client activity with
2 hosts, because the 2 hosts will simply no longer share a return
identity.

Description:
^^^^^^^^^^^^

A new HTTP Proxy will be created and added to Hidden Services
Manager (I2PTunnel). The new HTTP Proxy will operate as a “multiplexer”
of HTTP Proxies. The multiplexer itself has no destination. Each
individual HTTP Proxy which becomes part of the multiplex has its own
local destination, random local port, and its own tunnel pool. HTTP
proxies are created on-demand by the multiplexer, where the “demand” is
the first visit to the new host. It is possible to optimize the creation
of the HTTP proxies before inserting them into the multiplexer by
creating one or more in advance and storing them outside the
multiplexer.

An additional HTTP proxy, with its own destination, is set up as the
carrier of an “Outproxy” for any site which does *not* have an I2P
Destination, for example any Clearnet site. This effectively makes all
Outproxy usage a single Contextual Identity, with the caveat that
configuring multiple Outproxies for the tunnel will cause the normal
"Sticky" outproxy rotation, where each outproxy only gets requests for a
single site. This is *almost* equivalent to isolating HTTP-over-I2P
proxies by destination on the clear internet.

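The routing logic described above can be sketched as follows. This is an
illustration under stated assumptions, not the Java implementation: the
class and method names are hypothetical, a "Destination" is reduced to a
unique label, and no keys, ports, or tunnels are actually created.

```python
import itertools


class ChildProxy:
    """Stand-in for one HTTP proxy with its own Destination and tunnel pool."""

    _ids = itertools.count(1)

    def __init__(self):
        # A real child would generate keys, bind a random local port and
        # build tunnels; here the Destination is only a unique label.
        self.destination = f"destination-{next(self._ids)}"


class ProxyMultiplexer:
    """Routes each requested host to a child proxy; has no Destination itself."""

    def __init__(self, spare_target=1):
        self.children = {}                    # host -> ChildProxy
        self.outproxy_carrier = ChildProxy()  # shared by all non-I2P hosts
        self.spare_target = spare_target
        # Children created in advance and stored outside the multiplex.
        self.spares = [ChildProxy() for _ in range(spare_target)]

    def proxy_for(self, host):
        host = host.lower().rstrip(".")
        if not host.endswith(".i2p"):
            # Every clearnet host shares the single outproxy identity.
            return self.outproxy_carrier
        if host not in self.children:
            # First visit to this host: use a spare if one is ready,
            # otherwise create a child on demand.
            child = self.spares.pop() if self.spares else ChildProxy()
            self.children[host] = child
            if len(self.spares) < self.spare_target:
                self.spares.append(ChildProxy())
        return self.children[host]
```

Repeated calls to ``proxy_for("idk.i2p")`` return the same child, while
``proxy_for("git.idk.i2p")`` returns a child with a different
Destination, and any host that does not end in ``.i2p`` is handed to the
outproxy carrier.
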
Resource Considerations:
''''''''''''''''''''''''

The new HTTP proxy requires additional resources compared to the
existing HTTP proxy. It will:

- Potentially build more tunnels
- Build tunnels more often
- Occupy more ports

Each of these requires:

- Local computing resources
- Network resources from peers

Settings:
'''''''''

In order to minimize the impact of the increased resource usage, the
proxy should be configured to use as little as possible. Proxies which
are part of the multiplexer (not the parent proxy) should be configured
to:

- Build 1 inbound tunnel and 1 outbound tunnel in their tunnel pools
- Use 3 hops by default
- Close tunnels after 10 minutes of inactivity
- Share the lifespan of the Multiplexer. Multiplexed tunnels are not
  “Destructed” until the parent Multiplexer is.

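Expressed as I2CP-style session options, these settings might look like
the sketch below. The helper name is hypothetical, and mapping the
settings onto these particular option keys is an assumption; the
proposal itself does not name the options to use.

```python
def child_tunnel_options(idle_minutes=10, hops=3, quantity=1):
    """Build I2CP-style options for one multiplexed child proxy."""
    return {
        # 1 tunnel in, 1 tunnel out per child tunnel pool
        "inbound.quantity": str(quantity),
        "outbound.quantity": str(quantity),
        # 3 hops in each direction
        "inbound.length": str(hops),
        "outbound.length": str(hops),
        # close after the idle period, which is given in milliseconds
        "i2cp.closeOnIdle": "true",
        "i2cp.closeIdleTime": str(idle_minutes * 60 * 1000),
    }


options = child_tunnel_options()
```

The lifespan rule in the last bullet is not an option value. It would be
enforced by the multiplexer, which owns the child proxies and tears them
down when it is itself destroyed.
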
Diagrams:
^^^^^^^^^

The diagram below represents the current operation of the HTTP proxy,
which corresponds to “Possibility 1.” under the “But is it a problem?”
section. As you can see, the HTTP proxy interacts with I2P sites
directly using only one destination. In this scenario, HTTP is both the
application and the contextual identity.

.. code:: md

   **Current Situation: HTTP is the Application, HTTP is the Contextual Identity**
                                            __-> Outproxy <-> i2pgit.org
                                           /
   Browser <-> HTTP Proxy(one Destination) <---> idk.i2p
                                           \__-> translate.idk.i2p
                                            \__-> git.idk.i2p

The diagram below represents the operation of a host-aware HTTP proxy,
which corresponds to “Possibility 4.” under the “But is it a problem?”
section. In this scenario, HTTP is the application, but the Host
defines the contextual identity: each I2P site interacts with a
different HTTP proxy, with a unique destination per host. This prevents
operators of multiple sites from being able to distinguish when the same
person is visiting multiple sites which they operate.

.. code:: md

   **After the Change: HTTP is the Application, Host is the Contextual Identity**
                                                       __-> HTTP Proxy(Destination A - Outproxies Only) <--> i2pgit.org
                                                      /
   Browser <-> HTTP Proxy Multiplexer(No Destination) <---> HTTP Proxy(Destination B) <--> idk.i2p
                                                      \__-> HTTP Proxy(Destination C) <--> translate.idk.i2p
                                                       \__-> HTTP Proxy(Destination D) <--> git.idk.i2p

Status:
^^^^^^^

A working Java implementation of the host-aware proxy which conforms to
this proposal is available at idk's fork under the branch
``i2p.i2p.2.6.0-browser-proxy-post-keepalive`` (link in citations).
Implementations with varying capabilities have been written in Go using
the SAMv3 library. They may be useful for embedding in other Go
applications or for go-i2p, but are unsuitable for Java I2P.
Additionally, they lack good support for working interactively with
encrypted leaseSets.

Addendum: SOCKS
^^^^^^^^^^^^^^^

A similar shared identity problem exists in the SOCKS proxy as well.
There, however, it is harder to solve, in part for the reasons
described on the “SOCKS Tips” page on the I2P site. In particular, it
requires much more effort to determine internal destinations and
outgoing hostnames. However, there is an approach which works well, and
which has the additional value of being possible to implement as an HTTP
proxy as well. This could allow an HTTP Proxy and a SOCKS proxy to work
in unison, providing clients with the same identity on a per-host basis.
This in turn could allow for efficient, unlinkable WebRTC inside of I2P.

The drawback, however, is that it requires some basic cooperation on the
part of the client. In lieu of isolating by host, the client should send
an “Isolation String” as if it were a part of the username and password
sent to the SOCKS proxy server. For instance, if the SOCKS proxy
required a username and password, then the isolation string would be
appended after the password as a third component. The username and
password would be authenticated first, and upon success, the isolation
string would be used to add a SOCKS proxy to the multiplex. If the SOCKS
proxy server required no username and password, *any* string would be a
valid “Isolation String.”

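A minimal sketch of that server-side handling follows. The proposal does
not define how the third component is encoded, so the ``:`` separator
inside the password field, the function names, and the in-memory
``accounts`` mapping are all assumptions made for illustration.

```python
import hmac

SEPARATOR = ":"  # hypothetical delimiter between password and isolation string


def split_credentials(username, password_field):
    """Split the password field into (username, password, isolation).

    Splits on the first separator, so under this assumed encoding the
    password itself cannot contain the separator.
    """
    password, _, isolation = password_field.partition(SEPARATOR)
    return username, password, isolation


def select_isolation(username, password_field, accounts=None):
    """Authenticate first, then return the isolation string to key a proxy on.

    `accounts` maps usernames to passwords. When it is None the proxy
    requires no authentication, so everything the client sent is treated
    as the isolation string. Returns None if authentication fails.
    """
    if accounts is None:
        return f"{username}{SEPARATOR}{password_field}"
    username, password, isolation = split_credentials(username, password_field)
    expected = accounts.get(username)
    if expected is None:
        return None
    if not hmac.compare_digest(expected.encode(), password.encode()):
        return None
    return isolation
```

The returned string would then be used as the key into the multiplex, in
the same way that the hostname is used by the host-aware HTTP proxy.
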
This could allow for better and more sophisticated isolation in some
circumstances, because the isolation string need not consist of only a
hostname or destination. A wrapper in the style of ``torsocks``, here
called ``i2psocks``, could be created which would pass this isolation
string to the SOCKS proxy it would use. It would be aware of its own
arguments, giving it the ability to generate the isolation string on the
fly based on the input. ``i2psocks curl http://idk.i2p`` could produce
an authentication string like ``curlhttpidk``, giving it a destination
which exists only for the time it takes to run the application.
``curl`` is merely an example; this approach would work for applications
with longer lifetimes too.

.. code:: md

   **Hypothetical Future: SOCKS is the Application, Contextual Identity is decided by the app or perhaps a wrapper**
                                                                             __-> SOCKS Proxy(Isolation String firefoxi2pgitorg) <--> i2pgit.org
                                                                            /
   Browser <-> SOCKS Proxy Multiplexer(No Destination, No Isolation String) <---> SOCKS Proxy(Isolation String curlidk) <--> idk.i2p
                                                                            \__-> SOCKS Proxy(Isolation String firefoxtranslateidk) <--> translate.idk.i2p
                                                                             \__-> SOCKS Proxy(Isolation String firefoxgitidk) <--> git.idk.i2p

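One way a wrapper could derive the isolation string from its own
arguments is sketched below. The derivation rule is an assumption chosen
only so that it reproduces the ``curlhttpidk`` example from the text;
the proposal does not specify one, and ``i2psocks`` does not exist yet.

```python
import re


def isolation_string(argv):
    """Derive an isolation string from a wrapped command line."""
    joined = " ".join(argv)
    # Drop the .i2p suffix, then keep only letters and digits.
    without_suffix = re.sub(r"\.i2p\b", "", joined)
    return re.sub(r"[^A-Za-z0-9]", "", without_suffix)


example = isolation_string(["curl", "http://idk.i2p"])
```

Because the string is built from the whole command line, two different
commands, or the same command fetching two different hosts, would
normally be placed on separate destinations.
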
Citations:
''''''''''

- https://old.reddit.com/r/i2p/comments/579idi/warning_i2p_is_linkablefingerprintable/
- https://api.pullpush.io/reddit/search/comment/?link_id=579idi
- https://github.com/eyedeekay/colluding_sites_attack/
- https://en.wikipedia.org/wiki/Shadow_profile
- https://github.com/eyedeekay/si-i2p-plugin/
- https://github.com/eyedeekay/eeproxy/
- https://geti2p.net/en/docs/api/socks
- https://i2pgit.org/idk/i2p.www/-/compare/master...166-identity-aware-proxies?from_project_id=17
- https://i2pgit.org/idk/i2p.i2p/-/tree/i2p.i2p.2.6.0-browser-proxy-post-keepalive?ref_type=heads