OpenPGP Working Group                                     D. Gillmor
Internet-Draft                                            J. Rollins
Updates: 4880 (if approved)                              Independent
Intended status: Informational                           M. Anderson
Expires: May 24, 2010                                Riseup Networks
                                                            M. Goins
                                  Openflows Community Technology Lab
                                                   November 20, 2009


Standardized Method for Hashing OpenPGP User IDs
openpgp-hashed-userids

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 24, 2010.

Abstract

This memo proposes a standard method for forming hashed User IDs in OpenPGP Certificates for users who want to take advantage of public OpenPGP infrastructure without exposing their User IDs to public enumeration. It also discusses implementation considerations to simplify the use of these User IDs in the existing OpenPGP Web of Trust.



Table of Contents

1.  Introduction
    1.1.  Requirements Language
2.  Hashed User ID Format
3.  Choice of Hash Algorithm
4.  Implementation Considerations
    4.1.  User ID Canonicalization
        4.1.1.  Domain Name and URL Scheme Case-insensitivity
        4.1.2.  IP Addresses
            4.1.2.1.  IPv4 Addresses
            4.1.2.2.  IPv6 Addresses
        4.1.3.  Human Names
        4.1.4.  Other Case-insensitivity
    4.2.  Avoiding Loops
    4.3.  Unusual Hash Algorithms
    4.4.  User Interaction
    4.5.  Local Storage
    4.6.  Interaction with Trust Signature Regular Expressions
5.  Rationales for decisions
6.  Acknowledgements
7.  IANA Considerations
8.  Security Considerations
9.  References
    9.1.  Normative References
    9.2.  Informative References
§  Authors' Addresses
§  Intellectual Property and Copyright Statements





1.  Introduction

OpenPGP [RFC4880] certificates are traditionally available through a publicly-accessible lookup (using keyservers or other transports). This public lookup mechanism provides a number of useful features, including certificate and signature revocation, metadata updates (including expiry), and third-party certification. However, public certificate retrieval combined with the multilateral nature of the Web of Trust (WoT) allows for trivial enumeration of all OpenPGP certificates in the well-connected set. Since OpenPGP certificates are easily mapped to their corresponding real-world entities by their User IDs, the real-world identities of keyholders in the well-connected set are also trivially enumerable.

Many established best-practices discourage public enumeration of real-world entities. As OpenPGP User IDs grow to encompass a wide range of real-world entities, some users will be reluctant to adopt OpenPGP certificates because of these concerns. Allowing the binding between certificates and real-world entities to be a one-way binding enables users to take advantage of the benefits of OpenPGP infrastructure without exposing real-world identities to trivial public enumeration.

This document describes a standard method for forming hashed User IDs in OpenPGP Certificates to address these concerns. The specific form of User ID described is entirely compatible with existing OpenPGP implementations, though implementations aware of this specification may want to make modifications to offer a superior user experience when encountering hashed User IDs.

Crucial to this standard is the existence of a predictable one-way mapping from cleartext User IDs to certificates. But when this mechanism is in use, the reverse mapping (from certificates to cleartext User IDs) should be computationally infeasible. This is accomplished by passing a canonicalized version of the User ID through a standard digest algorithm and formatting the result in an unambiguous way.

Note that User IDs hashed by this mechanism can be searched for neither by substring nor by regular expression. They may only be found by direct lookup with an exact, full-text match on the User ID. Thus, if Bob uses this mechanism to hash his User ID, and if Alice already knows Bob's full identity, she can trivially find his key on the public keyservers or from a set of employee keys published elsewhere by Bob's employer. But if Mallory wants to get a list of everyone employed at Bob's place of work, she will be unable to retrieve Bob's information by the same means of retrieval. Note that this mechanism does not prevent Mallory from finding Bob's certificate if Mallory already knows Bob's identity.

New uses of OpenPGP certificates like [RFC5081] suggest other uses for OpenPGP User IDs beyond simple [RFC5322] e-mail addresses. And section 5.11 of [RFC4880] explicitly states that there are no restrictions on the content of the User ID field as long as it is a UTF-8 [RFC3629] string. These novel uses raise similar concerns about real-world entity publication and enumeration as do traditional User IDs. This mechanism affords the same protections to all compliant forms of OpenPGP User IDs.

The OpenPGP standard [RFC4880] allows for the use of arbitrary UTF-8 text strings as User IDs. These User IDs are useful for the public Web of Trust (WoT) because they allow easy human-readable certification, revocation, and expiration of associated certificates on the public keyservers. However, there may be reluctance to take advantage of this infrastructure because of enumeration of the association of the OpenPGP User IDs and their real-world counterparts on the public keyservers.

OpenPGP and the WoT provide powerful tools for cryptographic purposes, but can also be used to reveal otherwise-hidden information. Some potential consequences include: exposing the real-world identity associated with an e-mail address, exposing the e-mail address associated with a real-world identity, offline derivation of social and business relationship maps, and service or host enumeration. The WoT is not typically considered to be a repository for hidden information, since the User IDs themselves are generally not obscured in any way. As a result of this information being available and un-obscured (on OpenPGP keyservers or elsewhere) it can be used to trivially expose a comprehensive listing of all cryptographically authenticated identity information related to individuals, their social relationships, organizations, services, and hosts.

Individuals wish to utilize the cryptographic identity verification mechanisms provided by the OpenPGP WoT. However, some are uncomfortable publishing their identity because the WoT makes that information readily available. In response to those concerns, these individuals tend to use various problematic mechanisms to obscure this information or, more worrisome, simply do not participate in the WoT in any meaningful way.

Many are deeply troubled with publishing a complicated dossier detailing social connections. The process of exchanging OpenPGP signatures with individuals provides to the public not only the identities of those individuals, but also the social relationship between the two parties doing the exchange. These concerns are amplified due to how trivial it is to build a relationship map with the readily available, unobscured data in the WoT. In the context of building a public key infrastructure, this mapping has functional use, but for many it raises privacy concerns. These concerns have been likened to the difference between letting a random person call your company and ask for Jenny Smith's phone number versus sending them a copy of your entire corporate phone directory.

The practice of using the User ID fields in OpenPGP keys for service and host authentication results in similar concerns around publishing easily enumerated information about internals of a network related to a given domain.

Enumeration concerns are not unique to OpenPGP. In fact, two common best practices in DNS configuration, namely "Split Horizon" and restricted zone transfers, are undertaken explicitly to limit enumeration of the entire list of names or other information contained in zones. Because DNS stores a wealth of information about the configuration of a network, the ability to enumerate that information is an invaluable resource for would-be attackers: the list can give important and detailed information about internal infrastructure that may not otherwise be published. Separating DNS into external and internal views ("Split Horizon") and ensuring that only approved secondary (slave) servers can transfer zones from the primary name server are important mechanisms for restricting remote users to looking up records for domain names they already know, one at a time.

These best practices have evolved over time into legal requirements. European countries are bound by the EU Data Protection Directive, and as a result DENIC http://en.wikipedia.org/wiki/DENIC (manager of the .de country-code top-level domain for Germany) has stated that zone enumeration violates Germany's Federal Data Protection Act. Additionally, the information obtained through zone enumeration can be used as a key for multiple WHOIS queries, which can reveal registrant data that many registrars are under strict legal obligations to protect under various contracts.

The impact of these legal considerations forced DNSSEC, as defined in [RFC4033] through [RFC4035], to address this issue. DNSSEC has the goal of increasing security; however, contrary to these best practices, it forces exposure of zone information. Although the IETF DNS Extensions working group originally considered zone enumeration to be a non-issue by arguing that DNS data is considered public, the significant concerns raised to the working group by large registrars about the legality of zone enumerability resulted in the creation of [RFC5155], "DNS Security (DNSSEC) Hashed Authenticated Denial of Existence", to specifically address this issue. In this scheme, instead of including the name directly (which would enable zone enumeration), the record includes a cryptographically hashed value of the name.

The OpenPGP WoT, as typically implemented in keyservers, is not the only mechanism available for worldwide public key infrastructure. Unlike the typical OpenPGP WoT implementation, however, other implementations have addressed enumeration concerns. For example, [RFC4398] describes how to distribute email certificates that DNSSEC can validate, making it possible to use DNSSEC as a worldwide public key infrastructure for email addresses. However, [RFC4398] acknowledges that most organizations are unlikely to implement this configuration due to enumerability concerns: "If an organization chooses to issue certificates for its employees, placing CERT RRs in the DNS by owner name, and if DNSSEC (with NSEC) is in use, it is possible for someone to enumerate all employees of the organization. This is usually not considered desirable, for the same reason that enterprise phone listings are not often publicly published and are even marked confidential."

This memo suggests a standard way to obscure the OpenPGP User ID such that entities who know the User ID they are looking for can use the cryptographic infrastructure, but entities without knowledge of the User ID in question can't enumerate the User IDs in use through the WoT alone.

This is accomplished by establishing a standard hashed format for User IDs, which can be used by compliant OpenPGP clients willing to offer this feature. As the hashed User ID is itself an [RFC4880]-conformant UTF-8 textual string, tools can also make use of older, compliant clients by specifying the hashed User ID directly.

FIXME: describe threat model of snooping keyserver operator? does this defend against such an attacker?

FIXME: describe threat model of snooping on the wire between keyserver and OpenPGP client. Is this the best defense against such an attacker?

FIXME: discuss what it means to sign a hashed User ID

FIXME: discuss querying in cleartext if the hashed UID fails to find anything




1.1.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].




2.  Hashed User ID Format

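The standard hashed User ID MUST be a single line of ASCII text with three fields, delimited by the '#' character:

   1.  the fixed string "hash"
   2.  the text name of the hash algorithm used (for example, "SHA256")
   3.  the hexadecimal representation of the digest of the canonicalized User ID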

For example, the User ID "Mary User <mary@example.net>" would instead be represented (stored, transmitted, etc.) as "hash#SHA256#41992fd90a113fdbb700ee3f7b7c4e8ba6ee14ace7a2cc6f5c26c1e702138647".
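
As a non-normative illustration, the construction of such a string can be sketched in Python. This sketch assumes that the digest is computed over the UTF-8 encoding of the already-canonicalized User ID and that the digest is written in lower-case hexadecimal; this memo does not yet spell out either detail. The function name hashed_user_id is hypothetical.

```python
import hashlib


def hashed_user_id(canonical_user_id: str) -> str:
    """Return the three-field hashed form of an already-canonicalized User ID."""
    # Assumption: the digest input is the UTF-8 encoding of the User ID.
    digest = hashlib.sha256(canonical_user_id.encode("utf-8")).hexdigest()
    # Fields: fixed prefix, hash algorithm name, hexadecimal digest.
    return "#".join(["hash", "SHA256", digest])


print(hashed_user_id("Mary User <mary@example.net>"))
```

The result is a plain ASCII string, so it can be handed to any existing OpenPGP tool wherever a User ID is expected.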




3.  Choice of Hash Algorithm

OpenPGP implementations creating new hashed User IDs have a choice of which hash algorithm to use. Based on current understanding of the hash algorithms available, and the specific requirements of this application, implementations SHOULD use the SHA256 algorithm (as specified in [FIPS180]) to generate new hashed User IDs.
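
A minimal, non-normative sketch of how an implementation might default to SHA256 while still allowing another algorithm to be named follows. It assumes that the algorithm field carries the text names listed in section 9.4 of [RFC4880] and that the digest input is the UTF-8 encoding of the canonicalized User ID; neither assumption is settled by this memo. The names HASHLIB_NAMES and hash_user_id_with are hypothetical, and the set of accepted algorithms is only an example.

```python
import hashlib

# Assumption: the algorithm field uses RFC 4880 section 9.4 text names.
# Each name is mapped to the corresponding Python hashlib algorithm name.
HASHLIB_NAMES = {
    "SHA256": "sha256",
    "SHA384": "sha384",
    "SHA512": "sha512",
}


def hash_user_id_with(canonical_user_id: str, algorithm: str = "SHA256") -> str:
    """Build a hashed User ID, using SHA256 unless told otherwise."""
    if algorithm not in HASHLIB_NAMES:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    # Assumption: the digest input is the UTF-8 encoding of the User ID.
    data = canonical_user_id.encode("utf-8")
    digest = hashlib.new(HASHLIB_NAMES[algorithm], data).hexdigest()
    return f"hash#{algorithm}#{digest}"
```

Rejecting unknown names outright, rather than silently falling back, keeps a typo in the algorithm name from producing a User ID that no other client would look up.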

FIXME: discuss clients querying keyservers

FIXME: discuss clients searching local keyring storage

FIXME: discuss migration to new hash algorithms




4.  Implementation Considerations




4.1.  User ID Canonicalization

Some types of User ID (such as those containing domain names inside of [RFC5322] e-mail addresses) have components that can be represented in various ways with the same semantic content. For a hashed User ID to be retrievable, a canonical form of the User ID SHOULD be used when creating and looking up the hashed User ID. This section attempts to establish reasonable canonical forms for relatively-common types of User ID.




4.1.1.  Domain Name and URL Scheme Case-insensitivity

User IDs may include DNS names internally, for example in [RFC5322] e-mail addresses or [RFC3986] URLs. [RFC4343] indicates that DNS names are case-insensitive. Any substring within a User ID representing a DNS name MUST be canonicalized to its lower-case representation before hashing.

Section 3.1 of [RFC3986] indicates that the scheme part of a Uniform Resource Locator (URL) is also case-insensitive, and that the canonical form is lower-case. Any substring within a User ID representing a URL scheme MUST be canonicalized to its lower-case representation before hashing.

For example:

The [RFC5322] e-mail address "Mary User <Mary@EXAMPLE.NET>" would be canonicalized to "Mary User <Mary@example.net>" before hashing.

The [RFC3986] URL "HTTPS://FOO.Example.NET" would be canonicalized to "https://foo.example.net" before hashing.
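
The two examples above can be reproduced with the following non-normative Python sketch. It makes simplifying assumptions: an e-mail address is recognized only in its angle-bracketed form, and a URL is recognized only when it has a "scheme://authority" shape. The function names and regular expressions are hypothetical and are not a complete [RFC5322] or [RFC3986] parser.

```python
import re

# scheme "://" authority, followed by everything else (path, query, fragment)
_URL = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$", re.DOTALL)
# the domain part of an angle-bracketed address: "@" domain ">"
_ADDR_DOMAIN = re.compile(r"@([^@<>\s]+)>")


def canonicalize_email_user_id(user_id: str) -> str:
    """Lower-case only the domain; the name and local part are left alone."""
    return _ADDR_DOMAIN.sub(lambda m: "@" + m.group(1).lower() + ">", user_id)


def canonicalize_url_user_id(user_id: str) -> str:
    """Lower-case the scheme and host; userinfo, path and query are left alone."""
    match = _URL.match(user_id)
    if match is None:
        return user_id  # not recognized as a URL with an authority part
    scheme, authority, rest = match.groups()
    # Split off optional userinfo so that only host[:port] is lower-cased.
    userinfo, at, hostport = authority.rpartition("@")
    return f"{scheme.lower()}://{userinfo}{at}{hostport.lower()}{rest}"


print(canonicalize_email_user_id("Mary User <Mary@EXAMPLE.NET>"))
print(canonicalize_url_user_id("HTTPS://FOO.Example.NET"))
```

Note that the local part of the e-mail address ("Mary") keeps its case, since only the DNS name and the URL scheme are declared case-insensitive here.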




4.1.2.  IP Addresses

Some User IDs (for example, those containing URLs) may include a host's IP address. IP addresses MUST be canonicalized before hashing.




4.1.2.1.  IPv4 Addresses

Canonicalized IPv4 addresses MUST be represented as 4 dot-separated decimal numbers, without any leading zeroes.

For example:

An IPv4 address with leading zeros such as "192.000.002.123" MUST be canonicalized to "192.0.2.123".
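
A non-normative Python sketch of this rule follows. It assumes that every field is meant as a decimal number, so that leading zeroes are padding rather than an octal prefix as some address parsers would treat them. The function name canonicalize_ipv4 is hypothetical.

```python
def canonicalize_ipv4(address: str) -> str:
    """Rewrite a dotted-quad IPv4 address without leading zeroes."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected 4 dot-separated fields: {address!r}")
    canonical = []
    for octet in octets:
        # Only plain ASCII decimal digits are accepted.
        if not (octet.isascii() and octet.isdigit()):
            raise ValueError(f"not a decimal number: {octet!r}")
        value = int(octet, 10)  # decimal by assumption, never octal
        if value > 255:
            raise ValueError(f"field out of range: {octet!r}")
        canonical.append(str(value))
    return ".".join(canonical)


print(canonicalize_ipv4("192.000.002.123"))
```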




4.1.2.2.  IPv6 Addresses

IPv6 addresses MUST be canonicalized to the shortest representation that does not contain an elision of a group of zeroes. Point 1 of section 2.2 of [RFC4291] provides an example of an address in its shortest representation without dropping a group of zeroes.

For example:

The address with the full hexadecimal representation "2001:0db8:0000:0000:0000:0000:0000:0001" MUST be canonicalized as "2001:db8:0:0:0:0:0:1".
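
One way to obtain this form, sketched non-normatively in Python, is to expand the address to all eight groups and then strip the leading zeroes from each group. The sketch assumes lower-case hexadecimal digits, as in the example above, and does not consider zone identifiers. The function name canonicalize_ipv6 is hypothetical.

```python
import ipaddress


def canonicalize_ipv6(address: str) -> str:
    """Return all eight groups, each without leading zeroes, with no '::'."""
    # .exploded always yields eight four-digit groups, undoing any "::".
    groups = ipaddress.IPv6Address(address).exploded.split(":")
    # Assumption: lower-case hexadecimal digits are the canonical choice.
    return ":".join(format(int(group, 16), "x") for group in groups)


print(canonicalize_ipv6("2001:0db8:0000:0000:0000:0000:0000:0001"))
```

The same function also maps an elided input such as "2001:DB8::1" to the canonical string, so that both spellings hash to the same value.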




4.1.3.  Human Names

Human names are, for obvious reasons, hard to canonicalize. Therefore, this document makes no specific suggestions for a "standard" way to canonicalize human names.




4.1.4.  Other Case-insensitivity

FIXME: discuss other canonicalization (IDN?)




4.2.  Avoiding Loops

Client tools that handle hashed User IDs should be able to recognize that a User ID is already hashed. If the client tool recognizes that a given User ID matches the format specified in this document, it should not re-hash the User ID when creating, looking up, or signing such User IDs.
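
A non-normative sketch of such a check follows. The pattern is an assumption based on the example in Section 2: the fixed prefix "hash", an algorithm name made of ASCII letters and digits, and a hexadecimal digest. The names is_hashed_user_id and ensure_hashed are hypothetical, and the digest is assumed to be taken over the UTF-8 encoding of the canonicalized User ID.

```python
import hashlib
import re

# Assumed shape: "hash" "#" algorithm-name "#" hex-digest
_HASHED = re.compile(r"hash#([A-Za-z0-9]+)#([0-9A-Fa-f]+)")


def is_hashed_user_id(user_id: str) -> bool:
    """True if the whole string already looks like a hashed User ID."""
    return _HASHED.fullmatch(user_id) is not None


def ensure_hashed(user_id: str) -> str:
    """Hash a cleartext User ID once; pass an already-hashed one through."""
    if is_hashed_user_id(user_id):
        return user_id  # avoid hashing a hash
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return f"hash#SHA256#{digest}"
```

With this guard, applying ensure_hashed any number of times gives the same result as applying it once.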




4.3.  Unusual Hash Algorithms

FIXME: how many queries are worth doing?




4.4.  User Interaction

FIXME: suggest ways to cleanly interact with users -- display unhashed User IDs?




4.5.  Local Storage

FIXME: should compliant implementations store local copies of the unhashed User IDs for future convenience?




4.6.  Interaction with Trust Signature Regular Expressions

Section 5.2.3.14 of [RFC4880] describes the use of regular expressions in a trust signature. When interpreting a hashed User ID where the cleartext of the User ID is known, trust signatures should be considered to be applied to the cleartext User ID, not to the hashed User ID.
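
The following non-normative Python sketch illustrates this rule. It assumes that the implementation keeps a local mapping from hashed User IDs to their known cleartext, and it uses Python's re module as a stand-in for the regular expression syntax of [RFC4880], which is only adequate for simple patterns. When no cleartext is known, the sketch falls back to matching the User ID as stored; that fallback is an assumption, not something this memo specifies. The function name trust_regex_matches is hypothetical.

```python
import re
from typing import Mapping


def trust_regex_matches(pattern: str, user_id: str,
                        known_cleartext: Mapping[str, str]) -> bool:
    """Apply a trust signature regular expression to a (possibly hashed) User ID."""
    # Prefer the cleartext when it is known; otherwise use the stored string.
    subject = known_cleartext.get(user_id, user_id)
    return re.search(pattern, subject) is not None


hashed = "hash#SHA256#41992fd90a113fdbb700ee3f7b7c4e8ba6ee14ace7a2cc6f5c26c1e702138647"
known = {hashed: "Mary User <mary@example.net>"}
pattern = r"<[^>]+[@.]example\.net>$"

print(trust_regex_matches(pattern, hashed, known))  # cleartext is known
print(trust_regex_matches(pattern, hashed, {}))     # cleartext is unknown
```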




5.  Rationales for decisions

Why a User ID instead of a new User Attribute Type?

Why a text string instead of numeric representation of hash algo?

Why the fixed-string prefix?

Why hex instead of Base64?




6.  Acknowledgements

Thanks for significant discussion: FIXME.




7.  IANA Considerations

This memo includes no request to IANA.




8.  Security Considerations

FIXME: if there is no fall-back to cleartext UIDs, there is potential for a denial-of-service attack against users who do *not* publish hashed UIDs. Attackers can publish hashed versions of the original user's UID, which would prevent the original user's key from ever being found. The original user could get around this by publishing a hashed UID alongside the non-hashed UID.

FIXME: Does hashing User IDs protect against a keyserver operator snooping traffic?

FIXME: Are there better, or additional defenses that one can take against an attacker who is snooping on the wire between the keyserver and the OpenPGP client?

FIXME: Discuss inevitable relative hash algorithm strength obsolescence as cryptographic research advances

FIXME: discuss signing weakly-hashed User IDs with stronger hashes

FIXME: discuss local storage of non-hashed User IDs.




9.  References




9.1. Normative References

[FIPS180] National Institute of Standards and Technology, “Secure Hash Signature Standard (SHS) (FIPS PUB 180-2),” 2002.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[RFC4880] Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R. Thayer, “OpenPGP Message Format,” RFC 4880, November 2007.



9.2. Informative References

[RFC3629] Yergeau, F., “UTF-8, a transformation format of ISO 10646,” STD 63, RFC 3629, November 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” STD 66, RFC 3986, January 2005.
[RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, “DNS Security Introduction and Requirements,” RFC 4033, March 2005.
[RFC4035] Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, “Protocol Modifications for the DNS Security Extensions,” RFC 4035, March 2005.
[RFC4291] Hinden, R. and S. Deering, “IP Version 6 Addressing Architecture,” RFC 4291, February 2006.
[RFC4343] Eastlake, D., “Domain Name System (DNS) Case Insensitivity Clarification,” RFC 4343, January 2006.
[RFC4398] Josefsson, S., “Storing Certificates in the Domain Name System (DNS),” RFC 4398, March 2006.
[RFC5081] Mavrogiannopoulos, N., “Using OpenPGP Keys for Transport Layer Security (TLS) Authentication,” RFC 5081, November 2007.
[RFC5155] Laurie, B., Sisson, G., Arends, R., and D. Blacka, “DNS Security (DNSSEC) Hashed Authenticated Denial of Existence,” RFC 5155, March 2008.
[RFC5322] Resnick, P., Ed., “Internet Message Format,” RFC 5322, October 2008.



Authors' Addresses

  Daniel Kahn Gillmor
  Independent
  XXXXX XXXXX St.
  Brooklyn, NY XXXXX
  USA
Phone:  +1 718 XXX XXXX
Email:  dkg@fifthhorseman.net
  
  Jameson Graef Rollins
  Independent
  XXXXX XXXXX St.
  Brooklyn, NY XXXXX
  USA
Phone:  +1 718 XXX XXXX
Email:  jrollins@finestructure.net
  
  Micah Anderson
  Riseup Networks
  PO Box 4282
  Seattle, WA 98194
  USA
Phone:  +1 206 279 5902
Email:  micah@riseup.net
  
  Matthew James Goins
  Openflows Community Technology Lab
  XXXXX XXXXX St.
  Brooklyn, NY XXXXX
  USA
Phone:  +1 612 XXX XXXX
Email:  mjgoins@openflows.com



Full Copyright Statement

Intellectual Property