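What are the allowed characters in cookie names and values? Are they the same as for URLs, or some common subset?

The reason I'm asking is that I've recently hit some strange behavior with cookies that have - in their name, and I'm just wondering if it's something browser-specific or if my code is faulty.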


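I think it's generally browser-specific. To be safe, base64-encode a JSON object and store everything in that. That way you just have to decode it and parse the JSON. All the characters used in base64 should work fine in most, if not all, browsers.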
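A minimal sketch of that approach in JavaScript, assuming a runtime that provides the standard TextEncoder, TextDecoder, btoa and atob globals (modern browsers, Node.js 16+). The helper names are made up for illustration. The JSON text is turned into UTF-8 bytes first, because btoa() only accepts characters in the 0 to 255 range.

```javascript
// Hypothetical helpers: pack an object into a base64 cookie value and back.
function encodeCookieObject(obj) {
  // JSON -> UTF-8 bytes, so non-ASCII text survives btoa()
  const bytes = new TextEncoder().encode(JSON.stringify(obj));
  // btoa() expects a "binary string" with one character per byte
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

function decodeCookieObject(value) {
  // base64 -> binary string -> bytes -> UTF-8 text -> object
  const binary = atob(value);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

const packed = encodeCookieObject({ user: "zoë", ids: [1, 2] });
console.log(packed); // only A-Z, a-z, 0-9, +, / and = characters
console.log(decodeCookieObject(packed).user); // zoë
```

The output alphabet includes +, / and =. All three are legal cookie-value octets under RFC 6265, but some server-side stacks URL-decode cookie values and turn + into a space, so a URL-safe base64 variant can be the safer choice.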

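There are two versions of the cookie specification:
1. Version 0 cookies, a.k.a. Netscape cookies
2. Version 1, a.k.a. RFC 2965 cookies

In version 0, the name and value parts of a cookie are sequences of characters, excluding the semicolon, comma, equals sign and whitespace if not used with double quotes. Version 1 is a lot more complicated; you can check it in RFC 2965. In this version the specs for the name=value part are almost the same, except that the name can't start with a $ sign.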

According to the ancient Netscape cookie_spec, the entire NAME=VALUE string is:

a sequence of characters excluding semi-colon, comma and white space.

So - should work, and it does seem to be OK in the browsers I've got here; where are you having trouble with it?

By implication:

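= is legal to include, but potentially ambiguous. Browsers always split the name and value on the first = symbol in the string, so in practice you can put an = symbol in the VALUE but not the NAME.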

What isn't mentioned, because Netscape were terrible at writing specs, but seems to be consistently supported by browsers:

- either the NAME or the VALUE may be an empty string
- if there is no = symbol in the string at all, browsers treat it as the cookie with the empty-string name, i.e. Set-Cookie: foo is the same as Set-Cookie: =foo
- when browsers output a cookie with an empty name, they omit the equals sign, so Set-Cookie: =bar begets Cookie: bar
- commas and spaces in names and values do actually seem to work, though spaces around the equals sign are trimmed
- control characters (\x00 to \x1F plus \x7F) aren't allowed
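The browser behaviour described above can be sketched as a pair of tiny functions. This illustrates the rules as stated in this answer (the first = wins, a string without = is a value with an empty name, spaces around = are trimmed). It is not code taken from any browser, and parsePair and formatPair are hypothetical names.

```javascript
// Split one NAME=VALUE string the way browsers are described as doing it.
function parsePair(pair) {
  const eq = pair.indexOf("=");
  if (eq === -1) {
    // No "=" at all: the whole string is the value of an empty-named cookie
    return { name: "", value: pair.trim() };
  }
  // Only the FIRST "=" separates name from value; later ones stay in the value
  return {
    name: pair.slice(0, eq).trim(),
    value: pair.slice(eq + 1).trim(),
  };
}

// When outputting an empty-named cookie, the "=" is omitted
function formatPair({ name, value }) {
  return name === "" ? value : `${name}=${value}`;
}

console.log(parsePair("a=b=c").value);      // b=c
console.log(parsePair("foo").name === "");  // true
console.log(formatPair(parsePair("=bar"))); // bar
```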

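What isn't mentioned, and what browsers are totally inconsistent about, is non-ASCII (Unicode) characters:

- in Opera and Google Chrome, they are encoded into Cookie headers as UTF-8
- in IE, the machine's default code page is used (locale-specific and never UTF-8)
- Firefox (and other Mozilla-based browsers) use the low byte of each UTF-16 code point on its own (so ISO-8859-1 is OK but anything else is mangled)
- Safari simply refuses to send any cookie containing non-ASCII characters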

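So in practice you can't use non-ASCII characters in cookies at all. If you want to use Unicode, control codes, or other arbitrary byte sequences, the cookie_spec demands that you use an ad-hoc encoding scheme of your own choosing, and suggests URL-encoding (as produced by JavaScript's encodeURIComponent) as a reasonable choice.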
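As an example of that suggestion, encodeURIComponent() turns any Unicode text into printable ASCII by percent-encoding its UTF-8 bytes, and decodeURIComponent() reverses it exactly. The sample string is arbitrary.

```javascript
// Round-trip Unicode text through a cookie-safe, ASCII-only form.
const original = "café; crème brûlée";

// Non-ASCII characters become %XX escapes of their UTF-8 bytes;
// ";", ",", "=" and space are escaped as well
const cookieValue = encodeURIComponent(original);
console.log(cookieValue); // caf%C3%A9%3B%20cr%C3%A8me%20br%C3%BBl%C3%A9e

// Decoding restores the original text
console.log(decodeURIComponent(cookieValue) === original); // true
```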

In terms of actual standards, there have been a few attempts to codify cookie behaviour, but none thus far actually reflect the real world.

- RFC 2109 was an attempt to codify and fix the original Netscape cookie_spec. In this standard many more special characters are disallowed, as it uses RFC 2616 tokens (a - is still allowed there), and only the value may be specified in a quoted-string with other characters. No browser ever implemented the limitations, the special handling of quoted strings and escaping, or the new features in this spec.
- RFC 2965 was another go at it, tidying up 2109 and adding more features under a 'version 2 cookies' scheme. Nobody ever implemented any of that either. This spec has the same token-and-quoted-string limitations as the earlier version and it's just as much a load of nonsense.
- RFC 6265 is an HTML5-era attempt to clear up the historical mess. It still doesn't match reality exactly, but it's much better than the earlier attempts: it is at least a proper subset of what browsers support, not introducing any syntax that is supposed to work but doesn't (like the previous quoted-string).

In RFC 6265, cookie names are still specified as an RFC 2616 token, which means you can pick from the alphanums plus:

!#$%&'*+-.^_`|~

In the cookie value it formally bans the (filtered by browsers) control characters and (inconsistently-implemented) non-ASCII characters. It retains cookie_spec's prohibition on space, comma and semicolon, plus, for compatibility with any poor idiots who actually implemented the earlier RFCs, it also bans backslash and quotes, other than quotes wrapping the whole value (but in that case the quotes are still considered part of the value, not an encoding scheme). So that leaves you with the alphanums plus:

!#$%&'()*+-./:<=>?@[]^_`{|}~
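A sketch of checks a cookie producer could run before emitting a cookie, transcribing the two RFC 6265 rules above into regular expressions. The function names are hypothetical. The value pattern allows either bare cookie-octets or the whole value wrapped in double quotes, and (as the spec says) keeps those quotes as part of the value.

```javascript
// cookie-name: an RFC 2616 token, i.e. letters, digits and !#$%&'*+-.^_`|~
const COOKIE_NAME = /^[A-Za-z0-9!#$%&'*+\-.^_`|~]+$/;

// cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
const OCTET = "[\\x21\\x23-\\x2B\\x2D-\\x3A\\x3C-\\x5B\\x5D-\\x7E]";
// cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
const COOKIE_VALUE = new RegExp(`^(?:${OCTET}*|"${OCTET}*")$`);

function isValidCookieName(name) {
  return COOKIE_NAME.test(name);
}

function isValidCookieValue(value) {
  return COOKIE_VALUE.test(value);
}

// Hypothetical producer-side helper: refuse anything outside the RFC 6265 subset
function serializeCookie(name, value) {
  if (!isValidCookieName(name)) throw new Error(`invalid cookie name: ${name}`);
  if (!isValidCookieValue(value)) throw new Error(`invalid cookie value: ${value}`);
  return `${name}=${value}`;
}

console.log(serializeCookie("session-id", "a=b/c")); // session-id=a=b/c
console.log(isValidCookieValue("a,b")); // false
```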

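In the real world we are still using the original-and-worst Netscape cookie_spec, so code that consumes cookies should be prepared to encounter pretty much anything, but for code that produces cookies it is advisable to stick with the subset in RFC 6265.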

In ASP.NET you can use System.Web.HttpUtility to safely encode the cookie value before writing to the cookie, and convert it back to its original form when reading it out.

// Encode
HttpUtility.UrlEncode(cookieData);

// Decode
HttpUtility.UrlDecode(encodedCookieData);

This will stop ampersands and equals signs from splitting a value into a bunch of name/value pairs as it is written to a cookie.
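A rough JavaScript analogue, for comparison: URLSearchParams (built into browsers and Node.js) escapes & and = inside values when it packs several pairs into one string, so the packed string can later be split back into exactly the original pairs. Using it for a multi-value cookie is an assumption of this sketch, not part of the ASP.NET approach. Like HttpUtility.UrlEncode, it writes spaces as +.

```javascript
// Pack several name/value pairs into one cookie value without ambiguity
const packed = new URLSearchParams({ filter: "a=1&b=2", user: "Jo Smith" }).toString();
console.log(packed); // filter=a%3D1%26b%3D2&user=Jo+Smith

// Reading it back restores the original values, "&" and "=" included
const parsed = new URLSearchParams(packed);
console.log(parsed.get("filter")); // a=1&b=2
console.log(parsed.get("user"));   // Jo Smith
```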

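Years ago, MSIE 5 or 5.5 (and probably both) had some serious issues with a "-" in the HTML block, if you can believe it. Although it isn't directly related, ever since then we've stored only an MD5 hash (containing letters and numbers only) in the cookie and used it to look up everything else in a server-side database.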

You can't put a ";" in the value field of a cookie; in most browsers, what gets set is just the string before the ";"...
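A simplified sketch of why: when a Set-Cookie string is parsed, the name-value pair ends at the first ; and everything after it is treated as attributes (this matches the parsing algorithm in RFC 6265, section 5.2). parseSetCookie is a hypothetical helper, and for simplicity it rejects a pair without = rather than following any particular browser.

```javascript
// Simplified Set-Cookie parsing: the first ";" ends the name-value pair
function parseSetCookie(header) {
  const [pair, ...attrs] = header.split(";");
  const eq = pair.indexOf("=");
  if (eq === -1) throw new Error("no '=' in the name-value pair");
  return {
    name: pair.slice(0, eq).trim(),
    value: pair.slice(eq + 1).trim(),
    // Whatever followed the first ";" is read as attributes, not as value
    attributes: attrs.map((a) => a.trim()).filter((a) => a !== ""),
  };
}

const c = parseSetCookie("note=fish;chips; Path=/");
console.log(c.value);         // fish
console.log(c.attributes[0]); // chips  (now a meaningless attribute)
```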

The newer RFC 6265, published in April 2011:

cookie-header = "Cookie:" OWS cookie-string OWS
cookie-string = cookie-pair *( ";" SP cookie-pair )
cookie-pair  = cookie-name "=" cookie-value
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )

cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                   ; US-ASCII characters excluding CTLs,
                   ; whitespace DQUOTE, comma, semicolon,
                   ; and backslash
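A sketch of reading a Cookie header shaped by that grammar. It is deliberately lenient about the spaces after ;, in the spirit of being prepared for anything when consuming cookies. parseCookieHeader is a made-up name, and letting later duplicates overwrite earlier ones is a simplification of this sketch.

```javascript
// Parse a Cookie header value (cookie-pair *( ";" SP cookie-pair )) into a Map
function parseCookieHeader(header) {
  const cookies = new Map();
  for (const part of header.split(";")) {
    const pair = part.trim(); // tolerate a missing or extra space after ";"
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip pieces with no "=" or an empty name
    // Keep the value verbatim: RFC 6265 treats surrounding DQUOTEs
    // as part of the value, not as an encoding
    cookies.set(pair.slice(0, eq), pair.slice(eq + 1));
  }
  return cookies;
}

const jar = parseCookieHeader('sid=abc123; theme="dark"; lang=en');
console.log(jar.get("sid"));   // abc123
console.log(jar.get("theme")); // "dark"
```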

If you look at @bobince's answer, you'll see that the newer restrictions are stricter.

IE and Edge have another interesting issue: cookies whose names contain more than one period seem to be silently dropped. So this works:

cookie_name_a = valuea

while this one gets dropped:

cookie.name.a = valuea
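If you still need to support those browsers, a producer could guard against this. The sketch below only encodes the behaviour reported in this answer (more than one period in the name), which is an observation rather than a documented rule. isRiskyForIE and safeCookieName are hypothetical, and replacing periods could make two different names collide.

```javascript
// Heuristic based on the report above: more than one "." in the name
function isRiskyForIE(name) {
  return name.split(".").length - 1 > 1;
}

// Hypothetical workaround: swap periods for underscores when the name is risky
function safeCookieName(name) {
  return isRiskyForIE(name) ? name.replace(/\./g, "_") : name;
}

console.log(isRiskyForIE("cookie_name_a"));   // false
console.log(isRiskyForIE("cookie.name.a"));   // true
console.log(safeCookieName("cookie.name.a")); // cookie_name_a
```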

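Here it is, as briefly as possible, focusing on the characters that need no escaping:

For cookies:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'()*+-./:<>?@[]^_`{|}~

For URLs:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_~!$&'()*+,;=:@

For both cookies and URLs (the intersection):

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!$&'()*+-.:@_~

That's your answer.

Note that for cookies, = has been removed because it is usually used to set the cookie value.

For URLs, the = is kept. The intersection obviously doesn't have it.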

var chars = "abcdefghijklmnopqrstuvwxyz";
chars += chars.toUpperCase() + "0123456789" + "!$&'()*+-.:@_~";
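The intersection can also be computed rather than typed by hand. This sketch uses the two lists from this answer with the full a-z/A-Z alphabet, and adds a hypothetical isSafeForBoth() check for a candidate string.

```javascript
// The two "no escaping needed" lists from this answer, with the full alphabet
const ALNUM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
const COOKIE_SAFE = new Set(ALNUM + "!#$%&'()*+-./:<>?@[]^_`{|}~");
const URL_SAFE = new Set(ALNUM + ".-_~!$&'()*+,;=:@");

// Characters usable in both places without escaping (keeps cookie-list order)
const BOTH = [...COOKIE_SAFE].filter((ch) => URL_SAFE.has(ch));

// Hypothetical check: does a string only use characters from the intersection?
function isSafeForBoth(s) {
  return [...s].every((ch) => COOKIE_SAFE.has(ch) && URL_SAFE.has(ch));
}

console.log(BOTH.slice(ALNUM.length).join("")); // !$&'()*+-.:@_~
console.log(isSafeForBoth("user-42.v1")); // true
console.log(isSafeForBoth("a=b"));        // false
```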

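It turns out that escaping still occurs and unexpected things happen, especially in a Java cookie environment, where the cookie got wrapped in double quotes if it encountered the last characters in that list.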

To be on the safe side, just use A-Za-z1-9. That's what I'm going to do.

It's simple:

A <cookie-name> can be any US-ASCII characters except control characters (CTLs), spaces, or tabs. It also must not contain a separator character like the following: ( ) < > @ , ; : \ " / [ ] ? = { }. A <cookie-value> can optionally be set in double quotes and any US-ASCII characters excluding CTLs, whitespace, double quotes, comma, semicolon, and backslash are allowed. Encoding: Many implementations perform URL encoding on cookie values, however it is not required per the RFC specification. It does help satisfy the requirements about which characters are allowed, though.

Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#Directives

One more consideration. I recently implemented a scheme in which some sensitive data posted to a PHP script had to be converted and returned as an encrypted cookie, using only base64 values, which I thought were guaranteed to be "safe". So I dutifully encrypted the data items using RC4, ran the output through base64_encode, and happily returned the cookie to the site. Testing seemed to go well until a base64-encoded string contained a "+" symbol. The string was written to the page cookie with no trouble, and using the browser diagnostics I could also verify that the cookie was written unchanged. Then, when a subsequent page called my PHP and obtained the cookie via the $_COOKIE array, I was stunned to find the string was now missing the "+" sign. Every occurrence of that character had been replaced with an ASCII space.

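Given that I have since read many similar unresolved complaints describing this situation, often where base64 is used to "safely" store arbitrary data in cookies, I thought I should point out the problem and offer my admittedly kludgy solution.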

After you've done whatever encryption you want on a piece of data, and then used base64_encode to make it "cookie-safe", run the output string through this...

// From browser to PHP: substitute troublesome chars with
// other cookie-safe chars, or vice versa. The swap is its own
// inverse, so the same call both encodes and decodes.

function fix64($inp) {
    $out = $inp;
    for ($i = 0; $i < strlen($inp); $i++) {
        $c = $inp[$i];
        switch ($c) {
            case '+':  $c = '*'; break; // definitely won't transfer!
            case '*':  $c = '+'; break;

            case '=':  $c = ':'; break; // = symbol seems like a bad idea
            case ':':  $c = '='; break;

            default: continue 2; // other chars are left as-is (skip the assignment)
        }
        $out[$i] = $c;
    }
    return $out;
}

Here I'm simply substituting "+" (and, I decided, "=" as well) with other "cookie-safe" characters before returning the encoded value to the page for use as a cookie. Note that the length of the string being processed doesn't change. When the same page (or another page on the site) runs my PHP script again, I'll be able to recover this cookie without missing characters. I just have to remember to pass the cookie back through the same fix64() call I created, and from there I can decode it with the usual base64_decode(), followed by whatever other decryption my scheme uses.

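There may be some setting I could change in PHP that would let base64 strings used in cookies be transferred back to PHP without corruption. In the meantime, this works. "+" may be a "legal" cookie value, but if you want to be able to transmit such a string back to PHP (in my case via the $_COOKIE array), I'd suggest reprocessing it to remove the offending characters and restoring them after recovery. There are plenty of other "cookie-safe" characters to choose from.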
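If you control both ends, a standard alternative to a home-made swap is the URL- and filename-safe base64 alphabet defined in RFC 4648 (section 5), which uses - and _ instead of + and /, with the = padding optionally dropped. Losing + is consistent with the value being URL-decoded on its way into $_COOKIE, where + means a space. A JavaScript sketch of the conversion, assuming btoa() is available; the helper names are hypothetical, and the PHP side would need the reverse mapping before base64_decode() (not shown).

```javascript
// Convert between standard base64 and the base64url alphabet (RFC 4648, section 5)
function toBase64Url(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function fromBase64Url(b64url) {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  // Restore the "=" padding so the length is a multiple of 4
  return b64 + "=".repeat((4 - (b64.length % 4)) % 4);
}

// Bytes 0xFB 0xFF encode to "+/8=" in standard base64
const standard = btoa(String.fromCharCode(0xfb, 0xff));
const cookieSafe = toBase64Url(standard);
console.log(standard, cookieSafe); // +/8= -_8
console.log(fromBase64Url(cookieSafe) === standard); // true
```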

In the end I used

cookie_value = encodeURIComponent(my_string);

and

my_string = decodeURIComponent(cookie_value);

This seems to work for all kinds of characters. Otherwise I had weird issues, even with characters that weren't semicolons or commas.
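A sketch of that pattern as two small pure functions (hypothetical names) so the logic can be tried outside a browser: buildCookie() returns a string you could assign to document.cookie, and readCookie() takes a string such as document.cookie. One detail worth handling is that decodeURIComponent() throws a URIError on malformed input such as a lone %, which can happen if other code wrote a cookie without encoding it.

```javascript
// Build a cookie string with the name and value URL-encoded
function buildCookie(name, value, attrs = "; path=/") {
  return `${encodeURIComponent(name)}=${encodeURIComponent(value)}${attrs}`;
}

// Look up one cookie in a "name=value; name2=value2" string
function readCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const pair = part.trim();
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    let key = pair.slice(0, eq);
    let value = pair.slice(eq + 1);
    try {
      key = decodeURIComponent(key);
      value = decodeURIComponent(value);
    } catch {
      // Malformed escapes (e.g. a lone "%"): skip this cookie
      continue;
    }
    if (key === name) return value;
  }
  return null;
}

console.log(buildCookie("greeting", "hi; there")); // greeting=hi%3B%20there; path=/
console.log(readCookie("a=50%; greeting=hi%3B%20there", "greeting")); // hi; there
```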

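If you use the variables later, you'll find that things like path will actually let accented characters through, but the result won't actually match the browser path. For that you need to URI-encode them, like this: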

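  const encodedPath = encodeURI(myPath);
  // hostname, not host: host can include a port (e.g. example.com:8080), which is not a valid domain
  document.cookie = `use_pwa=true; domain=${location.hostname}; path=${encodedPath};`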
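A quick way to see the mismatch, as a sketch: the WHATWG URL parser, which browsers use for location.pathname, percent-encodes non-ASCII characters in the path, and for a simple path like this encodeURI() produces the same form while leaving / intact. The path and host are just examples.

```javascript
// A path as you might write it in source code
const myPath = "/café/";

// How the browser represents the same path (percent-encoded)
const browserPath = new URL("https://example.com/café/").pathname;

console.log(browserPath);                       // /caf%C3%A9/
console.log(myPath === browserPath);            // false: the raw path won't match
console.log(encodeURI(myPath) === browserPath); // true
```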

So more characters may be "allowed" than the spec says. But to be safe, you should stick to the spec and use URI-encoded strings.