Color parsing brainteaser

Thursday 18 January 2007

This is exactly 18 years old. Be careful.

I was experimenting with HTML color names the other day. This HTML snippet:

<p><font color='red'>&#x2588; RED</font></p>

got me some red text:

█ RED

(I’m using the Unicode FULL BLOCK character, U+2588, to get a solid swatch without having to worry about antialiasing effects.) Adding a less well-known name got me another color:

<p><font color='red'>&#x2588; RED</font></p>
<p><font color='seagreen'>&#x2588; SEAGREEN</font></p>

█ RED

█ SEAGREEN

What if the name had a space in it?

<p><font color='red'>&#x2588; RED</font></p>
<p><font color='seagreen'>&#x2588; SEAGREEN</font></p>
<p><font color='sea green'>&#x2588; SEA GREEN</font></p>

To my surprise, the “sea green” line was blue! It seems the color name parser in your typical HTML browser is very forgiving, just like the rest of HTML interpretation. Given a nonsense string like “sxbxxsree”, it will decide that some subset of the characters indicate a color, and it will use them.

I played around with more randomized color names, and ended up with a brainteaser on my hands: how exactly does the browser interpret these bogus color names? Note that IE and Firefox don’t agree, though they are close.

I can’t show the effect inline here, because the bogus colors have no effect if the page has real color specifications someplace else, so have a look at the results in a separate page.

Here’s what different browsers show:

Color        Firefox      IE7          Opera
red          █ #ff0000    █ #ff0000    █ #ff0000
seagreen     █ #2e8b57    █ #2e8b57    █ #2e8b57
sea green    █ #0e00ee    █ #0e00ee    █ #0ea00e
sxbxxsreen   █ #0000e0    █ #0000e0    █ #00b000
sxbxxsree    █ #00000e    █ #0b00ee    █ #00b000
sxbxxsrn     █ #000000    █ #0b0000    █ #00b000
sxbxeen      █ #000e00    █ #0bee00    █ #00b0ee
sreen        █ #00ee00    █ #00ee00    █ #00ee00
ffff00       █ #ffff00    █ #ffff00    █ #ffff00
xf8000       █ #0f8000    █ #0f8000    █ #0f8000

At least in the case of Firefox, I could go digging into the source to try to find what it is actually doing, but it is a head-scratcher. Clearly, it is interpreting the accidental hex characters, but how does it decide which ones to use where?

This is one of those fascinating cases where a black box reveals something of its insides through how it behaves when broken. Neurologists study people with head injuries for similar reasons!


Comments

[gravatar]
This looks like it could be applied to enhance Damien's CAPTCHA idea.
[gravatar]
I am happy to report that Opera has a slightly more logical outlook. (I don't think I can show colours here, but:)

red: #ff0000
seagreen: #2e8b57
sea green: #0ea00e
sxbxxsreen: #00b000
sxbxxsree: #00b000
sxbxxsrn: #00b000
sxbxeen: #00b0ee
sreen: #00ee00
ffff00: #ffff00
xf8000: #0f8000

For those who can't mentally translate hexcodes into colours, aside from the red and yellow there are various pleasant shades of green, except for sxbxeen, which is close to cyan.

Opera and FF seem to agree that "Ned" is black and "Ned Batchelder" is green (though aqua-green and grass-green respectively).
[gravatar]
Pete: the synchronicity with Damien's post hadn't occurred to me, but there might be some interesting use there.

Cathy: thanks. I've updated the table above to include the Opera colors.
[gravatar]
On a related note: yesterday my coworker discovered that "LightGrey" works in IE7 but "LightGray" doesn't, which would be fine except that "Gray" works but "Grey" doesn't. I may have the a-vs-e working/not-working backwards, as I'm, thankfully, on Linux and can't test it. :) But the point remains.
[gravatar]
I think I've figured out Firefox.

Looking at "sea green", you can get "0e00ee" from reading a pair of hex digits, treating non-hex as '0', and then skipping a character before reading the next pair. So you'd get "0e" for "se", skip 'a', "00" for " g", skip 'r', and "ee" for "ee".

Looking at "sxbxxsree", you can get "00000e" from the same rule: "00" for "sx", skip 'b', "00" for "xx", skip 's', "0e" for "re".

Looking at "sxbxxsreen", you can get "0000e0" from almost the same rule, except you now skip two characters between pairs. So you'd get "00" for "sx", skip "bx", "00" for "xs", skip "re", and "e0" for "en".

Thus, the length of the string determines how many characters to skip. So the algorithm appears to basically be: split the whole string into three equal parts for the red, green, and blue components; and use the first two characters of each of these parts as the hex digits for the component (treating non-hex digits as '0'). The only modification necessary is to right-pad with '0' when the string cannot be evenly divided. So "sxbxxsreen" has length 10, which is not divisible by 3, but if we treat it as "sxbxxsreen00" we have the parts "sxbx", "xsre", and "en00", and then we consider "sx" as red, "xs" as green, and "en" as blue.

Can anyone find any exceptions? I haven't thought about IE, yet.
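
The rule described in this comment can be sketched in a few lines of Python. This is only an illustration of the comment's description, not Firefox's actual code: the function name `loose_color_firefox` is hypothetical, the sketch does not model the lookup of real color names like "red" or any limit on group length, and zero-filling a group shorter than two characters is an assumption the comment does not cover.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def loose_color_firefox(name):
    """Hypothetical helper: guess the color for a bogus name using the
    split-into-thirds rule described in the comment above."""
    # Right-pad with '0' so the length divides evenly by three.
    padded = name + "0" * (-len(name) % 3)
    size = len(padded) // 3
    # Split into equal red, green, and blue parts.
    parts = [padded[i * size:(i + 1) * size] for i in range(3)]
    digits = ""
    for part in parts:
        # Keep the first two characters of each part.
        # Assumption: a part shorter than two characters is zero-filled.
        pair = part[:2].ljust(2, "0")
        # Anything that is not a hex digit counts as '0'.
        digits += "".join(c if c in HEX_DIGITS else "0" for c in pair)
    return "#" + digits.lower()


for bogus in ["sea green", "sxbxxsreen", "sxbxxsree", "sxbxxsrn", "sxbxeen"]:
    print(bogus, loose_color_firefox(bogus))
```

Run against the bogus names in the table above, this sketch reproduces the Firefox column, which is at least consistent with the guess.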
[gravatar]
Firefox's rules match up pretty well with X's color parsing, which is designed for screen-depth-independence; if you had a display with a 36-bit colorspace, you'd use 3 hex digits for each component. If such a color were used on a 24-bit display (which is what most of us have now), only the 8 most important bits would be used.

If that's it, then "#see" should be equivalent to "#00eeee" or "#00e0e0".
[gravatar]
dak, it looks like you are right. nsColor.cpp has the code, and it is doing what you say: Split the name into three parts (no more than 4 chars each, though), then interpret chars as hex nibbles, with unknowns becoming zeros.
[gravatar]
I think I've figured out IE7, as well. It's almost the same as Firefox's rule except that it has an additional step: After padding (if necessary) and splitting the string into three groups, consider the first character of each group and throw it out if it's not valid hexadecimal, repeat until you have a valid group (or just two characters left). I've not seen an upper limit on the length of the groups (as you point out Firefox has).

My guess as to why this happens is that it allows the browsers to interpret colors where you've delimited/grouped the components: "aa bb cc"; "aa,bb,cc"; and "aa, bb, cc" would all be #aabbcc. IE7 would even accept "(aa)(bb)(cc)".
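
Here is a rough Python sketch of that IE7 rule. It rests on one interpretation that the comment leaves open: the leading character is dropped from all three groups together, and only while none of the three leading characters is a hex digit and more than two characters remain. That joint reading is an assumption, chosen because it reproduces the IE7 column in the table above (dropping per group would turn "sea green" into #ea00ee rather than #0e00ee). The function name `loose_color_ie7` is hypothetical, and real color names are not handled.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def loose_color_ie7(name):
    """Hypothetical helper: guess IE7's color for a bogus name, under the
    joint-dropping assumption stated above."""
    # Right-pad with '0' so the length divides evenly by three.
    padded = name + "0" * (-len(name) % 3)
    size = len(padded) // 3
    groups = [padded[i * size:(i + 1) * size] for i in range(3)]
    # Assumption: drop the first character of every group at once, as long
    # as no group starts with a hex digit and more than two characters remain.
    while len(groups[0]) > 2 and not any(g[0] in HEX_DIGITS for g in groups):
        groups = [g[1:] for g in groups]
    digits = ""
    for group in groups:
        # Keep two characters per group; zero-filling short groups is an assumption.
        pair = group[:2].ljust(2, "0")
        # Anything that is not a hex digit counts as '0'.
        digits += "".join(c if c in HEX_DIGITS else "0" for c in pair)
    return "#" + digits.lower()


for bogus in ["sea green", "sxbxxsree", "aa, bb, cc", "(aa)(bb)(cc)"]:
    print(bogus, loose_color_ie7(bogus))
```

Under this reading, the delimited examples "aa bb cc", "aa,bb,cc", "aa, bb, cc", and "(aa)(bb)(cc)" all come out as #aabbcc, matching the guess above.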
[gravatar]
Sylvain Galineau 3:19 PM on 24 Jan 2007
It's generally dubbed 'FlexHex' and each browser has a slightly different way to do it. As noted above, the general pattern is to treat missing and invalid hex digits as 0. The problem of course is when the string is too long or too short.

One Sam Schinke looked into it a few years ago.
[gravatar]
I did some experiments related to this topic a couple of months ago. My findings are at http://bgok.net/stupidwebtricks/colorkeywords.html. In summary:

- IE doesn't handle the 'greys', except lightgrey as noted above by masukomi

- The CSS inherit keyword doesn't always work as expected in IE.
