August 8, 2008

Confirmation Bias of Sorts

I'm not sure what the concept is called, but basically I'm talking about the fact that when you shovel something new into your brain you start seeing it more often because it stands out now. I never noticed how many Hondas (and now VWs) were on the road until I bought my own. Now whenever I see a black VW with a satellite antenna I check the license plate to make sure my car hasn't been stolen. (So far, so good.)

And now that I've had to dip my toe into the world of character encoding I see that more frequently too. Well, actually, I see it less frequently but it stands out more because when I do see it I know why it's there.

Ever see a blog entry, or even an online article, with this mishmash in it: â€” ? That's a long dash (—) with two errors in it. First off, it got sent into the content management system as three bytes of UTF-8, but each byte was interpreted as its own single-byte character. Then when it got spit out, the final byte was displayed as a curly quote instead of a non-printing character, because the browser assumes you're looking at a page made on Windows (it reads the page as Windows-1252 rather than ISO-8859-1).

â is character 0xE2
€ is 0x80
” is really 0x201D, but in the Windows character set (Windows-1252) it sits at 0x94
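
The byte-by-byte breakdown above can be reproduced with a few lines of Python. This is only a sketch, and it assumes the "Windows" interpretation is Python's `cp1252` codec and the strict non-Windows one is `latin-1`; the variable names are made up for illustration.

```python
# A long dash, written as an escape so the source file's own encoding doesn't matter.
dash = "\u2014"

# What the dash looks like on the wire as UTF-8: three bytes.
raw = dash.encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in raw))

# Misreading those three bytes as Windows-1252 yields three separate characters.
garbled = raw.decode("cp1252")
for byte, ch in zip(raw, garbled):
    print(f"byte 0x{byte:02X} -> {ch!r} (code point 0x{ord(ch):04X})")

# Read as strict ISO-8859-1 instead, the last two bytes are
# non-printing C1 control characters rather than a euro sign and a curly quote.
strict = raw.decode("latin-1")
print([f"0x{ord(ch):04X}" for ch in strict])
```

The middle loop pairs each byte with the character a Windows-1252 reader would show for it, which is where the curly quote comes from.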

Convert those into binary and you get 0b11100010.10000000.10010100 -- a UTF-8 three-byte character. Strip out the markers and you're left with xxxx0010.xx000000.xx010100 which reorganizes into 0b00100000.00010100. In hex that's 0x2014, in decimal it's 8212, the code for a long dash.
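
The marker-stripping arithmetic can be sketched in Python as well. The function name `decode_three_byte` is hypothetical, and this handles only the three-byte case described here; it is not a general UTF-8 decoder.

```python
def decode_three_byte(b0: int, b1: int, b2: int) -> int:
    """Rebuild a code point from one three-byte UTF-8 sequence."""
    # The lead byte must start with 1110, each continuation byte with 10.
    if b0 >> 4 != 0b1110 or b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("not a three-byte UTF-8 sequence")
    # Strip the markers: keep 4 bits of the lead byte and 6 bits of each
    # continuation byte, then line the 16 payload bits up end to end.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode_three_byte(0xE2, 0x80, 0x94)
print(f"{code_point:016b}")      # the 16 payload bits
print(hex(code_point), code_point)
print(chr(code_point) == "\u2014")
```

The masks `0x0F` and `0x3F` are the "x" positions from the binary above turned into code: whatever the mask zeroes out is marker, whatever survives is payload.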

Curly quotes, when typed correctly, can get misinterpreted as well; they show up as â€? (the question mark is part of the wrongly-encoded character and may show up as an empty box instead).
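
Here is a rough sketch of the curly-quote case, assuming the misreading is Windows-1252 (Python's `cp1252` codec). The closing quote's last UTF-8 byte is 0x9D, which that code page leaves unassigned, so there is nothing sensible to draw in that spot. Python substitutes U+FFFD when asked to replace errors; a browser or font may show a question mark or an empty box instead.

```python
left, right = "\u201c", "\u201d"   # opening and closing curly double quotes

# The opening quote's bytes (E2 80 9C) all have Windows-1252 meanings.
left_garbled = left.encode("utf-8").decode("cp1252")
print(left_garbled)

# The closing quote's bytes are E2 80 9D; a strict decode fails on 0x9D.
right_bytes = right.encode("utf-8")
try:
    right_bytes.decode("cp1252")
    strict_decode_worked = True
except UnicodeDecodeError:
    strict_decode_worked = False
print(strict_decode_worked)

# With errors="replace", the unassigned byte becomes U+FFFD,
# the usual stand-in for "couldn't decode this".
right_garbled = right_bytes.decode("cp1252", errors="replace")
print([f"0x{ord(ch):04X}" for ch in right_garbled])
```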

I'm just moderately amused that one character that itself gets done wrong from time to time manages to show up in another character done wrong thanks to using the wrong character set. I wonder if a character ever shows up in its own UTF-8 encoding. Could set up a neat little infinite loop there...
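
That last question can be brute-forced. The sketch below takes "shows up in its own UTF-8 encoding" to mean: encode the character as UTF-8, misread the bytes as Windows-1252 (Python's `cp1252`), and check whether the original character is among the results. ASCII is skipped because it trivially encodes to itself. The function name `self_containing` is made up for illustration.

```python
def self_containing(limit: int = 0x110000) -> list[str]:
    """Characters that appear in the cp1252 misreading of their own UTF-8 bytes."""
    hits = []
    for cp in range(0x80, limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates can't be encoded as UTF-8
        ch = chr(cp)
        # errors="replace" covers the few bytes cp1252 leaves unassigned.
        garbled = ch.encode("utf-8").decode("cp1252", errors="replace")
        if ch in garbled:
            hits.append(ch)
    return hits


hits = self_containing()
print(len(hits))
print([f"U+{ord(ch):04X}" for ch in hits])

# One round of garbling for a hit, then a second round on the result.
once = "\u00c3".encode("utf-8").decode("cp1252")
twice = once.encode("utf-8").decode("cp1252")
print(once, twice)
```

Under those assumptions the answer is yes: the search turns up U+00A0 through U+00BF plus U+00C3 (Ã). Ã garbles to Ãƒ, so each further round of mis-decoding reproduces it and the string keeps growing.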

This page's URL is http://jasonfleshman.org

This page last updated Jul 19, 2019 3:34:19 PM.