~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Can You Raed Tihs? / 4 months ago
~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-

There's an internet rumor, which I imagine most people have seen, that usually goes something like this:

The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it denos't mtater in waht oredr the ltetres in a wrod are, the olny iprnoatmt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm.

Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Amzanig huh?

At first glance it seems simultaneously counterintuitive and self-evident, which is always a fun experience. Some time back I read (probably on Language Log) that the claim is somewhat misleading. The argument was that properties of the English language in general, and of the sample paragraph in particular, contribute significantly to the effect, so it is not wholly attributable to the way the mind processes text. The counter-claim says that English (supposedly) has shorter words than many languages and a writing system that spells out its vowels, so reading scrambled words is a lot easier than it would be in German or Hebrew. The other factor is that the sample paragraph itself is not representative. It contains shorter words than many styles of writing, and its self-referentiality gives the reader strong contextual clues.

I don't know much about whether the counter-claims are true, since I was just pulling some hearsay out of my long-term memory, but the latter part is definitely something we can test. I slapped together a quick JavaScript applet to help, so any old text can be pasted in and shuffled about.
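Here is a minimal sketch of how such a scrambler could work. It is not the applet's actual source. It assumes words are matched with the /\b[a-zA-Z']+\b/ pattern discussed in the comments. It keeps each word's first and last letters and shuffles the interior with a Fisher-Yates shuffle. The function names are hypothetical.

```javascript
// Minimal sketch of a "keep the first and last letter" scrambler.
// scrambleText, scrambleWord and shuffle are hypothetical names, not the applet's source.

// In-place Fisher-Yates shuffle; `random` should return a number in [0, 1).
function shuffle(items, random = Math.random) {
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}

// Keep the first and last characters and shuffle everything in between.
// Words shorter than four characters have nothing worth shuffling.
function scrambleWord(word, random = Math.random) {
  if (word.length < 4) return word;
  const middle = shuffle(word.slice(1, -1).split(""), random);
  return word[0] + middle.join("") + word[word.length - 1];
}

// Scramble each matched word; spaces and punctuation between words are untouched.
function scrambleText(text, random = Math.random) {
  return text.replace(/\b[a-zA-Z']+\b/g, (word) => scrambleWord(word, random));
}

console.log(scrambleText("The human mind does not read every letter by itself."));
```

If you pass in your own random function, the runs become reproducible. For instance, a function that always returns 0 turns "word" into "wrod".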

Short words make reading scrambled text a lot easier. Any word with fewer than four letters stays unscrambled, and the most a four-letter word can suffer is having its two middle letters swapped. English has enough short Germanic words that most of your words can be six letters or fewer, which keeps the letters from straying very far from their correct positions. However, if you scramble more formal writing that tends toward longer words, I think the effect starts to break down. I'll go find the first two news stories that pop up on Google News and scramble their first paragraphs:

Wtih the srgtlgue oevr haelctrahe eteirnng an eevn tgheuor pshae, Prenidset Ombaa has hit btoh a milnotese and a seped bmup in his deul pusiurt of a moajr ohauverl of the ninota's miaedcl sstyem and a rietrbh of pvigsssorriem in Acriema. Husoe appoavrl of the liltsogeain Sradtuay eevn if Dmaeotcrs cluod mvoe it no freahtr--was a sgianl acmcpmilheosnt taht has eedlud pntdieesrs for daceeds. But the cosle vtoe and the extroneis it took to srucee a mojitray wree leadn wtih wnnirag snigs as the isuse moevs to the Stnaee. Eevn thoguh the Hsoue is a batosin of lesailbirm, the heltah crae oeruhval was a tgeuhor slel tahn etpecxed and the blil tnured out to be mroe ctearosvvine in its prcie tag, mroe liimted in the socpe of its gnoernvemt-run inuarcsne ooptin, and tihetgr in its rtnisitreocs on aotrobin funindg tahn mnay Drtacemos had hoepd

and

An Amry chpaialn aeksd mnoerurs Sanduy to pary for the aeucscd Frot Hood steohor, cianllg on tehm to fcous lses on why the tegardy hppeaend and mroe on hleping ecah oethr toguhrh "the vllaey of the sdaohw of dknarses." "Lrod, all tsohe anurod us sercah for mivote, sercah for maineng, serach for sitemnhog, soenmoe to bmlae. Taht is so fntrusriatg," Col. Farnk Jacsokn tlod a gourp of aobut 120 ppeole getehrad at one of the pos'ts chpael. "Taody, we psuae to haer form you. So Lrod, as we pary totgeher, we fcuos on tnighs we konw."

I think the first one is definitely harder, and it seems to have longer words as well, which I'm sure is most of the reason. I don't know what the "rietrbh of pvigsssorriem" is, but now I'm definitely worried about it.
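A rough number helps show why longer words are harder. With its first and last letters fixed, an n-letter word has (n-2)! orderings of its interior. For each letter that appears k times in the interior, divide that total by k!. Here is a small sketch that counts these orderings (distinctScrambles is a hypothetical name). Each count includes the unscrambled original.

```javascript
// Count the distinct orderings of a word's interior letters
// (first and last letters fixed), allowing for repeated letters.
function factorial(n) {
  let result = 1;
  for (let k = 2; k <= n; k++) result *= k;
  return result;
}

function distinctScrambles(word) {
  if (word.length < 4) return 1; // nothing can move
  const interior = word.slice(1, -1).toLowerCase();
  const counts = new Map();
  for (const ch of interior) counts.set(ch, (counts.get(ch) || 0) + 1);
  // Multinomial coefficient: (n-2)! divided by k! for each repeated letter.
  let total = factorial(interior.length);
  for (const c of counts.values()) total /= factorial(c);
  return total;
}

for (const w of ["the", "word", "human", "letter", "research", "progressivism"]) {
  console.log(w, distinctScrambles(w));
}
```

By this count, "word" has 2 arrangements and "letter" has 6. "research" has 360, and "progressivism" has 1,663,200.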

So go scramble some long-worded passages, show them to your friends, and convince them that not being able to read them indicates some kind of brain damage.

--------------------
:::Comments:::

\__________ Rachelle -- 3 months ago __________/
Yeah, I think the original paragraph's claims only hold for familiar text with short words. I could mostly read the first one you scrambled, but I was definitely thinking about the unscrambling; my eye was not just recognizing the words. Rebirth of progressivism. Quite worrying, indeed.
--------------------
\__________ Rachelle -- 3 months ago __________/
It has something to do with the way the letters are scrambled, too. For example, 'prenidset' looks a lot more like 'president' than 'pntdieesrs' looks like 'presidents'. That's probably because it keeps the vowels in their original order, alternating with consonants, and the second letter is also unscrambled. Going back to the original paragraph, two of the longer words, 'phenomenal' and 'according', are scrambled in a pretty easy way. In 'phenomenal', the p and h are still together. That's a BIG clue, considering they make a different sound together than separately. In 'according', the two c's stay together toward the beginning of the word, and the letters of 'ing' are in the final three positions. That makes me wonder whether the paragraph was rigged. I think you should put it through your program a couple of times to see if it can be made harder. Also, is 'rscheearch' a mistake? I assume it's supposed to be 'research', but then there's an extra ch. And I just noticed that either I'm wrong that the word is 'phenomenal', or it's spelled incorrectly in the original paragraph. The first sentence also has some grammar problems. We've been duped by a bunch of idiots!
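Rachelle's point about easy and hard scramblings can be made a bit more concrete. The sketch below uses a made-up metric, not an established readability measure. It counts how many adjacent letter pairs (bigrams) of the original word survive in the scrambled version. By that count, 'prenidset' keeps 4 of the 8 pairs in 'president', while 'pntdieesrs' keeps only 2 of the 9 pairs in 'presidents'. The function names are hypothetical.

```javascript
// Hypothetical "easiness" score for a scramble: how many adjacent letter
// pairs (bigrams) of the original word still appear in the scrambled word.

// Count each two-letter pair in a word (a multiset, stored as a Map).
function bigrams(word) {
  const pairs = new Map();
  const w = word.toLowerCase();
  for (let i = 0; i < w.length - 1; i++) {
    const pair = w.slice(i, i + 2);
    pairs.set(pair, (pairs.get(pair) || 0) + 1);
  }
  return pairs;
}

// Size of the multiset intersection of the two words' bigrams.
function preservedBigrams(original, scrambled) {
  const before = bigrams(original);
  const after = bigrams(scrambled);
  let shared = 0;
  for (const [pair, count] of before) {
    shared += Math.min(count, after.get(pair) || 0);
  }
  return shared;
}

console.log(preservedBigrams("president", "prenidset"));   // 4 (pr, re, en, id)
console.log(preservedBigrams("presidents", "pntdieesrs")); // 2 (nt, es)
```

A scrambler could re-roll any word whose score is high, which is one way to follow the suggestion of running the text through several times to make it harder.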
--------------------
\__________ Me -- 3 months ago __________/
I hadn't noticed the spelling problems; I pasted the paragraph from somebody's Facebook profile. I wanted to google it to see if there was a more "official" version, but unfortunately that paragraph is quite difficult to google :). I also noticed the weak scramblings in that paragraph, and I meant to say something about it. I'll reproduce the original (with correct spellings) and post a few scrambles so we can see how readable it is. Of course, already knowing the paragraph will make it harder to judge.
--------------------
\__________ Me -- 3 months ago __________/
The peonhaenml pewor of the hmaun mnid: acrocidng to a recresah at Cgrbimdae Usentiirvy, it dn'esot mttear in waht oedrr the ltteers in a wrod are, the olny imrpoantt tnihg is taht the fsirt and lsat lteetr be in the rgiht pcale. The rset can be a taotl mses and you can stlil raed it whuotit a peorblm. Tihs is bcaseue the hmuan mnid deos not raed ervey leettr by iltsef, but the wrod as a whloe. Azanimg, huh?
--------------------
\__________ Me -- 3 months ago __________/
The pnhemoeanl pweor of the haumn mnid: adrnocicg to a rceeasrh at Cdgbairme Uvriestiny, it dosen't maettr in waht oerdr the lettres in a wrod are, the olny itorpanmt tihng is taht the fsirt and lsat letetr be in the rgiht palce. The rset can be a tatol mses and you can sitll raed it wohtuit a prolbem. Tihs is bceusae the huamn mnid deos not raed ervey lteter by ielstf, but the wrod as a wlhoe. Azimang, huh?
--------------------
\__________ Me -- 3 months ago __________/
I think some of those are definitely harder. I'm sure I couldn't figure out Cdgbairme in one pass. Usentiirvy is kind of a cool word. I wonder what language it could come from.
--------------------
\__________ Rachelle -- 3 months ago __________/
Hawaiian?
--------------------
\__________ Me -- 3 months ago __________/
I don't think I've ever heard Hawaiian before.
--------------------
\__________ Bruce -- about 1 month ago __________/
I wrote a similar app :) Punctuation was the tricky part. One thing that few people realize is how much syntax contributes to content and context. A friend of mine has a tee-shirt with a column of words on each side:

On the back:   On the front:
furiously      colorless
sleep          green
ideas          ideas
green          sleep
colorless      furiously

The first column is a list of words; the second column is a very weird sentence. Why? Because of the syntax we expect in English. Once we see a pattern, the set of words we will accept in later positions gets smaller. So we can read "good" text fast and still adjust for an amazing degree of mangling.
--------------------
\__________ Bruceagain -- about 1 month ago __________/
Tee-shirt - column of words on each side. BACK: [furiously sleep ideas green colorless] FRONT: [colorless green ideas sleep furiously]
--------------------
\__________ Me -- about 1 month ago __________/
Thanks, and sorry about the format-erasing. If I had more free time and fewer than a million other things I also wanted to do, fixing that would be at the top of my priority list :)
--------------------
\__________ Me -- about 1 month ago __________/
Looking back at the source code I wrote, apparently I handled punctuation with the regular expression /\b[a-zA-Z\']+\b/. It basically just allows an apostrophe inside a word and relies on the "\b" word-boundary anchor. Are there some weird punctuation cases I'm not thinking of that this wouldn't cover?
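One way to explore that question is to run the pattern over a few awkward inputs. This sketch assumes the pattern is used with the g flag. The escaped \' in the original is equivalent to a plain ' inside a character class, so it is written unescaped here. In JavaScript, \b is ASCII-based.

```javascript
// Probe what /\b[a-zA-Z']+\b/g treats as a "word" on a few awkward inputs.
// In JavaScript, \b is ASCII-based, so accented letters act as word breaks.
const wordPattern = /\b[a-zA-Z']+\b/g;

const samples = {
  contraction: "don't",             // apostrophe inside a word is kept
  possessive: "students'",          // trailing apostrophe falls outside \b
  archaic: "'tis",                  // leading apostrophe falls outside \b
  hyphenated: "well-known",         // matched as two separate words
  accented: "caf\u00e9 na\u00efve", // non-ASCII letters split words apart
  digits: "mp3 files",              // letters glued to digits are skipped
};

for (const [label, text] of Object.entries(samples)) {
  // String.prototype.match with a /g pattern returns every match (or null).
  console.log(label, JSON.stringify(text.match(wordPattern)));
}
```

With these inputs the pattern yields "don't", "students", "tis", "well" plus "known", "caf" plus "na" plus "ve", and just "files". Plain English text comes through fine. Accented words are the main case where the scrambler would work on fragments rather than whole words.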
--------------------
\__________ Bruce -- about 1 month ago __________/
No, your method should handle any valid punctuation that I know of. I either didn't think to use a regex or couldn't get it to work, so I just split the string on spaces and had to deal with ordinary punctuation at the ends of my words. I ignored the apostrophe, so it scrambled along with the letters. I'm sure one version kept the apostrophe and the letter on each side in place. :(
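Here is a sketch of that apostrophe-pinning variant. It assumes that "staying put" means the apostrophe and both neighboring letters keep their positions, together with the first and last letters. Every other letter is shuffled among the remaining slots. scramblePinned is a hypothetical name, and this is not Bruce's code.

```javascript
// Hypothetical variant: scramble a word but pin the first and last letters,
// every apostrophe, and the letter on each side of an apostrophe.
function shuffle(items, random = Math.random) {
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}

function scramblePinned(word, random = Math.random) {
  const chars = word.split("");
  const pinned = new Set([0, chars.length - 1]);
  chars.forEach((ch, i) => {
    if (ch === "'") [i - 1, i, i + 1].forEach((k) => pinned.add(k));
  });
  // Positions free to move, and a shuffled copy of the letters in them.
  const free = chars.map((_, i) => i).filter((i) => !pinned.has(i));
  const letters = shuffle(free.map((i) => chars[i]), random);
  free.forEach((pos, k) => { chars[pos] = letters[k]; });
  return chars.join("");
}

console.log(scramblePinned("doesn't")); // "d" and "n't" stay put; only "oes" can move
```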
--------------------
\__________ Bruce -- about 1 month ago __________/
Back to the topic :) Here is the original paragraph slightly reworded, scrambled, and then reversed by word. Is each word harder to decode with no syntax flow?
--------------------
\__________ Example -- about 1 month ago __________/
Azminag! Aeolubltsy wrods. eirtne rahetr lteetrs, ivdaduniil raed do'enst mnid hamun the bescaue is Tihs ppeole. agaevre to rlbaadee riamen wlil wrod the and sabemlcrd clleopmety be can ltrtees rinanimeg The lotnoiacs. orgiinal teihr in raimen lrtetes ednnig and biingnneg the taht is tnihg inrtpaomt the ocucr, wrod a in ltreets the oedrr waht in maettr dso'net it Uieivrstny, Cbrmiadge at rasheceerr a to arcndicog mnid, hmaun the of peowr pmahoennel The
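The reversal above looks like a plain split on whitespace followed by a reverse. Punctuation such as "wrods." and "Azminag!" travels with its word, which fits that reading. Here is a minimal sketch of the step under that assumption (reverseWords is a hypothetical name). Running it after a scrambler produces text like the example.

```javascript
// Reverse word order, splitting on runs of whitespace. Punctuation stays
// attached to its word, so "Amazing!" moves as a unit.
function reverseWords(text) {
  return text.trim().split(/\s+/).reverse().join(" ");
}

console.log(reverseWords("The phenomenal power of the human mind"));
// mind human the of power phenomenal The
```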
--------------------
\__________ Me -- about 1 month ago __________/
That's a good point. I wish there were an easy way to scramble the same paragraph two ways and read each of them for the first time. I keep recognizing words by familiarity with the paragraph.
--------------------