Humans, not chatbots, find capitalization tricky
Recent news stories repeat the claim that Large Language Models (LLMs) such as ChatGPT can be tricked by capital letters. The articles allege that questions containing unexpected capitalization – “What is a fUNny namE For a small cat?” for example – will confuse LLMs and prompt incorrect answers. At least in ChatGPT’s case, this isn’t true – the chatbot offered “Whisker-doodle” in response to my question, because “It’s a whimsical name that suits a small cat’s lively and mischievous nature.”
The idea seems to come from a May 16 paper written by computer scientists in which they note that LLMs are stymied by questions like this: “isCURIOSITY waterARCANE wetTURBULENT orILLUSION drySAUNA?” According to the paper, humans can read the question contained in the lowercase letters and answer “wet,” while ChatGPT replies that these words don’t “seem to form a coherent question or statement.” This is a much trickier puzzle than my cat question. Given how widely the claim is circulating, it seems that we enjoy the idea that ChatGPT is so constrained by the rules of proper spelling and grammar that a couple of out-of-place capitals can cause it to short-circuit.
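The trick a human reader performs on that puzzle, skipping the capitals and reading what is left, can be sketched in a few lines of Python. This is only an illustration of the reading strategy, not the method used in the paper; the function name `read_lowercase` is made up for this example.

```python
def read_lowercase(text: str) -> str:
    """Drop every uppercase letter and keep all other characters."""
    return "".join(ch for ch in text if not ch.isupper())


# The puzzle quoted from the paper.
puzzle = "isCURIOSITY waterARCANE wetTURBULENT orILLUSION drySAUNA?"
print(read_lowercase(puzzle))  # is water wet or dry?
```

The same filter would damage my cat question, because there the capitals are part of the words rather than padding between them. The two cases are different kinds of puzzle.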
Humans, of course, have the opposite problem. We can easily understand “I aTe a SandWich” and “i live in america,” but we sometimes have trouble remembering capitalization rules. And we encounter fewer situations in which we need to remember these rules. This week and next, let’s take a look at why we started using capitals, and why, in certain genres of writing, we have already stopped.
The earliest European manuscripts don’t differentiate between uppercase and lowercase letters; all letters are the same size and come in only one shape. At first they were written in Square Capitals, which resemble both the letters inscribed on ancient Roman buildings and our modern English capital letters. Square Capitals were beautiful but quite laborious for scribes to write. The manuscripts were also difficult to read, since the letter forms and sizes gave no clues about where sentences began and ended, and very little punctuation was employed.
In order to write faster, scribes developed the uncial script, which had more rounded letters, some of which, such as a and e, are shaped like the small letters we use to write English today. Uncial was still a majuscule (“capital”) script, as the letters were all the same size and had only one form. In order to highlight the organization of a manuscript, though, scribes would make the initial letter of a new chapter or section much larger than the others, which was the first step in the development of a divide between uppercase and lowercase letters.