{"id":151,"date":"2025-03-06T23:33:07","date_gmt":"2025-03-06T23:33:07","guid":{"rendered":"https:\/\/nils-becker.org\/?p=151"},"modified":"2025-03-06T23:39:29","modified_gmt":"2025-03-06T23:39:29","slug":"wie-lang-koennen-woerter-in-chatgpt-sein-und-wie-erfindet-es-neue","status":"publish","type":"post","link":"https:\/\/nils-becker.org\/?p=151&lang=en","title":{"rendered":"How Long Can Words Be for ChatGPT? And How Does It Invent New Ones?"},"content":{"rendered":"\n<p>Have you ever wondered how ChatGPT can create new words even though it has a fixed vocabulary? Or what the longest word in its vocabulary is? &#x1f914;<\/p>\n\n\n\n<p>Here are the answers. &#x1f913;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Tokens Instead of Words<\/h3>\n\n\n\n<p>To understand this, we first need to look at the concept of tokens. ChatGPT doesn\u2019t work with words directly but with tokens: small units of characters that are often shorter than a word.<\/p>\n\n\n\n<p>ChatGPT\u2019s vocabulary consists of <strong>100,277 tokens<\/strong>, including special tokens used for commands (e.g., to mark the end of a text). On average, an English word is split into <strong>1.34 tokens<\/strong>, while a German word takes about <strong>1.78 tokens<\/strong>. For example, the word <em>Captain<\/em> doesn\u2019t exist as a single token but is broken down into <strong>\u201ccapt\u201d<\/strong> and <strong>\u201cain\u201d<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How Does ChatGPT Create New Words?<\/h3>\n\n\n\n<p>Since the model works with individual tokens, it can generate new words by combining existing tokens. This is how creative neologisms emerge. The smallest tokens are individual characters, so in theory, ChatGPT could also &#8220;invent&#8221; completely new terms by stringing together random characters.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How Long Can a Word Be?<\/h3>\n\n\n\n<p>This depends on the context length. 
The original GPT-3.5 version of ChatGPT could process a maximum of <strong>4,096 tokens<\/strong> at once (newer models accept far more), so in theory, a single word could be that long! But practically speaking, that wouldn&#8217;t make much sense.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">And the Longest Token in the Vocabulary?<\/h3>\n\n\n\n<p>The answer is unexpected: <strong>It&#8217;s a token consisting of 128 spaces.<\/strong> &#x1f604; (Token ID <strong>58040<\/strong>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Have you ever wondered how ChatGPT can create new words even though it has a fixed vocabulary? Or what the longest word in its vocabulary is? &#x1f914; Here are the answers. &#x1f913; Tokens Instead of Words To understand this, we first need to look at the concept of tokens. ChatGPT doesn\u2019t work with words directly [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-151","post","type-post","status-publish","format-standard","hentry","category-nicht-kategorisiert-en"],"_links":{"self":[{"href":"https:\/\/nils-becker.org\/index.php?rest_route=\/wp\/v2\/posts\/151","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nils-becker.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nils-becker.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nils-becker.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nils-becker.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=151"}],"version-history":[{"count":2,"href":"https:\/\/nils-becker.org\/index.php?rest_route=\/wp\/v2\/posts\/151\/revisions"}],"predecessor-version":[{"id":156,"href":"https:\/\/nils-becker.org\/index.php?rest_route=\/wp\/v2\/posts\/151\/revisions\/156"}],"wp:attachment":[{"href":"https:\/\/nils-becker.org\/index.php?rest_route=%2F
wp%2Fv2%2Fmedia&parent=151"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nils-becker.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=151"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nils-becker.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=151"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}