gh-135676: Lexical analysis: Reword String literals and related sections (GH-135942)

encukou · blaisep · StanFromIreland · web-flow · commit 777159fa318f · 2025-07-23T15:57:54.000Z
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
@@ -133,13 +133,18 @@ Literals
 
 Python supports string and bytes literals and various numeric literals:
 
-.. productionlist:: python-grammar
-   literal: `stringliteral` | `bytesliteral` | `NUMBER`
+.. grammar-snippet::
+   :group: python-grammar
+
+   literal: `strings` | `NUMBER`
 
 Evaluation of a literal yields an object of the given type (string, bytes,
 integer, floating-point number, complex number) with the given value.  The value
 may be approximated in the case of floating-point and imaginary (complex)
-literals.  See section :ref:`literals` for details.
+literals.
+See section :ref:`literals` for details.
+See section :ref:`string-concatenation` for details on ``strings``.
+
 
 .. index::
    triple: immutable; data; type
@@ -152,6 +157,58 @@ occurrence) may obtain the same object or a different object with the same
 value.
 
 
+.. _string-concatenation:
+
+String literal concatenation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Multiple adjacent string or bytes literals (delimited by whitespace), possibly
+using different quoting conventions, are allowed, and their meaning is the same
+as their concatenation::
+
+   >>> "hello" 'world'
+   'helloworld'
+
+Formally:
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   strings: ( `STRING` | fstring)+ | tstring+
+
+This feature is defined at the syntactical level, so it only works with literals.
+To concatenate string expressions at run time, the ``+`` operator may be used::
+
+   >>> greeting = "Hello"
+   >>> space = " "
+   >>> name = "Blaise"
+   >>> print(greeting + space + name)   # not: print(greeting space name)
+   Hello Blaise
+
+Literal concatenation can freely mix raw strings, triple-quoted strings,
+and formatted string literals.
+For example::
+
+   >>> "Hello" r', ' f"{name}!"
+   'Hello, Blaise!'
+
+This feature can be used to reduce the number of backslashes
+needed, to split long strings conveniently across multiple lines, or even to add
+comments to parts of strings. For example::
+
+   re.compile("[A-Za-z_]"       # letter or underscore
+              "[A-Za-z0-9_]*"   # letter, digit or underscore
+             )
+
+However, bytes literals may only be combined with other bytes literals,
+not with string literals of any kind.
+Also, template string literals may only be combined with other template
+string literals::
+
+   >>> t"Hello" t"{name}!"
+   Template(strings=('Hello', '!'), interpolations=(...))
+
+
 .. _parenthesized:
 
 Parenthesized forms
diff --git a/Doc/reference/grammar.rst b/Doc/reference/grammar.rst
@@ -10,11 +10,8 @@ error recovery.
 
 The notation used here is the same as in the preceding docs,
 and is described in the :ref:`notation <notation>` section,
-except for a few extra complications:
+except for an extra complication:
 
-* ``&e``: a positive lookahead (that is, ``e`` is required to match but
-  not consumed)
-* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
 * ``~`` ("cut"): commit to the current alternative and fail the rule
   even if this fails to parse
 
diff --git a/Doc/reference/introduction.rst b/Doc/reference/introduction.rst
@@ -145,15 +145,23 @@ The definition to the right of the colon uses the following syntax elements:
 * ``e?``: A question mark has exactly the same meaning as square brackets:
   the preceding item is optional.
 * ``(e)``: Parentheses are used for grouping.
+
+The following notation is only used in
+:ref:`lexical definitions <notation-lexical-vs-syntactic>`.
+
 * ``"a"..."z"``: Two literal characters separated by three dots mean a choice
   of any single character in the given (inclusive) range of ASCII characters.
-  This notation is only used in
-  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
 * ``<...>``: A phrase between angular brackets gives an informal description
   of the matched symbol (for example, ``<any ASCII character except "\">``),
   or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
-  This notation is only used in
-  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
+
+.. _lexical-lookaheads:
+
+Some definitions also use *lookaheads*, which indicate that an element
+must (or must not) match at a given position, but without consuming any input:
+
+* ``&e``: a positive lookahead (that is, ``e`` is required to match)
+* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
 
 The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
 the vertical bar (``|``) binds most loosely.
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst