Escaping "Escape Hell": The Case for Flexible String Delimiters
Almost all programming languages support the basic string data type, a sequence of characters that is not interpreted as part of the language’s own syntax. A string literal appears literally in the source code (let fruit = "apple"), and many programming languages use the double quotation mark (") as a delimiter to mark the beginning and end of a given string literal. Sometimes, a string literal needs to contain special characters, such as a newline or the delimiter itself. These characters are usually expressed using an escape character, often \, that modifies the succeeding character (e.g. "Hello \\\" World" → Hello \" World).
In principle, any string literal can be expressed using delimiters and escape characters. The computer does not care, but the programmer will sooner or later hit the limits of this approach and enter escape hell. The result is absurd statements, like an escaped regular expression for a UNC name: the name starts with two backslashes, the regex needs four to match them, and the string literal holding that regex begins with eight, "\\\\\\\\".
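To see where the eight backslashes come from, here is a minimal Rust sketch that counts them layer by layer. The constant names and the `count_backslashes` helper are made up for illustration; no regex engine is involved, only the string contents.

```rust
/// The two backslashes that open a UNC name such as \\server\share.
const UNC_PREFIX: &str = r"\\";

/// A regex matching that prefix has to escape each backslash: four characters.
const UNC_PREFIX_REGEX: &str = r"\\\\";

/// The same regex written as an escaped string literal:
/// eight backslashes in the source, four in memory.
const UNC_PREFIX_REGEX_ESCAPED: &str = "\\\\\\\\";

/// Hypothetical helper: counts the backslashes a string actually contains.
fn count_backslashes(s: &str) -> usize {
    s.matches('\\').count()
}
```

Every layer of interpretation halves the count, so every layer of escaping has to double it.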
In order to mitigate the shortcomings of escaped strings, many programming languages offer raw string literals, which do not allow escape sequences, at the cost of being unable to express the delimiter itself in the string (e.g. in Python the escaped string literal "(?:\\d{1,3}\\.?){4}" vs. the raw string literal r"(?:\d{1,3}\.?){4}").
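The same comparison carries over to Rust, whose plain raw literal is also spelled r"...". The sketch below simply checks that both spellings produce the same string; the names are illustrative only.

```rust
/// The pattern as an escaped literal: every backslash is doubled.
const ESCAPED: &str = "(?:\\d{1,3}\\.?){4}";

/// The same pattern as a raw literal: backslashes are taken literally.
const RAW: &str = r"(?:\d{1,3}\.?){4}";

/// Returns true when both spellings denote the same string.
fn same_pattern() -> bool {
    ESCAPED == RAW
}
```

The raw form only works here because the pattern contains no double quote.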
Some languages go one step further and offer flexible delimiters, which let the programmer define a delimiter that does not appear in the string’s content. This post makes a case for flexible string delimiters.
Flexible Delimiters
Flexible delimiters eliminate the need for escaping entirely. Rust designed this nicely[1], and so did Zig[2]. Python, JavaScript and Go did not.
Long before Rust and Zig, PostgreSQL had the same idea with dollar-quoted string constants ($$...$$ or $tag$...$tag$). And of course, Bash’s heredoc is another fine example (with its peculiar syntax <<EOF ... EOF). Among modern programming languages, Rust IMO offers the nicest solution: the base delimiter can be extended (r#"..."#, r##"..."##, …) as much as needed. Even YAML, though not a programming language, and for all its flaws, supports block scalars, a mechanism that relies on indentation to delimit a string.
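To make "as much as needed" concrete, here is a small Rust sketch that works out how many # characters a raw literal needs for a given content. `hashes_needed` and `raw_literal` are hypothetical helpers written for illustration, not part of any library. They rely on the rule that a raw literal opened with n hashes ends at the first double quote followed by n hashes.

```rust
/// Returns the smallest number of `#` that lets `content` sit inside a Rust
/// raw string literal: zero if it contains no `"`, otherwise one more than
/// the longest run of `#` that directly follows a `"`.
fn hashes_needed(content: &str) -> usize {
    let bytes = content.as_bytes();
    let mut needed = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'"' {
            // Count the `#` characters immediately after this quote.
            let run = bytes[i + 1..].iter().take_while(|&&c| c == b'#').count();
            needed = needed.max(run + 1);
        }
    }
    needed
}

/// Renders `content` as the source text of a raw string literal.
fn raw_literal(content: &str) -> String {
    let hashes = "#".repeat(hashes_needed(content));
    format!("r{0}\"{1}\"{0}", hashes, content)
}
```

For content without quotes this yields the plain r"..." form, and the delimiter only grows when the content forces it to.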
The most common need for flexibly delimited strings arises when building commands for some other (interpreted) language whose character set is similar to that of the host language. Consider Grafana Alloy, Grafana’s OpenTelemetry collector, written in Go. Alloy’s config file format uses Go’s syntax for strings and raw strings[3]. Alloy is packed with great features, one of which is to select logs using a LogQL query. However, LogQL contains regular expressions, and is itself interpreted by a Go program (Loki).
Consider the simple log format [level] - msg, and the task of processing such logs with a regular expression. You’re forced to write a LogQL selector string like this:
selector = "{service_name =~ \"some.*_api\"} |~ \"\\\\[(?P<level>\\\\S+)\\\\] - (?P<message>.*)\""
# or
selector = `{service_name =~ "some.*_api"} |~ "\\[(?P<level>\\S+)\\] - (?P<message>.*)"`
What a pain![4] In Rust you could easily express this string literal without this mess:
let selector = r##"{service_name =~ "some.*_api"} |~ `\[(?P<level>\S+)\] - (?P<message>.*)`"##;
This is so much nicer! In my opinion, the lack of flexible delimiters is one of the most painful aspects of Go.
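To make the two decoding layers behind the Alloy selector explicit, here is a simplified Rust model of the chain Alloy → LogQL → regex, limited to the line filter part. It assumes that both layers decode \\ and \" the way Go's interpreted strings do, and it ignores every other escape. `unescape` and `regex_from_alloy_body` are hypothetical helpers, not Alloy or Loki APIs.

```rust
/// What the regex engine should finally see.
const REGEX: &str = r"\[(?P<level>\S+)\] - (?P<message>.*)";

/// The LogQL line filter, with the regex escaped once for LogQL's
/// double-quoted string.
const LOGQL_FILTER: &str = r#"|~ "\\[(?P<level>\\S+)\\] - (?P<message>.*)""#;

/// The body of an escaped Alloy string holding that filter:
/// every backslash and double quote is escaped once more.
const ALLOY_BODY: &str = r#"|~ \"\\\\[(?P<level>\\\\S+)\\\\] - (?P<message>.*)\""#;

/// Simplified model of one decoding layer: `\\` becomes `\` and `\"`
/// becomes `"`. Anything else is copied through unchanged.
fn unescape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next == '\\' || next == '"' {
                    out.push(next);
                    chars.next(); // consume the escaped character
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}

/// Follows the chain: Alloy string -> LogQL filter -> regex.
/// Returns None if the decoded text is not a `|~ "..."` filter.
fn regex_from_alloy_body(alloy_body: &str) -> Option<String> {
    let logql = unescape(alloy_body);
    let quoted = logql.strip_prefix("|~ \"")?.strip_suffix('"')?;
    Some(unescape(quoted))
}
```

In this model, `unescape(ALLOY_BODY)` gives back `LOGQL_FILTER`, and a second pass over the quoted part gives `REGEX`. With a flexible delimiter, `REGEX` could be written as-is and neither pass would be needed.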
Conclusion
Flexible delimiters shine when embedding DSLs or regexes, where escaping otherwise obscures meaning. They improve readability, reduce errors, and save debugging time. All modern programming languages should support them.
[1] See the official RFC
[2] Via Zig’s multiline string literals
[4] Go raw literals make little difference here, because you cannot nest raw literals. In this example, you need to escape twice for Go, and once for Regex (Alloy (Go) → Loki (Go) → Regex). If you were to POST this to the LogQL-capable API directly, you would write selector = "{service_name =~ "some.*_api"} |~ `\[(?P<level>\S+)\] - (?P<message>.*)`", but you cannot easily express this string in Go without using string concatenation or the hack above. That’s ugly to write and difficult to read, and this is the entire point of this blog post.