* Hyphens connect things, such as compound words: double-decker, cut-and-dried, 212-555-5555.
* EN dashes make a range between things: Boston–San Francisco flight, 10–20 years: both connect not only the endpoints, but define that all the space between is included. (Compare the last usage with the phone number example under Hyphens.)
* EM dashes break things, such as sentences or thoughts: 'What the—!'; A paragraph should express one idea—but rules are made to be broken.
Unicode has the original ASCII hyphen-minus (U+002d), as well as a dedicated hyphen (U+2010), other functional hyphens such as soft and non-breaking hyphens, and a dedicated minus sign (U+2212), and some variations of minus such as subscript, superscript, etc.
There's also the figure dash "‒" (U+2012), essentially a hyphen-minus that's the same width as numbers and used aesthetically for typesetting, afaik. And don't overlook two-em-dashes "⸺" and three-em-dashes "⸻" and horizontal bars "―", the latter used like quotation marks!
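For anyone who wants to see these side by side, here is a minimal Python sketch that prints the code point and official Unicode name of each character mentioned above. The list `DASH_LIKE` and the helper `describe` are names made up for this example.

```python
import unicodedata

# Dash-like characters mentioned in the comments above (illustrative list).
DASH_LIKE = [
    "\u002d", "\u2010", "\u2011", "\u2012", "\u2013", "\u2014",
    "\u2015", "\u2212", "\u2e3a", "\u2e3b", "\u00ad",
]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


for ch in DASH_LIKE:
    print(describe(ch))
```

The names come from Python's bundled copy of the Unicode database, so very old interpreters may not know the newer characters.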
Some style guides recommend "space, en dash, space" for this, and I prefer that myself – mainly because some software doesn't treat em dashes correctly as word separators for double click selection purposes.
For example, I'm pretty sure that at least some Kindle models would highlight both the word before and after the em dash when selecting one of them, which makes using the dictionary very annoying.
(I also now looked up and found out that in Spanish, apparently, you are supposed to put space only on one side of the dash, when used as a direct speech separator.)
That's an interesting practicality but I don't think it's the cause of the rule: The rule probably long predates automated line breaking. Also, I think automatic line breaking will break compound words at the hyphen; it doesn't require spaces (which is also obvious from a software development point of view: the logic is relatively simple either way):
Lorem ipsum dolor sit amet, consectetur adipiscing double-
decker lorem ipsum dolor sit amet, consectetur ...
If you want to not have a line break, you shouldn't rely on arbitrary behavior. You should use non-breaking characters like non-breaking spaces and word joiners.
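A rough sketch of what that could look like in Python, assuming the text ends up in a renderer that honours these characters (not all do). The function names `no_break_compound` and `glue_dash` are invented for illustration.

```python
NB_HYPHEN = "\u2011"    # NON-BREAKING HYPHEN
WORD_JOINER = "\u2060"  # WORD JOINER: zero width, forbids a break at that spot


def no_break_compound(word: str) -> str:
    """Swap each hyphen-minus in a compound for the non-breaking hyphen."""
    return word.replace("-", NB_HYPHEN)


def glue_dash(left: str, right: str, dash: str = "\u2014") -> str:
    """Join two words with a dash, wrapped in word joiners on both sides."""
    return f"{left}{WORD_JOINER}{dash}{WORD_JOINER}{right}"


print(no_break_compound("double-decker"))
print(glue_dash("idea", "but"))
```

Whether the line actually stays unbroken is up to the layout engine, so this only expresses the intent in the text itself.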
Why yes, I did list the opposite behavior as an advantage of each. Because that, too, is up to individual preference. :-)
I've definitely noticed the behavior you describe on some layout engines, too, and it's another reason why I personally prefer "foo – bar" style.
I've come to the conclusion it boils down to which style manual one follows. I've taken a careful look at numbers of high-end books which no doubt have been carefully typeset and I've found EM dashes with and without spaces.
It seems there is no definitive rule but I might be wrong.
For what it's worth, I was in the last class in my high school to learn typing on IBM Selectric typewriters. We were taught to type two spaces, two hyphens, then two spaces. Incidentally, we were taught two spaces after periods and colons. To this day, I find it hard to read text that doesn't have proper spacing after periods. (HTML and WYSIWYG word processors handle formatting, but e.g. fixed-font text editors don't)
The old typewriter typefaces were monospaced, i.e. every character was the same width, but this is no longer the case. Virtually all typefaces today are proportionally spaced, not monospaced. So it’s redundant to leave extra room after periods.
The first keyboard I used was my dad's typewriter, and I don't recall it having any 'dash' other than the minus sign.
The conversion of '--' to an en dash and '---' to an em dash is done by the TeX compiler, and appears in the rendered file, but I think that most TeX editors don't change the TeX code itself. (This is distinct from XeTeX-based compilers, which can handle non-ASCII Unicode characters like the em dash '—' directly in the source.)
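To make the substitution concrete, here is a small Python sketch that imitates the effect on plain strings. It is not how TeX implements it (TeX does this through font ligatures); `tex_dashes` is a name invented here.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def tex_dashes(text: str) -> str:
    """Map '---' to an em dash and '--' to an en dash, longest run first."""
    return re.sub(
        r"-{2,3}",
        lambda m: EM_DASH if len(m.group()) == 3 else EN_DASH,
        text,
    )


print(tex_dashes("pages 10--20"))
print(tex_dashes("wait---what"))
```

A single hyphen is left alone, so compounds like well-known pass through unchanged.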
(I think that the article's point is that, in some fonts, -- (two hyphens) is literally the (approximate) size of an em dash, not that it is always understood as meaning an em dash. At least in my font, --- (three hyphens) is far too long to literally look like an em dash:
---
--
—
–
(in order, three hyphens, two hyphens, em dash, en dash).)
(A nice thing in (La)TeX is that one could follow the "two spaces after a full-stop" rule, which then has the advantage of being an explicit marking for sentence boundaries (which your editor might be able to navigate; Emacs has a convention of assuming two spaces after a sentence-ending '.'), but then the TeX typesetting will take care of making it look right. I lost the habit of actually doing this, for better or worse, except when flycheck/checkdoc/package-linter.el makes me do it for docstrings.)
The effect of the double space is, I suspect, a product of the reader's expectations: if you expect it, its absence creates mental work, detracting from readability; if you don't expect it, its presence is what creates mental work.
You can tell when I've edited something on both a phone and a physical keyboard, based on the inconsistent use of spaces.
Hard habit to break. I learned it so long ago too.
Haha I learned to type organically, and it was only in my mid-40s that I retrained myself to type the correct way. It took something like 40 hours of practice on keybr.com before I could get close enough to my regular typing speed, such that I could switch over to the 'correct' method without it impacting my work. Retraining myself to stop doing double-spaces took maybe a week.
The last paragraph of the article also addressed the subjective nature of spacing around the em dash:
> Spacing around an em dash varies. Most newspapers insert a space before and after the dash, and many popular magazines do the same, but most books and journals omit spacing, closing whatever comes before and after the em dash right up next to it.
As far as the selection detail, did you mean that you replace an em dash used like a comma or parenthesis with spaces and an en dash for specific highlight performance issues? Surely the spaces and an em dash would alleviate the selection highlight behavior and not muddy the waters of when to use an em vs. an en dash?
It's funny that they omit to mention the possibility of setting it off with a thin space ' ' or hair space ' ' (those are the thin-space and hair-space Unicode characters, though they show up full width for me), which I thought was preferred typographic practice.
(On Googling, maybe the reason that they don't mention it is that I was imagining it; I can't find any evidence for my belief.)
Interestingly, both my browser and curl (grabbing the direct link to the comment) show the bytes as 0x20 for both. Perhaps the comment submission handler, or even the browser, collated your more specific U+2009 (thin) and U+200A (hair) spaces into the regular U+0020 space?
Probably! I think HN strips out emoji; maybe it just takes the safest approach and strips out all non-white-listed Unicode.
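If anyone wants to check what actually survived a round trip, a minimal Python sketch like this shows the code points and UTF-8 bytes of a string. The helper names `codepoints` and `utf8_hex` are made up for the example, and it needs Python 3.9+ as written.

```python
def codepoints(text: str) -> list[str]:
    """List each character of text as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return text.encode("utf-8").hex(" ")


sample = "a\u2009b\u200ac d"  # thin space, hair space, ordinary space
print(codepoints(sample))
print(utf8_hex(sample))
```

A thin or hair space that made it through would show up as a three-byte sequence starting with e2 80, while a plain space is the single byte 20.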
<word> <space> <dash> <space> <word>
Outside of journalism, usually there is no padding, only, <word> <dash> <word>
I'm with you: For searches, the spaces make the words easier to parse. Those rules predate computers, I would guess.
That one I’d usually parse as a hyphen, as in e.g. well-known. “Word space dash space word” is much clearer, in my view.
> The AP Style Manual, a/the leading source for US journalism
One of the things I can easily get away with by not being a US journalist :)
self-fulfilling
self—fulfilling
One of these looks very, very wrong.
Basically, we didn’t like some things in AP but we wanted to make it easy for journalists to copy/paste.
That’s one thing I really like about English: There’s no central authority decreeing what’s right and what’s wrong top down, and it feels like there is some room for individual preferences and experimentation.
Very refreshing, compared to e.g. German, which has more than one semi-official authority gate keeping “correctness” in speech and writing.
Which one does that? I threw up a little in my mouth and wish to avoid such style guides in the future!
It’s very common outside of America, even in English.
Are there cases where the dedicated hyphen (U+2010) is preferred over the hyphen-minus?
As far as appearance goes, almost all fonts I've looked at make U+2010 identical to U+002D (i.e., they don't put any 'minus' into the 'hyphen-minus'), but a few make U+2010 a smidgeon shorter.
It looks much better though and more visible: −1 vs -1. I wish hyphen was a separate symbol from the ascii start, or that monospace fonts didn't tend to shorten "-" cause it makes little sense in monospace anyway.
— In the context of automatic text processing, it unambiguously indicates the function of a hyphen, as opposed to a minus
— Fonts can choose to make the hyphen-minus a bit wider than a regular hyphen, to accommodate the usage as a minus sign. In that case, U+2010 would be typographically more appropriate for a hyphen, similar to how U+2212 usually is typographically more appropriate for a minus sign.
To me, it feels like it is the same purpose as the EM dashes.
And I discovered the EM with ChatGPT, I've never seen it before.
https://thenarrativearc.org/blog/2020/2/4/epic-grammar-battl...
They're frequently used in skilled and professional grade writing.
Here's an example sentence: Semicolons must have independent clauses—phrases that could form a full sentence on their own—on both sides of them; they are essentially alternatives for periods. Em dashes don't require independent clauses on either side.
In the italicized sentence,
* phrases that could form a full sentence on their own is not an independent clause but is valid between em dashes. on both sides of them, after the em dashes, is also not an independent clause. (The em dashes function like commas or parentheses here.)
* The parts before and after the semicolon are independent clauses. You could replace the semicolon with a period and you'd have perfectly valid grammar. I just chose to connect the two sentences a bit more.
I don't know if you can use em dashes as the parent comment describes, connecting three independent clauses:
* My favorite fruit is peaches—they are very sweet—I eat them all summer.
I think the above is wrong; it should be one of the following:
* My favorite fruit is peaches—they are very sweet—and I eat them all summer.: The last section is a dependent clause made by "and", not an independent clause.
* My favorite fruit is peaches—they are very sweet; I eat them all summer.: On both sides of the semicolon are independent clauses; I could replace the semicolon with a period.
Maybe there are examples I'm not thinking of? I infer that the rule might be that the punctuation following the em-dashed clauses should be the punctuation that would have been used without the em-dashed clause, but that's based on very limited evidence.
Semicolons are generally alternatives to periods, when you want more connection between the two sentences. Like periods, semicolons must have two full sentences—that is, what could be full sentences—on either side of them; the potential 'full sentences' are properly called independent clauses. (A dependent clause needs the rest of the sentence to form valid grammar; it can't function on its own. For example, in this paragraph's first sentence, when you want more connection between the two sentences is a dependent clause. Often they follow commas.)
Another use of semicolons is for lists in a paragraph where one of the list items has a comma in it (similar to the parsing problem for CSVs where some records contain commas): I only like wine; beer, but only ales; and orange juice.
Which can be fun when parsing CSV files from various sources. I've hit numbers with U+2010 or others where you would expect a hyphen-minus to be. Presumably someone² has copied a negative number from a document where one of the alternate symbols was used, and pasted it into everyone's favourite data-mangler¹ which interpreted it as a string, and so on down the chain.
--------
[1] Excel. Sometimes a joy, sometimes the bane of my existence.
[2] It is surprising, horrifying even, how much manual manipulation of data goes on in banking, where you might naturally assume everything is more automated these days. Sometimes a laborious manual process done regularly is seen as cheaper than paying for it to be automated…
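A defensive parser for this kind of input could look roughly like the Python sketch below. Python's `float()` rejects U+2212 and the other lookalikes, so they are mapped to hyphen-minus first. Which characters to treat as a minus (`MINUS_LOOKALIKES`) is an assumption you would tune to your own data, and `parse_number` is a made-up name.

```python
# Characters that sometimes stand in for a minus sign in pasted data.
# This selection is an assumption, not an exhaustive or standard list.
MINUS_LOOKALIKES = "\u2010\u2011\u2012\u2013\u2212"
_TO_ASCII_MINUS = str.maketrans({ch: "-" for ch in MINUS_LOOKALIKES})


def parse_number(field: str) -> float:
    """Parse a numeric field, tolerating non-ASCII minus lookalikes."""
    return float(field.strip().translate(_TO_ASCII_MINUS))


print(parse_number("\u22121.5"))   # field typed with a real minus sign
print(parse_number(" \u201042 "))  # field pasted with U+2010 HYPHEN
```

Anything that still fails to parse raises `ValueError`, which is probably what you want for data that has been through several hands.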
If you practice your skills, you will reap the rewards.
But, no, now it's a problem because the majority of people's experience with writing is graded essays. And because LLMs emulate professionals, it's now a red flag if students write too much like professionals. What a joke.
It's a hard one to answer: We could look at published Emily Dickinson books from the time, but did Dickinson really pay that close attention to or have that much control over the type?
We could look at Dickinson's actual personal documents, but if they were handwritten, distinguishing dashes could be difficult even if there was intention there.
I interpret her marks—
as breathless pauses—
that— having no unicode—
should be given to m—
and space—
This list of authors' punctuation quirks is interesting though.
https://lithub.com/the-punctuation-marks-loved-and-hated-by-...
However this is the kind of rule that "existed" for a while and most likely will go away as most people can't be bothered with the difference and it all looks similar anyway
Or maybe who knows, it will keep going on because chatgpt knows it
Re last paragraph: dashes, etc. are confusing for perhaps most of us who aren't, say, typesetters, myself included. I use EM dashes a lot usually without a space between words and sometimes with spaces when I think the typography calls for it—or for extra emphasis.
Essentially, most of us guess the rules and often this doesn't matter much but it can in certain circumstances.
For example, in, say, machine conversion/transliteration: the ASCII dash is often used as a substitute for the Unicode minus sign because it's easy to select [it's my usual practice], and anyway many don't know there is an actual difference. Whilst a human will usually know the difference by its use or context, a machine may take the literal interpretation, which could lead to, say, a numerical calculation error.
This problem has annoyed me for a long while. Why is it that wordprocessors and editors do not highlight these characters and query whether the usage is correct? Surely this ought not to be that difficult.
Another example is Roman numerals. The average person will enter say an uppercase 'I' for the Roman numeral one. Here's a typical example which is incorrect:
WWII
Here I entered the normal ASCII 'I' because it was too involved to find the correct Unicode character for Roman numeral one.
I'd like to know what others who are in typography, machine learning etc. think about this, and why WP programs and editors don't have simple ergonomics that allow for easy selection of the correct character.
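On the machine side of the Roman numeral example, Unicode compatibility normalization already bridges the two spellings: the dedicated character U+2161 ROMAN NUMERAL TWO folds to two ASCII capital I's under NFKC. A minimal Python sketch (the function name `fold_compat` is invented for this example):

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Apply NFKC so compatibility characters fold to their plain forms."""
    return unicodedata.normalize("NFKC", text)


dedicated = "WW\u2161"  # 'WW' followed by ROMAN NUMERAL TWO
print(len(dedicated), len(fold_compat(dedicated)))
print(fold_compat(dedicated) == "WWII")
```

Note that NFKC does not touch the hyphen-minus versus minus sign question; those two stay distinct after normalization.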
† On a related matter, you'll note I've used single quotes whereas mmooss uses double quotes. This tells me that mmooss is likely in the US whereas I'm not. Again, this is not really a major problem for humans but it can be in transliteration, etc. Also, it's unclear (at least to me) what the default is for quoting quotes, i.e.: "" versus "' (right, I've refrained from using triple quotes).
Again, this seems country specific with I believe the US favoring double followed by single. Even when these rules are defined do people strictly adhere to them?
One may have a bunch of key ranges, each associated with a value, or one may have a key that should be "rounded" to the nearest key, or retrieve the one below or above it.
It feels like something basic enough to have in a language and I found it oddly complicated to write myself. Comparing it with all values doesn't seem like a very good solution.
Not that I know many languages.
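For what it's worth, in Python this kind of lookup can be sketched with the standard `bisect` module instead of comparing against every key. The class `RangeMap` and its method names are invented for this example, and it assumes numeric keys so that "nearest" is well defined.

```python
import bisect


class RangeMap:
    """Sorted keys with values; look up by floor, ceiling, or nearest key."""

    def __init__(self, items):
        pairs = sorted(items, key=lambda kv: kv[0])
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def floor(self, key):
        """Value for the greatest stored key <= key, or None."""
        i = bisect.bisect_right(self._keys, key) - 1
        return self._values[i] if i >= 0 else None

    def ceiling(self, key):
        """Value for the smallest stored key >= key, or None."""
        i = bisect.bisect_left(self._keys, key)
        return self._values[i] if i < len(self._keys) else None

    def nearest(self, key):
        """Value for the closest stored key; ties go to the lower key."""
        i = bisect.bisect_left(self._keys, key)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self._keys)]
        if not candidates:
            return None
        best = min(candidates, key=lambda j: abs(self._keys[j] - key))
        return self._values[best]


bands = RangeMap([(0, "low"), (10, "mid"), (20, "high")])
print(bands.floor(15), bands.ceiling(15), bands.nearest(16))
```

Each lookup is a binary search, so it stays cheap even with many keys. Treating each key as the lower bound of a range, `floor` is the "which range am I in" query.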
ー
Absolutely proper and correct use of em dashes, en dashes, and hyphens is, to me, the most obvious tell of the LLM writer. In fact, I think that you can use it to date internet writing in general. For it seems to me that real em dashes were uncommon pre-2022.
Of course, I guess it's entirely possible—even accounting for OS—that this test remains statistically useful. It makes me kinda sad that my (very much human-generated) writing fails the Turing test....
----
[2] Though having checked just now, the sequences for en-dash and em-dash don't seem to be working. Perhaps one of my custom macros is interfering somehow… (it is behaving overall, ellipsis just worked as did the following diacritic and other symbols: áèîöūñ±⁰¹²∞¡¿‽π⬚). I'll have to poke at it later and see what is awry.
(Windows probably has some way, but those are rarely discoverable.)
1. Install and configure this extra tool, which also by default enables a ton of other things you may not want, and may as well be a third-party tool even though it's technically built by Microsoft
2. Do a Google search and copy-paste (!)
3. Use a keyboard shortcut to bring up a symbol picker, then click on the tab containing the en and em dashes, then click to type them in
I mean, come on.
Released back in 2019 for Windows 10.
-- into —
If OP wrote their post on an iPhone, they would have inadvertently appeared as an LLM by their own test.
It is maddening that the whole world uses typewriter keyboards with some facelift in the era of Unicode and even blasphemous full color emoji font rendering. What has changed in decades? Windows logo key, power keys, media keys, IE and Outlook logo keys — all Microsoft's fancies.
So initially IBM made some ad hoc decisions on what keys would be suitable for a single user office computer (as opposed to data input and admin terminals they had). Then everyone copied that, because sending unexpected scan codes could lead to bad things (random BIOS and program code couldn't care less about your ideas of forward compatibility). Then Windows became the “basic system” installed on most computers. Microsoft really pushed forward the internationalisation at the time, making a lot of national layouts and code pages (sometimes contradicting the national standards, for better or for worse). Then everyone copied what they decided. What's more important, even single byte code pages had the basic typographic symbols, anyone could've been using them for three decades, but they were not added to most physical keyboard layouts.
I wonder if that was because they wanted Word to seem more sophisticated than it was, and to make people think it was a requirement for “proper documents”, or because programmers still treated all non-ASCII symbols as free data markup constants that would “never appear in a regular text”.
(edit: apparently only on Mac, see reply below)
You just type two hyphens (--) and Word will convert it to an em dash.
Typing <word><hyphenminus><hyphenminus><word><space> yields an em dash.
Typing <word><space><hyphenminus><hyphenminus><space><word><space> yields an en dash.
That this has been true for some 3 or 4 decades makes me doubt all the comments that em dashes are a "tell" of LLM authorship. On the other hand, I guess when we confine this possibility to web content, I can see how people haven't used Office for web authoring lately, and whatever they do use (like web-based content management systems) doesn't tend to have this feature.
More importantly, typing just a single hyphen minus in this constellation triggers the autoreplace, too. (Typing the double hyphen is only necessary without spaces in order to distinguish between an intentional hyphen and an em dash.)
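The three rules described above can be approximated on finished text with a couple of regular expressions. This Python sketch is only an imitation of the observed behaviour, not Word's actual autoformat logic (which fires while typing), and `autoformat_dashes` is a made-up name.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def autoformat_dashes(text: str) -> str:
    """Imitate the described replacements on already-typed text."""
    # 'word--word' with no spaces becomes an em dash.
    text = re.sub(r"(?<=\w)--(?=\w)", EM_DASH, text)
    # 'word -- word' or 'word - word' becomes a spaced en dash.
    text = re.sub(r"(?<=\w) --? (?=\w)", f" {EN_DASH} ", text)
    return text


print(autoformat_dashes("foo--bar"))
print(autoformat_dashes("foo -- bar and baz - qux"))
```

One caveat of this sketch: a spaced subtraction such as "5 - 2" also gets an en dash, which is the kind of ambiguity this thread keeps running into.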
From TFA:
> August 1–August 31
From a top comment:
> Boston–San Francisco flight, 10–20 years
To achieve this using the replacement feature we're talking about would take something like <word><space><hyphenminus><space><word><space><alt+leftarrow><bksp><leftarrow><bksp><alt+rightarrow> which is ridiculous.
In professional typesetting, like a book, I sometimes see spaces flanking an em dash, however.
I almost never want that, and when typing "space, en dash, space", it happens quite easily and is usually impossible to tell visually.
In any case: non-breaking space (or otherwise suppressed linebreak, if you don’t use a space) is the rule.
I also personally prefer en dashes, surrounded by whitespace on both sides, over em dashes. Apparently some WYSIWYG software interprets two hyphens as an em dash, while others will interpret that as an en dash, so I'd rather just use the real thing if possible to avoid the ambiguity.
<Multi_key> <minus> <minus> <period> : "–" U2013 # EN DASH
<Multi_key> <minus> <minus> <minus> : "—" U2014 # EM DASH
More in /usr/share/X11/locale/en_US.UTF-8/Compose
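If you'd rather search that file than scroll it, entries of the shape shown above are easy to pick apart. A minimal Python sketch, assuming lines follow the `<key> <key> ... : "output"` layout of the two examples; the regex and the name `parse_compose_line` are my own, and escape sequences inside the quoted output are left as-is rather than decoded.

```python
import re

# One or more <keysym> tokens, a colon, then a double-quoted output string.
COMPOSE_LINE = re.compile(r'^((?:<\w+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')


def parse_compose_line(line: str):
    """Return (list of key names, output string), or None if no match."""
    m = COMPOSE_LINE.match(line.strip())
    if not m:
        return None
    keys = re.findall(r"<(\w+)>", m.group(1))
    return keys, m.group(2)


sample = '<Multi_key> <minus> <minus> <minus> : "\u2014" U2014 # EM DASH'
print(parse_compose_line(sample))
```

Running it over every line of the Compose file and filtering on the output character would give all the sequences that produce a given symbol.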
(Side note: GPT says apostrophes should be used for pluralizing only for single letters to avoid confusion, but this seems more readable than "ens and ems" IMO.)
I was able to demonstrate my long use of them, prior to LLMs. And since I write in quarto markdown I don't need keyboard shortcuts.
In your support, though, calling the extension “smartypants” really hints at the target audience :)
I don't doubt there are publishing platforms that do it automatically as well, so I wouldn't count on seeing them as an indicator of generated output, even if it may be processed in some manner.
Money quote:
This issue does indeed have a history of provoking unhinged lunacy.
Serbian and Croatian XKB keyboard layouts have had em- and en-dashes since the early 2000s even if they were not standardized: AltGr (right Alt) + hyphen (to the left of right Shift) produces an em-dash, and press Shift on top, and you get an en-dash.
This is how long I've had them easily accessible on any keyboard (I even have them converted to MacOS keyboard layouts for use with Karabiner).
If you use eg. a Japanese IME, you can also get it by typing a normal hyphen and selecting the em dash from the picker.
Imagine being an NPC (a human bot), flattering yourself with the thought that people who understand the language are language bots...
1: https://www.thenationalliteracyinstitute.com/post/literacy-s...
1: https://en.wikipedia.org/wiki/Literacy_in_the_United_States
See, e.g., Boss Szabo's blog: https://unenumerated.blogspot.com/2018/03/the-many-tradition...
Two chained hyphens, as was pretty much the norm back then.
And did you just call me an NPC?!? It's not a matter of "understanding the language" at all. It's a matter of convenience and of a sort of evolved convention.
I, of course, used proper dashes in typeset documents, at least after I'd learnt about them in Knuth's The TeXbook. I have found myself occasionally using them in ASCII contexts just as ---. But I've never sought out the proper Unicode character.
Compose--- produces —
Compose--. produces –
Lots of other characters like áăǎ°±€ are available through compose: https://whynothugo.nl/journal/2024/07/12/typing-non-english-...
> Absolutely proper and correct use of em dashes, en dashes, and hyphens is, to me, the most obvious tell of the LLM writer.
Or just someone who likes to use the right characters. There was a report a few months back about how writing from autistic kids keeps getting mislabelled as LLM simply because they use the correct specific terms.
Please stop associating being precise with being an LLM.
option-[-] for en dash –
shift-option-[-] for em dash —
I write poems a fair bit and use em dash a lot. (maybe too much and incorrectly)
Not sure when I started; my guess is that I got into the habit of using them in LaTeX when writing my thesis, and then at some point realized that they are easily reachable on standard macOS keyboard layouts (via "option" + "-").
On my Linux laptop, I confess to manually Googling them every time.
Hyphen -: -
En Dash –: alt -
Em Dash —: alt shift -
They have shortcuts for Í, Î, and Ï but not for many commonly used characters like arrows
¹ https://software.sil.org/ukelele/
² https://codeberg.org/datatravelandexperiments/kps-keyboard-l...
sigh
Thanks for the suggestions
You can remap Fn/Globe directly to it if you want. It's also accessible from the Input menu bar item if you show that.
> Both of those open the same GUI which combines emojis, stickers, and unicode symbols—preferring the first two categories over the last. To type out a unicode symbol it takes at least three clicks on top of me starting to type in the name of my symbol
Are you using the expanded Character Viewer window[0], or the default collapsed Emoji & Symbols pane[1]? Because the expanded Character Viewer lets you customise and reorder the categories[2] (though that doesn't affect search), including adding a full Unicode view[3]. And they both default to the search bar when opened (though the Character Viewer opens unfocused for some reason).
[0]: https://imgur.com/hTtrbcA
[1]: https://imgur.com/3L31DQu
Examples:
en and em are on -
Below are maybe Swiss specific?
~ is on N
@ is on G
| and \ and / are on 7
√ is on V
¥ is on Y and € is on E
∑ on W ( ∑ is a rotated W :)
etc.
I appreciate that the designer of the layout clearly attempted to make some kind of mnemonic connection to the degree they could. Makes it easier to discover and remember the key-combos, even without a cheat sheet.
modifiers:
opt-e+letter é (acute/aigu)
opt-`+letter è (grave)
opt-i+letter û (circumflex)
opt-u+letter ü (umlaut)
opt-n+letter ñ (for the mañana)
thank you for teaching me √
For other combos — see /usr/share/X11/locale/en_US.UTF-8/Compose
See also: System Settings > Keyboard > Key Bindings > Position of Compose key
Or, I mean, it does SOMETHING. I've never checked, and just always assumed I was getting the em dash.
I mean if it's an obvious break from their normal style, sure. But by itself? Every time I hear this argument, it just seems like sour grapes from poor writers.
It's not wrong for en-dashes (an en-dash set open—with space on either side—is generally an alternative to an em-dash set closed.) And it's not wrong on the trailing side of an em-dash used in dialogue to show an abrupt stop mid-sentence if the stop is followed by a new sentence. And there's a few other particular uses, but, generally, setting an em-dash open is wrong.
> but newspapers often space them.
I've never seen a newspaper set em-dashes open, but I have seen them use en-dashes set open instead of using em-dashes at all. Given the space premium in print newspapers, em-dashes set open, which would consume enormous horizontal space, would, other concerns aside, be an odd choice.
It's not like you can reliably write these consistently by hand either without going over the top in length to make it extremely obvious.
-5--2°C
post-war-pre-digital era
See sections 10-O-15-Q
Try Our New York-London Flight Connection!
post-war - pre-digital era (not a sentence any sane person would use anyway).
See sections 10-O - 15-Q
Try our New York-London flight connection! (no kind of dash clears this one up without fixing capitalisation).
Try Our New York–London Flight Connection.
Or if it was New York:
Try Our New York – London Flight Connection.
Note the additional spaces. Agree on the capitalization though.
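That convention (closed en dash between single-word endpoints, spaced en dash when an endpoint itself contains a space) is simple enough to automate. A minimal Python sketch, with `format_range` as an invented name and the spacing rule taken from the examples just above rather than from any particular style guide:

```python
EN_DASH = "\u2013"


def format_range(start: str, end: str) -> str:
    """Join endpoints with an en dash, spaced if either contains a space."""
    if " " in start or " " in end:
        return f"{start} {EN_DASH} {end}"
    return f"{start}{EN_DASH}{end}"


print(format_range("10", "20"))
print(format_range("New York", "London"))
print(format_range("10-O", "15-Q"))
```

Hyphens inside an endpoint are left untouched, so the section example keeps its internal hyphens and only the separator changes.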
I'd wager serious money that if you put that on a sign and surveyed people, at least in the US, they'd all still conclude it is a "New York" to "London" flight.
What's the use of a communication tool, if it doesn't actually communicate anything to real people?
-5—2
That looks like dogshit.
It's a mistake in the first place to decide to use only dashes and no spaces to convey all of this lol
-5 - 2 (Everyone knows a sign has no space - if you are building your sign for idiots try some of these:)
-5 > 2 -5->2 -5 <-> 2 -5 to 2 -5...2 Between -5 and 2
blah blah blah
Rather, seeing too short of a dash is like putting two clashing colors together or wearing two pieces of clothes that don’t match. It just looks instantly off.
It’s just not aesthetically pleasing for me.
And your example shows how you can just use multiple dashes instead of having three different ones.
E.g. some English language rule says that a comma or ending period of a non-quoted sentence goes inside the quotes if there's something quoted at the end of that sentence. That rule feels anti-intellectual to me, as if there's some misunderstanding of how hierarchical placement in one-dimensional space works (since something that's not being quoted is being put inside quotes)
However, em dashes are a different case. The main reason why it's desirable to use em dashes (beside convention) is for clarity of purpose. The hyphen is already a very overloaded character; they're extensively used to denote ranges and link compound words. Importantly, both of those usages do not correspond to pauses in spoken language. If you're voicing a hyphen you're supposed to barrel on through it. An em dash is much closer to a parenthesis, comma, or semicolon. It's a meaningful break in the sentence, in the way that a hyphen isn't.
Now, if it were up to me I'd choose a different character to replace em dashes (maybe underscores), but that's a separate argument.
-proud dash luddite
> Dashes are used inside parentheses, and vice versa, to indicate parenthetical material within parenthetical material. ...
> The bakery’s reputation for scrumptious goods (ambrosial, even—each item was surely fit for gods) spread far and wide.
I wish it was more popular, it neatly indicates meaning so very well.
I agree with you completely.
There are cases when you want to follow certain guidelines, for sure. If you write for a publication that adheres to Merriam-Webster, you'd better stay consistent and figure out the right AltGr code to type the right dashes. However, for the 99.99% of written media today, none of that matters.
"I have too many water in the cup."
"How much people are in attendance?"
These sound obviously incorrect.
This is also true of "less" and "fewer". I use "less" everywhere.
Getting "much" and "many" right is completely different. They mean different things. Confusing them makes you sound stupid. Less vs fewer is the same. It often doesn't matter but in some cases it really grates on the ears (eg "there wasnt much people there" just sounds awful).
Dashes are not in the same category. They are orthographical conventions. They aren't really grammar. They are more like spelling. You can spell things wrong and say it doesn't matter because spelling is arbitrary and you can use the wrong dashes too, but it makes you look either uncaring or ignorant. If you want to give a good first impression, learn the basic conventions of written English and follow them.
i really like using em dashes -- for some reason, it feels "better" in my head than using something like a comma or a semi-colon.
https://en.wikipedia.org/wiki/Dash#Approximating_the_em_dash...
Suit yourself, but if you refuse to learn basic grammar you will be treated like you are stupid and uneducated. Like it or not, presentation matters. Getting the basics right, including things like spelling, grammar, etc, shows a basic attention to detail without which your services will likely do more harm than good.
actually it's "etc."
(I wouldn't usually be a pedant, but if you think the difference between "--" and "—" matters, you should probably try to get the basics right too.)
https://www.merriam-webster.com/dictionary/etc even redirects to the correct URL with a "."
This is all stuff you learn in school. Punctuation isn't obscure or niche. You may not have learnt about semicolons or em dashes in school but you should have and I did. As did anyone that has ever read a novel. There are two semicolons on the first page of the first Harry Potter book, a novel read by approximately every child of my generation. There are loads of examples of the proper use of dashes and other "obscure" punctuation marks in any professionally typeset text.
I was raised and educated in Africa, specifically the GCSE curriculum. I was taught to use etc.
"The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for use with the best text faces. Like the oversized space between sentences, it belongs to the padded and corseted aesthetic of Victorian typography.
"Used as a phrase marker – thus – the en dash is set with a normal word space either side."
¹https://archive.org/details/isbn_9780881791327/page/80/mode/...
And I totally agree, space-set en dashes are vastly superior to em. I dislike the way it connects the word more closely to the word in the next clause than the phrase itself.
E.g. He left—no explanation. Vs. He left – no explanation.
To me, left—no feels more like a weird gluing together than a separator for a different section.
Personally, I think en dashes are too small and look like a mistaken use of a hyphen. I really only use them in their Chicago Manual of Style recommended uses like date ranges.
But I agree that em dashes without spaces around them look wrong. They glue the adjoining words together when the whole point is that the clause is secondary and should be set aside from the surrounding text.
I ended up using em dashes with a little blob of CSS to put a tiny amount of space on either side.
"Used as a phrase marker—thus—the em dash is set without normal word spaces."
>the em dash is too long for use
above, the em-dash without spaces is smaller, at least in this typeface
I've taken to using dash offsets—just as an aside—in many places where I formerly used parentheses; I find it "less interrupts" the flow of the sentence.
* Use the minus sign /−/ (U+2212) when formatting numbers, because the default hyphen-minus /-/ (U+2D) just looks wrong: "It is −1 °C vs. -1 °C." Moreover, the correct minus has the same width as plus (− vs. +).
* Rare, but use the figure dash /‒/ (U+2012) or figure space / / (U+2007) if you need a placeholder character that is the same width as a single digit. For example, "Guess the PIN: 1‒34."
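A minimal sketch of the first bullet, assuming you format numbers in Python: build the string from the absolute value and prepend U+2212 yourself instead of relying on the default hyphen-minus. The helper name `format_signed` and the default unit are hypothetical, chosen only for illustration.

```python
MINUS_SIGN = "\u2212"  # the dedicated minus sign, not the ASCII hyphen-minus


def format_signed(value: int, unit: str = "°C") -> str:
    """Format an integer with a typographic minus sign and a unit.

    Hypothetical helper for illustration, not part of any library.
    """
    sign = MINUS_SIGN if value < 0 else ""
    return f"{sign}{abs(value)} {unit}"


print(format_signed(-1))
print(format_signed(21))
```

The same idea works for floats or locale-aware formatting; the only point is that the sign character is chosen explicitly.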
ChatGPT, for example, almost always uses them. I'm sure they are more common in academic writing, but it's now super common on boards like Reddit.
Before LLMs, I think em-dashes mostly signaled that you read books and paid attention to details, to the extent they signaled anything.
We might even be entering some waves of counter-signaling.
[1] They'll never totally nail all of DFW's mannerisms, though.
The slight ambiguity if you don’t do that now irks me, having seen a way to eliminate it.
:)
When looking at the context of a given text, the use of certain words or punctuation can very well indicate AI use.
The "original" example was delve. There is no doubt that AI (did, or still does) use this word at a significantly higher frequency than the average person. I would say the same about em dashes.
When browsing a Reddit thread about a video game, if you encounter numerous comments written perfectly, especially those containing indicators like em dashes, the word delve, or similar language, it certainly can raise the question: am I genuinely seeing comments from users who write this way in this specific context, or is this content more likely produced by an LLM?
No, I learned about em dashes in school, I just literally don't know how to type them on my keyboard and I'm too stubborn to learn how to.
Only typography nerds and professional printers care about things like these. Popular media, even modern professional media, hasn't been paying all that much attention.
same thing happened with “delve” — these are just words and grammar, people use them
there is no accurate way to tell whether text came out of a neural network or not
Whenever I see that at the start of a paragraph I know that there's an 80% chance it was written by Gemini.
To what extent that distinction matters, I'm not sure.
That's not to say that generated content doesn't use them, just that using them as an indicator might require a bit of nuance based on where you're seeing them.
I wonder if it's a more recent phenomenon.
Eventually, people learn to include them out of habit—especially as most people see them as aesthetically nicer than a simple hyphen (-).
[1] https://www.chicagomanualofstyle.org/qanda/data/faq/topics/H...
Not all though. Many people on HN use em-dashes and other proper punctuation.
If the em dash has spaces around it -- as seen in AP style -- it was probably written by a real human, because that's how it comes out most conveniently on a word processor.
But if the em dash has no spaces around it--Chicago style--there's a good chance you're looking at LLM slop.
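If you want to see which spacing convention a piece of text actually follows, a rough count is easy to sketch. The helper below (`dash_spacing` is a hypothetical name) only tallies whether each em dash, or its "--" approximation, has whitespace on both sides, neither side, or one side. It describes spacing and nothing more; it says nothing reliable about who or what wrote the text.

```python
import re

# An em dash (U+2014) or its common two-hyphen approximation.
DASH = re.compile("\u2014|--")


def dash_spacing(text: str) -> dict[str, int]:
    """Count dashes by whether whitespace surrounds them.

    Hypothetical helper for illustration only.
    """
    counts = {"spaced": 0, "unspaced": 0, "mixed": 0}
    for match in DASH.finditer(text):
        # Slicing (rather than indexing) yields "" at the edges of the text.
        before = text[max(match.start() - 1, 0):match.start()]
        after = text[match.end():match.end() + 1]
        if before.isspace() and after.isspace():
            counts["spaced"] += 1
        elif not before.isspace() and not after.isspace():
            counts["unspaced"] += 1
        else:
            counts["mixed"] += 1
    return counts


sample = "spaces around it -- as seen in AP style -- versus none--Chicago style--here"
print(dash_spacing(sample))
```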
Just makes me roll my eyes really seeing a human use an em-dash. We're in the age of informality, and at least for me personally I've definitely filed the em-dash away as "a near guarantee the text was written by a machine". No matter how much and perhaps especially because HN commentators are coming out of the woodwork to insist they've been using it daily for years.
At least we have dedicated O/0, and l/1 keys now. But we still see a lot of "straight" quotes instead of “those smart quotes Microsoft Word likes to generate”. And dashes. Did you know there is a dedicated ellipsis character? This is often set with slightly more space between dots than ..., and it by definition never wraps across a line between those dots. You still see (C) instead of ©.
It is one of those things that doesn’t really matter for readability, but although they can’t necessarily put a finger on why, people may still notice that some documents or pages appear to be set with more care for details than others.
(edit: I guess if you don’t have to search on Google what the hell a ‘Microsoft Word’ is, then you’re officially old)
And the 1 and 8 aren't next to each other anymore, either. (See typewriters from the "18"00s.)
> those smart quotes
Fixing straight quotes is a hard problem[0]. My FOSS text editor, KeenWrite[1], includes my library, KeenQuotes[2], for replacing them at build time. It's not perfect, but can typeset my ~400 page novel without any errors.
> Did you know there is a dedicated ellipsis character?
Yes! Here's where it gets parsed:
https://gitlab.com/DaveJarvis/KeenQuotes/-/blob/main/src/mai...
Then emitted:
https://gitlab.com/DaveJarvis/KeenQuotes/-/blob/main/src/mai...
Then transformed into an HTML entity:
https://gitlab.com/DaveJarvis/KeenQuotes/-/blob/main/src/mai...
When typesetting Markdown, KeenWrite first converts the document to XHTML (i.e., XML), then invokes ConTeXt to convert XML into TeX macros. One of those macros handles the ellipsis by converting it to \dots{}:
https://gitlab.com/DaveJarvis/keenwrite-themes/-/blob/main/x...
This renders as the Unicode character in the final document: …
> set with more care for details
Some of us old folks care about these details. ;-)
Who omits the 1 from the second number?! That is awful!
You write pages 1,003–4, instead of typing out 1,003–1,004 which is just unnecessary.
Works the same with two digits, or even three: pp. 1,899–902.
This is standard practice and arguably clearer.
I've only ever seen it done with page ranges, though. I'm not sure if it's done with year ranges? E.g. 1984–5? Or 1989–92? You work with page ranges constantly in academia, I just don't see year ranges much in any form.
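For anyone who has to process such ranges automatically, the abbreviated form can be expanded mechanically. This is a sketch under stated assumptions: the range uses an en dash (U+2013), commas are thousands separators, and the missing leading digits of the second number are borrowed from the first. The function name `expand_range` is hypothetical.

```python
EN_DASH = "\u2013"


def expand_range(text: str) -> tuple[int, int]:
    """Expand an abbreviated range such as '1,003–4' into (1003, 1004).

    Hypothetical helper; assumes an en dash separator and comma grouping.
    """
    start_text, end_text = text.replace(",", "").split(EN_DASH)
    start = int(start_text)
    if len(end_text) >= len(start_text):
        # The end is already written out in full.
        return start, int(end_text)
    # The end shows only its trailing digits; borrow the rest from the start.
    prefix = start_text[: len(start_text) - len(end_text)]
    end = int(prefix + end_text)
    if end < start:
        raise ValueError(f"ambiguous or reversed range: {text!r}")
    return start, end


for example in ("1,003\u20134", "1,899\u2013902", "1989\u201392"):
    print(example, "->", expand_range(example))
```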
In speech, it's common, and misunderstandings are usually not a problem (if you're not monologuing on a recording) because someone will just ask; but in writing it looks like the range is the wrong way around. Maybe I expect more care in writing because the feedback loop is longer, or maybe it's just habit and I think it's wrong in writing because I never see it?
Quick, tell me how wide this range is, just as an order of magnitude:
285368737954–285368783645
Would be a lot easier if I only included the range at the end which had actually changed, wouldn't it?
That's why it's clearer. Now obviously that was an extreme example, but it's also easier to see at a glance that 1,387–9 is just three pages, as opposed to 1,387–1,389.
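The shortening rule itself is simple to state in code: keep the first number whole and drop the leading digits of the second number that are identical. A minimal sketch, with the hypothetical name `abbreviate_range` and no thousands separators in the output:

```python
EN_DASH = "\u2013"


def abbreviate_range(start: int, end: int) -> str:
    """Write 'start–end', keeping only the digits of end that changed.

    Hypothetical helper for illustration; expects 0 <= start <= end.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    s, e = str(start), str(end)
    if len(s) != len(e):
        # Different lengths: nothing can be safely dropped.
        return f"{s}{EN_DASH}{e}"
    # Skip the leading digits both numbers share.
    i = 0
    while i < len(s) and s[i] == e[i]:
        i += 1
    if i == len(s):
        return s  # start and end are the same page
    return f"{s}{EN_DASH}{e[i:]}"


print(abbreviate_range(1387, 1389))
print(abbreviate_range(285368737954, 285368783645))
print("pages covered:", 1389 - 1387 + 1)
```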
That's a change of about 50K, which isn't really that hard to notice.
"285368737954-83645" is... well I have to assume somewhere in the 10-100K range? Hold on a second while I line up the digits again... uh... let me rewrite that to "37,954 - 83,645", okay now I can read it. No, that wasn't any easier. I kept getting lost tracking where in the first number I was leaving off. Much easier to compare 737 vs 783 - digit groupings are really useful!
(I'll agree that 1387-9 is pretty reasonable, it just breaks down the longer the number is. Also, if the page count is important, you can just say "1387-1389 (3 pages)". This feels like the sort of shorthand you used to get on Twitter)
83645 is five digits, so certainly in the ~10,000 range.
Genuinely trying to think of an example, since e.g. books aren't ever that long and search results don't have that many pages (that you'd all read and refer back to). A salary range, perhaps, can get into the seven digits in extreme cases (not that you care about any individual digit when you make a lifetime's worth of money in a bit more than a year): "Prospective salary is 2'423'000 to 2'432'000" seems to convey the relevant info as well as "Prospective salary is 2'423'000 to 9'000" does (except that I wouldn't understand the latter and ask what this second number means, but that's plausibly attributable to me as an individual not being used to it)
EDIT: I saw your explanation below, and you make a very good point.
Result:
> print pages in range from: 1, 003
> print pages in range to 4
Now I have two errors to fix: page 1003 to page 1004. Not nice. Who formats like this?!
-------------------
Also, some RPG books or encyclopedias I own have chapters that span like this:
p. 630 to p. 70 (book 2)
To me, it is now unclear: is that 70 with a reset page count, or 670 for book 2?
Since I just now learned that a quotation standard somewhere outside Germany exists that omits leading numbers, I now need to manually check where it ends.
TL;DR:
Don't make me think, and allow for automation. So just write one more number.
Yes, every time. The clarity for the reader is more important than the time I save by leaving out '12'.
literally yes
The perfect way to surround with hairspace.
Must be lonely at the top.
No, it is not “politically incorrect” to call people lacking curiosity and/or education like you see them.
No, someone's personal preferences or transitory fashions are not automatically promoted to the holy reference for the whole world.
If the em dash indicates an interruption (not a planned pause) of the actual speech, the em dashes go inside the quotes (often just one, before the closing quote).
If the em dash is the narrator interjecting with additional information, the em dashes go outside the quotes.
Besides this, the question of where to put spaces when multiple forms of punctuation are combined can be quite a complex topic.
So isMorePleasantToRead, is_more_pleasant_to_read or is·more·pleasant·to·read is up to you.
At least from the point of view of digital gymnastics, it’s not really any worse than camel or snake case, though direct access to the dash could be said to make input slightly easier in kebab case.
So it really depends on the keyboard layout used (or whatever input device facility is used). What’s your favorite input method lately? Does it really not provide a convenient way to input more than the visible ASCII glyphs?
Plus, let’s be honest, identifiers are generally written in full expanse only once, then autocompletion is going to do it for us. And we all know we spend more time reading identifiers than declaring new ones.
python3 -c "some·identifier = 0; print(some·identifier)"
C echo -e '#include <stdio.h>\nint main() { int some·identifier = 0; printf("%d", some·identifier); return 0; }' | gcc -x c -o temp - && ./temp
C++ echo '#include <iostream>\nint main() { int some·identifier = 0; std::cout << some·identifier; return 0; }' | g++ -x c++ -o temp - && ./temp
Ruby ruby -e 'some·identifier = 0; puts some·identifier'
Javascript node -e 'let some·identifier = 0; console.log(some·identifier);'
Rust echo 'fn main() { let some·identifier = 0; println!("{}", some·identifier); }' > temp.rs && rustc temp.rs && ./temp
Go throws: invalid character U+00B7 '·' in identifier
Java throws: error: illegal character: '\u00b7'
C# is really annoyed with it apparently:
echo 'using System; class Program { static void Main() { int some·identifier = 0; Console.WriteLine(some·identifier); } }' > Program.cs && mcs Program.cs && mono Program.exe
Program.cs(1,60): error CS1056: Unexpected character `·'
Program.cs(1,60): error CS1525: Unexpected symbol `identifier', expecting `,', `;', or `='
Program.cs(1,99): error CS1056: Unexpected character `·'
Program.cs(1,99): error CS1525: Unexpected symbol `identifier'
That’s it for the top of the TIOBE index that I tested for this message.
MIDDLE DOT is Other_ID_Continue
I know less about the other languages but it wouldn't surprise me if they did similar things.
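Other_ID_Continue marks characters that may appear inside an identifier but not at its start. As a quick check in Python (whose identifier rules follow the Unicode identifier properties), `str.isidentifier()` shows the asymmetry. The helper name `identifier_roles` is hypothetical.

```python
import unicodedata


def identifier_roles(char: str) -> tuple[bool, bool]:
    """Return (can_start, can_continue) for a single character.

    Hypothetical helper: it asks Python itself via str.isidentifier().
    """
    can_start = char.isidentifier()
    can_continue = ("a" + char).isidentifier()
    return can_start, can_continue


for char in ("\u00b7", "_", "-"):
    name = unicodedata.name(char)
    print(f"U+{ord(char):04X} {name}: {identifier_roles(char)}")

# The middle dot is accepted inside a name, but not as its first character.
print("some\u00b7identifier".isidentifier())
print("\u00b7identifier".isidentifier())
```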
Writers adore their em dashes. While they can sometimes clarify a concept by adding more context, overusing them can hurt readability. I prefer to read Hemingway-esque sentences that just say what they want to say and end sharply. So that’s how I write too—and sometimes the overuse of em dashes directly conflicts with that, making the content sound as if the author is confused about what they wanted to convey.
FWIW, you can type an em dash on Mac with shift + option + hyphen.
That said, I don't even think you need the [shift] for em dash on Mac – just [option] + [hyphen] works for me.
on macOS:
- - => - (hyphen/minus)
- ⌥ - => – (en dash)
- ⇧ ⌥ - => — (em dash)
There are so many of these convenient typographical shortcuts that a long time ago I made Apple layouts for Windows and Linux.
And many are mnemonic too, like:
- of course ÷ (division) is ⌥ / (slash, which is poor man's division)
- of course ¿ is ⇧ ⌥ / because ⇧ / is ? so logically ⇧ ⌥ / is ⌥ ? which is ¿
- guess what ≤ ≥ ± ≠ are
- ¬ (logical negation) is ⌥ L because it's a L sideways
- £ (pound) is ⌥ 3 because ⇧ 3 is # (octothorpe, abused as sharp or pound - the other kind)
It never occurred to me that doing this correctly might make people think I use LLMs in my writing.
Edit: I'm sure the many typos protect me from that, actually.
How is a literal dictionary making fun of people who "wanna be official about things" lol. That's the entire basis for dictionaries themselves
In this case, they are calling out the prescriptivist definition but are implying that it may be overkill and offering the more commonly used alternative.
Personally, I am fond of using either a hair space or a thin space before and after the em dash. Not a full space!
To explore the various options, I wrote a little program to print the various combinations of dashes and spaces. I think what looks best depends a lot on what typeface you're using. But let's see how they look in the Verdana font used here. You should be able to paste this into your favorite word processor to see it in other fonts:
ASCII 0x2D hyphen-with no spaces
ASCII 0x2D hyphen - with U+200A hair spaces
ASCII 0x2D hyphen - with U+2009 thin spaces
ASCII 0x2D hyphen - with 0x20 full spaces
Unicode U+2010 hyphen‐with no spaces
Unicode U+2010 hyphen ‐ with U+200A hair spaces
Unicode U+2010 hyphen ‐ with U+2009 thin spaces
Unicode U+2010 hyphen ‐ with 0x20 full spaces
Unicode U+2013 en dash–with no spaces
Unicode U+2013 en dash – with U+200A hair spaces
Unicode U+2013 en dash – with U+2009 thin spaces
Unicode U+2013 en dash – with 0x20 full spaces
Unicode U+2014 em dash—with no spaces
Unicode U+2014 em dash — with U+200A hair spaces
Unicode U+2014 em dash — with U+2009 thin spaces
Unicode U+2014 em dash — with 0x20 full spaces
It looks like HN is really mangling this. Hair spaces are rendered wider than thin spaces?
If anyone wants to experiment, here is the Python code:
from dataclasses import dataclass

@dataclass
class Character:
    char: str
    name: str

DASHES = [
    Character( "-", "ASCII 0x2D hyphen" ),
    Character( "\u2010", "Unicode U+2010 hyphen" ),
    Character( "\u2013", "Unicode U+2013 en dash" ),
    Character( "\u2014", "Unicode U+2014 em dash" ),
]

SPACES = [
    Character( "", "no" ),
    Character( "\u200A", "U+200A hair" ),
    Character( "\u2009", "U+2009 thin" ),
    Character( "\x20", "0x20 full" ),
]

for dash in DASHES:
    for space in SPACES:
        print( f"{dash.name}{space.char}{dash.char}{space.char}with {space.name} spaces\n" )
And it shouldn’t be hard for an LLM to learn to use proper symbols when synthesizing content from the everyman. It’s not like it works on the level of literal copy and paste.
Yeah that's a big problem with LLMs at the moment - people are having an argument online, and one of them posts a piece of text from an LLM and says "see, Grok/ChatGPT/Claude/Gemini agrees! You're wrong!", which only reveals how much they did not do their homework and that they're arguing on the basis of hearsay, just like the LLM; the LLM has no mind of its own. If 90% of the internet says A, it will defend A to the death. What's not part of the equation is how much of that 90% has the same source, financed by a certain group.
We have a LOT of work to do soon.
0 0 000048 48 H LATIN CAPITAL LETTER H
1 1 00006F 6F o LATIN SMALL LETTER O
2 2 000077 77 w LATIN SMALL LETTER W
3 3 000020 20 SPACE
4 4 000074 74 t LATIN SMALL LETTER T
5 5 00006F 6F o LATIN SMALL LETTER O
6 6 000020 20 SPACE
7 7 000055 55 U LATIN CAPITAL LETTER U
8 8 000073 73 s LATIN SMALL LETTER S
9 9 000065 65 e LATIN SMALL LETTER E
10 10 000020 20 SPACE
11 11 000045 45 E LATIN CAPITAL LETTER E
12 12 00006D 6D m LATIN SMALL LETTER M
13 13 000020 20 SPACE
14 14 000044 44 D LATIN CAPITAL LETTER D
15 15 000061 61 a LATIN SMALL LETTER A
16 16 000073 73 s LATIN SMALL LETTER S
17 17 000068 68 h LATIN SMALL LETTER H
18 18 000065 65 e LATIN SMALL LETTER E
19 19 000073 73 s LATIN SMALL LETTER S
20 20 000020 20 SPACE
21 21 000028 28 ( LEFT PARENTHESIS
22 22 002013 E2 80 93 – EN DASH
23 25 000029 29 ) RIGHT PARENTHESIS
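A dump in roughly this shape (character index, UTF-8 byte offset, code point, UTF-8 bytes, glyph, Unicode name) can be reproduced with the standard library. This is a sketch, not the tool that produced the listing above; the function name `dump` and the column widths are assumptions.

```python
import unicodedata


def dump(text: str) -> list[str]:
    """Describe each character: index, byte offset, code point, bytes, name.

    Hypothetical helper; column layout is only an approximation.
    """
    rows = []
    byte_offset = 0
    for index, char in enumerate(text):
        encoded = char.encode("utf-8")
        hex_bytes = " ".join(f"{byte:02X}" for byte in encoded)
        name = unicodedata.name(char, "<unnamed>")
        # Show a blank instead of whitespace or non-printable characters.
        shown = char if char.isprintable() and not char.isspace() else " "
        rows.append(
            f"{index:>3} {byte_offset:>3} {ord(char):06X} {hex_bytes:<11} {shown} {name}"
        )
        byte_offset += len(encoded)
    return rows


for row in dump("Dashes (\u2013)"):
    print(row)
```

The byte offset jumps by three after the en dash because U+2013 takes three bytes in UTF-8, which is why the index and offset columns diverge on the last line of the listing.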
Ironic.
ıt might simpıy not matter though, a miııimeter here and there, ı suppose.
Also Merriam-Webster:
1) they are too hard to type.
2) using them without surrounding thin space or hairspace breaks the horizontal rhythm and draws unnecessary attention to the punctuation; but thin and hair spaces are equally hard to type
3) Most people write markdown with mono space fonts, making these dashes and spaces indistinguishable.
At some point, many things I type into started replacing "--" with an em dash, but my precambrian computer typing muscle memory is fine with "hyphenhyphen" meaning "em dash".
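That kind of automatic replacement is a one-line substitution at its core. A naive sketch (the name `smarten_dashes` is hypothetical): it replaces exactly two hyphens with U+2014 and leaves longer runs alone. It has no idea about context, so it would also mangle things like command-line flags, which is presumably why real editors make it optional.

```python
import re

EM_DASH = "\u2014"

# Exactly two hyphens: not preceded or followed by another hyphen.
DOUBLE_HYPHEN = re.compile(r"(?<!-)--(?!-)")


def smarten_dashes(text: str) -> str:
    """Replace each standalone '--' with an em dash.

    Hypothetical helper; deliberately naive about code and CLI flags.
    """
    return DOUBLE_HYPHEN.sub(EM_DASH, text)


print(smarten_dashes("i really like using em dashes -- for some reason"))
print(smarten_dashes("a horizontal rule --- stays as it is"))
```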
I will admit right here in front of god & everybody that I'm pretty sure I've never typed an en dash at all.
There's room for both: when presentation matters I use them; when it doesn't, I don't.
Do not use the Unicode characters, or people will think you are an AI bot.
"Hello," said John, "how are you today?"
You'd see:
— Hello — said John — how are you today?
Hyphen for hyphen
Option + Hyphen for n-dash
Shift + Option + Hyphen for m-dash
While I'm here, Shift+Return for a soft return (i.e. not a new paragraph).
$ python -m this | grep '--' -
$ python -m this | grep -- -- -
Which is just beautiful
(Your example causes the last hyphen to be grepped for, which happens to only match doubled-up ones because single ones don't occur in that text. The quotes/apostrophes do nothing because they're parsed by (ba)sh and so only the hyphens are passed to grep, not the quotes. The last hyphen can be omitted because reading from stdin is the default if neither filenames nor recursion options are passed.)
For vanilla Emacs (without evil-mode), you can always do — "C-x 8 RET EM DASH" or "C-x 8 RET 2014". That's what "M-x describe-char" would tell you.
* em dash: ⌥ + ⇧ + - (alt + shift + hyphen)
* en dash: ⌥ + - (alt + hyphen)
As a result, a hallmark of GPT-generated text is its (over)use of the em dash--I have stopped using it for this reason and just use two hyphens now instead.
It's a bit of a problem that the same character is both a mark of LLMs and skilled writing.
Em dashes allow me to get multiple ideas into a sentence with comparative ease and have it still make sense. Otherwise I'd have to add additional sentences to a paragraph, which itself has issues. With a longer paragraph one has to worry about its readability and comprehensibility, and that means having to restructure it—remove redundancies, etc.—and that takes time.
Good writers can think ahead and do all that restructuring in their heads. When writing about an idea, concept or logical unit thereof they'll write out short, coherent and readable text all in one go, and it will make sense. I only wish I could do that.
As I see it, em dashes are more a crutch for bad writers like me (they allow our text to be at least comprehensible).
I learned how to use the em dash properly about 6 months before the release of ChatGPT and then when it was released I realized that it used them all the time. So, to convince people that I both know basic grammar and I am human I started to use "--" instead of "—".
I do like that the em dash is as long as it feels that broken-off thoughts should be
Not everything has to be functional, sometimes things can also just look nice for the sake of it
#-::Send("–") ; Win+- = en-dash
#+-::Send("—") ; Win+SHIFT+- = em-dash
#]::Send("‘")
#+]::Send("’")
#[::Send("“")
#+[::Send("”")
#;::Send("…")
#+>::Send("→")
#+<::Send("←")
#8::Send("•")
#+x::Send("×") ; multiplication symbol
edit...downvoted, why? weird
> comma, a colon, or parenthesis
They're all different. There is a difference between clear writing and typesetting. Why mix them up? A narcissism of small differences?
− + minus sign
- + hyphen
– + en dash
— + em dash
−+-+–+—