Semantic Line Breaks

https://sembr.org
30•Bogdanp•3d ago

Comments

gorgoiler•5h ago
Prior art on writing line-oriented prose comes from one B. Kernighan, no less! Via this blog post:

https://rhodesmill.org/brandon/2012/one-sentence-per-line/

> Start each sentence on a new line. Make lines short, and break lines at natural places, such as after commas and semicolons, rather than randomly. Since most people change documents by rewriting phrases and adding, deleting and rearranging sentences, these precautions simplify any editing you have to do later.

— Brian Kernighan, 1974

gregabbott•3h ago
Related HN Thread: https://news.ycombinator.com/item?id=4642395

chrismorgan•5h ago
> A semantic line break SHOULD occur after an […] em dash (—).

I agree with this, however it means that no existing markup language supports semantic line breaks, because every last one of them just turns the break into a space—and em dashes are, in most locales, not to be surrounded by a space. Consequently, you’ll end up with a stray space if you do this.

My irritation at being unable to break after an em dash (which I want to do quite frequently) was one of the things that headed me down the path of designing my own lightweight markup language (LML), to fix this and other problems I observe with existing LMLs. I’ve been using it for all my personal writing for something like four years now (though a fair bit has changed since then), and I expect to finally have a functioning parser before the end of this year.

One of the other fun complications of this kind of line break in source code is languages that don’t have a word divider—inserting a space at all is incorrect in them.

CSS presently just leaves such decisions UA-defined <https://drafts.csswg.org/css-text-4/#line-break-transform>:

> any remaining segment break is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level.

My LML currently turns segment breaks into a space unless the line ends with an en or em dash, unless there’s a colon or a space before that. I haven’t got anything in place for languages with no word separator yet, but it is unusually well-suited to such languages.
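A minimal Python sketch of the joining rule described above, not the actual LML parser (which the comment says does not exist yet). The function names `ends_with_tight_dash` and `join_segment_breaks` are hypothetical, and languages without a word separator are not handled.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def ends_with_tight_dash(line):
    """True if the line ends with an en/em dash that has no space or colon before it."""
    if not line.endswith((EN_DASH, EM_DASH)):
        return False
    before = line[:-1]
    # A dash preceded by a space or a colon is treated as "spaced",
    # so the segment break after it still becomes a space.
    return bool(before) and before[-1] not in " :"


def join_segment_breaks(lines):
    """Join the source lines of one paragraph into a single output string."""
    parts = []
    previous = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Insert the usual space unless the previous line ended with a tight dash.
        if previous is not None and not ends_with_tight_dash(previous):
            parts.append(" ")
        parts.append(line)
        previous = line
    return "".join(parts)
```

With this rule, a line ending in a dash with no space before it is glued directly to the next line, while every other segment break still becomes a single space.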

photon_garden•3h ago
More folks should define their own lightweight markup languages! It’s fun and makes your writing and notes feel more like your own.

I created a convention for defining sub-notes (with frontmatter) in a Markdown note and have found it really helpful over the past few years.

vthriller•2h ago
> em dashes are, in most locales, not to be surrounded by a space

This is definitely not the case for at least French and Russian, which means markup renderers now have to guess text language or force authors to declare such in some metadata header. And it gets even more complicated with inclusion of block quotes in different languages.

chrismorgan•2h ago
It’s not hard and doesn’t need language awareness; I described how to detect it: if there’s no space before an end-of-line em dash, suppress the segment-break-replacing space.

account42•4h ago
The problem is that this makes having line breaks that are not paragraph breaks in the output much more awkward and I think those are much more important than line breaks that are only there in the source.

This is especially true for Markdown which is supposed to be a pretty rendering of conventions that were already common in text only communication so it's weird when explicitly entered line breaks are ignored in the output.

chrismorgan•4h ago
The significant majority of markup languages essentially treat a single line break as a space. HTML, Markdown, et cetera. In lightweight markup languages, you normally need a blank line (i.e. two line breaks) to signify a paragraph break.

GitHub issues and discussions are an outlier in treating them as hard single line breaks (which are not paragraph breaks).

Most plain-text communication used to use line wrapping, often not supporting lines above, say, 100 characters.

Just like typeset prose uses wrapping, because your paper isn’t infinitely wide.
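As a rough illustration of that convention, here is a simplified Python sketch: blank lines separate paragraphs, and single line breaks inside a paragraph collapse into spaces. It is a toy model only; `render_paragraphs` is a hypothetical name, and real Markdown or HTML processing also has to deal with lists, code blocks, and hard breaks.

```python
import re


def render_paragraphs(source):
    """Split source text into paragraphs and collapse single line breaks to spaces."""
    # One or more blank (or whitespace-only) lines mark a paragraph boundary.
    blocks = re.split(r"\n\s*\n", source.strip())
    paragraphs = []
    for block in blocks:
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            # A single line break inside a paragraph is rendered as a space.
            paragraphs.append(" ".join(lines))
    return paragraphs
```

Under this model, semantic line breaks in the source never reach the rendered output; only blank lines do.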

3036e4•1h ago
Good thing about Markdown is that the lack of a proper spec means you can pick one you like (when possible). Pandoc for instance treats input Markdown line-breaks in a sane way, allowing semantic breaks to not affect the output.

dorianmariecom•4h ago
i thought this was for ruby and javascript and this would be really cool.

automated formatting including newlines, would be great.

eviks•3h ago
> Without any line breaks at all, this paragraph appears in source as a long, continuous line of text

Of course it doesn't because

> (which may be automatically wrapped at a fixed column length, depending on your editor settings):

Indeed, are you short on apps that support this ancient text formatting feature?

> Adding a line break after each sentence makes it easier to understand the shape and structure of the source text

Nope again, visually you've just wasted my device's width or overestimated my smartphone's width and I get exactly the same issue you've just complained about: a single sentence that doesn't fit.

Semantically, what you're looking for already exists and is called a paragraph. A sentence has a different meaning, which you break by line breaking after every single one. It kills the structure, not "makes it easier to understand the shape and structure of the source text" (also, bullet points exist)

PS By the way, why deprive readers of extra clarity offered by this formatting?

> We can further clarify the source text by adding a line break after the clause “with reason and conscience”. This helps to distinguish between the “and” used as a coordinating conjunction between “reason and conscience” and the “and” used as a subordinating conjunction with the clause

dkh•2h ago
I think you might be misunderstanding. The semantic line breaks described here are not shown to readers. They are visible only to the person writing/editing the text, as a tool for their own use. If you aren't someone who finds a tool like this useful for your own writing, then no worries! Nobody has been harmed by this existing but not being used. It has no effect on the result.

While I never knew there was a name for this, I naturally do something very similar when writing, keeping thoughts separated by at least a line or two, even if I imagine they'll be in the same paragraph in the end result, just so I have a visual sense of where my different thoughts are and how long they are.

eviks•2h ago
> are not shown to readers.

Sure they are, though the spec hides some readers behind other names like "editors, and other collaborators"

But also, have you never read the plain text / source of some markdown/other markup language written by someone else? Readme.md in its raw form?

And the spec explicitly applies to plain text, so it's self-contradictory as "the final rendered output" of plain text is... itself.

tpoacher•2h ago
There is a very good technical argument for NOT using "semantic" line breaks when editing markup source code, especially of the "hardwrap" variety, and that is the ability to easily diff two versions of the same document, e.g. when comparing latex git commits.

Anything that reorganises the sentence around for the sake of maintaining justification, completely destroys any meaningful diff from taking place.

And ideally your editor should support both hard and soft wrapping, so that aesthetics of wrapping shouldn't be a big issue.

And I say this as a fan of hardwrapping text.

chrismorgan•2h ago
I think you’ve got things back to front. Semantic line breaks improve diffing.

anentropic•1h ago
I don't get it.

TBH most of the time I find markdown's collapsing of whitespace annoying - if you want a 'visual' line-break you have to add unnatural double space at the end of preceding line. And even this is renderer dependent, I don't think it is part of the spec (?) so some renderers don't respect it (and IIRC GitHub comments renderer doesn't need it, i.e. doesn't do semantic line breaks)

Another pet hate is text editors which auto-convert double space into ". " - I find this even cropping up in IDEs now, so you try to add an end of line comment "...] # here" and it turns into "...]. # here". Awful

chrismorgan•1h ago
> if you want a 'visual' line-break you have to add unnatural double space at the end of preceding line.

That’s just a bad syntax choice on Gruber’s part. CommonMark adds trailing backslash as an alternative, so that will work in most places these days.

> And even this is renderer dependent, I don't think is part of the spec (?)

Yes it is. Quoting https://daringfireball.net/projects/markdown/syntax: “When you do want to insert a <br /> break tag using Markdown, you end a line with two or more spaces, then type return.”

> IIRC GitHub comments renderer doesn't need it

Yes, GitHub decided on a wilful violation of Markdown for issues and discussions.

> text editors which auto-convert double space into ". "

I have seen that as a feature on Android keyboards, but I would be very much surprised to find it in non-keyboard software.
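The two hard-break spellings mentioned above (two or more trailing spaces in the original Markdown syntax, a trailing backslash in CommonMark) can be sketched in Python as follows. This is a simplified illustration, not any renderer's real code: `split_hard_break` and `render_lines` are hypothetical names, escaped backslashes are ignored, and a soft break is rendered as a plain space.

```python
def split_hard_break(line):
    """Return (text, is_hard_break) for one source line of a paragraph."""
    if line.endswith("\\"):
        # CommonMark alternative: a backslash at the end of the line.
        return line[:-1], True
    if line.endswith("  "):
        # Original Markdown syntax: two or more trailing spaces.
        return line.rstrip(" "), True
    return line.rstrip(" "), False


def render_lines(lines):
    """Render the lines of one paragraph, keeping only explicit hard breaks."""
    out = []
    for index, line in enumerate(lines):
        text, hard = split_hard_break(line)
        out.append(text)
        if index < len(lines) - 1:
            # Hard breaks become <br />; ordinary line breaks collapse to a space.
            out.append("<br />\n" if hard else " ")
    return "".join(out)
```

The sketch treats any break that is not explicitly marked as a soft break, which is what lets semantic line breaks stay invisible in the output.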

riffraff•1h ago
this seems to consider "text being read after formatting" and "text being read before formatting" as different things.

Which I guess, if you're the sole author of the text might be true.

But in my experience most text that gets rendered is also read and edited by multiple people in its source form, so why wouldn't you want to make source just as easy to read?

pabs3•1h ago
Wonder if any linters know about this convention.

layer8•46m ago
The main reason I use semantic line breaks, not explicitly mentioned in this article, is that it minimizes reformatting when editing. Only the subclause being edited is reformatted, while the rest of the paragraph remains as-is. This also minimizes the changes in line-oriented diffs.

While one could rely on automated line-wrapping instead of using hard line breaks that require reformatting, it isn’t usefully available in all environments, in particular for indented paragraphs and when having elements like ASCII art or code that shouldn’t be word-wrapped, and it makes plain-text diffs larger than necessary when whole paragraphs are on a single source line.
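To make the diff argument concrete, here is a small Python sketch that applies the same one-sentence edit to a paragraph stored two ways: hard-wrapped at a fixed column, and with one sentence per line. It uses the standard library's `difflib` and `textwrap`. The sample sentences, the 40-column width, and the `changed_lines` helper are made up for illustration.

```python
import difflib
import textwrap


def changed_lines(before_lines, after_lines):
    """Count lines that a line-oriented diff reports as removed or added."""
    diff = difflib.ndiff(before_lines, after_lines)
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


before = [
    "Semantic line breaks keep each clause on its own line.",
    "Editors can still wrap long lines softly.",
    "Rendered output is unaffected by these breaks.",
]
# Edit only the first sentence.
after = ["Semantic line breaks keep each sentence or clause on its own source line."] + before[1:]

# Semantic breaks: each sentence is one source line.
semantic_changes = changed_lines(before, after)

# Hard wrapping: the paragraph is refilled to 40 columns after every edit.
hard_before = textwrap.wrap(" ".join(before), width=40)
hard_after = textwrap.wrap(" ".join(after), width=40)
hard_wrapped_changes = changed_lines(hard_before, hard_after)

print("semantic:", semantic_changes, "hard-wrapped:", hard_wrapped_changes)
```

In the semantic version only the edited sentence shows up in the diff (one removed line, one added line), whereas refilling the hard-wrapped paragraph moves words across line boundaries and touches lines that were never edited.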

admko•45m ago
I made a command-line tool [0] powered by Transformer models that performs semantic line breaking: it breaks lines in a text file at semantic boundaries. It supports multiple file types including LaTeX, Markdown, and plain text, with automatic file type detection.

[0]: https://github.com/admk/sembr

ivan_ah•11m ago
The article mentions the git diffing command `git diff --word-diff`, which is cool, but I find an even better version to be:

   git diff --color-words

which shows words removed in red, and words added in blue. The output produced is similar to `latexdiff` in case you're familiar.
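For readers curious what a word-level diff does, here is a rough Python sketch using `difflib.SequenceMatcher` on whitespace-separated words. It is not git's algorithm; the `word_diff` helper is hypothetical, and the `[-removed-]` and `{+added+}` markers are borrowed from the plain-text output of `git diff --word-diff` instead of colours.

```python
import difflib


def word_diff(before, after):
    """Return a one-line word-level diff with [-removed-] and {+added+} markers."""
    old_words = before.split()
    new_words = after.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("break lines at random", "break lines at natural places"))
# prints: break lines at [-random-] {+natural places+}
```

Because the comparison works on words instead of lines, it stays readable even when a paragraph sits on one long source line.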
jsdalton•4m ago
I’ve often thought this would be useful for version control and change review, since it allows diffs to be a lot less noisy. I’m imagining how much easier it would be to review a PR with significant README edits if the file was already structured with semantic line breaks.

I’ve previously had the above thought and applied it to the end of sentences, but the idea of introducing them at the level of semantic thought had not occurred to me. But if this is where we’re going I’d start to wish for indentation possibilities. I do this frequently with SQL statements, introducing both line breaks and indentations to provide a visual structure that mimics the semantic structure of clauses and the details they contain.
