And we are looking for senior researchers to join us, see https://x.com/jvanegue/status/2004593740472807498
I'm confused.
edit: I had to dig into the author's publication list:
https://joomy.korkutblech.com/papers/crane-rocqpl26.pdf
Testing remains a fundamental practice for building confidence in software, but it can only establish correctness over a finite set of inputs. It cannot rule out bugs across all possible executions. To obtain stronger guarantees, we turn to formal verification, and in particular to certified programming techniques that allow us to develop programs alongside mathematical proofs of their correctness. However, there is a significant gap between the languages used to write certified programs and those relied upon in production systems. Bridging this gap is crucial for bringing the benefits of formal verification into real-world software systems.
The original extractor was to OCaml, and this is a new extractor to C++.
Simplifying a bit, a compiler tr(.) translates from a source language L1 to a target language L2 such that
semantics(P) == semantics(tr(P))
for all programs in L1. In contrast, and again simplifying a bit, extraction extr(.) assumes not only languages L1 and L2 as above but, at least conceptually, also corresponding specification languages S1 and S2 (aka logics). Whenever P |= phi and extr(P, phi) = (P', phi'), then not just
semantics(P) == semantics(P')
as in compilation, but also
semantics(phi) == semantics(phi'),
hence P' |= phi'. I say "at least conceptually" above because this specification is often not lowered into a different logical formalism. Instead it is implied / assumed that if the extraction mechanism is correct, then the specification could also be lowered ...
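As a toy illustration of that split (nothing to do with Crane's actual setup; the names below are made up): the program gets lowered by extraction, while its specification stays behind in the source logic and is only trusted to carry over.

Require Import Arith.
Require Extraction.

(* P: the program that will be lowered to the target language. *)
Definition double (n : nat) : nat := n + n.

(* phi: the specification, proved against P inside the ITP. *)
Lemma double_spec : forall n, double n = 2 * n.
Proof. intro n; unfold double; simpl; now rewrite Nat.add_0_r. Qed.

(* Extraction produces P' (OCaml by default here). No target-level
   counterpart of double_spec is produced; its preservation is exactly
   what we have to trust the extractor for. *)
Extraction double.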
My usual intuition is to ask whether the generated code at the end needs a complicated runtime to replicate the source language's semantics. In Crane, we avoid that requirement with smart pointers, for example.
I think extraction goes beyond 'mere' compilation; otherwise we would not need to program inside an ITP. I do agree that the state of the art does not really fully reach this platonic ideal.
I imagine https://github.com/bloomberg/crane/blob/main/theories/Monads... is the functional specification of STM. I see that you use ITrees. What's the reason for not using Choice Trees, which tend to be easier for handling non-determinism?
And we're not opposed to choice trees. I personally am not too familiar with them, but there's time to catch up on the literature. :)
ITrees:
CoInductive itree (E : Type -> Type) (R : Type) : Type :=
| Ret (r : R)
| Tau (t : itree E R)
| Vis {T : Type} (e : E T) (k : T -> itree E R).
ChoiceTrees:
CoInductive ctree (E : Type -> Type) (C : Type -> Type) (R : Type) : Type :=
| Ret (r : R)
| Tau (t : ctree E C R)
| Vis {T : Type} (e : E T) (k : T -> ctree E C R)
| Choice {T : Type} (c : C T) (k : T -> ctree E C R).
One can see "Choice" constructor as modelling internal non-determinism, complementing the external non-determinism that ITrees already allow with "Vis" and that arises
from interaction with the environment. (Process calculi like CCS, CSP and Pi, as well as session types and linear logic also make this distinction).[1] https://github.com/bloomberg/crane/wiki/Design-Principles
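A tiny sketch to make the internal/external split concrete, written against the conceptual ctree definition above (not the real CTrees library API), with the type parameters made implicit via Arguments and a made-up branching signature flipB:

Arguments Vis {E C R T} e k.
Arguments Choice {E C R T} c k.

(* Hypothetical binary branching signature, only for this sketch. *)
Inductive flipB : Type -> Type :=
| Flip : flipB bool.

(* Internal non-determinism: the tree itself resolves the branch. *)
Definition pick_internally {E R} (t f : ctree E flipB R) : ctree E flipB R :=
  Choice Flip (fun b : bool => if b then t else f).

(* External non-determinism: the environment answers the Flip event. *)
Definition pick_externally {C R} (t f : ctree flipB C R) : ctree flipB C R :=
  Vis Flip (fun b : bool => if b then t else f).

With only Vis (as in ITrees), the environment has to resolve every branch; Choice lets the tree also model choices made internally, e.g. by a scheduler or the implementation itself.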
So 'safely' implies somehow that they care about not destroying guarantees during their transformation. To me as a layperson (I studied compiler design and formal verification some long time ago, but have little to zero experience), it seems easier to write a set of correct transformations than to formalize arbitrary C++ code.
clarus•2w ago
Interestingly, this can be integrated into production systems to quickly formally verify critical components while being fully compatible with Bloomberg's existing C++ codebase.
InkCanon•2w ago
seeknotfind•2w ago
From theories/Mapping/NatIntStd.v:
One of the things formal verification people complain about is that ARM doesn't have a standard memory model, or that CPU cache coherence is hard to model. I don't think that's what this project is about. This project is about having basically provable code. They also say this in their wiki: https://github.com/bloomberg/crane/wiki/Design-Principles#4-...
> Crane deliberately does not start from a fully verified compiler pipeline in the style of CompCert.
What this means is that you can formalize things, and you can have assurances, but then sometimes things may still break in weird ways if you do weird things? Well, that happens no matter what you do. This is a noble effort bridging two worlds. It's cool. It's refreshing to see a "simpler" approach. Get some of the benefits of formal verification without all the hassle.
joomy•2w ago
The output you posted is from an example that we missed importing. It's also one of the tests that do not yet pass. But then again, we are upfront about these issues in the readme:
> Crane is under active development. While many features are functional, parts of the extraction pipeline are still experimental, and you may encounter incomplete features or unexpected behavior. Please report issues on the GitHub tracker.
I should also note that mapping Rocq types to ideal C++ types is only one part of it. There are still efficiency concerns with recursive functions, smart pointers, etc. This is an active research project, and we have plans to tackle these problems: for recursion, try CPS + defunctionalization + conversion to loops; for smart pointers, try what Lean does (https://dl.acm.org/doi/10.1145/3412932.3412935); etc.
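To illustrate the recursion part of that plan (a toy sketch of the general technique, not Crane's actual pipeline): CPS plus defunctionalization replaces the call stack with a first-order data structure, after which both passes are tail-recursive and can be emitted as plain loops.

Require Import List.
Import ListNotations.

(* Toy example: summing a list. *)
Fixpoint sum (l : list nat) : nat :=
  match l with
  | [] => 0
  | x :: xs => x + sum xs   (* not tail-recursive: work remains after the call *)
  end.

(* Defunctionalized continuation: a first-order stack of pending "add x" frames. *)
Definition kont := list nat.

Fixpoint run (k : kont) (acc : nat) : nat :=
  match k with
  | [] => acc
  | x :: k' => run k' (x + acc)
  end.

Fixpoint sum_def (l : list nat) (k : kont) : nat :=
  match l with
  | [] => run k 0
  | x :: xs => sum_def xs (x :: k)   (* tail call: can be compiled to a loop *)
  end.

(* The transformation is semantics-preserving. *)
Lemma sum_def_ok : forall l k, sum_def l k = run k (sum l).
Proof.
  induction l as [|x xs IH]; intros k; simpl.
  - reflexivity.
  - rewrite IH. reflexivity.
Qed.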
seeknotfind•1w ago
Have you considered combinatorial testing? Test code generation for each sample program and each set of mappings, and ensure they all have the same behavior. Comparing relative size or performance could also let you discover this kind of issue automatically. Also, allocation counting.
Hey, also sucks you are not in SF. I'm looking for people into formalization in the area, but I haven't found any yet.
joomy•1w ago
Our plan was to do random Rocq program generation and differential testing of Crane-extracted code versus other extraction methods and even CertiCoq. But fixing a program and trying different mappings would certainly be useful for any dev who would like to use our tool, and we should put it on our roadmap.
seeknotfind•1w ago
csto12•1w ago