The preconditions are that you have domain experts or business people interested in and willing to engage in writing or reviewing these tests. Unless you have that, and it's something those people will sustain when the going gets tough, you're just making writing tests harder for no real benefit.
- write specs in a cucumber fashion
- write parsers to go from cucumber to whatever objects / data you expect (see the sketch after this list)
- finally write your tests
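For step 2, something like this (a minimal sketch; the Account struct and the step phrasing are made up, and it assumes rspec-expectations is wired into the Cucumber World, as is typical - the point is just the capture-to-object plumbing):

# features/step_definitions/account_steps.rb
Account = Struct.new(:name, :balance)

# Parse Gherkin captures into a domain object for later steps.
Given(/^an account for "([^"]*)" with a balance of \$(\d+)$/) do |name, balance|
  @account = Account.new(name, Integer(balance))
end

Then(/^the balance should be \$(\d+)$/) do |expected|
  expect(@account.balance).to eq(Integer(expected))
end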
In practice, when people try to use it for anything semi-complex, they inevitably end up writing either very vague tests or repetitive tests, neither of which is of much interest to stakeholders.
I've used docs generated from hitchstory for having conversations with stakeholders. I've also shared semi-human-readable unit and e2e tests with technical stakeholders - especially when they contain, say, JSON snippets from an API.
Unfortunately, except for a few domains where specs can be expressed very concisely, cucumber isn't suitable for that.
Going from "Step 1: Do X. Expected result: Y. Step 2: Do Z. Expected result: Q" to a big long "sentence" describing these steps joined with "and" (without the expected results being linked to each step) is strictly worse in just about every way. The actual automated tests are still being written the same way they always were; the descriptions of them are just WAY harder to understand.
It's a two-way street. I'd highly recommend never implementing it unless product is the one driving it, and actually makes it a priority.
To me it’s just another framework to learn - a failed abstraction. It’s always introduced by idealistic devs who regularly jerk off to conference talks.
Even the US-born engineers can't always agree on how to communicate product ideas!
Just like all the other no-code inventions out there, they fail to reckon with the fact that essential complexity isn’t a problem with programming language syntax.
If readable tests are the goal, I think the time is much better spent cleaning up regular tests and writing well-named helper methods. Especially in Ruby you can get pretty close to tests that can be deciphered on a high level by non-programmers. That said, I've never had a PM or other non-programmer actually want to do that.
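For illustration, something like this (Minitest, all names made up) is about as readable as I'd aim for:

require "minitest/autorun"

Order = Struct.new(:total)

class CheckoutTest < Minitest::Test
  # The test body reads at a high level; the plumbing lives in helpers.
  def test_a_valid_code_takes_ten_percent_off_the_order
    order = order_totaling(60.00)
    apply_discount_code(order, "SAVE10")
    assert_in_delta 54.00, order.total
  end

  private

  def order_totaling(amount)
    Order.new(amount)
  end

  def apply_discount_code(order, code)
    order.total *= 0.9 if code == "SAVE10"
  end
end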
Capybara has this problem as well, as do other web acceptance test tools. Maybe not a new matcher every time, but frequently enough, mostly because of the high variability of HTML paths. You have to design your HTML for it (use lots of [title] attributes and standard HTML) if you want to avoid that.
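E.g. (assuming Capybara with RSpec matchers; the selectors are illustrative):

# Brittle: any markup reshuffle breaks the path.
find(:xpath, "//div[2]/span/ul/li[3]/a").click

# Sturdier: anchor on attributes you control.
find("a[title='Delete account']").click
expect(page).to have_css("[title='Confirmation dialog']")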
Given /^a nice new bike$/ do
  expect(bike).to be_shiny
end
so I've traded translating business requirements into specs for trying to regex against business requirements - and probably a lot of back and forth telling people they wrote their gherkin wrong. I have never actually tried it for that reason - it just seems worse in every way.
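The failure mode I'd expect, roughly (the Bike struct is made up around the classic doc example above):

Bike = Struct.new(:shiny, keyword_init: true)

# Matches "a nice new bike" and nothing else.
Given(/^a nice new bike$/) do
  @bike = Bike.new(shiny: true)
end

# A product owner writes "Given a shiny new bike" instead: Cucumber
# reports an undefined step, and the back and forth begins.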
If your job is polishing a turd, make the turd shine like a cucumber. I have seen countless hours wasted on making Cuke/Gherkin "pretty" instead of accomplishing anything for the business.
Engineers' responsibility begins at the "code behind" the gherkin layer. Someone needs to enforce this.
Sadly, IMO because gherkin looks like structured language (i.e. "code"), and because that's what engineers do, they end up doing it. Either they tricked themselves into thinking they should, or the product team tacitly assumed it was a code thing.
If you're an engineer and you're writing gherkin, you should consider this a "code smell" and consider ditching it. Go for a spec tool one layer closer to your main domain, i.e. just write normal unit & integration tests.
I'm not saying "don't do BDD" or "don't do acceptance tests". You should do that! I'm saying "gherkin should allow product owners/managers to take ownership of reporting on how well the app meets their expectations". But most typically, they don't get it, or they don't want to do it.
I recently worked somewhere where we had dedicated devs-in-test. They wrote most of the e2e tests for the application. They wrote it in Gherkin, which meant that they _also_ wrote the JS/Selenium tests behind them. So they were basically writing two sets of scripts for each test. At some point they explored using a 3rd party cross-device testing platform (lambda test, maybe?) who had their own in-house test script that looked like Gherkin, so it needed a JS/TS code-behind to actually run the tests. So they did it all again!! Such a waste.
Specifically I try to make a nice string representation of state, and then tests match off that state.
So for instance I don't want to say expect(x.length).toBe(0) (one assertion per property - and Cucumber goes even further in that syntactic direction). Because if the length is 1, then "1 != 0" is a very opaque failure. Or if you do expect(x[0].attr).toBe("y"), then you've tested one thing but you have to have another test for list length, other attributes, etc. Often you'll leave the others out, expecting them to be OK, but sometimes they aren't...
Investing in a good, stable, matchable string representation will give you a great tool for any number of tests, and it's not just a great tool to verify but also to debug your tests.
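Roughly what I mean, as a sketch (Minitest, the Cart domain made up):

require "minitest/autorun"

Item = Struct.new(:name, :qty)

class Cart
  def initialize(items = [])
    @items = items
  end

  # A stable, human-readable snapshot of the whole state.
  def to_s
    return "empty cart" if @items.empty?
    @items.map { |i| "#{i.qty}x #{i.name}" }.join(", ")
  end
end

class CartTest < Minitest::Test
  def test_whole_cart_state_in_one_assertion
    cart = Cart.new([Item.new("apple", 2), Item.new("pear", 1)])
    # Covers length, order, and attributes at once; a failure diffs the
    # whole state instead of an opaque 1 != 0.
    assert_equal "2x apple, 1x pear", cart.to_s
  end
end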
Cucumber was all the rage in 2008, IIRC.
Cucumber tries to solve the problem of turning customer requirements into 'real code'. In exchange for that worthwhile benefit, it asks you to implement the most terrible, regex-based spaghetti code imaginable.
The problem is that it doesn't solve the original problem AT ALL. And then you are left with terrible regex-driven spaghetti code. Like the Jamie Zawinski saying, "now you have two problems".
The lesson here is that software development processes have to pass the 'human nature' test.
The software industry has largely abandoned waterfall development because it just doesn't work well in practice. It doesn't work because people don't know perfectly what they want before they build it. Agile processes usually are much more efficient because they are more closely aligned to how humans solve problems in the real world.
Cucumber suffers from the same issue of being disconnected from reality. In theory, you can find a customer who can give you perfectly written use cases and you can cut-and-paste those into your cukes. In practice, that never, ever works. So let's all stop wasting our time pretending it was a good idea now that it has been shown not to work.
It's quite funny that the same thing is now being attempted again with AI-based no/low code. Even if the regexes disappear, it's even more text, with even less structure (assuming anyone checked those prompts into git in the first place).
I'd never ask them to write them, but I will often write a spec/test based upon their two-sentence Jira ticket and then screenshare and walk them through my interpretation to get feedback early (i.e. before I've wasted time building the wrong thing).
Cucumber/gherkin is awful at this of course, and the regex thing was a terrible idea but it's not the only tool.
The idea that tests should be split into a specification layer and execution layer is a good one that should have taken off by now.
There is a fundamental reason it hasn't:
An actual specification layer isn't any simpler than the execution layer. That's a programmer's fallacy.
What has taken off, and is part of virtually every software project, is a loose, natural language specification, which hints at "more or less" what the stakeholders are imagining. The idea that you can close the gap between this and a complete specification in a way that all stakeholders can still digest is the fantasy of cucumber. Or any other tool that attempts it.
You can't solve the problem in that way. Because, from a high-level stakeholder's perspective, the whole point of the people below them (whether programmers or UX designers or anyone else) is to fill in those details in a way that matches their own hazy expectations, at least well enough.
The point of separation of concerns isn't to keep the simple layer separate from the complex one. It's to simplify the whole thing by only addressing one concern per layer.
Unit tests are often a pain in the ass to read because they are a mess of implementation and specification details. No separation.
>You can't solve the problem in that way. Because, from a high-level stakeholder's perspective, the whole point of the people below them (whether programmers or UX designers or anyone else) is to fill in those details in a way that matches their own hazy expectations
If you crystallize the hazy expectations on their behalf and skip feeding it back to them, then you will often find out that "that's not what i meant" after the code is complete.
Those mistakes are expensive and avoidable.
> The point of separation of concerns isn't to keep the simple layer separate from the complex one. It's to simplify the whole thing by only addressing one concern per layer.
With some caveats, I think this is just a fiction. There can be some value in having high-level tests and low-level tests, but not because it removes complexity. It can help with focus and priority. Which is a problem that can be solved in many ways.
> If you crystallize the hazy expectations on their behalf and skip feeding it back to them, then you will often find out that "that's not what i meant" after the code is complete.
But this is exactly what they want you to do, and do well enough that "that's not what i meant" is not a big problem. They certainly don't want to read cucumber tests as a way of ensuring you're on the same page. They will tolerate rough, incremental prototypes, per the old agile advice, and this is probably still the best way of solving the communication gap problem.
Only now, instead of this producing brittle generated tests, the spec will be used by the LLM as guidance to generate the actual code and tests.
Before people jump down my throat, I know we are nowhere near that today and I promise I'm not pitching this to my leadership because they would gobble it up too fast.
But for us engineers, I think there is an interesting space for thinking of LLMs as akin to garbage collection, a feature that allows us to abstract to a slightly higher level of thought. Yes, we still need to know how to check under the hood, but this is looking like the right precision-flexibility ratio for LLMs to thrive in.
The idea behind it was that Behavior-Driven Development might be a great idea, but Gherkin was a pain to work with. LLMs bridge that gap now.
Business people don't really care about this stuff. Over the years I've realized more and more that we engineers are naive in thinking that the business side is concerned with edge cases and race conditions.
To give an analogy software people might relate to better: if you come to a lawyer because, say, you want to buy a house, you are not going to sit down with them and say "given I want to buy a house, when the seller hides water damage costing over $2000, I get to walk away from the deal". You just hope the lawyer is good and will protect you from various edge cases. You have a lot more to deal with than just closing paperwork. You're probably thinking about renovating, moving, getting an inspection, etc.
Businesses are just like that with engineering. They don't want to sit down and meticulously analyze every possible edge case. They have other things to do. Especially when stakes are not that high. Most of these errors can probably be resolved with a phone call and a database edit.
I think this is probably for the best. A good engineer will make sure you're standing on solid ground, and ask the right questions at the right time. They wouldn't need this amount of hand-holding. Leave the business time to focus on making deals, connections, organizing the whole operation to move forward, etc. Let them give you vague requirements, and crystallize them yourself. It's way better than a micro-managing business that thinks it knows exactly how everything should be.
P.S. Also, I'm not sure why everyone is so hung up on regex = bad; it's not like switching to an AST-based language would've made anything better here. Regex is fine IMO, it's the entire concept that isn't.
It's better to (1) write the test, (2) then later once all that's done, extract documentation in a human-friendly format.
It creates unnecessary work for something that's never to be seen by real stakeholders.
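In Ruby that can be as cheap as leaning on test names - e.g. RSpec's documentation formatter (the spec itself is made up):

class Refund
  def initialize(days_since_purchase:)
    @days = days_since_purchase
  end

  def eligible?
    @days <= 30
  end
end

RSpec.describe "Refunds" do
  it "refunds the full amount within 30 days of purchase" do
    expect(Refund.new(days_since_purchase: 10)).to be_eligible
  end
end

# `rspec --format documentation` prints a stakeholder-readable outline:
#   Refunds
#     refunds the full amount within 30 days of purchase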
There is a new trend these days using LLM that is similar to Cucumber: Spec Driven Development using AI. You'd be left disappointed again.
Once you write it, you can reuse the sentence to build other tests, provided you design it to be flexible and parameterized, i.e. design your code well enough.
Given I'm connected as ___ with password ___.
When I click on the element ___.
Then the element ___ is present.
Then the element ___ is present.
[...]
For critical use cases, it was enough. Beyond that it's a bad idea: it takes way too much maintenance.
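For what it's worth, the code behind a template like that stays small when the blanks are plain capture groups (a Capybara-style sketch; the form labels and routes are whatever your app uses):

Given(/^I'm connected as "([^"]*)" with password "([^"]*)"$/) do |user, password|
  visit "/login"
  fill_in "Username", with: user
  fill_in "Password", with: password
  click_button "Log in"
end

When(/^I click on the element "([^"]*)"$/) do |locator|
  find(locator).click
end

Then(/^the element "([^"]*)" is present$/) do |locator|
  expect(page).to have_css(locator)
end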
On a technical level, cucumber is also at odds with the need for a test suite to be easily maintainable. What I mean is that each test (especially e2e tests) will want to do some setup/initialization. This is usually expressed as step definitions.
Over time, an undisciplined team may write several slightly different but effectively identical step definitions. They may also combine multiple steps into bigger steps because usually a spec writer doesn't want to exhaustively define every piece of setup, they just want to write "Given all 200 pieces of input data and mocks are magically in the right place..."
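Concretely, you end up with near-duplicates like these (create_user stands in for whatever factory helper the suite has):

# Three phrasings, one behaviour, usually scattered across files.
Given(/^a logged in user$/)       { @user = create_user(logged_in: true) }
Given(/^I am logged in$/)         { @user = create_user(logged_in: true) }
Given(/^the user has signed in$/) { @user = create_user(logged_in: true) }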
I was able to wrangle the specs into composability using Rule and Background blocks, but at that point we were just programming tests with a shitty layer over the actual code.