How does one square those two realities?
You aren’t buying technical service or even technical assurance. You are buying someone to blame so the stakeholders don’t hold you accountable.
>He resigned as managing director of Target in April 2016 because of accounting irregularities that he was unaware of but "happened on [his] watch".[4] He then became the chief executive of Steinhoff International.[4] (which seemed to have a lot of issues too https://en.wikipedia.org/wiki/Steinhoff_International#Debt_p...)
Foresight to mitigate potential major issues is exactly what CEOs are expected to do. I'm not sure how being unaware of major accounting irregularities is not seen as a career-ending move here.
AI replacing CEOs seems straightforward as well. Accounting is such a data-driven environment that I think spotting accounting irregularities early would be straightforward. Likewise, AI has the potential to look past the short-term thinking that leads to IT outsourcing (to the extent that the store is not coming back online anytime soon!).
I'm not sure I want AI replacing all CEOs; ideally it would raise the bar for quality and performance, forcing human CEOs to compete.
People eat terrible food because they are bombarded with messages to do so. People can use terrible software for the same reasons. It doesn't matter that the food tastes worse than it used to–food companies are having record profits.
What a fun pair of assumptions!
Such is a taste of what needs to be done if you wish to have a service that takes months to set back up after any disruption.
M&S isn’t down for months because of something innocuous like a full security audit. As a public company losing tens of millions of dollars a week, their only priority is to stop the bleed, even if that means a hasty partial restoration. The fact they can’t even do that suggests they did stuff terribly wrong. There’s an infinite amount of things I didn’t list that could also be the case. Like if Amazon gave them proprietary blobs they lost after the attack and Amazon won’t provide again. But no matter what they are, things were wrong beyond belief. That is a given.
For instance, when all of Cloudflare went down a while ago due to a bad regex, it took less than an hour to roll back the changes. Undoubtedly there were bad practices that led to a regex being able to take everything out, but the problem was isolatable, and once it was addressed partial service was quickly restored; preventative measures followed shortly after. This bug didn't destroy Cloudflare for months.
P.S. In anticipation of the "but Cloudflare has SLAs!!" objection: that isn't really a distinction worth making, because M&S has an implicit SLA with their customers; they are losing 40 million each week they can't offer service. Plenty of non-B2B companies invest in quick recovery as well, like Netflix with its Chaos Monkey testing.
Makes a big difference in developer quality of life and improves productivity right away. If you onboard a new dev you give them a checklist and they are up and running that day.
I had a coworker who taught me a lot about sysadmining, (social) networking, and vendor management. She told me that you'd better have your backup procedures tested. One time we were doing a software upgrade and I screwed up and dropped the Oracle database for a production system. She had a mirror in place so we had less than a minute of downtime.
Somehow, at some point, they decided that my CVS pharmacy account should be linked to my Mom's ExtraCare. Couldn't find any menu to fix it online. So the next time I went to the register I asked to update it. They read the linked phone number. It was mine. Ok, it is fixed, I think. But then the receipt prints out and it is my mom's ExtraCare card number. So the next time I press harder. I ask them to read me the card number they have linked from their screen. They read my card number. Ok, it is fixed, I think. But then the receipt prints out and the card number is different: it is my mom's. Then I know the system is incredibly fucked. Being an engineer, I think about how this could happen. I'm guessing there are a hundred database fields where the ExtraCare number is stored, and only one is set to my mom's or something. I poke around the CVS website and find countless different portals made with clearly different frameworks and design practices. Then I know all of CVS's tech looks like this and a disaster is waiting to happen.
Goes like this for a lot of finance as well.
E.g. I can say with confidence that Equifax is still as scuffed as it was back in 2017 when it was hacked. That is a story for another time.
Nobody bothers to keep things clean until it is too late. The features you deliver give promotions, not the potential catastrophes you prevent. Humans have a tendency to be so short sighted, chasing endless earnings beats without anticipating future problems.
They're probably having to audit everything, invest a lot of effort in additional hardening, and re-architect things to try and minimise the impact of any future attack. And via some bureaucratic organisational structure/outsourcing contract.
Bear in mind that this is a company which still sells physically and has retail and warehouse staff. All that the e-commerce side needs to do is issue orders of what SKUs to send to what addresses, and pause items that are out of stock. M&S is not Amazon and doesn't have that many SKUs; 5 people could probably walk round the store in a few days and photograph all of them for the new shopping site.
Sure, customers will need to make a new account or buy as a guest. But this stuff is not hard on the technical side. There is no interaction between customers like a social media site, so horizontal scaling is easy.
Now I get that there are loads of refinements that go into maximising profit, like analytics, price optimization, etc. But to get revenue in, these guys don't even need to set up advertising on day one, because they have customers that have been buying from them for decades. The time to set up all that stuff is when your revenue is nonzero.
With each order:
- you need warehouse integration to keep the sync of physical to digital store. That has to happen fast or you’ll get orders with no stock.
- You need to sync the payment to whatever ancient accounting system they use, again while issuing invoices, consolidating customers … etc.
- Logistics management, where to get the order from, issuing a label, using the right fleet, making sure it is dispatched on time, arrive on time.
- Customer support, refunds, partial refunds, adding items after order … etc.
So yeah, 5 people!
I can't speak about M&S, but all the big physical retail brands that started selling online operate exactly like Amazon, with SKUs coming from various third-party entities. The online offering is much bigger than what is sold in the physical shops.
If you check CompaniesHouse [1], which normally has all financial documents for UK corporations, it points you to a separate “Public Register” for the Co-Op [2].
So, your comment has more basis in reality than simply being snark… the fact that “nobody is incentivized to care” is actually by design. That has some positive benefits but in this case we’re seeing how it breaks down for the same reasons nobody in a crowd calls an ambulance for someone hurt… it’s the bystander effect applied to corporate governance with diluted accountability.
[0] https://www.gov.uk/hmrc-internal-manuals/company-taxation-ma...
[1] https://find-and-update.company-information.service.gov.uk/c...
In practice the distinction has long been lost both for employees and members (customers), but the intent of the organisational structure was not for nobody to care; quite the opposite
And at the executive governance level, there are a few dozen directors.
There is a CEO who makes £750k a year, so it has elements of traditional governance. I’m not saying the structure is entirely to blame for the slow reaction to the hack, or that there is zero accountability, but it’s certainly interesting to see the lack of urgency to restore business continuity.
My family used to own a local market, and as my dad said when I told him this story, “my father would have been on the farm killing the chickens himself if that’s what he had to do to ensure he had inventory to sell his customers.”
You simply won’t get that level of accountability in an organization with thousands of stakeholders. And a traditional for-profit corporation will have the same problems, but it will also have a stock price that starts tanking after half a quarter of empty shelves. The co-op is missing that sort of accountability mechanism.
It's only much later that the wheels fall off and it all goes to hell. The hack isn't a result of the CEOs actions this quarter, it's years and years of cumulative stock price optimisation for which the CEO was rewarded.
And you can't even blame all the investors, because many will be diluted and mixed through funds and pensions. Is Muriel to blame because her private pension, which everyone told her is good and responsible financial planning, invested in Co-Operative Group on the back of strong growth and "business optimisation initiatives"? Is she supposed to call up Legal and General and say "look I know 2% of my pension is invested in Co-Op Group Ltd and it's doing well, and yes I'm with you guys because you have good returns, but I'm concerned their supermarket division is outsourcing their IT too much, could you please reduce my returns for the next few years and invest in companies that make less money by doing the IT more correctly?"
The incentives are fucked from end to end.
They can "move" it, of course, but who can guarantee what quantity goes from where, and to whom?
Pen and paper where there are a thousand items in a single rack is a nightmare, I can tell you that.
At first I thought he was underestimating this part of the industry; I assumed so because it's common on HN to mock tech companies.
Some or all of those may be broken during a cyberattack.
If you’ve got trucks arriving with meat that’s going to expire in a week, and all your stores have empty shelves, surely there is a system to get that meat into customer mouths before it expires. It could be as simple as asking each store, when they call (which they surely will), how much meat they ordered last week, and sending them the same this week. You could build out more complicated distribution mechanisms, but it should be enough to keep your goods from perishing until you manage to repair your digital crutch.
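The "send them the same as last week" idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `fallback_allocation`, the store and SKU names, and the dictionary formats are all hypothetical, and nothing here reflects how any real retailer's distribution works. When the depot holds less than stores asked for last week, the sketch scales every store's share down proportionally.

```python
from collections import defaultdict


def fallback_allocation(last_week_orders, available_stock):
    """Allocate depot stock using last week's orders as the demand estimate.

    last_week_orders: {store: {sku: units}}, as reported by each store
                      (hypothetical format, e.g. read out over the phone).
    available_stock:  {sku: units} currently on hand at the depot.
    Returns {store: {sku: units_to_send}}.
    """
    # Total units of each SKU requested across all stores last week.
    demand = defaultdict(int)
    for orders in last_week_orders.values():
        for sku, units in orders.items():
            demand[sku] += units

    allocation = {store: {} for store in last_week_orders}
    for store, orders in last_week_orders.items():
        for sku, units in orders.items():
            stock = available_stock.get(sku, 0)
            if demand[sku] <= stock:
                # Enough stock: repeat last week's order as-is.
                allocation[store][sku] = units
            else:
                # Shortage: scale down proportionally, rounding down.
                allocation[store][sku] = units * stock // demand[sku]
    return allocation


if __name__ == "__main__":
    orders = {"store_a": {"beef": 10}, "store_b": {"beef": 30}}
    print(fallback_allocation(orders, {"beef": 20}))
```

Rounding down means a few units may stay at the depot during a shortage; a real process would also need someone to decide what to do with those.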
She said that every shelf item is ordered on a JIT basis as the store stock levels require them - there are no standing orders to a store
Based on that, I presume they didn’t really know what any store would need
Even when they were struggling my local store still had a decent stock of lots of stuff - just some shelves were empty
That wouldn’t work today for a number of reasons but it was cool to see that kind of backup plan in place.
(^I remember the day better than the year because the ad campaign was something like 'I <3 PIN'.)
Of course you could stand up a whole new system like that eventually, but you could also use the time to fix the computers and get back to business probably sooner.
But I imagine during those 3 weeks, there were a lot of phone calls, ad-hoc processes being invented and general chaos to get some minimal level of service limping along.
Anyone who’s experienced the sudden emergence of middle management might feel otherwise :) please don’t teach those people the meaning of “triplicate,” they might try to apply it to next quarter’s Jira workflows…
I wonder if we could negotiate a return to typewriters and paper if it means individual offices and a tea trolley?
Perhaps we need fallback systems that can rebuild some of that utility from scratch...
* A communication channel of last resort that can be bootstrapped. Like an emergency RCS messaging number that everyone is given or even a print/mailing service.
* A way to authenticate people getting in touch using photo ID, archived employee data or some kind of web of trust.
* A way to send messages to everyone using the RCS system.
* A way to commission printing, delivery and collection of printed forms.
* A bot that can guide people to enter data into a particular schema.
* An append only data store that records messages. A filtering and export layer on top of that.
* A way to give people access to an office suite outside of the normal MS/Google subscription.
* A reliable third party wifi/cell service that is detached from your infrastructure.
* A pool of admin people who can run OCR, do data entry.
Basically you onboard people onto an emergency system. And have some basic resources that let people communicate and start spreadsheets.
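As a rough illustration of the "append only data store" and "filtering and export layer" items in the list above, here is a minimal sketch using only the Python standard library. The class name `AppendOnlyLog`, the record fields, and the JSON-lines file format are assumptions made for this example. Note that the file is append-only by convention only (it is always opened in append mode); nothing here prevents someone with file access from editing it.

```python
import csv
import io
import json
from datetime import datetime, timezone


class AppendOnlyLog:
    """Messages stored one JSON object per line; records are never rewritten."""

    def __init__(self, path):
        self.path = path

    def append(self, sender, kind, payload):
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "sender": sender,
            "kind": kind,
            "payload": payload,
        }
        # Mode "a" only ever adds to the end of the file.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return record

    def read(self, kind=None):
        """Return all records, optionally filtered by message kind."""
        try:
            with open(self.path, encoding="utf-8") as f:
                records = [json.loads(line) for line in f if line.strip()]
        except FileNotFoundError:
            return []
        return [r for r in records if kind is None or r["kind"] == kind]

    def export_csv(self, kind=None):
        """Export (filtered) records as CSV text that opens in a spreadsheet."""
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(["ts", "sender", "kind", "payload"])
        for r in self.read(kind):
            writer.writerow([r["ts"], r["sender"], r["kind"], json.dumps(r["payload"])])
        return out.getvalue()
```

A store manager's stock request could then be recorded with something like `log.append("store-12", "stock_request", {"sku": "bread", "units": 40})`, and `log.export_csv(kind="stock_request")` would hand the admin pool a file they can open in any office suite.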
It’s possible they were ordering some default level of stock and I just didn’t go at the right time to see it, but it sure looked like they were missing the inventory… when I first asked the lady “is the food missing because of the bank holiday?” and she said “no because of the cyber attack” I thought she was joking! It reminded me of the March 2020 shelves.
Plus at least monthly if not daily, even hourly system patching.
Planting a garden is one thing.
Weeding it is another.
90% of startups fail within 5 years so probably not the best example of how to run things.
The few that do "succeed" often carry over mountains of cruft and garbage code into perpetuity (for example Reddit).
- Everybody in their right mind agreed that, for what they were achieving, Twitter was completely over-staffed. Like most big tech companies in this period. And like most of those companies, they went through a slimming-down program with mass layoffs.
- If the service is running fine with only 10% of the staff, it doesn't necessarily mean that the 90% that got fired were useless. I can get a 6yo to heat their food using a microwave. Does it mean that the kid is a genius, or that the people who made the microwave did it in a way that allows a kid to operate it, even though it's a complex system at its core?
- Comparing Twitter to an international eCom website is disingenuous. If "design Twitter" is a common system design interview question, it's not because the website is popular, it's because the basics are quite simple. Whereas, behind an eCom website, there's dozens of parts going on at any time, with hundreds of interoperability issues. You're not mainly relying on your main DB for your data, most of it is coming from external systems.
Forensics, among a hundred other things.
> Literal amateurs can launch a WooCommerce site from nothing in a weekend
Selling low-volume horseshit out of your garage is in no way comparable to running a major eCommerce site.
> two Stanford grads in YC can do a hundred-fold better than that.
No they literally can't.
> Yes, a big site is more complicated, maybe there will be some frazzled manual data entry in Excel sheets while your team gets the "real" site back up
Great idea, we'll have Chloe in Accounts manage all the orders in a million-row Excel sheet. Only problem might be they come in at 50 orders a minute, but don't worry I hear she's a fast typist.
And maybe this is intentional, rational strategy - why not reinvest profits in R&D? But just because an organization is large does not mean that it’s efficient.
All of which assumes you even know what services exist, which in any company of this age and size you probably don’t.
Marks and Spencer started as a department store, and they still have this operation. They sell clothes, beauty products, cookware, homeware and furniture. All these things are sold in physical shops and online. Most of this is straightforward for an e-commerce operation, but the furniture will involve separate warehousing and delivery systems.
They also offer financial services (bank accounts, credit cards and insurance). These are white labelled products, but they are closely linked to their loyalty programme (the Sparks card).
Finally, they have their food operation: M&S is also a high-end supermarket. You can't do your food shop on the M&S website (although their food products are available from online-only supermarket Ocado), but you can order some food products (sandwich platters and party food) and fresh flowers from the website.
So M&S is a mid-tier department store and a high-end supermarket. These are very different styles of retail operation: supermarkets require a lot of data processing to ensure the right things get to the right shops at the right time to ensure that food doesn't go to waste but also shoppers aren't annoyed by the unavailability of staples like bread and milk.
Finally, M&S is traditionally fairly strong in customer service; it's not exactly Harrods or Fortnum & Mason, but their bra-fitting service, for example, has a legendary reputation. The internet isn't their natural home.
So all-in-all, you have a business doing complicated things online because they think they have to, not because they want to: a pretty clear recipe for disaster.
I most recently remember sifting through gloating that 4chan - a shoestring operation with basically no staff - was offline for a couple weeks after getting hacked.
I've worked at a shop that had DR procedures for EVERYTHING. The recovery time for non-critical infra was measured in months. There are only so many hands to go around, and stuff takes time to rebuild. And that's assuming you have procedures on file! Not to mention if there was a major compromise you need to perform forensics to make sure you kick the bad guys out and patch the hole so the same thing doesn't happen again a week after your magical recovery.
And if you don't know, you shut it down till it's deemed safe. How do you know the backups and failover sites aren't tainted? Nothing worse than running an e-commerce site processing customer payment card data when you know you're owned. That's a good way to get in deeper trouble.
When I was at early Twilio (2011? 2012? ish), we would completely tear down our dev and staging environments every month (quarter? can't remember), and build them back up from scratch. That was everything, including databases (which would get restored from backup during the re-bring-up) and even the deployment infrastructure itself.
At that point we were still pretty small and didn't have a ton of services. Just bringing my product (Twilio Client) back up, plus some of the underlying voice services, took about 24 hours (spread across a few days). And the bits I handled were a) a small part of the whole, and b) some of the easier parts to bring up.
We stopped doing those teardowns sometime later in 2012, or perhaps 2013, because they started taking way too much time away from doing Actual Work. People can't get things done when the staging environment is down for more than a week. Over the following 10 years or so, Twilio's backend exploded in complexity, number of services, and the dependencies between those services.
I left Twilio in early 2022, and I wouldn't have been surprised if it would have taken several months to bring up Twilio (prod) from scratch at that point, though in their case it would be a situation where some products and features would be available earlier than others, so it's not really the same as an e-commerce site. And that was when I left; I'm sure complexity has increased further in the past 3 years.
Also consider that institutional knowledge matters too. I would guess that for all the services running at Twilio, the people who first brought up many (most?) of them are long gone. So I wouldn't be surprised if the people at M&S right now just have no idea how to bring up an e-commerce site like theirs from scratch, and have to learn as they go.
It’s part of the reason tape is literally never going to die for organizations with data that simply cannot be lost, regardless of RTO.
That would also take a lot of the pressure off of the "full recovery team."
Of course, the real situation must be 100x more complex than I'm imagining it so "I'd like to think" != "I am confident"
The real situation is not 100x more complex, it's just that this is happening in Britain where everything is someone else's job and no one has any reason to care about the actual goals and everyone will go home at 5pm. Or, more likely, to the pub.
The largest enterprise example of a Shopify customer on their marketing website has $500 million in sales.
M&S has an annual revenue of over £10 billion
I think a lot of companies (especially in Europe) have not internalized that, yes, you actually do need to expend apparently exorbitant amounts of money on highly-paid engineers if you want your tech to actually be good. Many countries, including the UK, are simply not wealthy enough to do it at scale. They produce plenty of engineers, but most of the ones capable of holding complicated stuff together probably end up working for US companies that can pay them market rates.
Time and time and time again we have seen major failures globally, and especially in the UK, that prove that there is no fungibility of engineers, and that outsourcing the critical technical infrastructure for your core systems and services is doomed to failure. They'd rather save a dollar today and lose ten million dollars tomorrow by damaging their national economy and sending more money to India. India's GDP is basically entirely propped up by tech services, and most of that is /failed service delivery/, hard to differentiate from frauds and scams at scale.
I wonder if people like this ever hear themselves talking.
nickdothutton•1d ago
fredoralive•1d ago
https://www.theguardian.com/technology/2005/apr/19/business....
But they eventually took control back, so it clearly didn't work for them:
https://www.theguardian.com/business/2014/feb/18/marks-spenc...
M&S orders still use the same ###-#######-####### order number format as Amazon, so I'm not sure if it's still some sort of fork of whatever white-label Amazon technology they were using back then.
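For readers unfamiliar with that pattern, the `###-#######-#######` shape (3 digits, 7 digits, 7 digits, separated by hyphens) is easy to check mechanically. The sketch below only tests the shape described above; the function name is hypothetical, and it says nothing about whether a given number is a real order at either company.

```python
import re

# 3 digits, hyphen, 7 digits, hyphen, 7 digits (ASCII digits only).
ORDER_NUMBER_RE = re.compile(r"[0-9]{3}-[0-9]{7}-[0-9]{7}")


def looks_like_order_number(text):
    """Return True if the whole string has the ###-#######-####### shape."""
    return ORDER_NUMBER_RE.fullmatch(text) is not None
```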
I'm not sure if getting Amazon to run your own ecommerce website is really the greatest idea in the long term (Amazon kinda want your customers to use Amazon, not your website), but M&S using them isn't as mad as that bit in the early 2000s where Waterstone's website was just a subsection of Amazon.co.uk.
spacebanana7•1d ago
Amazon has a clear conflict of interest with anyone in e-commerce. Shopify is probably a better example.
xp84•1d ago
fredoralive•1d ago
youngtaff•14h ago
Warehousing and delivery is probably contracted out to another third-party
My guess is it’s one of these that’s been hacked
arp242•1d ago
neepi•1d ago
Their approach was to sell the UK operation to Tata in 2018 and piss everyone off until they leave and replace them with Indian staff to save costs over time.
You get what you pay for. They're now paying for it.
dangus•23h ago
As an example of Tata’s general competency, Tata owns the Jaguar Land Rover group, which reported its best profit in a decade for the fiscal year ending March 31.
It’s certainly very possible for any help desk to be an attack vector regardless of the nationality of the employee.
So it seems to me the main point of your comment is to hate on an Indian company solely for being Indian and to stereotype Indian companies as low quality and low cost.
Tata Consultancy is a global company that includes offices in places like Chicago, Dallas, and Atlanta.
neepi•19h ago
We can all pick and choose good stories. What about Tata Steel's total mismanagement of Tata Steel Europe? And JLR isn't exactly in good shape as you say it is. You just picked some numbers that sound good. And wrapped it in a xenophobia straw man.
d1sxeyes•18h ago
First part is a reasonable assumption, the second is not, and this is what’s opening you up to allegations of xenophobia.
The allure (and promise) of outsourcing is the idea that you can pay less for a comparable service due to the cost of living disparity between your location and the outsourcing provider. Whether any individual provider achieves that or not is another question, but saying “if your service is provided from India you are not concerned about quality”, or “if your service is provided from India you will have a constant cycle of low quality staff” does sound a lot like xenophobia.
> No one there wants to work for those outsourcing chop shops
This is simply not true. There are a lot of benefits to working for a company like this, although as with any company, it’s not all upside. Of course turnover is high because you have a lot of entry level folks. Regardless of where you are in the world, no-one wants to work a level one help desk until they retire.
> We can all pick and choose good stories. What about Tata Steel's total mismanagement of Tata Steel Europe? And JLR isn't exactly in good shape as you say it is. You just picked some numbers that sound good.
Yeah odd example, JLR is not doing great, and Tata Steel is also struggling, but overall the Tata Group are doing well.
neepi•17h ago
The problem is crap people are cheaper in India than crap people elsewhere. And that looks good on the balance sheet.
And as we’re about maximising shareholder value these days then that’s fine apparently.
dangus•1h ago
Preventing security breaches has a lot more to do with process design and implementation versus the choice of handling that process in-house or externally. For all you know the breach was completely the fault of the M&S implementation as set up by employees in the UK (e.g., M&S gave their contractor in India access/credentials that were too generous and could have been tightened up far more in retrospect).
Remember that basically every company with an online presence is depending on third-party data brokers, and those companies operate around the globe, which is why accusing India of being low quality just because it's India is so xenophobic/racist/nationalist.
I would also say it is very obvious to me that you almost certainly wouldn't be making the same comments if M&S had outsourced to a customer service firm in Ireland or Scotland.
And wow, Tata mismanaged Tata Steel Europe. Guess what? US Steel mismanaged US Steel. General Electric mismanaged probably more than half of its entire business. So what's your point? Any company in any country can be mismanaged. It's actually really hard to manage a business perfectly for a long period of time. Singling out Indian companies for being Indian is basically the definition of xenophobia/racism/nationalism.
benjaminwootton•1d ago
ecshafer•1d ago
madeofpalk•1d ago
I’m not sure what it’s like in the US, but grocery delivery is a reasonably big deal in the UK.
runako•1d ago
Most retailers will argue that connecting with their core customers and delivering delightful experiences to them is their core expertise.
More practically, it will be tension between things like "our marketing department wants X on the site for summer" and "Shopify is planning on launching X in January." It will be less of a resistance to using a third-party provider and more that the third-party provider imposes constraints on the mode of contact with customers. That's a hard pill to swallow for a lot of consumer-focused companies.
xp84•1d ago
Having worked in e-commerce for most of my career, for individual retailers, I can assure you that the perpetual tension you describe is real. The problem as I see it is, every little retailer thinks that their two-bit designers and product managers are so uniquely visionary in designing interactions that they rightfully should have full control over the product that is the ecommerce website. Shopify employs God-knows-how-many engineers to build and maintain this experience, and probably thousands of SREs to be there 24/7 making sure a random DDOS or slow query doesn't take your site out. "But we think we can build a better site than Shopify with 10 engineers and a couple of managers," they say.
They can build one that has the 3 cute whiz-bang features that their self-important product design staff thinks matter, but it will be unreliable, and they won't have sufficient expertise to get right the other 90% of what a "good" ecom site should have. And on top of it all, none of those gimmicks will likely improve conversion or order value enough to be worth doing.
The smarter ones IMHO do use Shopify. It lacks so many things in its core that it's infuriating (decent search, any nontrivial filtering), but retailers who use it mostly patch over those flaws with plugins sold by third parties (which often introduce ghastly single points of failure that you have no visibility into, and you can't sue some random plugin vendor you pay $50 a month for your site going down on Black Friday).
Ecommerce is hard tbh. But I do personally think that most of my previous employers probably should have done lightweight Shopify skins and made their core competence sourcing, merchandising, and advertising product rather than designing cute search filters, or their own product recommendations algorithm.
runako•22h ago
That said, in the context of a Marks & Spencer-sized company (~$13B revenue), it absolutely can be a competitive advantage to in-house e-commerce if it is resourced & staffed appropriately. They are talking about a £300m hit to profit, so they appear to have some headroom for running a complex site.
Doing in-house gives the opportunity for a company to fix the kinds of things you mention with dodgy plugins etc. It also lets them take advantage of Doing Things Our Way, which sounds silly until you consider that Doing Things Our Way is how they got to be so big. And of course, in-house builds are still allowed to use off-the-shelf software where it makes sense.
Also DIY allows companies to adopt new stuff at their own pace. This tends to be important at times of tech transition like it appears we are now reentering.
Will reiterate that e-commerce is hard & there are really no easy answers.
dangus•23h ago
Basically by your exact same logic you're asking Walmart and Target to outsource their websites, which is completely insane.
crop_rotation•16h ago
No, because the Walmart website being down for months is just unthinkable. If a company is so incompetent/resource-deficient/use your favorite phrase to describe it that their e-commerce is down for a month, and the outlook is that it will be down for longer, then something is seriously wrong with the company. Such companies are 100% going to have a much better experience with Shopify.
> If one of your core businesses is selling clothes online, and you're a large enough entity, you should write your own software to sell clothes online.
If only it were so easy. For various reasons, writing non-trivial software is hard, and unless these companies can make some structural changes to hire and retain very good engineers (which is also hard for these companies, for various reasons), they simply have no chance of doing better than Shopify.
dangus•1h ago
If I buy a car and I happen to crash it on the first day I drove it home, that doesn't mean I made the wrong choice to buy the car that day. I still bought the car based on the best information I had at the time.
M&S basically hit a lottery-ticket kind of bad luck: they are dealing with a breach whose impact is far, far worse than that of a typical data breach.
Remember when the PlayStation Network was down for over a month due to a very serious breach? That breach didn't prove that PlayStation should have used some kind of external provider for its online services. In fact, an external provider for that sort of thing is not even practical for their business.
Remember, there's an alternate timeline where Shopify itself could be breached in a similarly severe way and also go down for 3 months. It's very unlikely but it's possible. If it can happen to M&S it could happen to Shopify.