> As a result, the company had to scrap thousands of wafers
Anything involving wet chemistry, photoresist, furnaces, etc. is very time-constrained. You can't let wafers sit around indefinitely. Certain process steps must be followed up very quickly to avoid scrap.
This is why you don't see redundant power for manufacturing lines. A 3nm line needs hundreds of megawatts to operate. You can't clear queued lots without a fully functional line, so there's not much you could save by keeping part of the line operational.
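Fabs commonly track this kind of time constraint as a maximum queue time (often called Q-time) between one step and the step that must follow it. Below is a minimal sketch of triaging waiting lots after an outage. The step names, the limits, and the 80% "at risk" threshold are hypothetical. None of these numbers come from TSMC or from this thread.

```python
from dataclasses import dataclass

# Hypothetical queue-time limits in hours between consecutive steps.
# Step names and values are illustrative assumptions, not real process data.
MAX_QUEUE_HOURS: dict[tuple[str, str], float] = {
    ("wet_clean", "furnace_oxidation"): 4.0,
    ("resist_coat", "exposure"): 8.0,
    ("exposure", "post_exposure_bake"): 1.0,
}
AT_RISK_FRACTION = 0.8  # assumed warning threshold


@dataclass(frozen=True)
class Lot:
    lot_id: str
    last_step: str       # step the lot has just finished
    next_step: str       # step it is waiting for
    waited_hours: float  # time since last_step finished


def lot_status(lot: Lot, extra_hours: float = 0.0) -> str:
    """Classify a lot as 'ok', 'at_risk', or 'scrap' after an extra delay."""
    limit = MAX_QUEUE_HOURS.get((lot.last_step, lot.next_step))
    if limit is None:
        return "ok"  # no queue-time constraint defined for this transition
    waited = lot.waited_hours + extra_hours
    if waited > limit:
        return "scrap"
    if waited >= AT_RISK_FRACTION * limit:
        return "at_risk"
    return "ok"


def triage(lots: list[Lot], outage_hours: float) -> dict[str, list[str]]:
    """Bucket lot IDs by status, assuming every lot is delayed by the outage."""
    buckets: dict[str, list[str]] = {"ok": [], "at_risk": [], "scrap": []}
    for lot in lots:
        buckets[lot_status(lot, outage_hours)].append(lot.lot_id)
    return buckets


if __name__ == "__main__":
    lots = [
        Lot("A1", "exposure", "post_exposure_bake", 0.2),
        Lot("B2", "wet_clean", "furnace_oxidation", 1.0),
        Lot("C3", "resist_coat", "exposure", 0.5),
    ]
    print(triage(lots, outage_hours=2.5))
    # {'ok': ['C3'], 'at_risk': ['B2'], 'scrap': ['A1']}
```

The sketch only shows that the same outage can push one lot past its limit while another still has plenty of slack.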
A new failure mode resets output progress back to zero if you lose power or some other input while crafting.
You could design circuit networks to cut power to non-essential systems so the rest of the factory can keep producing.
Surely the reality is much more complex (something like a function describing how yield/quality drops over time?)
It eats all of your power, and usually very expensive items too, very quickly. Assume you generate something like 600 RF/tick, which is common with certain generator setups. Tick 1: 1024 RF and one input are consumed, and crafting fails because there isn't enough power. Tick 2: wait. Tick 3: 1024 RF and one input are consumed, and so on. This can void 10+ items per second, which can hurt very badly, even for common items.
It also tends to kick you while you're down, because it only kicks in when everything else is already failing. Then the only thing still functioning is the thing voiding your energy and your expensive items. Or even worse: you make one miscalculation about your power grid, and all of your resources are gone, often before you can react.
It can be interesting in the right packs, but it is GregTech-level hardness.
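The loss rate in the 600 RF/tick example above can be checked with a tiny tick simulation. This is a minimal sketch built on an assumed buffer model. The machine banks generated RF and fires a doomed attempt, costing 1024 RF and one input, whenever the buffer allows. The sketch also assumes Minecraft's nominal 20 ticks per second. The function name and the model are illustrations only, not any mod's actual code.

```python
TICKS_PER_SECOND = 20  # Minecraft's nominal tick rate


def voided_items(gen_per_tick: int, cost_per_attempt: int, seconds: int) -> int:
    """Count inputs destroyed by repeated failed crafting attempts.

    Assumed model: the generator adds gen_per_tick RF to a buffer every tick.
    Whenever the buffer holds at least cost_per_attempt RF, the machine starts
    a craft, drains that energy plus one input, and the craft then fails
    because generation cannot keep up. At most one attempt happens per tick.
    """
    buffer = 0
    voided = 0
    for _ in range(seconds * TICKS_PER_SECOND):
        buffer += gen_per_tick
        if buffer >= cost_per_attempt:
            buffer -= cost_per_attempt
            voided += 1
    return voided


if __name__ == "__main__":
    # Numbers from the comment: 600 RF/tick generated, 1024 RF per attempt.
    print(voided_items(600, 1024, 1))   # 11
    print(voided_items(600, 1024, 60))  # 703
```

Under this model the long-run rate is 600/1024 attempts per tick, about 11.7 inputs lost per second. That is consistent with the "10+ items per second" figure.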
GT's system of only pulling power on demand is very nice, though; no wasted fuel.
I'm currently playing Stoneblock 4 and played GT:NH and Nomifactory some time ago, and the more modern mods have learned a lot from those old janky ones. Heck, back in the day every mod had a different power system and you needed a nonsense amount of conversion infrastructure, unless the modpack did a lot of work to combine all of this somehow, haha.
May or may not apply to multi blocks.
It really is about the journey; GT:NH isn't played to be won - instead, the win condition is continually moved out so you can keep playing.
Suppose you can start production with only 1 of each input required for a recipe, but to keep it going you need to keep feeding all of the inputs to finish it. If any of them run out, then the recipe fails, you lose the inputs, and the machine stalls.
This works better for high-latency recipes (>10 s) with lots of inputs, like low density structures, modules, and atomic bombs.
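Below is a minimal sketch of the proposed rule for a single craft. It assumes inputs are drawn every tick at a fixed rate and new supply arrives at a fixed per-tick rate. `run_craft`, the item names, and the tick model are all hypothetical, since this is a proposed mechanic rather than something the base game does.

```python
def run_craft(
    per_tick: dict[str, int],
    duration: int,
    stock: dict[str, int],
    feed: dict[str, int],
) -> tuple[str, dict[str, int]]:
    """Simulate one craft under the proposed 'keep feeding it' rule.

    Returns (status, consumed), where status is 'not_started', 'failed', or
    'done', and consumed maps each input to the amount used up (and, on
    failure, lost; the machine would then sit stalled).
    """
    # Starting only requires one of each input to be present.
    if any(stock.get(item, 0) < 1 for item in per_tick):
        return "not_started", {}
    stock = dict(stock)  # work on a copy so the caller's inventory is untouched
    consumed = {item: 0 for item in per_tick}
    for _ in range(duration):
        # New supply (e.g. from belts) arrives at the start of each tick.
        for item, amount in feed.items():
            stock[item] = stock.get(item, 0) + amount
        # If any input can't cover this tick, the craft fails; consumed inputs are lost.
        if any(stock.get(item, 0) < need for item, need in per_tick.items()):
            return "failed", consumed
        for item, need in per_tick.items():
            stock[item] -= need
            consumed[item] += need
    return "done", consumed


if __name__ == "__main__":
    recipe = {"plate": 1, "circuit": 1}
    # Both inputs keep flowing: the craft finishes.
    print(run_craft(recipe, 10, {"plate": 1, "circuit": 1}, {"plate": 1, "circuit": 1}))
    # Circuits are not resupplied: the buffer runs dry on tick 4 and the craft fails.
    print(run_craft(recipe, 10, {"plate": 1, "circuit": 3}, {"plate": 1}))
```

A longer recipe with more inputs gives more ticks and more supply lines that can run dry, so the rule bites hardest there.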
It still looks kind of easy; in the default game, the machines just do it automatically.
It's probably not 100% identical to TSMC's process.
For some processes, stopping will botch the wafer. In the event of a gas shortage, do plants plan which lines to take down first, and which lines should complete a process step?
In the case of a power interruption at the fab, consequences were highly dependent on the equipment and the unit process. A prolonged power interruption to diffusion was the worst-case scenario. You’d have 150 wafers in the furnace, and any significant deviation from the nominal temperature profile meant they were all scrap. Worse, if the furnace cooled off, you had to scrap the quartz boat the wafers rode in, too. Other processes had a smaller blast radius but were even more of a headache to disposition. In implant, you’d lose beam and probably lose vacuum too. Then the wafer in the chamber would be dusted and in an indeterminate state, and for the rest of the wafers you’d have to sleuth out whether they had been implanted or not. Sometimes you’d have a lot sitting in the end station and it wouldn’t be clear whether it had been run at all. At least in photolithography you could tell whether a wafer was patterned by looking at it.
Even so, I would still call this just another Monday at a semiconductor factory. Welcome! Here we play a nearly endless game of whack-a-mole. Here's your mallet and your towel. Now whack enough of the moles hard enough that they stop coming back (at least through the same holes). Beware the alpha moles.
By any road, I am surprised to see even this high-level perspective on a quality event disclosed to the mainstream public; I thought this was not standard practice. I enjoyed the read.
The number of issues that a semiconductor factory stoppage would cause stretches one's imagination, and it gets worse if you cannot bring the material to a "safe" spot on the line. I will try to capture a few of them, off the top of my head.
As you alluded to, contamination is the big one. You really need power to keep things clean. But also, the process that runs in the factory is assumed by default to run all the time, and you optimize the process around that assumption. In a system with thousands of operations (and many suboperations within each operation), the process window is just too small to tolerate much deviation, and it is certainly not explored around a hard restart like this. We want to prevent the process from running under these conditions at all!
Now for some more details:
- If your fab air handling/pumping system stops, particle counts will explode. This in turn causes killer defects on the process material.
- You also can't keep your tools evacuated at high vacuum / ultra-high vacuum levels (effectively, atomically pure). Pumping down to this level is not trivial and can take weeks of work to restore if the vacuum chamber is badly contaminated. Fab air is much better than the labs I used pumps in, but it is still a big job to keep these chambers pristine.
- Many tools are implicitly dependent on continuous operation and consumption of feedstock and workpieces (often called tool conditioning). For example, letting a dry etch chamber idle means it will inevitably develop some kind of contamination layer over the previous chamber-wall conditioning layer. This can happen very fast (think ~30 min) even when the tool is idling under ideal conditions, and it often forces our process module friends to run "dummy" conditioning wafers to manage the issue. Now imagine what might happen under non-ideal conditions.
- Feedstock / consumables can go bad very fast. There are wet and gaseous feedstocks trapped in the lines of every single tool, and most modules don't characterize at all what happens to feedstock quality when the tool is shut down. Relatedly, I remember a story where a lab was having a terrible time replicating what was happening in a foundry due to particle contamination from wet cleans/etch. It turned out that the particulate was coming from the plastic jugs holding the wet chemistry. The root cause was that the fab used that chemistry so much and so fast that particulate contamination was never a problem, while the lab might hold half-full jugs for months, letting plastic bits build up in the chemistry.
- The engineers must prove that their tools/segments work as spec'd after the restart. This is exhausting and painstaking work. Bringing tools back up to production in the course of normal operation is already tiresome enough. But you cannot just run critical material and hope for the best! So now you must spend days validating the entire process line again.
- You can try to shelve / store key material to avert true disaster, but there are critical segments where this is impossible due to reactivity, sensitivity, or whatever. You have a finite amount of time to get your material out of those high-risk segments, and if the gas supplier only gives you an hour of forewarning, all that material might be totally screwed and there is virtually nothing you can do except cross your fingers. The material would likely be scrapped anyway, since the risk is known to be too high to bother processing it further.
- There is also a finite amount of time the wafers can spend in storage, even if they are pulled off the line in "safe" segments of the process. They will still collect particles, they will oxidize, and surface quality will degrade as long as they are not in optimal conditions. Cleans are an option, but you must be sure those cleans address the specific types of contamination the wafers collected while in storage.
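The idle-time point in the dry etch example could be expressed as a simple gate that a chamber must pass before running product again. This is a minimal sketch, and only the ~30 minute idle window comes from the comment. The wafer-count formula, the cap, and the "requalify on lost vacuum" branch (borrowed from the pumping point) are hypothetical assumptions, not any fab's actual procedure.

```python
from datetime import timedelta

# The ~30 minute figure comes from the etch example; the wafer counts and
# cap below are hypothetical values chosen only for illustration.
IDLE_LIMIT = timedelta(minutes=30)
WAFERS_PER_EXTRA_HOUR = 2
MAX_CONDITIONING_WAFERS = 10


def restart_plan(idle: timedelta, vacuum_lost: bool) -> tuple[str, int]:
    """Decide what an etch chamber must do before running product again.

    Returns (action, conditioning_wafers), where action is 'requalify',
    'condition', or 'run_product'.
    """
    if vacuum_lost:
        # Pump-down and full qualification come before any wafers at all.
        return "requalify", 0
    if idle <= IDLE_LIMIT:
        return "run_product", 0
    extra_hours = (idle - IDLE_LIMIT) / timedelta(hours=1)  # float hours
    wafers = min(MAX_CONDITIONING_WAFERS, 1 + int(extra_hours * WAFERS_PER_EXTRA_HOUR))
    return "condition", wafers


if __name__ == "__main__":
    for idle in (timedelta(minutes=10), timedelta(minutes=45), timedelta(hours=3)):
        print(idle, restart_plan(idle, vacuum_lost=False))
    print(restart_plan(timedelta(hours=1), vacuum_lost=True))
```

The design choice is to make the cost of idling explicit and monotone: the longer the chamber sits, the more dummy wafers it burns before product goes back in.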
OK, that's what I could immediately think of off the top of my head in the time I have available. Hope that satiates your curiosity for the moment.
> This isn’t very big news.
The opening paragraph feels a bit pearl-clutching to me.
> the company had to scrap thousands of wafers that were in production for clients at the site which include Apple, Nvidia, and AMD.
Eh. So what? I am sure they scrap thousands of wafers for all kinds of other reasons. It would be better to know the cost per hour of a total plant shutdown. (Of course, I'm sure the author doesn't have this information.)
> After all, the TSMC logo features failing parts!
Final hat tip here. I never knew that.
I'm not sure about that; I think the blank spaces are just parts that have been picked. The dies have been cut and the good ones are being removed.
So it always comes as a bit of a surprise to those out of the loop, but from what I've read from individual Taiwanese workers and their feedback, it's clear that there is significant regret on one side.
And it doesn't seem to be limited to just TSMC; another large company recently received an icy reception for its large investment in American manufacturing.
I think this is a big reason why a lot of these jobs simply wouldn't stay in America: consumers would not be able to foot the costs added by a "cultural premium" faster than innovation can reduce them.
I’m not an expert on Taiwanese labor laws but their list of exempt labor categories in the LSA is much shorter than the one in the American FLSA.
He admitted that even with their OT and bonuses, he probably makes more on a W-2 salary than they do.
But my point still remains: if they want US (or TW) folks to work more hours, they need to pay for those hours.
This Reddit post captures what I've seen at TSMC in Taiwan: $120K is normal pay at the director level, while engineers make $2,500-5,000 a month. TSMC AZ starting pay for a new college grad with a BS is probably just under $100K/year in salary alone, with the potential to make over $120K within a few years with fully vested bonuses.
My point is: engineers in Taiwan work more hours because they are paid to work more hours (OT). Engineers in the USA are paid the same whether they work 35 hours or 60 hours.
If TSMC wants to address the culture gap (get the Americans to work more), TSMC should pay up.
I can 100% tell you this is not true.
Americans typically ask for things like work-life balance, non-abusive working hours, etc. They also don’t (anymore) have the type of family setup that allows them to focus so much; they get pulled into child care duties, taking care of family members, planning their next vacation, etc.
The general attitude is also more ‘yeah whatever’ to some extent.
The amount of singular obsessive engineering you get out of one vs the other is hard to compare.
My original thinking after reading some of the anecdotes from TSMC engineers was that they were obsessively dedicated, which means extreme hours by North American standards.
It's also the same at places like Samsung, where the company treats employees very well with perks and long career stability, but it's not free; it always requires huge sacrifice, similar to Japanese conglomerates, I'd imagine.
I'm not sure which is better. In America it's definitely a transactional relationship, and it comes with stability issues compared to what these East Asian giants offer, but their stability comes at the cost of not being able to switch if and when you find yourself at odds.
Not sure what it was like at Nokia, another conglomerate that ultimately folded under competition, in a country with more stringent labor/work-life rules than you would find enforced in East Asia.
Getting a bit distracted here, but it's worth noting how much culture plays a role in these large companies and their management styles.
The ones I met would make Mormon trad wives look liberal, but perhaps that's by mainland expectations? (I doubt it, though; the mainland is relatively liberal for women.)
I guess it really depends on the individual, but the Japanese definitely still seem more focused on traditional role separation (although both partners working seems more common now). Koreans used to be too, but in recent decades they have become more "liberal progressive", leading to conflicts in an economy largely kept afloat by 8~10 companies. It's not uncommon for some men to still handle household chores after work, and I've heard the same is true for Taiwanese households.
I'm not sure about the mainland, but there have been shifts in these regions owing to Western media and values, coinciding with lower birth rates, compared with areas where traditional husband-wife roles are still intact (which also happen to be more economically well off than singles, divorced people, or progressive couples).
Maybe there is a world where TSMC can hire enough skilled workers and optimize processes so that people can go home at 5 p.m., but that is not currently the case.
The US is going to have to heavily subsidize the payroll of tens of thousands of very accomplished EEs/etc to make this work. By doing that they will also wreck the HW part of SV.
Also, it's completely normal to run a factory 24/7. I think people are just impressed because TSMC is the only one they've read about?
(However, it's correct that a TSMC fab is the most advanced and complicated process on the planet.)
Think about how Intel, which pioneered the know-how, can't build cutting-edge nodes at the scale it needs to be profitable.
IBM had to sell their fabs to cater to the whims of "shareholders".
It's the greed of stockholders that you need to blame.
They have a special advantage because they don't compete with their customers, which leads to trust, which leads to customers paying for their R&D for them.
Intel on the other hand just kind of sucks at their job. Skill issue basically. (But they aren't /that/ far behind.)
https://news.ycombinator.com/item?id=17686310 ("Computer Virus Cripples Several Taiwan Semiconductor Plants (bloomberg.com)"—2018, 100 comments)
https://news.ycombinator.com/item?id=19214952 ("TSMC's Photoresist Material Incident: $550M Loss (anandtech.com)"—2019, 15 comments)
Those gases are storeable, so it's surprising there wasn't enough tank capacity to deal with outages.
The site plan [2] shows "Gas Plant 1", and future "Gas Plant 2" and "Gas Plant 3". The gas plants are across a small road from the fab and feed the plant directly. Once Gas Plants 2 and 3 were built, there would be redundancy, but at this stage, there isn't a backup. The plan doesn't show a large tank farm, so they can't store gases in bulk.
[1] https://www.aztechcouncil.org/utility-company-makes-progress...
[2] https://semiwiki.com/forum/threads/tsmc-phoenix-arizona-fab-...
(Joke aside, it's probably not an ad, but they were excited to share the reason you see Linde on all sorts of gas tanks all over the place. It's actually quite common, and once you see it, you see it everywhere.)
What is funny, though, is that at least in Australia and the UK, they still use the BOC brand, which is a Linde subsidiary.
Supagas tend to have better prices for smaller operators, and hobbyists.
When you google the company, the marketing slogan that comes up is "Linde is Everywhere", and that works on so many levels. They sell air, air is everywhere, therefore Linde is everywhere.
They are a company that sells air: that stuff people breathe. Forget this AI nonsense. Jensen has to constantly pull something out of his rear to keep food on the table. These guys sell air. What a business. :)
(Well, it's cheaper.)
From the outside, I would love to participate in semiconductor manufacturing.
I also had a pager strapped to me 24/7/365. Finding a US backup for a UI developed in an obscure language owned by a Japanese IT company proved to be quite challenging. I bet they're still using it to this day and just managing it from Korea now. The risk of rewriting or refactoring some of this stuff is measured in 10-11 figures.
And then there’s the software. Part of my job was entering numbers into a system that had been designed to make it hard to enter numbers in it. This was so that you wouldn’t change them too often. But we did. A big part of the job was data analysis, but instead of actual access, certain data was only available as server rendered PNGs. Small ones.
I could go on, but I think that’s enough.
That's some big-brain management idea right there. I suppose there was probably a reason for it, but it sounds like any change you do make would be likely to cause an error because of poor ergonomics.
Chemical formula: SiH4
Appearance: A colorless gas with a repulsive odor
Flammability: Highly flammable and pyrophoric, meaning it can ignite spontaneously in air
Toxicity: Very toxic by inhalation and a strong irritant to the skin, eyes, and mucous membranes
It probably depends on the duration of the outage. I'd expect they have some storage, and if they plan on having the compressor plant down for longer than that can cover, they'll bring in tanks.
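Whether on-site storage covers an outage comes down to simple arithmetic. Buffer hours equal usable stored gas divided by consumption rate, and anything beyond that has to come from delivered tanks. Below is a minimal sketch with made-up numbers, since the thread gives no storage or consumption figures for the Arizona site. The function names, the usable fraction, and the tank capacity are hypothetical.

```python
import math


def buffer_hours(stored: float, usable_fraction: float, use_per_hour: float) -> float:
    """Hours on-site storage can cover at a steady consumption rate.

    usable_fraction models gas that can't be drawn from the tanks (an assumption).
    stored and use_per_hour must use the same volume unit.
    """
    if use_per_hour <= 0:
        raise ValueError("use_per_hour must be positive")
    return stored * usable_fraction / use_per_hour


def tanks_needed(outage_hours: float, stored: float, usable_fraction: float,
                 use_per_hour: float, tank_capacity: float) -> int:
    """Delivered tanks needed to ride out an outage longer than the buffer."""
    shortfall = outage_hours * use_per_hour - stored * usable_fraction
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / tank_capacity)


if __name__ == "__main__":
    # Made-up numbers in arbitrary volume units; no real site data is implied.
    print(buffer_hours(10_000, 0.9, 1_500))                # 6.0
    print(tanks_needed(12, 10_000, 0.9, 1_500, 2_000))     # 5
```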
taurath•2mo ago
joecool1029•2mo ago
I wouldn't think it would have to happen too quickly, since I've heard about fab disruptions from fires and such since the early 2000s. Probably just sometime after quarterly reporting, to set the record straight? Why not in the report?
samus•2mo ago