They went to the effort of branding private VPC endpoints "PrivateLink". Maybe it took some engineering effort on their part, but it should be the default out of the box, and an entirely unremarkable feature.
In fact, I think if you have private subnets, the only way to reach S3 etc. is PrivateLink (correct me if I'm wrong).
It's just baffling.
People who probably shouldn't be on AWS - but they usually have to be there for unrelated reasons, and they will work to reduce their bill.
This just sounds like a polite way of saying "we're taking people's money in exchange for nothing of value, and we can get away with it because they don't know any better".
Hideous.
S3 can use either, and we recommend establishing VPC Gateway endpoints by default whenever you need S3 access.
(Disclaimer: I work for AWS, opinions are my own.)
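To make that concrete, a minimal boto3 sketch of creating an S3 gateway endpoint and attaching it to a route table (the IDs and region below are placeholders, not anything from this thread):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # example region

    # Gateway endpoint for S3, attached to the route tables whose S3 traffic
    # would otherwise go via a NAT gateway or Internet gateway.
    ec2.create_vpc_endpoint(
        VpcId="vpc-0123456789abcdef0",             # placeholder VPC ID
        ServiceName="com.amazonaws.us-east-1.s3",  # service name must match the region
        VpcEndpointType="Gateway",
        RouteTableIds=["rtb-0123456789abcdef0"],   # placeholder route table ID
    )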
They should be, of course, at least when the destination is an AWS service in the same region.
[edit: I'm speaking about interface endpoints, but S3 and DynamoDB can use gateway endpoints, which are free to the same region]
The other problem with (interface) VPC endpoints is that they eat up IP addresses. Every service/region permutation needs a separate IP address drawn from your subnets. Immaterial if you're using IPv6, but can be quite limiting if you're using IPv4.
Other AWS services, though, don't support gateway endpoints.
Would you recommend using VPC Gateway even on a public VPC that has an Internet gateway (note: not a NAT gateway)? Or only on a private VPC or one with a NAT gateway?
Gateway endpoints only work for some things.
https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpo...
(Disclaimer: I work for AWS, opinions are my own.)
This has been a common gotcha for over a decade now: https://www.lastweekinaws.com/blog/the-aws-managed-nat-gatew...
There are many IaC libraries, including the standard CloudFormation VPC template and CDK VPC class, that can create them automatically if you so choose. I suspect the same is also true of commonly-used Terraform templates.
It's a convenience-vs-security argument, though the documentation could be better (including having AWS recommend settings when it sees you using S3).
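For reference, a rough sketch of the CDK (Python) route, assuming aws-cdk-lib v2; construct names here are illustrative:

    from aws_cdk import aws_ec2 as ec2

    # Inside a Stack's __init__: create the VPC with an S3 gateway endpoint.
    vpc = ec2.Vpc(
        self, "Vpc",
        gateway_endpoints={
            "S3": ec2.GatewayVpcEndpointOptions(
                service=ec2.GatewayVpcEndpointAwsService.S3
            )
        },
    )

    # Or bolt one onto an existing VPC construct later:
    vpc.add_gateway_endpoint(
        "DynamoDb", service=ec2.GatewayVpcEndpointAwsService.DYNAMODB
    )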
NAT gateways are not purely hands-off: you can attach additional IP addresses to a NAT gateway to help it scale to more instances behind it, which is a fundamental part of how NAT works, because of the limit on the number of ports that can be opened through a single IP address. A VPC Gateway Endpoint, by contrast, doesn't use up ports or IP addresses attached to a NAT gateway at all. And what about metering? You pay per GB for traffic passing through the NAT gateway, but presumably not for traffic to an implicit built-in S3 gateway - so would you expect AWS to show you different meters for billed and unbilled traffic, while performance still depends on the sum total of the traffic (S3 plus Internet egress) passing through it? How is that not confusing?
It's also beside the point that not all NAT gateways are used for Internet egress; many enterprise networks have nested layers of private networks where NAT gateways help deal with overlapping private IP CIDR ranges. In such cases, having some kind of implicit built-in S3 gateway violates assumptions about how network traffic is controlled and routed, since the assumption is that the traffic stays completely private. So even if it were supported, it would need to be disabled by default (for secure defaults), and you're right back at the equivalent of the situation you have today, where the VPC Gateway Endpoint is a separate resource to be configured.
Not to mention that VPC Gateway Endpoints allow you to define a policy on the gateway describing what may pass through, e.g. permitting read-only traffic through the endpoint but not writes. I'm not sure how you'd expect that to work with NAT gateways. This is something AWS and Azure have very similar implementations for that work really well, whereas GCP only permits configuring such controls at the Organization level (!)
They are just completely different networking tools for completely different purposes. I expect closed-by-default secure defaults. I expect AWS to expose the power of different networking implements to me because these are low-level building blocks. Because they are low-level building blocks, I expect for there to be footguns and for the user to be held responsible for correct configuration.
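On the endpoint-policy point above, a minimal boto3 sketch of attaching a read-only policy to an existing S3 gateway endpoint (the endpoint ID and policy are illustrative):

    import json

    import boto3

    ec2 = boto3.client("ec2")

    # Illustrative read-only policy: GETs and LISTs allowed through the endpoint, no writes.
    read_only_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": "*",
        }],
    }

    ec2.modify_vpc_endpoint(
        VpcEndpointId="vpce-0123456789abcdef0",  # placeholder endpoint ID
        PolicyDocument=json.dumps(read_only_policy),
    )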
On the one hand, this is obviously the right decision. The number of giant data breaches caused by incorrectly configured S3 buckets is enormous.
But... every year or so I find myself wanting to create an S3 bucket with public read access so I can serve files out of it. And every time I need to do that, I find something has changed, my old recipe doesn't work any more, and I have to figure it out again from scratch!
I'm still not sure I know how to do it if I need to again.
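A rough sketch of the current two-step recipe with boto3 (the bucket name is a placeholder; Block Public Access has to be relaxed before S3 will even accept a public bucket policy):

    import json

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-public-assets-example"  # placeholder bucket name

    # Step 1: relax Block Public Access so a public bucket policy is allowed.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": False,      # allow a public bucket policy
            "RestrictPublicBuckets": False,
        },
    )

    # Step 2: grant anonymous read on objects via a bucket policy.
    s3.put_bucket_policy(
        Bucket=bucket,
        Policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }],
        }),
    )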
I'm not sure if that's changed recently; I've stopped using it.
It's so simple for storing and serving a static website.
Are there good and cheap alternatives?
- signed URLs, in case you want session-based file downloads (see the sketch below)
- public files by default, e.g. for a static site.
You can also map a domain (or sub-domain) to CloudFront with a CNAME record and serve the files via your own domain.
CloudFront distributions are also CDN-backed: files are served from a location close to the user, which speeds up your site.
For low-to-mid traffic, CloudFront with S3 is cheaper, as CloudFront's network cost is lower. For large amounts of traffic, CloudFront costs can balloon very fast. But in those scenarios S3 costs are prohibitive too!
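On the signed-URL bullet above: the S3-presigned flavor of that is nearly a one-liner with boto3 (bucket and key are placeholders); CloudFront signed URLs are the CDN-level equivalent.

    import boto3

    s3 = boto3.client("s3")

    # Time-limited download link for a single object.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-bucket-example", "Key": "reports/2024.pdf"},
        ExpiresIn=3600,  # link valid for one hour
    )
    print(url)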
Yeah, what month?
Even if you have a terrible and permissive bucket policy or ACLs (legacy but still around) configured for the S3 bucket, if you have Block Public Access turned on - it won't matter. It still won't allow public access to the objects within.
If you turn it off but you have a well-scoped and ironclad bucket policy - you're still good! The bucket policy will dictate who, if anyone, has access. Of course, you have to make sure nobody inadvertently modifies that bucket policy over time, or adds an IAM role with access, or modifies the trust policy for an existing IAM role that has access, and so on.
TGW is... twice as expensive as VPC peering?
But unlike peering, TGW traffic flows through an additional compute layer, so it has additional cost.
Everything you know is wrong.
Weird Al. https://www.youtube.com/watch?v=W8tRDv9fZ_c
Firesign Theatre. https://www.youtube.com/watch?v=dAcHfymgh4Y
Wouldn't this always depend on the length of the queue to access the robotic tape library? Once your tape is loaded it should move really quickly:
https://www.ibm.com/docs/en/ts4500-tape-library?topic=perfor...
Your assumption holds if they still use tape. But this paragraph hints at it not being tape anymore. The eternal battle between tape and drive backup takes another turn.
For storage especially, we now build enough redundancy into systems that we don't have to jump on every fault. That reduces the chance of human error when trying to address a fault, and of pushing the hardware harder during recovery (resilvering, catching up in a distributed consensus system, etc.).
When the entire box gets taken out of the rack due to hitting max faults, you can piece out the machine and recycle the parts that are still good.
You could in theory ship them all off to the backend of nowhere, but it seems Glacier lives in all the places where AWS data centers are, so it's not that. But with Glacier being durable storage, with a low expectation of data out versus data in, they could be - and probably are - cutting the aggregate bandwidth to the bone.
How good do your power backups have to be to power a pure Glacier server room? Can you use much cheaper in-rack switches? Can you use old in-rack switches from the m5i era?
Also most of the use cases they mention involve linear reads, which has its own recipe book for optimization. Including caching just enough of each file on fast media to hide the slow lookup time for the rest of the stream.
Little's Law would absolutely kill you in any other context, but here we're doing linear writes with orders of magnitude fewer reads. You have hardware sitting around waiting for a request. "Orders of magnitude" is the space where interesting solutions can live.
Is tape even cost competitive anymore? The market would be tiny.
As of when? According to internal support, this is still required as of 1.5 years ago.
Even worse, if you run self-hosted NAT instance(s), don't attach an EIP to them. Just use an auto-assigned public IP (no EIP).
NAT instance with EIP
- AWS routes it through the public AWS network infrastructure (hairpinning).
- You get charged $0.01/GB regional data transfer, even if in the same AZ.
NAT instance with auto-assigned public IP (no EIP)
- Traffic routes through the NAT instance’s private IP, not its public IP.
- No regional data transfer fee — because all traffic stays within the private VPC network.
- The auto-assigned public IP may change if the instance is shut down or re-created, so have automation to handle that. Though you should be referencing the network interface ID in your VPC route tables anyway (see the sketch below).
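A minimal boto3 sketch of that route-table part, pointing the private subnet's default route at the NAT instance's network interface (IDs are placeholders):

    import boto3

    ec2 = boto3.client("ec2")

    # Default route for a private subnet, targeting the NAT instance's ENI
    # rather than the instance ID or an EIP.
    ec2.create_route(
        RouteTableId="rtb-0123456789abcdef0",        # placeholder route table ID
        DestinationCidrBlock="0.0.0.0/0",
        NetworkInterfaceId="eni-0123456789abcdef0",  # the NAT instance's ENI
    )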
My understanding is that transfer gets charged on both sides as well. So if you own both sides you'll pay $0.02/GB.
Does anyone have experience running Spot in 2025? If you were to start over, would you keep using Spot?
- From what I observe of pricing, Spot is cheaper
- I am running on three different architectures, which should limit Spot unavailability
- I've been running about 50 Spot EC2 instances for a month without issue. I'm debating turning it on for many more instances
1. Spot with autoscaling to adjust to demand and a savings plan that covers the ~75th percentile scale
2. On-demand with RIs (RIs will definitely die some day)
3. On-demand with savings-plans (More flexible but more expensive than RIs)
4. Spot
5. On-demand
I definitely recommend spot instances. If you're greenfielding a new service and you're not tied to AWS, some other providers have hilariously cheap spot markets - see http://spot.rackspace.com/. If you're using AWS, auto-scaling spot with savings plans is definitely the way to go. If you're using Kubernetes, the AWS Karpenter project (https://karpenter.sh/) has mechanisms for determining the cheapest spot price among a set of requirements.
Overall though, in my experience, EC2 is always pretty far down the list of AWS costs. S3, RDS, Redshift, etc. wind up being a bigger bill in almost all past-early-stage startups.
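A rough boto3 sketch of the "spot + autoscaling with an on-demand base" shape described above (names, sizes, and instance types are placeholders):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Spot-heavy Auto Scaling group with a small on-demand base capacity.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-spot-asg-example",   # placeholder name
        MinSize=2,
        MaxSize=50,
        VPCZoneIdentifier="subnet-aaa,subnet-bbb",     # placeholder subnet IDs
        MixedInstancesPolicy={
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": "web-template",  # placeholder template
                    "Version": "$Latest",
                },
                # Several interchangeable instance types widen the spot pool.
                "Overrides": [
                    {"InstanceType": "m6i.large"},
                    {"InstanceType": "m6a.large"},
                    {"InstanceType": "m5.large"},
                ],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": 2,                 # the part a savings plan can cover
                "OnDemandPercentageAboveBaseCapacity": 0,  # everything above base is Spot
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
    )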
There's a sweet spot somewhere in between raw VPSes and insanely detailed least-privilege serverless setups that I'm trying to revert to. Fargate isn't unmanageable as a candidate, not sure it's The One yet but I'm going to try moving more workloads to it to find out.
Ultimately AWS doesn't have the right leadership or talent to be good at GenAI, but they do (or at least used to) have decent core engineers. I'd like to see them get back to basics and focus there. Right now leadership seems panicked about GenAI and is just throwing random stuff at the wall, desperately trying to get something to stick. That's really annoying to customers.
Hasn't it been this way for many years?
>Spot instances used to be much more of a bidding war / marketplace.
Yeah, because there's no bidding any more at all, which is great: you don't get those super-high price spikes as availability drops, where only the people who bid super high to ensure they wouldn't be priced out could still get instances.
>You don’t have to randomize the first part of your object keys to ensure they get spread around and avoid hotspots.
This one was a nightmare, and it took ages to convince some of my more pig-headed coworkers in the past that they didn't need to do it any more. The funniest part is that they were storing their data as millions and millions of 10-100KB files, so the S3 backend scaling wasn't the thing bottlenecking performance anyway!
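For anyone who never saw it, the legacy pattern being argued about looked roughly like this (illustrative only; no longer needed for most workloads):

    import hashlib

    def randomized_key(original_key: str) -> str:
        # Old advice: prepend a few hash characters so keys spread evenly
        # across S3's index partitions instead of hammering one prefix.
        prefix = hashlib.md5(original_key.encode()).hexdigest()[:4]
        return f"{prefix}/{original_key}"

    # "logs/2016/01/01/host-0001.gz" becomes "<4 hex chars>/logs/2016/01/01/host-0001.gz"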
>Originally Lambda had a 5 minute timeout and didn’t support container images. Now you can run them for up to 15 minutes, use Docker images, use shared storage with EFS, give them up to 10GB of RAM (for which CPU scales accordingly and invisibly), and give /tmp up to 10GB of storage instead of just half a gig.
This was/is killer. It used to be such a pain to have to manage pyarrow's package size if I wanted a Python Lambda function that used it. One thing I'll add that took me an embarrassingly long time to realize is that your Python global scope is actually persisted, not just the /tmp directory.
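A minimal sketch of what that buys you (the bucket name and caching logic are illustrative): anything initialized at module scope - clients, connection pools, small caches - survives across warm invocations.

    import boto3

    # Module (global) scope: runs once per cold start, then is reused by
    # every warm invocation of this execution environment.
    s3 = boto3.client("s3")
    _cache = {}

    def handler(event, context):
        key = event["key"]
        if key not in _cache:  # only fetch on a cold start or a new key
            obj = s3.get_object(Bucket="my-config-bucket", Key=key)  # placeholder bucket
            _cache[key] = obj["Body"].read()
        return {"cached_bytes": len(_cache[key])}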
Want to set up MFA ... login required to request device.
Yes, I know, they warned us far ahead of time. But not being able to request one of their MFA devices without a login is ... sucky.
I had a theory (based on no evidence I'm aware of, except knowing how Amazon operates) that the original Glacier service operated out of an Amazon fulfillment center somewhere. When you put in a request for your data, a picker would go to a shelf, pick up some removable media, take it back, and slot it into a drive in a rack.
This, BTW, is how tape backups on timesharing machines used to work once upon a time. You'd put in a request for a tape and the operator in the machine room would have to go get it from a shelf and mount it on the tape drive.
Not strictly true.
If key prefixes don’t matter much any more, then it’s a very recent change that I’ve missed.
But I don’t know what conversations did or did not happen behind the scenes.
S3 will automatically do this over time now, but I think there are/were edge cases still. I definitely hit one and experienced throttling at peak load until we made the change.