(I have used AWS heavily since its inception, long before IAM was invented.)
My speech fell on deaf ears, because IAM is clear as day to its inventors, who fail to see what the problem with it is, and corporate customers are happy to employ one or two people solely dedicated to learning and managing IAM.
Anyway, there is no way around IAM. Just like with systemd, you'd better embrace it and learn it, and it turns out to be not that terrible.
Anecdotally, at a startup I worked for, our devops and security engineers pulled me into a call once because they needed my help finding where the root credentials were being used in our code. I logged into our dev, staging, and production servers, but they were nowhere to be seen in any place you'd expect AWS creds (like ~/.aws).
Then it hit me. I pulled up our git repo and pulled up a function that handled file upload to s3 for something incredibly unimportant. Right there, in plain text, checked into git. The git blame read something like "4 years ago".
Creds weren't even needed, since the AWS SDK does whatever magic to work out the creds from the EC2 instance's role. I immediately put up a PR just removing the creds, as the empty constructor for our SDK used the EC2 role (this pattern was used in multiple other parts of the code base).
The CEO (the author of the code) didn't want to merge it in case it stopped working, even though I had demoed it in our staging environment and the feature using this code was only used about twice a week.
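For anyone wondering what "the empty constructor" looks like, here is a minimal sketch in Python with boto3. That is an assumption: the thread doesn't say which SDK or language the code base used, and the function names are hypothetical. With no keys passed in, the SDK falls back to its default credential chain, which on an EC2 instance ends at the instance role.

```python
def make_s3_client():
    """Build an S3 client without passing any credentials."""
    # Imported lazily so this sketch can be loaded on machines without boto3.
    import boto3

    # No aws_access_key_id / aws_secret_access_key arguments: boto3 walks its
    # default credential chain (environment variables, shared config files,
    # and eventually the EC2 instance role via the instance metadata service).
    return boto3.client("s3")


def upload_report(path, bucket, key):
    """Hypothetical upload helper: local file at `path` -> s3://bucket/key."""
    make_s3_client().upload_file(path, bucket, key)
```

Nothing secret lives in the source file, so there is nothing to leak through git history.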
Thankfully, we convinced him, got it removed from the codebase, and either cycled or deleted the root creds, I don't recall how that worked at the moment.
I wasn't familiar with root creds at the time either, but damn, is it scary to think that any one of the engineers had access to root creds for a company of the size that we were.
And the CEO wrote it like that originally because he didn't want to mess with IAM. I can see why he did it, but man was that a bad idea.
This is a very common case. It's okay in very early development (we rarely start projects with IAM policies already defined), but then it gets shoved into the backlog forever.
I know it’s a lot of extra maintenance effort and probably more difficult, but AWS could go a long way towards making this more sane and transparent by giving GitOps examples alongside their CLI ones. Even someone just copy-pasting Terraform or CloudFormation is going to be way more visible than someone running AWS CLI commands.
Of course, committing credentials into a repo is absolutely never ever ok.
People being able to see and interact with certain services but not everything related to those services. Same with services since it’s the same system.
To me the problem isn’t IAM itself but the nested API calls. Which also itself isn’t a problem but we need a better UI to diagnose when something goes wrong.
CloudTrail isn’t really suitable for this task. A “simple” UI for looking at a role and seeing a list of “failed calls” would go a long way.
What I don't understand is how the AWS log inspection tools are still as bad as they are. Even if it's just to prepare public-facing material, AWS clearly dogfoods them a little bit, so surely there would be glory and accolades to be won by implementing search that was half-assed (instead of quarter-assed)? Or is the AWS culture so broken that it net punishes core improvements? Come to think of it, that would explain a lot.
But there are also times when it is inconsistent at best, especially when trying to look at some nested permission problem. More than a few times I have had to get in touch with AWS support because the actual error just was not in CloudTrail anywhere. Or it is related to some service that doesn't log to CloudTrail, like S3 access.
Which was more my point: it isn't IAM itself that is the problem.
It’d be nice if AWS could fix the madness and standardize on one List/Update/Create/Delete call per resource, since most services are already like that.
That would go a long way toward solving the problem.
I disagree that IAM specifically is a great product. It's great if you learn it, but learning a permissions system never adds anything to the bottom line. It's ever-growing technical debt at best. People interested in learning a permissions system are typically people interested in developing one. Users are never happy with permissions systems because more often than not they're a nuisance.
In my view IAM is an incredibly overengineered list of who, what, which, suitable for corporate granularity but then too complex to be used by a startup or a project.
For the record, I’ve never been in a development team of more than 4 people, and my teams have always been responsible for their own ops. No ‘corporate dev’ here.
True.
> That doesn’t mean that it’s important,
Not necessarily true.
> or that the complexity is avoidable.
Untrue. There is no need for IAM for the vast majority of four-person teams, and there could be a simpler solution for those, like there was before IAM. It can actually be built on top of IAM, as long as it spares people from diving deep into it.
You are diving into another topic: "fine-grained permissions"
In my opinion, fine-grained permissions are a very good idea for application-attached roles: your Lambda can only do "this", your EC2 instance can only do "that". For humans, they are a mistake and should be avoided.
Like let me start the development first. I'll get to security later. I came here for object storage, don't force me to learn an incredibly complex auxiliary service.
And that's why the world is full of crappy, insecure software. Security is ever an afterthought.
Better for small teams? Sure: a login/password system with maybe optional restrictions on services, maybe "readonly" access, and maybe with the AWS console disabled.
Basically, something that was in AWS before the IAM madness came in.
Creating an IAM user with access/secret keys is easy.
Assigning the built-in ReadOnly policy is easy too, as is any other built-in "generic" policy.
Of course, it is possible with IAM because IAM is a top-tier ACL system.
The point: I don't want to learn it.
“Everything has root” is fine when you are starting a new thing - what’s needed is tooling to start there and grow into fine grained permissions when you add a second thing.
And this is also why you need a PhD to fully comprehend it.
To some extent. It’s perfectly possible to make a role that allows any S3 action on everything, but still restricts the service from operating on any of the other 400 AWS services (unlike root credentials).
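As a rough illustration of that point, here is a sketch of such a policy document built in Python, plus a toy matcher showing that a wildcard on S3 actions says nothing about other services. The `action_allowed` helper is hypothetical and deliberately simplistic: it ignores Deny statements, resources, and conditions, so it is not how IAM really evaluates a request.

```python
import json
from fnmatch import fnmatchcase

# Allow every S3 action on every resource, and nothing else.
S3_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}
    ],
}


def action_allowed(policy, action):
    """Toy check: does any Allow statement's Action pattern match `action`?"""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        patterns = stmt["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        # IAM action names are case-insensitive, so compare in lowercase.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


print(json.dumps(S3_ONLY_POLICY, indent=2))
print(action_allowed(S3_ONLY_POLICY, "s3:PutObject"))      # True
print(action_allowed(S3_ONLY_POLICY, "ec2:RunInstances"))  # False
```

Anything not explicitly allowed is denied by default, which is what keeps the other services out of reach.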
I also have issues with infrastructure being declared imperatively. I’ve seen countless bugs and incidents caused by CDK surprises.
You get precisely what you ask for with a declarative language.
If you’re trying to do fine grained permissions you have to know every resource that might end up in the request context and then have an entry in a Resource section of a policy for that Resource type. For instance just creating an EC2 (RunInstances) might involve Subnet, Security Group, EBS, image, volume, snapshot, and a few other resource types.
This gets even more complicated when you’re using Condition operators because the conditions are checked against every resource in the request context but each resource type has different valid conditions (eg some resources have tags and some don’t) so you have to split up the policy into numerous statements specific to each resource type.
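To make the statement-splitting concrete, here is a hedged sketch of a RunInstances policy as a Python dict. The tag key and value are hypothetical, the resource list is not exhaustive, and a real policy would likely need more (for example, permission to create the tags at launch). The point is only the shape: one statement for resource types that get the tag condition, another for the resource types where that condition would never match.

```python
import json

# Hypothetical tag requirement, chosen only for illustration.
REQUIRED_TAG = {"aws:RequestTag/team": "builds"}

RUN_INSTANCES_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Resources this sketch assumes are tagged at launch: the tag
            # condition is applied only here.
            "Sid": "TaggedAtLaunch",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                "arn:aws:ec2:*:*:instance/*",
                "arn:aws:ec2:*:*:volume/*",
            ],
            "Condition": {"StringEquals": REQUIRED_TAG},
        },
        {
            # Other resources that appear in the same request. They get their
            # own statement, without the tag condition.
            "Sid": "SupportingResources",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                "arn:aws:ec2:*::image/ami-*",
                "arn:aws:ec2:*:*:subnet/*",
                "arn:aws:ec2:*:*:security-group/*",
                "arn:aws:ec2:*:*:network-interface/*",
            ],
        },
    ],
}

# Which statements carry a condition at all?
conditioned = [s["Sid"] for s in RUN_INSTANCES_POLICY["Statement"] if "Condition" in s]
print(conditioned)  # ['TaggedAtLaunch']
print(json.dumps(RUN_INSTANCES_POLICY, indent=2))
```

Putting the condition on a single statement covering every resource type would deny the whole call, because the subnet, image, and so on cannot satisfy it.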
All of that would be fine if there was any way to actually reliably get the context of a request so you can see exactly what was denied. Sometimes you can get the context via the encoded authorization message in a failed API call, but this is usually truncated in CloudTrail or when you’re using IaC tools like CloudFormation.
They’ve done a lot in the past few years to explain why a particular call was denied but none of it beats seeing the context and seeing exactly which part of the policy did or didn’t apply.
For enterprises that genuinely want the finer grained control, let them express that they want to opt out of that implicitness in the policy document.
That spreads out the actual end resulting policy into a bunch of disparate corners that are hard to unravel.
There are tools to see "can this specific user/role/service get to this thing". But because of the above, there's no one place to see "who can/cannot get to this thing".
Which are interconnected, right? If yes, then that's what the word "complicated" means: made of multiple, non-trivially interacting parts.
You can have a simple policy on a resource that just allows something, or you can make a complicated bi-directional set of policies that refer to specific principals, resources and actions to constrain it more. That will never go away if that is what you need to build. And it is still optional because if you don't want to build that, then you just end up with your single, not-inter-connected policy.
Things like principals, resources, trusts and control policies are all distinct systems with different goals and purposes. Maybe I'm missing some different AWS IAM policy that has that bolt-on flavour?
I can get "can this specific role/user access this specific bucket", just not the other direction.
It's not because the language isn't consistent, it's because the language isn't backed by a single cohesive RBAC type setup.
Microsoft tries that with Entra and it's a pretty miserable experience. Google has CEL for the conditionals, which is neat, but it gets rather close to software engineering rather than IAM configuration at that point. The somewhat hierarchical nature they use is also not as great as it seems at first glance.
You have no way to see "who can/cannot get to this thing". Yet who could argue that user/password is "complicated"?
NB: using IAM access/secret key has indeed the same issue
I found GCP's access control to be much simpler but more limited than AWS. They're adding complexity, though, with conditional access. In GCP the "who" in most cases is an email (Google Account, Google Groups, service account).
And because you can limit yourself, you can effectively simplify your way of consuming IAM.
"He who can do more can do less"
The ability to measure what permissions you need and record them automatically would improve the onboarding experience immeasurably.
Problem is, those kinds of things are the backend of the backend. An accessory of an accessory. No business cares about it. Ever. And yes, running wildcard policies can hit you hard if not attended to properly; my point is I don't want to learn a complex ACL system made for enterprises for my small startup that is actually gonna be fine with a wildcard policy basically forever.
This seems almost bait.
The art here is knowing when to stop building, not what to build.
A lot of services will create policies for you. For example, you can go to RDS and click "set up connection" to Lambda or EC2, and it will create the policy.
A lot of things will give you the policy to copy and paste.
Another UI to further abstract IAM would likely just complicate things, and then make it harder later if/when you need to leave that abstraction.
I was testing out SageMaker Studio. I just did the quick setup wizard, and the default managed execution role was insanely permissive: I believe read/write to all of the account's S3 buckets and broad List* for account resources. There are multiple parts of the documentation that also recommended you use this role. It seemed especially wild for a product with so many ways to access it. We have good account hygiene, but still.
I find this is often the case.
In any case, IAM and its essential features are not some optional thing; it's actually the prerequisite for AWS existing in its current form. IAM is the most universal AWS service - every other AWS service must use it. It's a loosely coupled symbolic policy computation engine for federated identity and service permissions. The "federated" part is doing A LOT of heavy lifting here: without federation, it's impossible to decouple teams and achieve the organizational velocity that AWS has.
Other cloud providers have ended up re-inventing AWS IAM, sometimes poorly. Although I have to give GCP credit for recently greatly improving their IAM console and permissions error usability.
I have EC2 Mac instances in an autoscaling group that are running untrusted code. I've "sandboxed" that code by running it as its own user and blocking that user from reaching the IMDS endpoint with a packet filter on the instance so that the untrusted code, in theory, cannot get instance policy credentials. Nonetheless, I'd like to limit the blast radius should the code somehow escape that sandbox. So I want the instance policy as restricted as possible.
One of the things the instances need to do is set tags on themselves. For this, you can use an IAM condition on the ec2 CreateTags action to say "this instance can call CreateTags, but the instance ID has to be the instance itself". This prevents an instance from setting tags on a different instance in the same autoscaling group.
The instances also need to make a few autoscaling API calls, to complete lifecycle actions (CompleteLifecycleAction), and to enable and disable scaling protection (SetInstanceProtection). Here again, I'd like each instance to only be able to make these calls against itself. Unfortunately, there is no condition you can use with the autoscaling API to similarly limit the calls as you can with the ec2 API. The same condition that works with the ec2 API doesn't do anything with the autoscaling API.
Not helping matters, the documentation is very spread out. Just for the use case I describe above, you need to read at least all of these:
https://docs.aws.amazon.com/service-authorization/latest/ref...
https://docs.aws.amazon.com/service-authorization/latest/ref...
https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-au...
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-poli...
https://docs.aws.amazon.com/autoscaling/ec2/userguide/contro...
Condition:
  StringLike:
    aws:userid: "*:${ec2:InstanceID}"
Documented (sorta) across four(!) pages: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_p...
https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_p...
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-poli...
https://docs.aws.amazon.com/service-authorization/latest/ref...
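For readers trying to picture how the condition quoted above fits into a whole statement, here is a sketch in Python. It assumes the condition is meant for the ec2:CreateTags self-tagging case described earlier in the thread. It relies on the fact that, for a role attached to an EC2 instance, aws:userid has the form role-id:instance-id. The `userid_matches` helper and the example IDs are hypothetical; it is a toy model of the StringLike comparison after variable substitution, not IAM's evaluator.

```python
import json
from fnmatch import fnmatchcase

# Statement letting an instance tag only itself, using the condition above.
SELF_TAG_STATEMENT = {
    "Effect": "Allow",
    "Action": "ec2:CreateTags",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {"StringLike": {"aws:userid": "*:${ec2:InstanceID}"}},
}


def userid_matches(caller_userid, target_instance_id):
    """Toy model: substitute the target's instance ID, then glob-match."""
    pattern = SELF_TAG_STATEMENT["Condition"]["StringLike"]["aws:userid"]
    pattern = pattern.replace("${ec2:InstanceID}", target_instance_id)
    return fnmatchcase(caller_userid, pattern)


# Made-up role ID and instance IDs.
caller = "AROAEXAMPLEROLEID:i-0123456789abcdef0"
print(userid_matches(caller, "i-0123456789abcdef0"))  # True: tagging itself
print(userid_matches(caller, "i-0fedcba9876543210"))  # False: another instance
print(json.dumps(SELF_TAG_STATEMENT, indent=2))
```

The leading `*:` skips over the role ID, so the same statement works for every instance that shares the role.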
More details: https://medium.com/@Bohr/introducing-awsviz-dev-simplifying-...
Welcome btw!
There's also https://www.awsiamdata.com/ which analyzes the AWS IAM data and also contains a changelog. Maybe this helps some people.
You can also query the data from your browser: https://tobilg.com/chat-with-a-duck#heading-explore-aws-iam-...
What are you using to rate severity in the security scan? Is there a full list of checks being performed?
I recommend having both resources in your documentation. If you're referencing another document, link to it. That way there's some context to the results as to why a policy author should change it. Just something to add to the improvement list.
Looks good overall! Thanks for sharing!
I am reluctant to just upload an IAM Policy when that could include bucket names, databases, account names, etc. Things that could expose more information than I would like (both security and product related)
It looks like the "The user is allowed to:" part is the most important one and should be on top. The graph visualization does not make any sense on my random JSON policy I have just pasted.
Very good and straightforward. Thanks a lot for providing the GitHub repo as well, as I would never tell them to paste policies on a random website.
Update your policy, wait for some period of time, hopefully test it.
Doesn't work?
Maybe you didn't wait long enough, maybe you're dumb. Who knows!? Not me.
Use multiple AWS accounts, and keep your policies simple. One account for DB, one for the backend servers, etc. Each environment (prod, staging, dev) also gets its own set of accounts.
This way, a misconfigured policy won't give admin access to everything.
For resources like databases, you don't need cross-account access if you're using internal DB authentication systems. For IAM-based DB authentication, you can simply write policies to trust the target accounts.
Occasionally, you'll need to create a cross-account trust (via AssumeRole), but it's not at all that frequent.
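A cross-account trust is mostly one small document on the role being assumed. Here is a minimal sketch in Python; the account ID is a placeholder and the function name is hypothetical.

```python
import json

# Placeholder ID for the account whose principals should be trusted.
BACKEND_ACCOUNT_ID = "111122223333"


def trust_policy_for(trusted_account_id):
    """Trust policy letting principals in another account assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Naming the account root delegates the decision to that
                # account: its own IAM policies must still allow
                # sts:AssumeRole on this role for specific users or roles.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(trust_policy_for(BACKEND_ACCOUNT_ID), indent=2))
```

The role's permissions policy (what it can do once assumed) is a separate document and can stay as simple as the rest of the account's policies.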
My personal wish is for AWS to allow account _names_ instead of ID numbers in policies.
What you should do instead is have one account per environment (as you said).
I'm honestly not sure if that's a great idea, but this might be a possible way to do one account per DB/backend/frontend in a somewhat sane way.
Generally I'd stick to accounts per environment; you'll be worrying about a lot more when you get to FAANG scale.
That said, I still think it's overkill and doesn't bring any real benefit? Sure you're reducing the blast radius of a security breach, but are the overcomplications worth it? Also, you now have to manage multiple accounts with multiple policies and users / roles, won't that extra complexity actually increase the attack surface?
That's how AWS works internally. A team can easily have several hundred accounts: one for each region, and for each env.
You absolutely need tools to manage them, and AWS is not great in this regard. IAM Identity Center is a good first step, but its usability sucks compared to the AWS internal tool (called "Isengard").
> You're going to have to expose your DB to the internet instead of having everything inside a single VPC.
There are several ways to NOT do this. The easiest one is to use IPv6 with your own block (you can get it from ARIN for around $100). Then split it into "public" and "private" subnets, and install a network ACL prohibiting external connections into the private subnet.
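The splitting step can be sketched with Python's standard ipaddress module. The block below is carved from the 2001:db8::/32 documentation range as a stand-in for a real allocation, and the choice of halving it into one public and one private range is an assumption for illustration. The network ACL itself is not shown.

```python
import ipaddress

# Stand-in for your own allocation (documentation prefix, not routable).
block = ipaddress.ip_network("2001:db8:1234::/48")

# Split the block into two /49 halves: first "public", second "private".
public_half, private_half = block.subnets(prefixlen_diff=1)


def is_private(addr):
    """True if `addr` falls inside the half reserved for private subnets."""
    return ipaddress.ip_address(addr) in private_half


# Individual /64 subnets are then taken from the matching half.
first_public_subnet = next(public_half.subnets(new_prefix=64))
first_private_subnet = next(private_half.subnets(new_prefix=64))

print(public_half)            # 2001:db8:1234::/49
print(private_half)           # 2001:db8:1234:8000::/49
print(first_private_subnet)   # 2001:db8:1234:8000::/64
print(is_private("2001:db8:1234:8000::10"))  # True
```

With a contiguous private range, the ACL rule blocking inbound external connections only needs to name one prefix.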
To be honest, I'd probably prefer a single account per environment. But managing the IAM for that would be much more work.
Or even without that, there are plenty of footguns. AWS has a good blog post: https://aws.amazon.com/blogs/security/protect-sensitive-data...