The premise
IAM drifts. Every IAM posture at any non-trivial company is a partial record of the people who needed things in a hurry. The work of IAM hardening is not exotic; it is the work of saying no to the convenient past and writing down what should be true going forward.
This manual takes you through seven passes. Each pass is a concrete set of actions. Each ends with a verification step so you know the pass is done.
Pass 1: Inventory
You cannot fix what you have not listed.
AWS
- Export the full IAM inventory per account: aws iam list-users, list-roles, list-groups, list-policies. Script it across all accounts in the org.
- Enumerate service-linked roles separately; they follow AWS lifecycle, not yours.
- For every role, enumerate its trust policy (who can assume) and its permission policy (what it can do).
- For every access key, note the age and the last-used date (aws iam get-access-key-last-used).
GCP
- gcloud projects get-iam-policy per project, gcloud resource-manager folders get-iam-policy per folder, gcloud organizations get-iam-policy for the org.
- Enumerate service accounts with gcloud iam service-accounts list, and their keys with gcloud iam service-accounts keys list.
- Note which service accounts can impersonate which other service accounts.
Verification: You have a flat file or spreadsheet with every identity, every role, every policy, every key, every last-used timestamp. If you cannot produce this, pass 1 is not done.
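One way to get from raw exports to the flat file is sketched below. It assumes you have already saved the Roles array from aws iam list-roles as parsed JSON; the function names, the CSV columns, and the sample roles are illustrative, not part of any AWS tooling.

```python
import csv
import io


def as_list(value):
    """IAM JSON allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def trusted_principals(trust_policy):
    """Return every principal an Allow statement in a trust policy names."""
    principals = []
    for stmt in as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            principals.append("*")
            continue
        for kind, values in principal.items():
            principals.extend(f"{kind}:{v}" for v in as_list(values))
    return sorted(principals)


def role_rows(roles):
    """Yield one flat row per role, flagging service-linked roles by path."""
    for role in roles:
        yield {
            "role": role["RoleName"],
            "service_linked": role["Path"].startswith("/aws-service-role/"),
            "trusted_by": ";".join(trusted_principals(role["AssumeRolePolicyDocument"])),
        }


def to_csv(rows):
    """Render rows as CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["role", "service_linked", "trusted_by"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample shaped like the "Roles" array from `aws iam list-roles`.
SAMPLE_ROLES = [
    {
        "RoleName": "ci-deploy",
        "Path": "/",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
                "Action": "sts:AssumeRole",
            }],
        },
    },
    {
        "RoleName": "AWSServiceRoleForSupport",
        "Path": "/aws-service-role/support.amazonaws.com/",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "support.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }],
        },
    },
]

rows = list(role_rows(SAMPLE_ROLES))
print(to_csv(rows))
```

The same row shape extends to users, groups, and keys; the point is one row per identity with the trust relationship on the same line.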
Pass 2: Kill standing admin
Standing admin access is the single largest source of compounding IAM risk.
- Human admin: remove it. Grant via just-in-time tooling (AWS IAM Identity Center with permission sets that require manual elevation, GCP short-lived access-token workflow via gcloud auth application-default login + short-lived credentials, or a purpose-built tool like StrongDM, Teleport, or Aembit).
- Service admin: the AWS administrator role assumed by a CI job or automation should be replaced with a tightly scoped role that covers only what the job actually does.
- Root / organization owners: restrict to two named people, with hardware keys, recovery process documented, not used for any operational task.
Verification: aws iam list-attached-user-policies shows no human with AdministratorAccess attached directly, and aws iam list-attached-group-policies shows none inherited through a group. GCP: gcloud projects get-iam-policy shows no user members on roles/owner or roles/editor.
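The GCP half of this check can be sketched against the JSON that gcloud projects get-iam-policy --format=json prints. The function name and the sample policy are hypothetical. Note that the sketch only flags user: members; a group: member on a broad role can still hide humans and needs a separate look.

```python
BROAD_ROLES = {"roles/owner", "roles/editor"}


def human_admin_bindings(policy):
    """Return (role, member) pairs where a user: member holds a broad role."""
    findings = []
    for binding in policy.get("bindings", []):
        if binding["role"] not in BROAD_ROLES:
            continue
        for member in binding.get("members", []):
            if member.startswith("user:"):
                findings.append((binding["role"], member))
    return sorted(findings)


# Hypothetical policy shaped like get-iam-policy JSON output.
SAMPLE_POLICY = {
    "bindings": [
        {
            "role": "roles/owner",
            "members": [
                "user:alice@example.com",
                "serviceAccount:ci@example-project.iam.gserviceaccount.com",
            ],
        },
        {"role": "roles/viewer", "members": ["user:bob@example.com"]},
        {"role": "roles/editor", "members": ["group:eng@example.com"]},
    ],
}

findings = human_admin_bindings(SAMPLE_POLICY)
print(findings)
```

An empty list across every project is the pass condition.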
Pass 3: Least-privilege service accounts
Every non-human identity gets the minimum. The test is: if I remove this permission, does anything break? If nothing breaks, the permission was unnecessary.
AWS: IAM Access Analyzer
Turn it on per account. Use the unused access findings to scope policies down to actual usage, and use CloudTrail to confirm a permission has gone unused for 90 days before removing it.
# Enable IAM Access Analyzer org-wide (external access findings).
# Unused access findings need a second analyzer with
# type = "ORGANIZATION_UNUSED_ACCESS".
resource "aws_accessanalyzer_analyzer" "org" {
  analyzer_name = "nexcur-org-analyzer"
  type          = "ORGANIZATION"
  tags = {
    owner = "platform"
  }
}
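The 90-day rule reduces to a set comparison once usage is known. The sketch below assumes you have already reduced CloudTrail events to a map of IAM action to last-seen timestamp; that mapping is not always one-to-one with CloudTrail event names, so treat the result as a candidate list. All names here are illustrative.

```python
from datetime import datetime, timedelta, timezone


def removal_candidates(granted, last_used, now, window_days=90):
    """Return granted actions never seen, or last seen before the window."""
    cutoff = now - timedelta(days=window_days)
    return sorted(
        action
        for action in granted
        if last_used.get(action) is None or last_used[action] < cutoff
    )


# Hypothetical data: what the role is granted vs. what CloudTrail showed.
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)
GRANTED = {"s3:GetObject", "s3:PutObject", "s3:DeleteObject"}
LAST_USED = {
    "s3:GetObject": datetime(2024, 5, 30, tzinfo=timezone.utc),
    "s3:PutObject": datetime(2024, 1, 10, tzinfo=timezone.utc),
}

candidates = removal_candidates(GRANTED, LAST_USED, NOW)
print(candidates)
```

Passing now as a parameter keeps the check reproducible, which matters when the output feeds a review ticket.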
GCP: Policy Analyzer
Use gcloud policy-troubleshoot iam and gcloud asset search-all-iam-policies to find overbroad grants.
The tightening pattern
For each service account that has a broad primitive role like roles/editor:
- Identify the predefined role that is closest to what the service account actually does (roles/storage.objectAdmin instead of roles/editor).
- If no predefined role fits, write a custom role listing the exact permissions used.
- Deploy, monitor for 24 hours, remove the broad role.
Verification: No service account has roles/owner, roles/editor, or equivalent AWS primitives outside a documented exception list.
Pass 4: Secrets off env files
If any secret lives in a .env file, a repo config, or a shared document, move it.
- AWS: Secrets Manager for dynamic secrets, SSM Parameter Store for config-shaped values.
- GCP: Secret Manager.
- Developer / laptop secrets: 1Password Teams, with a documented rotation policy.
- CI/CD secrets: provider-specific vaults (GitHub Actions secrets, GitLab CI variables), ideally backed by the cloud secrets manager via OIDC.
Verification: grep -rE "(AWS|GCP|STRIPE|OPENAI|ANTHROPIC)_[A-Z_]*KEY=[A-Za-z0-9]" . in every repo returns only placeholder examples. git log scans via trufflehog or gitleaks are clean or have a triaged ignore list.
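The grep above still leaves you to eyeball which hits are placeholders. A minimal sketch of the same pattern with a placeholder filter follows. The regex is extended to capture the value, the placeholder list is a hypothetical starting point, and the function reports only the variable name so the report itself does not leak the secret.

```python
import re

# Same prefix pattern as the grep, with the value captured as group 2.
SECRET_RE = re.compile(
    r"(AWS|GCP|STRIPE|OPENAI|ANTHROPIC)_[A-Z_]*KEY=([A-Za-z0-9]\S*)"
)

# Hypothetical placeholder values; extend with whatever your repos use.
PLACEHOLDERS = {"changeme", "example", "your_key_here", "xxx"}


def scan_lines(lines):
    """Return (line_number, variable_name) for each non-placeholder hit."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = SECRET_RE.search(line)
        if match and match.group(2).lower() not in PLACEHOLDERS:
            hits.append((number, match.group(0).split("=")[0]))
    return hits


SAMPLE_LINES = [
    "AWS_SECRET_ACCESS_KEY=abc123DEF",
    "STRIPE_API_KEY=changeme",
    "OPENAI_API_KEY=",
    "# no secrets here",
    "export ANTHROPIC_API_KEY=notARealKey42",
]

hits = scan_lines(SAMPLE_LINES)
print(hits)
```

This complements trufflehog or gitleaks rather than replacing them: it covers the working tree only, not history.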
Pass 5: Access key retirement
Access keys are a liability. Replace them with federated access where possible.
AWS: OIDC federation
For CI/CD, use GitHub Actions OIDC (or equivalent for your provider) so every job assumes a short-lived role instead of holding a long-lived key.
# Trust policy for GitHub Actions OIDC role
data "aws_iam_policy_document" "gha_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:nexcurai/*:ref:refs/heads/main"]
    }
  }
}
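It helps to see which GitHub sub claims that pattern admits. The sketch below approximates IAM's StringLike operator, where * matches any run of characters and ? matches a single character; it is a local stand-in for reasoning about patterns, not AWS's evaluator. The sample claims are made up.

```python
import re


def string_like(pattern, value):
    """Approximate IAM StringLike: '*' is any run, '?' is one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


PATTERN = "repo:nexcurai/*:ref:refs/heads/main"

# Hypothetical sub claims a workflow run might present.
CLAIMS = [
    "repo:nexcurai/api:ref:refs/heads/main",
    "repo:nexcurai/api:ref:refs/heads/feature-x",
    "repo:nexcurai/api:pull_request",
    "repo:someone-else/api:ref:refs/heads/main",
]

allowed = [claim for claim in CLAIMS if string_like(PATTERN, claim)]
print(allowed)
```

Only runs from the main branch of a repository in the org pass; feature branches, pull requests, and other orgs do not. Widening the pattern to repo:nexcurai/* would admit all three of the in-org claims.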
For any remaining keys, enforce automatic rotation on a 90-day cadence. For human operators, replace long-lived keys entirely with short-lived tokens issued via SSO.
GCP: Workload Identity Federation
Use workload identity federation for any external CI/CD or third-party service reaching into GCP. Delete long-lived service account keys as soon as alternatives exist.
Verification: Count of IAM access keys with age over 90 days equals zero. Count of service account keys over 90 days equals zero.
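A sketch of the AWS half of that count, assuming the AccessKeyMetadata entries from aws iam list-access-keys have been collected for every user. The function names and sample keys are illustrative. Inactive keys are counted too, since the verification is about existence, not status.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO-8601 timestamp; accept a trailing 'Z' on older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_keys(access_keys, now, max_age_days=90):
    """Return (key_id, age_days) for every key older than max_age_days."""
    stale = []
    for key in access_keys:
        age = (now - parse_ts(key["CreateDate"])).days
        if age > max_age_days:
            stale.append((key["AccessKeyId"], age))
    return stale


# Hypothetical entries shaped like AccessKeyMetadata items.
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)
SAMPLE_KEYS = [
    {"UserName": "ci-deploy", "AccessKeyId": "AKIAEXAMPLE1",
     "Status": "Active", "CreateDate": "2024-01-03T00:00:00+00:00"},
    {"UserName": "alice", "AccessKeyId": "AKIAEXAMPLE2",
     "Status": "Active", "CreateDate": "2024-05-20T00:00:00Z"},
    {"UserName": "batch", "AccessKeyId": "AKIAEXAMPLE3",
     "Status": "Inactive", "CreateDate": "2024-03-03T00:00:00+00:00"},
]

stale = stale_keys(SAMPLE_KEYS, NOW)
print(len(stale), stale)
```

The pass is done when the printed count is zero for every account.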
Pass 6: Trust-path analysis
This is the pass that catches the clever, non-obvious privilege escalations.
A trust path is a chain: identity A can assume role B, role B has permission to update role C, role C has AdministratorAccess. Identity A effectively has admin. Nobody would have granted it that intentionally.
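The chain above is a reachability question on a directed graph, so a breadth-first search finds it. This is a minimal sketch: the edge map stands in for whatever your inventory produced ("can assume" and "can modify" edges alike), and the node names are placeholders. The same walk applies to impersonation edges between GCP service accounts.

```python
from collections import deque


def find_path(edges, start, admin_nodes):
    """Return the shortest path from start to any admin node, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in admin_nodes:
            return path
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical graph: A can assume B, B can update C, C is admin.
EDGES = {
    "identity-A": ["role-B"],
    "role-B": ["role-C"],
    "identity-D": ["role-E"],
}
ADMIN = {"role-C"}

path = find_path(EDGES, "identity-A", ADMIN)
print(path)
```

Run it from every identity in the inventory; each non-empty result is a finding until it is documented or broken.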
AWS: the PrivEsc review
- For every role, enumerate which identities and roles can sts:AssumeRole into it.
- For every role, enumerate which actions in its permission policy could lead to privilege escalation: iam:PassRole, iam:UpdateAssumeRolePolicy, iam:AttachRolePolicy, lambda:UpdateFunctionConfiguration, ec2:RunInstances paired with iam:PassRole, cloudformation:CreateStack paired with iam:PassRole, and the full list Rhino Security Labs documented.
- Cross the two. Any identity that can traverse a path to admin without explicit grant is a finding.
Automation helpers: PMapper, Cloudsplaining, AWS IAM Access Analyzer external access findings.
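For a feel of what those tools do in the second step, here is a deliberately small sketch that scans one permission policy document for a few of the actions named above. It handles wildcards in Action but ignores NotAction, Deny statements, and conditions, so it over-reports compared to a real evaluator. The action shortlist and the sample policy are illustrative.

```python
from fnmatch import fnmatchcase

# Shortlist from the text above; not the full escalation catalogue.
RISKY_ACTIONS = {
    "iam:PassRole",
    "iam:UpdateAssumeRolePolicy",
    "iam:AttachRolePolicy",
    "lambda:UpdateFunctionConfiguration",
}


def as_list(value):
    """IAM JSON allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def risky_grants(policy):
    """Return sorted (risky_action, resource_is_wildcard) pairs that are allowed."""
    findings = set()
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        wildcard_resource = "*" in as_list(stmt.get("Resource", []))
        for granted in as_list(stmt.get("Action", [])):
            for risky in RISKY_ACTIONS:
                # IAM action names are case-insensitive and may contain '*'.
                if fnmatchcase(risky.lower(), granted.lower()):
                    findings.add((risky, wildcard_resource))
    return sorted(findings)


# Hypothetical permission policy.
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["iam:PassRole", "s3:GetObject"],
         "Resource": "*"},
        {"Effect": "Allow",
         "Action": "lambda:*",
         "Resource": "arn:aws:lambda:us-east-1:111122223333:function:deploy"},
    ],
}

grants = risky_grants(SAMPLE_POLICY)
print(grants)
```

A pair with the wildcard flag set, such as iam:PassRole on every resource, is the case to fix first.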
GCP: the impersonation graph
Service accounts that can be impersonated by other identities form a directed graph. Walk it. If a low-trust identity can reach a high-trust identity through a chain of impersonations, document it or break the chain.
Verification: Trust-path map exists. Any paths to admin-equivalent permissions are explicitly documented with rationale, or broken.
Pass 7: Drift detection
IAM will drift again. The question is how fast you notice.
- Configuration drift: AWS Config or GCP Asset Inventory with rules alerting on changes to IAM roles, policy attachments, trust policies, and service account creation.
- Usage drift: CloudTrail (AWS) or Cloud Audit Logs (GCP) flowing to a searchable sink. Alert on unusual patterns: new identity created outside Terraform, role attached to a sensitive resource by a human, service account key created.
- Terraform drift: daily terraform plan in CI against production; any change that is not in code shows up as a drift finding.
Verification: Drift detection alerts reach a human inside 24 hours. Drift incidents over the last 90 days have been triaged and either reverted or formalized in code.
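The Terraform check can be turned into an alert by parsing the machine-readable plan. The sketch below assumes the CI job has run terraform plan -out=plan.bin followed by terraform show -json plan.bin and hands the JSON text to this function; the function name, the type filter, and the sample plan are illustrative. Resources created entirely outside Terraform never appear in a plan, so this complements the configuration and usage alerts rather than replacing them.

```python
import json


def drift_findings(plan_json, type_prefixes=("aws_iam_",)):
    """Return (address, actions) for planned changes to matching resource types."""
    plan = json.loads(plan_json)
    findings = []
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue
        if not change["type"].startswith(tuple(type_prefixes)):
            continue
        findings.append((change["address"], "+".join(actions)))
    return findings


# Hypothetical plan: someone edited a role and detached a policy by hand.
SAMPLE_PLAN = json.dumps({
    "format_version": "1.2",
    "resource_changes": [
        {"address": "aws_iam_role.ci", "type": "aws_iam_role",
         "change": {"actions": ["update"]}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"actions": ["no-op"]}},
        {"address": "aws_iam_role_policy_attachment.ci_scoped",
         "type": "aws_iam_role_policy_attachment",
         "change": {"actions": ["create"]}},
    ],
})

findings = drift_findings(SAMPLE_PLAN)
print(findings)
```

Any non-empty result on a day with no merged change is drift and should page or ticket within the 24-hour target.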
What this looks like in production
- Zero standing human admin.
- Zero long-lived access keys over 90 days old.
- Every non-human identity scoped to a custom role or an appropriate predefined role; no broad primitives outside a documented exception list.
- Secrets in secrets manager, never in code or env files.
- Trust-path map maintained; privilege escalations either broken or documented.
- Drift detection live, alerting within 24 hours.
Common mistakes
- Tightening without measuring usage first. You will break CI. Use CloudTrail / Audit Logs to baseline before removing permissions.
- Adding iam:PassRole with Resource: *. This is a common PrivEsc vector. Always scope iam:PassRole to the specific roles the caller needs to pass.
- Trust policies with no condition. A trust policy that allows any account to assume is a critical finding. Always scope with aws:SourceAccount, aws:PrincipalOrgID, external ID, or the OIDC sub claim.
- Using primitive GCP roles (owner, editor, viewer) on service accounts. Always replace with predefined or custom roles.
- Not automating rotation. Manual rotation does not happen. Automate or accept the compound risk.
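The trust-policy mistake is easy to check mechanically. The sketch below flags Allow statements that have no Condition block and either a wildcard principal or a federated principal; it does not judge whether an existing condition is strong enough. The function name, the reason strings, and the sample policy are illustrative.

```python
def as_list(value):
    """IAM JSON allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def open_trust_statements(trust_policy):
    """Return (statement_index, reason) for unconditioned, overly open trust."""
    findings = []
    statements = as_list(trust_policy.get("Statement", []))
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow" or stmt.get("Condition"):
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            findings.append((index, "wildcard principal"))
            continue
        if "*" in as_list(principal.get("AWS", [])):
            findings.append((index, "wildcard principal"))
        elif "Federated" in principal:
            findings.append((index, "federated principal without claim check"))
    return findings


# Hypothetical trust policy with one good and two bad statements.
SAMPLE_TRUST = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"AWS": "*"},
         "Action": "sts:AssumeRole"},
        {"Effect": "Allow",
         "Principal": {"Federated": "arn:aws:iam::111122223333:oidc-provider/token.actions.githubusercontent.com"},
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {"StringLike": {"token.actions.githubusercontent.com:sub": "repo:nexcurai/*:ref:refs/heads/main"}}},
        {"Effect": "Allow",
         "Principal": {"Federated": "arn:aws:iam::111122223333:oidc-provider/token.actions.githubusercontent.com"},
         "Action": "sts:AssumeRoleWithWebIdentity"},
    ],
}

open_statements = open_trust_statements(SAMPLE_TRUST)
print(open_statements)
```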