The 46-Minute Problem
The IAMTrail detection engine fetches ~1,500 AWS managed IAM policies every run. The original approach was pure bash:
Looks fine, right? Except each iteration spawns a full AWS CLI process. That means a fresh Python runtime, a botocore import, credential resolution, a single HTTP call, then exit. Times 1,500.
Even with -P 16 parallelism on a 0.5 vCPU / 1 GiB Fargate Spot task, this took 46 minutes.
The Fix: One Process to Rule Them All
The replacement is embarrassingly simple - a ~60-line Python script using boto3 with ThreadPoolExecutor:
One process, one boto3 session, one connection pool, 32 lightweight threads. No more spawning 1,500 Python runtimes.
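To make the pattern concrete, here is a minimal sketch of a single-process, threaded fetch. It is not the actual IAMTrail script: the function names and the returned dict shape are assumptions made for illustration. Only the boto3 calls themselves (the list_policies paginator and get_policy_version) are real API.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 32  # thread count mentioned in the post


def make_client(max_workers=MAX_WORKERS):
    """Create one shared IAM client (hypothetical helper; boto3 imported lazily)."""
    import boto3
    from botocore.config import Config

    # Size the HTTP connection pool to match the number of threads.
    return boto3.client("iam", config=Config(max_pool_connections=max_workers))


def list_managed_policies(iam):
    """Return every AWS managed policy summary, following pagination."""
    paginator = iam.get_paginator("list_policies")
    policies = []
    for page in paginator.paginate(Scope="AWS"):
        policies.extend(page["Policies"])
    return policies


def fetch_document(iam, policy):
    """Fetch the default version document of a single policy."""
    response = iam.get_policy_version(
        PolicyArn=policy["Arn"], VersionId=policy["DefaultVersionId"]
    )
    return policy["PolicyName"], response["PolicyVersion"]["Document"]


def fetch_all(iam, policies, max_workers=MAX_WORKERS):
    """Fetch all documents concurrently through the one shared client."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda p: fetch_document(iam, p), policies))
```

With real credentials this would be driven by something like `iam = make_client()` followed by `fetch_all(iam, list_managed_policies(iam))`. The work is almost entirely network-bound, so plain threads sharing one client are enough; nothing here needs multiple processes.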
The Tricky Part: Format Compatibility
The dangerous part of this migration is not the code - it’s the output. IAMTrail stores policies as JSON files in a git repository. If the Python serialization differs by even one byte from the bash version, git sees it as a “change” and we get a massive false-positive commit affecting all 1,500+ policies.
I actually hit this once already during the debugging session. The original bash pipeline used jq -S (sort keys alphabetically), but the existing files in the repo had the natural API key order. That single -S flag turned every policy file into a “change”.
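The effect is easy to reproduce. The sketch below uses Python's json.dumps with sort_keys=True as a stand-in for jq -S, applied to a made-up policy document (not one from the repo). The two outputs carry the same data but are different byte sequences, which is all git cares about.

```python
import json

# Hypothetical policy document, in the key order an API response might use.
document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
    ],
}

natural = json.dumps(document, indent=2)                    # keeps insertion order
sorted_keys = json.dumps(document, indent=2, sort_keys=True)  # analogous to jq -S

print(natural == sorted_keys)                          # False: bytes differ
print(json.loads(natural) == json.loads(sorted_keys))  # True: same data
```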
For the Python migration, I verified byte-level compatibility:
Tested against all 1,547 files in the repo: 1,527 were byte-identical, and the remaining 20 were pre-existing legacy artifacts from 2018.
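One way to run this kind of check without touching AWS is a round-trip test: parse each stored file, re-serialize it with the new code, and compare bytes. This is a sketch of that idea, not necessarily the exact check I ran. The serializer settings (2-space indent, no key sorting, raw non-ASCII, trailing newline) are assumptions about the repo format, and the directory layout is hypothetical.

```python
import json
from pathlib import Path


def serialize(document):
    """Hypothetical serializer; every setting here is an assumed format detail."""
    text = json.dumps(document, indent=2, ensure_ascii=False)
    return (text + "\n").encode("utf-8")


def find_format_mismatches(policies_dir):
    """Return names of files whose bytes change after a parse/serialize round trip."""
    mismatches = []
    files = sorted(p for p in Path(policies_dir).iterdir() if p.is_file())
    for path in files:
        stored = path.read_bytes()
        # json.loads keeps key order, so any byte difference is purely formatting.
        document = json.loads(stored)
        if serialize(document) != stored:
            mismatches.append(path.name)
    return mismatches
```

Because the content is read from the file itself, a mismatch can only come from formatting (indentation, key order, escaping, trailing newline), never from an actual policy change.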
Results
| Metric | Before (bash) | After (Python) |
|---|---|---|
| Duration | 46 min | 2 min 30s |
| Processes spawned | 1,500+ | 1 |
| Fargate config | 0.5 vCPU / 1 GiB | 0.25 vCPU / 0.5 GiB |
| Format diff | - | 0 (byte-identical, excluding the 20 legacy files) |
With a 2.5-minute run, I could now justify switching from every-4-hours to hourly scans: policy changes are detected faster, on the same infrastructure, with a smaller task.
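The arithmetic behind that decision is worth spelling out. This quick sketch uses only the durations from the table and the two schedules mentioned above, and assumes a 365-day year:

```python
MINUTES_BEFORE = 46.0  # bash run duration, from the table
MINUTES_AFTER = 2.5    # Python run duration, from the table

runs_before = 365 * 24 // 4  # one run every 4 hours
runs_after = 365 * 24        # one run every hour

speedup = MINUTES_BEFORE / MINUTES_AFTER
task_minutes_before = runs_before * MINUTES_BEFORE
task_minutes_after = runs_after * MINUTES_AFTER

print(f"speedup per run: {speedup:.1f}x")                       # 18.4x
print(f"task-minutes/year before: {task_minutes_before:,.0f}")  # 100,740
print(f"task-minutes/year after:  {task_minutes_after:,.0f}")   # 21,900
```

Running four times as often still uses under a quarter of the task-minutes per year, and each of those minutes is billed on a task half the size.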
The Cost “Savings”
Now let’s talk about the part that makes me smile.
The annual Fargate Spot cost went from $10.18/year to $1.11/year (on the hourly schedule).
Savings: $9.07/year.
To achieve this, I spent an evening pair-programming with Claude in Cursor - burning through what I can only assume is a very respectable amount of LLM tokens. Between the debugging session, the format analysis, the benchmarking, the false-positive investigation, the Terraform changes, the Docker rebuild, the failed deploys - I am fairly confident the API bill for this session alone exceeds the lifetime Fargate savings of this optimization.
Peak cloud economics.
Takeaways
- Spawning CLI processes in a loop is expensive - not in dollars, but in time. A single boto3 session with threads beats 1,500 CLI invocations by roughly 18x (46 min down to 2.5 min), on a task half the size.
- Byte-level format testing matters - when your output is tracked by git, a single extra space turns every one of 1,500+ files into a false-positive diff.
- FARGATE_SPOT is absurdly cheap - the real optimization currency is time, not money.
The full change is on GitHub. IAMTrail now scans every hour at iamtrail.com.