Curious what the pitfalls of this approach might be.
Any suggestions on how to go about it?
Simply adding all cloud-based companies' IPs to a blacklist would be a good start. Do any half-decent lists already exist, before I create my own?
Google:
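# Follow each include: entry in the top-level TXT record, then collect the ip4: netblocks from each one
for line in $(dig +short txt _cloud-netblocks.googleusercontent.com | tr " " "\n" | grep include | cut -f 2 -d :)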
do
dig +short txt "${line}"
done | tr " " "\n" | grep ip4 | cut -f 2 -d : | sort -n | uniq | xz -9ecv > ./_GOOGLE.netset.xz
Amazon:
curl --url "https://ip-ranges.amazonaws.com/ip-ranges.json" -o ./aws.json
grep ip_prefix ./aws.json | awk -F "\"" '{print $4}' | sort -n | uniq | xz -9ecv > ./_AWS.netset.xz
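Note that the grep above only picks up IPv4 entries. The same file lists IPv6 ranges under an "ipv6_prefix" key, so a rough sketch covering both could look like this. It assumes the JSON stays pretty-printed with one key per line, and aws_prefixes is just a name made up for illustration:

```shell
# Hypothetical helper: print the unique prefixes stored under one JSON key.
# $1: key name (ip_prefix or ipv6_prefix)
# $2: path to a downloaded copy of ip-ranges.json
# Assumes one "key": "value" pair per line, as in the pretty-printed file.
aws_prefixes() {
    grep "\"$1\"" "$2" | awk -F '"' '{print $4}' | sort -u
}
```

Usage would be something like `aws_prefixes ipv6_prefix ./aws.json > ./_AWS6.netset`. If the file layout ever changes, a real JSON parser would be the safer choice.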
I don't have one for Azure handy at this time. Skip the xz compression step if you just want plain text.
If they ever remove these services, you can also look up all the CIDR blocks using sites like this one [1]. Put in a name or IP to start with, then click on the AS number link, then click on "Prefixes v4" and "Prefixes v6".
[1] - https://bgp.he.net/
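Once you have a plain-text list, checking whether a given address falls inside it can be done in plain shell. A minimal sketch, IPv4 only, assuming one CIDR per line and a shell with 64-bit arithmetic; ip_to_int, in_cidr and ip_in_netset are made-up helper names:

```shell
# Convert a dotted-quad IPv4 address to an integer.
# The ( ) body runs in a subshell, so the IFS change stays local.
ip_to_int() (
    set -f          # no globbing while splitting
    IFS=.
    set -- $1       # split the address on dots into $1..$4
    echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
)

# Succeed if address $1 lies inside CIDR block $2 (a bare address counts as /32).
in_cidr() (
    net=${2%/*}
    case $2 in
        */*) bits=${2#*/} ;;
        *)   bits=32 ;;
    esac
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Succeed if address $1 matches any line of the plain-text netset file $2.
ip_in_netset() {
    while read -r cidr; do
        [ -n "$cidr" ] && in_cidr "$1" "$cidr" && return 0
    done < "$2"
    return 1
}
```

For example, after decompressing a list, `ip_in_netset 203.0.113.9 ./_AWS.netset && echo "cloud address"`. This is a linear scan that forks per line, so it is only suitable for occasional lookups, not for per-request checks.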
MaxMind has a free database as part of GeoLite2 [1], but you can also put together your own from IP assignments, BGP data, or similar sources.
Most larger clouds publish their IPs as well.
Pitfalls are that you do need to update your database frequently, and it is difficult to validate changes. You're likely to get some real people who are using a VPN or something in cloud ranges, and some abuse/automation that is using residential ISPs, so it's not perfect, but it may help somewhat.
[1] https://dev.maxmind.com/geoip/geolite2-free-geolocation-data
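On the update and validation pitfall mentioned above: one cheap sanity check is to diff each fresh download against the previous one before swapping it in, and eyeball (or threshold) what changed. A rough sketch, assuming uncompressed one-CIDR-per-line files; netset_diff is a made-up name:

```shell
# Hypothetical helper: show which CIDR blocks were added or removed between two lists.
# $1: previous list, $2: freshly downloaded list (plain text, one CIDR per line)
# Prints "+ <cidr>" for additions and "- <cidr>" for removals.
netset_diff() (
    export LC_ALL=C                     # sort and comm must agree on ordering
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'   # only in the new list
    comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'   # only in the old list
    rm -f "$old_sorted" "$new_sorted"
)
```

Something like `netset_diff ./aws.old ./aws.new | wc -l` gives a quick number to alert on if a single update suddenly changes far more blocks than usual.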
I’d recommend tackling the issue from another angle if possible.