Open IP enrichment knowledge layer for CIDR, ASN, cloud, crawler, Tor, and VPN-adjacent network intelligence.
This repository is data-first: the main output is a set of machine-readable files
that can be pulled directly with curl, GitHub Actions, SIEM pipelines, WAF
tooling, anti-fraud systems, and internal enrichment jobs.
Most public IP repositories publish one narrow list: cloud IPs, Tor IPs, crawler IPs, or ASN mappings. IP Knowledge Layer combines multiple public and derived signals into one normalized enrichment layer.
The value is context:
CIDR or ASN -> layer -> provider -> service -> tags -> confidence -> source
Instead of only knowing that a prefix exists, consumers can understand whether it belongs to cloud hosting, CDN edge, GitHub infrastructure, AI crawlers, Tor, or a VPN-adjacent ASN signal.
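As a minimal Python sketch of that chain, the record below uses the field names that appear in the example JSONL records later in this README (`prefix` or `asn`, `layer`, `provider`, `service`, `tags`, `confidence`, `source_id`). The full schema may carry more fields. `KnowledgeRecord`, `parse_record`, and `context_chain` are hypothetical helpers, not part of this repository.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class KnowledgeRecord:
    """One ip-knowledge.jsonl record (field set assumed from the README examples)."""
    layer: str
    provider: str
    prefix: Optional[str] = None      # set on prefix records
    asn: Optional[int] = None         # set on asn-signal records
    service: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    confidence: float = 0.0
    source_id: Optional[str] = None


_KNOWN = {f.name for f in fields(KnowledgeRecord)}


def parse_record(line: str) -> KnowledgeRecord:
    """Parse one JSONL line; keys this sketch does not model (e.g. asn_name) are dropped."""
    raw = json.loads(line)
    return KnowledgeRecord(**{k: v for k, v in raw.items() if k in _KNOWN})


def context_chain(rec: KnowledgeRecord) -> str:
    """Render CIDR-or-ASN -> layer -> provider -> service -> tags -> confidence -> source."""
    key = rec.prefix if rec.prefix is not None else f"AS{rec.asn}"
    parts = [key, rec.layer, rec.provider, rec.service or "-",
             ",".join(rec.tags), f"{rec.confidence:.2f}", rec.source_id or "-"]
    return " -> ".join(parts)


if __name__ == "__main__":
    line = ('{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare",'
            '"service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,'
            '"source_id":"cloudflare-v4"}')
    print(context_chain(parse_record(line)))
    # prints: 104.16.0.0/13 -> hosting-cloud -> Cloudflare -> edge -> cdn,edge,proxy -> 0.99 -> cloudflare-v4
```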
| Metric | Value |
|---|---|
| Updated | 2026-05-25T05:08:54Z |
| Release | data-20260525-050854Z |
| Records | 111,502 |
| Prefix records | 111,502 |
| ASN signals | 0 |
| Sources | 11 |
| Collector errors | 1 |

| Layer | Records |
|---|---|
| hosting-cloud | 98,191 |
| anonymity | 11,489 |
| crawler-bot | 1,822 |

| Top Provider | Records |
|---|---|
| Azure | 73,422 |
| AWS | 15,867 |
| Tor | 11,489 |
| GitHub | 6,703 |
| Oracle Cloud | 1,078 |

```bash
# Replace main with another branch if needed.
BASE="https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current"

curl -fsSL "$BASE/summary.json"
curl -fsSL "$BASE/source-index.json"
curl -fsSL "$BASE/ip-knowledge.jsonl"
curl -fsSL "$BASE/ip-knowledge.csv"
curl -fsSL "$BASE/cloud-prefixes.csv"
curl -fsSL "$BASE/asn-signals.csv"
curl -fsSL "$BASE/cidr-tags.txt"
```

| Need | Use this file | Why |
|---|---|---|
| I want the full knowledge layer | `ip-knowledge.jsonl` | Best for pipelines, jq, streaming, and preserving nested fields |
| I want Excel/BI/SIEM-friendly data | `ip-knowledge.csv` | Same broad dataset in tabular form |
| I only need cloud/CDN/developer platform ranges | `cloud-prefixes.csv` | Smaller and focused on AWS, Azure, GCP, Cloudflare, Fastly, GitHub, Oracle |
| I need a quick CIDR-to-tags lookup | `cidr-tags.txt` | Lightweight text file: one CIDR plus comma-separated tags per line |
| I care about VPN-heavy/provider ASN signals | `asn-signals.csv` | ASN-level aggregate evidence, without publishing raw VPN IPs |
| I need to check source health and counts | `summary.json` | Current run status, layer counts, provider/source aggregates |
| I need source provenance | `source-index.json` | Source URLs, source types, and record counts |

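For the `cidr-tags.txt` row above, a lookup only needs the standard `ipaddress` module. This is a sketch under one assumption: the exact delimiter between the CIDR and its tag list is not documented here, so the parser splits on the first run of whitespace or commas (a CIDR never contains either). `load_cidr_tags` and `tags_for_ip` are hypothetical names.

```python
import ipaddress
import re
from typing import Iterable, List, Set, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_cidr_tags(lines: Iterable[str]) -> List[Tuple[Network, Set[str]]]:
    """Parse lines of the assumed form '<cidr> <tag>,<tag>,...'.

    ASSUMPTION: the CIDR is followed by whitespace or a comma, then the tags.
    """
    entries: List[Tuple[Network, Set[str]]] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        parts = re.split(r"[\s,]+", line, maxsplit=1)
        tags: Set[str] = set()
        if len(parts) > 1:
            tags = {t.strip() for t in parts[1].split(",") if t.strip()}
        entries.append((ipaddress.ip_network(parts[0], strict=False), tags))
    return entries


def tags_for_ip(ip: str, entries: List[Tuple[Network, Set[str]]]) -> Set[str]:
    """Union of tags from every listed CIDR that contains the address.

    Linear scan: fine for a sketch; a large feed would want a sorted or
    radix-tree index instead.
    """
    addr = ipaddress.ip_address(ip)
    found: Set[str] = set()
    for net, tags in entries:
        if addr in net:  # evaluates to False for IPv4/IPv6 mismatches
            found |= tags
    return found
```

An address may fall inside several listed CIDRs, which is why `tags_for_ip` returns the union of all matching tag sets rather than the first match.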

For most users:

- Start with `cloud-prefixes.csv` if you only need cloud/datacenter/CDN ranges.
- Start with `ip-knowledge.jsonl` if you want the full enrichment layer.
- Start with `cidr-tags.txt` if you want the simplest possible feed.
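The same raw URLs can be fetched with Python's standard library, for example inside an internal enrichment job. A sketch, assuming the repository layout shown in the curl examples; `RAW_ROOT`, `current_url`, `fetch_current`, and `iter_current_lines` are hypothetical helper names.

```python
import urllib.request
from typing import Iterator

# Same base as the BASE variable in the curl examples, split so the branch can change.
RAW_ROOT = "https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer"


def current_url(filename: str, branch: str = "main") -> str:
    """Raw URL for a file under data/current/ on the given branch."""
    return f"{RAW_ROOT}/{branch}/data/current/{filename}"


def fetch_current(filename: str, branch: str = "main", timeout: float = 60.0) -> bytes:
    """Download a whole file; HTTP errors raise urllib.error.HTTPError (like curl -f)."""
    with urllib.request.urlopen(current_url(filename, branch), timeout=timeout) as resp:
        return resp.read()


def iter_current_lines(filename: str, branch: str = "main", timeout: float = 60.0) -> Iterator[str]:
    """Stream a line-oriented file without holding all of it in memory."""
    with urllib.request.urlopen(current_url(filename, branch), timeout=timeout) as resp:
        for raw in resp:
            yield raw.decode("utf-8")
```

For example, `json.loads(fetch_current("summary.json"))` returns the build summary, while `iter_current_lines("ip-knowledge.jsonl")` avoids buffering the full ~49 MB file.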

| File | Purpose | Approx size |
|---|---|---|
| `data/current/summary.json` | Current build summary, counts, layer/provider/source aggregates | 8 KB |
| `data/current/source-index.json` | Source metadata, URLs, source types, record counts | 3 KB |
| `data/current/ip-knowledge.jsonl` | Full normalized knowledge layer, one JSON record per line | 49 MB |
| `data/current/ip-knowledge.csv` | Full normalized knowledge layer as CSV | 25 MB |
| `data/current/cloud-prefixes.csv` | Official cloud/CDN/developer-platform prefixes only | 22 MB |
| `data/current/asn-signals.csv` | ASN-level VPN-adjacent aggregate signals | 399 KB |
| `data/current/cidr-tags.txt` | Simple CIDR tags text file for lightweight consumers | 4.7 MB |
| `data/history/summary.csv` | Build history | small |
| `data/snapshots/*.json` | Compact summary snapshots, not full data copies | small |


The `hosting-cloud` layer covers official cloud, CDN, edge, and developer-platform IP ranges.
Current providers:
- AWS
- Azure
- Google Cloud
- Google public infrastructure
- Cloudflare
- Fastly
- GitHub
- Oracle Cloud

The `crawler-bot` layer contains crawler, AI bot, monitoring probe, scanner, SEO bot, and social preview ranges derived from CrawlerScope.

The `anonymity` layer contains Tor relay host routes derived from Tor-Radar.

The `asn-signal` layer contains ASN-level VPN-adjacent aggregate signals from provider analysis. It does not publish raw VPN IP lists; it only publishes aggregate provider-to-ASN evidence.

Official/public sources:

- AWS IP ranges: https://ip-ranges.amazonaws.com/ip-ranges.json
- Azure Service Tags: https://www.microsoft.com/en-us/download/details.aspx?id=56519
- Google Cloud ranges: https://www.gstatic.com/ipranges/cloud.json
- Google public ranges: https://www.gstatic.com/ipranges/goog.json
- Cloudflare ranges: https://www.cloudflare.com/ips-v4, https://www.cloudflare.com/ips-v6
- Fastly public IP list: https://api.fastly.com/public-ip-list
- GitHub Meta API: https://api.github.com/meta
- Oracle Cloud ranges: https://docs.oracle.com/en-us/iaas/tools/public_ip_ranges.json

Derived project sources:
- CrawlerScope: crawler, AI bot, monitoring, scanner, and SEO bot ranges
- Tor-Radar: Tor relay and exit IPs
- VPN provider ASN summary: aggregate ASN signals, no raw VPN IP feed

Example `hosting-cloud` JSONL record:

```json
{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare","service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,"source_id":"cloudflare-v4"}
```

Example `crawler-bot` JSONL record:

```json
{"prefix":"66.249.64.0/19","layer":"crawler-bot","provider":"Google","service":"Google common crawlers","tags":["bot","crawler","search"],"confidence":0.95,"source_id":"crawler-scope"}
```

Example `anonymity` JSONL record:

```json
{"prefix":"185.220.101.1/32","layer":"anonymity","provider":"Tor","service":"exit","tags":["anonymity-network","tor","tor-exit"],"confidence":0.98,"source_id":"tor-radar"}
```

Example `asn-signal` JSONL record:

```json
{"layer":"asn-signal","provider":"NordVPN","asn":9009,"asn_name":"M247","tags":["asn-signal","vpn-adjacent"],"confidence":0.7}
```

Get current build stats:

```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/summary.json | jq .
```

Download cloud prefixes:

```bash
curl -fsSLO https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv
```

Extract Cloudflare rows:

```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv \
  | awk -F, '$3 == "Cloudflare" { print }'
```

Extract Tor exits from JSONL:

```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
  | jq -r 'select(.layer=="anonymity" and .service=="exit") | .prefix'
```

Extract AI crawler prefixes:

```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
  | jq -r 'select(.layer=="crawler-bot" and (.tags | index("ai-crawler"))) | .prefix'
```

Use as a lightweight block/allow enrichment feed:

```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cidr-tags.txt \
  | grep 'cloud'
```

Find all ASN signals for a provider:

```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/asn-signals.csv \
  | awk -F, '$3 == "NordVPN" { print }'
```

Typical use cases:

- IP enrichment for fraud/risk systems
- WAF and SIEM context
- Cloud/datacenter detection
- CDN/edge infrastructure classification
- AI crawler and bot visibility
- Tor relay context
- ASN-level VPN-adjacent signals
- Source provenance for explainable decisions
- Building internal allowlists, denylists, and review queues

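The jq filters in the recipes above translate directly to Python when a pipeline step is already written in it. This sketch mirrors the Tor-exit and AI-crawler recipes over any iterable of JSONL lines (an open file or a streamed download). The helper names are hypothetical, and records without a `prefix` are skipped rather than printed as `null` the way `jq -r` would.

```python
import json
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def tor_exit_prefixes(lines: Iterable[str]) -> List[str]:
    """Python version of: select(.layer=="anonymity" and .service=="exit") | .prefix"""
    return [r["prefix"] for r in iter_records(lines)
            if r.get("layer") == "anonymity" and r.get("service") == "exit" and "prefix" in r]


def ai_crawler_prefixes(lines: Iterable[str]) -> List[str]:
    """Python version of the jq filter for crawler-bot records tagged "ai-crawler"."""
    return [r["prefix"] for r in iter_records(lines)
            if r.get("layer") == "crawler-bot"
            and "ai-crawler" in r.get("tags", [])
            and "prefix" in r]
```

Each call consumes its input, so open the file once per filter, e.g. `with open("ip-knowledge.jsonl", encoding="utf-8") as fh: exits = tor_exit_prefixes(fh)`.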
This project is not a malware or abuse blacklist. It provides operational network context with source provenance and confidence.

Run the collector locally:

```bash
python3 scripts/update.py
```

The collector prefers local sibling project outputs when present:

```text
../crawler-scope/data/current/crawlers.json
../tor-radar/data/current/network.json
../release/analysis/data/provider_asn.csv
```
When those files are not present, it pulls the public raw GitHub project outputs where possible.
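The local-first behavior described above can be sketched as a small loader. This is an illustration, not the code in `scripts/update.py`: `load_source` and `SIBLING_OUTPUTS` are hypothetical, and because the remote URLs for the sibling projects are not listed in this README, the caller passes them in.

```python
import urllib.request
from pathlib import Path
from typing import Optional


def load_source(local_path: str, remote_url: Optional[str] = None,
                timeout: float = 60.0) -> bytes:
    """Prefer a sibling project's local output; fall back to remote_url if it is missing."""
    path = Path(local_path)
    if path.is_file():
        return path.read_bytes()
    if remote_url is None:
        raise FileNotFoundError(f"{local_path} is missing and no remote URL was given")
    with urllib.request.urlopen(remote_url, timeout=timeout) as resp:
        return resp.read()


# Local paths taken from this README; remote URLs are left for the caller to supply.
SIBLING_OUTPUTS = [
    "../crawler-scope/data/current/crawlers.json",
    "../tor-radar/data/current/network.json",
    "../release/analysis/data/provider_asn.csv",
]
```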

The GitHub Actions workflow at `.github/workflows/ip-knowledge-layer.yml` runs every 6 hours and commits updated files under `data/`.

The workflow intentionally stores full data only in data/current/*. Historical
snapshots are compact summaries to avoid repository bloat.
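Because builds are scheduled every 6 hours, a consumer can flag data that has stopped refreshing. A sketch, assuming the build time is available as a string in the same format as the `Updated` value in the table at the top (for example `2026-05-25T05:08:54Z`). The `summary.json` key that holds it is not documented here, and the 12-hour default (two missed runs) is an arbitrary choice; `is_stale` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(updated: str, now: Optional[datetime] = None,
             max_age: timedelta = timedelta(hours=12)) -> bool:
    """True if a UTC build timestamp like '2026-05-25T05:08:54Z' is older than max_age.

    The 12-hour default (two missed 6-hour runs) is an assumption.
    """
    built = datetime.strptime(updated, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - built > max_age
```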
Planned additions inspired by projects such as ipverse:

- `asn-knowledge.csv`: ASN-level rollup with tags, cloud presence, Tor presence, crawler presence, VPN-adjacent evidence, and confidence.
- `asn-prefixes.csv.gz`: compressed bulk ASN-to-prefix layer, kept separate from `ip-knowledge.jsonl` to avoid making the main file too large.
- `provider-index.json`: normalized provider metadata and aliases.
- `overlap-summary.csv`: overlap between cloud/CDN, crawler, Tor, and VPN-adjacent ASN signals.
- `diff/current.json`: added/removed prefix summary between runs.

The intent is not to clone ipverse. The goal is to build a higher-level
knowledge layer with source provenance, tags, and confidence.
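`diff/current.json` is planned, not published, so its format is unknown. As a rough sketch of the idea only, an added/removed prefix summary between two runs reduces to a set difference over normalized networks; `prefix_diff` and its output shape are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable, List, Set, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def _normalize(prefixes: Iterable[str]) -> Set[Network]:
    """Parse prefixes so '2001:DB8::/32' and '2001:db8::/32' compare equal."""
    return {ipaddress.ip_network(p.strip(), strict=False) for p in prefixes if p.strip()}


def _order(net: Network) -> Tuple[int, Network]:
    """Sort IPv4 before IPv6, then by network address and prefix length."""
    return (net.version, net)


def prefix_diff(previous: Iterable[str], current: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical added/removed summary between two runs' prefix lists."""
    prev, curr = _normalize(previous), _normalize(current)
    return {
        "added": [str(n) for n in sorted(curr - prev, key=_order)],
        "removed": [str(n) for n in sorted(prev - curr, key=_order)],
    }
```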
- The project avoids full IPv4 expansion.
- The project avoids mass RDAP/whois lookups in GitHub Actions.
- `vpn-adjacent` signals are aggregate ASN-level indicators, not a raw VPN IP dump.
- Confidence is source-level confidence, not a claim that traffic from a network is malicious.
- Some official providers publish overlapping service rows for the same prefix. Those rows are preserved because service labels carry useful context.
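Because overlapping service rows are kept, a consumer that wants one entry per prefix can group them itself. A minimal sketch over parsed JSONL records (dicts), collecting the distinct `service` labels per `(provider, prefix)` pair. `services_by_prefix` is a hypothetical helper, and records without a `prefix` (such as ASN signals) are skipped.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple


def services_by_prefix(records: Iterable[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Map (provider, prefix) to the sorted distinct service labels seen for it."""
    grouped: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for rec in records:
        if "prefix" not in rec:
            continue  # ASN-level records have no prefix
        grouped[(rec.get("provider", ""), rec["prefix"])].add(rec.get("service") or "")
    return {key: sorted(labels) for key, labels in grouped.items()}
```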
CC0-1.0. See LICENSE.