IP Knowledge Layer

Open IP enrichment knowledge layer for CIDR, ASN, cloud, crawler, Tor, and VPN-adjacent network intelligence.

This repository is data-first: the main output is a set of machine-readable files that can be fetched directly with curl or GitHub Actions and fed into SIEM pipelines, WAF tooling, anti-fraud systems, and internal enrichment jobs.

Why This Exists

Most public IP repositories publish one narrow list: cloud IPs, Tor IPs, crawler IPs, or ASN mappings. IP Knowledge Layer combines multiple public and derived signals into one normalized enrichment layer.

The value is context:

CIDR or ASN -> layer -> provider -> service -> tags -> confidence -> source

Instead of only knowing that a prefix exists, consumers can understand whether it belongs to cloud hosting, CDN edge, GitHub infrastructure, AI crawlers, Tor, or a VPN-adjacent ASN signal.
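
As a rough illustration of that context chain, the sketch below matches one IP address against a handful of records and returns the most specific prefix first. It is not part of this repository: the `enrich` function is a hypothetical helper, the sample records are copied from the Record Shape examples in this README, and the linear scan is chosen only for brevity (a prefix tree would likely suit the full dataset better).

```python
import ipaddress

# Sample records copied from the Record Shape examples in this README.
SAMPLE_RECORDS = [
    {"prefix": "104.16.0.0/13", "layer": "hosting-cloud", "provider": "Cloudflare",
     "service": "edge", "tags": ["cdn", "edge", "proxy"], "confidence": 0.99,
     "source_id": "cloudflare-v4"},
    {"prefix": "185.220.101.1/32", "layer": "anonymity", "provider": "Tor",
     "service": "exit", "tags": ["anonymity-network", "tor", "tor-exit"],
     "confidence": 0.98, "source_id": "tor-radar"},
    # ASN-level signals carry no "prefix" field.
    {"layer": "asn-signal", "provider": "NordVPN", "asn": 9009, "asn_name": "M247",
     "tags": ["asn-signal", "vpn-adjacent"], "confidence": 0.7},
]


def enrich(ip, records):
    """Hypothetical helper: return records whose prefix contains ip, longest prefix first."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for record in records:
        prefix = record.get("prefix")
        if prefix is None:
            continue  # skip ASN-level records, which cannot be matched by address
        network = ipaddress.ip_network(prefix, strict=False)
        if addr in network:  # False when IP versions differ
            matches.append((network.prefixlen, record))
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in matches]


for hit in enrich("185.220.101.1", SAMPLE_RECORDS):
    # Walk the chain: prefix -> layer -> provider -> service -> tags -> confidence -> source
    print(hit["prefix"], hit["layer"], hit["provider"], hit["service"],
          hit["tags"], hit["confidence"], hit["source_id"])
```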

Current Release

| Metric | Value |
| --- | --- |
| Updated | `2026-05-25T05:08:54Z` |
| Release | data-20260525-050854Z |
| Records | 111,502 |
| Prefix records | 111,502 |
| ASN signals | 0 |
| Sources | 11 |
| Collector errors | 1 |

| Layer | Records |
| --- | --- |
| `hosting-cloud` | 98,191 |
| `anonymity` | 11,489 |
| `crawler-bot` | 1,822 |

| Top Provider | Records |
| --- | --- |
| Azure | 73,422 |
| AWS | 15,867 |
| Tor | 11,489 |
| GitHub | 6,703 |
| Oracle Cloud | 1,078 |

Download URLs

Replace main in the BASE URL with another branch name if needed.

BASE="https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current"

curl -fsSL "$BASE/summary.json"
curl -fsSL "$BASE/source-index.json"
curl -fsSL "$BASE/ip-knowledge.jsonl"
curl -fsSL "$BASE/ip-knowledge.csv"
curl -fsSL "$BASE/cloud-prefixes.csv"
curl -fsSL "$BASE/asn-signals.csv"
curl -fsSL "$BASE/cidr-tags.txt"

Which File Should I Use?

| Need | Use this file | Why |
| --- | --- | --- |
| I want the full knowledge layer | `ip-knowledge.jsonl` | Best for pipelines, `jq`, streaming, and preserving nested fields |
| I want Excel/BI/SIEM-friendly data | `ip-knowledge.csv` | Same broad dataset in tabular form |
| I only need cloud/CDN/developer platform ranges | `cloud-prefixes.csv` | Smaller and focused on AWS, Azure, GCP, Cloudflare, Fastly, GitHub, Oracle |
| I need quick CIDR-to-tags lookup | `cidr-tags.txt` | Lightweight text file: one CIDR plus comma-separated tags per line |
| I care about VPN-heavy/provider ASN signals | `asn-signals.csv` | ASN-level aggregate evidence, without raw VPN IP publication |
| I need to check source health and counts | `summary.json` | Current run status, layer counts, provider/source aggregates |
| I need source provenance | `source-index.json` | Source URLs, source types, and record counts |

For most users:

Start with cloud-prefixes.csv if you only need cloud/datacenter/CDN ranges.
Start with ip-knowledge.jsonl if you want the full enrichment layer.
Start with cidr-tags.txt if you want the simplest possible feed.
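
For the simplest feed, a line of cidr-tags.txt can be parsed with a few lines of Python. This is a minimal sketch under assumptions: the README only says each line holds one CIDR plus comma-separated tags, so the whitespace separator between the two parts, the skipping of blank and `#` lines, and the `parse_cidr_tags_line` name are all illustrative guesses rather than a documented format.

```python
import ipaddress


def parse_cidr_tags_line(line):
    """Hypothetical parser for one 'CIDR tag1,tag2' line.

    Assumes the CIDR and the tag list are separated by whitespace.
    Returns (network, tags), or None for blank or comment lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # assumption: blank and '#' lines carry no data
    parts = line.split(None, 1)  # split once on any run of whitespace
    network = ipaddress.ip_network(parts[0], strict=False)
    tags = []
    if len(parts) > 1:
        tags = [tag.strip() for tag in parts[1].split(",") if tag.strip()]
    return network, tags


parsed = parse_cidr_tags_line("104.16.0.0/13 cdn,edge,proxy")
print(parsed)
```

Parsing the CIDR with `ipaddress.ip_network` also validates it, so a malformed line raises `ValueError` instead of passing through silently.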

Files

| File | Purpose | Approx size |
| --- | --- | --- |
| `data/current/summary.json` | Current build summary, counts, layer/provider/source aggregates | 8 KB |
| `data/current/source-index.json` | Source metadata, URLs, source types, record counts | 3 KB |
| `data/current/ip-knowledge.jsonl` | Full normalized knowledge layer, one JSON record per line | 49 MB |
| `data/current/ip-knowledge.csv` | Full normalized knowledge layer as CSV | 25 MB |
| `data/current/cloud-prefixes.csv` | Official cloud/CDN/developer-platform prefixes only | 22 MB |
| `data/current/asn-signals.csv` | ASN-level VPN-adjacent aggregate signals | 399 KB |
| `data/current/cidr-tags.txt` | Simple `CIDR tags` text file for lightweight consumers | 4.7 MB |
| `data/history/summary.csv` | Build history | small |
| `data/snapshots/*.json` | Compact summary snapshots, not full data copies | small |

Layers

`hosting-cloud`

Official cloud, CDN, edge, and developer-platform IP ranges.

Current providers:

AWS
Azure
Google Cloud
Google public infrastructure
Cloudflare
Fastly
GitHub
Oracle Cloud

`crawler-bot`

Crawler, AI bot, monitoring probe, scanner, SEO bot, and social preview ranges derived from CrawlerScope.

`anonymity`

Tor relay host routes derived from Tor-Radar.

`asn-signal`

ASN-level VPN-adjacent aggregate signals from provider analysis. This layer does not publish raw VPN IP lists. It only publishes aggregate provider-to-ASN evidence.

Source Inventory

Official/public sources:

AWS IP ranges: https://ip-ranges.amazonaws.com/ip-ranges.json
Azure Service Tags: https://www.microsoft.com/en-us/download/details.aspx?id=56519
Google Cloud ranges: https://www.gstatic.com/ipranges/cloud.json
Google public ranges: https://www.gstatic.com/ipranges/goog.json
Cloudflare ranges: https://www.cloudflare.com/ips-v4, https://www.cloudflare.com/ips-v6
Fastly public IP list: https://api.fastly.com/public-ip-list
GitHub Meta API: https://api.github.com/meta
Oracle Cloud ranges: https://docs.oracle.com/en-us/iaas/tools/public_ip_ranges.json

Derived project sources:

CrawlerScope: crawler, AI bot, monitoring, scanner, and SEO bot ranges
Tor-Radar: Tor relay and exit IPs
VPN provider ASN summary: aggregate ASN signals, no raw VPN IP feed

Record Shape

Example hosting-cloud JSONL record:

{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare","service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,"source_id":"cloudflare-v4"}

Example crawler-bot JSONL record:

{"prefix":"66.249.64.0/19","layer":"crawler-bot","provider":"Google","service":"Google common crawlers","tags":["bot","crawler","search"],"confidence":0.95,"source_id":"crawler-scope"}

Example anonymity JSONL record:

{"prefix":"185.220.101.1/32","layer":"anonymity","provider":"Tor","service":"exit","tags":["anonymity-network","tor","tor-exit"],"confidence":0.98,"source_id":"tor-radar"}

Example asn-signal JSONL record:

{"layer":"asn-signal","provider":"NordVPN","asn":9009,"asn_name":"M247","tags":["asn-signal","vpn-adjacent"],"confidence":0.7}
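
The examples above show that prefix records and asn-signal records share some fields but not all of them. One way to handle both in a consumer is a single record type with optional fields. The sketch below is an assumption about how a consumer might model the data: the `KnowledgeRecord` class and `parse_record` function are hypothetical names, and only the fields visible in the examples above are covered.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnowledgeRecord:
    """Hypothetical consumer-side model of one JSONL record."""
    layer: str
    provider: str
    confidence: float
    tags: List[str] = field(default_factory=list)
    prefix: Optional[str] = None      # absent on asn-signal records
    service: Optional[str] = None     # absent on asn-signal records
    source_id: Optional[str] = None   # absent on the asn-signal example
    asn: Optional[int] = None         # present only on asn-signal records
    asn_name: Optional[str] = None    # present only on asn-signal records


def parse_record(line):
    """Parse one JSONL line; fields missing from the line stay None."""
    raw = json.loads(line)
    return KnowledgeRecord(
        layer=raw["layer"],
        provider=raw["provider"],
        confidence=float(raw["confidence"]),
        tags=list(raw.get("tags", [])),
        prefix=raw.get("prefix"),
        service=raw.get("service"),
        source_id=raw.get("source_id"),
        asn=raw.get("asn"),
        asn_name=raw.get("asn_name"),
    )


cloud = parse_record('{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare","service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,"source_id":"cloudflare-v4"}')
signal = parse_record('{"layer":"asn-signal","provider":"NordVPN","asn":9009,"asn_name":"M247","tags":["asn-signal","vpn-adjacent"],"confidence":0.7}')
print(cloud.prefix, signal.asn)
```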

Usage Examples

Get current build stats:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/summary.json | jq .

Download cloud prefixes:

curl -fsSLO https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv

Extract Cloudflare rows:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv \
  | awk -F, '$3 == "Cloudflare" { print }'

Extract Tor exits from JSONL:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
  | jq -r 'select(.layer=="anonymity" and .service=="exit") | .prefix'

Extract AI crawler prefixes:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
  | jq -r 'select(.layer=="crawler-bot" and (.tags | index("ai-crawler"))) | .prefix'

Use as a lightweight block/allow enrichment feed:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cidr-tags.txt \
  | grep 'cloud'

Find all ASN signals for a provider:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/asn-signals.csv \
  | awk -F, '$3 == "NordVPN" { print }'
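
The awk filters above select by column position and split on every comma, which can misfire if a field is quoted and contains a comma. A hedged alternative is to filter by header name with Python's `csv` module. The sketch assumes the CSV files start with a header row containing a `provider` column, which this README does not confirm. The `rows_for_provider` helper and the sample rows, including the 192.0.2.0/24 documentation prefix, are made up for illustration.

```python
import csv
import io

# Hypothetical sample; the real column names and order may differ.
SAMPLE_CSV = (
    "prefix,layer,provider,service\n"
    "104.16.0.0/13,hosting-cloud,Cloudflare,edge\n"
    '192.0.2.0/24,hosting-cloud,AWS,"EC2, example"\n'
)


def rows_for_provider(csv_text, provider, column="provider"):
    """Return rows whose `column` equals `provider`, selected by header name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(column) == provider]


for row in rows_for_provider(SAMPLE_CSV, "AWS"):
    # The quoted service field keeps its embedded comma intact.
    print(row["prefix"], row["service"])
```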

What It Can Help With

IP enrichment for fraud/risk systems
WAF and SIEM context
Cloud/datacenter detection
CDN/edge infrastructure classification
AI crawler and bot visibility
Tor relay context
ASN-level VPN-adjacent signals
Source provenance for explainable decisions
Building internal allowlists, denylists, and review queues

This project is not a malware or abuse blacklist. It provides operational network context with source provenance and confidence.

Local Update

python3 scripts/update.py

The collector prefers local sibling project outputs when present:

../crawler-scope/data/current/crawlers.json
../tor-radar/data/current/network.json
../release/analysis/data/provider_asn.csv

When those files are not present, it pulls the public raw GitHub project outputs where possible.
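
The local-first behavior described above can be pictured roughly as follows. This is only a sketch of the idea, not the code in scripts/update.py: the `resolve_source` function is a hypothetical name and the remote URL is a placeholder, since the actual raw GitHub URLs are not listed in this README.

```python
from pathlib import Path


def resolve_source(local_path, remote_url):
    """Hypothetical helper: prefer a local sibling output, else fall back to a URL.

    Returns ("local", path) when the file exists, otherwise ("remote", url).
    """
    path = Path(local_path)
    if path.is_file():
        return ("local", str(path))
    return ("remote", remote_url)


# Placeholder URL for illustration only; not a real project endpoint.
kind, location = resolve_source(
    "../crawler-scope/data/current/crawlers.json",
    "https://example.invalid/crawlers.json",
)
print(kind, location)
```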

GitHub Actions

The workflow runs every 6 hours and commits updated files under data/.

.github/workflows/ip-knowledge-layer.yml

The workflow intentionally stores full data only in data/current/*. Historical snapshots are compact summaries to avoid repository bloat.

Planned Improvements

Planned additions inspired by projects such as ipverse:

asn-knowledge.csv: ASN-level rollup with tags, cloud presence, Tor presence, crawler presence, VPN-adjacent evidence, and confidence.
asn-prefixes.csv.gz: compressed bulk ASN-to-prefix layer, kept separate from ip-knowledge.jsonl to avoid making the main file too large.
provider-index.json: normalized provider metadata and aliases.
overlap-summary.csv: overlap between cloud/CDN, crawler, Tor, and VPN-adjacent ASN signals.
diff/current.json: added/removed prefix summary between runs.

The intent is not to clone ipverse. The goal is to build a higher-level knowledge layer with source provenance, tags, and confidence.
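
To make the planned diff/current.json item more concrete, an added/removed summary between two runs could be computed with plain set operations. The format of that file is not specified here, so the `diff_prefixes` function and its output keys are hypothetical, and the input prefixes are documentation ranges used purely as sample data.

```python
def diff_prefixes(previous, current):
    """Hypothetical sketch: summarize prefix changes between two runs."""
    prev_set, curr_set = set(previous), set(current)
    return {
        "added": sorted(curr_set - prev_set),      # present now, absent before
        "removed": sorted(prev_set - curr_set),    # present before, absent now
        "unchanged_count": len(prev_set & curr_set),
    }


previous_run = ["192.0.2.0/24", "198.51.100.0/24"]
current_run = ["198.51.100.0/24", "203.0.113.0/24"]
print(diff_prefixes(previous_run, current_run))
```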

Notes

The project avoids full IPv4 expansion.
The project avoids mass RDAP/whois lookups in GitHub Actions.
vpn-adjacent signals are aggregate ASN-level indicators, not a raw VPN IP dump.
Confidence is source-level confidence, not a claim that traffic from a network is malicious.
Some official providers publish overlapping service rows for the same prefix. Those rows are preserved because service labels carry useful context.
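
Because overlapping service rows are preserved, a consumer that wants one entry per prefix may need to group them. The following sketch shows one possible approach. The `services_by_prefix` helper is hypothetical, and the sample rows use a documentation prefix with invented service labels rather than real provider data.

```python
from collections import defaultdict


def services_by_prefix(records):
    """Hypothetical helper: collect (provider, service) pairs per prefix."""
    grouped = defaultdict(list)
    for record in records:
        prefix = record.get("prefix")
        if prefix is None:
            continue  # asn-signal records have no prefix to group on
        grouped[prefix].append((record.get("provider"), record.get("service")))
    return dict(grouped)


# Invented rows: two service labels published for the same prefix.
sample = [
    {"prefix": "192.0.2.0/24", "provider": "ExampleCloud", "service": "storage"},
    {"prefix": "192.0.2.0/24", "provider": "ExampleCloud", "service": "compute"},
    {"prefix": "198.51.100.0/24", "provider": "ExampleCloud", "service": "compute"},
]
print(services_by_prefix(sample))
```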

License

CC0-1.0. See LICENSE.
