IP Knowledge Layer

Open IP enrichment knowledge layer for CIDR, ASN, cloud, crawler, Tor, and VPN-adjacent network intelligence.

This repository is data-first: its main output is a set of machine-readable files that curl, GitHub Actions, SIEM pipelines, WAF tooling, anti-fraud systems, and internal enrichment jobs can pull directly.

Why This Exists

Most public IP repositories publish one narrow list: cloud IPs, Tor IPs, crawler IPs, or ASN mappings. IP Knowledge Layer combines multiple public and derived signals into one normalized enrichment layer.

The value is context:

CIDR or ASN -> layer -> provider -> service -> tags -> confidence -> source

Instead of only knowing that a prefix exists, consumers can understand whether it belongs to cloud hosting, CDN edge, GitHub infrastructure, AI crawlers, Tor, or a VPN-adjacent ASN signal.

Current Release

| Metric | Value |
| --- | --- |
| Updated | 2026-05-25T05:08:54Z |
| Release | data-20260525-050854Z |
| Records | 111,502 |
| Prefix records | 111,502 |
| ASN signals | 0 |
| Sources | 11 |
| Collector errors | 1 |

| Layer | Records |
| --- | --- |
| hosting-cloud | 98,191 |
| anonymity | 11,489 |
| crawler-bot | 1,822 |

| Top Provider | Records |
| --- | --- |
| Azure | 73,422 |
| AWS | 15,867 |
| Tor | 11,489 |
| GitHub | 6,703 |
| Oracle Cloud | 1,078 |

Download URLs

Replace main with another branch if needed.

BASE="https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current"

curl -fsSL "$BASE/summary.json"
curl -fsSL "$BASE/source-index.json"
curl -fsSL "$BASE/ip-knowledge.jsonl"
curl -fsSL "$BASE/ip-knowledge.csv"
curl -fsSL "$BASE/cloud-prefixes.csv"
curl -fsSL "$BASE/asn-signals.csv"
curl -fsSL "$BASE/cidr-tags.txt"

Which File Should I Use?

| Need | Use this file | Why |
| --- | --- | --- |
| I want the full knowledge layer | ip-knowledge.jsonl | Best for pipelines, jq, streaming, and preserving nested fields |
| I want Excel/BI/SIEM-friendly data | ip-knowledge.csv | Same broad dataset in tabular form |
| I only need cloud/CDN/developer platform ranges | cloud-prefixes.csv | Smaller and focused on AWS, Azure, GCP, Cloudflare, Fastly, GitHub, Oracle |
| I need quick CIDR-to-tags lookup | cidr-tags.txt | Lightweight text file: one CIDR plus comma-separated tags per line |
| I care about VPN-heavy/provider ASN signals | asn-signals.csv | ASN-level aggregate evidence, without raw VPN IP publication |
| I need to check source health and counts | summary.json | Current run status, layer counts, provider/source aggregates |
| I need source provenance | source-index.json | Source URLs, source types, and record counts |

For most users:

  • Start with cloud-prefixes.csv if you only need cloud/datacenter/CDN ranges.
  • Start with ip-knowledge.jsonl if you want the full enrichment layer.
  • Start with cidr-tags.txt if you want the simplest possible feed.

Files

| File | Purpose | Approx size |
| --- | --- | --- |
| data/current/summary.json | Current build summary, counts, layer/provider/source aggregates | 8 KB |
| data/current/source-index.json | Source metadata, URLs, source types, record counts | 3 KB |
| data/current/ip-knowledge.jsonl | Full normalized knowledge layer, one JSON record per line | 49 MB |
| data/current/ip-knowledge.csv | Full normalized knowledge layer as CSV | 25 MB |
| data/current/cloud-prefixes.csv | Official cloud/CDN/developer-platform prefixes only | 22 MB |
| data/current/asn-signals.csv | ASN-level VPN-adjacent aggregate signals | 399 KB |
| data/current/cidr-tags.txt | Simple CIDR tags text file for lightweight consumers | 4.7 MB |
| data/history/summary.csv | Build history | small |
| data/snapshots/*.json | Compact summary snapshots, not full data copies | small |

Layers

hosting-cloud

Official cloud, CDN, edge, and developer-platform IP ranges.

Current providers:

  • AWS
  • Azure
  • Google Cloud
  • Google public infrastructure
  • Cloudflare
  • Fastly
  • GitHub
  • Oracle Cloud

crawler-bot

Crawler, AI bot, monitoring probe, scanner, SEO bot, and social preview ranges derived from CrawlerScope.

anonymity

Tor relay host routes derived from Tor-Radar.

asn-signal

ASN-level VPN-adjacent aggregate signals from provider analysis. This layer does not publish raw VPN IP lists. It only publishes aggregate provider-to-ASN evidence.

Source Inventory

Official/public sources:

  • AWS IP ranges: https://ip-ranges.amazonaws.com/ip-ranges.json
  • Azure Service Tags: https://www.microsoft.com/en-us/download/details.aspx?id=56519
  • Google Cloud ranges: https://www.gstatic.com/ipranges/cloud.json
  • Google public ranges: https://www.gstatic.com/ipranges/goog.json
  • Cloudflare ranges: https://www.cloudflare.com/ips-v4, https://www.cloudflare.com/ips-v6
  • Fastly public IP list: https://api.fastly.com/public-ip-list
  • GitHub Meta API: https://api.github.com/meta
  • Oracle Cloud ranges: https://docs.oracle.com/en-us/iaas/tools/public_ip_ranges.json

Derived project sources:

  • CrawlerScope: crawler, AI bot, monitoring, scanner, and SEO bot ranges
  • Tor-Radar: Tor relay and exit IPs
  • VPN provider ASN summary: aggregate ASN signals, no raw VPN IP feed

Record Shape

Example hosting-cloud JSONL record:

{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare","service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,"source_id":"cloudflare-v4"}

Example crawler-bot JSONL record:

{"prefix":"66.249.64.0/19","layer":"crawler-bot","provider":"Google","service":"Google common crawlers","tags":["bot","crawler","search"],"confidence":0.95,"source_id":"crawler-scope"}

Example anonymity JSONL record:

{"prefix":"185.220.101.1/32","layer":"anonymity","provider":"Tor","service":"exit","tags":["anonymity-network","tor","tor-exit"],"confidence":0.98,"source_id":"tor-radar"}

Example asn-signal JSONL record:

{"layer":"asn-signal","provider":"NordVPN","asn":9009,"asn_name":"M247","tags":["asn-signal","vpn-adjacent"],"confidence":0.7}
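The record shapes above lend themselves to a simple "which records cover this IP?" lookup. The following is a minimal Python sketch, not part of the repository: the helper names load_prefix_records and enrich are hypothetical, and the sample lines are copied from the examples above. It uses a linear scan, which is fine for illustration; for the full file a trie or sorted-interval structure would likely be a better fit.

```python
import ipaddress
import json

# Sample lines copied from the record examples above; a real run would
# iterate over the lines of ip-knowledge.jsonl instead.
SAMPLE_JSONL = """\
{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare","service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,"source_id":"cloudflare-v4"}
{"prefix":"185.220.101.1/32","layer":"anonymity","provider":"Tor","service":"exit","tags":["anonymity-network","tor","tor-exit"],"confidence":0.98,"source_id":"tor-radar"}
{"layer":"asn-signal","provider":"NordVPN","asn":9009,"asn_name":"M247","tags":["asn-signal","vpn-adjacent"],"confidence":0.7}
"""


def load_prefix_records(lines):
    """Parse JSONL lines, keeping only records that carry a prefix."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "prefix" not in record:
            continue  # asn-signal records have an ASN but no prefix
        # strict=False tolerates host bits set in the prefix.
        network = ipaddress.ip_network(record["prefix"], strict=False)
        records.append((network, record))
    return records


def enrich(ip, records):
    """Return every record whose prefix contains ip (linear scan)."""
    address = ipaddress.ip_address(ip)
    return [
        record
        for network, record in records
        if network.version == address.version and address in network
    ]


records = load_prefix_records(SAMPLE_JSONL.splitlines())
for hit in enrich("104.17.1.1", records):
    print(hit["provider"], hit["layer"], hit["tags"])
# prints: Cloudflare hosting-cloud ['cdn', 'edge', 'proxy']
```

Returning a list rather than a single record keeps overlapping rows visible, which matters because the same prefix can appear under more than one service label.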

Usage Examples

Get current build stats:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/summary.json | jq .

Download cloud prefixes:

curl -fsSLO https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv

Extract Cloudflare rows:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv \
  | awk -F, '$3 == "Cloudflare" { print }'

Extract Tor exits from JSONL:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
  | jq -r 'select(.layer=="anonymity" and .service=="exit") | .prefix'

Extract AI crawler prefixes:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
  | jq -r 'select(.layer=="crawler-bot" and (.tags | index("ai-crawler"))) | .prefix'

Use as a lightweight block/allow enrichment feed:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cidr-tags.txt \
  | grep 'cloud'
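Because grep matches substrings anywhere on the line, a consumer that needs exact tag matches may prefer to parse the file. Below is a hedged Python sketch: it assumes the CIDR and the comma-separated tag list are separated by whitespace (the exact delimiter is not stated here), and the sample lines are made up from the tags in the record examples. The function names are hypothetical.

```python
import ipaddress

# Hypothetical sample lines; the real file layout may differ in its delimiter.
SAMPLE = [
    "104.16.0.0/13 cdn,edge,proxy",
    "185.220.101.1/32 anonymity-network,tor,tor-exit",
]


def parse_cidr_tags(lines):
    """Yield (network, tags) pairs from cidr-tags.txt style lines.

    Assumption: CIDR and tag list are separated by whitespace.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and (assumed) comment lines
        parts = line.split(maxsplit=1)
        network = ipaddress.ip_network(parts[0], strict=False)
        raw_tags = parts[1] if len(parts) > 1 else ""
        tags = {tag.strip() for tag in raw_tags.split(",") if tag.strip()}
        yield network, tags


def networks_with_tag(lines, wanted):
    """Return networks whose tag set contains exactly the wanted tag."""
    return [network for network, tags in parse_cidr_tags(lines) if wanted in tags]


print([str(n) for n in networks_with_tag(SAMPLE, "tor")])
# prints: ['185.220.101.1/32']
```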

Find all ASN signals for a provider:

curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/asn-signals.csv \
  | awk -F, '$3 == "NordVPN" { print }'
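The awk filters above select by column position and split on every comma, so they can misfire if a field is quoted and contains a comma. As an alternative, here is a small Python sketch that filters by column name with the csv module. The header names and column order in SAMPLE_CSV are assumptions modeled on the asn-signal JSONL fields, not the confirmed layout of asn-signals.csv; check the real header before relying on it.

```python
import csv
import io

# Hypothetical header and row, modeled on the asn-signal JSONL example.
SAMPLE_CSV = """\
layer,asn,provider,asn_name,confidence
asn-signal,9009,NordVPN,M247,0.7
"""


def rows_for_provider(csv_text, provider, column="provider"):
    """Return rows (as dicts) whose `column` equals `provider`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    return [row for row in reader if row[column] == provider]


for row in rows_for_provider(SAMPLE_CSV, "NordVPN"):
    print(row["asn"], row["asn_name"])
# prints: 9009 M247
```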

What It Can Help With

  • IP enrichment for fraud/risk systems
  • WAF and SIEM context
  • Cloud/datacenter detection
  • CDN/edge infrastructure classification
  • AI crawler and bot visibility
  • Tor relay context
  • ASN-level VPN-adjacent signals
  • Source provenance for explainable decisions
  • Building internal allowlists, denylists, and review queues

This project is not a malware or abuse blacklist. It provides operational network context with source provenance and confidence.

Local Update

python3 scripts/update.py

The collector prefers local sibling project outputs when present:

../crawler-scope/data/current/crawlers.json
../tor-radar/data/current/network.json
../release/analysis/data/provider_asn.csv

When those files are not present, it pulls the public raw GitHub project outputs where possible.
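The local-first behavior described above can be sketched as follows. This is an illustration only and not the code in scripts/update.py: read_source, its parameters, and the placeholder URL are hypothetical, and the optional fetch argument exists so the fallback can be exercised without network access.

```python
from pathlib import Path
from urllib.request import urlopen


def read_source(local_path, fallback_url, fetch=None):
    """Return bytes from local_path if it exists, else from fallback_url."""
    path = Path(local_path)
    if path.is_file():
        return path.read_bytes()
    if fetch is None:
        # Default fetcher: plain HTTP GET with a timeout.
        def fetch(url):
            with urlopen(url, timeout=30) as response:
                return response.read()
    return fetch(fallback_url)


# Usage with a stub fetcher; the URL is a placeholder, not a real endpoint.
data = read_source(
    "../crawler-scope/data/current/crawlers.json",
    "https://example.invalid/crawlers.json",
    fetch=lambda url: b"{}",
)
print(len(data) > 0)
```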

GitHub Actions

The workflow runs every 6 hours and commits updated files under data/.

.github/workflows/ip-knowledge-layer.yml

The workflow intentionally stores full data only in data/current/*. Historical snapshots are compact summaries to avoid repository bloat.

Planned Improvements

Planned additions inspired by projects such as ipverse:

  • asn-knowledge.csv: ASN-level rollup with tags, cloud presence, Tor presence, crawler presence, VPN-adjacent evidence, and confidence.
  • asn-prefixes.csv.gz: compressed bulk ASN-to-prefix layer, kept separate from ip-knowledge.jsonl to avoid making the main file too large.
  • provider-index.json: normalized provider metadata and aliases.
  • overlap-summary.csv: overlap between cloud/CDN, crawler, Tor, and VPN-adjacent ASN signals.
  • diff/current.json: added/removed prefix summary between runs.

The intent is not to clone ipverse. The goal is to build a higher-level knowledge layer with source provenance, tags, and confidence.
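As a rough illustration of the planned added/removed prefix summary, a set difference between two runs is probably enough for a first version. The sketch below is an assumption about one possible shape; the function name and output keys are hypothetical and do not describe a published diff/current.json schema.

```python
import json


def prefix_diff(previous, current):
    """Summarize prefixes added and removed between two runs."""
    prev, curr = set(previous), set(current)
    return {
        "added": sorted(curr - prev),
        "removed": sorted(prev - curr),
        "unchanged_count": len(prev & curr),
    }


# Documentation-range prefixes used purely as sample data.
old_run = ["192.0.2.0/24", "198.51.100.0/24"]
new_run = ["198.51.100.0/24", "203.0.113.0/24"]
print(json.dumps(prefix_diff(old_run, new_run)))
# prints: {"added": ["203.0.113.0/24"], "removed": ["192.0.2.0/24"], "unchanged_count": 1}
```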

Notes

  • The project avoids full IPv4 expansion.
  • The project avoids mass RDAP/whois lookups in GitHub Actions.
  • vpn-adjacent signals are aggregate ASN-level indicators, not a raw VPN IP dump.
  • Confidence is source-level confidence, not a claim that traffic from a network is malicious.
  • Some official providers publish overlapping service rows for the same prefix. Those rows are preserved because service labels carry useful context.
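Since overlapping service rows are preserved, a consumer that wants one entry per prefix can group them on its side. The following Python sketch shows one way to do that; the helper name services_by_prefix is hypothetical, and the sample records use a documentation prefix with made-up service rows rather than real feed data.

```python
from collections import defaultdict

# Hypothetical records: one prefix listed under two service labels.
SAMPLE_RECORDS = [
    {"prefix": "203.0.113.0/24", "provider": "AWS", "service": "AMAZON"},
    {"prefix": "203.0.113.0/24", "provider": "AWS", "service": "EC2"},
    {"prefix": "198.51.100.0/24", "provider": "GitHub", "service": "web"},
]


def services_by_prefix(records):
    """Map each prefix to the sorted (provider, service) pairs seen for it."""
    grouped = defaultdict(set)
    for record in records:
        if "prefix" not in record:
            continue  # asn-signal style records carry no prefix
        pair = (record.get("provider", ""), record.get("service", ""))
        grouped[record["prefix"]].add(pair)
    return {prefix: sorted(pairs) for prefix, pairs in grouped.items()}


print(services_by_prefix(SAMPLE_RECORDS)["203.0.113.0/24"])
# prints: [('AWS', 'AMAZON'), ('AWS', 'EC2')]
```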

License

CC0-1.0. See LICENSE.
