
Proxy Extractor

Extract proxies from text, files, URLs, HTML, and JSON sources.

Proxy Extractor is the entry point for messy sources. It fetches or reads source material, parses possible proxy candidates, normalizes formats, dedupes results, and reports per-source extraction metrics.
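The overall flow can be sketched as a regex pass followed by normalization and a full-string dedupe. This is a minimal illustration, not the tool's actual parser; the regex, the default `http` scheme, and the function name are assumptions:

```python
import re

# Hypothetical sketch of the extract -> normalize -> dedupe flow.
PROXY_RE = re.compile(
    r"(?:(?P<scheme>https?|socks4a?|socks5h?)://)?"
    r"(?P<host>[\w.-]+):(?P<port>\d{1,5})"
)

def extract_proxies(text: str) -> list[str]:
    seen, out = set(), []
    for m in PROXY_RE.finditer(text):
        port = int(m.group("port"))
        if not 1 <= port <= 65535:
            continue  # reject invalid ports during normalization
        scheme = (m.group("scheme") or "http").lower()  # assumed default
        proxy = f"{scheme}://{m.group('host')}:{port}"
        if proxy not in seen:  # full-string dedupe
            seen.add(proxy)
            out.append(proxy)
    return out
```

The real parser profiles below are considerably more involved; this only shows why normalization has to happen before dedupe can be meaningful.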

Input Modes

  • Use one URL or source. Best when testing a new source before adding it to a larger workflow.
  • Paste multiple URLs or sources. Best for a known group of source feeds.
  • Load sources from a local text file. Best when maintaining source lists outside the app.
  • Use cloud proxy sources returned from ZeroTrace Server. Best when you want managed ZeroTrace source discovery.

Parser Profiles

Profile      Use when
Auto         You are not sure whether the source is text, HTML, or JSON.
Plain text   The source is a raw list or text blob.
HTML         The source is a web page containing tables, code blocks, lists, or useful attributes.
JSON         The source is an API response, feed, or structured JSON object list.

HTML Extraction

HTML mode can use a custom selector and can also extract from common structures such as:

  • tables
  • pre, code, and textarea
  • ordered and unordered lists
  • data-ip, data-host, data-proxy, and data-address attributes
  • scripts that look like JSON or contain proxy-like values
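As a rough illustration of this kind of structural extraction, a stdlib `HTMLParser` can collect cell text and `data-*` attribute values. The class and its exact tag list are hypothetical; the attribute names follow the list above:

```python
from html.parser import HTMLParser

# Hypothetical sketch: gather proxy candidates from table cells,
# code-like elements, list items, and data-* attributes.
class ProxyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "pre", "code", "textarea", "li"):
            self.in_cell = True
        # data-ip / data-host / data-proxy / data-address attributes
        for name, value in attrs:
            if name in ("data-ip", "data-host", "data-proxy", "data-address") and value:
                self.candidates.append(value.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "pre", "code", "textarea", "li"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.candidates.append(data.strip())
```

Candidates collected this way would still pass through the normal endpoint validation and dedupe stages.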

JSON Extraction

JSON mode supports simple JSON path selection:

data.items[*]
data.items[*].proxy
sources[0].proxies[*]
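Paths of this shape can be evaluated with a very small selector. The sketch below is a hypothetical implementation, not the tool's own, and supports only `name`, `name[index]`, and `name[*]` segments:

```python
import re

# Hypothetical evaluator for simple dotted JSON paths.
def select(doc, path):
    nodes = [doc]
    for seg in path.split("."):
        m = re.fullmatch(r"(\w+)(?:\[(\*|\d+)\])?", seg)
        if m is None:
            raise ValueError(f"unsupported path segment: {seg}")
        name, idx = m.group(1), m.group(2)
        nodes = [n[name] for n in nodes if isinstance(n, dict) and name in n]
        if idx == "*":
            # fan out over every element of each matched list
            nodes = [item for n in nodes if isinstance(n, list) for item in n]
        elif idx is not None:
            nodes = [n[int(idx)] for n in nodes if isinstance(n, list) and int(idx) < len(n)]
    return nodes
```

For example, `select(doc, "data.items[*].proxy")` returns every `proxy` value under `data.items`.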

JSON mode can also infer proxy-shaped objects from fields such as host, hostname, ip, address, server, port, scheme, protocol, username, user, password, and pass.
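A hedged sketch of that inference, reassembling a proxy from fields split across an object. The field precedence and the `http` default here are assumptions; the real tool's rules may differ:

```python
# Hypothetical field precedence for proxy-object inference.
HOST_KEYS = ("host", "hostname", "ip", "address", "server")
USER_KEYS = ("username", "user")
PASS_KEYS = ("password", "pass")

def infer_proxy(obj):
    host = next((obj[k] for k in HOST_KEYS if obj.get(k)), None)
    port = obj.get("port")
    if not host or not port:
        return None  # not proxy-shaped without an endpoint
    scheme = (obj.get("scheme") or obj.get("protocol") or "http").lower()
    user = next((obj[k] for k in USER_KEYS if obj.get(k)), None)
    pwd = next((obj[k] for k in PASS_KEYS if obj.get(k)), None)
    auth = f"{user}:{pwd}@" if user and pwd else ""
    return f"{scheme}://{auth}{host}:{port}"
```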

Dedupe Modes

Mode          Keeps separate rows when
Full          The full normalized proxy string differs.
Host + port   The endpoint differs, regardless of scheme or auth.
Host          The host differs, regardless of port, scheme, or auth.

Use Prefer strongest when you want dedupe to keep the strongest variant for the same dedupe key.
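The three key modes, plus a prefer-strongest pass, could look like the sketch below. The scheme strength ordering is an assumption (the docs do not define what "strongest" means):

```python
from urllib.parse import urlsplit

# Assumed strength ranking; higher wins under "Prefer strongest".
STRENGTH = {"socks5h": 5, "socks5": 4, "socks4a": 3, "socks4": 2, "https": 1, "http": 0}

def dedupe_key(proxy, mode):
    parts = urlsplit(proxy)
    if mode == "full":
        return proxy
    if mode == "host_port":
        return f"{parts.hostname}:{parts.port}"
    return parts.hostname  # mode == "host"

def dedupe(proxies, mode):
    best = {}
    for p in proxies:
        k = dedupe_key(p, mode)
        # keep the first entry unless a strictly stronger variant arrives
        if k not in best or STRENGTH.get(urlsplit(p).scheme, -1) > STRENGTH.get(urlsplit(best[k]).scheme, -1):
            best[k] = p
    return list(best.values())
```

With `host_port` mode, `http://1.2.3.4:80` and `socks5://1.2.3.4:80` share a key, and prefer-strongest keeps the SOCKS5 variant.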

Metrics To Review

Metric            What it tells you
HTTP status       Whether the source loaded successfully.
Parser used       Which parser actually produced output.
Candidate count   How many proxy-like values were found before final cleanup.
Duplicate count   How noisy the source was.
Proxy count       How many normalized proxies survived extraction.
Error             Fetch, parsing, selector, or source-specific failure.

Details That Matter

Feature                  Detail
Source fetch             Concurrency, timeout, redirects, random or custom User-Agent, headers, and cookies.
Cloud source mode        Uses cloud proxy sources returned from ZeroTrace Server.
Response handling        Supports gzip, zlib, and deflate-style response bodies with an extraction body cap around 4 MiB.
Auto fallback            The Auto parser can try JSON, HTML, and plain text fallbacks when the first route produces no proxies.
JSON object inference    Finds proxy objects split across host/port/scheme/user/pass fields.
HTML table handling      Reads table rows, cells, child elements, page text, useful attributes, and script-like blocks.
Scheme cleanup           Normalizes malformed scheme variants and recognizes HTTP, HTTPS, SOCKS4, SOCKS4a, SOCKS5, and SOCKS5h.
Endpoint validation      Accepts IPv4, IPv6, domains, localhost, and ports from 1 to 65535.
Dedupe strategy          Full proxy, host + port, host only, and prefer-strongest are separate cleanup decisions.
Per-source diagnostics   Status, duration, content type, response bytes, parser used, scope count, candidates, duplicates, proxy count, and errors.
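The endpoint validation rule in the table (IPv4, IPv6, domains, localhost, ports 1 to 65535) can be approximated with the stdlib `ipaddress` module. This is a sketch; the domain regex is an assumption, not the tool's actual check:

```python
import ipaddress
import re

# Assumed domain pattern: dot-separated labels of letters, digits, hyphens.
DOMAIN_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))+$")

def valid_endpoint(host, port):
    if not 1 <= port <= 65535:
        return False
    if host == "localhost":
        return True
    try:
        ipaddress.ip_address(host)  # accepts both IPv4 and IPv6 literals
        return True
    except ValueError:
        return bool(DOMAIN_RE.match(host))
```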

If Auto gives weak results on a known HTML table, rerun with HTML mode and a selector such as table tbody tr.
