
Proxy Extractor

Extract proxies from text, files, URLs, HTML, and JSON sources.

Proxy Extractor is the entry point for messy sources. It fetches or reads source material, parses possible proxy candidates, normalizes formats, dedupes results, and reports per-source extraction metrics.
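The overall flow can be sketched as a regex pass followed by normalization and a full-string dedupe. This is a minimal illustration, not the tool's actual parser; the regex, the default `http` scheme, and the function name are assumptions:

```python
import re

# Hypothetical sketch of the extract -> normalize -> dedupe flow.
PROXY_RE = re.compile(
    r"(?:(?P<scheme>https?|socks4a?|socks5h?)://)?"
    r"(?P<host>[\w.-]+):(?P<port>\d{1,5})"
)

def extract_proxies(text: str) -> list[str]:
    seen, out = set(), []
    for m in PROXY_RE.finditer(text):
        port = int(m.group("port"))
        if not 1 <= port <= 65535:
            continue  # reject invalid ports during normalization
        scheme = (m.group("scheme") or "http").lower()  # assumed default
        proxy = f"{scheme}://{m.group('host')}:{port}"
        if proxy not in seen:  # full-string dedupe
            seen.add(proxy)
            out.append(proxy)
    return out
```

The real parser profiles below are considerably more involved; this only shows why normalization has to happen before dedupe can be meaningful.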

Input Modes

  • Use one URL or source. Best when testing a new source before adding it to a larger workflow.
  • Paste multiple URLs or sources. Best for a known group of source feeds.
  • Load sources from a local text file. Best when maintaining source lists outside the app.
  • Use cloud proxy sources returned from ZeroTrace Server. Best when you want managed ZeroTrace source discovery.

Parser Profiles

Profile      Use when
Auto         You are not sure whether the source is text, HTML, or JSON.
Plain text   The source is a raw list or text blob.
HTML         The source is a web page containing tables, code blocks, lists, or useful attributes.
JSON         The source is an API response, feed, or structured JSON object list.

HTML Extraction

HTML mode can use a custom selector and can also extract from common structures such as:

  • tables
  • pre, code, and textarea
  • ordered and unordered lists
  • data-ip, data-host, data-proxy, and data-address attributes
  • scripts that look like JSON or contain proxy-like values
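As a rough illustration of this kind of structural extraction, a stdlib `HTMLParser` can collect cell text and `data-*` attribute values. The class and its exact tag list are hypothetical; the attribute names follow the list above:

```python
from html.parser import HTMLParser

# Hypothetical sketch: gather proxy candidates from table cells,
# code-like elements, list items, and data-* attributes.
class ProxyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "pre", "code", "textarea", "li"):
            self.in_cell = True
        # data-ip / data-host / data-proxy / data-address attributes
        for name, value in attrs:
            if name in ("data-ip", "data-host", "data-proxy", "data-address") and value:
                self.candidates.append(value.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "pre", "code", "textarea", "li"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.candidates.append(data.strip())
```

Candidates collected this way would still pass through the normal endpoint validation and dedupe stages.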

JSON Extraction

JSON mode supports simple JSON path selection:

data.items[*]
data.items[*].proxy
sources[0].proxies[*]
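Paths of this shape can be evaluated with a very small selector. The sketch below is a hypothetical implementation, not the tool's own, and supports only `name`, `name[index]`, and `name[*]` segments:

```python
import re

# Hypothetical evaluator for simple dotted JSON paths.
def select(doc, path):
    nodes = [doc]
    for seg in path.split("."):
        m = re.fullmatch(r"(\w+)(?:\[(\*|\d+)\])?", seg)
        if m is None:
            raise ValueError(f"unsupported path segment: {seg}")
        name, idx = m.group(1), m.group(2)
        nodes = [n[name] for n in nodes if isinstance(n, dict) and name in n]
        if idx == "*":
            # fan out over every element of each matched list
            nodes = [item for n in nodes if isinstance(n, list) for item in n]
        elif idx is not None:
            nodes = [n[int(idx)] for n in nodes if isinstance(n, list) and int(idx) < len(n)]
    return nodes
```

For example, `select(doc, "data.items[*].proxy")` returns every `proxy` value under `data.items`.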

JSON mode can also infer proxy-shaped objects from fields such as host, hostname, ip, address, server, port, scheme, protocol, username, user, password, and pass.
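A hedged sketch of that inference, reassembling a proxy from fields split across an object. The field precedence and the `http` default here are assumptions; the real tool's rules may differ:

```python
# Hypothetical field precedence for proxy-object inference.
HOST_KEYS = ("host", "hostname", "ip", "address", "server")
USER_KEYS = ("username", "user")
PASS_KEYS = ("password", "pass")

def infer_proxy(obj):
    host = next((obj[k] for k in HOST_KEYS if obj.get(k)), None)
    port = obj.get("port")
    if not host or not port:
        return None  # not proxy-shaped without an endpoint
    scheme = (obj.get("scheme") or obj.get("protocol") or "http").lower()
    user = next((obj[k] for k in USER_KEYS if obj.get(k)), None)
    pwd = next((obj[k] for k in PASS_KEYS if obj.get(k)), None)
    auth = f"{user}:{pwd}@" if user and pwd else ""
    return f"{scheme}://{auth}{host}:{port}"
```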

Dedupe Modes

Mode          Keeps separate rows when
Full          The full normalized proxy string differs.
Host + port   The endpoint differs, regardless of scheme or auth.
Host          The host differs, regardless of port, scheme, or auth.

Use Prefer strongest when you want dedupe to keep the strongest variant for the same dedupe key.
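The three key modes, plus a prefer-strongest pass, could look like the sketch below. The scheme strength ordering is an assumption (the docs do not define what "strongest" means):

```python
from urllib.parse import urlsplit

# Assumed strength ranking; higher wins under "Prefer strongest".
STRENGTH = {"socks5h": 5, "socks5": 4, "socks4a": 3, "socks4": 2, "https": 1, "http": 0}

def dedupe_key(proxy, mode):
    parts = urlsplit(proxy)
    if mode == "full":
        return proxy
    if mode == "host_port":
        return f"{parts.hostname}:{parts.port}"
    return parts.hostname  # mode == "host"

def dedupe(proxies, mode):
    best = {}
    for p in proxies:
        k = dedupe_key(p, mode)
        # keep the first entry unless a strictly stronger variant arrives
        if k not in best or STRENGTH.get(urlsplit(p).scheme, -1) > STRENGTH.get(urlsplit(best[k]).scheme, -1):
            best[k] = p
    return list(best.values())
```

With `host_port` mode, `http://1.2.3.4:80` and `socks5://1.2.3.4:80` share a key, and prefer-strongest keeps the SOCKS5 variant.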

Metrics To Review

Metric            What it tells you
HTTP status       Whether the source loaded successfully.
Parser used       Which parser actually produced output.
Candidate count   How many proxy-like values were found before final cleanup.
Duplicate count   How noisy the source was.
Proxy count       How many normalized proxies survived extraction.
Error             Fetch, parsing, selector, or source-specific failure.

Details That Matter

Feature                  Detail
Source fetch             Concurrency, timeout, redirects, random or custom User-Agent, headers, and cookies.
Cloud source mode        Uses cloud proxy sources returned from ZeroTrace Server.
Response handling        Supports gzip, zlib, and deflate-style response bodies with an extraction body cap around 4 MiB.
Auto fallback            The Auto parser can try JSON, HTML, and plain text fallbacks when the first route produces no proxies.
JSON object inference    Finds proxy objects split across host/port/scheme/user/pass fields.
HTML table handling      Reads table rows, cells, child elements, page text, useful attributes, and script-like blocks.
Scheme cleanup           Normalizes malformed scheme variants and recognizes HTTP, HTTPS, SOCKS4, SOCKS4a, SOCKS5, and SOCKS5h.
Endpoint validation      Accepts IPv4, IPv6, domains, localhost, and ports from 1 to 65535.
Dedupe strategy          Full proxy, host + port, host only, and prefer-strongest are separate cleanup decisions.
Per-source diagnostics   Status, duration, content type, response bytes, parser used, scope count, candidates, duplicates, proxy count, and errors.
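The endpoint validation rule in the table (IPv4, IPv6, domains, localhost, ports 1 to 65535) can be approximated with the stdlib `ipaddress` module. This is a sketch; the domain regex is an assumption, not the tool's actual check:

```python
import ipaddress
import re

# Assumed domain pattern: dot-separated labels of letters, digits, hyphens.
DOMAIN_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))+$")

def valid_endpoint(host, port):
    if not 1 <= port <= 65535:
        return False
    if host == "localhost":
        return True
    try:
        ipaddress.ip_address(host)  # accepts both IPv4 and IPv6 literals
        return True
    except ValueError:
        return bool(DOMAIN_RE.match(host))
```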

If Auto gives weak results on a known HTML table, rerun with HTML mode and a selector such as table tbody tr.
