Proxy Formats and Parsers
Supported proxy formats, parser profiles, JSON paths, HTML selectors, and dedupe behavior.
ZeroTrace Proxy accepts a broad set of proxy input formats across extractor, checker, benchmark, leak test, chain tester, and viewer workflows.
Supported Proxy Formats
| Format | Example |
|---|---|
| Host and port | 192.0.2.10:8080 |
| Scheme URL | http://192.0.2.10:8080 |
| SOCKS URL | socks5://198.51.100.20:1080 |
| User/pass URL | http://user:pass@192.0.2.10:8080 |
| Host-port-user-pass | 192.0.2.10:8080:user:pass |
| IPv6 bracketed | http://[2001:db8::10]:8080 |
Supported schemes include:
- HTTP
- HTTPS
- SOCKS4
- SOCKS4a
- SOCKS5
- SOCKS5h
Scheme variants with missing or malformed :// are normalized when possible.
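
The formats in the table above can be told apart with a small amount of string handling. The following is a minimal sketch, not ZeroTrace Proxy's actual parser: the `Proxy` type, the `parse_proxy` function, and the exact set of tolerated separator typos (`http:/`, `http:`, `http//`) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed tolerance: "scheme://", "scheme:/", "scheme:" and "scheme//" all count.
_SCHEME_RE = re.compile(
    r"^(https|http|socks4a|socks4|socks5h|socks5)(?::/{0,2}|/{1,2})",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Proxy:  # hypothetical container, not a ZeroTrace type
    host: str
    port: int
    scheme: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy(line: str) -> Optional[Proxy]:
    """Parse one proxy string in any of the listed formats, or return None."""
    text = line.strip().rstrip("/")
    scheme = None
    match = _SCHEME_RE.match(text)
    if match:
        scheme = match.group(1).lower()
        text = text[match.end():]

    username = password = None
    if "@" in text:  # user:pass@host:port
        userinfo, text = text.rsplit("@", 1)
        username, _, password = userinfo.partition(":")

    if text.startswith("["):  # bracketed IPv6: [addr]:port
        end = text.find("]")
        if end == -1 or text[end + 1:end + 2] != ":":
            return None
        host, rest = text[1:end], text[end + 2:].split(":")
    else:
        host, *rest = text.split(":")

    if len(rest) == 3 and username is None:  # host:port:user:pass
        port_text, username, password = rest
    elif len(rest) == 1:  # host:port
        port_text = rest[0]
    else:
        return None

    if not host or not (port_text.isascii() and port_text.isdigit()):
        return None
    port = int(port_text)
    if not 1 <= port <= 65535:
        return None
    return Proxy(host, port, scheme, username, password)
```

For example, `parse_proxy("192.0.2.10:8080:user:pass")` and `parse_proxy("http://user:pass@192.0.2.10:8080")` yield the same host, port, and credentials, differing only in the scheme field.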
Endpoint Validation
The parser validates:
- IPv4 hosts
- IPv6 hosts
- domains
- localhost
- ports from 1 to 65535
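
The checks listed above can be approximated with the Python standard library. This is a sketch under assumptions: the function names are hypothetical, and the domain rule used here (dot-separated labels of letters, digits, and hyphens, with a non-numeric final label) is a common convention rather than the tool's documented rule.

```python
import ipaddress
import re

# One DNS label: 1 to 63 letters, digits, or hyphens, not starting or ending with "-".
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_host(host: str) -> bool:
    """Accept IPv4, IPv6, localhost, or a plausible domain name."""
    if host.lower() == "localhost":
        return True
    try:
        ipaddress.ip_address(host)  # handles both IPv4 and IPv6 literals
        return True
    except ValueError:
        pass
    if len(host) > 253 or "." not in host:
        return False
    labels = host.rstrip(".").split(".")
    # An all-numeric final label means this was a malformed IPv4 address.
    if labels[-1].isdigit():
        return False
    return all(_LABEL_RE.fullmatch(label) for label in labels)


def is_valid_endpoint(host: str, port: int) -> bool:
    """Host must validate and the port must fall in 1..65535."""
    return is_valid_host(host) and 1 <= port <= 65535
```

Note that IPv6 hosts are passed here without brackets; the brackets are part of the URL syntax, not of the address.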
Parser Profiles
| Profile | Behavior |
|---|---|
| Auto | Chooses JSON, HTML, or plain text based on content type, body shape, selector, and JSON path. |
| Plain text | Runs text extraction directly. |
| HTML | Uses selectors, common elements, attributes, tables, lists, and script-like blocks. |
| JSON | Walks JSON payloads and proxy-shaped objects. |
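
One way to picture the Auto profile is as a short decision chain over the four signals named in the table. The sketch below is illustrative only: the `choose_profile` function, the precedence order (explicit JSON path, then selector, then content type, then body shape), and the body heuristics are all assumptions, not the documented behavior.

```python
import json
from typing import Optional


def choose_profile(content_type: str = "", body: str = "",
                   selector: Optional[str] = None,
                   json_path: Optional[str] = None) -> str:
    """Return "json", "html", or "plain" for a fetched source (assumed order)."""
    # An explicit JSON path or selector is treated as the strongest hint.
    if json_path:
        return "json"
    if selector:
        return "html"

    # Next, trust the declared media type, ignoring parameters like charset.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return "json"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"

    # Finally, look at the body shape. A leading "[" may also be a bracketed
    # IPv6 proxy, so confirm by actually parsing the body as JSON.
    stripped = body.lstrip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass
    head = stripped[:512].lower()
    if head.startswith("<!doctype html") or "<html" in head or "<table" in head:
        return "html"
    return "plain"
```

Under these assumptions, a `text/plain` response whose body is `[2001:db8::10]:8080` stays on the plain-text path because it fails the JSON parse.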
JSON Path Examples
data.items[*]
data.items[*].proxy
sources[0].items[*]
Supported path features:
- dot paths
- numeric indexes like [0]
- wildcards like [*]
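
To make the three path features concrete, here is a minimal sketch of a path evaluator that handles dot paths, numeric indexes, and wildcards. The `select` function and the sample payload are hypothetical, and details such as how a wildcard behaves on an object are assumptions.

```python
import re
from typing import Any, List

# A path token is either a key name or a bracketed index / wildcard.
_TOKEN_RE = re.compile(r"([^.\[\]]+)|\[(\d+|\*)\]")


def select(data: Any, path: str) -> List[Any]:
    """Return every value in `data` matched by a path like data.items[*].proxy."""
    current = [data]
    for token in _TOKEN_RE.finditer(path):
        key, index = token.group(1), token.group(2)
        matched = []
        for node in current:
            if key is not None:
                # Dot-path segment: descend into an object field.
                if isinstance(node, dict) and key in node:
                    matched.append(node[key])
            elif index == "*":
                # Wildcard: fan out over list items (assumed: also object values).
                if isinstance(node, list):
                    matched.extend(node)
                elif isinstance(node, dict):
                    matched.extend(node.values())
            else:
                # Numeric index: pick one list element if it exists.
                position = int(index)
                if isinstance(node, list) and position < len(node):
                    matched.append(node[position])
        current = matched
    return current


payload = {"data": {"items": [{"proxy": "192.0.2.10:8080"},
                              {"proxy": "198.51.100.20:1080"}]}}
proxies = select(payload, "data.items[*].proxy")
```

Paths that do not match anything simply produce an empty list, which keeps a mistyped path from raising an error during extraction.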
HTML Selector Examples
table tbody tr
pre
code
[data-proxy]
Use selectors when Auto mode finds too much or too little.
Extractor Sources and Fetch Controls
| Feature | Detail |
|---|---|
| Sources | Single source, many sources, and local file. |
| HTTP controls | Timeout seconds, concurrency, redirects, random User-Agent, custom User-Agent, custom headers, and cookies. |
| Body handling | Gzip, zlib, and deflate-style decompression plus an extraction body cap around 4 MiB. |
| Per-source metrics | HTTP status, duration, content type, response bytes, parser used, extraction scope count, candidate count, duplicate count, proxy count, and error. |
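
The body handling row can be illustrated with the standard `zlib` module, which decodes gzip, zlib-wrapped, and raw deflate streams and can limit how much output it produces. This is a sketch under assumptions: the `decode_body` function is hypothetical, the cap is taken as exactly 4 MiB, and selecting the decoder from a `Content-Encoding` value is a guess at how the extractor decides.

```python
import zlib

BODY_CAP = 4 * 1024 * 1024  # assumed value for the "around 4 MiB" cap


def decode_body(raw: bytes, encoding: str = "", cap: int = BODY_CAP) -> bytes:
    """Decompress a response body and return at most `cap` bytes of it."""
    name = encoding.strip().lower()
    if name in ("gzip", "x-gzip"):
        window_modes = [16 + zlib.MAX_WBITS]  # gzip container
    elif name == "deflate":
        # Servers send either zlib-wrapped or raw deflate under this name.
        window_modes = [zlib.MAX_WBITS, -zlib.MAX_WBITS]
    else:
        return raw[:cap]

    for wbits in window_modes:
        try:
            # max_length stops inflation early, which bounds memory use
            # even for highly compressed bodies.
            return zlib.decompressobj(wbits).decompress(raw, cap)
        except zlib.error:
            continue
    return raw[:cap]  # undecodable: fall back to the capped raw bytes
```

Capping during decompression, rather than after it, is the design choice that matters here: a small compressed response can otherwise expand to far more than the cap before it is truncated.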
HTML Extraction Details
HTML mode can read selected elements, table tr, pre, code, textarea, ordered lists, unordered lists, data-ip, data-host, data-proxy, and data-address. It also reads useful attributes such as data-*, href, content, value, and title, understands table cells and child elements, and inspects embedded scripts that look like JSON or contain proxy-shaped values.
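
As a rough illustration of this kind of extraction, the sketch below uses Python's built-in `html.parser` to gather text from a few of the listed elements plus `data-*` and other useful attributes, then pulls IPv4 `host:port` candidates out of what it collected. The class and function names are hypothetical, the candidate pattern is deliberately narrow, and selectors, split table cells, and embedded scripts are not covered.

```python
import re
from html.parser import HTMLParser
from typing import List

TEXT_TAGS = {"pre", "code", "textarea", "td", "li"}
PLAIN_ATTRS = {"href", "content", "value", "title"}
# Narrow illustrative pattern: dotted IPv4 followed by a port.
CANDIDATE_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")


class ProxyTextCollector(HTMLParser):
    """Collect text and attribute values that might hold proxies."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # how many TEXT_TAGS elements are currently open
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self.depth += 1
        for name, value in attrs:
            if value and (name.startswith("data-") or name in PLAIN_ATTRS):
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that sits inside one of the tracked elements.
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_candidates(html: str) -> List[str]:
    collector = ProxyTextCollector()
    collector.feed(html)
    collector.close()
    found: List[str] = []
    for chunk in collector.chunks:
        found.extend(CANDIDATE_RE.findall(chunk))
    return found
```

Text outside the tracked elements is ignored in this sketch, which is roughly what narrowing extraction scope with a selector achieves.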
JSON Extraction Details
JSON mode collects strings, numbers, arrays, and object fields. It can infer proxy-shaped objects from host-like fields, port fields, scheme/protocol/type fields, username/user/login fields, and password/pass fields.
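
Inferring proxy-shaped objects can be sketched as a recursive walk that looks for a host-like field and a port field on each object. The code below is an illustration, not the tool's implementation: the function name is hypothetical, and the specific host-like key names (`host`, `ip`, `address`) are assumed, since the text above does not enumerate them.

```python
from typing import Any, Dict, List, Optional

HOST_KEYS = ("host", "ip", "address")  # assumed host-like field names
PORT_KEYS = ("port",)
SCHEME_KEYS = ("scheme", "protocol", "type")
USER_KEYS = ("username", "user", "login")
PASS_KEYS = ("password", "pass")


def _first(obj: Dict[str, Any], keys) -> Optional[Any]:
    """Return the first non-empty value among the candidate keys."""
    for key in keys:
        value = obj.get(key)
        if value not in (None, ""):
            return value
    return None


def find_proxy_objects(node: Any) -> List[Dict[str, Any]]:
    """Walk a decoded JSON payload and collect objects shaped like proxies."""
    found: List[Dict[str, Any]] = []
    if isinstance(node, dict):
        lowered = {str(k).lower(): v for k, v in node.items()}
        host = _first(lowered, HOST_KEYS)
        port_text = str(_first(lowered, PORT_KEYS))
        if isinstance(host, str) and port_text.isdigit():
            scheme = _first(lowered, SCHEME_KEYS)
            found.append({
                "host": host,
                "port": int(port_text),  # ports may arrive as numbers or strings
                "scheme": scheme.lower() if isinstance(scheme, str) else None,
                "username": _first(lowered, USER_KEYS),
                "password": _first(lowered, PASS_KEYS),
            })
        else:
            for value in node.values():
                found.extend(find_proxy_objects(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_proxy_objects(item))
    return found
```

A `type` field does not always hold a scheme (some sources use it for anonymity level), so a real extractor would likely validate the value against the supported schemes before using it.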
Dedupe Modes
| Mode | Best for |
|---|---|
| Full | Keeping scheme/auth variants separate. |
| Host + port | Reducing duplicates where only scheme or auth varies. |
| Host | Aggressive cleanup by host. |
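
The three modes differ only in which fields form the identity of a proxy. A minimal sketch, assuming a hypothetical `Proxy` record and mode names `full`, `host_port`, and `host`, with the first occurrence kept:

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Proxy(NamedTuple):  # hypothetical record for illustration
    host: str
    port: int
    scheme: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None


def dedupe_key(proxy: Proxy, mode: str) -> tuple:
    """Build the identity tuple that a given dedupe mode compares on."""
    host = proxy.host.lower()
    if mode == "full":
        return (proxy.scheme, proxy.username, proxy.password, host, proxy.port)
    if mode == "host_port":
        return (host, proxy.port)
    if mode == "host":
        return (host,)
    raise ValueError(f"unknown dedupe mode: {mode}")


def dedupe(proxies: Iterable[Proxy], mode: str = "host_port") -> Tuple[List[Proxy], int]:
    """Keep the first proxy per key; return the kept list and duplicate count."""
    seen = set()
    kept: List[Proxy] = []
    duplicates = 0
    for proxy in proxies:
        key = dedupe_key(proxy, mode)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
            kept.append(proxy)
    return kept, duplicates
```

With this shape, the same endpoint listed once as HTTP and once as SOCKS5 survives as two entries under Full, and collapses to one under Host + port.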
For mixed public proxy sources, start with the Auto parser and Host + port dedupe. Narrow extraction with an HTML selector or JSON path only when the per-source metrics show noise.