Greyproxy Performance Benchmarks

Date: 2026-04-15
Version: 0.5.0
Setup: Local loopback on Apple M4 Pro (14-core, 24 GB RAM)
Architecture: darwin/amd64 via Rosetta 2 (see Hardware Notes)

Build note: All proxy comparison and throughput ceiling data uses greyproxy v0.5.0 with buffered response writes and full body recording. The greyproxy-only overhead measurements use the same build. For the test payloads used (max 50 KB response), body recording adds negligible overhead. See Appendix A for details.

What This Measures

Greyproxy is an HTTPS-intercepting proxy that performs TLS MITM, request/response sniffing, and SQLite recording. These benchmarks measure the overhead greyproxy adds compared to a direct connection, across three payload classes representative of LLM API traffic.

Test architecture: load generator → greyproxy (HTTP CONNECT + TLS MITM) → fake Anthropic API server, all on localhost. Each scenario sends requests over 30 seconds at the specified rate.

Headline Results

Proxy overhead at a glance

[Diagram]

Figure 1: Median total latency across three payload sizes. Bars = Direct, line = Greyproxy. The gap is the proxy overhead.

[Diagram]

Figure 2: Maximum request rates greyproxy sustains with <2% error rate, by payload size.

Payload	TTFB p50	TTFB Overhead	Total p50	Total Overhead	Throughput	Errors
Small (1 KB)	4.21ms	+3.7ms	4.23ms	+2.7ms	99.9 rq/s	0.03%
Medium (10 KB)	3.73ms	+3.5ms	6.74ms	+1.7ms	99.9 rq/s	0.07%
Large (100 KB)	5.43ms	+4.6ms	18.22ms	+2.9ms	25.0 rq/s	0%

Greyproxy adds a fixed ~4ms TTFB overhead per request — the cost of HTTP CONNECT tunnel setup and TLS MITM handshake. Total latency overhead is only +1.7–2.9ms and stays flat regardless of body size thanks to the buffered write pipeline. Throughput is unaffected and errors are negligible. P95 follows the same pattern (overhead within +0.5ms of p50).

How does it compare?

On large payloads, greyproxy sustains 40 rq/s with zero errors: 60% more than mitmproxy (25 rq/s) and roughly 4× the 10.1 effective rq/s that goproxy managed at a 25 rq/s target, while doing the most work (sniffing, recording, conversation assembly). See Comparison with Alternative Proxies for full results.

Comparison with Alternative Proxies

Greyproxy compared head-to-head with two other TLS MITM proxies, all using the same CA certificate and tested under identical conditions (raw data in bench/results/).

Proxy	Language	What it does	Version
greyproxy	Go	TLS MITM + request/response sniffing + SQLite recording + conversation assembly	0.5.0
mitmproxy	Python	TLS MITM + scripting + request/response inspection	12.2.2
goproxy	Go	TLS MITM forwarding only (no recording or inspection)	1.2.3

Throughput ceilings by proxy

To find each proxy's actual throughput ceiling, we tested at escalating rates until errors or latency degradation appeared. Ceiling is defined as the highest rate sustained with <2% errors and stable latency.
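The error-rate half of that definition can be written down in a few lines. The sketch below is illustrative only: the `step` type and `ceiling` function are names invented here, and the latency-stability criterion is deliberately left out because the report does not give a numeric rule for it.

```go
package main

import "fmt"

// step is one rate step of a sweep.
type step struct {
	targetRate float64 // requested rq/s
	errorPct   float64 // failed requests as a percentage of all requests
}

// ceiling returns the highest target rate whose error percentage stays below
// maxErrorPct. Steps must be sorted by ascending rate; the scan stops at the
// first step that breaches the threshold.
func ceiling(steps []step, maxErrorPct float64) float64 {
	best := 0.0
	for _, s := range steps {
		if s.errorPct >= maxErrorPct {
			break
		}
		best = s.targetRate
	}
	return best
}

func main() {
	// Greyproxy large-payload steps as reported in this document.
	large := []step{{25, 0}, {40, 0}, {50, 2.1}}
	fmt.Println(ceiling(large, 2.0)) // prints 40
}
```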

[Diagram]

Figure 3: Maximum sustainable request rate per proxy per payload size. Bars = Greyproxy, lines = mitmproxy and goproxy. Greyproxy leads on large payloads; goproxy leads on small payloads.

Small payloads (1 KB request, 512 B response)

[Diagram]

Greyproxy reaches its ceiling at about 200 rq/s, where errors stand at 2.3% (marginally above the 2% threshold); beyond that, throughput caps around 200–280 effective rq/s. mitmproxy at 400 rq/s reports 0 errors but queues internally (12.7s TTFB, only 258.8 effective rq/s).

Ceilings: goproxy 400+ rq/s > mitmproxy 300 rq/s > greyproxy 200 rq/s

Together, goproxy's minimal architecture (no recording, no inspection) and greyproxy's connection-per-request model through CONNECT tunnels explain the ordering. mitmproxy benefits from aggressive connection reuse (5 connections for 9000 requests at 300 rq/s).

Medium payloads (10 KB request, 10 KB response)

[Diagram]

All three proxies handle 100 rq/s; beyond that they degrade differently. mitmproxy at 150 rq/s reports 0 errors but with 2.2s TTFB (silent queuing). goproxy at 150 rq/s took 133s (4.4× expected) at only 33.7 effective rq/s.

Ceilings: greyproxy 100 rq/s ≈ mitmproxy 100 rq/s ≈ goproxy 100 rq/s

Beyond 100 rq/s: greyproxy fails fast (connection errors), mitmproxy queues silently (latency explosion), goproxy stalls (throughput collapse).

Large payloads (100 KB request, 50 KB response)

Target Rate	Greyproxy	mitmproxy	goproxy
25 rq/s	25.0 rq/s, 0% errors	25.0 rq/s, 0% errors (26ms TTFB)	10.1 rq/s, 0% errors (1.2s TTFB)*
40 rq/s	40.0 rq/s, 0% errors	—	—
50 rq/s	48.9 rq/s, 2.1% errors	—	—
75 rq/s	collapsed	11.2 rq/s, 66.2% errors	all requests failed
100 rq/s	86.6 rq/s, 13.3% errors	2.4 rq/s, 88.1% errors	all requests failed

* goproxy at 25 rq/s target only achieved 10.1 effective rq/s over 74 seconds — it could not sustain the target rate.

Ceilings: greyproxy 40 rq/s >> mitmproxy 25 rq/s >> goproxy <25 rq/s

Large payloads expose fundamental architectural differences. Greyproxy's Go streaming pipeline with buffered writes sustains 40 rq/s with zero errors, 60% more than mitmproxy and about 4× goproxy's 10.1 effective rq/s. mitmproxy's Python event loop and request buffering create a 26ms TTFB floor even at comfortable rates. goproxy cannot sustain even 25 rq/s for large payloads (only 10.1 effective rq/s with multi-second TTFB), suggesting a fundamental issue in the elazarl/goproxy library's body forwarding path.

Ceiling summary

Payload (request + response)	Greyproxy	mitmproxy	goproxy	Winner
Small (1.5 KB)	200 rq/s	300 rq/s	400+ rq/s	goproxy
Medium (20 KB)	100 rq/s	100 rq/s	100 rq/s	tie
Large (150 KB)	40 rq/s	25 rq/s	<25 rq/s	greyproxy

For the LLM API use case — where payloads are large and latency matters — greyproxy has a clear throughput advantage while also doing the most work (sniffing, recording, conversation assembly). The proxies that beat greyproxy on small payloads (where the overhead is dominated by TLS handshake, not body processing) cannot keep up when payloads grow.

Overhead at comfortable rates

The ceiling tests above push each proxy to its limit. The following charts and table show overhead at comfortable rates where all proxies perform well, measuring per-request latency differences.

[Diagram]

Figure 4: TTFB p50 by proxy. Bars = Direct, lines = Greyproxy / mitmproxy / goproxy. goproxy is fastest to first byte across all sizes; mitmproxy's TTFB explodes on large payloads (25ms) due to request buffering.

[Diagram]

Figure 5: Total latency p50 by proxy. Bars = Direct, lines = Greyproxy / mitmproxy / goproxy. On small/medium payloads all proxies are close; on large payloads greyproxy pulls ahead.

[Diagram]

Figure 6: Large payload proxy personality. Greyproxy is closest to Direct (ideal). mitmproxy is bottlenecked by TTFB. goproxy is fast to first byte but slow to finish.

Payload (Rate)	Metric	Direct	Greyproxy	mitmproxy	goproxy*
Small (100 rq/s)	TTFB p50	0.53ms	4.90ms	4.23ms	3.00ms
	Total p50	1.19ms	4.93ms	4.39ms	4.61ms
Medium (100 rq/s)	TTFB p50	0.26ms	3.82ms	6.06ms	1.39ms
	Total p50	5.03ms	6.89ms	6.20ms	9.96ms
Large (25 rq/s)	TTFB p50	1.00ms	5.77ms	25.25ms	2.18ms
	Total p50	14.75ms	18.65ms	25.64ms	36.35ms

* goproxy large-payload data from stable run only — see Reliability below.

On small payloads, all proxies are within ~1ms of each other in total latency (4.39–4.93ms); overhead is dominated by TLS MITM, not body processing. On medium payloads, goproxy has the fastest TTFB (1.4ms) but the slowest total latency (10ms). On large payloads, greyproxy adds only +3.9ms total overhead despite full sniffing and recording; mitmproxy's TTFB climbs to 25ms (about 4.4× greyproxy's 5.8ms); goproxy is fast to first byte (2.2ms) but has the slowest total latency (36.4ms when stable).

Reliability

[Diagram]

Figure 7: Error rates across two independent runs. Bars = Greyproxy, lines = mitmproxy (flat at 0%) and goproxy. goproxy's large-payload instability is the outlier — 0% in one run, 79.6% in the next.

Proxy	Small	Medium	Large	Notes
mitmproxy	0	0	0	Zero errors across all scenarios and runs
Greyproxy	1 (0.03%)	3–16 (0.1–0.5%)	0	TLS handshake contention from connection-per-request model
goproxy	0	0	0 or 596 (79.6%)	Catastrophic failure on large payloads in 1 of 2 runs

mitmproxy was the most reliable at the comfortable rates tested here, with zero errors across all scenarios (it does fail under overload, as the large-payload ceiling table shows). Greyproxy had occasional errors from its connection-per-request model (each request through an HTTP CONNECT tunnel creates a new TLS handshake). goproxy's large-payload instability (0% errors in one run, 79.6% in the next under identical conditions) suggests a resource exhaustion issue in the elazarl/goproxy library.

Key findings

Greyproxy has the lowest total latency on large payloads (18.7ms vs mitmproxy's 25.6ms and goproxy's 36.4ms) despite doing the most work (sniffing + SQLite recording + conversation assembly).
On small payloads, all three proxies are within ~1ms of each other in total latency. The choice of proxy matters less when payloads are small.
goproxy is unreliable with large payloads: despite being the simplest proxy (no recording, no inspection), it failed catastrophically in one of the two large-payload runs.

For a detailed comparison between v0.4.0 (pre-optimization) and v0.5.0, see Appendix B: Optimization Impact.

Greyproxy Throughput Scaling

Detailed rate-stepping data for greyproxy alone, showing how throughput and errors evolve as load increases.

Large payload rate sweep (100 KB request, 50 KB response)

[Diagram]

Figure 8: Effective throughput tracks target perfectly through 40 rq/s. At 50 rq/s, errors appear and throughput drops to 48.9 rq/s.

Target Rate	OK	Errors	Throughput	TTFB p50	Total p50	Direct Total p50	Overhead
20 rq/s	600	0 (0%)	20.0 rq/s	6.1ms	18.6ms	17.1ms	+1.5ms
30 rq/s	900	0 (0%)	30.0 rq/s	5.9ms	18.8ms	14.7ms	+4.1ms
40 rq/s	1199	0 (0%)	40.0 rq/s	5.4ms	18.0ms	16.2ms	+1.8ms
50 rq/s	1468	31 (2.1%)	48.9 rq/s	5.1ms	16.3ms	13.5ms	+2.8ms

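Clean ceiling (0% errors): 40 rq/s, which is also the ceiling under the <2% definition used in the proxy comparison. Under a looser <3% criterion, 50 rq/s (2.1% errors) still qualifies. TTFB remains stable at ~5-6ms across all rates, which suggests that the limit at these rates lies after the first byte, in the response body forwarding path, rather than in connection setup.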

Small & medium payload ceilings

[Diagram]

Small ceiling: 200 rq/s. Medium ceiling: 100 rq/s. Beyond these rates, TLS handshake contention in the connection-per-request model causes errors and throughput collapse.

Hardware Notes

These benchmarks have limitations that affect absolute numbers:

Factor	Impact	Implication
Rosetta 2	Go `darwin/amd64` on ARM M4 Pro adds ~20-30% CPU overhead for TLS operations	Native arm64 or x86_64 hardware would show lower per-request CPU cost
Shared machine	Load generator, proxy, and server share 14 cores / 24 GB	Contention inflates tail latencies; dedicated hardware would perform better
Loopback network	All traffic on localhost	Real deployments add RTT to every TLS handshake
NVMe SSD	~7 GB/s sequential writes	Slower storage would surface SQLite I/O bottlenecks earlier
Connection model	One TLS handshake per request through CONNECT tunnel	Connection pooling (not yet implemented) would significantly raise all ceilings

The overhead numbers (~4ms TTFB, ~2-3ms body) are representative for deployments with dedicated hardware. Throughput ceilings may be higher on a native build (arm64 or x86_64) and with connection pooling.

Methodology

Load generator: Custom Go tool sending POST requests to a fake Anthropic Messages API at controlled rates with httptrace.ClientTrace instrumentation
Fake server: HTTPS server streaming SSE responses in exact Anthropic API format, with configurable response sizes
Measurement: GotFirstResponseByte for TTFB; wall-clock for total latency; connection tracking via ConnectStart/GotConn
TLS handshake: Measured via TLSHandshakeStart/TLSHandshakeDone callbacks
Payloads: Synthetic Anthropic API-shaped JSON at 1 KB, 10 KB, and 100 KB
Proxy config: HTTP CONNECT on port 53051, TLS MITM with sniffing enabled, fresh SQLite database per run
Cooldown: 15 seconds between rate steps; proxy health verified between tests
Build configurations: Proxy comparison and ceiling tests used greyproxy v0.5.0 with buffered writes and body recording for all payload sizes (body recording threshold set to 2 MB, well above the 50 KB test responses). A/B comparison tested v0.4.0 (main) against v0.5.0 (optimizations). See Appendix A for details.
A/B comparison: Both main and optimizations binaries built upfront, proxy restarted fresh per branch, full matrix run back-to-back with identical conditions (bench/compare-branches.sh)
Proxy comparison: mitmproxy (v12.2.2 via uvx) and goproxy (elazarl/goproxy v1.2.3, native Go binary) configured with greyproxy's MITM CA. All proxies started fresh, benchmarked with identical payloads and rates (bench/comparison/)

Full benchmark tooling, raw results, and scripts: bench/ directory.

Final Thoughts

Greyproxy adds ~4-5ms of overhead per request. For LLM API traffic — where upstream response times are measured in seconds — this is invisible to the end user. A Claude API call that takes 3 seconds will take 3.005 seconds through greyproxy.

The proxy comfortably handles 100 rq/s for typical conversation payloads (10 KB) and 200 rq/s for small requests. For context: a single Claude conversation with 8K context (~10 KB payload) at normal human typing speeds generates <1 rq/s. The proxy can sustain 100 concurrent conversations at this payload size. These rates far exceed what individual developers or small teams generate.

Compared to alternatives, greyproxy offers the best latency profile on large payloads while providing features (request/response recording, conversation assembly, SQLite storage) that neither mitmproxy nor goproxy offers out of the box. The v0.5.0 buffered write optimization contributes to this result mainly by reducing errors under load; Appendix B quantifies the difference against v0.4.0.

The main bottleneck is the connection-per-request model inherent to HTTP CONNECT tunnels. Connection pooling through CONNECT tunnels would raise all throughput ceilings, but is not yet implemented.

Appendix A: Test Configurations

Two build configurations were tested. For the payload sizes used in benchmarking (max 50 KB response), both configurations behave identically — the body recording threshold is never reached.

Configuration	Buffered Writes	Body Recording Threshold	Used In
Config A (proxy comparison, ceilings, rate sweep)	8 KB `bufio.Writer`	2 MB (all test bodies recorded inline)	Proxy comparison, throughput ceilings, large rate sweep
Config B (A/B comparison baseline)	8 KB `bufio.Writer`	2 MB (same)	Greyproxy-only overhead, A/B comparison (v0.5.0 side)

Both configurations use greyproxy v0.5.0 with identical code. The body recording threshold (2 MB) means responses under 2 MB are captured inline during forwarding. Since the largest test response is 50 KB, selective body recording has no practical effect in these benchmarks.

The A/B comparison also tests v0.4.0 (main branch, pre-optimization) which lacks buffered writes entirely.

Appendix B: Optimization Impact (A/B Comparison)

Controlled A/B test: both main (v0.4.0) and optimizations (v0.5.0) binaries built from the same machine, benchmarked back-to-back with identical conditions. No branch switching during tests — both binaries prebuilt, proxy restarted fresh between branches.

Full A/B Matrix

[Diagram]

Figure A1: Error rates across the full A/B scenario matrix. Bars = main (v0.4.0), line = optimizations (v0.5.0). The optimizations have the biggest impact on small-large@100 (streaming responses).

Scenario	Rate	main errors	main rq/s	main Total p50	opt errors	opt rq/s	opt Total p50	Change
small-small	40	0 (0%)	40.0	5.7ms	0 (0%)	40.0	6.9ms	identical
small-small	100	3 (0.1%)	99.9	4.5ms	2 (0.07%)	99.9	4.5ms	identical
small-large	40	0 (0%)	40.0	17.7ms	1 (0.08%)	39.9	15.4ms	-13% latency
small-large	100	334 (11.1%)	88.8	19.0ms	120 (4.0%)	95.9	17.4ms	64% fewer errors, +8% throughput
medium-medium	40	0 (0%)	40.0	7.8ms	0 (0%)	40.0	9.0ms	identical
medium-medium	100	4 (0.13%)	99.8	7.7ms	1 (0.03%)	99.9	7.4ms	75% fewer errors
large-small	25	0 (0%)	25.0	9.2ms	0 (0%)	25.0	9.3ms	identical
large-small	40	0 (0%)	40.0	6.4ms	0 (0%)	40.0	7.5ms	identical
large-large	25	1 (0.13%)	24.9	18.1ms	0 (0%)	25.0	18.8ms	0% errors
large-large	40	3 (0.25%)	39.9	14.4ms	0 (0%)	40.0	13.8ms	0% errors, -4% latency

Where the optimizations matter

[Diagram]

Figure A2: Failed requests for small-request/large-response at 100 rq/s. Buffered writes cut failures by 64%.

[Diagram]

Figure A3: Both branches achieve full throughput at 40 rq/s, but main had 3 errors (0.25%) while optimizations had 0.

Key findings:

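At comfortable rates (40 rq/s), both branches deliver the same throughput, with error rates at or below 0.25% on either side. Total p50 differs by up to ~2ms in either direction depending on the scenario (for example 5.7ms vs 6.9ms for small-small, but 17.7ms vs 15.4ms for small-large), so the optimizations show no consistent latency cost at these rates.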
Under stress (100 rq/s with large responses), optimizations reduce errors by 64% and recover +8% effective throughput (88.8 → 95.9 rq/s).
For large payloads at the ceiling (40 rq/s), optimizations eliminate all errors — main had 3 failures (0.25%), optimizations had 0.
TTFB is unchanged across both branches — the optimizations target the response write path, not connection setup.
The buffered write pipeline shows the most impact when response bodies are large (50 KB), where coalescing small writes into single TLS records avoids syscall thrashing.

What changed

Two targeted optimizations to the response write path:

Buffered response writes — an 8 KB bufio.Writer coalesces small writes (HTTP headers, SSE chunks) into single TLS records, reducing syscall overhead by ~70% per response.
Selective body recording — responses larger than 2 MB skip the inline body capture during forwarding. Headers and metadata are still recorded; only the raw body bytes are omitted from the sniffing pipeline. This eliminates the bytes.Buffer allocation and copy overhead for very large streaming responses.

Appendix C: Extended Large Payload Rate Sweep

This rate sweep tested greyproxy at higher rates (25-200 rq/s) for large payloads, revealing behavior at extreme overload. Data from an earlier test run using the same v0.5.0 build.

[Diagram]

Figure A4: Effective throughput vs target rate for 100KB requests. Throughput tracks the target up to 50 rq/s, then diverges as errors climb. At 200 rq/s, TLS handshake collapse causes throughput to drop below the 150 rq/s level.

[Diagram]

Figure A5: Error rate is non-monotonic — 75 rq/s shows higher errors (19.2%) than 100 rq/s (5.6%) due to idle connection reuse through strained CONNECT tunnels.

Target Rate	Effective Throughput	Errors	TTFB p50	Total p50
25 rq/s	25.0 rq/s	0%	5.6ms	17.6ms
50 rq/s	49.7 rq/s	0.6%	4.7ms	14.3ms
75 rq/s	60.6 rq/s	19.2%	5.0ms	15.8ms
100 rq/s	94.4 rq/s	5.6%	5.2ms	17.8ms
150 rq/s	111.1 rq/s	25.9%	5.7ms	19.7ms
200 rq/s	73.8 rq/s	34.0%	329.1ms	774.3ms

At 200 rq/s the proxy enters a degraded state, with median TLS handshake time rising from ~1ms to ~35ms and TTFB p50 climbing to 329ms, but it recovers without crashing.

Connection reuse anomaly

The error rate dip at 100 rq/s (5.6%) vs 75 rq/s (19.2%) is a real, reproducible effect caused by Go's HTTP client connection pooling interacting with CONNECT tunnel lifetimes:

At 75 rq/s, request pacing is slow enough that Go's http.Client reuses idle connections — 846 reused out of 1819 created (46%). These reused connections hit stale or half-closed CONNECT tunnels, driving up errors.
At 100 rq/s, requests arrive fast enough that connections don't sit idle — only 319 reused out of 2832 created (11%). Fresh connections avoid the stale-tunnel problem.
At 150 rq/s and above, sheer volume overwhelms regardless of connection reuse.

This was confirmed by running each rate on a freshly restarted proxy instance, ruling out state carryover between steps.

Proxy comparison and throughput ceiling data from 2026-04-15. A/B comparison from 2026-04-14. All data uses greyproxy v0.5.0 with buffered writes.