Greyproxy Performance Benchmarks

Date: 2026-04-15
Version: 0.5.0
Setup: Local loopback on Apple M4 Pro (14-core, 24 GB RAM)
Architecture: darwin/amd64 via Rosetta 2 (see Hardware Notes)

Build note: All proxy comparison and throughput ceiling data uses greyproxy v0.5.0 with buffered response writes and full body recording. The greyproxy-only overhead measurements use the same build. For the test payloads used (max 50 KB response), body recording adds negligible overhead. See Appendix A for details.


What This Measures

Greyproxy is an HTTPS-intercepting proxy that performs TLS MITM, request/response sniffing, and SQLite recording. These benchmarks measure the overhead greyproxy adds compared to a direct connection, across three payload classes representative of LLM API traffic.

Test architecture: load generator → greyproxy (HTTP CONNECT + TLS MITM) → fake Anthropic API server, all on localhost. Each scenario sends requests over 30 seconds at the specified rate.


Headline Results

Proxy overhead at a glance

[Diagram]

Figure 1: Median total latency across three payload sizes. Bars = Direct, line = Greyproxy. The gap is the proxy overhead.

[Diagram]

Figure 2: Maximum request rates greyproxy sustains with <2% error rate, by payload size.

Payload (request / response) | TTFB p50 | TTFB overhead | Total p50 | Total overhead | Throughput | Errors
Small (1 KB / 512 B) | 4.21ms | +3.7ms | 4.23ms | +2.7ms | 99.9 rq/s | 0.03%
Medium (10 KB / 10 KB) | 3.73ms | +3.5ms | 6.74ms | +1.7ms | 99.9 rq/s | 0.07%
Large (100 KB / 50 KB) | 5.43ms | +4.6ms | 18.22ms | +2.9ms | 25.0 rq/s | 0%

Greyproxy adds a roughly fixed ~4ms TTFB overhead per request (+3.5 to +4.6ms), the cost of HTTP CONNECT tunnel setup and the TLS MITM handshake. Total latency overhead is only +1.7–2.9ms and does not grow with body size, thanks to the buffered write pipeline. Throughput matches the target rate and errors are negligible. P95 follows the same pattern (overhead within +0.5ms of p50).

How does it compare?

On large payloads, greyproxy sustains 40 rq/s with zero errors: 60% more than mitmproxy (25 rq/s) and roughly 4× goproxy, which reached only 10.1 effective rq/s against a 25 rq/s target. It does so while doing the most work (sniffing, recording, conversation assembly). See Comparison with Alternative Proxies for full results.


Comparison with Alternative Proxies

Greyproxy was compared head-to-head with two other TLS MITM proxies, all using the same CA certificate and tested under identical conditions (raw data in bench/results/).

Proxy | Language | What it does | Version
greyproxy | Go | TLS MITM + request/response sniffing + SQLite recording + conversation assembly | 0.5.0
mitmproxy | Python | TLS MITM + scripting + request/response inspection | 12.2.2
goproxy | Go | TLS MITM forwarding only (no recording or inspection) | 1.2.3

Throughput ceilings by proxy

To find each proxy's actual throughput ceiling, we tested at escalating rates until errors or latency degradation appeared. Ceiling is defined as the highest rate sustained with <2% errors and stable latency.
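The ceiling rule can be written as a small function over rate-stepping results. In this Go sketch, the step type and ceiling function are hypothetical, and only the error criterion is modeled; the "stable latency" condition would need an additional check on TTFB or total latency. Scanning stops at the first failing step, which is conservative when error rates are non-monotonic, as in Appendix C.

```go
package main

// step is one rate-stepping measurement (hypothetical type).
type step struct {
	TargetRate int // requests per second
	OK, Errors int
}

// errorRate returns the failed fraction of requests in the step.
func (s step) errorRate() float64 {
	total := s.OK + s.Errors
	if total == 0 {
		return 1 // nothing completed: treat as total failure
	}
	return float64(s.Errors) / float64(total)
}

// ceiling returns the highest target rate whose error rate stays below
// maxErr, scanning steps in ascending rate order and stopping at the first
// step that fails. It returns 0 if even the first step fails.
func ceiling(steps []step, maxErr float64) int {
	best := 0
	for _, s := range steps {
		if s.errorRate() >= maxErr {
			break
		}
		best = s.TargetRate
	}
	return best
}
```

Fed greyproxy's large-payload sweep (20–50 rq/s, from the Throughput Scaling section), a 2% threshold yields 40 rq/s, because 31 errors out of 1499 requests at 50 rq/s is 2.07%; a 3% threshold yields 50 rq/s.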

[Diagram]

Figure 3: Maximum sustainable request rate per proxy per payload size. Bars = Greyproxy, lines = mitmproxy and goproxy. Greyproxy leads on large payloads; goproxy leads on small payloads.

Small payloads (1 KB request, 512 B response)

[Diagram]

Greyproxy hits its ceiling at 200 rq/s (2.3% errors); beyond that, throughput caps around 200–280 effective rq/s. mitmproxy at 400 rq/s reports 0 errors but queues internally (12.7s TTFB, only 258.8 effective rq/s).

Ceilings: goproxy 400+ rq/s > mitmproxy 300 rq/s > greyproxy 200 rq/s

The ordering follows from architecture: goproxy does the least work (no recording, no inspection), while greyproxy pays for a new TLS handshake on every request through its CONNECT tunnels. mitmproxy benefits from aggressive connection reuse (5 connections for 9000 requests at 300 rq/s).
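The effect of connection reuse can be observed with net/http/httptrace, whose GotConn hook reports whether a request was served on a pooled connection. In the Go sketch below, countNewConns and demoReuse are hypothetical helpers; a Transport with DisableKeepAlives stands in for a connection-per-request model, and the test server speaks plain HTTP so the example runs offline. Over https, each fresh connection would also cost a full TLS handshake.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// countNewConns sends n sequential GETs and counts how many needed a freshly
// dialed connection rather than one from the Transport's idle pool.
func countNewConns(client *http.Client, target string, n int) (int, error) {
	fresh := 0
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			if !info.Reused {
				fresh++
			}
		},
	}
	for i := 0; i < n; i++ {
		req, err := http.NewRequest(http.MethodGet, target, nil)
		if err != nil {
			return fresh, err
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err != nil {
			return fresh, err
		}
		// Draining and closing the body lets the Transport pool the connection.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return fresh, nil
}

// demoReuse compares a pooled client with one that disables keep-alives,
// which approximates a connection-per-request model.
func demoReuse(n int) (pooled, perRequest int, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	defer srv.Close()
	pooled, err = countNewConns(&http.Client{Transport: &http.Transport{}}, srv.URL, n)
	if err != nil {
		return pooled, 0, err
	}
	perRequest, err = countNewConns(&http.Client{Transport: &http.Transport{DisableKeepAlives: true}}, srv.URL, n)
	return pooled, perRequest, err
}
```

With keep-alives, sequential requests typically share a single connection; with them disabled, every request dials anew. This is the same contrast as mitmproxy's 5 connections for 9000 requests versus greyproxy's one handshake per request.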

Medium payloads (10 KB request, 10 KB response)

[Diagram]

All three proxies handle 100 rq/s; beyond that they degrade differently. mitmproxy at 150 rq/s reports 0 errors but with 2.2s TTFB (silent queuing). goproxy at 150 rq/s took 133s to finish a 30-second scenario (4.4× longer than planned), for only 33.7 effective rq/s.

Ceilings: greyproxy 100 rq/s ≈ mitmproxy 100 rq/s ≈ goproxy 100 rq/s

Beyond 100 rq/s: greyproxy fails fast (connection errors), mitmproxy queues silently (latency explosion), goproxy stalls (throughput collapse).

Large payloads (100 KB request, 50 KB response)

Target rate | Greyproxy | mitmproxy | goproxy
25 rq/s | 25.0 rq/s, 0% errors | 25.0 rq/s, 0% errors (26ms TTFB) | 10.1 rq/s, 0% errors (1.2s TTFB)*
40 rq/s | 40.0 rq/s, 0% errors | - | -
50 rq/s | 48.9 rq/s, 2.1% errors | - | -
75 rq/s | collapsed | 11.2 rq/s, 66.2% errors | all requests failed
100 rq/s | 86.6 rq/s, 13.3% errors | 2.4 rq/s, 88.1% errors | all requests failed

* goproxy at 25 rq/s target only achieved 10.1 effective rq/s over 74 seconds — it could not sustain the target rate.

Ceilings: greyproxy 40 rq/s >> mitmproxy 25 rq/s >> goproxy <25 rq/s

Large payloads expose fundamental architectural differences. Greyproxy's Go streaming pipeline with buffered writes sustains 40 rq/s with zero errors: 60% more than mitmproxy and roughly 4× goproxy's 10.1 effective rq/s. mitmproxy's Python event loop and request buffering create a 26ms TTFB floor even at comfortable rates. goproxy cannot sustain even 25 rq/s for large payloads (only 10.1 effective rq/s with 1.2s TTFB), which suggests a problem in the elazarl/goproxy library's body forwarding path.

Ceiling summary

Payload (request + response) | Greyproxy | mitmproxy | goproxy | Winner
Small (1 KB + 512 B) | 200 rq/s | 300 rq/s | 400+ rq/s | goproxy
Medium (10 KB + 10 KB) | 100 rq/s | 100 rq/s | 100 rq/s | tie
Large (100 KB + 50 KB) | 40 rq/s | 25 rq/s | <25 rq/s | greyproxy

For the LLM API use case — where payloads are large and latency matters — greyproxy has a clear throughput advantage while also doing the most work (sniffing, recording, conversation assembly). The proxies that beat greyproxy on small payloads (where the overhead is dominated by TLS handshake, not body processing) cannot keep up when payloads grow.

Overhead at comfortable rates

The ceiling tests above push each proxy to its limit. The following charts and table show overhead at comfortable rates where all proxies perform well, measuring per-request latency differences.

[Diagram]

Figure 4: TTFB p50 by proxy. Bars = Direct, lines = Greyproxy / mitmproxy / goproxy. goproxy is fastest to first byte across all sizes; mitmproxy's TTFB explodes on large payloads (25ms) due to request buffering.

[Diagram]

Figure 5: Total latency p50 by proxy. Bars = Direct, lines = Greyproxy / mitmproxy / goproxy. On small/medium payloads all proxies are close; on large payloads greyproxy pulls ahead.

[Diagram]

Figure 6: Large payload proxy personality. Greyproxy is closest to Direct (ideal). mitmproxy is bottlenecked by TTFB. goproxy is fast to first byte but slow to finish.

Payload (rate) | Metric | Direct | Greyproxy | mitmproxy | goproxy*
Small (100 rq/s) | TTFB p50 | 0.53ms | 4.90ms | 4.23ms | 3.00ms
Small (100 rq/s) | Total p50 | 1.19ms | 4.93ms | 4.39ms | 4.61ms
Medium (100 rq/s) | TTFB p50 | 0.26ms | 3.82ms | 6.06ms | 1.39ms
Medium (100 rq/s) | Total p50 | 5.03ms | 6.89ms | 6.20ms | 9.96ms
Large (25 rq/s) | TTFB p50 | 1.00ms | 5.77ms | 25.25ms | 2.18ms
Large (25 rq/s) | Total p50 | 14.75ms | 18.65ms | 25.64ms | 36.35ms

* goproxy large-payload data from stable run only — see Reliability below.

On small payloads, the three proxies' total latencies fall within ~0.5ms of each other (4.39–4.93ms); overhead is dominated by TLS MITM, not body processing. On medium payloads, goproxy has the fastest TTFB (1.4ms) but the slowest total latency (10ms). On large payloads, greyproxy adds only +3.9ms total overhead despite full sniffing and recording; mitmproxy's TTFB jumps to 25ms (about 4.4× greyproxy's 5.8ms); goproxy is fast to first byte (2.2ms) but has the slowest total latency (36.4ms when stable).

Reliability

[Diagram]

Figure 7: Error rates across two independent runs. Bars = Greyproxy, lines = mitmproxy (flat at 0%) and goproxy. goproxy's large-payload instability is the outlier — 0% in one run, 79.6% in the next.

Proxy | Small errors | Medium errors | Large errors | Notes
mitmproxy | 0 | 0 | 0 | Zero errors across all scenarios and runs
Greyproxy | 1 (0.03%) | 3–16 (0.1–0.5%) | 0 | TLS handshake contention from connection-per-request model
goproxy | 0 | 0 | 0 or 596 (79.6%) | Catastrophic failure on large payloads in 1 of 2 runs

mitmproxy was the most reliable at tested rates — zero errors across all scenarios. Greyproxy had occasional errors from its connection-per-request model (each request through an HTTP CONNECT tunnel creates a new TLS handshake). goproxy's large-payload instability — 0% errors in one run, 79.6% in the next under identical conditions — suggests a resource exhaustion issue in the elazarl/goproxy library.

Key findings

For a detailed comparison between v0.4.0 (pre-optimization) and v0.5.0, see Appendix B: Optimization Impact.


Greyproxy Throughput Scaling

Detailed rate-stepping data for greyproxy alone, showing how throughput and errors evolve as load increases.

Large payload rate sweep (100 KB request, 50 KB response)

[Diagram]

Figure 8: Effective throughput tracks target perfectly through 40 rq/s. At 50 rq/s, errors appear and throughput drops to 48.9 rq/s.

Target rate | OK | Errors | Throughput | TTFB p50 | Total p50 | Direct total p50 | Overhead
20 rq/s | 600 | 0 (0%) | 20.0 rq/s | 6.1ms | 18.6ms | 17.1ms | +1.5ms
30 rq/s | 900 | 0 (0%) | 30.0 rq/s | 5.9ms | 18.8ms | 14.7ms | +4.1ms
40 rq/s | 1199 | 0 (0%) | 40.0 rq/s | 5.4ms | 18.0ms | 16.2ms | +1.8ms
50 rq/s | 1468 | 31 (2.1%) | 48.9 rq/s | 5.1ms | 16.3ms | 13.5ms | +2.8ms

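Clean ceiling (0% errors): 40 rq/s. With a looser <3% error threshold the sustainable ceiling is 50 rq/s; under the <2% definition used elsewhere in this report, 50 rq/s (2.1% errors) narrowly misses. TTFB remains stable at ~5–6ms across all rates, which indicates that the bottleneck is in the response body forwarding path, not connection setup.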

Small & medium payload ceilings

[Diagram]

Small ceiling: 200 rq/s. Medium ceiling: 100 rq/s. Beyond these rates, TLS handshake contention in the connection-per-request model causes errors and throughput collapse.


Hardware Notes

These benchmarks have limitations that affect absolute numbers:

Factor | Impact | Implication
Rosetta 2 | Go darwin/amd64 on ARM M4 Pro adds ~20-30% CPU overhead for TLS operations | Native arm64 or x86_64 hardware would show lower per-request CPU cost
Shared machine | Load generator, proxy, and server share 14 cores / 24 GB | Contention inflates tail latencies; dedicated hardware would perform better
Loopback network | All traffic on localhost | Real deployments add RTT to every TLS handshake
NVMe SSD | ~7 GB/s sequential writes | Slower storage would surface SQLite I/O bottlenecks earlier
Connection model | One TLS handshake per request through CONNECT tunnel | Connection pooling (not yet implemented) would significantly raise all ceilings

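The overhead numbers (~4ms TTFB, ~2-3ms total) are representative of the proxy's own cost on dedicated hardware, though real deployments also add network RTT to every TLS handshake. Throughput ceilings may be higher on native (arm64 or x86_64) builds, and higher still with connection pooling.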


Methodology

Full benchmark tooling, raw results, and scripts: bench/ directory.


Final Thoughts

Greyproxy adds ~4-5ms of overhead per request. For LLM API traffic — where upstream response times are measured in seconds — this is invisible to the end user. A Claude API call that takes 3 seconds will take 3.005 seconds through greyproxy.

The proxy comfortably handles 100 rq/s for typical conversation payloads (10 KB) and 200 rq/s for small requests. For context, a single Claude conversation with 8K context (~10 KB payload) at normal human typing speeds generates <1 rq/s, so 100 rq/s covers at least 100 concurrent conversations at this payload size. These rates far exceed what individual developers or small teams generate.

Compared to alternatives, greyproxy offers the best latency profile on large payloads while providing features (request/response recording, conversation assembly, SQLite storage) that neither mitmproxy nor goproxy offers out of the box. The v0.5.0 buffered write optimization contributes to this result: in the A/B comparison it cut small-request/large-response failures at 100 rq/s by 64% and eliminated the residual large-payload errors at 25 and 40 rq/s (see Appendix B).

The main bottleneck is the connection-per-request model inherent to HTTP CONNECT tunnels. Connection pooling through CONNECT tunnels would raise all throughput ceilings, but is not yet implemented.


Appendix A: Test Configurations

Two build configurations were tested. For the payload sizes used in benchmarking (max 50 KB response), both configurations behave identically — the body recording threshold is never reached.

Configuration | Buffered writes | Body recording threshold | Used in
Config A (proxy comparison, ceilings, rate sweep) | 8 KB bufio.Writer | 2 MB (all test bodies recorded inline) | Proxy comparison, throughput ceilings, large rate sweep
Config B (A/B comparison baseline) | 8 KB bufio.Writer | 2 MB (same) | Greyproxy-only overhead, A/B comparison (v0.5.0 side)

Both configurations use greyproxy v0.5.0 with identical code. The body recording threshold (2 MB) means responses under 2 MB are captured inline during forwarding. Since the largest test response is 50 KB, selective body recording has no practical effect in these benchmarks.

The A/B comparison also tests v0.4.0 (main branch, pre-optimization) which lacks buffered writes entirely.


Appendix B: Optimization Impact (A/B Comparison)

Controlled A/B test: the main (v0.4.0) and optimizations (v0.5.0) binaries were built on the same machine and benchmarked back-to-back under identical conditions. There was no branch switching during tests: both binaries were prebuilt, and the proxy was restarted fresh between branches.

Full A/B Matrix

[Diagram]

Figure A1: Error rates across the full A/B scenario matrix. Bars = main (v0.4.0), line = optimizations (v0.5.0). The optimizations have the biggest impact on small-large@100 (streaming responses).

Scenario | Rate (rq/s) | main errors | main rq/s | main total p50 | opt errors | opt rq/s | opt total p50 | Change
small-small | 40 | 0 (0%) | 40.0 | 5.7ms | 0 (0%) | 40.0 | 6.9ms | same errors/throughput, +21% latency
small-small | 100 | 3 (0.1%) | 99.9 | 4.5ms | 2 (0.07%) | 99.9 | 4.5ms | identical
small-large | 40 | 0 (0%) | 40.0 | 17.7ms | 1 (0.08%) | 39.9 | 15.4ms | -13% latency
small-large | 100 | 334 (11.1%) | 88.8 | 19.0ms | 120 (4.0%) | 95.9 | 17.4ms | 64% fewer errors, +8% throughput
medium-medium | 40 | 0 (0%) | 40.0 | 7.8ms | 0 (0%) | 40.0 | 9.0ms | same errors/throughput, +15% latency
medium-medium | 100 | 4 (0.13%) | 99.8 | 7.7ms | 1 (0.03%) | 99.9 | 7.4ms | 75% fewer errors
large-small | 25 | 0 (0%) | 25.0 | 9.2ms | 0 (0%) | 25.0 | 9.3ms | identical
large-small | 40 | 0 (0%) | 40.0 | 6.4ms | 0 (0%) | 40.0 | 7.5ms | same errors/throughput, +17% latency
large-large | 25 | 1 (0.13%) | 24.9 | 18.1ms | 0 (0%) | 25.0 | 18.8ms | 0% errors, +4% latency
large-large | 40 | 3 (0.25%) | 39.9 | 14.4ms | 0 (0%) | 40.0 | 13.8ms | 0% errors, -4% latency

Where the optimizations matter

[Diagram]

Figure A2: Failed requests for small-request/large-response at 100 rq/s. Buffered writes cut failures by 64%.

[Diagram]

Figure A3: Both branches achieve full throughput at 40 rq/s, but main had 3 errors (0.25%) while optimizations had 0.

Key findings:

What changed

Two targeted optimizations to the response write path:

  1. Buffered response writes — an 8 KB bufio.Writer coalesces small writes (HTTP headers, SSE chunks) into single TLS records, reducing syscall overhead by ~70% per response.

  2. Selective body recording — responses larger than 2 MB skip the inline body capture during forwarding. Headers and metadata are still recorded; only the raw body bytes are omitted from the sniffing pipeline. This eliminates the bytes.Buffer allocation and copy overhead for very large streaming responses.


Appendix C: Extended Large Payload Rate Sweep

This rate sweep tested greyproxy at higher rates (25-200 rq/s) for large payloads, revealing behavior at extreme overload. Data from an earlier test run using the same v0.5.0 build.

[Diagram]

Figure A4: Effective throughput vs target rate for 100KB requests. Throughput tracks the target up to 50 rq/s, then diverges as errors climb. At 200 rq/s, TLS handshake collapse causes throughput to drop below the 150 rq/s level.

[Diagram]

Figure A5: Error rate is non-monotonic — 75 rq/s shows higher errors (19.2%) than 100 rq/s (5.6%) due to idle connection reuse through strained CONNECT tunnels.

Target rate | Effective throughput | Errors | TTFB p50 | Total p50
25 rq/s | 25.0 rq/s | 0% | 5.6ms | 17.6ms
50 rq/s | 49.7 rq/s | 0.6% | 4.7ms | 14.3ms
75 rq/s | 60.6 rq/s | 19.2% | 5.0ms | 15.8ms
100 rq/s | 94.4 rq/s | 5.6% | 5.2ms | 17.8ms
150 rq/s | 111.1 rq/s | 25.9% | 5.7ms | 19.7ms
200 rq/s | 73.8 rq/s | 34.0% | 329.1ms | 774.3ms

At 200 rq/s the proxy enters a degraded state: median TLS handshake time rises from ~1ms to ~35ms and TTFB p50 climbs to 329ms, but the proxy recovers without crashing.

Connection reuse anomaly

The error rate dip at 100 rq/s (5.6%) vs 75 rq/s (19.2%) is a real, reproducible effect caused by Go's HTTP client connection pooling interacting with CONNECT tunnel lifetimes:

This was confirmed by running each rate on a freshly restarted proxy instance, ruling out state carryover between steps.


Proxy comparison and throughput ceiling data from 2026-04-15. A/B comparison from 2026-04-14. All greyproxy data uses v0.5.0 with buffered writes, except the main (v0.4.0) side of the A/B comparison.