--bg-canvas
oklch(98.5% 0.005 90)
#fbfaf7
15.59:1 vs --fg-default (AAA)
Spec reference: §2.3
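The AAA figure is reproducible from the WCAG 2.x relative-luminance formula. A minimal sketch follows; since this excerpt does not list the --fg-default value, #202020 stands in as a hypothetical foreground:

```python
# WCAG 2.x contrast check for the token above. --fg-default is not shown
# in this excerpt, so #202020 is a hypothetical stand-in foreground.
def linearize(c8: int) -> float:
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def luminance(hex_color: str) -> float:
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

def contrast(fg: str, bg: str) -> float:
    lo, hi = sorted((luminance(fg), luminance(bg)))
    return (hi + 0.05) / (lo + 0.05)

print(f"{contrast('#202020', '#fbfaf7'):.2f}:1")  # ~15.6:1 with the stand-in fg
```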
Section 1
Section 2
The memory bandwidth ceiling on Apple Silicon's unified memory architecture is real but rarely the binding constraint for inference workloads at typical context lengths.
Body italic carries emphasis through shape, while body bold is reserved for moments that need stronger editorial pressure.
Footnote-size paragraph: Tiempos Text Web, 15px/1.5, §2.4
Prose link sample: high-contrast text with an accent underline. Visited prose link sample: muted text with a muted underline.
Section 3
Section 4
Section 5
The tempting story is that unified memory bandwidth explains every local inference result on Apple
Silicon. It explains enough to be dangerous. A 128-bit LPDDR interface can make a quantized 8B model
look cleanly bandwidth-bound when the batch is small, the prompt is already resident, and the runtime
spends most of its time streaming weights. But that framing loses resolution as soon as context grows,
because KV-cache traffic, scheduler overhead, and prompt ingestion begin to compete with the neat
weights-per-token arithmetic. In practice, Q4_K_S quantization often moves the bottleneck from raw
bandwidth to a mix of cache locality and kernel-dispatch overhead. The result is not that bandwidth does not
matter; it is that bandwidth is the floor, not the whole building. A useful benchmark therefore reports
first-token latency, steady-state throughput, and tail behavior under repeated runs, with
artifact hashes attached to the exact configuration. See the
inference methodology for the signed-run contract that keeps
these measurements inspectable after the article has aged.
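The "floor, not the whole building" claim is easy to quantify. A minimal back-of-envelope sketch, where the weight size, bandwidth, and KV footprint are illustrative assumptions rather than figures from this article:

```python
# Back-of-envelope decode ceiling at batch size 1: each generated token
# streams roughly the full weight file, so throughput is capped at
# bandwidth / bytes_per_token. All numbers are illustrative assumptions.
WEIGHT_BYTES = 4.6e9          # assumed Q4_K_S weight file for an 8B model
BANDWIDTH = 500e9             # assumed usable memory bandwidth, bytes/s
KV_BYTES_PER_TOKEN = 131_072  # assumed fp16 KV footprint per cached token

print(f"weight-only ceiling: {BANDWIDTH / WEIGHT_BYTES:.0f} tok/s")

# Each decode step also reads the whole KV cache, so per-token traffic
# grows with context length and the neat ceiling loses resolution.
for context in (2_048, 32_768, 131_072):
    per_token = WEIGHT_BYTES + KV_BYTES_PER_TOKEN * context
    print(f"context {context:>7,}: {BANDWIDTH / per_token:.0f} tok/s")
```

Under these assumptions the weight-only ceiling sits near 109 tok/s, but KV-cache traffic erodes it steadily as context grows, which is the paragraph's point in numbers.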
Section 6
```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    # Throughput is decoded tokens divided by wall-clock seconds.
    return tokens / seconds

result = tokens_per_second(tokens=4096, seconds=46.9)
print(f"{result:.1f} tok/s")
```

```console
$ python bench/inference.py --model llama-3.1-8b-q4
Loading model: llama-3.1-8b-instruct-Q4_K_S.gguf
Prompt tokens: 2048
Throughput: 87.3 tok/s
```

```json
{
  "run_id": "2026-05-14-1830-llama",
  "model": "llama-3.1-8b-instruct-Q4_K_S",
  "median_tokens_per_second": 87.3,
  "artifact_sha256": "abc123f09d4a"
}
```

```python
def bandwidth_bound(bytes_per_token: float, tokens_per_second: float) -> float:
    # Bytes per second implied by per-token traffic at a given throughput.
    return bytes_per_token * tokens_per_second

peak_a = bandwidth_bound(5_200_000, 91.2)  # at 91.2 tok/s
peak_b = bandwidth_bound(5_200_000, 87.3)  # at the 87.3 tok/s median above
print(peak_b)
```
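The samples above report single runs. Since the methodology asks for median throughput and tail behavior over repeated runs, a minimal aggregation sketch follows; the run values are invented for illustration:

```python
import statistics

# Hypothetical repeated runs: (throughput tok/s, first-token latency ms).
runs = [(88.1, 139), (87.3, 141), (86.9, 150), (87.6, 143), (85.2, 171)]

throughputs = [tps for tps, _ in runs]
latencies = [ms for _, ms in runs]

median_tps = statistics.median(throughputs)
# quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
# The inclusive method interpolates within the observed range.
p95_ms = statistics.quantiles(latencies, n=20, method="inclusive")[18]

print(f"median {median_tps:.1f} tok/s, p95 latency {p95_ms:.0f} ms")
```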
Section 7
Why peak bandwidth matters for Llama inference, and where it stops explaining the result.
Reviews
Benchmarks
Methodology
The benchmark table below is tied to Signed Run · 2026-05-14-llama so the numbers remain
auditable after runtime versions move on.
A benchmark without its artifact is a claim; a benchmark with its artifact is an invitation to inspect the claim.
Silicon Logic methodology notes
| System | Model | Median tok/s | P95 latency |
|---|---|---|---|
| M4 Max | Llama 3.1 8B Q4_K_S | 87.3 | 141 ms |
| M3 Max | Llama 3.1 8B Q4_K_S | 74.8 | 158 ms |
| M2 Ultra | Mistral 7B Q4_K_M | 92.5 | 132 ms |
| RTX 4090 | Llama 3.1 8B Q4_K_S | 154.2 | 96 ms |
| Framework Laptop | Phi-3 Mini Q4 | 31.7 | 244 ms |
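To make "auditable" concrete, a reader can recompute the artifact digest and compare it against the signed run manifest. A minimal sketch, assuming the manifest shape shown in Section 6 and a local copy of the run artifact (both file paths here are hypothetical):

```python
import hashlib
import json
from pathlib import Path

# Recompute the artifact digest and compare it to the signed manifest.
# File paths are hypothetical; the manifest shape follows Section 6.
manifest = json.loads(Path("runs/2026-05-14-1830-llama.json").read_text())

digest = hashlib.sha256(Path("runs/artifact.bin").read_bytes()).hexdigest()

# The specimen manifest stores a truncated digest, so compare by prefix.
ok = digest.startswith(manifest["artifact_sha256"])
print("verified" if ok else "hash mismatch")
```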
Section 8
Silicon Logic