Section 2

Typography Scale

Apple Silicon Memory Bandwidth Analysis

H1: Tiempos Text Web Semibold, 38px/1.15, §3.4

Bandwidth Ceilings

H2: Inter Semibold, 24px/1.3, §3.4

Quantization and Working Set Size

H3: Inter Semibold, 19px/1.4, §3.4

Measurement Notes

H4: Inter Medium, 16px/1.45, §3.4

The memory bandwidth ceiling on Apple Silicon's unified memory architecture is real but rarely the binding constraint for inference workloads at typical context lengths.

Body italic carries emphasis through shape, while body bold is reserved for moments that need stronger editorial pressure.

Footnote-size paragraph: Tiempos Text Web, 15px/1.5, §2.4

Meta text: Inter Medium, 13px/1.4, §2.4
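The size/line-height pairs in this scale (38px/1.15 and so on) map directly onto the CSS font shorthand. The sketch below turns the scale into CSS rules. It assumes Semibold is 600, Medium is 500, and a regular 400 weight wherever no weight is named. It also assumes generic serif and sans-serif fallbacks. The .footnote and .meta selectors are hypothetical, because the specimen does not name them.

```python
from typing import Optional

# Assumed mapping from the specimen's weight names to CSS numeric weights.
WEIGHTS = {"Medium": 500, "Semibold": 600}

# Assumed generic fallbacks; the specimen names only the primary families.
FALLBACKS = {"Tiempos Text Web": "serif", "Inter": "sans-serif"}

# (selector, family, weight name, size in px, unitless line-height) from the scale.
# ".footnote" and ".meta" are hypothetical selectors.
SCALE = [
    ("h1", "Tiempos Text Web", "Semibold", 38, 1.15),
    ("h2", "Inter", "Semibold", 24, 1.3),
    ("h3", "Inter", "Semibold", 19, 1.4),
    ("h4", "Inter", "Medium", 16, 1.45),
    (".footnote", "Tiempos Text Web", None, 15, 1.5),
    (".meta", "Inter", "Medium", 13, 1.4),
]


def css_rule(selector: str, family: str, weight: Optional[str],
             size_px: int, line_height: float) -> str:
    """Emit one rule using the font shorthand: weight size/line-height family."""
    numeric = WEIGHTS.get(weight, 400)  # no weight named -> assume regular (400)
    return (f'{selector} {{ font: {numeric} {size_px}px/{line_height} '
            f'"{family}", {FALLBACKS[family]}; }}')


for row in SCALE:
    print(css_rule(*row))
```

A unitless line-height is used here because descendants then inherit the ratio rather than a fixed pixel value.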

Link Patterns

Base link: high-contrast text with matching underline

Base visited sample: muted text with muted underline

Accent as text marker: Reviews

Section 3

Color Tokens - Light Mode

--bg-canvas
oklch(98.5% 0.005 90)
#fbfaf7
15.59:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-elevated
oklch(100% 0 0)
#ffffff
16.27:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-subtle
oklch(96.5% 0.006 90)
#f4f2ed
14.55:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-code-inline
oklch(95% 0.008 90)
#efece5
13.79:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-code-block
oklch(97% 0.006 90)
#f7f5f0
14.94:1 vs --fg-default (AAA)
Spec reference: §2.3
--fg-default
oklch(20% 0.005 280)
#1f2024
15.59:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--fg-muted
oklch(45% 0.01 280)
#5e6068
6.01:1 vs --bg-canvas (AA)
Spec reference: §2.3
--fg-subtle
oklch(60% 0.01 280)
#8a8c93
3.22:1 vs --bg-canvas (AA large)
Spec reference: §2.3
--border-default
oklch(88% 0.008 90)
#dcd9d1
11.54:1 vs --fg-default (AAA)
Spec reference: §2.3
--border-muted
oklch(93% 0.006 90)
#ebe9e2
13.40:1 vs --fg-default (AAA)
Spec reference: §2.3
--accent
oklch(58% 0.14 55)
#c87238
3.40:1 vs --bg-canvas (AA large)
Spec reference: §2.3
--accent-hover
oklch(50% 0.16 55)
#a85a25
4.84:1 vs --bg-canvas (AA)
Spec reference: §2.3
--accent-visited
oklch(45% 0.10 35)
#88553a
5.90:1 vs --bg-canvas (AA)
Spec reference: §2.3
--link-fg
var(--fg-default)
#1f2024
15.59:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--link-fg-visited
var(--fg-muted)
#5e6068
6.01:1 vs --bg-canvas (AA)
Spec reference: §2.3
--signed-fg
oklch(45% 0.08 195)
#2c6470
5.32:1 vs --signed-bg (AA)
Spec reference: §2.3
--signed-bg
oklch(94% 0.02 195)
#dde8ec
5.32:1 vs --signed-fg (AA)
Spec reference: §2.3
--signed-border
oklch(75% 0.06 195)
#8db5be
7.35:1 vs --fg-default (AAA)
Spec reference: §2.3
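The ratios in these token entries follow the WCAG 2.x contrast formula. Each sRGB channel is converted to linear light, and the channels are weighted into a relative luminance. The ratio is then (lighter + 0.05) divided by (darker + 0.05). The sketch below assumes 8-bit hex inputs and uses the level labels from the list above: AAA at 7:1, AA at 4.5:1, and AA large at 3:1.

```python
def srgb_to_linear(channel: int) -> float:
    """Inverse sRGB transfer function for one 8-bit channel.

    WCAG 2.0 cites a 0.03928 threshold; for 8-bit inputs it selects the
    same branch as the sRGB standard's 0.04045.
    """
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Weighted sum of linear R, G, B for a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * srgb_to_linear(r) + 0.7152 * srgb_to_linear(g) + 0.0722 * srgb_to_linear(b)


def contrast_ratio(a: str, b: str) -> float:
    """WCAG ratio (lighter + 0.05) / (darker + 0.05), independent of argument order."""
    la, lb = relative_luminance(a), relative_luminance(b)
    return (max(la, lb) + 0.05) / (min(la, lb) + 0.05)


def wcag_label(ratio: float) -> str:
    """Labels as used in the token list: AAA >= 7, AA >= 4.5, AA large >= 3."""
    if ratio >= 7:
        return "AAA"
    if ratio >= 4.5:
        return "AA"
    if ratio >= 3:
        return "AA large"
    return "fail"


ratio = contrast_ratio("#1f2024", "#fbfaf7")  # --fg-default on --bg-canvas
print(f"{ratio:.2f}:1 ({wcag_label(ratio)})")  # 15.59:1 (AAA)
```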

Section 4

Color Tokens - Dark Mode

--bg-canvas
oklch(15% 0.008 280)
#111317
14.77:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-elevated
oklch(19% 0.008 280)
#1a1d22
13.42:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-subtle
oklch(22% 0.008 280)
#1f2228
12.66:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-code-inline
oklch(25% 0.008 280)
#25282f
11.72:1 vs --fg-default (AAA)
Spec reference: §2.3
--bg-code-block
oklch(18% 0.008 280)
#181a1f
13.83:1 vs --fg-default (AAA)
Spec reference: §2.3
--fg-default
oklch(92% 0.005 90)
#e8e5dd
14.77:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--fg-muted
oklch(70% 0.008 90)
#a8a59c
7.55:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--fg-subtle
oklch(55% 0.008 90)
#7d7a71
4.33:1 vs --bg-canvas (AA large)
Spec reference: §2.3
--border-default
oklch(30% 0.01 280)
#2c2f36
10.65:1 vs --fg-default (AAA)
Spec reference: §2.3
--border-muted
oklch(25% 0.008 280)
#24272d
11.89:1 vs --fg-default (AAA)
Spec reference: §2.3
--accent
oklch(75% 0.13 55)
#e8a877
9.12:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--accent-hover
oklch(82% 0.11 55)
#f0bd95
11.00:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--accent-visited
oklch(65% 0.08 35)
#b6906f
6.38:1 vs --bg-canvas (AA)
Spec reference: §2.3
--link-fg
var(--fg-default)
#e8e5dd
14.77:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--link-fg-visited
var(--fg-muted)
#a8a59c
7.55:1 vs --bg-canvas (AAA)
Spec reference: §2.3
--signed-fg
oklch(78% 0.09 195)
#82c0cc
6.58:1 vs --signed-bg (AA)
Spec reference: §2.3
--signed-bg
oklch(25% 0.04 195)
#1f323a
6.58:1 vs --signed-fg (AA)
Spec reference: §2.3
--signed-border
oklch(45% 0.07 195)
#4a7681
3.97:1 vs --fg-default (AA large)
Spec reference: §2.3
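--link-fg and --link-fg-visited are aliases rather than literal colors. Each hex value listed for them is what its var() reference resolves to, so links follow the text tokens in both modes. The sketch below shows that resolution over a plain dict of dark-mode tokens. The dict layout and the resolve helper are hypothetical and are not this site's build pipeline.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical dict holding a subset of the dark-mode tokens listed above.
DARK_TOKENS: Dict[str, str] = {
    "--fg-default": "#e8e5dd",
    "--fg-muted": "#a8a59c",
    "--link-fg": "var(--fg-default)",
    "--link-fg-visited": "var(--fg-muted)",
}

# Matches a bare alias such as "var(--fg-default)" (no fallback argument).
VAR_ALIAS = re.compile(r"var\((--[A-Za-z0-9_-]+)\)")


def resolve(tokens: Dict[str, str], name: str, seen: Optional[Set[str]] = None) -> str:
    """Follow var() aliases until a literal value is reached."""
    seen = set() if seen is None else seen
    if name in seen:
        raise ValueError(f"alias cycle through {name}")
    seen.add(name)
    value = tokens[name].strip()
    match = VAR_ALIAS.fullmatch(value)
    return resolve(tokens, match.group(1), seen) if match else value


for token in ("--link-fg", "--link-fg-visited"):
    print(token, "->", resolve(DARK_TOKENS, token))
```

Raising on a cycle is a design choice for a build-time check. In a browser, a var() cycle does not raise an error. Instead, it makes the affected properties invalid at computed-value time.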

Section 5

Prose Sample

The tempting story is that unified memory bandwidth explains every local inference result on Apple Silicon. It explains enough to be dangerous. A 128-bit LPDDR interface can make a quantized 8B model look cleanly bandwidth-bound when the batch is small, the prompt is already resident, and the runtime spends most of its time streaming weights. But that framing loses resolution as soon as context grows, because KV-cache traffic, scheduler overhead, and prompt ingestion begin to compete with the neat weights-per-token arithmetic. In practice, Q4_K_S often moves the bottleneck from raw bandwidth to a mix of cache locality and kernel dispatch. The result is not that bandwidth does not matter; it is that bandwidth is the floor, not the whole building. A useful benchmark therefore reports first-token latency, steady-state throughput, and tail behavior under repeated runs, with artifact hashes attached to the exact configuration. See the inference methodology for the signed-run contract that keeps these measurements inspectable after the article has aged.
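The weights-per-token arithmetic in this paragraph can be made concrete. At batch size 1, each decoded token reads roughly every weight once plus the whole KV cache. Peak bandwidth divided by those bytes therefore gives an upper bound on tok/s. The sketch below makes several assumptions: LPDDR5X-7500 on the 128-bit bus, about 4.7 GB of Q4_K_S weights, and Llama 3.1 8B's shape (32 layers, 8 KV heads, head dimension 128) with an fp16 cache. None of these figures is a measurement from this site.

```python
# Illustrative assumptions, not measurements:
BUS_WIDTH_BITS = 128     # the 128-bit LPDDR interface from the paragraph
TRANSFERS_PER_S = 7.5e9  # assumed LPDDR5X-7500
WEIGHT_BYTES = 4.7e9     # assumed size of an 8B Q4_K_S weight file


def peak_bandwidth(bus_width_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak in bytes/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_s


def kv_cache_bytes(context_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Bytes of cached K and V read for one decoded token at this context length.

    Defaults assume Llama 3.1 8B's shape and an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len


def decode_ceiling(context_len: int) -> float:
    """Upper bound on tok/s at batch size 1: each token streams all weights plus the KV cache."""
    bytes_per_token = WEIGHT_BYTES + kv_cache_bytes(context_len)
    return peak_bandwidth(BUS_WIDTH_BITS, TRANSFERS_PER_S) / bytes_per_token


for ctx in (0, 2048, 8192, 32768):
    print(f"context {ctx:>6}: ceiling {decode_ceiling(ctx):5.1f} tok/s")
```

This bound ignores compute, activations, and the gap between peak and achievable bandwidth, so real runs land below it. Under these assumptions, the KV-cache read at 32K context (about 4.3 GB) approaches the size of the weights themselves. That is the point where the weights-only estimate stops describing the workload.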

Section 6

Code Blocks

def tokens_per_second(tokens: int, seconds: float) -> float:
    # Average throughput over a timed generation run.
    return tokens / seconds

result = tokens_per_second(tokens=4096, seconds=46.9)
print(f"{result:.1f} tok/s")

$ python bench/inference.py --model llama-3.1-8b-q4
Loading model: llama-3.1-8b-instruct-Q4_K_S.gguf
Prompt tokens: 2048
Throughput: 87.3 tok/s

{
  "run_id": "2026-05-14-1830-llama",
  "model": "llama-3.1-8b-instruct-Q4_K_S",
  "median_tokens_per_second": 87.3,
  "artifact_sha256": "abc123f09d4a"
}

def bandwidth_bound(bytes_per_token, tokens_per_second):
    # Memory traffic (bytes/s) implied by reading bytes_per_token for every generated token.
    return bytes_per_token * tokens_per_second

peak = bandwidth_bound(5_200_000, 87.3)
print(peak)

Section 7

Components

Reviews

Benchmarks

Methodology

The benchmark table below is tied to Signed Run · 2026-05-14-llama so the numbers remain auditable after runtime versions move on.

A benchmark without its artifact is a claim; a benchmark with its artifact is an invitation to inspect the claim.
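The signed-run idea depends on an artifact hash that anyone can recompute. The sketch below covers only the hashing half, using Python's hashlib and the field names from the JSON record in the Code Blocks section. The sha256_file and run_record helpers are hypothetical, and signing the record is out of scope. A full SHA-256 hex digest is 64 characters long, so the shorter value in that specimen is presumably abbreviated for display.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB model weights never sit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_record(run_id: str, model: str, median_tps: float, artifact: Path) -> str:
    """Hypothetical record reusing the field names from the JSON specimen."""
    return json.dumps(
        {
            "run_id": run_id,
            "model": model,
            "median_tokens_per_second": median_tps,
            "artifact_sha256": sha256_file(artifact),
        },
        indent=2,
    )


# Usage with a throwaway file standing in for a model artifact (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example.gguf"
    artifact.write_bytes(b"abc")
    print(run_record("example-run", "example-model", 87.3, artifact))
```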

Silicon Logic methodology notes
System            Model                Median tok/s   P95 latency
M4 Max            Llama 3.1 8B Q4_K_S  87.3           141 ms
M3 Max            Llama 3.1 8B Q4_K_S  74.8           158 ms
M2 Ultra          Mistral 7B Q4_K_M    92.5           132 ms
RTX 4090          Llama 3.1 8B Q4_K_S  154.2          96 ms
Framework Laptop  Phi-3 Mini Q4        31.7           244 ms
Representative local inference measurements.
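The table reports a median and a P95 rather than a single run. The sketch below computes both summaries over repeated runs using the standard statistics module. The input lists are synthetic, not measurements. The choice of method="inclusive" is an assumption about how a harness might compute P95: it treats the samples as the full population and interpolates between them.

```python
import statistics
from typing import Dict, List


def summarize(tok_per_s: List[float], latencies_ms: List[float]) -> Dict[str, float]:
    """Median throughput and P95 latency across repeated runs (pass at least two latency samples)."""
    # quantiles(n=100) returns the 99 cut points P1..P99, so index 94 is P95.
    # "inclusive" interpolates within the observed range instead of extrapolating past it.
    p95 = statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]
    return {"median_tok_s": statistics.median(tok_per_s), "p95_latency_ms": p95}


# Synthetic inputs for illustration only; these are not measurements.
print(summarize([1.0, 2.0, 3.0], [float(ms) for ms in range(1, 101)]))
```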

Section 8

Wordmark

Silicon Logic

Tiempos Headline at masthead scale. Falls back to the system serif when local Tiempos files are unavailable.