BIIC: Bio-Inspired Information Cell
A geometric algebra framework for lossless information representation in language models
📐 Cl(4,1) Conformal GA ✅ Phase 1 Complete ✅ Phase 2 Complete 🔄 Phase 3 Running 📋 Phase 4 Planned

🔴 The Problem with Tokens

Every current language model compresses all semantics into a single flat vector — and that vector gets overwritten layer by layer. There is no mechanism to distinguish what a token originally meant from what inference added to it. The consequences are structural, not incidental.

💥 Irreversible semantic loss: deep layers overwrite original semantics; residual connections are engineering patches, not mathematical guarantees.
🌊 Information overload: residual connections only add, never subtract, so irrelevant intermediate states accumulate indefinitely.
📉 Long-context degradation: there is no active forgetting mechanism; the KV cache grows linearly and quality systematically degrades as sequence length grows.

💡 Key Insight: Learn from DNA

DNA simultaneously achieves three things that tokens cannot: permanent genome preservation, dynamic epigenetic read/write, and active erasure of outdated marks. We map each mechanism directly to a mathematical structure.

🧬 DNA Architecture

Genome (immutable): permanent identity, never overwritten.
Epigenome (read/write): dynamic context, updated with cell state.
TET demethylase: active erasure of outdated methylation marks.

⚡ BIIC Architecture

Grade-0 (invariant core): algebraically invariant under any sandwich product; a theorem, not a heuristic.
Grade-1~4 (equivariant): evolves with the inference context and carries reasoning state.
GradeAwareEraser: controlled decay on the equivariant grades only; grade-0 change = 0.0 (exact).

The critical property: in Cl(4,1) conformal geometric algebra, the grade-0 scalar is algebraically invariant under the sandwich product RMR̃ for any rotor R. This is not an approximation — it is a theorem. We use this structure to separate what a token is from what inference knows about it.
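
This is straightforward to check numerically: by the cyclic property of the grade-0 projection, ⟨RMR̃⟩₀ = ⟨MR̃R⟩₀ = ⟨M⟩₀ whenever RR̃ = 1. The sketch below verifies it with the third-party clifford package purely for illustration; this repo's reference implementation is src/clifford_cl41.py.

python · Grade-0 invariance check (illustrative, third-party clifford package)
# Sanity check of grade-0 invariance under R M ~R. Uses the third-party
# `clifford` package (pip install clifford) for brevity; this repo's
# authoritative implementation is src/clifford_cl41.py.
import numpy as np
import clifford as cf

layout, blades = cf.Cl(4, 1)      # Cl(4,1): 2^5 = 32 basis blades
rng = np.random.default_rng(0)

M = cf.MultiVector(layout, rng.normal(size=layout.gaDims))            # random multivector
B = cf.MultiVector(layout, rng.normal(size=layout.gaDims))(2) * 0.1   # small random bivector
R = np.e ** B                     # rotor = exp(bivector), so R * ~R == 1

M_out = R * M * ~R                # sandwich product
print("grade-0 before:", M(0).value[0])
print("grade-0 after: ", M_out(0).value[0])   # equal up to float rounding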

📊 Results

Phase 1 — Mathematical Verification ✅

Fig 1. Grade-0 error after 100 consecutive sandwich product transformations stays at the 10⁻⁶ level across 3 independent seeds (threshold: 10⁻⁴).

Grade-0 invariance error (100 transforms, 3 seeds): 6.56×10⁻⁶ ± 4.95×10⁻⁶ · threshold: 10⁻⁴
Multi-channel leakage (C=8 channels): 0.0 · exact zero; channels are fully independent
Eraser effect on grade-0: 0.0 · grade-0 unchanged after 50 consecutive erase ops (see the sketch below)
Gradient flow ratio (10-layer chain): 0.55 · healthy range: 0.1 – 10
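
The eraser result above is structural rather than numerical: decay is applied only to coefficients of grades 1–4, so the grade-0 coefficient is copied through bit-exactly. Below is a minimal sketch of that mechanism, assuming the standard Cl(4,1) grade dimensions (1, 5, 10, 10, 5, 1); the function name, decay rate, and zero prior are illustrative, and the repo's real implementation is src/eraser_ops.py.

python · GradeAwareEraser sketch (illustrative)
import torch

GRADE_DIMS = (1, 5, 10, 10, 5, 1)                     # grades 0..5 of Cl(4,1)
GRADE_OF = torch.repeat_interleave(torch.arange(6), torch.tensor(GRADE_DIMS))

def erase(mv: torch.Tensor, prior: torch.Tensor, rate: float = 0.1) -> torch.Tensor:
    # Decay only the equivariant coefficients (grades 1-4) toward the prior;
    # grade-0 gets a decay factor of exactly 0, so it passes through bit-exactly.
    decay = torch.where((GRADE_OF >= 1) & (GRADE_OF <= 4),
                        torch.tensor(rate), torch.tensor(0.0))
    return mv + decay * (prior - mv)

mv, prior = torch.randn(32), torch.zeros(32)
out = mv
for _ in range(50):                                   # 50 consecutive erase ops
    out = erase(out, prior)

print("grade-0 change:", (out[0] - mv[0]).abs().item())            # exactly 0.0
print("grade-2 norm:", mv[6:16].norm().item(), "->", out[6:16].norm().item())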

Phase 2 — Encoding-Decoding Pipeline ✅

Fig 2. AllGradeDecoder overfitting test: loss converges from 7.08 to 0.012, a 99.8% improvement (3 seeds, std < 0.0001).
Fig 3. Grade L2 norms after 300 training steps: grade-2 is most active (10D, relational information), consistent with theory. Separation emerges without explicit supervision.

All-grade vs. grade-0-only decoding: 5.3× (0.006 vs. 0.032) · the equivariant grades carry real information
Grade-0 change after 6 inference layers: 0.0 · exact zero, confirmed under end-to-end training

Fig 4. Average cosine similarity between different tokens' grade-0 representations: 0.029 ± 0.013 (near-orthogonal). Grade-0 strongly discriminates token identity.

Phase 4 Dry Run — Architecture Validated ✅

Parameters (n_channels=8): 10M · full pipeline: encoder + SlowFast + DualCodebook
Peak VRAM (n_channels=8): 295 MB · fixed size; does not grow with sequence length (see the channel-independence sketch below)
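
The fixed footprint rests in part on the channel-independence result from Phase 1: when every operation acts channel-wise, perturbing one of the n_channels=8 channels cannot change any other channel's output. Below is a toy illustration of that test logic; the tanh op is a stand-in, not an actual BIIC layer, and the repo's real tests live in tests/.

python · Channel-independence check (illustrative)
import torch

C, DIM = 8, 32                               # 8 channels of Cl(4,1) multivectors

def channelwise_op(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for any per-channel BIIC op: it acts on each channel
    # independently, so no information can cross channels.
    return torch.tanh(x) * 0.9

x = torch.randn(C, DIM)
x_pert = x.clone()
x_pert[3] += torch.randn(DIM)                # perturb channel 3 only

delta = channelwise_op(x_pert) - channelwise_op(x)
leakage = delta[torch.arange(C) != 3].abs().max()
print("cross-channel leakage:", leakage.item())      # exactly 0.0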

🗺 Roadmap

Phase · Goal · Status
Phase 1 · Mathematical verification of Cl(4,1) properties (invariance, equivariance, Eraser) · ✅ Complete
Phase 2 · Encoding-decoding pipeline: TokenToIC → BIICLayer → AllGradeDecoder · ✅ Complete
Phase 3 · 6-group controlled experiment (H1: geometry vs. orthogonality; H2: equivariant structure vs. dimensionality; H3: Eraser on long sequences) · 🔄 Running
Phase 4 · MVP language model: SlowFastBIIC + DualCodebook, WikiText-103, no residual / no KV cache · 📋 Planned

⚡ What This Enables

🔒 Lossless long-context: grade-0 preserves original semantics regardless of inference depth; algebraically guaranteed, not approximated.
💾 No KV cache: mutable state replaces key-value storage; the memory footprint is fixed and does not grow with sequence length.
🔍 Built-in interpretability: grade decomposition separates "what the token is" from "what inference knows"; each grade has a distinct role.
🌐 Natural multimodal alignment: different modalities map into the same algebra; grade-0 cores are directly comparable across text, image, and audio, with no extra alignment training.
📐 O(L) complexity: the SlowFast architecture eliminates quadratic attention; the slow network updates every K steps while the fast network reads every step (see the sketch after this list).
🧹 Active forgetting: GradeAwareEraser decays the equivariant grades toward a semantic prior, keeping information entropy bounded over long inference.
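
A minimal sketch of the SlowFast schedule from the O(L) item above: a fixed-size state replaces the KV cache, the slow network writes it every K steps, and the fast network reads it at every step. The module choices (GRUCell, Linear) and all shapes are illustrative assumptions, not this repo's SlowFastBIIC classes.

python · SlowFast schedule sketch (illustrative)
import torch
import torch.nn as nn

D, K = 256, 4                                # state width, slow-update period
slow = nn.GRUCell(D, D)                      # stand-in for the slow network
fast = nn.Linear(2 * D, D)                   # stand-in for the fast readout

state = torch.zeros(1, D)                    # fixed-size mutable state
xs = torch.randn(128, 1, D)                  # a length-128 input sequence

outputs = []
for t, x in enumerate(xs):                   # one pass over the sequence: O(L)
    if t % K == 0:
        state = slow(x, state)               # slow write, every K steps
    outputs.append(fast(torch.cat([x, state], dim=-1)))  # fast read, every step

print(len(outputs), state.shape)             # 128 steps; state stays (1, 256)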

📍 How We Differ

BIIC is not a new transformer variant. It replaces the information carrier itself — the token — before any transformer-style processing occurs.

Approach · What they replace · Input/output · Invariance guarantee
GATr (2023) · Attention mechanism · Still tokens · E(3) equivariance only
Versor (2026) · Internal computation · Still tokens · SE(3) equivariance only
FoldToken (2024) · Protein structure tokens · Domain-specific · SE(3)-invariant encoder
BIIC (ours) · The information carrier itself · Multivector (invariant + equivariant) · Grade-0 invariant by theorem; Eraser preserves the invariant core exactly

🚀 Quick Start

bash · Phase 1–2 verification (CPU)
pip install torch numpy scipy matplotlib

# Phase 1: Mathematical verification (CPU, ~2 min)
python tests/test_phase1.py

# Phase 2: Pipeline verification (CPU, ~10 min)
python tests/test_decoder_basic.py
python tests/test_encoder.py
python tests/test_full_pipeline.py

📁 Repository Structure

BIIC/
├── src/
│   ├── clifford_cl41.py      # Cl(4,1) Golden Reference (never delete)
│   ├── rotor_utils.py        # Rotors & sandwich products
│   ├── eraser_ops.py         # GradeAwareEraser
│   ├── token_to_ic.py        # TokenToImmutableCore encoder
│   ├── all_grade_decoder.py  # DualCodebook decoder
│   ├── mutable_state.py      # BIICLayer (Writer + Eraser)
│   └── biic_loss.py          # Annealed auxiliary losses
├── tests/                    # 10 + 11 validation tests (Phase 1 + Phase 2)
├── results/                  # JSON data, 3 seeds each phase
├── figures/                  # Paper figures (fig1–fig4)
└── LICENSE

📚 References

Brehmer et al. (2023). Geometric Algebra Transformer (GATr). NeurIPS 2023.
Huy & Hirst (2026). Versor: A Geometric Sequence Architecture.
Ji (2026). CliffordNet: All You Need is Geometric Algebra.
Anonymous (2026). Toward a Functional Geometric Algebra for NLP.
Dasgupta et al. (2026). Invariant Features in Language Models.
Wu & Zhang (2017). TET-mediated active DNA demethylation. Nature Reviews Genetics.

💬 Collaboration

🧬
Val Huang
Independent Researcher · BIIC Author
💬 WeChat: llmbbs

Interested in collaborating on the paper, contributing experiments, or exploring new information-theoretic paradigms? Reach out.

📄 Citation

@misc{huang2026biic,
  title = {Bio-Inspired Information Cell: A Geometric Algebra Framework for Lossless Information Representation in Language Models},
  author = {Huang, Zhongchang},
  year = {2026},
  note = {Phase 1–2 complete, Phase 3–4 ongoing}
}

📜 License

Business Source License 1.1 — free for non-production and research use. See LICENSE for details.
