STL as Intermediate Representation for LLM Code Generation: A Comparative Experiment
Report Type: Empirical Experiment Report Version: 1.0 Date: 2026-03-01 Authors: Wuko (experiment design & execution), Syn-claude (plan generation & evaluation) Organization: SCOS Lab (https://github.com/scos-lab) License: CC BY 4.0 Citation: SCOS Lab. (2026). STL as Intermediate Representation for LLM Code Generation: A Comparative Experiment. https://stl-lang.org/research/llm-code-generation-experiment
Abstract
This report presents a controlled experiment comparing three specification formats — Natural Language (NL), STL (Semantic Tension Language), and STLC (Semantic Tension Language for Code) — as inputs to an LLM code generation agent. The same application (MarkView, a cross-platform Markdown reader) was specified identically across all three formats, then each specification was provided to a separate instance of Google Gemini CLI to generate a complete implementation. The results reveal that STL produces the highest-quality output (68% feature completion), outperforming both STLC (53%) and NL (45%). More significantly, STL was the only format that enabled fully autonomous execution — the LLM completed the entire implementation without any human intervention. These findings suggest that STL occupies a “sweet spot” of structural formality that enhances LLM execution capability without the overhead penalties of over-specification.
1. Introduction
1.1 Research Question
Does the format of a software specification affect the quality of code produced by an LLM executing that specification?
As Large Language Models increasingly serve as code generation agents, the question of how to communicate requirements to these agents becomes critical. Traditional software engineering uses natural language specifications, but LLMs can also process structured formats. This experiment tests whether structured semantic representations (STL/STLC) provide measurable advantages over natural language when used as LLM-facing specifications.
1.2 Background
STL (Semantic Tension Language) is a knowledge representation format that structures information through directional semantic relations:
[Source] -> [Target] ::mod(key=value, ...)
STL represents “what exists” — entities, relationships, properties, and constraints — in a structured but human-readable format.
STLC (Semantic Tension Language for Code) extends STL with computational semantics, modeling software as executable flows with Entry/Exit nodes, branching, error paths, side effects, and validation chains. STLC represents “how it works” — the computational flows that transform inputs to outputs.
Natural Language (NL) represents the baseline — traditional prose-based specification describing features, requirements, and implementation guidance.
1.3 Hypothesis
H1 (Initial): STLC > STL > NL — More structured specifications produce better code.
H2 (Alternative): There exists an optimal level of structural formality. Too little structure (NL) leads to ambiguity; too much structure (STLC) may constrain or distract the LLM from practical implementation details.
2. Methodology
2.1 Experimental Design
Design Type: Between-subjects, single-factor experiment with three conditions.
Independent Variable: Specification format (NL, STL, STLC).
Dependent Variables:
- Feature completion rate (19-item checklist)
- Build success (compilability)
- Execution autonomy (human interventions required)
- Code quality (error handling, structure, type safety)
Controlled Variables:
- Same application (MarkView — a Markdown reader for Windows)
- Same feature set (19 features, identically specified across all formats)
- Same technology stack (Tauri 2 + React 19 + TypeScript + Zustand)
- Same LLM executor (Google Gemini CLI, same model version)
- Same execution environment (Windows, same machine)
- Same initial prompt: “Based on the provided plan, implement the complete MarkView application from scratch — a cross-platform Markdown reader built with Tauri 2 + React 19 + TypeScript, targeting the Windows App Store. Follow the plan exactly, implement all specified features, and produce production-ready code.”
- All three Gemini instances had identical memory state (including STL knowledge)
2.2 Application Under Test: MarkView
MarkView is a cross-platform Markdown reader targeting the Microsoft Store, with the following feature requirements:
| # | Feature | Category |
|---|---|---|
| 1 | Open file via dialog (Ctrl+O) | Core I/O |
| 2 | Open folder with tree browser (Ctrl+Shift+O) | Core I/O |
| 3 | GFM tables, task lists, strikethrough | Rendering |
| 4 | KaTeX math expression rendering | Rendering |
| 5 | Code block syntax highlighting | Rendering |
| 6 | Table of Contents with scroll spy | Navigation |
| 7 | Drag & drop file opening | Core I/O |
| 8 | In-document search with case toggle | Search |
| 9 | Theme toggle with persistence | Preferences |
| 10 | Font size control with persistence | Preferences |
| 11 | External links open in system browser | Navigation |
| 12 | Local image rendering via asset protocol | Rendering |
| 13 | File watcher with auto-reload + scroll preservation | Core I/O |
| 14 | Print mode (Ctrl+P) | Output |
| 15 | Sidebar toggle with animation | UI |
| 16 | Recent files list with persistence | Preferences |
| 17 | File argument launch (Open with…) | Platform |
| 18 | Dynamic window title | Platform |
| 19 | MSIX build for Microsoft Store | Platform |
This application was chosen for its:
- Moderate complexity: ~1,200-1,500 LOC across 17-22 source files
- Full-stack scope: Rust backend + TypeScript frontend + CSS styling + configuration
- Real-world applicability: A genuine useful application, not a contrived benchmark
- Measurable features: Each feature is independently verifiable as PASS/PARTIAL/FAIL
2.3 Specification Formats
2.3.1 Natural Language (NL)
Traditional prose-based specification including:
- Background description and tech stack overview
- Project directory structure
- Feature descriptions in paragraph/bullet format
- TypeScript interface definitions (embedded code)
- Keyboard shortcuts table
- Tauri configuration example
- Build commands and validation checklist
Characteristics: Describes “what to build” in human-readable narrative. High readability, low structural formality.
2.3.2 STL (Semantic Tension Language)
Structured knowledge representation using anchor-path-modifier syntax:
[Feature:OpenFile] -> [Action:ShowDialog] ::mod(
rule="causal",
confidence=0.99,
trigger="Ctrl+O",
intent="Open native file picker filtered to .md/.markdown/.mdx"
)
Characteristics: Same information as NL but organized into typed semantic relations. Each feature, component, state field, and dependency is an explicit node with typed connections and metadata. Describes “what exists and how things relate.”
2.3.3 STLC (Semantic Tension Language for Code)
Compiler interface protocol with computational flow semantics:
[Entry_OpenFile] -> [Input_KeyboardShortcut] ::mod(
trigger="Ctrl+O keydown event",
preventDefault=true
)
[Input_KeyboardShortcut] -> [Query_InvokeDialog] ::mod(
query="invoke('open_file_dialog')",
async=true,
side_effect="io"
)
[Query_InvokeDialog] -> [Branch_FileSelected] ::mod(
condition="result is Some(path) or None"
)
[Branch_FileSelected] -> [Exit_Cancelled] ::mod(
on_cancel="None -> return, no state change"
)
Characteristics: Models each feature as an executable computational flow with Entry/Exit points, branching, error paths, side effect marking, async annotations, and data type specifications. Describes “how it flows step by step.”
Contains 15 frontend computational flows + 6 Rust backend command specifications.
2.4 Evaluation Criteria
Each of the 19 features was scored as:
- PASS (1.0): Feature fully implemented and functional
- PARTIAL (0.5): Core logic present but missing sub-requirements (e.g., persistence, animation, edge cases)
- FAIL (0.0): Feature not implemented, broken, or non-functional
Additional qualitative assessments:
- Build success: Can the project compile without errors?
- Execution autonomy: How many human interventions were required?
- Error handling: Poor / Adequate / Good / Excellent
- Code structure: Poor / Adequate / Good / Excellent
2.5 Evaluation Process
Each implementation was evaluated by an independent AI agent (Claude Opus 4.6) that:
- Listed the complete directory structure
- Read every source file (no file skipped)
- Assessed each of the 19 features against the checklist
- Evaluated code quality dimensions
- Identified critical bugs, security issues, and missing functionality
All three evaluations ran in parallel to ensure consistent evaluation standards.
3. Results
3.1 Feature Completion Scores
| # | Feature | NL | STL | STLC |
|---|---|---|---|---|
| 1 | Open File + render | 0.5 | 1.0 | 0.5 |
| 2 | Open Folder + tree | 0.5 | 1.0 | 1.0 |
| 3 | GFM rendering | 1.0 | 1.0 | 0.5 |
| 4 | KaTeX math | 1.0 | 1.0 | 0.0 |
| 5 | Code syntax highlighting | 1.0 | 1.0 | 0.0 |
| 6 | TOC + scroll spy | 0.5 | 1.0 | 0.5 |
| 7 | Drag & drop | 0.5 | 1.0 | 1.0 |
| 8 | In-document search | 0.5 | 0.0 | 0.5 |
| 9 | Theme persistence | 0.0 | 1.0 | 1.0 |
| 10 | Font size persistence | 0.0 | 1.0 | 1.0 |
| 11 | External links | 0.0 | 1.0 | 1.0 |
| 12 | Local images | 0.5 | 0.5 | 0.5 |
| 13 | File watcher + scroll | 0.5 | 0.5 | 0.5 |
| 14 | 1.0 | 0.5 | 1.0 | |
| 15 | Sidebar animation | 0.5 | 0.5 | 0.5 |
| 16 | Recent files persistence | 0.5 | 1.0 | 0.5 |
| 17 | File argument launch | 0.0 | 0.0 | 0.0 |
| 18 | Window title | 0.0 | 0.0 | 0.0 |
| 19 | MSIX build | 0.0 | 0.0 | 0.0 |
| Total | 8.5/19 | 13.0/19 | 10.0/19 | |
| Percentage | 44.7% | 68.4% | 52.6% |
3.2 Summary Comparison
| Metric | NL | STL | STLC |
|---|---|---|---|
| Feature Score | 8.5/19 (45%) | 13/19 (68%) | 10/19 (53%) |
| Full PASS | 4 | 11 | 6 |
| PARTIAL | 9 | 4 | 8 |
| FAIL | 6 | 4 | 5 |
| Compiles | NO | YES | YES |
| Source Files | 17 | 21 | 22 |
| Lines of Code | 1,203 | 1,483 | 1,198 |
| Human Interventions | 1+ | 0 | 1 |
| Error Handling | Poor | Adequate | Adequate |
| Code Structure | Adequate | Good | Good |
| Persistence Impl. | None | Full | Partial |
| Security (CSP) | Disabled | Disabled | Disabled |
3.3 Execution Autonomy
This metric captures whether the LLM could complete the entire implementation without human intervention — a critical factor for LLM agent reliability in production workflows.
| Event | NL | STL | STLC |
|---|---|---|---|
| Completed autonomously | NO | YES | NO |
| Interruption type | LLM asked human to make shell selections | None | LLM accidentally deleted plan file with --force flag |
| Human message required | ”所有shell的选择都你自己决定” (Make all shell choices yourself) | None | ”刚刚的STLC文件怎么不见了?” (The STLC file disappeared?) |
| Impact on flow | LLM paused execution, waited for human guidance | None | Context disruption, plan had to be regenerated |
3.4 Critical Failures by Version
NL Version — Build-Blocking Issues:
- Missing
build.rs(required by Tauri 2) — project cannot compile - Broken regex literal in
WelcomePage.tsx— TypeScript compilation error tauri-plugin-storedeclared in dependencies but never imported/used — all persistence features non-functional- External link handler code commented out
STL Version — Functional Gaps:
- Search feature completely unimplemented (UI shell exists, zero search logic)
- No file argument handling, no dynamic window title, no MSIX config
STLC Version — Implementation Oversights:
highlight.jsCSS never imported — syntax highlighting visually broken despite correct JS pipelinekatex.min.cssnever imported — math rendering visually broken despite correct JS pipeline- Scroll container confusion (
windowvs<main overflow-y-auto>) — scroll spy and scroll preservation both fail silently - Windows path separator bug (
lastIndexOf("/")fails for backslash paths)
3.5 Features All Three Failed
Three features were universally failed across all versions:
- #17 File argument launch — None implemented
std::env::args()or Tauri’s file-open event handling - #18 Dynamic window title — None called
setTitle()or updateddocument.title - #19 MSIX build configuration — None configured AppX/MSIX bundle targets with publisher identity
These represent platform-integration features that require specific API knowledge beyond general web development.
4. Analysis
4.1 The Structural Formality Spectrum
The results map to a clear pattern along the structural formality spectrum:
Formality Level: LOW ◄──────────────────────────────► HIGH
│ │ │
Format: NL STL STLC
Score: 45% 68% 53%
Autonomy: Poor Perfect Good
│ │ │
▼ ▼ ▼
Under-specified "Sweet Spot" Over-specified
Under-specification (NL): Insufficient structural guidance causes the LLM to:
- Lack confidence in implementation choices → seeks human confirmation
- Miss implicit requirements → dependencies declared but never used
- Produce syntactically incorrect code → broken regex, missing build files
- Generate the least code (1,203 LOC) with the most failures
Optimal specification (STL): Structured enough to convey clear requirements while leaving implementation freedom:
- LLM proceeds with full confidence → zero human interventions
- Clear entity-relationship structure guides feature implementation
- Produces the most code (1,483 LOC) with the most features working
- Correctly implements complex integrations (plugin-store persistence, IntersectionObserver scroll spy)
Over-specification (STLC): Detailed computational flows cause the LLM to:
- Focus on computational logic while overlooking non-computational requirements (CSS imports, configuration files)
- Exhibit overconfidence → uses
--forceflags that destroy the plan file - Correctly implement data flows but miss “obvious” practical details
- Produce less total code (1,198 LOC) despite having the most detailed specification
4.2 What STL Gets Right
STL’s advantage appears to stem from three properties:
1. Explicit Entity Relationships
STL forces every feature, component, and dependency to be an explicit named entity with typed connections. This prevents the “declared but unused” pattern seen in the NL version (where tauri-plugin-store was in package.json but never imported).
[Feature:ThemePersistence] -> [Plugin:TauriStore] ::mod(
rule="causal",
operation="save/load theme preference"
)
The LLM sees TauriStore as a required entity connected to ThemePersistence, making it harder to “forget” the connection during implementation.
2. Structured but Not Prescriptive
STL describes WHAT things are and HOW they relate, but not the step-by-step HOW to implement them. This gives the LLM freedom to choose implementation patterns while ensuring all requirements are captured.
Compare:
- NL: “Theme should persist across restarts” (easy to overlook in a wall of text)
- STL:
[ThemePersistence] -> [TauriStore] ::mod(...)(explicit relationship, hard to miss) - STLC: 15 lines specifying Entry → Query → Mutation → Exit flow (detailed but shifts attention to flow mechanics)
3. Confidence Without Constraint
The STL format appears to provide enough structure for the LLM to make confident decisions independently (zero human interventions), without constraining implementation choices to the point where practical details are overlooked.
4.3 Why STLC Underperformed
STLC’s lower-than-expected performance reveals an important insight about LLM cognition:
Attention Displacement: STLC’s computational flow descriptions (Entry → Input → Validate → Transform → Branch → Error → Output → Exit) occupy significant attention budget. The LLM faithfully implements the specified computational logic but misses practical requirements that fall outside the flow model:
- CSS imports are not “computational operations” → omitted
- Configuration files are not “data flows” → incomplete
- Path separator conventions are not “branching conditions” → incorrect
The STLC Paradox: More detail in the specification led to less complete implementation, because the detail was in the wrong dimension (computational flow) rather than the dimension that needed it (practical integration).
4.4 The Autonomy Factor
Perhaps the most practically significant finding is the correlation between specification format and execution autonomy:
| Format | Autonomy | Explanation |
|---|---|---|
| NL | Low — asked human for shell choices | Ambiguity in NL created decision paralysis |
| STL | Perfect — fully autonomous | Clear structure enabled confident decisions |
| STLC | Good — but caused self-inflicted incident | Overconfidence led to destructive --force usage |
In production LLM agent workflows, execution autonomy directly translates to:
- Reduced human oversight requirements
- Faster end-to-end completion
- Lower cost per task
- Scalability of LLM agent teams
4.5 Implications for LLM-to-LLM Communication
When one LLM generates a plan for another LLM to execute (a common pattern in multi-agent systems), the specification format becomes a communication protocol between agents. This experiment suggests:
- Natural language is suboptimal for LLM-to-LLM communication — it introduces the same ambiguity problems as human-to-human communication
- STL provides an effective intermediate representation — structured enough to be unambiguous, flexible enough to not over-constrain
- Over-detailed protocols (STLC) can be counterproductive for autonomous execution — the receiver spends too much attention parsing the protocol rather than solving the problem
- The optimal communication format depends on the receiver’s capability — STLC may be valuable for less capable models (like using a senior engineer’s STLC to guide a junior developer), while STL is optimal for capable models
5. Threats to Validity
5.1 Internal Validity
- N=1 per condition: Each specification format was tested with a single LLM execution. Results may vary across runs due to LLM non-determinism.
- STLC execution disruption: The STLC version’s plan file was accidentally deleted during execution, requiring human intervention and plan regeneration. This disruption may have negatively impacted the STLC score. Without this incident, STLC may have scored closer to STL.
- Plan quality: All three plans were generated by the same author (Syn-claude). A different plan author might produce different quality levels across formats.
- Evaluation bias: Evaluations were conducted by Claude Opus 4.6, which may have systematic biases in feature assessment.
5.2 External Validity
- Single LLM executor: Only Google Gemini CLI was tested. Results may differ with Claude Code, GPT, or other LLM agents.
- Single application type: MarkView is a desktop application with a specific tech stack. Results may not generalize to web services, CLI tools, data pipelines, or other application types.
- Specific complexity level: The application is moderate complexity (~1,200-1,500 LOC). Results may differ for trivial scripts or large-scale systems.
5.3 Construct Validity
- Feature checklist granularity: The 19-item checklist may not capture all quality dimensions. Code maintainability, performance, and accessibility were not formally assessed.
- PASS/PARTIAL/FAIL scoring: The 1.0/0.5/0.0 scoring is coarse. Two PARTIAL features may have very different levels of completeness.
6. Related Work
6.1 Specification Languages for Code Generation
Prior work on structured specifications for LLM code generation includes:
- OpenAPI/Swagger for REST API generation — domain-specific, not general-purpose
- UML-to-code generation — long history but poor LLM integration
- Prompt engineering patterns (chain-of-thought, few-shot examples) — orthogonal to this work
STL differs from these approaches in being:
- Domain-agnostic — applicable to any software domain
- LLM-native — designed for LLM consumption from the ground up
- Semantically rich — carries provenance, confidence, and relationship metadata
- Incrementally formal — NL → STL → STLC represents a spectrum of formality
6.2 Multi-Agent Communication Protocols
Research on LLM agent communication has explored:
- Natural language message passing (AutoGen, CrewAI) — our NL baseline
- JSON-based structured messages — rigid, no semantic metadata
- Function call schemas — capability description, not requirement specification
STL offers a semantic middle ground — more structured than natural language, more flexible than JSON schemas, and capable of carrying epistemological metadata (confidence, provenance, rules).
7. Conclusions
7.1 Key Findings
-
STL outperforms both NL and STLC as a specification format for LLM code generation (68% vs 45% vs 53% feature completion).
-
STL enables fully autonomous LLM execution — the only format where the LLM completed the entire implementation without human intervention.
-
Over-specification can be counterproductive — STLC’s detailed computational flows displaced LLM attention from practical implementation details (CSS imports, configuration), resulting in “structurally correct but practically broken” code.
-
Specification format affects not just output quality, but execution behavior — NL causes decision paralysis (asks for human help), STL enables confident autonomous execution, STLC causes overconfidence (destructive
--forceusage). -
There exists a “sweet spot” of structural formality — enough structure to eliminate ambiguity (unlike NL), but not so much that it constrains the executor’s problem-solving (unlike STLC).
7.2 Practical Recommendations
| Scenario | Recommended Format | Rationale |
|---|---|---|
| Capable LLM executing autonomously | STL | Optimal balance of structure and freedom |
| Less capable LLM or human junior developer | STLC | Detailed flows compensate for lower capability |
| Quick prototyping, solo developer | NL | Lowest overhead, acceptable for simple tasks |
| LLM-to-LLM communication in multi-agent systems | STL | Unambiguous, semantically rich, not over-constraining |
| Safety-critical systems | STLC | Explicit error paths and validation chains |
7.3 Future Work
- Larger sample size: Repeat the experiment N=10 per condition with randomized LLM instances to establish statistical significance.
- Cross-model testing: Test with Claude Code, GPT, and other LLM agents to assess generalizability.
- Complexity scaling: Test across trivial (100 LOC), moderate (1,000 LOC), and complex (10,000+ LOC) applications.
- STLC with practical annotations: Extend STLC to include non-computational requirements (CSS dependencies, configuration, platform conventions) and re-test.
- Hybrid format: Test STL specifications with selective STLC flows for complex features only.
- Multi-agent pipeline: Test a two-stage pipeline where one LLM generates STL from NL requirements, and another LLM generates code from STL.
8. Data Availability
8.1 Artifacts
| Artifact | Location |
|---|---|
| NL Plan | plan_natural_language.md |
| STL Plan | plan_stl.md |
| STLC Plan | plan_stlc.md |
| NL Implementation | PLAN_natural_language/markview/ |
| STL Implementation | PLAN_STL/markview/ |
| STLC Implementation | PLAN_STLC/ |
8.2 Reproduction Steps
- Prepare three identical LLM agent instances with STL knowledge in memory
- Provide each instance with one plan file and the execution prompt
- Allow each instance to complete autonomously (record any human interventions)
- Evaluate each implementation against the 19-item feature checklist
- Compare feature scores, build success, autonomy, and code quality
Appendix A: Detailed LOC Breakdown
A.1 NL Version (1,203 LOC, 17 files)
| File | LOC | Category |
|---|---|---|
| App.tsx | 120 | Layout |
| FolderTree.tsx | 75 | Component |
| MarkdownView.tsx | 47 | Component |
| SearchBar.tsx | 145 | Component |
| Sidebar.tsx | 40 | Component |
| TOC.tsx | 38 | Component |
| Toolbar.tsx | 104 | Component |
| WelcomePage.tsx | 71 | Component |
| useMarkdown.ts | 65 | Hook |
| main.tsx | 11 | Entry |
| appStore.ts | 130 | State |
| globals.css | 50 | Style |
| markdown.css | 100 | Style |
| tauriCommands.ts | 20 | Utils |
| commands.rs | 155 | Rust Backend |
| lib.rs | 26 | Rust Backend |
| main.rs | 6 | Rust Backend |
A.2 STL Version (1,483 LOC, 21 files)
| File | LOC | Category |
|---|---|---|
| App.tsx | 152 | Layout |
| App.css | 116 | Style (template) |
| FolderTree.tsx | 85 | Component |
| MarkdownView.tsx | 88 | Component |
| SearchBar.tsx | 86 | Component |
| Sidebar.tsx | 45 | Component |
| TOC.tsx | 65 | Component |
| Toolbar.tsx | 111 | Component |
| WelcomePage.tsx | 44 | Component |
| useMarkdown.ts | 71 | Hook |
| useTheme.ts | 83 | Hook |
| index.css | 20 | Style |
| main.tsx | 10 | Entry |
| appStore.ts | 87 | State |
| markdown.css | 140 | Style |
| themes.css | 32 | Style |
| tauriCommands.ts | 38 | Utils |
| vite-env.d.ts | 1 | Type Decl |
| commands.rs | 177 | Rust Backend |
| lib.rs | 26 | Rust Backend |
| main.rs | 6 | Rust Backend |
A.3 STLC Version (1,198 LOC, 22 files)
| File | LOC | Category |
|---|---|---|
| App.tsx | 70 | Layout |
| App.css | 1 | Style (empty) |
| FolderTree.tsx | 75 | Component |
| MarkdownView.tsx | 132 | Component |
| SearchBar.tsx | 100 | Component |
| Sidebar.tsx | 42 | Component |
| TOC.tsx | 64 | Component |
| Toolbar.tsx | 94 | Component |
| WelcomePage.tsx | 54 | Component |
| useMarkdown.ts | 22 | Hook |
| useTheme.ts | 43 | Hook |
| main.tsx | 9 | Entry |
| appStore.ts | 112 | State |
| markdown.css | 79 | Style |
| themes.css | 34 | Style |
| fileTypes.ts | 6 | Utils |
| markdown.ts | 53 | Utils |
| tauriCommands.ts | 32 | Utils |
| vite-env.d.ts | 1 | Type Decl |
| commands.rs | 145 | Rust Backend |
| lib.rs | 24 | Rust Backend |
| main.rs | 6 | Rust Backend |
Appendix B: Feature Evaluation Details
B.1 Features Where STL Uniquely Succeeded
| Feature | STL Implementation | Why NL/STLC Failed |
|---|---|---|
| Scroll spy (IntersectionObserver) | Correct IntersectionObserver on headings, active heading state updates on scroll | NL: not implemented; STLC: attached to window instead of scrollable container |
| Theme persistence | Correctly uses @tauri-apps/plugin-store save/load | NL: plugin declared but never imported; STLC: implemented correctly |
| Recent files persistence | Full round-trip: load from store → update in memory → save on change | NL: in-memory only; STLC: loads from store but never saves back |
| External links | shell.open() correctly imported and called | NL: code was commented out; STLC: implemented correctly |
B.2 STLC’s “Structurally Correct, Practically Broken” Pattern
| Feature | JS Pipeline | CSS/Config | Result |
|---|---|---|---|
| Syntax highlighting | rehype-highlight correctly in unified pipeline | highlight.js CSS never imported | Code renders monochrome |
| KaTeX math | remark-math + rehype-katex correctly in pipeline | katex.min.css never imported | Math renders as broken HTML |
| Scroll spy | IntersectionObserver logic correct | Attached to window instead of overflow container | Never fires |
This pattern — correct computational logic paired with missing practical integration — is characteristic of over-specification: the STLC flows describe the JavaScript computation but not the CSS/configuration dependencies that make it visually functional.
Appendix C: Execution Timeline
C.1 NL Execution
Start → npx create-vite (interactive prompt appears)
→ LLM pauses, asks human to make shell selection
→ Human: "所有shell的选择都你自己决定,不要让我选择"
→ LLM resumes with non-interactive commands
→ Implementation proceeds to completion
→ Result: Does not compile (missing build.rs, TS syntax error)
C.2 STL Execution
Start → Project scaffolding
→ Component implementation
→ Rust backend implementation
→ CSS and configuration
→ Build and verify
→ End (fully autonomous, zero interruptions)
→ Result: Compiles successfully, dist/ output produced
C.3 STLC Execution
Start → npx create-tauri-app --force
→ plan_stlc.md deleted by --force flag
→ Human notices: "刚刚的STLC文件怎么不见了?"
→ LLM explains: --force cleared non-template files
→ LLM restores plan from memory
→ Human interrupts to get plan regenerated externally
→ Implementation resumes and completes
→ Result: Compiles but missing CSS imports, scroll bugs
Report generated 2026-03-01 by SCOS Lab. For questions or reproduction assistance, contact the authors via https://github.com/scos-lab.