Artem Molchanov · April 2026 · Design proposal

AI-Native Learning System

Adaptive Lesson Runtime

System design for a real-time adaptive teaching system that delivers personalized learning from structured ingredients.

Section 1
Framing — The AI Teacher
Not a chatbot beside static content. An AI that actively teaches from structured ingredients.
Core thesis

The AI doesn't assist — it teaches

A chatbot waits to be asked a question. An AI teacher actively delivers the learning — decides what to explain, which example to show, when to assess, when to remediate, and when to advance. It uses structured ingredients to adapt every interaction to this learner.

"The system is not a chatbot beside static content — it is an AI-native learning system where the AI is the teacher, not the assistant."
How it works
Structured ingredients → AI teacher (adapts in real time) → Learner profile (tracks + adjusts) → Next step (explain / assess / remediate)

Static content can't be personalized. The AI assembles each lesson interaction from ingredients, adapted to this learner's context, pace, and gaps.

What the system decides
  • How to explain this concept (style, depth, examples from learner's domain)
  • Which exercise to give and at what difficulty
  • How much support to provide (hints, scaffolding)
  • Whether to slow down or speed up · when to reinforce a topic later
"The real question is not how to generate explanations. It is how the system decides what to teach, when to assess, and when to advance."
Multi-modal delivery

The AI teacher doesn't just chat — it uses tools: slides it controls, code editors, interactive figures (step-through visualizations), charts, videos. Each ingredient specifies which tool type to use. The result feels like a live lesson, not a text conversation.

v0.0.0 — simplest possible version (chat + DB + loop)
  1. Prompt loads: learner profile + current lesson ingredients + objective
  2. AI teacher explains, then gives exercise
  3. Learner answers → checker evaluates
  4. API call → update profile in DB
  5. DB state becomes context for next prompt

One LLM, one DB table for profile, one loop. The DB state IS the memory. Each interaction overwrites relevant profile fields → those fields become context for the next turn.
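The v0 loop above can be sketched in a few lines. This is a minimal illustration, not the real system: `llm_complete` is a placeholder for any chat-completion API call, and a dict stands in for the profile table.

```python
# Minimal v0 sketch: one LLM, one profile "table", one loop.
# All names here are illustrative placeholders, not a real API.

profile_db = {"learner_1": {"background": "analyst", "weak_topics": []}}

def llm_complete(prompt: str) -> str:
    # Stand-in for a real LLM call (OpenAI, Anthropic, local model, ...).
    return f"[model output for prompt of {len(prompt)} chars]"

def check_answer(answer: str, expected: str) -> bool:
    # v0 checker: exact match; real checkers parse SQL and classify errors.
    return answer.strip().lower() == expected.strip().lower()

def turn(learner_id: str, lesson: dict, answer: str) -> dict:
    profile = profile_db[learner_id]                # DB state IS the memory
    prompt = (f"Profile: {profile}\n"               # prompt loads profile +
              f"Lesson: {lesson['objective']}")     # ingredients + objective
    _explanation = llm_complete(prompt)             # explain, then exercise
    correct = check_answer(answer, lesson["expected"])  # checker evaluates
    if not correct:
        profile["weak_topics"].append(lesson["topic"])  # update profile;
    return {"correct": correct, "profile": profile}     # next turn sees it
```

Each call to `turn` reads the profile, overwrites the relevant fields, and the mutated state becomes the context for the next prompt — exactly the "DB state is the memory" loop.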

"A v0 could ship as a chat agent with a DB loop in a week. But that is the demo, not the system. Structured ingredients and explicit validation are what make it reliable and improvable."
Section 2
Scope — Bounded Adaptive Lesson
One domain. Stable lesson sequence. Local adaptation inside each lesson.
"Given this framing as an adaptive teaching system, the scope for a bounded v1 centers on one SQL domain with local adaptation inside each lesson."
Bounded v1: Intro SQL
  • Mostly fixed lesson sequence (stable macro-path)
  • Local adaptation within each lesson (explanation, exercises, pacing)
  • Compact learner profile + reinforcement scheduling
  • Out: graph-wide replanning, large retrieval, full learner modeling, product shell redesign
"The lesson sequence stays mostly stable in v1, with local adaptation, rather than making the whole curriculum replan itself from day one."
Section 3
Users · FR · NFR
Three actors. Learning actions. Layered priorities.
"With scope bounded, the next step is defining who the system serves and what correct adaptive behavior actually looks like."
Users
  • Learner — main user
  • Expert — pedagogy review
  • Admin — traces, safety
Functional requirements (top 3)
  • Adapt explanation style, example choice, task difficulty per learner
  • Validate responses → qualify evidence strength → update compact profile
  • Schedule reinforcement based on topic stability + forgetting curve
"Functionality is described through learning actions, not generic pages."
Non-functional requirements (top 3)
  • Pedagogical correctness — don't advance on weak evidence
  • Graceful fallback — if adaptation fails, serve stable approved lesson block
  • Observability — what branch was chosen, why, what evidence
  • Runtime responsiveness · privacy/tenant safety · cost efficiency
NFR — what each one means concretely
Pedagogical correctness

Meaning: the system must not advance a learner who got the right answer only with heavy hints. Example: learner solves WHERE exercise but only after 3 hints → evidence = weak positive, NOT enough to advance. Require independent transfer check.

How to measure: premature advance rate — % of learners who advance but fail the next topic's prerequisite check. Target: < 10%.
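The premature-advance metric can be computed directly from advancement events. A sketch, assuming a hypothetical event list that records, per advancement, whether the next topic's prerequisite check passed:

```python
def premature_advance_rate(events: list[dict]) -> float:
    """Share of advancements followed by a failed prerequisite check
    on the next topic. Event schema is an illustrative assumption."""
    advances = [e for e in events if e["type"] == "advance"]
    if not advances:
        return 0.0
    premature = [e for e in advances if not e["next_prereq_passed"]]
    return len(premature) / len(advances)
```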

Graceful fallback

Meaning: if LLM adaptation fails (timeout, poor output), learner still gets a coherent next step. Example: contextual example generation fails → serve the base example from ingredients. Never silence, never "something went wrong."

How to measure: fallback rate — % of interactions that use fallback path. Target: < 5%. If higher → investigate LLM health.

Observability

Meaning: for every learner interaction, the team can answer: what branch was chosen, why, what evidence was used, how the profile changed. Without this, "it feels adaptive" can't be turned into "it IS adaptive."

How to measure: trace completeness — % of interactions with full decision trace. Target: 100% (hard requirement, not metric to optimize).

Layered NFR priorities
  • Learner runtime: availability + low latency + fallback
  • Profile updates / policy: correctness + consistency
  • Analytics: eventual consistency OK
  • Tenant boundaries: hard correctness

Different layers have different reliability contracts.

Section 4
Core Adaptive Loop
The heart. If you can't explain it as a bounded loop — the architecture is too vague.
"With that in place, the key question becomes: what is the adaptive loop that turns structured ingredients into a personalized learning experience?"
The bounded adaptive loop
  1. Identify current lesson objective
  2. Load lesson ingredients (with tool types)
  3. Inspect learner profile + recent performance
  4. Choose next action + select tool
  5. Deliver interaction (slides / code / chat / figure)
  6. Validate learner response
  7. Qualify evidence strength
  8. Update learner profile
  9. Schedule reinforcement if needed
  10. Trace decision path
v1 branch types: Explain · Assess · Remediate · Advance
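The branch decision itself can stay deterministic. A sketch of one possible policy over the profile — the thresholds and field names are illustrative assumptions, not the spec:

```python
def choose_branch(profile: dict) -> str:
    """Illustrative branch policy over the compact profile.
    Evidence strength comes from the validation/qualification step."""
    if profile.get("weak_topic_flags"):
        return "remediate"           # a known gap blocks everything else
    evidence = profile.get("last_evidence", "none")
    if evidence == "strong":
        return "advance"             # independent transfer succeeded
    if evidence in ("weak", "medium"):
        return "assess"              # need more evidence before advancing
    return "explain"                 # no evidence yet: teach first
```

Because the policy is a plain function of profile state, every branch decision is reproducible and traceable.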
"Stable lesson sequence + local adaptation: explanation style, task difficulty, support intensity, pacing, reinforcement."
Learner profile — compact v1

Three onboarding fields (context):

role_and_industry
goals_and_problems
background

Performance signals (updated each interaction):

current_lesson
correctness_history
speed_hesitation
hint_dependence
weak_topic_flags
next_reinforcement_due

Explicit profile, NOT chat history as truth. Three context fields + performance signals = enough for meaningful adaptation.

Validation → Evidence → Profile Update (3-step split)
1. Checker
  • Correctness
  • Syntax vs logic error
  • Misconception-linked?
  • Hint dependency
  • Transfer success
2. Evidence qualifier
  • Weak evidence
  • Medium evidence
  • Strong evidence
  • Negative evidence
3. Profile update
  • Increase confidence
  • Flag weak topic
  • Schedule reinforcement
  • Slow pacing
  • Advance / retry
"Validation checks correctness. The evidence update decides if it is strong enough to change the learner model."
System architecture diagram

Real-time learner path (left) + offline pipeline (right).

Learning system — runtime architecture.
Real-time path: Learner → API / LB → Runtime service → LLM adapter (explain, hints) → Validation service → Profile service → Trace writer → Response.
Offline path: Reinforcement scheduler · Analytics pipeline · Feedback → ingestion.
Storage: Postgres · Redis · S3 · LangSmith · CDN.
4 learner scenarios — how the loop behaves
A · Syntax error
Checker flags syntax_error → targeted correction via hint ladder → retry → repeated → micro-remediation
B · Logic error + misconception
ORDER BY instead of WHERE → logic_error + misconception ↑ → targeted remediation with contrast explanation (contextualized to learner's domain: travel destinations) → blocked until corrective evidence strong
C · Correct but hint-heavy
Success after 3 hints → weak evidence → NOT enough to advance → require transfer check (different context, same skill) → advance only on independent evidence
D · Fast learner
Repeatedly correct + passes transfer → compress path → advance earlier → schedule reinforcement via forgetting curve in 5 days
Direct lookup vs retrieval

v1: ingredients tightly linked to each lesson → direct lesson-linked lookup. No retrieval layer needed yet.

"Retrieval only when the asset pool grows large enough that branch quality depends on selecting among many candidates."
Section 5
Deterministic vs Model
Model must never own truth, permissions, or progression decisions.
"With the loop established, the control split becomes explicit: what stays deterministic, what the model helps with, and where the model must never decide."
Split
Deterministic
  • Validation logic + evidence thresholds
  • State-transition guards
  • Release blockers + tenant isolation
Model-based
  • Explanation style adaptation
  • Contextualization (learner's domain)
  • Hint wording + error response
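One way to sketch the deterministic side: a state-transition guard that commits progression. The model may propose a transition, but only the guard accepts it; the state machine and the evidence rule are illustrative assumptions:

```python
# Deterministic state machine — the model never owns this table.
ALLOWED = {
    "explaining": {"assessing"},
    "assessing": {"remediating", "assessing", "advanced"},
    "remediating": {"assessing"},
}

def guard_transition(state: str, proposed: str, evidence: str) -> str:
    """The model may propose; only the guard commits. Advancement
    additionally requires strong evidence (illustrative rule)."""
    if proposed not in ALLOWED.get(state, set()):
        return state                       # reject illegal transition
    if proposed == "advanced" and evidence != "strong":
        return state                       # never advance on weak evidence
    return proposed
```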
"Deterministic where correctness matters. Model where semantic flexibility creates leverage."
Observability — LangSmith vs OTel vs Prometheus
LangSmith / LangFuse
  • LLM-specific traces
  • Prompt → completion
  • Token cost · eval scoring

Debug model behavior, prompt iteration.

OpenTelemetry
  • Cross-service tracing
  • Span-level latency
  • Correlation IDs

E2E request tracing, SLA monitoring.

Prometheus + Grafana
  • Aggregate metrics
  • Dashboards + alerting
  • Historical trends

Ops dashboards, cost tracking.

"LangSmith for model quality, OTel for pipeline health, Prometheus for ops. They complement, not compete."
Section 6
Bottlenecks · Trade-offs · Failure
The system sounds adaptive but moves learners forward on weak evidence. That's the real failure.
Hardest failure

Coherent-looking but pedagogically wrong path

Learner appears to progress, system sounds adaptive — but mastery is overclaimed, misconceptions survive. Especially: learner passes direct tasks but fails transfer to new context.

"Much more dangerous than a visibly broken answer."
Main trade-off

Bounded inspectable adaptation > autonomy theater

  • Stronger control + simpler debugging
  • Safer progression + measurable evidence
  • Give up: surface magic, fully dynamic paths, broad autonomy
"Bounded, inspectable adaptation over autonomy theater."
7 bottlenecks (top 3)
  • Weak learner profile — wrong model = wrong decisions
  • Weak validation — can't validate = can't adapt
  • Premature advancement — looks done but isn't
  • Weak evidence thresholds · poor ingredients · over-remediation · latency/cost
Scale: addressing at volume
  • Validation: domain-specific checker configs; hybrid — deterministic for syntax, model for logic classification
  • Premature advance: transfer gate mandatory; delayed retention checks; confidence decay
  • Latency: pre-generate exercise variants; cache explanations; async trace writes
  • Cost: smaller model for routine; larger for ambiguous; per-interaction token budget
  • Fallback cascade: LLM timeout → cached variant → base ingredient → static content. Never silence
  • Circuit breaker: on LLM adapter — error rate > threshold → deterministic-only mode
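The fallback cascade and the circuit breaker can be sketched together. The thresholds, window size, and adapter interface are assumptions for illustration:

```python
class LLMCircuitBreaker:
    """Trips to deterministic-only mode when the LLM adapter's recent
    error rate exceeds a threshold. Threshold/window are illustrative."""
    def __init__(self, threshold: float = 0.5, window: int = 20):
        self.threshold, self.window, self.results = threshold, window, []

    def record(self, ok: bool) -> None:
        self.results = (self.results + [ok])[-self.window:]

    def open(self) -> bool:
        if len(self.results) < self.window:
            return False
        return self.results.count(False) / len(self.results) > self.threshold

def next_step(breaker, llm_call, cached, base_ingredient, static):
    """Cascade: LLM → cached variant → base ingredient → static content."""
    if not breaker.open():
        try:
            out = llm_call()
            breaker.record(True)
            return out
        except Exception:
            breaker.record(False)
    for fallback in (cached, base_ingredient, static):
        if fallback is not None:
            return fallback            # never silence
    return "Please continue with the current lesson."  # last resort
```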
Section 7
Metrics · Logs · Rollout
One headline metric. 4 quality layers. Learning ≠ engagement.
"Measurement focuses not on how fluent the system looks, but on whether learners actually master the material — and whether the system knows when they have not."
Headline metric
Time to stable mastery
4 quality layers
  • Runtime — latency · fallback rate · trace completeness
  • Adaptation — premature advance · over-remediation · right-next-step rate
  • Learning — mastery gain · transfer success · delayed retention
  • Engagement — continuation · return rate · hint usage patterns

Learning quality and engagement can diverge. Track both, optimize learning.

First 5 logs to tune from
  • Branch decisions: what did runtime choose, why, what evidence?
  • Validation outcomes: syntax vs logic vs misconception; hint dependency
  • Profile transitions: when confidence changes, when advancement, when later collapse
  • Reinforcement effectiveness · transfer failures = false mastery
"Tune through branch decisions, validation outcomes, profile transitions, and reinforcement effectiveness."
Feedback loop: runtime → ingestion
  • New error types: errors not matching any known pattern → flag for ingredient update
  • Weak hints: hint shown but error persists → ineffective, needs re-authoring
  • Missing misunderstandings: remediation fires for errors not in ingredient set → gap
  • Context performance: learners in "travel" domain outperform "quantum physics" → contextualization quality signal
"The runtime does not just consume ingredients — it generates the evidence to improve them."
Rollout — first milestone
  • One stable lesson sequence · one adaptive loop · direct lookup
  • Compact learner profile · bounded validation · reinforcement
  • Strong traces · strict quality gates
"One trustworthy adaptive lesson loop. Not a fully autonomous tutor — one bounded, measurable, improvable loop."
Section 8
Closing · Evolution · Vision
Strong close. Deepening hooks. Future vision.
Strong closing
"Start with one bounded adaptive loop: stable lesson sequence, structured ingredients, compact learner profile, explicit validation, strong traces. The AI teaches from ingredients, not from improvisation. Then evolve toward richer progression, retrieval, and more dynamic adaptation — once the bounded loop is proven and measurable."
Cycle 2 hook

Services, state, validators, storage, what stays simple in v1.

"The next level of detail covers the concrete system shape."
Cycle 3 hook

Orchestration, framework choices, representative flows, fallback, checker outputs.

"This can be made concrete: how learner behavior changes the next step."
Evolution roadmap
  • Phase 1: fixed lesson sequence, local adaptation, direct lookup, compact profile, bounded validators
  • Phase 2: graph-driven progression, richer profile, more branch types, retrieval when asset pool grows
  • Phase 3: dynamic path planning, cohort calibration, tenant-aware content, policy A/B
  • Fallback: always serve pre-authored content if adaptation fails. Never leave learner empty
  • Framework: start from loop; LangChain for utilities; LangGraph if staged loops + HITL
  • Storage: Postgres profiles + ingredients; S3 traces; Redis session cache; queue for async
Vision: neuroscience-informed learning compression
Time-to-mastery compression

The ultimate metric: how fast can this learner genuinely master this concept? Not by skipping — by optimizing delivery: cognitively optimal chunks, examples connected to existing mental models, interleaved practice at the right ratio, contrast examples at confusion points.

A background in neuroscience and behavioral research is directly relevant here: understanding how people process information and how to structure material for maximum retention.

Forgetting curve + spaced reinforcement

Track per-topic retention decay. Reinforce not by calendar but by predicted forgetting point. Fast learner on WHERE → review in 7 days. Struggling learner → 2 days.
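A minimal sketch of forgetting-point scheduling, using an exponential decay model as an illustrative assumption (a real predictor would be fit per topic and per learner):

```python
import math

def next_review_days(stability_days: float,
                     target_retention: float = 0.8) -> float:
    """Exponential forgetting model R(t) = exp(-t / stability):
    schedule review when predicted retention hits the target.
    Stability values below are illustrative, not calibrated."""
    return -stability_days * math.log(target_retention)

# Higher stability (strong, independent evidence) pushes review out;
# lower stability pulls it in — calendar-free, prediction-driven.
fast = next_review_days(stability_days=30)   # mastered WHERE quickly
slow = next_review_days(stability_days=9)    # struggled with WHERE
```

With these example stabilities the fast learner reviews in about a week and the struggling learner in about two days, matching the pacing described above.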

Soft remediation woven into future lessons

Instead of "go back and review" — weak spots reinforced naturally in upcoming lessons. Learner doesn't feel remediated — course flows naturally.

Agent-based testing

Synthetic learner agents walk through lessons before real learners. Expose weak hints, broken remediation, missing misconceptions. Combined with human review → continuous auto-improvement loop.

"The endgame: a system that teaches, measures, and improves its own teaching — with humans focusing only on the highest-value decisions."