The AI doesn't assist — it teaches
A chatbot waits to be asked a question. An AI teacher actively delivers the learning — decides what to explain, which example to show, when to assess, when to remediate, and when to advance. It uses structured ingredients to adapt every interaction to this learner.
(diagram labels: ingredients · adapts in real time · tracks + adjusts · explain / assess / remediate)
Static content can't be personalized. The AI assembles each lesson interaction from ingredients, adapted to this learner's context, pace, and gaps.
- How to explain this concept (style, depth, examples from learner's domain)
- Which exercise to give and at what difficulty
- How much support to provide (hints, scaffolding)
- Whether to slow down or speed up · when to reinforce a topic later
The AI teacher doesn't just chat — it uses tools: slides it controls, code editors, interactive figures (step-through visualizations), charts, videos. Each ingredient specifies which tool type to use. The result feels like a live lesson, not a text conversation.
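The ingredient idea above can be sketched as a small schema. This is an illustrative assumption, not the actual data model: field names, the `kind`/`tool_type` vocabularies, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of one "ingredient": the structured unit the AI
# assembles lessons from. Every field name here is illustrative.
@dataclass
class Ingredient:
    concept_id: str        # which concept this ingredient teaches
    kind: str              # e.g. "explanation" | "exercise" | "hint" | "remediation"
    tool_type: str         # e.g. "slide" | "code_editor" | "interactive_figure" | "video"
    base_content: str      # approved static content, doubles as the fallback
    difficulty: int = 1    # 1..5, chosen per learner
    domain_tags: list = field(default_factory=list)  # for contextualized examples

sql_where = Ingredient(
    concept_id="sql.where",
    kind="exercise",
    tool_type="code_editor",
    base_content="Filter the orders table to rows where total > 100.",
    difficulty=2,
    domain_tags=["e-commerce"],
)
```

Note that `base_content` serves double duty: it is both the authored default and the thing the fallback path serves when adaptation fails.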
One LLM, one DB table for profile, one loop. The DB state IS the memory. Each interaction overwrites relevant profile fields → those fields become context for the next turn.
- Mostly fixed lesson sequence (stable macro-path)
- Local adaptation within each lesson (explanation, exercises, pacing)
- Compact learner profile + reinforcement scheduling
- Out: graph-wide replanning, large retrieval, full learner modeling, product shell redesign
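The "one LLM, one DB table, one loop" shape can be sketched in a few lines. All helper logic below is simplified for illustration; the point is the data flow: each turn overwrites profile fields, and the overwritten row is the context for the next turn.

```python
# Minimal sketch of the single adaptive loop. The learner profile is one DB
# row (modeled here as a dict). Checker and update rules are toy versions.

def validate(answer, expected):
    # toy checker: the real version classifies syntax vs logic vs misconception
    return "strong" if answer.strip() == expected else "negative"

def update_profile(profile, topic, evidence):
    conf = profile["confidence"].get(topic, 0.0)
    delta = {"strong": 0.3, "medium": 0.15, "weak": 0.05, "negative": -0.2}[evidence]
    profile["confidence"][topic] = max(0.0, min(1.0, conf + delta))
    return profile

def lesson_turn(profile, topic, answer, expected):
    evidence = validate(answer, expected)
    profile = update_profile(profile, topic, evidence)
    # next-ingredient selection would read the freshly overwritten fields here;
    # persisting the row is what makes "DB state IS the memory" literal
    return profile

profile = {"learner_id": "L1", "confidence": {}}
profile = lesson_turn(profile, "sql.where",
                      "SELECT * FROM t WHERE x > 1",
                      "SELECT * FROM t WHERE x > 1")
```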
(diagram labels: main user · pedagogy review · traces, safety)
- Adapt explanation style, example choice, task difficulty per learner
- Validate responses → qualify evidence strength → update compact profile
- Schedule reinforcement based on topic stability + forgetting curve
- Pedagogical correctness — don't advance on weak evidence
- Graceful fallback — if adaptation fails, serve stable approved lesson block
- Observability — what branch was chosen, why, what evidence
- Runtime responsiveness · privacy/tenant safety · cost efficiency
Meaning: the system must not advance a learner who got the right answer only with heavy hints. Example: learner solves WHERE exercise but only after 3 hints → evidence = weak positive, NOT enough to advance. Require independent transfer check.
How to measure: premature advance rate — % of learners who advance but fail the next topic's prerequisite check. Target: < 10%.
Meaning: if LLM adaptation fails (timeout, poor output), learner still gets a coherent next step. Example: contextual example generation fails → serve the base example from ingredients. Never silence, never "something went wrong."
How to measure: fallback rate — % of interactions that use fallback path. Target: < 5%. If higher → investigate LLM health.
Meaning: for every learner interaction, the team can answer: what branch was chosen, why, what evidence was used, how the profile changed. Without this, "it feels adaptive" can't be turned into "it IS adaptive."
How to measure: trace completeness — % of interactions with full decision trace. Target: 100% (hard requirement, not metric to optimize).
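The three gate metrics above can be computed directly from decision traces. A minimal sketch, assuming each interaction is recorded as a dict; the field names (`passed_next_prereq`, `used_fallback`, etc.) are assumptions, not an actual trace schema.

```python
# Sketch: the three gate metrics from interaction traces (field names assumed).

def premature_advance_rate(advancements):
    # advanced learners who then failed the next topic's prerequisite check
    if not advancements:
        return 0.0
    failed = sum(1 for a in advancements if not a["passed_next_prereq"])
    return failed / len(advancements)          # target: < 0.10

def fallback_rate(interactions):
    if not interactions:
        return 0.0
    used = sum(1 for i in interactions if i["used_fallback"])
    return used / len(interactions)            # target: < 0.05

def trace_completeness(interactions):
    # hard requirement: every interaction carries the full decision trace
    required = {"branch", "reason", "evidence", "profile_delta"}
    full = sum(1 for i in interactions if required <= i.keys())
    return full / len(interactions)            # target: 1.0, always
```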
- Learner runtime: availability + low latency + fallback
- Profile updates / policy: correctness + consistency
- Analytics: eventual consistency OK
- Tenant boundaries: hard correctness
Different layers have different reliability contracts.
Three onboarding fields (context):
Performance signals (updated each interaction):
Explicit profile, NOT chat history as truth. Three context fields + performance signals = enough for meaningful adaptation.
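The source fixes the profile's shape (three onboarding context fields plus per-topic performance signals) but not the exact field names; the names below are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical compact learner profile: one row, explicitly typed fields,
# NOT chat history. All field names are assumptions.
@dataclass
class LearnerProfile:
    learner_id: str
    # onboarding context (three fields; names illustrative)
    domain: str = ""       # learner's field, drives example contextualization
    goal: str = ""
    experience: str = ""
    # performance signals, overwritten each interaction
    confidence: dict = field(default_factory=dict)    # topic -> 0..1
    weak_topics: set = field(default_factory=set)
    pace: str = "normal"                              # "slow" | "normal" | "fast"
    reinforce_at: dict = field(default_factory=dict)  # topic -> next review due
```

Keeping the profile this small is deliberate: every field is something the next turn's prompt can actually consume.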
1. Checker
- Correctness
- Syntax vs logic error
- Misconception-linked?
- Hint dependency
- Transfer success
2. Evidence qualifier
- Weak evidence
- Medium evidence
- Strong evidence
- Negative evidence
3. Profile update
- Increase confidence
- Flag weak topic
- Schedule reinforcement
- Slow pacing
- Advance / retry
Real-time learner path (left) + offline pipeline (right).
v1: ingredients tightly linked to each lesson → direct lesson-linked lookup. No retrieval layer needed yet.
Deterministic
- Validation logic + evidence thresholds
- State-transition guards
- Release blockers + tenant isolation
Model-based
- Explanation style adaptation
- Contextualization (learner's domain)
- Hint wording + error response
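The deterministic/model split can be sketched for the SQL example used elsewhere in this document: a real in-memory database settles syntax and result correctness exactly, and only valid-but-wrong answers escalate to a model-based classifier. The table, rows, and `classify_with_llm` placeholder are illustrative assumptions.

```python
import sqlite3

def check_sql(answer, expected_rows):
    # deterministic layer: execute against a tiny in-memory fixture
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 150.0)])
    try:
        rows = conn.execute(answer).fetchall()
    except sqlite3.Error as e:
        return {"kind": "syntax_error", "detail": str(e)}   # exact, no LLM
    if rows == expected_rows:
        return {"kind": "correct"}
    # valid syntax, wrong result -> model-based logic classification
    return {"kind": "logic_error", "label": classify_with_llm(answer, rows)}

def classify_with_llm(answer, rows):
    # placeholder for the model-based step (misconception labeling)
    return "unclassified"
```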
LangSmith / LangFuse
- LLM-specific traces
- Prompt → completion
- Token cost · eval scoring
Debug model behavior, prompt iteration.
OpenTelemetry
- Cross-service tracing
- Span-level latency
- Correlation IDs
E2E request tracing, SLA monitoring.
Prometheus + Grafana
- Aggregate metrics
- Dashboards + alerting
- Historical trends
Ops dashboards, cost tracking.
Coherent-looking but pedagogically wrong path
Learner appears to progress, system sounds adaptive — but mastery is overclaimed, misconceptions survive. Especially: learner passes direct tasks but fails transfer to new context.
Bounded inspectable adaptation > autonomy theater
- Stronger control + simpler debugging
- Safer progression + measurable evidence
- Give up: surface magic, fully dynamic paths, broad autonomy
- Weak learner profile — wrong model = wrong decisions
- Weak validation — can't validate = can't adapt
- Premature advancement — looks done but isn't
- Weak evidence thresholds · poor ingredients · over-remediation · latency/cost
- Validation: domain-specific checker configs; hybrid — deterministic for syntax, model for logic classification
- Premature advance: transfer gate mandatory; delayed retention checks; confidence decay
- Latency: pre-generate exercise variants; cache explanations; async trace writes
- Cost: smaller model for routine; larger for ambiguous; per-interaction token budget
- Fallback cascade: LLM timeout → cached variant → base ingredient → static content. Never silence
- Circuit breaker: on LLM adapter — error rate > threshold → deterministic-only mode
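The last two mitigations compose naturally: the circuit breaker decides whether the LLM is even attempted, and the cascade guarantees a non-empty answer either way. A sketch with illustrative thresholds and helper names:

```python
import time

# Simple circuit breaker on the LLM adapter. threshold/cooldown values
# are illustrative assumptions.
class LLMBreaker:
    def __init__(self, threshold=3, cooldown=60.0):
        self.failures, self.threshold, self.cooldown = 0, threshold, cooldown
        self.opened_at = 0.0

    def is_open(self):                      # True = deterministic-only mode
        if self.failures < self.threshold:
            return False
        if time.monotonic() - self.opened_at > self.cooldown:
            self.failures = 0               # half-open: allow one retry
            return False
        return True

    def record(self, ok):
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            self.opened_at = time.monotonic()

def next_content(breaker, llm_adapt, cached_variant, base_ingredient, static_content):
    # cascade: LLM -> cached variant -> base ingredient -> static. Never empty.
    if not breaker.is_open():
        try:
            out = llm_adapt()
            breaker.record(True)
            return out
        except Exception:
            breaker.record(False)
    return cached_variant or base_ingredient or static_content
```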
Learning quality and engagement can diverge. Track both, optimize learning.
- Branch decisions: what did runtime choose, why, what evidence?
- Validation outcomes: syntax vs logic vs misconception; hint dependency
- Profile transitions: when confidence changes, when advancement, when later collapse
- Reinforcement effectiveness · transfer failures = false mastery
- New error types: errors not matching any known pattern → flag for ingredient update
- Weak hints: hint shown but error persists → ineffective, needs re-authoring
- Missing misconceptions: remediation fires for errors not in ingredient set → gap
- Context performance: learners in "travel" domain outperform "quantum physics" → contextualization quality signal
- One stable lesson sequence · one adaptive loop · direct lookup
- Compact learner profile · bounded validation · reinforcement
- Strong traces · strict quality gates
"Start with one bounded adaptive loop: stable lesson sequence, structured ingredients, compact learner profile, explicit validation, strong traces. The AI teaches from ingredients, not from improvisation. Then evolve toward richer progression, retrieval, and more dynamic adaptation — once the bounded loop is proven and measurable."
Services, state, validators, storage, what stays simple in v1.
Orchestration, framework choices, representative flows, fallback, checker outputs.
- Phase 1: fixed lesson sequence, local adaptation, direct lookup, compact profile, bounded validators
- Phase 2: graph-driven progression, richer profile, more branch types, retrieval when asset pool grows
- Phase 3: dynamic path planning, cohort calibration, tenant-aware content, policy A/B
- Fallback: always serve pre-authored content if adaptation fails. Never leave learner empty
- Framework: start from loop; LangChain for utilities; LangGraph if staged loops + HITL
- Storage: Postgres profiles + ingredients; S3 traces; Redis session cache; queue for async
The ultimate metric: how fast can this learner genuinely master this concept? Not by skipping — by optimizing delivery: cognitively optimal chunks, examples connected to existing mental models, interleaved practice at the right ratio, contrast examples at confusion points.
Neuroscience and behavioral research background is directly relevant — understanding how people process information and how to structure material for maximum retention.
Track per-topic retention decay. Reinforce not by calendar but by predicted forgetting point. Fast learner on WHERE → review in 7 days. Struggling learner → 2 days.
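A forgetting-point scheduler can be sketched with a simple exponential-decay model. The model and constants below are illustrative assumptions, chosen only to reproduce the shape of the source's example (a stable topic reviewed around day 7, a fragile one around day 2).

```python
import math

def next_review_days(stability_days, recall_threshold=0.8):
    # assume recall(t) = exp(-t / stability); solve recall(t) = threshold
    # -> review just before predicted recall drops below the threshold
    return -stability_days * math.log(recall_threshold)

def updated_stability(stability_days, evidence):
    # stability grows with strong evidence, shrinks on failure (toy factors)
    growth = {"strong": 2.0, "medium": 1.3, "weak": 1.0, "negative": 0.5}
    return max(0.5, stability_days * growth[evidence])

# fast learner on WHERE (high stability) vs struggling learner (low stability)
fast_interval = next_review_days(31.0)   # ≈ 7 days
slow_interval = next_review_days(9.0)    # ≈ 2 days
```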
Instead of "go back and review" — weak spots reinforced naturally in upcoming lessons. Learner doesn't feel remediated — course flows naturally.
Synthetic learner agents walk through lessons before real learners. Expose weak hints, broken remediation, missing misconceptions. Combined with human review → continuous auto-improvement loop.