Run the dual-stage pipeline that powers this project: lexical alignment, semantic reranking, and rich evidence surfaces for every teacher + student pair.
Pipeline snapshot
Anchor extraction, edit distance, and semantic overlap establish a safe lexical floor before reranking.
Concept clustering and drift scoring then highlight where the student summary diverges; both stages are sketched below.
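To make the two stages concrete, here is a minimal sketch in the spirit of scorer.js. The `embed` callback, the 50/50 floor blend, and the 0.3/0.7 rerank weights are illustrative assumptions, not the project's tuned values.

```js
// Stage 1: lexical floor from edit distance + token overlap.
function editSimilarity(a, b) {
  const m = a.length, n = b.length;
  const d = Array.from({ length: m + 1 }, (_, i) =>
    Array.from({ length: n + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                  // deletion
        d[i][j - 1] + 1,                                  // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return 1 - d[m][n] / Math.max(m, n, 1); // 1 = identical strings
}

function tokenOverlap(teacher, student) {
  const A = new Set(teacher.toLowerCase().split(/\W+/).filter(Boolean));
  const B = new Set(student.toLowerCase().split(/\W+/).filter(Boolean));
  const shared = [...A].filter((t) => B.has(t)).length;
  return shared / Math.max(A.size, 1); // fraction of teacher tokens echoed
}

function cosine(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] ** 2;
    nv += v[i] ** 2;
  }
  return dot / (Math.sqrt(nu) * Math.sqrt(nv) || 1);
}

// Stage 2: semantic rerank can only lift the score above the lexical floor.
async function score(teacher, student, embed) {
  const floor = 0.5 * editSimilarity(teacher, student) +
                0.5 * tokenOverlap(teacher, student);
  const [t, s] = await Promise.all([embed(teacher), embed(student)]);
  return Math.max(floor, 0.3 * floor + 0.7 * cosine(t, s));
}
```

Taking the max with the floor is what makes it "safe": semantic reranking can raise a score but never pull it below the lexical evidence.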
What the live demo shows
Everything renders in-browser using scorer.js, the explainability modules, and the temporal drift analysis.
Live evaluation workspace
Paste a teacher reference and a student response; the dual-stage pipeline runs and surfaces explainability artifacts instantly. 💡 Temporal Analysis: submit multiple answers to see a learning trajectory, improvement scores, and consistency metrics (sketched below).
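One plausible reading of those temporal metrics, with assumed definitions (least-squares slope for improvement, 1 minus standard deviation for consistency); the demo's exact formulas may differ:

```js
// Illustrative temporal metrics over successive attempts.
// `scores` is one number per attempt, assumed in [0, 1].
function temporalMetrics(scores) {
  const n = scores.length;
  const mean = scores.reduce((a, b) => a + b, 0) / n;

  // Improvement: least-squares slope of score vs. attempt index.
  const xMean = (n - 1) / 2;
  let num = 0, den = 0;
  scores.forEach((y, x) => {
    num += (x - xMean) * (y - mean);
    den += (x - xMean) ** 2;
  });
  const improvement = den ? num / den : 0;

  // Consistency: 1 minus the standard deviation of the attempts.
  const variance = scores.reduce((a, y) => a + (y - mean) ** 2, 0) / n;
  const consistency = 1 - Math.sqrt(variance);

  return { trajectory: scores, improvement, consistency };
}

// Example: three attempts trending upward.
console.log(temporalMetrics([0.42, 0.55, 0.71]));
// → improvement ≈ 0.145 per attempt, consistency ≈ 0.88
```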
Fill in the form to see explainable results.
Green = positive impact, red = negative impact, compared to the average.
This chart stacks each feature's contribution to show how the final score is built; a sketch of the math follows.
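A sketch of how such a stacked breakdown could be computed; the feature weights and per-feature averages below are placeholders, not values from the demo:

```js
// Illustrative contribution breakdown for the stacked chart.
const WEIGHTS = { semantic: 0.5, lexical: 0.3, coverage: 0.2 };   // assumed
const BASELINE = { semantic: 0.6, lexical: 0.5, coverage: 0.55 }; // assumed averages

function contributions(features) {
  return Object.entries(WEIGHTS).map(([name, w]) => ({
    name,
    // Positive (green) when the response beats the average, negative (red) otherwise.
    impact: w * (features[name] - BASELINE[name]),
  }));
}

const parts = contributions({ semantic: 0.8, lexical: 0.4, coverage: 0.7 });
const total = parts.reduce((a, p) => a + p.impact, 0);
console.log(parts, 'net vs. average:', total.toFixed(3)); // → 0.100
```

Under this linear model the impacts sum to the response's net distance from the average score, which is exactly what the stacked bars visualize.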
Each bar shows how strongly the student response covers that metric.
Normalized snapshot of semantic, lexical, and coverage signals.
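For illustration, a minimal min-max normalization that would put the three signals on a shared 0 to 1 scale; the ranges are assumptions:

```js
// Assumed raw ranges per signal; cosine similarity can be negative.
const RANGES = { semantic: [-1, 1], lexical: [0, 1], coverage: [0, 1] };

function normalizeSignals(raw) {
  const out = {};
  for (const [name, [lo, hi]] of Object.entries(RANGES)) {
    // Min-max scale, clamped to [0, 1].
    out[name] = Math.min(1, Math.max(0, (raw[name] - lo) / (hi - lo)));
  }
  return out;
}

console.log(normalizeSignals({ semantic: 0.42, lexical: 0.37, coverage: 0.9 }));
// → { semantic: 0.71, lexical: 0.37, coverage: 0.9 }
```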
Bright blocks surface the most helpful or harmful sentences.
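One common way to compute such sentence-level highlights is leave-one-out ablation; this sketch assumes an end-to-end `scoreFn` like the one above and is not necessarily how the demo attributes impact:

```js
// Re-score the response with each sentence removed; the drop (or gain)
// is that sentence's brightness in the heatmap.
async function sentenceImpacts(teacher, student, scoreFn) {
  const sentences = student.split(/(?<=[.!?])\s+/).filter(Boolean);
  const full = await scoreFn(teacher, student);
  const impacts = [];
  for (let i = 0; i < sentences.length; i++) {
    const ablated = sentences.filter((_, j) => j !== i).join(' ');
    const without = await scoreFn(teacher, ablated);
    // Positive = helpful (score falls without it), negative = harmful.
    impacts.push({ sentence: sentences[i], impact: full - without });
  }
  return impacts;
}
```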
Alignment of Teacher Script (X: 100 parts) vs. Student Summary (Y: 10 parts).
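A sketch of how that grid could be filled, assuming simple word-count chunking and a pluggable `similarity` function (token overlap or embedding cosine both fit):

```js
// Split text into roughly `parts` equal word-count chunks.
function chunk(text, parts) {
  const words = text.split(/\s+/).filter(Boolean);
  const size = Math.ceil(words.length / parts) || 1;
  const out = [];
  for (let i = 0; i < words.length; i += size) {
    out.push(words.slice(i, i + size).join(' '));
  }
  return out;
}

// 10 x 100 similarity matrix matching the chart's axes.
function alignmentMatrix(teacher, student, similarity) {
  const xs = chunk(teacher, 100); // X axis: teacher script
  const ys = chunk(student, 10);  // Y axis: student summary
  return ys.map((y) => xs.map((x) => similarity(x, y)));
}
```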
Bulk evaluation
Automate grading for an entire cohort with a CSV upload, or pair a long transcript with multiple student summaries via our XLSX workflow.
Upload a CSV file containing columns like `question`, `reference_answer`, and `student_answer` for bulk processing (a row-by-row sketch follows below).
Drop CSV here or browse
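The bulk flow reduces to scoring one row at a time. This sketch uses a naive comma split (no quoted fields) purely for illustration, and `scoreFn` stands in for the pairwise scorer:

```js
// One scored result per student answer in the CSV.
async function gradeCsv(csvText, scoreFn) {
  const [headerLine, ...rows] = csvText.trim().split(/\r?\n/);
  const headers = headerLine.split(',').map((h) => h.trim());
  const col = (name) => headers.indexOf(name);

  const results = [];
  for (const row of rows) {
    const cells = row.split(',');
    results.push({
      question: cells[col('question')],
      score: await scoreFn(cells[col('reference_answer')],
                           cells[col('student_answer')]),
    });
  }
  return results;
}
```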
Grade multiple student summaries (.xlsx) against a long Meet transcript (.docx); a loading sketch follows the upload controls.
No file selected yet
No file selected yet
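If you are curious how the two files could be read in-browser, here is a sketch assuming the SheetJS (`xlsx`) and `mammoth` libraries; the demo's actual loaders may differ:

```js
import * as XLSX from 'xlsx';
import mammoth from 'mammoth';

async function loadPair(xlsxFile, docxFile) {
  // Student summaries: first sheet, one row per student.
  const data = new Uint8Array(await xlsxFile.arrayBuffer());
  const wb = XLSX.read(data, { type: 'array' });
  const summaries = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]]);

  // Teacher transcript: plain text extracted from the .docx.
  const { value: transcript } = await mammoth.extractRawText({
    arrayBuffer: await docxFile.arrayBuffer(),
  });

  return { transcript, summaries };
}
```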