Intelligence isn't what you know. It's whether you know that you know.
Independent AI researcher. Cognitive learning systems builder.
I study how humans understand things — and build tools that measure it.
It started with a question no textbook thought to ask.
Karachi, 2024
Why do students fail even when they study? Why does knowing the answer ≠ understanding it?
The gap between these two values is what I research.
15 research phases later, HCMS was born. A DOI-backed preprint. A formal framework. At 16.
I don't build AI to follow a roadmap.
I build it because the problem is real and someone has to go first.
The Wrong Question
Data ≠ Understanding
A model trained on 10 million examples doesn't understand any of them.
Prediction ≠ Understanding
Getting the right answer doesn't mean knowing why it's right.
Accuracy ≠ Understanding
97% accuracy can coexist with 0% cognitive stability.
Understanding = Confidence Calibration + Reasoning Consistency + Cognitive Stability
This is what HCMS measures.
Read the Research →
Human Cognition Measurement System
"Beyond Correctness: Measuring Cognitive Stability and Confidence Calibration in Human Understanding"
Shahid, M.R. (2026). Zenodo.
DOI: 10.5281/zenodo.18269740
Every test you've ever taken assumed correctness equals understanding. HCMS proves it doesn't. Across 15 structured research phases, HCMS models the gap between getting something right and truly knowing it — measuring confidence calibration, reasoning consistency, and cognitive stability under pressure. This is what assessment looks like when the question matters more than the answer.
At 16, that framework became a DOI-backed preprint. The research isn't finished — it's just begun.
Confidence Calibration
Measures the gap between how confident a learner claims to be and how accurately they actually perform. Overconfidence with low accuracy is the most dangerous cognitive state — it blocks the self-awareness needed to improve.
Calibration gap visualization
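A minimal sketch of the calibration gap described above, assuming confidence is self-reported on a 0 to 1 scale before results are shown. The `Response` type and `calibration_gap` function are hypothetical names used for illustration; this is not the HCMS implementation.

```python
from dataclasses import dataclass


@dataclass
class Response:
    confidence: float  # self-reported certainty in [0, 1] (assumed scale)
    correct: bool      # whether the answer was actually right


def calibration_gap(responses):
    """Return mean confidence minus accuracy.

    Positive values indicate overconfidence, negative values
    indicate underconfidence, and zero means perfectly calibrated.
    """
    if not responses:
        raise ValueError("need at least one response")
    mean_confidence = sum(r.confidence for r in responses) / len(responses)
    accuracy = sum(r.correct for r in responses) / len(responses)
    return mean_confidence - accuracy


# A learner who is very sure of themselves but right only half the time.
sample = [
    Response(0.9, False),
    Response(0.9, True),
    Response(0.8, False),
    Response(1.0, True),
]
print(round(calibration_gap(sample), 2))  # 0.4
```

A single signed number keeps the direction of the error visible: the same learner with the same accuracy but low stated confidence would produce a negative gap instead.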
Research Contributions
Introduces cognitive stability as a measurable dimension beyond correctness
Demonstrates that confidence–accuracy misalignment predicts reasoning degradation
Provides a diagnostic framework rather than a predictive scoring model
Produces interpretable, reproducible signals for education and cognitive research
Includes sub-systems: Cognitive Robustness Benchmark, Learning Analytics Engine, Confidence Calibration Module
Three Laws of Understanding
Law I "Understanding requires more than correctness."
Law II "Confidence without calibration is noise."
Law III "Intelligence that cannot explain itself is incomplete."
This framework is open-source and citable.
Shahid, M.R. (2026). Beyond Correctness: Measuring Cognitive Stability and Confidence Calibration in Human Understanding.
Zenodo. DOI: 10.5281/zenodo.18269740
Featured Projects
23 repositories. 7 deployed systems. 1 published preprint. Here are the ones worth your attention.
UnderstandIQ
Cognitive assessment engine that measures whether learners truly understand — not just whether they answered correctly. Confidence calibration, misconception detection, cognitive archetypes from a live AI system.
COGNITIVE LEARNING SYSTEMS · EDTECH AI · ASSESSMENT INFRASTRUCTURE
Your assessment system
measures recall.
Mine measures understanding.
I've spent a year researching the gap between getting an answer right and truly understanding it. Now I build that insight into real EdTech products.
Every quiz, every AI tutor, every LMS right now:
"Did they get it right?"
Correctness. Binary. Easy to fake.
What your platform could be asking:
"Do they know that they got it right?"
Calibration. Depth. Impossible to fake.
That gap — between correctness and understanding — is what I build systems to measure.
What I build for EdTech platforms
Confidence-Aware Assessment
I add the confidence calibration layer your quiz system doesn't have. Before learners see results, they rate how certain they are. The gap between confidence and accuracy is your most valuable pedagogical signal — and no standard platform captures it.
What you get
- ✦Confidence rating per question (before results)
- ✦Calibration gap analysis across topics
- ✦Overconfidence and underconfidence detection
- ✦Learner cognitive archetype profiling
- ✦Misconception pattern identification
Built for: Assessment platforms, AI tutors, adaptive learning systems
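One way the per-topic gap analysis and overconfidence/underconfidence detection could look, as a rough sketch. The record layout, the `topic_calibration` name, and the 0.15 threshold are all assumptions chosen for illustration, not values taken from a production system.

```python
from collections import defaultdict


def topic_calibration(records, threshold=0.15):
    """Group (topic, confidence, correct) records by topic and label each topic.

    threshold is an assumed cut-off: a gap beyond it in either
    direction is flagged as over- or underconfidence.
    """
    by_topic = defaultdict(list)
    for topic, confidence, correct in records:
        by_topic[topic].append((confidence, correct))

    report = {}
    for topic, rows in by_topic.items():
        mean_confidence = sum(c for c, _ in rows) / len(rows)
        accuracy = sum(1 for _, ok in rows if ok) / len(rows)
        gap = mean_confidence - accuracy
        if gap > threshold:
            label = "overconfident"
        elif gap < -threshold:
            label = "underconfident"
        else:
            label = "calibrated"
        report[topic] = {
            "accuracy": accuracy,
            "confidence": mean_confidence,
            "gap": gap,
            "label": label,
        }
    return report


records = [
    ("fractions", 0.9, False), ("fractions", 0.9, True),
    ("algebra", 0.3, True), ("algebra", 0.5, True),
    ("geometry", 0.5, True), ("geometry", 0.5, False),
]
for topic, row in topic_calibration(records).items():
    print(topic, row["label"])
```

In practice the threshold would need tuning against real learner data; a fixed value is used here only to keep the example short.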
→ Talk about your platform
Document-to-Learning Systems
Upload any content — PDFs, notes, lectures, research papers. The system generates adaptive assessments that probe surface recall, conceptual understanding, and applied reasoning. Not just MCQs. Four question types. Confidence capture. Cognitive feedback.
What you get
- ✦Multi-type question generation (MCQ, Short Answer, Application, Explain-It)
- ✦Depth-level targeting (recall vs. conceptual vs. applied)
- ✦Per-topic performance breakdown
- ✦AI-generated study recommendations
- ✦Downloadable learner reports
Built for: Course creators, bootcamps, corporate training, EdTech platforms
→ Talk about your platform
Learner Analytics & Insight Dashboards
Turn raw quiz data into decisions. I build dashboards that show not just who failed, but why — which topics are misunderstood, where confidence diverges from reality, and which learners need intervention now.
What you get
- ✦Topic-level accuracy and confidence heatmaps
- ✦Weak-area detection per learner and cohort
- ✦Misconception clustering across a student group
- ✦Progress tracking over time
- ✦Exportable data and reports
Built for: Tutoring platforms, schools, online academies, LMS builders
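Weak-area detection per learner and per cohort can be sketched with plain counting. The `weak_areas` function, the attempt tuple layout, and the 0.6 accuracy floor are hypothetical, picked only to make the idea concrete.

```python
from collections import defaultdict


def weak_areas(attempts, min_accuracy=0.6):
    """Flag topics whose accuracy falls below min_accuracy.

    attempts: iterable of (learner, topic, correct).
    Returns (per_learner, cohort_weak), where per_learner maps each
    learner to a sorted list of weak topics and cohort_weak is the
    sorted list of topics that are weak across the whole group.
    """
    cell = defaultdict(lambda: [0, 0])    # (learner, topic) -> [right, total]
    cohort = defaultdict(lambda: [0, 0])  # topic -> [right, total]
    for learner, topic, correct in attempts:
        for bucket in (cell[(learner, topic)], cohort[topic]):
            bucket[0] += int(correct)
            bucket[1] += 1

    per_learner = defaultdict(list)
    for (learner, topic), (right, total) in cell.items():
        if right / total < min_accuracy:
            per_learner[learner].append(topic)

    cohort_weak = sorted(
        topic for topic, (right, total) in cohort.items()
        if right / total < min_accuracy
    )
    return {name: sorted(topics) for name, topics in per_learner.items()}, cohort_weak


attempts = [
    ("ana", "fractions", False), ("ana", "fractions", False),
    ("ana", "algebra", True),
    ("ben", "fractions", True), ("ben", "fractions", False),
    ("ben", "algebra", True),
]
per_learner, cohort_weak = weak_areas(attempts)
print(per_learner)   # {'ana': ['fractions'], 'ben': ['fractions']}
print(cohort_weak)   # ['fractions']
```

Separating the learner view from the cohort view is what lets a dashboard distinguish one student's gap from a topic the whole group has misunderstood.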
→ Talk about your platform
Misconception Detection
A student scoring 80% with a specific misconception in the remaining 20% is more at risk than a student scoring 60% who knows exactly where their gaps are. I build systems that find the misconception, name it, and generate targeted remediation — not generic 'try again' feedback.
What you get
- ✦Rule-based and AI-powered misconception identification
- ✦Named misconception patterns per topic
- ✦Confidence-weighted wrong-answer analysis
- ✦Targeted remediation suggestions per learner
- ✦Integration with existing assessment pipelines
Built for: Adaptive learning platforms, AI tutors, test prep companies
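To illustrate rule-based identification combined with confidence-weighted wrong-answer analysis, here is a small sketch. The rule table, question IDs, misconception names, and the `rank_misconceptions` function are invented for the example; a real system would derive its rules from the topic's known error patterns.

```python
from collections import defaultdict

# Hypothetical rule table: (question_id, chosen_option) -> named misconception.
RULES = {
    ("q1", "B"): "adds denominators when adding fractions",
    ("q2", "C"): "adds denominators when adding fractions",
    ("q3", "A"): "treats negative exponents as negative numbers",
}


def rank_misconceptions(answers, rules=RULES):
    """Rank named misconceptions by total confidence behind them.

    answers: iterable of (question_id, chosen_option, confidence).
    Only answers that match a rule are counted, and each one
    contributes its stated confidence as weight, so a confidently
    held wrong idea outranks a hesitant guess.
    """
    scores = defaultdict(float)
    for question_id, option, confidence in answers:
        name = rules.get((question_id, option))
        if name is not None:
            scores[name] += confidence
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


answers = [
    ("q1", "B", 0.9),  # wrong, matches the fractions rule, high confidence
    ("q2", "C", 0.8),  # wrong, same misconception again
    ("q3", "A", 0.2),  # wrong, but the learner was unsure
    ("q1", "A", 1.0),  # no rule for this option, so it is ignored
]
for name, weight in rank_misconceptions(answers):
    print(f"{weight:.1f}  {name}")
```

Weighting by confidence is the design choice that matters here: two wrong answers given with high certainty point to a stable misconception, which is a stronger signal for remediation than the raw count of errors.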
→ Talk about your platform
Research Collaboration
If you're a researcher, academic, or R&D team working on learning systems, cognitive measurement, or AI assessment — I'm not a contractor. I'm a potential collaborator. I bring HCMS, experimental design experience, and a genuine obsession with the problem.
What you get
- ✦Joint experimental design on assessment frameworks
- ✦Literature synthesis and implementation from papers
- ✦Cognitive measurement instrument design
- ✦Statistical analysis and validation
- ✦Co-authorship where work is genuinely joint
Built for: University labs, cognitive science researchers, EdTech R&D teams
→ Talk about your platform
Why does it matter that I'm a researcher?
Most developers implement. I diagnosed the problem first — then built the system.
The Framework Exists
HCMS isn't a pitch — it's a published, DOI-backed framework. The theoretical foundation for every system I build has already been validated in structured research. You're not getting a feature. You're getting a grounded idea.
The System Exists
UnderstandIQ is live at understandiq.streamlit.app — not a mockup, not a demo, not a pitch deck. A real system that real learners can use today. That's what I build for your platform.
The Insight Drives the Code
The reason overconfidence predicts learning failure better than raw accuracy isn't obvious. I know it because I researched it. That's the difference between a developer who builds what you ask for and a researcher who builds what you need.
How a project works
You share your problem
What's your platform? What does your assessment do now? What's missing?
2-3 days
I diagnose
I look at your current system and identify specifically where understanding is being left unmeasured.
1-2 days
I build
Clean code, daily updates, no surprises. You can see the build in real time.
3-14 days
You get results
Deployed, documented, and designed to be extended. Not a one-time deliverable — a foundation.
Included
Ready to measure understanding, not just correctness?
Tell me about your platform. I'll tell you exactly how I'd improve it.
The research and the product exist. Not as claims — as links you can click.
What People Say
Collecting feedback from first EdTech projects and research collaborators.
Testimonials appear here as they come in.
Proof, not promises.
Every number below is a shipped output, a published result, or a real system.
Years old. Building what most wait decades to attempt.
15 research phases in HCMS. Not iterations. Structured phases.
23 repositories on GitHub. Every one shipped.
1 preprint. DOI-backed. Zenodo. At 16.
Fake News Detector. Real-world data.
Real inference. Real users.
What I build with.
I don't list skills I've read about. Every tool here has a GitHub commit or a published paper behind it.
Thinking Out Loud
Research notes, systems thinking, and ideas in motion — across long-form, short-form, and live code.
Substack
@muhammedrayanshahid
Research notes, half-formed ideas, and questions I can't stop asking. Long-form thinking on cognitive measurement, AI assessment, and building systems that understand understanding.
Read on Substack ↗
X (Twitter)
@MRayanShahid
Short-form thinking on AI research, learning systems, and building in public. The same mind behind HCMS, in tweet form.
Follow on X ↗
Live System
understandiq.streamlit.app
The live cognitive assessment engine built on HCMS research. Upload any document. Discover your cognitive fingerprint — free, no signup.
Try UnderstandIQ ↗
The Manifesto
Most people spend years preparing to do research.
I started doing it.
At 16, I published a DOI-backed cognitive science preprint: HCMS, the Human Cognition Measurement System. Not because a professor told me to. Because I realized that every exam I'd taken was measuring the wrong thing. Correctness is easy to fake. Deep understanding isn't.
I work at the intersection of machine learning, cognitive science, and human-centered AI. My research asks: can we formally measure how a person understands something — not just whether they answered correctly? HCMS is the first answer to that question.
My thesis is simple: intelligence is a stability, not a score. A calibration. A consistency under pressure.
I'm not building AI to get a job.
I'm building things that don't exist yet. That's the only reason worth having.
For EdTech founders and platform builders: I take on selective projects where the problem genuinely intersects with this work. If your platform assesses learners and you want to know not just what they got right — but whether they truly understand it — that's exactly what I build. Let's talk →
229 contributions in 2025 · Joined GitHub Jun 2025 · 23 public repos
Let's build something that matters.
Researchers, universities, EdTech founders, learning platform builders — reach out. I read every message and respond to all of them.
Response within 24 hours. EdTech project inquiries: I'll send a specific question about your platform within 48 hours.