Independent AI Researcher · Age 16

Intelligence isn't what you know.
It's whether you know that you know.

Independent AI researcher. Cognitive learning systems builder.

I study how humans understand things — and build tools that measure it.


It started with a question no textbook thought to ask.

Karachi, 2024

Why do students fail even when they study? Why does knowing the answer ≠ understanding it?

# The hypothesis that changed everything
correctness_score: float = student.get_answer()
understanding_level: float = hcms.measure(student)
assert correctness_score != understanding_level # ← always true
# The gap between these two values is what I research.

15 research phases later, HCMS was born. A DOI-backed preprint. A formal framework. At 16.

I don't build AI to follow a roadmap.

I build it because the problem is real and someone has to go first.

The Wrong Question

Data ≠ Understanding

A model trained on 10 million examples doesn't understand any of them.

Prediction ≠ Understanding

Getting the right answer doesn't mean knowing why it's right.

Accuracy ≠ Understanding

97% accuracy can coexist with 0% cognitive stability.

Understanding = Confidence Calibration + Reasoning Consistency + Cognitive Stability

This is what HCMS measures.

Read the Research →
PUBLISHED RESEARCH · ZENODO DOI

Human Cognition
Measurement System

"Beyond Correctness: Measuring Cognitive Stability
and Confidence Calibration in Human Understanding"
Shahid, M.R. (2026). Zenodo.
DOI: 10.5281/zenodo.18269740

Every test you've ever taken assumed correctness equals understanding. HCMS proves it doesn't. Across 15 structured research phases, HCMS models the gap between getting something right and truly knowing it — measuring confidence calibration, reasoning consistency, and cognitive stability under pressure. This is what assessment looks like when the question matters more than the answer.

At 16, that framework became a DOI-backed preprint. The research isn't finished — it's just begun.

Confidence Calibration

Measures the gap between how confident a learner claims to be and how accurately they actually perform. Overconfidence with low accuracy is the most dangerous cognitive state — it blocks the self-awareness needed to improve.

Calibration gap visualization
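A minimal sketch of the calibration gap described above, assuming confidence is self-reported on a 0 to 1 scale and the gap is simply mean confidence minus observed accuracy. The function names and the 0.1 tolerance are illustrative assumptions, not values taken from the HCMS preprint.

```python
from statistics import mean


def calibration_gap(confidences, correct):
    """Mean stated confidence minus observed accuracy.

    confidences: self-reported certainty per question, each in [0, 1]
    correct: True/False per question, in the same order
    A positive result means overconfidence, a negative one underconfidence.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need one confidence rating per answered question")
    accuracy = mean(1.0 if ok else 0.0 for ok in correct)
    return mean(confidences) - accuracy


def label(gap, tolerance=0.1):
    # tolerance is an arbitrary illustrative threshold, not an HCMS constant
    if gap > tolerance:
        return "overconfident"
    if gap < -tolerance:
        return "underconfident"
    return "calibrated"


# Hypothetical learner: very sure of every answer, right only half the time.
gap = calibration_gap([0.9, 0.8, 0.9, 0.8], [True, False, False, True])
print(round(gap, 2), label(gap))
```

Here the learner averages 0.85 confidence against 0.50 accuracy, so the sketch flags the overconfident state the paragraph calls most dangerous.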

Research Phases: 15
Methodology: Python + TeX
Validation: DOI-Backed
Status: Phase 1–15 Complete

Research Contributions

Introduces cognitive stability as a measurable dimension beyond correctness

Demonstrates that confidence–accuracy misalignment predicts reasoning degradation

Provides a diagnostic framework rather than a predictive scoring model

Interpretable, reproducible signals for education & cognitive research

Includes sub-systems: Cognitive Robustness Benchmark, Learning Analytics Engine, Confidence Calibration Module

Input Question → Understanding Analysis → Confidence Calibration → Consistency Check → Robustness Testing → Explainability Layer → Cognitive Profile Output

Cognitive Profile: Partial | Miscalibrated · Consistency: 0.83
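The profile output on this page reports a consistency value of 0.83 without saying how it is computed. One plausible reading, offered purely as an assumption, is the share of a learner's answers that agree with their own most frequent answer when the same concept is asked in several rephrasings. The function name and sample data below are hypothetical.

```python
from collections import Counter


def consistency_score(answers):
    """Share of answers that agree with the learner's most frequent answer."""
    if not answers:
        raise ValueError("need at least one answer")
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


# Six rephrasings of one concept, each answer reduced to an option label.
answers = ["B", "B", "B", "C", "B", "B"]
print(round(consistency_score(answers), 2))  # 0.83
```

Five of the six answers agree, giving 5/6, which rounds to 0.83. A score of 1.0 would mean the learner's answer never changed with the wording.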

Three Laws of Understanding

Law I "Understanding requires more than correctness."

Law II "Confidence without calibration is noise."

Law III "Intelligence that cannot explain itself is incomplete."


This framework is open-source and citable.

Shahid, M.R. (2026). Beyond Correctness: Measuring Cognitive Stability and Confidence Calibration in Human Understanding.
Zenodo. DOI: 10.5281/zenodo.18269740

View on Zenodo

Featured Projects

23 repositories. 7 deployed systems. 1 published preprint. Here are the ones worth your attention.

★ FLAGSHIP PROJECT

UnderstandIQ

Cognitive assessment engine that measures whether learners truly understand — not just whether they answered correctly. Confidence calibration, misconception detection, cognitive archetypes from a live AI system.

Python · Streamlit · Groq API · LLaMA 3.3 70B · fpdf2
Research · Live

DOI-Backed Accuracy
NLP · Deployed

Fake News Detection AI

NLP classifier achieving 97% accuracy on real-world news data. Deployed live on Hugging Face Spaces using TF-IDF + Passive Aggressive Classifier.

Python · Scikit-learn · NLTK · Gradio
97% Accuracy
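To show how TF-IDF features and a passive-aggressive classifier fit together, here is a dependency-free sketch of the idea. It is not the deployed project's code, which according to the stack above uses Scikit-learn. The four toy headlines, the function names, and the hyperparameters are invented for illustration and say nothing about the 97% figure.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vectors, idf): one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    # Smoothed IDF so every known term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def dot(weights, vec):
    return sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_passive_aggressive(vectors, labels, epochs=5, C=1.0):
    """PA-I updates: weights change only when the hinge loss is positive."""
    weights = {}
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):  # y is +1 (fake) or -1 (real)
            loss = max(0.0, 1.0 - y * dot(weights, vec))
            if loss > 0.0:
                # Step size is capped by C (the "aggressiveness" parameter).
                step = min(C, loss / sum(v * v for v in vec.values()))
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + step * y * v
    return weights


def predict(weights, idf, text):
    counts = Counter(text.lower().split())
    vec = {t: tf * idf.get(t, 0.0) for t, tf in counts.items()}
    return 1 if dot(weights, vec) >= 0 else -1


# Toy corpus, invented for this sketch.
docs = [
    "shocking miracle cure doctors hate",
    "you won't believe this shocking secret",
    "parliament passes annual budget bill",
    "central bank holds interest rates steady",
]
labels = [1, 1, -1, -1]

vectors, idf = tfidf_vectors(docs)
weights = train_passive_aggressive(vectors, labels)
print(predict(weights, idf, "shocking miracle secret"))
print(predict(weights, idf, "bank holds budget bill"))
```

The classifier is "passive" when an example is already classified with enough margin and "aggressive" otherwise, which makes it cheap to train on large, streaming text collections.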
Computer Vision · Deployed

Emotion Classifier AI

CNN trained on FER2013 dataset detecting 7 emotion classes in real-time via webcam. Keras/TensorFlow pipeline with live inference capabilities.

Python · TensorFlow · Keras · OpenCV (+1 more)
Computer Vision · Research Complete

Medical Imaging AI

Multi-label chest condition detection on ChestMNIST dataset using Convolutional Neural Networks for healthcare screening applications.

PyTorch · CNN · MedMNIST · Healthcare AI
NLP · Deployed

Speech-to-Text Translator

Real-time multilingual audio transcription powered by OpenAI Whisper + Google Translate. Supports 50+ language pairs with live audio processing.

Whisper · Google Translate API · Python · Gradio
Automation · Operational

Social Media Automation Engine

AI-powered content engine that auto-generates captions, hashtags, and cross-posts to LinkedIn, Instagram, Facebook via Make.com + GPT-4.

Make.com · OpenAI API · GPT-4 · REST APIs
Computer Vision · Complete

Road Lane Detection (OpenCV)

Real-time road lane line detection using OpenCV computer vision pipeline. Processes video frames to identify and track lane boundaries for autonomous driving contexts.

Python · OpenCV · NumPy · Computer Vision
Computer Vision · Complete

Casting Defect Detection (CNN)

Industrial quality control system using Convolutional Neural Networks to detect surface defects in metal casting products. Computer vision for manufacturing automation.

Python · TensorFlow · CNN · Jupyter Notebook
ML · Complete

Telco Customer Churn Predictor

Machine learning model predicting customer churn in telecommunications. Feature engineering, classification, and business insight generation from behavioral data.

Python · Scikit-learn · Pandas · Jupyter Notebook
Computer Vision · Deployed

Vehicle Detection & Counting (YOLO)

Real-time vehicle detection and traffic counting system using YOLO object detection. Processes video streams to count vehicles by type across lanes.

Python · YOLO · OpenCV · Computer Vision
NLP · Complete

Speech Emotion Recognition

Deep learning model that classifies human emotional states from raw audio signals. Uses MFCCs and spectral features with neural network classification.

Python · Librosa · TensorFlow · Audio ML

COGNITIVE LEARNING SYSTEMS · EDTECH AI · ASSESSMENT INFRASTRUCTURE

Currently accepting EdTech projects

Your assessment system
measures recall.
Mine measures understanding.

I've spent a year researching the gap between getting an answer right and truly understanding it. Now I build that insight into real EdTech products.

Every quiz, every AI tutor, every LMS right now:

"Did they get it right?"

Correctness. Binary. Easy to fake.

What your platform could be asking:

"Do they know that they got it right?"

Calibration. Depth. Impossible to fake.

That gap — between correctness and understanding — is what I build systems to measure.

What I build for EdTech platforms

Core Specialty

Confidence-Aware Assessment

I add the confidence calibration layer your quiz system doesn't have. Before learners see results, they rate how certain they are. The gap between confidence and accuracy is your most valuable pedagogical signal — and no standard platform captures it.

What you get

  • Confidence rating per question (before results)
  • Calibration gap analysis across topics
  • Overconfidence and underconfidence detection
  • Learner cognitive archetype profiling
  • Misconception pattern identification

Built for: Assessment platforms, AI tutors, adaptive learning systems

→ Talk about your platform
High Demand

Document-to-Learning Systems

Upload any content — PDFs, notes, lectures, research papers. The system generates adaptive assessments that probe surface recall, conceptual understanding, and applied reasoning. Not just MCQs. Four question types. Confidence capture. Cognitive feedback.

What you get

  • Multi-type question generation (MCQ, Short Answer, Application, Explain-It)
  • Depth-level targeting (recall vs. conceptual vs. applied)
  • Per-topic performance breakdown
  • AI-generated study recommendations
  • Downloadable learner reports

Built for: Course creators, bootcamps, corporate training, EdTech platforms

→ Talk about your platform

Learner Analytics & Insight Dashboards

Turn raw quiz data into decisions. I build dashboards that show not just who failed, but why — which topics are misunderstood, where confidence diverges from reality, and which learners need intervention now.

What you get

  • Topic-level accuracy and confidence heatmaps
  • Weak-area detection per learner and cohort
  • Misconception clustering across a student group
  • Progress tracking over time
  • Exportable data and reports
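The rows of a topic-level heatmap can be derived from raw quiz records with a simple group-by. This is a sketch under assumed inputs: each record is a `(topic, confidence, correct)` tuple, and the 0.6 accuracy cut-off for a "weak" topic is an arbitrary illustrative choice.

```python
from collections import defaultdict


def topic_summary(rows):
    """rows: iterable of (topic, confidence in [0, 1], correct as bool)."""
    grouped = defaultdict(list)
    for topic, confidence, correct in rows:
        grouped[topic].append((confidence, correct))
    summary = {}
    for topic, items in grouped.items():
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        confidence = sum(c for c, _ in items) / len(items)
        summary[topic] = {
            "accuracy": accuracy,
            "confidence": confidence,
            "gap": confidence - accuracy,  # positive means overconfident
        }
    return summary


def weak_topics(summary, min_accuracy=0.6):
    """Topics whose accuracy falls below the chosen cut-off, sorted by name."""
    return sorted(t for t, s in summary.items() if s["accuracy"] < min_accuracy)


# Hypothetical records for one learner.
rows = [
    ("fractions", 0.9, False),
    ("fractions", 0.8, False),
    ("fractions", 0.7, True),
    ("algebra", 0.6, True),
    ("algebra", 0.4, True),
]
summary = topic_summary(rows)
print(weak_topics(summary))  # ['fractions']
```

The same function works per cohort if records from many learners are pooled before grouping. The `gap` column is what separates a topic that is merely hard from one that is misunderstood with confidence.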

Built for: Tutoring platforms, schools, online academies, LMS builders

→ Talk about your platform
Research-Backed

Misconception Detection

A student scoring 80% with a specific misconception in the remaining 20% is more at risk than a student scoring 60% who knows exactly where their gaps are. I build systems that find the misconception, name it, and generate targeted remediation — not generic 'try again' feedback.

What you get

  • Rule-based and AI-powered misconception identification
  • Named misconception patterns per topic
  • Confidence-weighted wrong-answer analysis
  • Targeted remediation suggestions per learner
  • Integration with existing assessment pipelines
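A rule-based pass with confidence weighting might look like the sketch below. The rule table, question IDs, and misconception names are hypothetical, and a real system would need rules authored per topic. Each matched wrong answer adds its confidence to the misconception's score, so confidently held errors rank above hesitant slips.

```python
from collections import defaultdict

# Hypothetical rule table: (question_id, chosen_option) -> named misconception.
MISCONCEPTION_RULES = {
    ("q1", "B"): "adds denominators when adding fractions",
    ("q2", "C"): "adds denominators when adding fractions",
    ("q3", "A"): "treats a larger denominator as a larger fraction",
}


def rank_misconceptions(wrong_answers):
    """wrong_answers: iterable of (question_id, chosen_option, confidence).

    Returns (misconception, score) pairs, highest score first. Wrong answers
    with no matching rule are ignored rather than guessed at.
    """
    scores = defaultdict(float)
    for question_id, option, confidence in wrong_answers:
        name = MISCONCEPTION_RULES.get((question_id, option))
        if name is not None:
            scores[name] += confidence
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_misconceptions([
    ("q1", "B", 0.9),
    ("q2", "C", 0.8),
    ("q3", "A", 0.3),
    ("q4", "D", 0.7),  # no rule for this option, so it is skipped
])
print(ranked[0][0])  # adds denominators when adding fractions
```

The top-ranked name is what targeted remediation would address first. Unmatched answers such as `q4` are the natural hand-off point for an AI-powered pass.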

Built for: Adaptive learning platforms, AI tutors, test prep companies

→ Talk about your platform

Research Collaboration

If you're a researcher, academic, or R&D team working on learning systems, cognitive measurement, or AI assessment — I'm not a contractor. I'm a potential collaborator. I bring HCMS, experimental design experience, and a genuine obsession with the problem.

What you get

  • Joint experimental design on assessment frameworks
  • Literature synthesis and implementation from papers
  • Cognitive measurement instrument design
  • Statistical analysis and validation
  • Co-authorship where work is genuinely joint

Built for: University labs, cognitive science researchers, EdTech R&D teams

→ Talk about your platform

Why does it matter that I'm a researcher?

Most developers implement. I diagnosed the problem first — then built the system.

01

The Framework Exists

HCMS isn't a pitch — it's a published, DOI-backed framework. The theoretical foundation for every system I build has already been validated in structured research. You're not getting a feature. You're getting a grounded idea.

02

The System Exists

UnderstandIQ is live at understandiq.streamlit.app — not a mockup, not a demo, not a pitch deck. A real system that real learners can use today. That's what I build for your platform.

03

The Insight Drives the Code

The reason overconfidence predicts learning failure better than raw accuracy isn't obvious. I know it because I researched it. That's the difference between a developer who builds what you ask for and a researcher who builds what you need.

How a project works

01

You share your problem

What's your platform? What does your assessment do now? What's missing?

2-3 days
02

I diagnose

I look at your current system and identify specifically where understanding is being left unmeasured.

1-2 days
03

I build

Clean code, daily updates, no surprises. You can see the build in real time.

3-14 days
04

You get results

Deployed, documented, and designed to be extended. Not a one-time deliverable — a foundation.

Included

Ready to measure understanding — not just correctness?

Tell me about your platform. I'll tell you exactly how I'd improve it.

Available now · EdTech focus · Remote · Research-backed approach

1 Published Framework · 1 Live System Built · 97% Peak Accuracy · 15 Research Phases

The research and the product exist. Not as claims — as links you can click.

What People Say

Collecting feedback from first EdTech projects and research collaborators.
Testimonials appear here as they come in.

Proof, not promises.

Every number below is a shipped output, a published result, or a real system.

16

Years old. Building what most wait decades to attempt.

15

Phases in HCMS. Not iterations. Structured phases.

23

Repositories on GitHub. Every one shipped.

1

Preprint. DOI-backed. Zenodo. At 16.

97%

Accuracy. Fake News Detector. Real-world data.

7

Deployed systems. Real inference. Real users.

What I build with.

I don't list skills I've read about. Every tool here has a GitHub commit or a published paper behind it.

Languages: Python · TeX · JavaScript
ML / DL: TensorFlow · Keras · PyTorch · Scikit-learn
NLP: NLTK · TF-IDF · Whisper · GPT API
Deploy: HuggingFace · Vercel · Streamlit · Gradio
Research: Zenodo · LaTeX · Jupyter
Data: Pandas · NumPy · Matplotlib


Thinking Out Loud

Research notes, systems thinking, and ideas in motion — across long-form, short-form, and live code.

Substack

@muhammedrayanshahid

Research notes, half-formed ideas, and questions I can't stop asking. Long-form thinking on cognitive measurement, AI assessment, and building systems that understand understanding.

Read on Substack ↗

X (Twitter)

@MRayanShahid

Short-form thinking on AI research, learning systems, and building in public. The same mind behind HCMS, in tweet form.

Follow on X ↗

Live System

understandiq.streamlit.app

The live cognitive assessment engine built on HCMS research. Upload any document. Discover your cognitive fingerprint — free, no signup.

Try UnderstandIQ ↗
Muhammad Rayan Shahid
Independent AI Researcher · Karachi, Pakistan

The Manifesto

Most people spend years preparing to do research.
I started doing it.

At 16, I published a DOI-backed cognitive science preprint: HCMS, the Human Cognition Measurement System. Not because a professor told me to. Because I realized that every exam I'd taken was measuring the wrong thing. Correctness is easy to fake. Deep understanding isn't.

I work at the intersection of machine learning, cognitive science, and human-centered AI. My research asks: can we formally measure how a person understands something — not just whether they answered correctly? HCMS is the first answer to that question.

My thesis is simple: intelligence is a stability, not a score. A calibration. A consistency under pressure.

I'm not building AI to get a job.
I'm building things that don't exist yet. That's the only reason worth having.


For EdTech founders and platform builders: I take on selective projects where the problem genuinely intersects with this work. If your platform assesses learners and you want to know not just what they got right — but whether they truly understand it — that's exactly what I build. Let's talk →

229 contributions in 2025 · Joined GitHub Jun 2025 · 23 public repos

Let's build something that matters.

Researchers, universities, EdTech founders, learning platform builders — reach out. I read every message and respond to all of them.

Preferred topics

Research collaboration · Academic opportunities · EdTech platform work · Cognitive assessment systems · Learning analytics

Response within 24 hours. EdTech project inquiries: I'll send a specific question about your platform within 48 hours.