Independent AI Researcher · Age 16

Intelligence isn't what you know.
It's whether you know that you know.

Building AI that measures how humans think — not just whether they're right.

Karachi, Pakistan

It started with a question no textbook thought to ask.

Karachi, 2024

Why do students fail even when they study? Why does knowing the answer ≠ understanding it?

# The hypothesis that changed everything
correctness_score: float = student.get_answer()
understanding_level: float = hcms.measure(student)
assert correctness_score != understanding_level # ← always true
# The gap between these two values is what I research.

15 research phases later, HCMS was born. A DOI-backed preprint. A formal framework. At 16.

I don't build AI to follow a roadmap.

I build it because the problem is real and someone has to go first.

The Wrong Question

Data ≠ Understanding

A model trained on 10 million examples doesn't understand any of them.

Prediction ≠ Understanding

Getting the right answer doesn't mean knowing why it's right.

Accuracy ≠ Understanding

97% accuracy can coexist with 0% cognitive stability.

Understanding = Confidence Calibration + Reasoning Consistency + Cognitive Stability

This is what HCMS measures.

PUBLISHED RESEARCH · ZENODO DOI

Human Cognition Measurement System

"Beyond Correctness: Measuring Cognitive Stability and Confidence Calibration in Human Understanding"
Shahid, M.R. (2026). Zenodo.
DOI: 10.5281/zenodo.18269740

Every test you've ever taken assumed correctness equals understanding. HCMS proves it doesn't. Across 15 structured research phases, HCMS models the gap between getting something right and truly knowing it — measuring confidence calibration, reasoning consistency, and cognitive stability under pressure. This is what assessment looks like when the question matters more than the answer.

At 16, that framework became a DOI-backed preprint. The research isn't finished — it's just begun.

Confidence Calibration

Measures the alignment between a student's confidence and their accuracy. Misalignment indicates overconfidence or underconfidence, both signs of unstable understanding.
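
As a rough illustration of the idea above, the gap between stated confidence and observed accuracy can be computed in a few lines. This is a minimal sketch, not the HCMS implementation: the `Response` type, the function names, the five-bin default, and the toy data are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One answered question (hypothetical record, not an HCMS type)."""
    confidence: float  # self-reported confidence in [0, 1]
    correct: bool      # whether the answer was right


def calibration_gap(responses: list[Response]) -> float:
    """Mean stated confidence minus observed accuracy.

    Positive values suggest overconfidence, negative values underconfidence.
    """
    if not responses:
        raise ValueError("need at least one response")
    mean_confidence = sum(r.confidence for r in responses) / len(responses)
    accuracy = sum(r.correct for r in responses) / len(responses)
    return mean_confidence - accuracy


def expected_calibration_error(responses: list[Response], n_bins: int = 5) -> float:
    """Bin responses by confidence, then average |confidence - accuracy|
    per bin, weighted by the share of responses in each bin."""
    bins: list[list[Response]] = [[] for _ in range(n_bins)]
    for r in responses:
        index = min(int(r.confidence * n_bins), n_bins - 1)
        bins[index].append(r)
    total = len(responses)
    error = 0.0
    for bucket in bins:
        if bucket:
            conf = sum(r.confidence for r in bucket) / len(bucket)
            acc = sum(r.correct for r in bucket) / len(bucket)
            error += (len(bucket) / total) * abs(conf - acc)
    return error


# Toy data (assumed): a student who is very confident but right half the time.
answers = [
    Response(0.9, True),
    Response(0.9, False),
    Response(0.8, True),
    Response(0.8, False),
]
gap = calibration_gap(answers)            # positive: overconfident
ece = expected_calibration_error(answers)
```

The signed gap says which direction the misalignment runs; the binned error says how large it is across confidence levels, so the two are worth reading together.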

Calibration gap visualization

Research Phases: 15
Methodology: Python + TeX
Validation: DOI-Backed
Status: Phases 1–15 Complete

Research Contributions

Introduces cognitive stability as a measurable dimension beyond correctness

Demonstrates that confidence–accuracy misalignment predicts reasoning degradation

Provides a diagnostic framework rather than a predictive scoring model

Yields interpretable, reproducible signals for education and cognitive research

Includes sub-systems: Cognitive Robustness Benchmark, Learning Analytics Engine, Confidence Calibration Module

Input Question → Understanding Analysis → Confidence Calibration → Consistency Check → Robustness Testing → Explainability Layer → Cognitive Profile Output

Cognitive Profile: Partial | Miscalibrated · Consistency: 0.83
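
One way to picture a consistency check like the one in the pipeline above: ask the same question in several phrasings and see how often the answers agree. The sketch below is an assumption for illustration only. `consistency_score`, the majority-agreement definition, and the sample answers are hypothetical and are not taken from the HCMS preprint.

```python
from collections import Counter


def consistency_score(answers: list[str]) -> float:
    """Fraction of answers that match the most common answer.

    Answers are compared after trimming whitespace and lowercasing.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    _, majority_count = Counter(normalized).most_common(1)[0]
    return majority_count / len(normalized)


# Toy data (assumed): one question asked four ways, three answers agree.
variants = ["Paris", "paris", " Paris ", "Lyon"]
score = consistency_score(variants)
```

A score near 1.0 would mean the answer survives rephrasing; a low score would suggest the correct answers were not backed by a stable model of the problem.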

Three Laws of Understanding

Law I "Understanding requires more than correctness."

Law II "Confidence without calibration is noise."

Law III "Intelligence that cannot explain itself is incomplete."

This framework is open-source and citable.

Shahid, M.R. (2026). Beyond Correctness: Measuring Cognitive Stability and Confidence Calibration in Human Understanding.
Zenodo. DOI: 10.5281/zenodo.18269740

Featured Projects

23 repositories. 7 deployed systems. 1 published preprint. Here are the ones worth your attention.

★ FLAGSHIP RESEARCH

Human Cognition Measurement System (HCMS)

A 15-phase, research-grade cognitive assessment framework that goes beyond correctness. Models confidence calibration, reasoning consistency, and cognitive stability. DOI-backed preprint published on Zenodo.

Python · TeX · LaTeX · Statistical Analysis · Psychometrics
DOI: 10.5281/zenodo.18269740
DOI-Backed Preprint
NLP · Deployed

Fake News Detection AI

NLP classifier achieving 97% accuracy on real-world news data. Deployed live on Hugging Face Spaces using TF-IDF + Passive Aggressive Classifier.

Python · Scikit-learn · NLTK · Gradio
97% Accuracy
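
To show what TF-IDF plus a passive-aggressive classifier does, here is a from-scratch sketch in plain Python. It is not the deployed code, which the card says is built with scikit-learn. The function names, the smoothed IDF variant, the four-sentence toy corpus, and the +1/-1 label convention (real/fake) are all assumptions for the example, and it makes no claim about the 97% figure.

```python
import math
from collections import Counter


def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector for one token list; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def fit_tfidf(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one TF-IDF vector per document plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    # Smoothed IDF (assumed variant) so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf


def dot(weights: dict[str, float], x: dict[str, float]) -> float:
    return sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_passive_aggressive(vectors, labels, epochs: int = 5) -> dict[str, float]:
    """Labels are +1 or -1. Stay passive when the margin is at least 1;
    otherwise move the weights just far enough to restore that margin."""
    weights: dict[str, float] = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            loss = max(0.0, 1.0 - y * dot(weights, x))  # hinge loss
            norm_sq = sum(v * v for v in x.values())
            if loss > 0.0 and norm_sq > 0.0:
                tau = loss / norm_sq  # step size of the basic PA update
                for t, v in x.items():
                    weights[t] = weights.get(t, 0.0) + tau * y * v
    return weights


def predict(weights: dict[str, float], idf: dict[str, float], text: str) -> int:
    x = vectorize(text.lower().split(), idf)
    return 1 if dot(weights, x) >= 0 else -1


# Toy corpus (assumed). Vocabularies do not overlap, so it is easy to check by hand.
docs = [
    "shocking miracle cure doctors hate",
    "aliens secretly control the government",
    "city council approves new budget",
    "central bank holds interest rates steady",
]
labels = [-1, -1, 1, 1]  # -1 = fake, +1 = real

vectors, idf = fit_tfidf(docs)
weights = train_passive_aggressive(vectors, labels)
```

Calling `predict(weights, idf, "miracle cure shocking")` on this toy model returns the fake label, because every matched term carries a negative weight. A real classifier would need a proper tokenizer, far more data, and a held-out test set.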
Computer Vision · Deployed

Emotion Classifier AI

A CNN trained on the FER2013 dataset that detects 7 emotion classes in real time via webcam. Keras/TensorFlow pipeline with live inference.

Python · TensorFlow · Keras · OpenCV +1
Computer Vision · Research Complete

Medical Imaging AI

Multi-label chest condition detection on the ChestMNIST dataset using Convolutional Neural Networks, aimed at healthcare screening applications.

PyTorch · CNN · MedMNIST · Healthcare AI
NLP · Deployed

Speech-to-Text Translator

Real-time multilingual audio transcription powered by OpenAI Whisper + Google Translate. Supports 50+ language pairs with live audio processing.

Whisper · Google Translate API · Python · Gradio
Automation · Operational

Social Media Automation Engine

AI-powered content engine that auto-generates captions and hashtags, then cross-posts to LinkedIn, Instagram, and Facebook via Make.com + GPT-4.

Make.com · OpenAI API · GPT-4 · REST APIs
Computer Vision · Complete

Road Lane Detection (OpenCV)

Real-time road lane line detection using an OpenCV computer vision pipeline. Processes video frames to identify and track lane boundaries for autonomous driving contexts.

Python · OpenCV · NumPy · Computer Vision
Computer Vision · Complete

Casting Defect Detection (CNN)

Industrial quality control system using Convolutional Neural Networks to detect surface defects in metal casting products. Computer vision for manufacturing automation.

Python · TensorFlow · CNN · Jupyter Notebook
ML · Complete

Telco Customer Churn Predictor

Machine learning model predicting customer churn in telecommunications. Feature engineering, classification, and business insight generation from behavioral data.

Python · Scikit-learn · Pandas · Jupyter Notebook
Computer Vision · Deployed

Vehicle Detection & Counting (YOLO)

Real-time vehicle detection and traffic counting system using YOLO object detection. Processes video streams to count vehicles by type across lanes.

Python · YOLO · OpenCV · Computer Vision
NLP · Complete

Speech Emotion Recognition

Deep learning model that classifies human emotional states from raw audio signals. Uses MFCCs and spectral features with neural network classification.

Python · Librosa · TensorFlow · Audio ML

Proof, not promises.

Every number below is a shipped output, a published result, or a real system.

16

Years old. Building what most wait decades to attempt.

15

Research phases in HCMS. Not iterations. Structured phases.

23

Repositories on GitHub. Every one shipped.

1

Preprint. DOI-backed. Zenodo. At 16.

97%

Accuracy. Fake News Detector. Real-world data.

7

Deployed systems. Real inference. Real users.

What I build with.

I don't list skills I've read about. Every tool here has a GitHub commit or a published paper behind it.

Languages: Python, TeX, JavaScript
ML / DL: TensorFlow, Keras, PyTorch, Scikit-learn
NLP: NLTK, TF-IDF, Whisper, GPT API
Deploy: HuggingFace, Vercel, Streamlit, Gradio
Research: Zenodo, LaTeX, Jupyter
Data: Pandas, NumPy, Matplotlib

Thinking Out Loud

Research notes, half-formed ideas, and questions I can't stop asking. On Substack.

Muhammad Rayan Shahid on Substack

Independent researcher working on human-centered AI and cognitive measurement. Interested in how understanding, confidence, and stability can be formally measured — not assumed.

Muhammad Rayan Shahid
Independent AI Researcher · Karachi, Pakistan

The Manifesto

Most people spend years preparing to do research.
I started doing it.

At 16, I published a DOI-backed cognitive science preprint: HCMS, the Human Cognition Measurement System. Not because a professor told me to. Because I realized that every exam I'd taken was measuring the wrong thing. Correctness is easy to fake. Deep understanding isn't.

I work at the intersection of machine learning, cognitive science, and human-centered AI. My research asks: can we formally measure how a person understands something — not just whether they answered correctly? HCMS is the first answer to that question.

My thesis is simple: intelligence is a stability, not a score. A calibration. A consistency under pressure.

I'm not building AI to get a job.
I'm building things that don't exist yet. That's the only reason worth having.

229 contributions in 2025 · Joined GitHub Jun 2025 · 23 public repos

Let's do something
that matters.

Researchers, universities, collaborators, people who read HCMS and had thoughts — reach out. I read everything.

Response time: usually within 24 hours.
Preferred topics: research collaboration, academic opportunities, AI systems.