Reading Log · Mansi Sheth


Courses

Machine Learning

Andrew Ng · Stanford Online · See notes →

Finished

What is a Coding Assistant — Anthropic Claude Code in Action

Anthropic · See notes →

In Progress

Books

The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI

Fei-Fei Li

Read

The Coming Wave

Mustafa Suleyman

Reading

Why Machines Learn: The Elegant Math Behind Modern AI

Anil Ananthaswamy

Reading

Videos

Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning

Andrew Ng · See notes →

Watched

Deep Dive into LLMs like ChatGPT

Andrej Karpathy · See notes →

Watched

How to Build Effective AI Agents (without the hype)

See notes →

Watched

Building AI Agents in Pure Python

See notes →

Watched

Software Is Changing (Again)

Andrej Karpathy · See notes →

Watched

Podcasts

Invisible Prompt Injection and LLM Security

Rapid Synthesis · See notes →

Listened

AI - Pen Testing

How AI Pen Testing Actually Works and Where It Breaks

Listened

Notes

Mostly helps with low-hanging fruit: the routine, mundane tasks of pen testing, such as logging in and maintaining an authenticated session
Where it doesn't help: subtle issues that require chaining several things together and supplying rich context
Scales better and runs faster than manual testing

Scope control (how to stop it from going off into prod):

Domain blocks, network-level restrictions, URL blocks
Agent that checks each command before execution
Don’t pass along the motivation or thinking behind a command; show only the command to execute (“deaf card”). LLMs are great at coming up with convincing arguments, so more context makes it easier for the LLM to convince itself that it’s fine to proceed
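
A minimal sketch of how the first two ideas could combine: a gate that receives only the command string, never the agent's reasoning, and approves it only when every URL it targets is on an allowlist. The hostnames, the `ALLOWED_HOSTS` and `BLOCKED_HOSTS` sets, and the function names are my own placeholders, not something described in the episode. A real setup would still need the network-level restrictions, since string matching alone is easy to bypass.

```python
import shlex
from urllib.parse import urlparse

# Hypothetical scope definition; these hostnames are placeholders.
ALLOWED_HOSTS = {"staging.example.test"}
BLOCKED_HOSTS = {"www.example.test", "api.example.test"}  # e.g. production


def extract_hosts(command: str) -> set[str]:
    """Return the hostname of every http(s) URL among the command's tokens."""
    hosts = set()
    for token in shlex.split(command):
        parsed = urlparse(token)
        if parsed.scheme in ("http", "https") and parsed.hostname:
            hosts.add(parsed.hostname.lower())
    return hosts


def is_in_scope(command: str) -> bool:
    """Approve a command only if every URL it targets is explicitly allowed.

    The gate sees the command string alone, so there is no rationale
    for it to be talked into accepting.
    """
    try:
        hosts = extract_hosts(command)
    except ValueError:
        return False  # unparseable command (e.g. unbalanced quotes): reject
    if not hosts:
        return False  # no recognizable target: fail closed
    return all(h in ALLOWED_HOSTS and h not in BLOCKED_HOSTS for h in hosts)
```

Failing closed on commands with no recognizable URL is a deliberately strict choice for this sketch; a real gate would probably also need an allowlist of local-only commands.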

Cost considerations:

Training older models is getting drastically cheaper, so if VC money dries up, teams may fall back to existing/older models
Cost grows with scale, so find sensible non-AI ways to crawl and gather data, and feed only the relevant pieces to agents. You don’t need AI to do everything.
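
One way to picture "feed only the relevant pieces": a plain, non-AI filter that runs over already-crawled pages and forwards only those that look like they take input. The markers, the truncation limit, and the function names are assumptions for illustration, not something from the episode.

```python
from urllib.parse import urlparse

# Hypothetical heuristics for pages worth an agent's attention; tune per target.
INTERESTING_MARKERS = ("<form", "<input", "<textarea")


def is_relevant(url: str, page_html: str) -> bool:
    """Cheap check: does the page accept input via a query string or a form?"""
    has_query = bool(urlparse(url).query)
    body = page_html.lower()
    return has_query or any(marker in body for marker in INTERESTING_MARKERS)


def select_for_agent(
    pages: list[tuple[str, str]], max_chars: int = 2000
) -> list[tuple[str, str]]:
    """Keep only relevant (url, html) pairs and truncate each body to cap cost."""
    return [
        (url, page_html[:max_chars])
        for url, page_html in pages
        if is_relevant(url, page_html)
    ]
```

Static pages never reach the model, and each forwarded page is capped in size, so agent cost follows the number of interesting pages and not the size of the whole crawl.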

Mistakes seen:

Lows: Minor issues presented as a big deal (e.g., missing security headers)
Highs: Creative findings, such as passing /etc/passwd as an image

What AI is good at finding:

Great at verifying what it has done
XSS, SQLi, arbitrary file reads
Can generate a Python script to verify findings
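
For a sense of what such a verification script might look like, here is a minimal sketch of my own (not taken from the episode) that checks how a harmless probe marker comes back in a response body. The marker value and the function name are hypothetical, and the body is passed in as a string so the check can run offline.

```python
import html


def classify_reflection(body: str, marker: str) -> str:
    """Report how a probe marker shows up in a response body.

    "unescaped": the marker appears verbatim, which is worth a closer look
    "escaped":   only the HTML-escaped form appears
    "absent":    the marker is not reflected at all
    """
    if marker in body:
        return "unescaped"
    if html.escape(marker) in body:
        return "escaped"
    return "absent"
```

An "unescaped" result is a signal to investigate, not proof of XSS; whether it is exploitable depends on where in the page the marker lands.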

What AI is not good at:

Authorization issues
Business logic flaws — “I wasn’t supposed to do that right there”
This is where the biggest research/engineering effort is being focused