Agentic AI Browser Assistant — 6-Week Learning Course

Week 1

Python — Intermediate Level

Week 1 cont.

HTML, DOM & CSS Selectors

Week 2

Playwright — Browser Automation

Week 3

LLMs, APIs & Prompt Engineering

Week 4

Agentic AI Design & LangChain

Week 5

FastAPI, Databases & WebSockets

Week 5 cont.

External APIs — Gmail, Calendar, PDF

Week 6

React — UI & Real-Time Updates

Week 6 stretch

Memory, Embeddings & Multi-Agent

Weeks 1–6

Learning Phase

9 topics · 6 assignments

Wk 01

Python & Web Foundations

Topics 1 & 2 — Python internals + HTML/DOM/CSS selectors

async/await · DOM · CSS selectors

Why this week matters: Every line of agent code you write runs on Python. Before Playwright, FastAPI, or LangChain will work for you, you must be solid on async/await (this course uses Playwright's async API), JSON handling (LLM API responses arrive as JSON, and the agent's action plans are JSON too), error handling (browser automation breaks constantly), decorators (FastAPI uses them everywhere), and virtual environments. The HTML/DOM/CSS section is equally non-negotiable: the browser automation engine finds elements using CSS selectors, so you need to be able to look at any webpage and identify the right selector to click or type into.

async / await · JSON parsing · try / except · decorators · file I/O · virtual environments · dataclasses / Pydantic · HTML structure · CSS selectors · form elements · DOM tree · XPath basics · iframes
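
Three of the skills named above (async/await, decorators, and error handling) can be seen working together in one small sketch. The `retry` decorator and the flaky `fetch_action` step below are hypothetical names invented for illustration; nothing here comes from Playwright or FastAPI.

```python
import asyncio
import functools
import json


def retry(attempts: int = 3, delay: float = 0.0, exceptions: tuple = (Exception,)):
    """Decorator factory: re-run an async function when it raises."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of retries: surface the original error
                    await asyncio.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(attempts=3, exceptions=(TimeoutError,))
async def fetch_action() -> dict:
    """Hypothetical flaky step: times out twice, then returns parsed JSON."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return json.loads('{"action": "click", "selector": "#submit"}')


result = asyncio.run(fetch_action())
print(result["action"], "after", calls["count"], "attempts")
```

The `@retry(...)` line has the same shape as FastAPI's `@app.post(...)`: a function call that returns a decorator.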

A · Python — intermediate level

Getting started (skip if you know Python basics)

▶
Python Full Course for Beginners — freeCodeCamp (4h) Best single-video intro. Variables, lists, functions, file I/O. Watch at 1.5×.
▶
Python Full Course for Beginners — Mosh (2h) Faster alternative if you're already coding in another language.
📄
Official Python Tutorial — python.org The authoritative reference. Chapters 1–9 are essential. Bookmark and return often.

Intermediate Python (used directly in this project)

▶
Async/Await in Python — Corey Schafer Playwright and API calls are asynchronous. This is the most important video this week.
▶
Python Decorators Explained — Tech With Tim Used heavily in FastAPI route definitions — you'll see @app.post() everywhere in Week 5.
📄
Working with JSON in Python — Real Python Every LLM response and API call returns JSON. You must handle it fluently.
📄
Python Exceptions & Error Handling — Real Python Browser automation breaks constantly: element not found, timeout, navigation error. Proper error handling is a core skill.
▶
Python Virtual Environments — Corey Schafer Set up venv and pip properly. Every professional Python project uses this.
📄
Python Dataclasses — Real Python Pydantic models (used in FastAPI) follow the same type-annotated class pattern as dataclasses and add runtime validation. Understanding dataclasses first will make Week 5 click much faster.

Practice platform

💻
HackerRank — 30 Days of Code Do Days 0–7. Practical problem-solving beats passive watching.
💻
Exercism — Python Track Mentor-reviewed exercises. Great for honest feedback on code style.

B · HTML, DOM & CSS selectors

▶
HTML Full Course — freeCodeCamp (2h) Focus on forms, inputs, labels, and document structure — these are what you'll automate.
📄
Introduction to the DOM — MDN Web Docs The official, best explanation of what the DOM is. Short read, high value.
📄
CSS Selectors — MDN Playwright uses CSS selectors. Learn div, .class, #id, input[type], nth-child, and descendant selectors.
▶
HTML Forms Deep Dive Forms are the #1 thing you'll automate. Understand name, id, action, and every input type.
💻
CSS Diner — Interactive Selector Game The best way to learn selectors hands-on. Finish all 32 levels — it only takes an hour.
▶
Git & GitHub Crash Course — Traversy Media Learn init, add, commit, push, and branches. Commit your work every week from now on.
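
To make the selector forms from the CSS Selectors resource concrete (`#id`, `tag[attr]`, bare tag), here is a minimal sketch that walks a page's HTML with the standard library and proposes one selector per form control. The `FormFieldFinder` class and the sample markup are illustrative assumptions; in practice you would confirm each selector in DevTools.

```python
from html.parser import HTMLParser


class FormFieldFinder(HTMLParser):
    """Collects one candidate CSS selector for each form control."""

    CONTROLS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.CONTROLS:
            return
        attributes = dict(attrs)
        element_id = attributes.get("id")
        name = attributes.get("name")
        input_type = attributes.get("type")
        if element_id:
            self.selectors.append(f"#{element_id}")             # ID selector
        elif name:
            self.selectors.append(f'{tag}[name="{name}"]')      # attribute selector
        elif input_type:
            self.selectors.append(f'{tag}[type="{input_type}"]')
        else:
            self.selectors.append(tag)                           # bare type selector


SAMPLE_HTML = """
<form action="/apply">
  <input id="email" type="email">
  <input name="phone" type="tel">
  <select name="country"><option>IN</option></select>
  <button type="submit">Send</button>
</form>
"""

finder = FormFieldFinder()
finder.feed(SAMPLE_HTML)
print(finder.selectors)
```

Preferring `id`, then `name`, then `type` is a common heuristic rather than a rule: IDs are usually unique on a page, while `type` alone often matches several elements.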

Assignment 1 Environment Setup + Python Warmup

Set up your working environment and prove Python readiness before moving on.

Create a Python 3.10+ virtual environment in a folder called ai-browser-agent — install nothing yet except requests
Write an async Python script that reads a JSON file of user info (name, email, phone, address) and prints it nicely — this becomes your "memory" layer later
Open any website in Chrome DevTools and identify 3 CSS selectors for form inputs, 2 for buttons, 1 for a dropdown — screenshot them
Commit everything to a GitHub repo with a README that describes what the final project will be
Complete HackerRank Days 0–5
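
One possible shape for the JSON-reading script in this assignment is sketched below. The file name `user_profile.json`, the helper names, and the sample data are assumptions; the sketch writes its own sample file to a temporary folder so it runs as-is.

```python
import asyncio
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("name", "email", "phone", "address")


async def load_profile(path: Path) -> dict:
    """Read and validate a profile file without blocking the event loop."""
    try:
        # File reads are blocking, so hand them to a worker thread.
        text = await asyncio.to_thread(path.read_text, encoding="utf-8")
        profile = json.loads(text)
    except FileNotFoundError as exc:
        raise ValueError(f"profile file not found: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc
    missing = [field for field in REQUIRED_FIELDS if field not in profile]
    if missing:
        raise ValueError(f"profile is missing fields: {missing}")
    return profile


def format_profile(profile: dict) -> str:
    """Render the profile as aligned 'Field : value' lines."""
    width = max(len(field) for field in REQUIRED_FIELDS)
    return "\n".join(
        f"{field.capitalize():<{width}} : {profile[field]}" for field in REQUIRED_FIELDS
    )


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "user_profile.json"  # hypothetical file name
    sample.write_text(
        json.dumps({"name": "Ada Lovelace", "email": "ada@example.com",
                    "phone": "555-0100", "address": "12 Analytical St"}),
        encoding="utf-8",
    )
    profile = asyncio.run(load_profile(sample))

print(format_profile(profile))
```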

✓

Checkpoint 1 — Dev Environment Ready

You can write async Python, handle JSON, manage exceptions, and identify CSS selectors on any webpage in DevTools. Everything that follows builds on this.

Wk 02

Playwright — Browser Automation

Topic 3 — Control any browser with code

locators · wait strategies · async Playwright · file upload

This is the execution backbone of your agent. Playwright is the library that physically clicks buttons, types into forms, navigates URLs, takes screenshots, manages tabs, and uploads files — on your behalf. It is the most-used library in the entire project; every feature module touches it. You're choosing Playwright over Selenium because it has native async support, auto-waits for elements (critical for modern single-page apps), and a much cleaner API.

async Playwright · locators · fill / click / type · wait strategies · file upload · multiple pages · storage state (cookies) · screenshot · evaluate (JS injection)
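
As a rough picture of what "controlling a browser with code" looks like, the sketch below navigates, fills fields through locators, takes a screenshot, and reads the page title using Playwright's async API. The helper name `fill_and_capture`, the field selectors, and the output file name are assumptions for illustration; check the selectors against the real page before relying on them.

```python
import asyncio


async def fill_and_capture(page, url: str, fields: dict, screenshot_path: str) -> str:
    """Open url, fill each selector -> value pair, screenshot, return the title."""
    await page.goto(url)
    for selector, value in fields.items():
        # locator() is lazy; fill() auto-waits until the element is actionable.
        await page.locator(selector).fill(value)
    await page.screenshot(path=screenshot_path)
    return await page.title()


async def main() -> None:
    # Imported here so the helper above can be exercised without a browser.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        title = await fill_and_capture(
            page,
            "https://demoqa.com/automation-practice-form",
            {"#firstName": "Ada", "#lastName": "Lovelace"},  # assumed selectors
            "form.png",
        )
        print(title)
        await browser.close()

# To run against a real browser (after `playwright install`): asyncio.run(main())
```

Passing `page` in as a parameter keeps the helper easy to test with a stand-in object and easy to wrap as an agent tool later.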

A · Playwright core

📋
Playwright for Python — Official Getting Started Read this completely first. Then re-read the selectors and actions sections. This is your primary reference for the whole project.
▶
Playwright Python Full Tutorial — Playlist The best playlist for Python-specific Playwright. Covers installation, browsers, pages, clicks, typing, screenshots. Watch the first 8 videos.
▶
Web Scraping with Playwright — Practical Example See Playwright used for real data extraction — directly relevant to the agent's page-reading module.
📋
Playwright Locators — Official Docs The preferred modern way to find elements. More stable than raw CSS selectors. Read this carefully.
📋
Input Actions Reference — Playwright Docs Complete reference for fill(), type(), check(), select_option(), set_input_files(), drag_and_drop(). Bookmark this page.
📋
Waiting Strategies — Playwright Docs wait_for_selector(), wait_for_url(), wait_for_load_state(), and expect_navigation(). Most timing bugs come down to waiting for the wrong thing, so learn these well.

B · DOM parsing & data extraction

▶
BeautifulSoup Tutorial — Corey Schafer For parsing HTML after Playwright fetches it. Great combo: Playwright gets the page, BS4 parses it.
📄
Web Scraping with BeautifulSoup — Real Python Comprehensive guide. Focus on find(), find_all(), and CSS selector usage for extracting form fields and article text.
📋
Authentication & Storage State — Playwright Docs Save login state to JSON, reuse across sessions. Avoids re-logging-in every time the agent runs.
📋
Multiple Pages — Playwright Docs How to manage multiple tabs simultaneously. Used in the tab manager feature module.

Common failure modes to know now: Elements load dynamically on modern SPAs — use auto-wait or wait_for_selector(). Cookie banners and pop-ups appear before you can interact — detect and dismiss them first. Session expiry shows a login prompt mid-task — handle re-auth gracefully.
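
The three failure modes above can be handled in one defensive click helper, sketched here under stated assumptions: the banner selector `button#accept-cookies`, the login-form selector, and the helper names are all hypothetical, and the timeouts are arbitrary starting values.

```python
try:
    from playwright.async_api import TimeoutError as PlaywrightTimeoutError
except ImportError:  # fallback so the sketch imports without Playwright installed
    PlaywrightTimeoutError = TimeoutError


async def dismiss_banner(page, selector: str = "button#accept-cookies",
                         timeout_ms: int = 2000) -> bool:
    """Click a cookie banner if one shows up; report whether it was there."""
    try:
        await page.locator(selector).click(timeout=timeout_ms)
        return True
    except PlaywrightTimeoutError:
        return False  # no banner appeared within the timeout


async def safe_click(page, selector: str,
                     login_selector: str = "input[type='password']") -> None:
    """Click selector after clearing pop-ups and checking for session expiry."""
    await dismiss_banner(page)
    if await page.locator(login_selector).count() > 0:
        raise RuntimeError("session expired: login form detected, re-authenticate first")
    target = page.locator(selector)
    # Explicit wait for dynamically loaded elements on SPAs.
    await target.wait_for(state="visible", timeout=5000)
    await target.click()
```

Raising a distinct error for session expiry lets the calling code decide whether to re-authenticate and retry or to stop the task.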

Assignment 2 Browser Automation Scripts

Build 3 scripts that will later become building blocks of your agent:

Script 1 — Navigator: Open a news site (for example BBC or Hacker News), extract titles of the top 5 articles, save to a JSON file
Script 2 — Form Filler: Go to demoqa.com/automation-practice-form, fill every field from a JSON file, screenshot before submitting
Script 3 — Tab Manager: Open 5 tabs in parallel, capture each title, then close all except the first
Wrap all scripts in async functions. Handle at least 2 error conditions per script (element not found, timeout)

✓

Checkpoint 2 — Browser Automation Engine Working

You can control a browser programmatically — clicking, typing, navigating, and extracting data. This is the execution layer your agent will use.

Wk 03

LLMs, APIs & Prompt Engineering

Topic 4 — Give your agent a brain

function calling · structured outputs · few-shot prompting · JSON mode

The intelligence layer. Your agent converts natural language such as "fill this form with my details" into structured JSON that maps to browser actions. This translation is done entirely by an LLM, so how you prompt it determines whether the agent works. Prompt engineering is not optional; it is the single most important skill in this course. A well-prompted LLM can carry most of your agent's decision logic. A poorly prompted one fails on edge cases.

OpenAI API · function calling · structured outputs · system prompts · few-shot examples · chain-of-thought · JSON mode · token limits · GPT-4o vision

A · How LLMs work — conceptual foundation

▶
Intro to Large Language Models — Andrej Karpathy (1h) The best conceptual overview of LLMs by one of their creators. Watch every minute — this shapes how you think about prompting.
📄
The Illustrated Transformer — Jay Alammar Visual explanation of the attention mechanism powering all modern LLMs. A classic that will cement your understanding.

B · LLM APIs — OpenAI, Claude, Gemini

📋
OpenAI API Quickstart — Official Docs Get your first completion in 10 minutes. Start here before any video.
📋
Function Calling / Tool Use — OpenAI Docs Force the LLM to return structured JSON matching a schema you define. This is the core mechanism of your intent parser.
▶
OpenAI Function Calling Tutorial Watch this after reading the docs. Shows real-world usage of structured output extraction.
📋
Structured Outputs — OpenAI Docs Guarantee valid JSON that matches your exact schema. This is how your intent parser outputs action plans reliably.
📋
Anthropic Claude API — Getting Started Strong alternative to OpenAI, excellent instruction-following, free tier available. Good for complex multi-step tasks.
📋
Google Gemini API Quickstart Another free-tier alternative with strong multimodal support — good when you want to send screenshots to the LLM.
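
To show what "a schema you define" looks like for function calling, here is a sketch of a tool definition in the shape the OpenAI Chat Completions API accepts under `tools`, plus a local check of the arguments string the model sends back. The tool name `plan_browser_action` and the `check_arguments` helper are assumptions; the check covers only required keys and the `action` enum, not full JSON Schema validation.

```python
import json

PLAN_ACTION_TOOL = {
    "type": "function",
    "function": {
        "name": "plan_browser_action",  # hypothetical tool name
        "description": "Convert a user command into one browser action.",
        "parameters": {  # JSON Schema describing the arguments
            "type": "object",
            "properties": {
                "action": {
                    "type": "string",
                    "enum": ["fill_form", "navigate", "email", "summarize", "click"],
                },
                "target_url": {"type": "string"},
                "data": {"type": "object"},
            },
            "required": ["action"],
        },
    },
}


def check_arguments(arguments_json: str, tool: dict = PLAN_ACTION_TOOL) -> dict:
    """Parse a tool call's arguments string and apply the schema's basic rules."""
    schema = tool["function"]["parameters"]
    args = json.loads(arguments_json)  # tool-call arguments arrive as a JSON string
    for key in schema["required"]:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    allowed = schema["properties"]["action"]["enum"]
    if args["action"] not in allowed:
        raise ValueError(f"action must be one of {allowed}, got {args['action']!r}")
    return args


print(check_arguments('{"action": "navigate", "target_url": "https://example.com"}'))
```

Checking the reply locally is cheap insurance even when the API promises schema-conforming output, because a bad plan would otherwise reach the browser.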

C · Prompt engineering — the most important section

Read this section carefully. The quality of your prompts determines whether your agent is reliable or not. This is not an exaggeration.

📄
Prompt Engineering Guide — promptingguide.ai The most comprehensive free resource. Read zero-shot, few-shot, chain-of-thought, and structured output sections.
▶
ChatGPT Prompt Engineering for Developers — DeepLearning.AI (free) Andrew Ng + OpenAI. The best structured course on prompting. About 1.5 hours total.
📄
Learn Prompting — Full Interactive Course Focus on "Structuring Output" and "Role Prompting" chapters. Both are directly used in the agent system prompt.
📋
OpenAI Prompt Engineering Strategies — Official Guide Six strategies for reliable structured outputs. Read "Write clear instructions" and "Use structured output" carefully.

Assignment 3 Intent Parser Prototype

Build the core intelligence module — a function that converts natural language into structured browser actions:

Write a function parse_intent(user_command: str) -> dict that calls an LLM API and returns structured JSON
Define a schema: {"action": "fill_form"|"navigate"|"email"|"summarize"|"click", "target_url": "...", "data": {...}, "steps": [...]}
Test with 10 different commands: "apply to this job", "close all tabs", "email this summary to my boss", etc.
Add few-shot examples in your system prompt for at least 3 action types
Bonus: For ambiguous commands, make the LLM ask a clarifying question before outputting an action plan
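
A possible skeleton for this assignment is sketched below. To keep it runnable without an API key, the LLM is passed in as a plain callable; that extra `llm` parameter, the `fake_llm` stand-in, and the wording of the few-shot prompt are assumptions, not part of the assignment's signature.

```python
import json
from typing import Callable

ACTIONS = {"fill_form", "navigate", "email", "summarize", "click"}

# Few-shot system prompt: one worked example for each of three action types.
SYSTEM_PROMPT = """You convert user commands into a JSON action plan.
Reply with JSON only, using the keys: action, target_url, data, steps.

Command: go to example.com
{"action": "navigate", "target_url": "https://example.com", "data": {}, "steps": ["open url"]}

Command: fill this form with my details
{"action": "fill_form", "target_url": "", "data": {"source": "profile"}, "steps": ["read profile", "fill fields"]}

Command: email this summary to my boss
{"action": "email", "target_url": "", "data": {"to": "boss"}, "steps": ["draft email", "send"]}
"""


def parse_intent(user_command: str, llm: Callable[[str, str], str]) -> dict:
    """Ask the LLM for a plan, then validate and normalise its JSON reply."""
    raw = llm(SYSTEM_PROMPT, user_command)
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM reply was not JSON: {raw!r}") from exc
    if not isinstance(plan, dict) or plan.get("action") not in ACTIONS:
        raise ValueError(f"unsupported or missing action in: {plan!r}")
    plan.setdefault("target_url", "")
    plan.setdefault("data", {})
    plan.setdefault("steps", [])
    return plan


def fake_llm(system_prompt: str, user_command: str) -> str:
    """Canned reply standing in for a real API call."""
    return '{"action": "navigate", "target_url": "https://example.com"}'


plan = parse_intent("open example.com", fake_llm)
print(plan)
```

Swapping `fake_llm` for a function that calls your chosen provider leaves the validation path unchanged, which also makes the parser straightforward to unit test.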

✓

Checkpoint 3 — Natural Language → Structured Action Plan

Your intent parser reliably converts user commands into structured JSON action plans. This is the "brain" of your agent.

Wk 04

Agentic AI Design & LangChain

Topic 5 — Multi-step reasoning, tool use, memory, planning

ReAct pattern · AgentExecutor · custom tools · memory

From chatbot to agent. A chatbot answers questions. An agent plans, uses tools, remembers state, and executes multi-step tasks in the real world. This week you learn the design patterns — ReAct, Plan-and-Execute, tool-calling — that make this possible, and LangChain, the framework that implements them. The ReAct pattern is what your agent runs on: Reason about what to do → Act with a tool → Observe the result → Reason again.

ReAct pattern · LangChain agents · custom tools · AgentExecutor · tool schemas · conversation memory · plan-and-execute · error recovery loop
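
The Reason → Act → Observe loop described above can be written in a few lines of plain Python, which may help before meeting it inside a framework. Everything here is a stand-in: the three tools return strings instead of driving a browser, and `scripted_policy` replays fixed decisions where a real agent would call an LLM.

```python
from typing import Callable


# Hypothetical stand-ins for Playwright-backed tools.
def navigate_to(url: str) -> str:
    return f"opened {url}"


def type_text(selector: str, text: str) -> str:
    return f"typed {text!r} into {selector}"


def click_element(selector: str) -> str:
    return f"clicked {selector}"


TOOLS = {"navigate_to": navigate_to, "type_text": type_text, "click_element": click_element}


def run_agent(goal: str, decide: Callable[[str, list], dict], max_steps: int = 5) -> list:
    """Loop until decide() returns a step containing 'final'."""
    history = []
    for _ in range(max_steps):
        step = decide(goal, history)                    # Reason
        if "final" in step:
            history.append(("final", step["final"]))
            return history
        tool = TOOLS.get(step["tool"])
        if tool is None:
            observation = f"error: unknown tool {step['tool']}"
        else:
            try:
                observation = tool(**step["args"])      # Act
            except Exception as exc:                    # feed failures back to the reasoner
                observation = f"error: {exc}"
        history.append((step["tool"], observation))     # Observe
    raise RuntimeError("step limit reached without a final answer")


SCRIPT = [
    {"tool": "navigate_to", "args": {"url": "https://google.com"}},
    {"tool": "type_text", "args": {"selector": "textarea[name='q']", "text": "AI news"}},
    {"final": "search submitted"},
]


def scripted_policy(goal: str, history: list) -> dict:
    """Replays SCRIPT; a real agent would prompt an LLM with goal and history."""
    return SCRIPT[len(history)]


history = run_agent("search for AI news", scripted_policy)
for name, observation in history:
    print(name, "->", observation)
```

Returning tool errors as observations, rather than raising, is what gives the reasoner a chance to recover on the next turn; the `max_steps` cap keeps a confused agent from looping forever.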

A · Agentic AI design patterns

📄
LLM Powered Autonomous Agents — Lilian Weng (OpenAI) The best technical overview of agent architecture: planning, memory, and tool use. A must-read before touching LangChain.
📑
ReAct: Synergizing Reasoning and Acting — Original Paper The foundational pattern your agent uses. Read the abstract and the examples section — skip the math sections.
▶
Deep Dive into LLMs like ChatGPT — Andrej Karpathy Builds an agent from first principles. Reveals the fundamental loop beneath all frameworks.
📄
Building Effective Agents — Anthropic Engineering Blog Practical patterns for reliable agents: sequential vs parallel flows, managing tool failures, when not to use agents.

B · LangChain — the framework

📋
LangChain Introduction — Official Docs Start here. Understand the core abstractions: LLMs, Chains, Agents, Tools, Memory. This is your reference going forward.
▶
LangChain Crash Course — Patrick Loeber (1h) Best single video to get up to speed on LangChain. Builds 3 real examples end-to-end.
📋
How to Create Custom Tools — LangChain Docs You'll use this to wrap your Playwright actions as LangChain tools. Study the Tool schema structure carefully.
📋
Conversation Memory — LangChain Docs How to maintain context across a multi-turn task ("apply to this job, use my stored resume"). Short-term memory layer.
📄
ChromaDB Getting Started — Medium Long-term memory via vector database. Stores user preferences, past tasks, and resume data so the agent can retrieve them semantically.

Assignment 4 LangChain Agent with Playwright Tools

Integrate the Week 2 and Week 3 work into a real agent that uses the browser:

Define 3 LangChain tools: navigate_to(url), click_element(selector), type_text(selector, text)
Create a LangChain agent that uses these tools to complete tasks like "go to google.com and search for AI news"
Add conversation memory so the agent remembers what it just did and can follow up
Test the full loop: user command → agent reasoning → tool execution → result → next step
Bonus: Add a simple file-based user profile store that the agent can query for name, email, and resume path

Wk 05

FastAPI, Databases, WebSockets & External APIs

Topics 6 & 8 — Backend server + Gmail/Calendar/PDF integrations

FastAPI routes · WebSockets · SQLite + SQLModel · Gmail API

The server everything talks through. Your agent needs a backend to receive commands from the React UI, run browser sessions in the background, stream live status updates, and store user profiles. FastAPI is the modern Python choice: async-native, fast, and auto-generates Swagger documentation. The second half of this week connects to real-world services — Gmail for sending emails, Google Calendar for scheduling, and PDF parsing to read uploaded resumes.

FastAPI routes · Pydantic models · background tasks · WebSockets · SQLite + SQLModel · REST API · CORS · env variables · Gmail API · Google Calendar API · OAuth 2.0 · PDF parsing · MIME email format

A · FastAPI

📋
FastAPI Official Tutorial — fastapi.tiangolo.com The best framework documentation in the Python ecosystem. Read First Steps through Request Body. Run every example.
▶
FastAPI Crash Course — Patrick Loeber 40-minute video. Builds a complete API with GET, POST, path params, query params, and request body.
📄
Python REST APIs with FastAPI — Real Python In-depth tutorial covering CRUD operations, Pydantic validation, and proper error handling.
📋
Background Tasks — FastAPI Docs Running browser automation in the background while the API returns a task_id immediately. Essential pattern.
📋
WebSockets — FastAPI Docs Stream real-time status updates to the React frontend as the agent works ("✓ Opened form... ✓ Typed name...").

B · Data persistence — SQLite & SQLModel

📋
SQLModel — Official Docs By the FastAPI author. Combines SQLAlchemy + Pydantic. Perfect for storing user profiles and task history.
▶
FastAPI with Database — ArjanCodes Connects FastAPI to SQLite. Shows exactly how to persist user data your agent reads at runtime.
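
Before adding SQLModel, it can help to see the underlying storage pattern with only the standard library. The sketch below uses `sqlite3` directly as a stand-in (it is not SQLModel), with a single-row `user_profile` table; the table layout and function names are assumptions.

```python
import sqlite3

FIELDS = ("name", "email", "phone", "address", "resume_text")

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_profile (
    id INTEGER PRIMARY KEY CHECK (id = 1),  -- single-user store: exactly one row
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    phone TEXT,
    address TEXT,
    resume_text TEXT
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows behave like mappings
    conn.execute(SCHEMA)
    return conn


def save_profile(conn: sqlite3.Connection, profile: dict) -> None:
    """Insert or overwrite the single profile row."""
    params = {field: profile.get(field) for field in FIELDS}
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO user_profile "
            "(id, name, email, phone, address, resume_text) "
            "VALUES (1, :name, :email, :phone, :address, :resume_text)",
            params,
        )


def load_profile(conn: sqlite3.Connection):
    """Return the stored profile as a dict, or None if nothing is saved yet."""
    row = conn.execute(
        "SELECT name, email, phone, address, resume_text FROM user_profile WHERE id = 1"
    ).fetchone()
    return dict(row) if row else None


conn = open_store()
save_profile(conn, {"name": "Ada Lovelace", "email": "ada@example.com"})
print(load_profile(conn))
```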

C · External APIs — Gmail, Calendar, PDF

📋
Gmail API Python Quickstart — Google Developers How to send emails programmatically. Follow the OAuth setup exactly — it's the tricky part. Start here, not the video.
📄
Sending Email via Gmail API — Google Docs Full Python code for constructing and sending MIME messages. Follow this after the quickstart.
📋
Google Calendar API Quickstart — Google Developers Create, read, and update calendar events. Used for the "schedule a meeting" feature in Week 9.
📄
Sending Emails with Python (SMTP) — Real Python Simpler SMTP alternative to Gmail API. Good for the MVP before you set up OAuth.
📄
Reading PDF Files in Python — freeCodeCamp Parse resume PDFs. The agent extracts text from uploaded resumes to fill job application forms automatically.
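
The MIME step mentioned for Gmail can be tried without any OAuth setup. The sketch below builds a message with the standard library and encodes it the way the Gmail API's send endpoint expects: a base64url string under the key `raw`. The helper name and addresses are placeholders.

```python
import base64
from email.message import EmailMessage


def build_gmail_payload(sender: str, to: str, subject: str, body: str) -> dict:
    """Build a plain-text MIME message and wrap it as a Gmail API request body."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = to
    message["Subject"] = subject
    message.set_content(body)
    # Gmail expects the full RFC 2822 message, base64url-encoded.
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"raw": raw}


payload = build_gmail_payload(
    "me@example.com", "boss@example.com", "Weekly summary", "Summary attached."
)
print(payload["raw"][:40], "...")

# With an authorised `service` object from the quickstart, sending would look like:
# service.users().messages().send(userId="me", body=payload).execute()
```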

Assignment 5 Backend API Server

Build the server that the frontend will communicate with:

Build a FastAPI server with: POST /command (receives text command, returns task_id), GET /status/{task_id} (returns task progress), GET/POST /user/profile (read/write user memory)
Store user profiles in SQLite: name, email, phone, address, resume text
Connect the Week 4 LangChain agent to run as a background task when /command is called
Add a WebSocket endpoint that streams live status updates as the agent works step by step
Test all endpoints with FastAPI's auto-generated Swagger UI at /docs
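
The first two endpoints in this assignment follow a "return a task_id now, do the work in the background" pattern, sketched below. The in-memory `TASKS` dict, the `run_agent_task` placeholder, and the `create_app` factory are assumptions made to keep the example short; a real server would persist tasks and call the Week 4 agent.

```python
import uuid

TASKS: dict = {}  # task_id -> task record (in-memory only; lost on restart)


def create_task(command: str) -> str:
    """Register a new task and return its id."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"command": command, "status": "queued", "steps": []}
    return task_id


def run_agent_task(task_id: str) -> None:
    """Placeholder for the agent run; records fake progress."""
    task = TASKS[task_id]
    task["status"] = "running"
    task["steps"].append(f"parsed command: {task['command']}")
    task["status"] = "done"


def create_app():
    """Build the FastAPI app; imports are local so the helpers above work without FastAPI."""
    from fastapi import BackgroundTasks, FastAPI, HTTPException
    from pydantic import BaseModel

    class Command(BaseModel):
        text: str

    app = FastAPI()

    @app.post("/command")
    def post_command(cmd: Command, background: BackgroundTasks):
        task_id = create_task(cmd.text)
        background.add_task(run_agent_task, task_id)  # runs after the response is sent
        return {"task_id": task_id}

    @app.get("/status/{task_id}")
    def get_status(task_id: str):
        if task_id not in TASKS:
            raise HTTPException(status_code=404, detail="unknown task")
        return TASKS[task_id]

    return app
```

If this were saved as a hypothetical `server.py`, it could be started with `uvicorn server:create_app --factory` and explored at /docs.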

Wk 06

React, System Design & Memory (Stretch)

Topics 7 & 9 — Frontend UI + embeddings/multi-agent as bonus

React components · WebSocket client · useState/useEffect · ChromaDB

The face of the product. The React UI is what you demo. A command bar to type instructions, a live activity feed that shows each agent step in real time over WebSocket, a profile settings page, and a task history view. The stretch goal (memory, embeddings, multi-agent) is optional but makes your agent feel genuinely intelligent over time — the vector database lets the agent retrieve past interactions without exact-match search.

React components · useState / useEffect · WebSocket client · fetch / axios · conditional rendering · forms in React · Tailwind CSS · ChromaDB · embeddings · semantic search · LangGraph · multi-agent orchestration

A · React fundamentals

▶
React Full Tutorial — Programming with Mosh Best modern intro. Covers hooks, state, and components — everything needed for the project UI.
📋
React Official Tutorial — react.dev The new official tutorial is excellent. Complete "Thinking in React" specifically — it shapes how you structure the UI.
📄
React Hooks: useState and useEffect — freeCodeCamp useState for command input and status display. useEffect for the WebSocket connection to the backend.
📄
WebSockets with React + FastAPI — Medium Exactly the pattern you'll use: React UI receives real-time agent status updates over WebSocket from your FastAPI server.

B · System design & testing

▶
System Design Basics — ByteByteGo How to think about components, data flows, and failure modes in distributed systems.
📄
Pytest for Beginners — Real Python Learn fixtures, parametrize, and mocking. You'll write tests for the intent parser and each agent tool.
▶
Mocking in Python Tests — ArjanCodes Mock the LLM API in tests so you don't spend tokens on every test run.
📋
Playwright Test Runners — Official Docs pytest integration. Write tests that verify your agent's browser actions work correctly and reliably.
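
Mocking the LLM, as the ArjanCodes resource suggests, might look like the sketch below. The parser is a simplified stand-in, and `client.complete(prompt)` is an assumed wrapper interface rather than any real SDK method. The test functions follow pytest's `test_` naming so pytest would collect them; they are also called directly here so the file runs on its own.

```python
import json
from unittest.mock import Mock


def parse_intent(user_command: str, client) -> dict:
    """Simplified parser: `client.complete` is an assumed LLM wrapper method."""
    reply = client.complete(f"Return a JSON action plan for: {user_command}")
    return json.loads(reply)


def test_navigate_intent():
    client = Mock()
    client.complete.return_value = '{"action": "navigate", "target_url": "https://example.com"}'
    plan = parse_intent("open example.com", client)
    assert plan["action"] == "navigate"
    client.complete.assert_called_once()  # one LLM call, and no tokens spent


def test_invalid_json_raises():
    client = Mock()
    client.complete.return_value = "Sure! Here is the plan"
    try:
        parse_intent("open example.com", client)
    except json.JSONDecodeError:
        pass  # under pytest, `with pytest.raises(json.JSONDecodeError):` is the usual form
    else:
        raise AssertionError("expected JSONDecodeError")


test_navigate_intent()
test_invalid_json_raises()
print("2 tests passed")
```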

C · Stretch goal — memory, embeddings & multi-agent

Optional but powerful. These topics are for the Weeks 9–10 polish stage, not Week 6. Skim them now so they feel familiar later.

📄
ChromaDB Official Docs The vector store for long-term agent memory. Stores past task outcomes, user preferences, resume sections.
▶
Vector Databases Explained — Fireship Short, clear explanation of embeddings and semantic search. The conceptual foundation for ChromaDB.
📄
LangGraph — Official Docs Multi-agent orchestration: one agent plans, another browses, another drafts the email — all coordinated in a graph.
▶
LangGraph Crash Course Builds a multi-agent system step by step. Watch this when you're comfortable with LangChain (Week 4 done) and want to go further.
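
The retrieval idea behind a vector store can be previewed with a toy example: turn each stored memory into a vector, then rank memories by cosine similarity to the query. The `embed` function below is a deliberate simplification (word counts, so it only matches shared words); a real embedding model captures meaning beyond exact words, and ChromaDB handles the storage and ranking for you. All names and sample memories are assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, memories: list, k: int = 1) -> list:
    """Return the k memories most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(query_vector, embed(m)), reverse=True)
    return ranked[:k]


MEMORIES = [
    "user prefers remote software engineering jobs",
    "resume lists five years of python experience",
    "boss email is boss@example.com",
]

top = search("what is my boss email", MEMORIES)
print(top)
```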

Assignment 6 UI Prototype + Architecture Document

Prepare for the build phase with a working UI shell and a clear design plan:

Build a React UI with: a command input bar, a live activity log panel (each agent step in real time), and a user profile settings page
Connect the UI to your Week 5 backend via WebSocket — show real-time status as the agent works
Write a 1-page architecture document with a diagram: UI → FastAPI → AgentExecutor → [LLM, Browser Tools, Memory] → External APIs
Write 5 pytest tests for your intent parser — test each action type (navigate, fill_form, email, summarize, click)
Define Pydantic models for UserProfile, Task, and AgentAction — these become your data contracts for Weeks 7–10

✓

Checkpoint 4 — Learning Phase Complete

All 9 topic areas covered. You have: async Python, a working browser automation engine, a prompting approach, a LangChain agent, a FastAPI backend, a React UI, and a clear architecture plan. Weeks 7–10 are building time.

Reference library — consult as needed in build weeks

These are deeper resources for Weeks 7–10. Not required for the learning phase, but valuable when you hit specific problems.

AI & Agents — advanced

📑
WebArena: Realistic Web Environments for Agents Benchmark for web-navigating agents. Shows what hard real-world tasks look like.
📋
LangSmith — LLM Observability Trace every LLM call in your agent. Essential for debugging when prompts fail silently.
📄
Planning Agents — LangChain Blog Plan-and-Execute agents for complex 5+ step tasks. Needed for Week 9 robustness work.
📄
Vision Understanding — OpenAI GPT-4V Docs Send a screenshot to the LLM and ask "where is the submit button?" — vision-based fallback for hard-to-parse pages.

Engineering & deployment

📋
Docker Getting Started — Official Docs Containerize your agent for reproducible deployment. Useful for the Week 10 polish stage.
📄
Railway Deployment Docs Easiest way to deploy a FastAPI + Playwright backend. Free tier available.
📄
Map-Reduce Summarization — LangChain Docs Summarize long webpages that exceed the LLM context window. Used in the summarization feature module.
📋
Dialogs & Popups — Playwright Docs Handle alert(), confirm(), prompt() dialogs that appear mid-task on real websites.
