Build an Agentic AI
Browser Assistant

Master the 9 foundational skill areas — from Python internals and browser automation to LLMs, FastAPI, and React — so you're fully prepared to build your AI agent in Weeks 7–10. You do not need to consume every resource; pick the format that works for you (video vs. text), understand the concept, then move on.

6Weeks
9Core Topics
6Assignments
4Checkpoints
Week 1
Python — Intermediate Level
Week 1 cont.
HTML, DOM & CSS Selectors
Week 2
Playwright — Browser Automation
Week 3
LLMs, APIs & Prompt Engineering
Week 4
Agentic AI Design & LangChain
Week 5
FastAPI, Databases & WebSockets
Week 5 cont.
External APIs — Gmail, Calendar, PDF
Week 6
React — UI & Real-Time Updates
Week 6 stretch
Memory, Embeddings & Multi-Agent
Weeks 1–6
Learning Phase
9 topics · 6 assignments
Wk 01
Python & Web Foundations
Topics 1 & 2 — Python internals + HTML/DOM/CSS selectors
async/await DOM CSS selectors
+
Why this week matters: Every single line of agent code you write runs on Python. Before anything else works — Playwright, FastAPI, LangChain — you must be solid on async/await (Playwright is fully async), JSON handling (every LLM response is JSON), error handling (browser automation breaks constantly), decorators (FastAPI uses them everywhere), and virtual environments. The HTML/DOM/CSS section is equally non-negotiable: the browser automation engine finds elements using CSS selectors, so you need to be able to look at any webpage and identify the right selector to click or type into.
async / await JSON parsing try / except decorators file I/O virtual environments dataclasses / Pydantic HTML structure CSS selectors form elements DOM tree XPath basics iframes
A · Python — intermediate level
Getting started (skip if you know Python basics)
Intermediate Python (used directly in this project)
Practice platform
B · HTML, DOM & CSS selectors
Assignment 1 Environment Setup + Python Warmup

Set up your working environment and prove Python readiness before moving on.

  • Create a Python 3.10+ virtual environment in a folder called ai-browser-agent — install nothing yet except requests
  • Write an async Python script that reads a JSON file of user info (name, email, phone, address) and prints it nicely — this becomes your "memory" layer later
  • Open any website in Chrome DevTools and identify 3 CSS selectors for form inputs, 2 for buttons, 1 for a dropdown — screenshot them
  • Commit everything to a GitHub repo with a README that describes what the final project will be
  • Complete HackerRank Days 0–5
Checkpoint 1 — Dev Environment Ready
You can write async Python, handle JSON, manage exceptions, and identify CSS selectors on any webpage in DevTools. Everything that follows builds on this.
Wk 02
Playwright — Browser Automation
Topic 3 — Control any browser with code
locators wait strategies async Playwright file upload
+
This is the execution backbone of your agent. Playwright is the library that physically clicks buttons, types into forms, navigates URLs, takes screenshots, manages tabs, and uploads files — on your behalf. It is the most-used library in the entire project; every feature module touches it. You're choosing Playwright over Selenium because it has native async support, auto-waits for elements (critical for modern single-page apps), and a much cleaner API.
async Playwright locators fill / click / type wait strategies file upload multiple pages storage state (cookies) screenshot evaluate (JS injection)
A · Playwright core
B · DOM parsing & data extraction
Common failure modes to know now: Elements load dynamically on modern SPAs — use auto-wait or wait_for_selector(). Cookie banners and pop-ups appear before you can interact — detect and dismiss them first. Session expiry shows a login prompt mid-task — handle re-auth gracefully.
Assignment 2 Browser Automation Scripts

Build 3 scripts that will later become building blocks of your agent:

  • Script 1 — Navigator: Open a news site (BBC, HN), extract titles of the top 5 articles, save to a JSON file
  • Script 2 — Form Filler: Go to demoqa.com/automation-practice-form, fill every field from a JSON file, screenshot before submitting
  • Script 3 — Tab Manager: Open 5 tabs in parallel, capture each title, then close all except the first
  • Wrap all scripts in async functions. Handle at least 2 error conditions per script (element not found, timeout)
Checkpoint 2 — Browser Automation Engine Working
You can control a browser programmatically — clicking, typing, navigating, and extracting data. This is the execution layer your agent will use.
Wk 03
LLMs, APIs & Prompt Engineering
Topic 4 — Give your agent a brain
function calling structured outputs few-shot prompting JSON mode
+
The intelligence layer. Your agent converts natural language — "fill this form with my details" — into structured JSON that maps to browser actions. This translation is done entirely by an LLM. How you prompt it determines whether the agent works or not. Prompt engineering is not optional; it is the single most important skill in this entire course. A well-prompted LLM can handle 90% of your agent's logic. A poorly-prompted one fails on every edge case.
OpenAI API function calling structured outputs system prompts few-shot examples chain-of-thought JSON mode token limits GPT-4o vision
A · How LLMs work — conceptual foundation
B · LLM APIs — OpenAI, Claude, Gemini
C · Prompt engineering — the most important section
Read this section carefully. The quality of your prompts determines whether your agent is reliable or not. This is not an exaggeration.
Assignment 3 Intent Parser Prototype

Build the core intelligence module — a function that converts natural language into structured browser actions:

  • Write a function parse_intent(user_command: str) → dict that calls an LLM API and returns structured JSON
  • Define a schema: {"action": "fill_form"|"navigate"|"email"|"summarize"|"click", "target_url": "...", "data": {...}, "steps": [...]}
  • Test with 10 different commands: "apply to this job", "close all tabs", "email this summary to my boss", etc.
  • Add few-shot examples in your system prompt for at least 3 action types
  • Bonus: For ambiguous commands, make the LLM ask a clarifying question before outputting an action plan
Checkpoint 3 — Natural Language → Structured Action Plan
Your intent parser reliably converts user commands into structured JSON action plans. This is the "brain" of your agent.
Wk 04
Agentic AI Design & LangChain
Topic 5 — Multi-step reasoning, tool use, memory, planning
ReAct pattern AgentExecutor custom tools memory
+
From chatbot to agent. A chatbot answers questions. An agent plans, uses tools, remembers state, and executes multi-step tasks in the real world. This week you learn the design patterns — ReAct, Plan-and-Execute, tool-calling — that make this possible, and LangChain, the framework that implements them. The ReAct pattern is what your agent runs on: Reason about what to do → Act with a tool → Observe the result → Reason again.
ReAct pattern LangChain agents custom tools AgentExecutor tool schemas conversation memory plan-and-execute error recovery loop
A · Agentic AI design patterns
B · LangChain — the framework
Assignment 4 LangChain Agent with Playwright Tools

Integrate the Week 2 and Week 3 work into a real agent that uses the browser:

  • Define 3 LangChain tools: navigate_to(url), click_element(selector), type_text(selector, text)
  • Create a LangChain agent that uses these tools to complete tasks like "go to google.com and search for AI news"
  • Add conversation memory so the agent remembers what it just did and can follow up
  • Test the full loop: user command → agent reasoning → tool execution → result → next step
  • Bonus: Add a simple file-based user profile store that the agent can query for name, email, and resume path
Wk 05
FastAPI, Databases, WebSockets & External APIs
Topics 6 & 8 — Backend server + Gmail/Calendar/PDF integrations
FastAPI routes WebSockets SQLite + SQLModel Gmail API
+
The server everything talks through. Your agent needs a backend to receive commands from the React UI, run browser sessions in the background, stream live status updates, and store user profiles. FastAPI is the modern Python choice: async-native, fast, and auto-generates Swagger documentation. The second half of this week connects to real-world services — Gmail for sending emails, Google Calendar for scheduling, and PDF parsing to read uploaded resumes.
FastAPI routes Pydantic models background tasks WebSockets SQLite + SQLModel REST API CORS env variables Gmail API Google Calendar API OAuth 2.0 PDF parsing MIME email format
A · FastAPI
B · Data persistence — SQLite & SQLModel
C · External APIs — Gmail, Calendar, PDF
Assignment 5 Backend API Server

Build the server that the frontend will communicate with:

  • Build a FastAPI server with: POST /command (receives text command, returns task_id), GET /status/{task_id} (returns task progress), GET/POST /user/profile (read/write user memory)
  • Store user profiles in SQLite: name, email, phone, address, resume text
  • Connect the Week 4 LangChain agent to run as a background task when /command is called
  • Add a WebSocket endpoint that streams live status updates as the agent works step by step
  • Test all endpoints with FastAPI's auto-generated Swagger UI at /docs
Wk 06
React, System Design & Memory (Stretch)
Topics 7 & 9 — Frontend UI + embeddings/multi-agent as bonus
React components WebSocket client useState/useEffect ChromaDB
+
The face of the product. The React UI is what you demo. A command bar to type instructions, a live activity feed that shows each agent step in real time over WebSocket, a profile settings page, and a task history view. The stretch goal (memory, embeddings, multi-agent) is optional but makes your agent feel genuinely intelligent over time — the vector database lets the agent retrieve past interactions without exact-match search.
React components useState / useEffect WebSocket client fetch / axios conditional rendering forms in React Tailwind CSS ChromaDB embeddings semantic search LangGraph multi-agent orchestration
A · React fundamentals
B · System design & testing
C · Stretch goal — memory, embeddings & multi-agent
Optional but powerful. These topics are for Week 9–10 polish, not Week 6. Skim them now so they feel familiar later.
  • 📄
    ChromaDB Official Docs The vector store for long-term agent memory. Stores past task outcomes, user preferences, resume sections.
  • Vector Databases Explained — Fireship Short, clear explanation of embeddings and semantic search. The conceptual foundation for ChromaDB.
  • 📄
    LangGraph — Official Docs Multi-agent orchestration: one agent plans, another browses, another drafts the email — all coordinated in a graph.
  • LangGraph Crash Course Builds a multi-agent system step by step. Watch this when you're comfortable with LangChain (Week 4 done) and want to go further.
Assignment 6 UI Prototype + Architecture Document

Prepare for the build phase with a working UI shell and a clear design plan:

  • Build a React UI with: a command input bar, a live activity log panel (each agent step in real time), and a user profile settings page
  • Connect the UI to your Week 5 backend via WebSocket — show real-time status as the agent works
  • Write a 1-page architecture document with a diagram: UI → FastAPI → AgentExecutor → [LLM, Browser Tools, Memory] → External APIs
  • Write 5 pytest tests for your intent parser — test each action type (navigate, fill_form, email, summarize, click)
  • Define Pydantic models for UserProfile, Task, and AgentAction — these become your data contracts for Weeks 7–10
Checkpoint 4 — Learning Phase Complete
All 9 topic areas covered. You have: async Python, a working browser automation engine, a prompting approach, a LangChain agent, a FastAPI backend, a React UI, and a clear architecture plan. Weeks 7–10 are building time.
Reference library — consult as needed in build weeks

These are deeper resources for Weeks 7–10. Not required for the learning phase, but valuable when you hit specific problems.

AI & Agents — advanced
Engineering & deployment