Agentic AI Browser Assistant

WK 01

Python, HTML/DOM & Dev Environment Setup

Build your foundation: if you get this week right, the rest follows

Python Web

Why this week matters: This project lives at the intersection of Python, web technologies, and AI APIs. Before you automate a browser, you need to be fluent in Python (we'll use it for everything) and understand how HTML/DOM works (because the browser automation engine reads from it). Even if you know Python, revisit the advanced parts — decorators, async/await, and context managers will appear throughout the project.

A. Python — Essentials to Intermediate

If you're new: Watch the full crash course first.

Getting started

▶
Python Full Course for Beginners — freeCodeCamp (4hr): Best single-video intro. Covers variables through file I/O. Watch at 1.5x speed.
▶
Python Full Course for Beginners (2hr)
📄
Official Python Tutorial — python.org: The definitive reference. Chapters 1–9 are essential. Bookmark and return often.

Intermediate Python — needed directly in this project

▶
Async/Await in Python — Corey Schafer: Playwright and API calls are asynchronous, so understanding this is non-negotiable.
▶
Python Decorators Explained — Tech With Tim: Used heavily in FastAPI route definitions.
📄
Working with JSON in Python — Real Python: Every LLM response and API call will be JSON. You must know this deeply.
📄
Python Exceptions & Error Handling — Real Python: Browser automation breaks constantly. Proper error handling is a core skill.
▶
Python Virtual Environments — Corey Schafer: Learn venv and pip properly. Every professional project uses them.

Practice platform

💻
HackerRank Python — 30 Days of Code: Do the first 10 challenges. Practical problem-solving beats passive watching.
💻
Exercism — Python Track: Mentor-reviewed exercises. Excellent for getting feedback on your code style.

B. HTML, CSS & the DOM

The browser automation engine reads the DOM — the tree structure of an HTML page. To tell it "click the submit button," you need to understand selectors, element types, and page structure.

▶
HTML Full Course — freeCodeCamp (2hr): Watch the sections on forms, inputs, and document structure. These are what you'll automate.
📄
Introduction to the DOM — MDN Web Docs: The official, best explanation of what the DOM is. Short read, high value.
📄
CSS Selectors — MDN: Playwright uses CSS selectors to find elements. Know type selectors (div), class, id, attribute selectors (input[type]), and :nth-child.
▶
HTML Forms Deep Dive: Forms are the #1 thing you'll automate. Understand name, id, action, and input types.
💻
CSS Diner — Interactive CSS Selector Game: Best interactive way to learn selectors. Finish all 32 levels.

C. Dev Environment Setup

📋
VS Code Python Setup — Official Docs: Install the Python extension, linting (pylint), and the debugger. This is your IDE for the project.
▶
Git & GitHub Crash Course — Traversy Media: Every week's work should be committed. Learn init, add, commit, push, and branches.
📄
Virtual Environments Primer — Real Python: Set up your project venv properly from day one.

Assignment 1: Environment Setup + Python Warmup

By the end of this week, you should have a working dev environment and be able to demonstrate comfort with Python.

Set up a Python 3.10+ virtual environment in a project folder called ai-browser-agent
Write a Python script that reads a JSON file of user info (name, email, phone, address) and prints it formatted — you'll reuse this as your "memory" layer later
Open any website in Chrome DevTools and identify: 3 form input selectors, 2 button selectors, 1 dropdown — screenshot them
Commit everything to a GitHub repo with a proper README
Complete HackerRank Day 0–5 challenges
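
The JSON-reading script from this assignment could start from something like the sketch below. The helper names load_profile and format_profile are assumptions made for illustration; only the four field names come from the assignment text.

```python
import json
from pathlib import Path

# The four fields named in the assignment.
REQUIRED_FIELDS = ("name", "email", "phone", "address")


def load_profile(path: Path) -> dict:
    """Read a user profile from a JSON file and check the expected keys exist."""
    with path.open(encoding="utf-8") as fh:
        profile = json.load(fh)
    missing = [key for key in REQUIRED_FIELDS if key not in profile]
    if missing:
        raise KeyError(f"profile is missing fields: {', '.join(missing)}")
    return profile


def format_profile(profile: dict) -> str:
    """Return one aligned 'Field : value' line per required field."""
    width = max(len(key) for key in REQUIRED_FIELDS)
    return "\n".join(
        f"{key.capitalize():<{width}} : {profile[key]}" for key in REQUIRED_FIELDS
    )
```

To try it, save a file such as profile.json (a hypothetical name) containing the four keys and call print(format_profile(load_profile(Path("profile.json")))).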

WK 02

Browser Automation with Playwright

Control any browser with code — clicking, typing, scraping, navigation

Web Python

Why Playwright over Selenium? Playwright is faster, has better async support, auto-waits for elements (critical for modern SPAs), and supports Chromium, Firefox and WebKit. It's the modern standard. You'll use it as the execution backbone of your agent.

A. Playwright Core

📋
Playwright for Python — Official Getting Started: Read this completely. Then re-read the selectors and actions sections. This is your primary reference.
▶
Playwright Python Full Tutorial: Best playlist. Covers installation, browsers, pages, clicks, typing, screenshots.
▶
Web Scraping with Playwright: See Playwright used for real data extraction. Directly relevant to the agent's page-reading module.
📋
Playwright Locators — Official Docs: The new, preferred way to find elements. More stable than raw selectors.

B. DOM Parsing & Data Extraction

▶
BeautifulSoup Tutorial — Corey Schafer: For parsing HTML after Playwright fetches it. Great combo: Playwright gets the page, BS4 parses it.
📄
Web Scraping with BeautifulSoup — Real Python: Comprehensive tutorial with real examples. Focus on find(), find_all(), and CSS selector usage.
📋
JavaScript Evaluate — Playwright Docs: Run JS in the page context from Python. Sometimes needed to extract data from complex SPAs.

Assignment 2: Browser Automation Scripts

Build 3 small automation scripts that will later become building blocks of your agent:

Script 1 — Navigator: Open a news site (e.g. BBC), extract the titles of the top 5 articles, and save them to a JSON file
Script 2 — Form Filler: Go to demoqa.com/automation-practice-form, fill every field using data from a JSON file, and take a screenshot before submitting
Script 3 — Tab Manager: Open 5 tabs in parallel, capture the title of each, then close all except the first one
Wrap all scripts in async functions. Handle at least 2 error conditions per script (element not found, timeout)
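
The "parallel tabs plus error handling" shape of Script 3 can be sketched with the standard library alone. In the sketch below, fetch_title is a hypothetical coroutine you would supply yourself (in the real script it would wrap Playwright's page.goto() and page.title()), and LookupError is only a stand-in for an "element not found" failure.

```python
import asyncio
from typing import Awaitable, Callable


async def collect_titles(
    urls: list[str],
    fetch_title: Callable[[str], Awaitable[str]],
    timeout_s: float = 10.0,
) -> dict[str, str]:
    """Fetch every title concurrently; record an error string instead of raising."""

    async def one(url: str) -> str:
        try:
            # Error condition 1: the page takes too long.
            return await asyncio.wait_for(fetch_title(url), timeout=timeout_s)
        except asyncio.TimeoutError:
            return "ERROR: timed out"
        except LookupError as exc:
            # Error condition 2: stand-in for "element not found".
            return f"ERROR: {exc}"

    # gather() runs all lookups concurrently and keeps results in input order.
    results = await asyncio.gather(*(one(url) for url in urls))
    return dict(zip(urls, results))
```

Catching errors inside each task means one slow or broken page does not cancel the other tabs, which is usually what you want when collecting results from several pages at once.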

Checkpoint 1: Basic Browser Automation Engine ✓

After this week, you should be able to control a browser programmatically — clicking, typing, navigating, and extracting data. This is the execution layer of your agent.

WK 03

LLMs, APIs & Prompt Engineering

Give your agent a brain — understand how to talk to AI models effectively

AI ML

The core insight of this week: Your agent converts natural language ("fill this form with my details") into structured JSON that maps to browser actions. This translation is done entirely by an LLM. How you prompt it determines whether the agent works or doesn't. Prompt engineering is not optional — it's the intelligence layer.

A. How LLMs Work — Conceptual Foundation

▶
Intro to Large Language Models — Andrej Karpathy (1hr): The best conceptual overview of LLMs, by a founding member of OpenAI. Watch every minute.
📄
The Illustrated Transformer — Jay Alammar: Visual explanation of the attention mechanism powering all modern LLMs. A classic.
▶
How ChatGPT Works — Computerphile: Short, accessible explanation of RLHF and fine-tuning. 10 minutes, worth it.

B. OpenAI API & Alternatives

📋
OpenAI API Quickstart — Official Docs: Get your first completion in 10 minutes. Start here.
📋
Function Calling / Tool Use — OpenAI Docs: Critical feature: have the LLM return structured JSON matching a schema you define. This is how your intent parser works.
▶
OpenAI Function Calling Tutorial: Watch this after reading the docs. Shows real-world usage of structured output extraction.
📄
Anthropic Claude API — Getting Started: Alternative to OpenAI, with strong instruction-following on complex tasks. Free-credit availability changes, so check the current pricing page.
📋
Google Gemini API Quickstart: Another strong free-tier alternative. Good for multimodal tasks (screenshot understanding).

C. Prompt Engineering — The Intelligence Layer

This section is the most important in the entire course. A well-prompted LLM can do 90% of your agent's logic. A poorly prompted one will fail on every edge case.

📋
Prompt Engineering Guide — promptingguide.ai: The most comprehensive free resource. Read: zero-shot, few-shot, chain-of-thought, and structured output sections.
📄
Learn Prompting — Full Interactive Course: Covers basic to advanced prompting. Especially read the "Structuring Output" and "Role Prompting" chapters.
▶
ChatGPT Prompt Engineering Course — DeepLearning.AI (free): Andrew Ng + OpenAI. One of the best structured courses on prompting. 1.5 hours total.
📄
OpenAI Prompt Engineering Strategies — Official Guide: Six strategies for getting reliable structured outputs. Read "Write clear instructions" and "Use structured output" sections.

D. JSON Mode & Structured Outputs

📋
Structured Outputs — OpenAI Docs: Have the LLM return valid JSON matching your exact schema. This is how your intent parser outputs action plans.
📄
Structured Outputs Complete Guide: Practical walkthrough with real code examples of defining schemas and parsing responses.

Assignment 3: Intent Parser Prototype

Build the core intelligence module of your agent — a function that converts natural language into structured browser actions:

Write a Python function parse_intent(user_command: str) -> dict that calls an LLM API and returns structured JSON
Define a JSON schema: {"action": "fill_form"|"navigate"|"email"|"summarize"|"click", "target_url": "...", "data": {...}, "steps": [...]}
Test it with 10 different commands: "apply to this job", "close all tabs", "email this summary to my boss", etc.
Add few-shot examples in your system prompt for at least 3 action types
Bonus: Handle ambiguous commands by making the LLM ask a clarifying question
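
Whatever API you call, it helps to check the model's reply against the schema before acting on it. The sketch below is one possible validator for the schema listed in the assignment; the function name validate_action_plan and the choice to default the optional keys are assumptions, not part of the assignment.

```python
import json

# Action names taken from the schema in the assignment.
ALLOWED_ACTIONS = {"fill_form", "navigate", "email", "summarize", "click"}


def validate_action_plan(raw: str) -> dict:
    """Parse the model's reply and check it against the assignment's schema."""
    plan = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(plan, dict):
        raise ValueError("expected a JSON object")
    if plan.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {plan.get('action')!r}")
    if not isinstance(plan.get("data", {}), dict):
        raise ValueError("'data' must be an object")
    if not isinstance(plan.get("steps", []), list):
        raise ValueError("'steps' must be a list")
    # Fill optional keys so downstream code can rely on them being present.
    return {
        "action": plan["action"],
        "target_url": plan.get("target_url"),
        "data": plan.get("data", {}),
        "steps": plan.get("steps", []),
    }
```

A parse_intent implementation would pass the raw text of the LLM reply through this function and treat any ValueError as a signal to retry or ask the user for clarification.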

Checkpoint 2: Natural Language → Task Conversion ✓

Your intent parser correctly converts user commands into structured JSON action plans. This is the "brain" of your agent.

WK 04

Agentic AI Design & LangChain

Multi-step reasoning, tool use, memory, and planning patterns

AI ML

From chatbot to agent: A chatbot answers questions. An agent plans, uses tools, remembers state, and executes multi-step tasks in the real world. This week you'll learn the design patterns (ReAct, Plan-and-Execute, tool-calling) that make this possible — and LangChain, the framework that implements them.

A. Agentic AI Design Patterns

📑
ReAct: Synergizing Reasoning and Acting — Original Paper: The foundational pattern your agent uses: Reason about what to do → Take Action → Observe result → Reason again. Read the abstract and examples section.
📄
LLM Powered Autonomous Agents — Lilian Weng (OpenAI): The best technical overview of agent architecture: planning, memory, and tool use. A must-read.
▶
Deep Dive into LLMs like ChatGPT: Karpathy builds an agent from first principles. Reveals the fundamental loop beneath all agent frameworks.
📄
Building Effective Agents — Anthropic Engineering Blog: Practical patterns for reliable agents: when to use sequential vs parallel flows, managing tool failures.

B. LangChain — Framework for Agent Building

📋
LangChain Introduction — Official Docs: Start here. Understand the core abstractions: LLMs, Chains, Agents, Tools, Memory.
▶
LangChain Crash Course — Patrick Loeber (1hr): Best single video to get up to speed on LangChain quickly. Build 3 real examples.
▶
Building Custom Agents with LangChain: Real code walkthrough of building a custom tool-using agent. Close to what you'll build.

C. Memory in Agents

📋
Conversation Memory — LangChain Docs: Short-term: how to maintain context across a multi-turn task like "apply to this job, use my stored resume."
📄
ChromaDB Getting Started — Medium: Long-term memory: vector database for storing user preferences, previous tasks, resume data.
📋
ChromaDB Official Docs: Reference for the vector store you'll use for the agent's persistent memory layer.

Assignment 4: LangChain Agent with Playwright Tools

Integrate the Week 2 and Week 3 work into a LangChain agent that can actually use the browser:

Define 3 LangChain tools: navigate_to(url), click_element(selector), type_text(selector, text)
Create a LangChain agent that uses these tools to complete tasks like "go to google.com and search for AI news"
Add conversation memory so the agent remembers what it just did
Test the full loop: user command → agent reasoning → tool execution → result → next step
Bonus: Implement a simple file-based user profile store (name, email, resume path) that the agent can query
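
Before wiring this up in LangChain, it can help to see the loop a tool-using agent runs, stripped of any framework. The sketch below is framework-free and makes several assumptions: the three tools are stubs that only return strings, decide is a hypothetical stand-in for the LLM's reasoning step, and the history list plays the role of conversation memory.

```python
from typing import Callable, Optional


# Stub tools: in the assignment these would wrap Playwright calls.
def navigate_to(url: str) -> str:
    return f"navigated to {url}"


def click_element(selector: str) -> str:
    return f"clicked {selector}"


def type_text(selector: str, text: str) -> str:
    return f"typed {text!r} into {selector}"


TOOLS: dict[str, Callable[..., str]] = {
    "navigate_to": navigate_to,
    "click_element": click_element,
    "type_text": type_text,
}

# decide(command, history) returns the next action, or None when finished.
Decide = Callable[[str, list], Optional[dict]]


def run_agent(command: str, decide: Decide, max_steps: int = 10) -> list:
    """Loop: ask for an action, run the tool, record the observation."""
    history: list = []  # short-term memory of what the agent just did
    for _ in range(max_steps):
        action = decide(command, history)
        if action is None:
            break
        tool = TOOLS.get(action["tool"])
        if tool is None:
            observation = f"error: unknown tool {action['tool']!r}"
        else:
            observation = tool(**action["args"])
        history.append({"action": action, "observation": observation})
    return history
```

Passing the full history back into decide on every step is what lets the agent react to its previous observation, and max_steps keeps a confused model from looping forever.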

WK 05

Backend Development, APIs & User Memory Layer

FastAPI server, REST APIs, data persistence, email/calendar integrations

Python Web

Why a backend? Your agent needs a server — to receive commands from a frontend UI, manage browser sessions, store user data securely, and call external APIs (Gmail, Google Calendar). FastAPI is the modern Python choice: fast, async-native, and auto-generates API documentation.

A. FastAPI

📋
FastAPI Official Tutorial — fastapi.tiangolo.com: The best framework documentation in the Python ecosystem. Read First Steps through Request Body. Run every example.
▶
FastAPI Crash Course — Patrick Loeber: 40-minute video. Builds a complete API with GET, POST, path params, query params, and request body.
📄
Python REST APIs with FastAPI — Real Python: In-depth tutorial covering CRUD operations, Pydantic validation, and error handling.
📋
Background Tasks — FastAPI Docs: Running browser automation in the background while the API returns immediately. Essential for long-running agent tasks.
📋
WebSockets — FastAPI Docs: Stream real-time status updates to the frontend as the agent completes steps ("✓ Opened form... ✓ Typed name...").

B. Data Persistence — SQLite & SQLModel

📋
SQLModel — Official Docs: By the FastAPI author. Combines SQLAlchemy + Pydantic. Perfect for storing user profiles, task history.
▶
FastAPI with Database — ArjanCodes: Connects FastAPI to a SQLite database. Shows exactly how to persist user data your agent will read.

C. External API Integrations

📋
Gmail API Python Quickstart — Google Developers: How to send emails programmatically. Follow the OAuth setup exactly, since it's the tricky part.
📋
Google Calendar API Quickstart — Google Developers: Create, read and update calendar events. Used for the "schedule a meeting" feature.
📄
Sending Emails with Python — Real Python: SMTP alternative to Gmail API. Simpler setup, good for the MVP.
📄
Reading PDF Files in Python: Parse resume PDFs. The agent needs to extract text from uploaded resumes to fill forms.

Assignment 5: Backend API Server

Build the server that the frontend will communicate with:

Build a FastAPI server with these endpoints: POST /command (receives text command, returns task_id), GET /status/{task_id} (returns task progress), GET/POST /user/profile (read/write user memory)
Store user profiles in SQLite: name, email, phone, address, resume text
Connect the agent from Week 4 to run as a background task when /command is called
Add a WebSocket endpoint that streams live status updates as the agent works
Test all endpoints with FastAPI's auto-generated Swagger UI at /docs
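
The profile storage part of this assignment can be prototyped with Python's built-in sqlite3 module before moving to SQLModel. The sketch below is one possible layout; the table name user_profile, the use of email as the primary key, and the helper names are assumptions, while the five columns come from the assignment.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_profile (
    email       TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT,
    address     TEXT,
    resume_text TEXT
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    conn.execute(SCHEMA)
    return conn


def save_profile(conn: sqlite3.Connection, profile: dict) -> None:
    """Insert the profile, or overwrite the existing row with the same email."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO user_profile "
            "(email, name, phone, address, resume_text) "
            "VALUES (:email, :name, :phone, :address, :resume_text)",
            profile,  # must contain all five keys
        )


def get_profile(conn: sqlite3.Connection, email: str) -> Optional[dict]:
    """Return the stored profile as a dict, or None if there is no match."""
    row = conn.execute(
        "SELECT * FROM user_profile WHERE email = ?", (email,)
    ).fetchone()
    return dict(row) if row else None
```

The GET and POST /user/profile endpoints would then be thin wrappers around get_profile and save_profile.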

WK 06

Frontend UI, System Architecture & Testing

React interface, system design thinking, and building resilient systems

Web Design

The final learning week: You'll build the user-facing interface and think through the complete system architecture before entering the build phase. Understanding testing and system design now will save enormous debugging time later.

A. React Fundamentals

▶
React Full Tutorial: Best modern intro. Covers hooks, state, components: everything you need for the project UI.
📋
React Official Tutorial — react.dev: The new official tutorial is excellent. Complete "Thinking in React", since it's how you'll structure the UI.
📄
React Hooks: useState and useEffect — freeCodeCamp: useState for managing command input and status display. useEffect for WebSocket connection to backend.
📄
WebSockets with React + FastAPI — Medium: Exactly the pattern you'll use: React UI receives real-time agent status updates over WebSocket.

B. System Design for Your Agent

▶
System Design Basics — ByteByteGo: How to think about components, data flows, and failure modes in distributed systems.
📄
Designing AI Agent Architecture Patterns — Medium: Maps out the exact architecture layers your project has: UI → API → Planner → Tools → Browser.
📄
The Twelve-Factor App — 12factor.net: Best practices for building maintainable software. Apply factors 3 (config), 7 (port binding), 11 (logs) to your project.

C. Testing Automation Scripts

📋
Playwright Test Runners — Official Docs: pytest integration. Write tests that verify your agent's browser actions work correctly.
📄
Pytest for Beginners — Real Python: Learn fixtures, parametrize, and mocking, all needed to test your agent's components reliably.
▶
Mocking in Python Tests — ArjanCodes: Mock the LLM API in tests so you don't spend tokens on every test run.

Assignment 6: UI Prototype + Architecture Document

Prepare for the build phase with clear plans and a working UI shell:

Build a React UI with: a command input bar, a live activity log panel (showing each agent step), and a user profile settings page
Connect the UI to your Week 5 backend via WebSocket — show real-time agent status
Write a 1-page architecture document with a diagram showing: UI → FastAPI → AgentExecutor → [LLM, Browser Tools, Memory] → External APIs
Write 5 pytest tests for your intent parser from Week 3 (test each action type)
Define your project's data schemas (Pydantic models for UserProfile, Task, AgentAction)
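
For the intent parser tests, the usual trick is to replace the LLM client with a mock so each run is fast and free. The sketch below shows the shape of such a test; the parse_intent shown here is a toy stand-in, and the client's complete() method is a hypothetical interface, not any real SDK's API.

```python
import json
from unittest.mock import Mock


def parse_intent(user_command: str, llm_client) -> dict:
    """Toy stand-in for the Week 3 parser: ask the client, decode its JSON reply."""
    reply = llm_client.complete(prompt=f"Convert to an action plan: {user_command}")
    return json.loads(reply)


def test_navigate_command_is_parsed():
    # The mock replaces the real API client, so the test spends no tokens.
    fake_client = Mock()
    fake_client.complete.return_value = json.dumps(
        {"action": "navigate", "target_url": "https://example.com",
         "data": {}, "steps": []}
    )

    plan = parse_intent("open example.com", fake_client)

    assert plan["action"] == "navigate"
    assert plan["target_url"] == "https://example.com"
    # The parser should call the model exactly once per command.
    fake_client.complete.assert_called_once()
```

pytest discovers functions whose names start with test_, so a file of these (one per action type) covers the "5 pytest tests" item. Passing the client in as an argument is the design choice that makes the mock easy to inject.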

WK 07

Core Agent Engine — Intent Parser + Execution Loop

Build the heart of the system: NL → plan → execute → observe

Build AI

This is where everything comes together: combine the LLM intent parser (Week 3), LangChain agent framework (Week 4), and Playwright browser tools (Week 2) into a unified agent execution engine. By the end of the week, the agent should handle 3–4 basic task types end-to-end.

What to Build This Week

IntentParser class — wraps LLM call, enforces JSON schema, handles retries on malformed output. Support: navigate, click, type, extract, summarize, email action types
TaskPlanner class — takes an IntentParser output and decomposes complex instructions into ordered step lists. "Apply to this job" → [navigate, read_form_fields, fill_name, fill_email, fill_phone, attach_resume, click_submit]
BrowserExecutor class — maps each step type to a Playwright action. Maintains browser session state across steps. Handles navigation errors, missing elements, timeouts
AgentLoop — orchestrates the cycle: parse → plan → execute step → observe result → re-plan if needed → next step
Context module — after navigating to a page, automatically extract key page info (form fields, buttons, links) and inject into the planning prompt
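
The "handles retries on malformed output" behavior of IntentParser could look roughly like the sketch below. It assumes a hypothetical call_llm function that takes a prompt string and returns the model's raw text; the error-feedback wording and the MalformedPlanError name are also illustrative choices.

```python
import json
from typing import Callable


class MalformedPlanError(Exception):
    """Raised when the model never produces a usable JSON plan."""


def parse_with_retries(
    command: str,
    call_llm: Callable[[str], str],
    max_attempts: int = 3,
) -> dict:
    """Ask the model for a plan, re-prompting with the error when the reply is bad."""
    prompt = command
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            plan = json.loads(raw)
            if isinstance(plan, dict) and "action" in plan:
                return plan
            last_error = "reply was valid JSON but had no 'action' key"
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc.msg}"
        # Feed the error back so the next attempt can correct itself.
        prompt = (
            f"{command}\n\nYour previous reply was rejected ({last_error}). "
            "Reply with JSON only."
        )
    raise MalformedPlanError(f"gave up after {max_attempts} attempts: {last_error}")
```

Including the rejection reason in the follow-up prompt tends to work better than resending the same prompt unchanged, though how well it works depends on the model.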

Reference Implementations to Study

💻
Playwright MCP Server — Microsoft GitHub: Official reference for how to expose browser actions as tool-call endpoints. Study the action schema.
💻
How to Create Custom Tools: Study how existing tools are structured: name, description, args_schema, and the _run method. Use as a template.
📄
Frames & iFrames — Playwright Docs: Many real-world forms live inside iframes (especially payment forms, embedded widgets). Know how to handle them.
📄
Input Actions Reference — Playwright Docs: Complete reference for: fill(), type(), check(), select_option(), set_input_files(), drag_and_drop(). Bookmark.

Common Failure Modes — Prepare For These

Web automation is brittle by nature. Plan for these from day one:

• Elements load dynamically (SPA) → use Playwright's auto-wait or wait_for_selector()
• Captchas and bot detection → use realistic delays, proper user-agent
• Pop-ups and cookie banners → detect and dismiss before proceeding
• Form validation errors → read error messages and re-plan
• Session expiry → detect login prompts and handle re-auth

📋
Waiting Strategies — Playwright Docs: wait_for_selector(), expect_navigation(), wait_for_load_state() (the Python spellings of waitForSelector, waitForNavigation, waitForLoadState). Master these to fix most timing bugs.
📄
Dialogs & Popups — Playwright Docs: Handle alert(), confirm(), prompt() dialogs. Critical for real-world websites.

Milestone 7: Working Agent Core — End-to-End Demo

Demonstrate a complete task execution across these scenarios:

Demo 1: "Go to Wikipedia and summarize the article on Artificial Intelligence" → agent navigates, extracts text, calls LLM to summarize, returns result
Demo 2: "Fill the practice form at demoqa.com with my profile" → agent reads stored user profile, identifies all form fields, fills them all, takes completion screenshot
Demo 3: "Open google.com, search for 'latest AI news', and tell me the top 3 headlines" → agent navigates, types query, extracts results, reports back
All 3 demos run end-to-end through the FastAPI server with real-time WebSocket updates to the React UI

Checkpoint 3: Multi-Task Workflow Execution ✓

The agent can now handle chained tasks: navigate → read context → act → report. The core execution loop is working.

WK 08

Feature Modules — Form Automation, Email & Summarization

Build the specialized capabilities that make your agent genuinely useful

Build AI

From core to product: This week you implement the high-value features outlined in the project spec — intelligent form filling, email automation, content summarization, and tab management. Each is a self-contained module that plugs into the agent's tool registry.

Module 1: Intelligent Form Filling

Form Scanner Tool: After navigating to a URL, extract all form fields: (label text, input type, name/id attribute, required flag, options if select). Return as structured JSON
Field Mapper: Send the form schema + user profile to LLM. Ask it to map each field to the correct user data. Handle mismatches gracefully ("I don't have a LinkedIn URL in your profile — should I skip?")
File Upload Handler: Support resume PDF upload using Playwright's set_input_files(). Parse resume text as fallback if upload fails
Test on at least 5 different real-world forms (job applications, contact forms, signup pages)
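
A first version of the Form Scanner Tool can be written against raw HTML with the standard library, then fed from Playwright's page.content() later. The sketch below is a simplified illustration: the class and function names are assumptions, labels are matched only through their for attribute, and real forms will need more cases (wrapped labels, aria-label, placeholder text).

```python
from html.parser import HTMLParser


class FormFieldScanner(HTMLParser):
    """Collect input/select/textarea fields and the labels that point at them."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: list = []
        self.labels: dict = {}        # value of label's "for" attribute -> text
        self._label_for = None        # id the currently open <label> points at
        self._current_select = None   # field dict of the currently open <select>
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._label_for = a.get("for")
        elif tag in ("input", "textarea", "select"):
            field = {
                "type": a.get("type", "text") if tag == "input" else tag,
                "name": a.get("name"),
                "id": a.get("id"),
                "required": "required" in a,
                "options": [],
            }
            self.fields.append(field)
            if tag == "select":
                self._current_select = field
        elif tag == "option" and self._current_select is not None:
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None
        elif tag == "select":
            self._current_select = None
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._label_for:
            previous = self.labels.get(self._label_for, "")
            self.labels[self._label_for] = f"{previous} {text}".strip()
        elif self._in_option and self._current_select is not None:
            self._current_select["options"].append(text)


def scan_form(html: str) -> list:
    """Return one dict per form field, with its label text attached."""
    scanner = FormFieldScanner()
    scanner.feed(html)
    scanner.close()
    for field in scanner.fields:
        field["label"] = scanner.labels.get(field["id"] or "", "")
    return scanner.fields
```

The returned list serializes directly with json.dumps, so it can be pasted into the Field Mapper prompt alongside the user profile.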

📋
File Upload — Playwright Docs: set_input_files() for file upload inputs. Handle multiple files and non-standard upload buttons.
📄
PDF Text Extraction — Real Python: Extract text from resume PDFs using pdfminer or pypdf (the successor to PyPDF2). Feed to LLM for structured parsing.

Module 2: Email Automation

Email Drafter: Given a command like "email a summary of this page to john@example.com", call LLM to draft the email body, then send via Gmail API or SMTP
Compose from Context: After summarization, automatically draft and optionally send the summary email
Reply-to Handling: For "reply to the last email from my boss", use Gmail API to fetch the thread and generate a contextual reply

📄
Sending Email via Gmail API — Google Docs: Full guide with Python code for constructing and sending MIME messages. Follow exactly.
📄
Gmail Agent with LangChain — Medium: Full example of building an email-sending LangChain tool. Adapt this for your agent.

Module 3: Webpage Summarization

Content Extractor: Use Playwright + BeautifulSoup to extract main article content (strip nav, ads, footers). Try Mozilla Readability via JS injection for cleaner extraction
Chunked Summarization: Long pages exceed LLM context. Split text into chunks, summarize each, then summarize the summaries (map-reduce pattern)
Structured Summary: Return: 3-sentence TL;DR, 5 key points, sentiment, and suggested tags
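
The map-reduce pattern behind Chunked Summarization can be sketched without any framework. In the sketch below, summarize is a hypothetical callable that wraps your LLM call, and splitting on blank lines with a character budget is an assumed simplification (a real implementation would more likely count tokens).

```python
from typing import Callable


def split_into_chunks(text: str, max_chars: int) -> list:
    """Greedy split on paragraph boundaries so no chunk exceeds max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-cut into max_chars pieces.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


def map_reduce_summary(
    text: str, summarize: Callable[[str], str], max_chars: int = 4000
) -> str:
    """Summarize each chunk (map), then summarize the summaries (reduce)."""
    chunks = split_into_chunks(text, max_chars)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partials = [summarize(chunk) for chunk in chunks]   # map step
    # Reduce step; for very long pages the joined partials may themselves
    # need another round of chunking, which this sketch does not do.
    return summarize("\n\n".join(partials))
```

The map step is independent per chunk, so it is a natural candidate for running concurrently once the basic version works.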

📋
Summarization with Map Reduce — LangChain Docs: The official LangChain pattern for summarizing long documents. Use this for long webpages.
📄
Self-Healing Playwright Scripts — Medium: Inject Mozilla's Readability.js into any page to extract clean article content. Dramatically improves summaries.

Module 4: Tab & Browser Management

Tab Tracker: Maintain a registry of open tabs (page objects, titles, URLs). Support "close all tabs except this one", "switch to the GitHub tab", "list all open tabs"
Multi-tab Workflow: Open multiple pages in parallel (asyncio.gather), collect results concurrently — used for comparing multiple listings or job postings simultaneously
Session Persistence: Save browser storage state (cookies, localStorage) so the agent doesn't need to re-login between sessions
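
The Tab Tracker is mostly bookkeeping, so it can be designed and tested before any browser is involved. The sketch below is one possible registry; the Tab and TabTracker names and the locally assigned integer ids are assumptions, and the page attribute is only a placeholder for the Playwright page object you would store in the real agent.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Tab:
    tab_id: int
    title: str
    url: str
    page: Any = None  # placeholder for the Playwright page object


class TabTracker:
    """In-memory registry of open tabs, keyed by a locally assigned id."""

    def __init__(self) -> None:
        self._tabs: dict = {}
        self._next_id = 1

    def register(self, title: str, url: str, page: Any = None) -> Tab:
        tab = Tab(self._next_id, title, url, page)
        self._tabs[tab.tab_id] = tab
        self._next_id += 1
        return tab

    def list_tabs(self) -> list:
        return list(self._tabs.values())

    def find(self, query: str) -> Optional[Tab]:
        """Case-insensitive match on title or URL, e.g. find('github')."""
        needle = query.lower()
        for tab in self._tabs.values():
            if needle in tab.title.lower() or needle in tab.url.lower():
                return tab
        return None

    def close_all_except(self, keep_id: int) -> list:
        """Drop every other tab and return them so the caller can close the pages."""
        if keep_id not in self._tabs:
            raise KeyError(f"no tab with id {keep_id}")
        closed = [t for t in self._tabs.values() if t.tab_id != keep_id]
        self._tabs = {keep_id: self._tabs[keep_id]}
        return closed
```

Returning the removed tabs rather than closing them inside the registry keeps this class free of browser calls; the caller would then await page.close() on each one.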

📋
Multiple Pages — Playwright Docs: How to work with multiple browser contexts and pages simultaneously.
📋
Authentication & Storage State — Playwright Docs: Save login state to JSON, reuse across sessions. Avoids re-logging-in every agent run.

Checkpoint 4 + Milestone 8: Form Automation System + Multi-Module Demo

Demo the form filler on 3 different real-world job application forms (Indeed, LinkedIn Easy Apply, a company careers page)
Show the full chain: "Summarize this article and email it to me" → summarize → draft email → send → confirm
All 4 feature modules working and registered as tools in the LangChain agent
User profile stored persistently; agent reads it automatically without being told where to find it

WK 09

Robustness, Multi-Step Planning & Advanced Features

Make the agent production-grade: retries, fallbacks, complex task chains

Build AI

From demo to reliable system: Demos work on the happy path. This week you build the error handling, retry logic, and adaptive planning that makes your agent work on the messy, unpredictable real web. Also: tackle the hardest multi-step tasks that chain 5+ actions.

Robustness Engineering

Retry with Backoff: Wrap every browser action in a retry decorator (3 attempts, exponential backoff). Different retry strategies for different error types (TimeoutError vs ElementNotFound vs NavigationError)
Dynamic Selector Healing: If a CSS selector fails, ask the LLM to generate 2 alternative selectors based on the page HTML snapshot. Try each fallback before giving up
Anti-Bot Handling: Detect bot-detection pages (Cloudflare, reCAPTCHA). Implement realistic human delays (random sleep 1–3s between actions), proper user-agent, non-headless mode option
Error Recovery Loop: When a step fails, capture the current page state (screenshot + HTML), send to LLM with error context, get a corrected action plan, continue
Task State Machine: Model each task as a state machine (PENDING → RUNNING → WAITING_FOR_INPUT → COMPLETED/FAILED). Persist state to DB so tasks can be resumed
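
The retry decorator from the first item above could be sketched as follows. The parameter names and the default delay are assumptions; the built-in TimeoutError is used as the default only so the sketch runs on its own, and with Playwright you would pass its own exception classes (for example TimeoutError from playwright.async_api) through retry_on.

```python
import asyncio
import functools


def retry(attempts: int = 3, base_delay: float = 0.5, retry_on: tuple = (TimeoutError,)):
    """Retry an async function with exponential backoff on the given exceptions."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Delay doubles after every failure: base, 2x base, 4x base...
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

Exceptions not listed in retry_on propagate immediately, which is how different error types can get different strategies: decorate one action with a generous policy for timeouts and leave errors that will never succeed on retry (such as an invalid URL) unlisted.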

Advanced Multi-Step Planning

Plan Decomposition: Implement a two-stage planner: high-level plan (5–10 macro steps) → micro-execution (each macro step decomposed into browser actions on the actual page)
Dynamic Re-planning: After each macro step, evaluate success and re-generate the remaining plan based on current observed state. Critical for complex tasks where pages differ from expectations
Calendar Scheduling: Implement "schedule a meeting with X for next Tuesday at 3pm" — check calendar availability, create event, send invite
Multi-site Workflow: "Compare prices for this product across Amazon, Flipkart, and Snapdeal" — parallel browser contexts, structured result aggregation

📄
Planning Agents — LangChain Blog: LangChain's approach to Plan-and-Execute agents. Exactly the pattern for complex multi-step tasks.
📑
AgentBench: Evaluating LLM Agents — arXiv: Benchmark paper with real task examples. Study the web-browsing task category for ideas and difficulty calibration.
▶
AI-Based Tools for Self-Healing Locators — Medium: The concept of dynamic selector repair in automation. Adapt this to your agent's recovery loop.

Visual Understanding (Advanced / Optional)

For dynamic pages that are hard to parse with DOM selectors, use vision-based understanding — take a screenshot and ask a vision LLM where to click.

📋
Vision Understanding — OpenAI GPT-4V Docs: Send a screenshot to GPT-4o and ask "where is the submit button?" The model can describe the location or estimate coordinates; treat these as approximate and verify before clicking.
📄
Screen Agent with GPT-4V + Playwright — GitHub: Full implementation of vision-based browser automation. Use as a fallback when the DOM-based approach fails.

Milestone 9: Stress Test Your Agent

Run the agent against 10 different websites you haven't tested on. Document every failure and fix at least 7
Implement and demonstrate the error recovery loop: intentionally break something (wrong selector), show the agent detecting the error, re-planning, and completing the task anyway
Implement the meeting scheduling workflow end-to-end via Google Calendar API
Build one complex 5+ step pipeline of your choice (e.g., "research this company, save key info, draft a cold email, schedule a follow-up reminder")

WK 10

Final Polish, UI/UX, Testing & Demo Preparation

Make it presentable, reliable, and ready to impress

Build Web

The final sprint: The last week is about presentation quality. A working agent that demos badly is worse than a 70%-working agent that demos flawlessly. Focus on the experience: smooth UI, fast response, clear status messages, and a polished 5-minute demo flow.

UI/UX Polish Checklist

Command Bar: Auto-suggest based on common commands. Show typing indicator while agent works. Support multi-turn follow-ups ("now email that summary to my professor")
Live Activity Feed: Each agent step shown in real-time with: icon, description, timestamp, status (✓ done / ⟳ in progress / ✗ failed). Click any step to see details
Profile Manager: Clean form to edit user profile. Show what data the agent has access to. Allow uploading resume PDF
Task History: List of past tasks with status, timestamp, and "replay" button
Error Display: When something fails, show a clear message with what was attempted and a "retry" option

Final Testing Sprint

End-to-end test all 5 checkpoint scenarios in one clean session
Test on Chrome, Firefox (Playwright supports both) — verify compatibility
Test with 3 different user profiles — ensure profile switching works correctly
Load test: run 3 concurrent tasks — verify the backend handles parallel browser sessions
Write a TROUBLESHOOTING.md documenting the 10 most common failures and their fixes

Documentation & Deliverables

README.md: Project overview, architecture diagram, setup instructions (must work from fresh clone in under 10 minutes), feature list, known limitations
Technical Report (3–5 pages): Architecture decisions made and why, challenges encountered and solutions, performance metrics (avg task completion time, success rate across website types), what you'd improve with more time
Demo Video (3–5 min): Record 3 complete task executions. No cuts — let the agent run live. Narrate what the agent is "thinking" at each step
Code Quality: Add docstrings to all classes/functions. Remove debug print statements. Ensure .env.example exists for API keys

Checkpoint 5 — Final Demo: End-to-End Demo System ✓

Present a fully functional demo covering all 5 milestones:

✓ Milestone 1: Control browser — navigate, click, type (Week 2)
✓ Milestone 2: Natural language → structured action (Week 3)
✓ Milestone 3: Fill a real-world job application form end-to-end (Week 8)
✓ Milestone 4: "Summarize this article and email it to me" — chained task (Week 8)
✓ Milestone 5: Complex multi-step workflow of your choice — demonstrates robustness (Week 9)
Live demo via the React UI. Audience can type commands and see the agent work in real time
