Mentorship Program 2025

Agentic AI Browser
Assistant Course

A complete 10-week curriculum to build an AI-powered workflow automation agent from scratch — no prior ML knowledge required. You do not need to watch or read every resource; use whatever works best for you, focus on understanding, and once a topic clicks move on to the next part. Do not get stressed by the number of links — you can always ask AI to help explain anything confusing and come back to the resources later.

10Weeks
6Learning Weeks
4Build Weeks
5Checkpoints
WEEK 1
Python & Web Fundamentals
WEEK 2
Browser Automation & DOM
WEEK 3
LLMs, APIs & Prompt Engineering
WEEK 4
Agentic AI & LangChain
WEEK 5
Backend, Memory & APIs
WEEK 6
Frontend & System Design
WEEK 7
Core Agent Engine
WEEK 8
Feature Modules
WEEK 9
Robustness & Integration
WEEK 10
Polish & Final Demo
Phase 1 — Learn
Foundations
Weeks 1–6
WK 01
Python, HTML/DOM & Dev Environment Setup
Build your foundation — if you can write this week, the rest follows
Python Web
+
Why this week matters: This project lives at the intersection of Python, web technologies, and AI APIs. Before you automate a browser, you need to be fluent in Python (we'll use it for everything) and understand how HTML/DOM works (because the browser automation engine reads from it). Even if you know Python, revisit the advanced parts — decorators, async/await, and context managers will appear throughout the project.
If you're new: Watch the full crash course first.
Getting started
Intermediate Python — needed directly in this project
Practice platform
The browser automation engine reads the DOM — the tree structure of an HTML page. To tell it "click the submit button," you need to understand selectors, element types, and page structure.
Assignment 1 Environment Setup + Python Warmup

By end of this week, you should have a working dev environment and demonstrate comfort with Python.

  • Set up a Python 3.10+ virtual environment in a project folder called ai-browser-agent
  • Write a Python script that reads a JSON file of user info (name, email, phone, address) and prints it formatted — you'll reuse this as your "memory" layer later
  • Open any website in Chrome DevTools and identify: 3 form input selectors, 2 button selectors, 1 dropdown — screenshot them
  • Commit everything to a GitHub repo with a proper README
  • Complete HackerRank Day 0–5 challenges
WK 02
Browser Automation with Playwright
Control any browser with code — clicking, typing, scraping, navigation
Web Python
+
Why Playwright over Selenium? Playwright is faster, has better async support, auto-waits for elements (critical for modern SPAs), and supports Chromium, Firefox and WebKit. It's the modern standard. You'll use it as the execution backbone of your agent.
Assignment 2 Browser Automation Scripts

Build 3 small automation scripts that will later become building blocks of your agent:

  • Script 1 — Navigator: Open a news site (e.g. BBC), extract the titles of the top 5 articles, and save them to a JSON file
  • Script 2 — Form Filler: Go to demoqa.com/automation-practice-form, fill every field using data from a JSON file, and take a screenshot before submitting
  • Script 3 — Tab Manager: Open 5 tabs in parallel, capture the title of each, then close all except the first one
  • Wrap all scripts in async functions. Handle at least 2 error conditions per script (element not found, timeout)
Checkpoint 1 Basic Browser Automation Engine ✓

After this week, you should be able to control a browser programmatically — clicking, typing, navigating, and extracting data. This is the execution layer of your agent.

WK 03
LLMs, APIs & Prompt Engineering
Give your agent a brain — understand how to talk to AI models effectively
AI ML
+
The core insight of this week: Your agent converts natural language ("fill this form with my details") into structured JSON that maps to browser actions. This translation is done entirely by an LLM. How you prompt it determines whether the agent works or doesn't. Prompt engineering is not optional — it's the intelligence layer.
This section is the most important in the entire course. A well-prompted LLM can do 90% of your agent's logic. A poorly prompted one will fail on every edge case.
Assignment 3 Intent Parser Prototype

Build the core intelligence module of your agent — a function that converts natural language into structured browser actions:

  • Write a Python function parse_intent(user_command: str) → dict that calls an LLM API and returns structured JSON
  • Define a JSON schema: {"action": "fill_form"|"navigate"|"email"|"summarize"|"click", "target_url": "...", "data": {...}, "steps": [...]}
  • Test it with 10 different commands: "apply to this job", "close all tabs", "email this summary to my boss", etc.
  • Add few-shot examples in your system prompt for at least 3 action types
  • Bonus: Handle ambiguous commands by making the LLM ask a clarifying question
Checkpoint 2 Natural Language → Task Conversion ✓

Your intent parser correctly converts user commands into structured JSON action plans. This is the "brain" of your agent.

WK 04
Agentic AI Design & LangChain
Multi-step reasoning, tool use, memory, and planning patterns
AI ML
+
From chatbot to agent: A chatbot answers questions. An agent plans, uses tools, remembers state, and executes multi-step tasks in the real world. This week you'll learn the design patterns (ReAct, Plan-and-Execute, tool-calling) that make this possible — and LangChain, the framework that implements them.
Assignment 4 LangChain Agent with Playwright Tools

Integrate the Week 2 and Week 3 work into a LangChain agent that can actually use the browser:

  • Define 3 LangChain tools: navigate_to(url), click_element(selector), type_text(selector, text)
  • Create a LangChain agent that uses these tools to complete tasks like "go to google.com and search for AI news"
  • Add conversation memory so the agent remembers what it just did
  • Test the full loop: user command → agent reasoning → tool execution → result → next step
  • Bonus: Implement a simple file-based user profile store (name, email, resume path) that the agent can query
WK 05
Backend Development, APIs & User Memory Layer
FastAPI server, REST APIs, data persistence, email/calendar integrations
Python Web
+
Why a backend? Your agent needs a server — to receive commands from a frontend UI, manage browser sessions, store user data securely, and call external APIs (Gmail, Google Calendar). FastAPI is the modern Python choice: fast, async-native, and auto-generates API documentation.
Assignment 5 Backend API Server

Build the server that the frontend will communicate with:

  • Build a FastAPI server with these endpoints: POST /command (receives text command, returns task_id), GET /status/{task_id} (returns task progress), GET/POST /user/profile (read/write user memory)
  • Store user profiles in SQLite: name, email, phone, address, resume text
  • Connect the agent from Week 4 to run as a background task when /command is called
  • Add a WebSocket endpoint that streams live status updates as the agent works
  • Test all endpoints with FastAPI's auto-generated Swagger UI at /docs
WK 06
Frontend UI, System Architecture & Testing
React interface, system design thinking, and building resilient systems
Web Design
+
The final learning week: You'll build the user-facing interface and think through the complete system architecture before entering the build phase. Understanding testing and system design now will save enormous debugging time later.
Assignment 6 UI Prototype + Architecture Document

Prepare for the build phase with clear plans and a working UI shell:

  • Build a React UI with: a command input bar, a live activity log panel (showing each agent step), and a user profile settings page
  • Connect the UI to your Week 5 backend via WebSocket — show real-time agent status
  • Write a 1-page architecture document with a diagram showing: UI → FastAPI → AgentExecutor → [LLM, Browser Tools, Memory] → External APIs
  • Write 5 pytest tests for your intent parser from Week 3 (test each action type)
  • Define your project's data schemas (Pydantic models for UserProfile, Task, AgentAction)
Phase 2 — Build
Project Implementation
Weeks 7–10
WK 07
Core Agent Engine — Intent Parser + Execution Loop
Build the heart of the system: NL → plan → execute → observe
Build AI
+
This is where everything comes together: combine the LLM intent parser (Week 3), LangChain agent framework (Week 4), and Playwright browser tools (Week 2) into a unified agent execution engine. By end of week, the agent should handle 3-4 basic task types end-to-end.
  • IntentParser class — wraps LLM call, enforces JSON schema, handles retries on malformed output. Support: navigate, click, type, extract, summarize, email action types
  • TaskPlanner class — takes an IntentParser output and decomposes complex instructions into ordered step lists. "Apply to this job" → [navigate, read_form_fields, fill_name, fill_email, fill_phone, attach_resume, click_submit]
  • BrowserExecutor class — maps each step type to a Playwright action. Maintains browser session state across steps. Handles navigation errors, missing elements, timeouts
  • AgentLoop — orchestrates the cycle: parse → plan → execute step → observe result → re-plan if needed → next step
  • Context module — after navigating to a page, automatically extract key page info (form fields, buttons, links) and inject into the planning prompt
Web automation is brittle by nature. Plan for these from day one:

• Elements load dynamically (SPA) → use Playwright's auto-wait or wait_for_selector()
• Captchas and bot detection → use realistic delays, proper user-agent
• Pop-ups and cookie banners → detect and dismiss before proceeding
• Form validation errors → read error messages and re-plan
• Session expiry → detect login prompts and handle re-auth
Milestone 7 Working Agent Core — End-to-End Demo

Demonstrate a complete task execution across these scenarios:

  • Demo 1: "Go to Wikipedia and summarize the article on Artificial Intelligence" → agent navigates, extracts text, calls LLM to summarize, returns result
  • Demo 2: "Fill the practice form at demoqa.com with my profile" → agent reads stored user profile, identifies all form fields, fills them all, takes completion screenshot
  • Demo 3: "Open google.com, search for 'latest AI news', and tell me the top 3 headlines" → agent navigates, types query, extracts results, reports back
  • All 3 demos run end-to-end through the FastAPI server with real-time WebSocket updates to the React UI
Checkpoint 3 Multi-Task Workflow Execution ✓

The agent can now handle chained tasks: navigate → read context → act → report. The core execution loop is working.

WK 08
Feature Modules — Form Automation, Email & Summarization
Build the specialized capabilities that make your agent genuinely useful
Build AI
+
From core to product: This week you implement the high-value features outlined in the project spec — intelligent form filling, email automation, content summarization, and tab management. Each is a self-contained module that plugs into the agent's tool registry.
  • Form Scanner Tool: After navigating to a URL, extract all form fields: (label text, input type, name/id attribute, required flag, options if select). Return as structured JSON
  • Field Mapper: Send the form schema + user profile to LLM. Ask it to map each field to the correct user data. Handle mismatches gracefully ("I don't have a LinkedIn URL in your profile — should I skip?")
  • File Upload Handler: Support resume PDF upload using Playwright's set_input_files(). Parse resume text as fallback if upload fails
  • Test on at least 5 different real-world forms (job applications, contact forms, signup pages)
  • Email Drafter: Given a command like "email a summary of this page to john@example.com", call LLM to draft the email body, then send via Gmail API or SMTP
  • Compose from Context: After summarization, automatically draft and optionally send the summary email
  • Reply-to Handling: For "reply to the last email from my boss", use Gmail API to fetch the thread and generate a contextual reply
  • Content Extractor: Use Playwright + BeautifulSoup to extract main article content (strip nav, ads, footers). Try Mozilla Readability via JS injection for cleaner extraction
  • Chunked Summarization: Long pages exceed LLM context. Split text into chunks, summarize each, then summarize the summaries (map-reduce pattern)
  • Structured Summary: Return: 3-sentence TL;DR, 5 key points, sentiment, and suggested tags
  • Tab Tracker: Maintain a registry of open tabs (page objects, titles, URLs). Support "close all tabs except this one", "switch to the GitHub tab", "list all open tabs"
  • Multi-tab Workflow: Open multiple pages in parallel (asyncio.gather), collect results concurrently — used for comparing multiple listings or job postings simultaneously
  • Session Persistence: Save browser storage state (cookies, localStorage) so the agent doesn't need to re-login between sessions
Checkpoint 4 + Milestone 8 Form Automation System + Multi-Module Demo
  • Demo the form filler on 3 different real-world job application forms (Indeed, LinkedIn Easy Apply, a company careers page)
  • Show the full chain: "Summarize this article and email it to me" → summarize → draft email → send → confirm
  • All 4 feature modules working and registered as tools in the LangChain agent
  • User profile stored persistently; agent reads it automatically without being told where to find it
WK 09
Robustness, Multi-Step Planning & Advanced Features
Make the agent production-grade: retries, fallbacks, complex task chains
Build AI
+
From demo to reliable system: Demos work on the happy path. This week you build the error handling, retry logic, and adaptive planning that makes your agent work on the messy, unpredictable real web. Also: tackle the hardest multi-step tasks that chain 5+ actions.
  • Retry with Backoff: Wrap every browser action in a retry decorator (3 attempts, exponential backoff). Different retry strategies for different error types (TimeoutError vs ElementNotFound vs NavigationError)
  • Dynamic Selector Healing: If a CSS selector fails, ask the LLM to generate 2 alternative selectors based on the page HTML snapshot. Try each fallback before giving up
  • Anti-Bot Handling: Detect bot-detection pages (Cloudflare, reCAPTCHA). Implement realistic human delays (random sleep 1–3s between actions), proper user-agent, non-headless mode option
  • Error Recovery Loop: When a step fails, capture the current page state (screenshot + HTML), send to LLM with error context, get a corrected action plan, continue
  • Task State Machine: Model each task as a state machine (PENDING → RUNNING → WAITING_FOR_INPUT → COMPLETED/FAILED). Persist state to DB so tasks can be resumed
  • Plan Decomposition: Implement a two-stage planner: high-level plan (5–10 macro steps) → micro-execution (each macro step decomposed into browser actions on the actual page)
  • Dynamic Re-planning: After each macro step, evaluate success and re-generate the remaining plan based on current observed state. Critical for complex tasks where pages differ from expectations
  • Calendar Scheduling: Implement "schedule a meeting with X for next Tuesday at 3pm" — check calendar availability, create event, send invite
  • Multi-site Workflow: "Compare prices for this product across Amazon, Flipkart, and Snapdeal" — parallel browser contexts, structured result aggregation
For dynamic pages that are hard to parse with DOM selectors, use vision-based understanding — take a screenshot and ask a vision LLM where to click.
Milestone 9 Stress Test Your Agent
  • Run the agent against 10 different websites you haven't tested on. Document every failure and fix at least 7
  • Implement and demonstrate the error recovery loop: intentionally break something (wrong selector), show the agent detecting the error, re-planning, and completing the task anyway
  • Implement the meeting scheduling workflow end-to-end via Google Calendar API
  • Build one complex 5+ step pipeline of your choice (e.g., "research this company, save key info, draft a cold email, schedule a follow-up reminder")
WK 10
Final Polish, UI/UX, Testing & Demo Preparation
Make it presentable, reliable, and ready to impress
Build Web
+
The final sprint: The last week is about presentation quality. A working agent that demos badly is worse than a 70%-working agent that demos flawlessly. Focus on the experience: smooth UI, fast response, clear status messages, and a polished 5-minute demo flow.
  • Command Bar: Auto-suggest based on common commands. Show typing indicator while agent works. Support multi-turn follow-ups ("now email that summary to my professor")
  • Live Activity Feed: Each agent step shown in real-time with: icon, description, timestamp, status (✓ done / ⟳ in progress / ✗ failed). Click any step to see details
  • Profile Manager: Clean form to edit user profile. Show what data the agent has access to. Allow uploading resume PDF
  • Task History: List of past tasks with status, timestamp, and "replay" button
  • Error Display: When something fails, show a clear message with what was attempted and a "retry" option
  • End-to-end test all 5 checkpoint scenarios in one clean session
  • Test on Chrome, Firefox (Playwright supports both) — verify compatibility
  • Test with 3 different user profiles — ensure profile switching works correctly
  • Load test: run 3 concurrent tasks — verify the backend handles parallel browser sessions
  • Write a TROUBLESHOOTING.md documenting the 10 most common failures and their fixes
  • README.md: Project overview, architecture diagram, setup instructions (must work from fresh clone in under 10 minutes), feature list, known limitations
  • Technical Report (3–5 pages): Architecture decisions made and why, challenges encountered and solutions, performance metrics (avg task completion time, success rate across website types), what you'd improve with more time
  • Demo Video (3–5 min): Record 3 complete task executions. No cuts — let the agent run live. Narrate what the agent is "thinking" at each step
  • Code Quality: Add docstrings to all classes/functions. Remove debug print statements. Ensure .env.example exists for API keys
Checkpoint 5 — Final Demo End-to-End Demo System ✓

Present a fully functional demo covering all 5 milestones:

  • ✓ Milestone 1: Control browser — navigate, click, type (Week 2)
  • ✓ Milestone 2: Natural language → structured action (Week 3)
  • ✓ Milestone 3: Fill a real-world job application form end-to-end (Week 8)
  • ✓ Milestone 4: "Summarize this article and email it to me" — chained task (Week 8)
  • ✓ Milestone 5: Complex multi-step workflow of your choice — demonstrates robustness (Week 9)
  • Live demo via the React UI. Audience can type commands and see the agent work in real time

These are deeper resources to consult when you need to go further on specific topics. Not required, but valuable.

AI & Agents
Engineering & Deployment