An Advanced Coding Implementation: Mastering Browser‑Driven AI in Google Colab with Playwright, browser_use Agent & BrowserContext, LangChain, and Gemini

In this tutorial, we will learn how to harness the power of a browser‑driven AI agent entirely within Google Colab. We will use Playwright’s headless Chromium engine, along with the browser_use library’s high-level Agent and BrowserContext abstractions, to programmatically navigate websites, extract data, and automate complex workflows. We will wrap Google’s Gemini model via the langchain_google_genai connector to provide natural‑language reasoning and decision‑making, secured by pydantic’s SecretStr for safe API‑key handling. With getpass managing credentials, asyncio orchestrating non‑blocking execution, and optional .env support via python-dotenv, this setup will give you an end‑to‑end, interactive agent platform without ever leaving your notebook environment.

!apt-get update -qq
!apt-get install -y -qq chromium-browser chromium-chromedriver fonts-liberation
!pip install -qq playwright python-dotenv langchain-google-genai browser-use
!playwright install

We first refresh the system package lists and install headless Chromium, its WebDriver, and the Liberation fonts to enable browser automation. We then install Playwright along with python-dotenv, the LangChain Google Generative AI connector, and browser-use, and finally download the necessary browser binaries via playwright install.
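
Before moving on, it can help to confirm that the packages are visible to the runtime. The following is a minimal sketch using only the standard library; the helper name check_modules is hypothetical, and the import names in REQUIRED are assumed to correspond to the pip packages installed above.

```python
from importlib.util import find_spec

def check_modules(names):
    """Return a mapping of import name -> True if the module can be located."""
    return {name: find_spec(name) is not None for name in names}

# Import names assumed to match the packages installed above.
REQUIRED = ["playwright", "dotenv", "langchain_google_genai", "browser_use"]

status = check_modules(REQUIRED)
for name, ok in status.items():
    print(f"{name:<24} {'found' if ok else 'MISSING'}")
```

Any entry reported as MISSING suggests the install cell should be re-run before continuing.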

import os
import asyncio
from getpass import getpass

from pydantic import SecretStr
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent, Browser, BrowserContextConfig, BrowserConfig
from browser_use.browser.browser import BrowserContext

We bring in the core Python utilities, os for environment management and asyncio for asynchronous execution, plus getpass and pydantic’s SecretStr for secure API‑key input and storage. We then load LangChain’s Gemini wrapper (ChatGoogleGenerativeAI) and the browser_use toolkit (Agent, Browser, BrowserContextConfig, BrowserConfig, and BrowserContext) to configure and drive a headless browser agent.

os.environ["ANONYMIZED_TELEMETRY"] = "false"

We disable anonymous usage reporting by setting the ANONYMIZED_TELEMETRY environment variable to “false”, so that the browser_use library does not send telemetry data back to its maintainers.
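
As a small illustration of the step above, the sketch below sets the flag and reads it back. The helper name disable_telemetry is hypothetical. Environment values are always strings, so the flag is the literal text "false"; how a library interprets that text is up to the library.

```python
import os

def disable_telemetry(var_name: str = "ANONYMIZED_TELEMETRY") -> str:
    """Set the opt-out flag and return the value now stored in the environment."""
    os.environ[var_name] = "false"  # stored as a string, not a Python bool
    return os.environ[var_name]

value = disable_telemetry()
print(f"ANONYMIZED_TELEMETRY={value}")
```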

async def setup_browser(headless: bool = True):
    browser = Browser(config=BrowserConfig(headless=headless))
    context = BrowserContext(
        browser=browser,
        config=BrowserContextConfig(
            wait_for_network_idle_page_load_time=5.0,
            highlight_elements=True,
            save_recording_path="./recordings",
        ),
    )
    return browser, context

This asynchronous helper initializes a headless (or headed) Browser instance and wraps it in a BrowserContext configured to wait for network‑idle page loads, visually highlight elements during interactions, and save a recording of each session under ./recordings. It then returns both the browser and its ready‑to‑use context for your agent’s tasks.
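
Because the helper hands back a browser that must eventually be closed, one option is to wrap setup and teardown in an async context manager. This is a sketch under stated assumptions, not part of the original tutorial: FakeBrowser and fake_setup are hypothetical stand-ins so the example runs without Chromium, and browser_session is an illustrative name.

```python
import asyncio
from contextlib import asynccontextmanager

class FakeBrowser:
    """Stand-in for a real browser object; only tracks whether close() ran."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

async def fake_setup():
    """Hypothetical replacement for setup_browser(); returns (browser, context)."""
    browser = FakeBrowser()
    return browser, {"owner": browser}

@asynccontextmanager
async def browser_session(setup):
    """Yield the context from setup() and always close the browser afterwards."""
    browser, context = await setup()
    try:
        yield context
    finally:
        await browser.close()

async def demo():
    async with browser_session(fake_setup) as context:
        browser = context["owner"]
        assert not browser.closed  # still open inside the block
    return browser.closed

closed_after = asyncio.run(demo())
print("closed after session:", closed_after)
```

In a notebook cell, where an event loop is already running, `await demo()` would take the place of `asyncio.run(demo())`.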

async def agent_loop(llm, browser_context, query, initial_url=None):
    # Open the starting URL in a new tab only when one was provided.
    initial_actions = [{"open_tab": {"url": initial_url}}] if initial_url else None
    agent = Agent(
        task=query,
        llm=llm,
        browser_context=browser_context,
        use_vision=True,
        generate_gif=False,
        initial_actions=initial_actions,
    )
    result = await agent.run()
    return result.final_result() if result else None

This async helper encapsulates one “think‑and‑browse” cycle: it spins up an Agent configured with your LLM, the browser context, and an optional initial URL tab, leverages vision when available, and disables GIF recording. When you call agent_loop, it runs the agent through its steps and returns the agent’s final result (or None if nothing is produced).
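
The helper above waits for the agent however long it takes. If a time bound is wanted, one possible approach is to wrap the coroutine in asyncio.wait_for. The sketch below is an addition rather than part of the original code; run_with_timeout is a hypothetical name, and fake_agent is a stand-in for agent_loop so the example runs without a browser or an API key.

```python
import asyncio

async def run_with_timeout(coro, seconds):
    """Await coro, returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return None

async def fake_agent(delay, answer):
    """Stand-in for agent_loop(): sleeps, then returns a canned answer."""
    await asyncio.sleep(delay)
    return answer

async def demo():
    # The first call finishes well inside its limit; the second is cancelled.
    fast = await run_with_timeout(fake_agent(0.01, "done"), seconds=1.0)
    slow = await run_with_timeout(fake_agent(1.0, "too late"), seconds=0.05)
    return fast, slow

fast_result, slow_result = asyncio.run(demo())
print(fast_result, slow_result)
```

Returning None on timeout mirrors the “nothing produced” case of agent_loop, so a caller could treat both the same way.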

async def main():
    raw_key = getpass("Enter your GEMINI_API_KEY: ")
    os.environ["GEMINI_API_KEY"] = raw_key
    api_key = SecretStr(raw_key)
    model_name = "gemini-2.5-flash-preview-04-17"
    llm = ChatGoogleGenerativeAI(model=model_name, api_key=api_key)

    browser, context = await setup_browser(headless=True)
    try:
        while True:
            query = input("\nEnter prompt (or leave blank to exit): ").strip()
            if not query:
                break
            url = input("Optional URL to open first (or blank to skip): ").strip() or None
            print("\n🤖 Running agent…")
            answer = await agent_loop(llm, context, query, initial_url=url)
            print("\n📊 Search Results\n" + "-" * 40)
            print(answer or "No results found")
            print("-" * 40)
    finally:
        print("Closing browser…")
        await browser.close()

# Top-level await works in Colab/Jupyter cells, where an event loop is already running.
await main()

Finally, this main coroutine drives the entire Colab session: it securely prompts for your Gemini API key (using getpass and SecretStr), sets up the ChatGoogleGenerativeAI LLM and a headless Playwright browser context, then enters an interactive loop where it reads your natural‑language prompts (and optional start URL), invokes agent_loop to execute the browser‑driven AI task, prints the results, and finally ensures the browser closes cleanly.
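
The main coroutine always prompts for the key. A possible variation, sketched here as an assumption rather than the tutorial's own method, is to check the environment first and fall back to getpass only when nothing is set. This would also pick up a key that python-dotenv's load_dotenv() had placed into os.environ from a .env file. The helper name get_api_key, the DEMO_API_KEY variable, and the placeholder values are all hypothetical.

```python
import os
from getpass import getpass

def get_api_key(env_var="GEMINI_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting only if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = prompt(f"Enter your {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} was not provided")
    return key

# Demonstration with a throwaway variable and stubbed prompts (no real key involved).
os.environ["DEMO_API_KEY"] = "placeholder-from-env"
from_env = get_api_key("DEMO_API_KEY", prompt=lambda _: "should-not-be-used")

del os.environ["DEMO_API_KEY"]
from_prompt = get_api_key("DEMO_API_KEY", prompt=lambda _: "placeholder-from-prompt")
print(from_env, from_prompt)
```

Passing the prompt function as a parameter keeps the helper testable without interactive input.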

In conclusion, by following this guide, you now have a reproducible Colab template that integrates browser automation, LLM reasoning, and secure credential management into a single cohesive pipeline. Whether you’re scraping real‑time market data, summarizing news articles, or automating reporting tasks, the combination of Playwright, browser_use, and LangChain’s Gemini interface provides a flexible foundation for your next AI‑powered project. Feel free to extend the agent’s capabilities, re‑enable GIF recording, add custom navigation steps, or swap in different LLM backends to tailor the workflow precisely to your research or production needs.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.
