To Data & Beyond
Building Agents with LangGraph Course #4: Agentic Web Search


Why Your AI Needs Agentic Search for Real-Time Data

Youssef Hosni
Aug 07, 2025

LLMs are trained on vast but static datasets, so they can’t tell you the score of last night’s game, the current weather, or the latest news. To perform relevant, real-world tasks, they need access to live information, and the most common way to get it is through a web search.

However, not all search methods are created equal. The way a human browses the web is fundamentally different from how an AI agent needs to process information. This distinction is at the heart of Agentic Search.

In this guide, we’ll explore the difference between a traditional search-and-scrape approach and a modern agentic search workflow. We’ll use a simple, practical example: asking an AI for the weather in San Francisco and whether it’s a good day to travel.



1. Regular Search and Web Scraping

Let’s first try to solve our weather problem like a basic program might: find a relevant webpage and scrape it for information.

Step 1: Find a Relevant URL

First, we need to perform a web search to find a suitable weather website. We can use a library like duckduckgo-search to submit our query and get a list of URLs.

import requests
from bs4 import BeautifulSoup
from duckduckgo_search import DDGS
import re
import time
from urllib.parse import urlparse

# Define the city and our query
city = "San Francisco"
query = f"current weather {city} site:weather.com"

# Define a search function to get URLs
def search(query, max_results=6):
    """
    Search for URLs using DuckDuckGo

    Args:
        query (str): Search query
        max_results (int): Maximum number of results to return

    Returns:
        list: List of URLs
    """
    try:
        with DDGS() as ddgs:
            results = list(ddgs.text(query, max_results=max_results))

        # Extract URLs and filter for valid ones
        urls = []
        for result in results:
            url = result.get("href") or result.get("link")
            if url and is_valid_url(url):
                urls.append(url)

        return urls

    except Exception as e:
        print(f"Search error occurred: {e}")
        # Fallback URLs
        return get_fallback_urls()

def is_valid_url(url):
    """Check if URL is valid and accessible"""
    try:
        parsed = urlparse(url)
        return bool(parsed.netloc) and bool(parsed.scheme)
    except Exception:
        return False

def get_fallback_urls():
    """Return fallback URLs when search fails"""
    return [
        "https://weather.com/weather/today/l/USCA0987:1:US",
        "https://weather.com/weather/hourbyhour/l/54f9d8baac32496f6b5497b4bf7a277c3e2e6cc5625de69680e6169e7e38e9a8",
        "https://weather.com/weather/tenday/l/USCA0987:1:US"
    ]

def fetch_weather_content(url, timeout=10):
    """
    Fetch and parse weather content from a URL

    Args:
        url (str): URL to fetch
        timeout (int): Request timeout in seconds

    Returns:
        dict: Parsed weather information
    """
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }

        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')

        # Extract weather information (this would need to be customized based on the site structure)
        weather_data = {
            'url': url,
            'title': soup.title.string if soup.title else 'No title',
            'temperature': extract_temperature(soup),
            'conditions': extract_conditions(soup),
            'raw_text': soup.get_text()[:500] + '...' if len(soup.get_text()) > 500 else soup.get_text()
        }

        return weather_data

    except requests.exceptions.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return {'url': url, 'error': str(e)}
    except Exception as e:
        print(f"Error parsing {url}: {e}")
        return {'url': url, 'error': str(e)}

def extract_temperature(soup):
    """Extract temperature from parsed HTML"""
    # Common temperature selectors for weather sites
    temp_selectors = [
        '[data-testid="TemperatureValue"]',
        '.temp',
        '.temperature',
        '.current-temp',
        '[class*="temp"]'
    ]

    for selector in temp_selectors:
        temp_elem = soup.select_one(selector)
        if temp_elem:
            temp_text = temp_elem.get_text().strip()
            # Extract number and degree symbol
            temp_match = re.search(r'(-?\d+)°?[FC]?', temp_text)
            if temp_match:
                return temp_match.group(0)

    return "Temperature not found"

def extract_conditions(soup):
    """Extract weather conditions from parsed HTML"""
    condition_selectors = [
        '[data-testid="WeatherConditionsPhrase"]',
        '.condition',
        '.weather-condition',
        '.current-condition',
        '[class*="condition"]'
    ]

    for selector in condition_selectors:
        condition_elem = soup.select_one(selector)
        if condition_elem:
            return condition_elem.get_text().strip()

    return "Conditions not found"

def should_travel_today(weather_data):
    """
    Simple logic to determine if conditions are good for travel

    Args:
        weather_data (dict): Weather information

    Returns:
        str: Travel recommendation
    """
    conditions = weather_data.get('conditions', '').lower()
    temperature = weather_data.get('temperature', '')

    # Extract numeric temperature
    temp_match = re.search(r'(-?\d+)', temperature)
    temp_num = int(temp_match.group(1)) if temp_match else None

    bad_conditions = ['storm', 'heavy rain', 'snow', 'blizzard', 'tornado', 'hurricane']

    if any(bad in conditions for bad in bad_conditions):
        return "❌ Not recommended - severe weather conditions"
    elif temp_num is not None and (temp_num < 32 or temp_num > 95):
        return f"⚠️ Use caution - extreme temperature ({temperature})"
    elif 'rain' in conditions or 'shower' in conditions:
        return "🌧️ Pack an umbrella - rainy conditions"
    else:
        return "✅ Good conditions for travel"

def main():
    """Main function to run the weather search and analysis"""
    print(f"🔍 Searching for weather information in {city}...")
    print(f"Query: {query}\n")

    # Get URLs
    urls = search(query)
    print(f"Found {len(urls)} URLs:")
    for i, url in enumerate(urls, 1):
        print(f"{i}. {url}")

    print("\n" + "="*60 + "\n")

    # Fetch and analyze weather data
    weather_results = []
    for url in urls[:3]:  # Limit to first 3 URLs
        print(f"📊 Analyzing: {url}")
        weather_data = fetch_weather_content(url)
        weather_results.append(weather_data)

        if 'error' not in weather_data:
            print(f"Temperature: {weather_data.get('temperature', 'N/A')}")
            print(f"Conditions: {weather_data.get('conditions', 'N/A')}")
            print(f"Travel recommendation: {should_travel_today(weather_data)}")
        else:
            print(f"❌ Error: {weather_data['error']}")

        print("-" * 40)
        time.sleep(1)  # Be respectful to the server

    return weather_results

# Run the updated code
if __name__ == "__main__":
    results = main()

🔍 Searching for weather information in San Francisco…
Query: current weather San Francisco site:weather.com

Found 6 URLs:
1. https://zhidao.baidu.com/question/635787571195459284.html
2. https://www.current-news.co.uk/
3. https://zhidao.baidu.com/question/434865041797516004.html
4. https://zhidao.baidu.com/question/2279388287709892628.html
5. https://zhidao.baidu.com/question/572878914313099124.html
6. https://zhidao.baidu.com/question/2022402646620356908.html

Notice that despite the site:weather.com filter, the search returned mostly unrelated URLs — a first hint of how unreliable this pipeline can be. And even with a relevant list, an agent’s job has only just begun: it must visit these pages and find the actual data. Next, we’ll take one of the returned URLs, download its HTML content using requests, and parse it with BeautifulSoup. This process, known as web scraping, extracts information directly from the webpage’s structure.


# Take one of the URLs from our search results
url = "https://www.current-news.co.uk/"

# Scrape the webpage's content
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Let's try to extract the text from key HTML tags
weather_data = []
for tag in soup.find_all(['h1', 'h2', 'h3', 'p']):
    text = tag.get_text(" ", strip=True)
    weather_data.append(text)

# Join the text and clean up whitespace
weather_data = "\n".join(weather_data)
weather_data = re.sub(r'\s+', ' ', weather_data)

print(f"Website: {url}\n")
print(weather_data)

Website: https://www.current-news.co.uk/

Latest Articles SolarEdge partners with Schaeffler for EV infrastructure rollout SMMT: BEV registrations rise in July despite overall car market decline ‘Fastest to date’: char.gy adds 300 chargepoints in Barnet in six weeks Fuse Energy partners with Easee for UK tariff, announces $10 million financing Next steps for UK government EV spending Next steps for UK government EV spending Speaking to EV Infrastructure News, Jon Evans, head of UK and IE market at charging platform Monta, says the government’s £63 million infrastructure investment “hits crucial infrastructure gaps”. Aug 6, 2025 Aug 5, 2025 Featured Article How vehicle-to-grid technology works: MathWorks on managing energy flow Graham Dudgeon, consultant product manager for electrical technology at engineering software provider MathWorks, speaks to EV Infrastructure News about the game-changing technology. Latest News Industry voices Event EV Infrastructure & Energy Summit 1 October — 2 October 2025 / London, UK This Summit is your comprehensive guide to navigating the complexities of EV charging and energy systems essential for driving the EV transition forward. As always, our Summit gathers the world’s leading EV experts in London to share the latest insights and case studies with a diverse audience including charge point operators, installers, manufacturers, fleet owners, local authorities, utilities, DNOs, PV and energy storage suppliers, and destination charging locations. Prepare to be inspired by our carefully curated line-up of speakers offering invaluable insights tailored to the UK market. 1 October — 2 October 2025 / London, UK This Summit is your comprehensive guide to navigating the complexities of EV charging and energy systems essential for driving the EV transition forward. 
As always, our Summit gathers the world’s leading EV experts in London to share the latest insights and case studies with a diverse audience including charge point operators, installers, manufacturers, fleet owners, local authorities, utilities, DNOs, PV and energy storage suppliers, and destination charging locations. Prepare to be inspired by our carefully curated line-up of speakers offering invaluable insights tailored to the UK market. Blogs Aug 5, 2025 Jul 29, 2025 Jul 25, 2025 Webinars Jun 12, 2025 Oct 21, 2024 Mar 12, 2024 Publications Oct 14, 2024 Oct 3, 2023 Oct 1, 2023 Speaking to EV Infrastructure News, Jon Evans, head of UK and IE market at charging platform Monta, says the government’s £63 million infrastructure investment “hits crucial infrastructure gaps”. Graham Dudgeon, consultant product manager for electrical technology at engineering software provider MathWorks, speaks to EV Infrastructure News about the game-changing technology. In this blog, we explore the UK’s booming used EV market and ask if it could be the key to encouraging wider uptake across the EV sector. Industry interviews and insights from the EV Infrastructure News team. Events 1–2 October 2025 | Hilton London Metropole London, UK 2 October 2025 | Hilton London Metropole London, UK Videos Now playing Now playing Copyright © 2025 All rights reserved. Informa Markets, a trading division of Informa PLC.

The output is much cleaner than raw HTML, but it’s still a block of unstructured text — and in this case it isn’t even about the weather. Even on a relevant page, an AI would have to perform complex natural language processing to find the exact temperature, humidity, and wind speed. The process is brittle: if the website changes its layout, our scraper breaks, and the data never arrives in a format a machine can easily and reliably use.
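To make that brittleness concrete, here is a small illustrative check (my own sketch, reusing the permissive pattern from our extract_temperature helper): on the text we actually scraped, a loose number regex happily latches onto an unrelated figure, while a stricter degree-marked pattern finds nothing at all.

```python
import re

# A short excerpt of the unrelated text the scraper actually returned
scraped_text = "'Fastest to date': char.gy adds 300 chargepoints in Barnet in six weeks"

# The permissive pattern from extract_temperature: the degree sign and unit
# are both optional, so any integer qualifies as a "temperature"
loose = re.search(r'(-?\d+)°?[FC]?', scraped_text)
print(loose.group(0))  # "300" - a chargepoint count, not a temperature

# A stricter pattern that requires an explicit degree marker finds nothing,
# because the page contains no weather data at all
strict = re.search(r'(-?\d+)\s?°[FC]', scraped_text)
print(strict)  # None
```

Either way the scraper loses: it hallucinates a value or returns nothing, and nothing in the pipeline tells the agent which failure occurred.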


2. Agentic Search

This is where agentic search tools, like Tavily, change the game. Agentic search is designed specifically for AI. Instead of just returning a list of links, it understands the query, scours multiple sources, and returns a structured, data-first answer.

Let’s ask the exact same query using an agentic search client.
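As a rough, illustrative sketch (not the article’s own code), the tavily-python client is typically used along these lines. The exact response fields (`answer`, `results`, `content`, `score`) are assumptions to verify against the Tavily documentation, and the offline branch mocks the response shape so the parsing step runs without an API key:

```python
# Sketch only: client interface and response fields are assumptions based on
# the tavily-python package; verify against its documentation.
import os

def summarize_results(response: dict) -> str:
    """Turn a Tavily-style structured response into a one-line answer."""
    answer = response.get("answer")
    if answer:
        return answer
    # Fall back to the highest-scoring result's content snippet
    results = sorted(response.get("results", []),
                     key=lambda r: r.get("score", 0), reverse=True)
    return results[0]["content"] if results else "No results"

if os.environ.get("TAVILY_API_KEY"):
    from tavily import TavilyClient  # pip install tavily-python
    client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
    response = client.search("current weather San Francisco",
                             include_answer=True, max_results=3)
else:
    # A mock response with the same shape, so the code runs offline
    response = {
        "answer": "San Francisco is currently 62°F and partly cloudy.",
        "results": [
            {"title": "San Francisco, CA Weather",
             "url": "https://weather.com/weather/today",
             "content": "Partly cloudy, 62°F, wind 12 mph.",
             "score": 0.97},
        ],
    }

print(summarize_results(response))
```

The key contrast with the scraper above: the agent receives named fields it can consume directly, instead of a wall of text it must mine with regexes.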

Keep reading with a 7-day free trial

Subscribe to To Data & Beyond to keep reading this post and get 7 days of free access to the full post archives.

© 2025 Youssef Hosni