Parsing, Cleaning, and Normalizing Extracted Data
Practical techniques for handling messy real-world HTML. Structuring, deduplicating, and normalizing at scale for AI ingestion.
Workshops are run by Zyte on the first day of each summit. Whilst we make every effort not to deviate, these workshops may change.
60-minute workshops with time for Q&A, plus lightning talks
How to make headless browsers work for your agentic scraping needs, plus tools and techniques to increase their effectiveness and efficiency.
How detection systems work, how scrapers evade them, and what AI changes about that equation for both sides.
How to effectively self-heal your spiders. Selector failure detection, LLM-driven repair loops, and writing updated parsers back to a tested store.
Agentic crawl patterns, tool-use architectures, and the open questions about autonomous data collection.
It's on you now! 5 minutes to share something you've been working on.
The world's largest web scraping conference.