Check out the 2020 Videos.  It was fun.

Below are all the recordings from the day.  Sign up and get full access!

Hidden Scrapy Features That You Need To Know
Mikhail unlocks his secret treasure chest of Scrapy secrets and shares them with the masses.  Tune in and learn all about the cool features that you never knew existed.
Presented by Mikhail Korobov - Head of Development (Automatic Extraction)

Bio

For the last 7 years, Mikhail has been developing smart web crawlers. He is a Scrapy and NLTK team member and an open-source enthusiast.
Panel: Legal Compliance In The World Of Web Scraping
The lack of clear legal guidance in the web scraping industry means you have to be extra cautious about how you scrape and what type of data you scrape.
In this panel, Sanaea Daruwalla, Head of Legal at Zyte, brings together legal experts in the field of data extraction to discuss the various aspects of web scraping compliance and updates in the legal landscape.

Host: Sanaea Daruwalla - Head of Legal at Zyte
Panelists: Sarah McKenna - CEO of Sequentum
                  Marc Zwillinger - Founder of ZwillGen PLLC
                  Paul Griffin - CEO of First Compliance
Running a Business on Web Scraped Data
Every day we hear sentences like "Data is the new oil" or "Web data is a gold mine" and that's definitely true. 
In this talk, we will see how establishing a business based on web scraped data has much in common with traditional mining companies. Pierluigi will cover the processes, tasks, operators and tools needed to run a reliable and modern company. Find out how web scraping can help avoid the many obstacles faced while running a successful business.

Presented by Pierluigi Vinciguerra - CTO and Co-Founder of Re Analytics

Bio

Pierluigi Vinciguerra is the CTO and Co-Founder of Re Analytics, a data boutique for consumer and luxury goods. With 10+ years of experience in business intelligence, web data integration and scraping, Pierluigi is an expert in data management. The team at Re Analytics crawls 1+ billion price points every month to extract valuable insights for investors and C-level executives in consumer and luxury goods.
Utilizing The Scrapy Cloud API For A Seamless Data Pipeline
In this talk, Johnel explains how the team at Prosple used the Scrapy Cloud API to create an automated process from initial scraping to clean data. Basically, automating the E and T in the ETL process!

Presented by Johnel Bacani, Data Specialist at Prosple

Bio

Johnel Bacani is a Data Specialist at Prosple. He designs and manages various data pipelines. He loves Python and he's been using it since 2013 for both work and play.
Web Scraping Tech Stack For 2020
As the web evolved from static sites to complex JavaScript applications, the techniques and tools needed to scrape it have changed too. From plain HTTP requests to robotized browsers - this talk will show you all the tricks you need to extract data from the modern web reliably and scalably.

Presented by Ondra Urban, Technical Web Scraping Expert at Apify

Bio

Ondra is a hacker of the browser age. He extracts terabytes of publicly available data and translates them to a language that machines can understand. At Apify, Ondra leads a team of fellow hackers who grow their open source projects, break anti-scraping walls, and dabble in AI.
TellFinder Alliance: Tackling Online Exploitation with Data
Last year at Extract Summit, Amanda Towler and David Schroh talked about their five-year project to build a bleeding-edge data collection and extraction pipeline to fight human trafficking.
This year, Amanda would like to expand on this topic to discuss how the team pivots their data pipeline to tackle a broader array of online exploitation, how having a solid foundation makes this a tractable, efficient process, and the impacts they can have on the world.

Presented by Amanda Towler, Co-Founder & Principal Investigator at Hyperion Gray, LLC

Bio

Amanda is the Co-Founder and Principal Investigator at Hyperion Gray, LLC, a technology R&D small business working primarily with the Defense Advanced Research Projects Agency (DARPA). 
She has a decade of experience spanning OSINT, offensive security, data science, and software development. She has consulted with law enforcement on several high-profile dark web child exploitation cases.
Introducing AutoCrawl - The AI-Powered Crawler
AI is disrupting the ecosystem, altering every single process with new machine-learning powered approaches.
In this talk, Iván will show how this impacts the world of data crawling by introducing AutoCrawl, an AI-powered crawler capable of gathering data from websites automatically.

Presented by Iván de Prado Alonso, Data Scientist at Zyte

Bio

Iván is a Data Scientist at Zyte who loves Deep Learning and Computer Vision. He has 10+ years of experience working for and with startups, dealing with the greatest technical challenges at each.
Separating Extraction From Crawling Logic With Web-Poet
What are the Web-poet and Scrapy-poet projects? How do they work and how could they be helpful? In this talk, Victor will take you through the state of development, the foreseeable future, and their relationship to the AutoExtract and AutoCrawl projects.

Presented by Victor Torres, Web Scraping, Python and Scrapy Guru

Bio

Victor Torres is a full-stack developer with 5+ years of experience leading agile teams and building web applications. He currently works with Python and web scraping at Zyte.
Panel: Cutting Edge Ways To Tackle Antibot Challenges
Extracting web data at scale can provide huge value. But with scaling up, there is often an obstacle standing between you and the data, preventing easy access: antibots.
Antibots introduce a big and important challenge to solve for anyone who wants to scrape the web at scale. If you don’t have a reliable way to solve these challenges created by antibots, you will not be able to access any data.
In this session, the panel of antibot experts at Zyte will aim to dissect this problem and look at the possible solutions.

Host: Attila Toth, Technology Evangelist at Zyte
Panelists: Akshay Philar, Head of Development at Zyte
                  Tomas Rinke, Team Lead at Zyte
                  Peng-Yu Chen, Developer at Zyte
Overcoming Price Variations On The Day: In Search of Real-time Pricing
Offering a platform that can deal with B2C and B2B simultaneously is a challenge. The ever-changing and volatile market in Latin America presents even more challenges with the products going through price variations multiple times a day. This makes real-time information processing extremely difficult. 
To overcome these challenges and be able to reach both B2B and B2C markets, Alfonso will show how Prix.tips has implemented technological improvements and developed various solutions that integrate databases, scraping, machine learning and networking.

Presented by Alfonso de la Guarda, CTO of Aputek and Technology Architect at Veo365.com and Prix.tips

Bio

Alfonso de la Guarda, the CTO of Aputek and a Technology Architect at Veo365.com and Prix.tips, is an old-school hacker. He collaborates on and oversees projects in strategic areas such as mining, defense and health.
DataOps and The Culture You Need If You Want To Stay Sane
Data-driven companies are in their nascent stage and most of them lack proper culture and methodology. A modern business consists of constant data updates with multidisciplinary teams of engineers, scientists and business people all working together. The customers demand complex answers within hours or minutes. The growing demands of an expanding business require a streamlined and scalable culture in order to stay sane.

José will explain how DataOps is the perfect solution for a data company like this. DataOps borrows the automation spirit of DevOps and applies it to data delivery. In this talk he will share the lessons learned during 3 years of constant evolution and improvements in our software, data processes and culture.

Presented by José Manuel Navarro, CTO of urbanData Analytics

Bio

José Manuel is the CTO of urbanData Analytics. He’s leading all technical teams to make the most of each individual, developing the data-driven culture and coding all kinds of software. Prior to joining uDA, he was the global Lead for Mobile & API Products at Liferay.
Record Matching & Classification of your Web Data with AI
Wes Shepherd is a CEO and serial tech entrepreneur with twenty plus years of SaaS, E-commerce & Data Services experience. He founded Channel IQ, the leading provider of e-commerce intelligence to brands and has a proven history of building an eight-figure revenue software business.
Presented by Wesley Shepherd, CEO of Unifyd Insights
Trials & Tribulations in Crawling the Darkweb
The dark web is a part of the web where most people have the sense never to venture. It is populated with a wide range of criminal actors, who use the technology to gain anonymity and a degree of impunity for the sale of their wares and services. Searchlight provides tools for law enforcement and private entities looking to investigate and track criminal activity. Gareth and David will take you through some of our amusing journeys playing cat and mouse games with criminal kingpins over the years as they attempt to build clever defeat mechanisms for crawlers and other data collection capabilities.


Presented by Gareth Owenson & David Andreas, Searchlight Security

Bio

Dr Owenson is a leading expert on darknets with a 15-year track record in academia conducting cutting-edge research in darknets, cryptography and distributed systems. He has conducted some of the most influential studies into the use of darknets and now regularly works with Governments and Law Enforcement to develop technical capabilities.

David Andreas works as lead developer for the data engineering team at Searchlight Security. He has extensive experience in reverse engineering, malware analysis and exploit development, but also enjoys the challenges of bypassing darknet anti-crawl measures.

Web Scraping for Real-Time Surveillance of Food Security Policies During COVID-19
At the start of the pandemic, it quickly became clear that public health professionals and policymakers would have to address the threat of growing hunger and food insecurity. Understanding the range of policy options that were available to federal and state officials to increase access to food through the Supplemental Nutrition Assistance Program was critical.

Presented by Nicky Tettamanti, covidsnap.org

Bio

Nicky Tettamanti is an epidemiology graduate student at Columbia University Mailman School of Public Health. She is interested in how the social world determines health outcomes. Specifically, she studies the relationship between gender, sexuality, occupation, and legal structures with different health outcomes. She is the editor-in-chief of Intervene Upstream, a public health publication for graduate students, and was a team lead for the COVID SNAP Project.
How Intelligence Services Use Open-Source Data and Why Data Fusion is Key to Their Success

Presented by Michael McCracken (Senior Intelligence Analyst)

Web Data Extraction Summit is organised by web scraping experts, Zyte.
Zyte delivers world class web data extraction products and services.
© Web Data Extraction Summit 2021
