Konstantin Lopukhin

Head of Data Science, Zyte
Talk title:
Web Data Extraction with Deep Learning


Extracting data from websites is a topic that doesn't get as much attention or research as things like identifying objects in pictures or recognizing names in text. 

However, it's a fascinating and rewarding area to delve into, because a website can be viewed in many different forms: as a snapshot of a page, as the text on the page, as the underlying HTML code, and more. This opens up a variety of creative methods for tackling the problem, often combining different types of data and different representations of that data within a single model. The recent emergence of large language models has introduced an entirely new approach to this task.
In this talk, you will:

• Learn how web data extraction can be framed as a machine learning problem, and what kinds of input (screenshots, text, HTML) a model can use.

• Discover how ChatGPT, a chatbot built on a large language model, can be used for this purpose, and understand its limitations.

• See how a sophisticated model can evolve from simple beginnings, pick up handy techniques from related fields and studies, and review several modern methods for tackling this problem, such as transformer-based architectures.

Speaker Bio

Konstantin is the Head of Data Science at Zyte, where he leads Machine Learning research and development. He has also participated in Kaggle competitions, earning the Grandmaster title, and contributes to the community by giving talks and sharing code and knowledge.