Articles In Category:
I'd like to share my favorite articles from a year of professional web scraping blogging at ScrapFly, along with my key takeaways.
Asynchronous programming can speed up web scrapers astronomically and is by far the most important scaling step when dealing with big projects. What exactly is it, how does it work, and what are the best ways to take advantage of it?
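To illustrate where that speedup comes from, here's a minimal sketch using only Python's standard library. The `fetch` coroutine and the URLs are hypothetical stand-ins (a real scraper would use an HTTP client); the point is that concurrent requests take roughly as long as the single slowest one, not the sum of all of them.

```python
import asyncio
import time

async def fetch(url: str) -> str:
    # Hypothetical fetch: simulates ~0.2s of network latency per request.
    await asyncio.sleep(0.2)
    return f"<html>content of {url}</html>"

async def scrape_all(urls: list[str]) -> list[str]:
    # All requests run concurrently on one thread via the event loop.
    return await asyncio.gather(*(fetch(url) for url in urls))

urls = [f"https://example.com/page/{i}" for i in range(10)]
start = time.perf_counter()
pages = asyncio.run(scrape_all(urls))
elapsed = time.perf_counter() - start
print(f"scraped {len(pages)} pages in {elapsed:.2f}s")  # ~0.2s, not ~2s sequentially
```

Since scrapers spend most of their time waiting on the network, this kind of concurrency is usually the single biggest scaling win.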
To efficiently scrape a web resource, understanding how it works and functions is often a vital step. Reverse engineering a website's behavior is often the first step when developing a web scraper - let's take a look at how!
Target discovery in web scraping is how a scraper explores the target website to find scraping targets. For example, to scrape product data from an e-commerce website, we first need to find the URLs of each individual product. This step is called "discovery". What types of discovery methods are there?
The most common web scraping target discovery technique: recursive crawling. How does it work? What are its pros and cons, and what are the most effective execution patterns?
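The core of recursive crawling can be sketched in a few lines. This is a hedged, self-contained example: the `SITE` link graph is a hypothetical stand-in for fetching a page and extracting its links, and it uses a breadth-first queue with a visited set so cyclic links don't cause infinite loops.

```python
from collections import deque

# Hypothetical site map: each URL maps to the links found on that page.
# A real crawler would fetch the page and parse <a href> links instead.
SITE = {
    "/": ["/category/a", "/category/b"],
    "/category/a": ["/product/1", "/product/2", "/"],  # note the cycle back to "/"
    "/category/b": ["/product/2", "/product/3"],
    "/product/1": [], "/product/2": [], "/product/3": [],
}

def crawl(start: str) -> list[str]:
    # Breadth-first crawl: follow every discovered link exactly once.
    seen, queue, found = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        found.append(url)
        for link in SITE.get(url, []):
            if link not in seen:   # the visited set prevents re-crawling
                seen.add(link)
                queue.append(link)
    return found

print(crawl("/"))  # visits all 6 pages, each exactly once
```

Swapping the queue for a stack would make this depth-first; either way, the visited set is what keeps recursive crawling tractable on real, link-cycle-heavy websites.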