Web Scraping - A Skill or a Crime?

Abhisek Roy

July 13, 2022

The success of a business is no longer tied only to personnel and workers’ skills. While those are still important, businesses today rely mainly on data. No matter how good your product or service is, you need customer or client data to understand market trends such as:

Desired features
Competitive pricing
Product updates

Quality data has a wide range of use cases. Because web scraping is legal, it should come as no surprise that companies collate data from both internal and external sources. Among external sources, a small percentage of data is collected through surveys, while the rest is scraped from the web. Data scraped from the web is public data that anyone can see and copy. However, if you copy the same data in an automated fashion, a few rules apply.

For instance, you can scrape textual content from websites. But if you scrape so many pages of a single website that the site slows down, you might run into legal problems. The difference is that in the second case, your activity caused availability problems for others using the same website.

Is Data Scraping Legal?

There’s nothing illegal about scraping data, storing the data in databases and then reusing it for business decisions or data analytics and market research. However, if you want to present the same data on your website or application, then you might be bound by different terms and conditions. These conditions would depend on the actual source of data.

For content creators who pick up figures, graphs and images from other articles on the web, a simple attribution might work. In cases where product information or original content has been picked up and a large part of it has been reused, you might need to ask for permission from the website’s owners. Some important factors to keep in mind when scraping data:

The terms and conditions of a website.
Personal data collection (especially from social media websites).
Intellectual property regulations of the region (since not all countries may have the same rules).

It’s also worth mentioning that web scraping is often described as operating in a legal grey area. Nothing could be further from the truth. It is regulated and comes with rules, but as seen in multiple court judgements, it’s not illegal!

Recent Judgements and Their Implications

Data related to individuals is now protected by laws such as the GDPR in the EU and the CCPA in California. Poland’s data protection authority was the first to hand down a decision of this kind, fining a data analytics company approximately 221,000 euros for scraping publicly available data of around 6 million people. While personal data may be publicly available, you need to look at the existing laws and make sure you conform to them. In most cases, this may require sending a notice to each individual whose data will be collected or used, and scraping data only for those who don’t object.
The LinkedIn vs hiQ Labs case is possibly one of the most famous and longest-running web scraping disputes. hiQ Labs is a data analytics company that consumes public data from LinkedIn profiles. The main judgement by the Ninth Circuit court indicated that the concept of accessing a computer “without authorization” under the US Computer Fraud and Abuse Act likely did not apply to publicly accessible websites. It also pointed out that selectively disallowing companies from accessing publicly available data may be considered unfair competition and have implications under antitrust law. This judgement creates a level playing field where all those who scrape data from similar sources are measured by the same yardstick.
One of the earliest web scraping lawsuits was filed by eBay against Bidder’s Edge, an auction listing website. eBay won by arguing that Bidder’s Edge’s frequent scraping of auction data was damaging its systems, and that more scrapers might follow suit and put heavier loads on its servers. However, multiple judges later indicated that the case may not serve as a precedent and that the same argument might not hold up in later cases.

The Dos and the Don’ts

No matter what your end goal is, there are some basic rules and best practices that you can follow when scraping data from the web:

For sites like Twitter where APIs are provided for developers, try to use those instead of scraping the website, even if you need to pay when your API consumption crosses a set limit.
Try to keep a healthy time interval of a few seconds when scraping multiple pages from a single website to avoid overburdening the system.
Make sure you know the rules of the region and have read the terms and conditions of a website before using data scraped from it for commercial purposes.
When scraping data that lies behind a login page, make sure you follow the rules of the concerned website.
Try using the services of a managed DaaS provider like PromptCloud.
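
To make the point about time intervals concrete, here is a minimal Python sketch of a "polite" crawl loop. It is an illustration under stated assumptions, not a definitive implementation: the function names (build_robots_parser, polite_crawl), the "example-bot" user agent and the 3-second default delay are all hypothetical. It also assumes the site publishes its crawling rules in a robots.txt file, a common convention that this article does not otherwise cover and that is not a substitute for reading the site’s terms and conditions. The fetch function is passed in by the caller, so the sketch does not depend on any particular HTTP library.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_crawl(urls, fetch, robots, user_agent="example-bot",
                 default_delay=3.0, sleep=time.sleep):
    """Fetch each allowed URL, pausing between consecutive requests.

    fetch is any callable that takes a URL and returns the page content
    (hypothetical: plug in your own HTTP client here).
    """
    # Prefer the site's own Crawl-delay if it declares one; otherwise
    # fall back to a delay of a few seconds (an assumed default).
    delay = robots.crawl_delay(user_agent) or default_delay

    pages = {}
    for url in urls:
        # Skip anything the site's robots.txt disallows for this user agent.
        if not robots.can_fetch(user_agent, url):
            continue
        # Wait before every request except the first one.
        if pages:
            sleep(delay)
        pages[url] = fetch(url)
    return pages
```

In real use, you would download the site’s robots.txt once, pass its text to build_robots_parser, and supply a fetch function built on the HTTP client of your choice. Because the sleep function is also a parameter, the pacing logic can be checked without actually waiting.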

Web scraping is used not just by companies but also by researchers, students, data scientists and almost anyone who is trying to aggregate data. It is usually a precursor to testing machine learning models and algorithms. If the data isn’t reused in its original format and only the outcomes or trends are used, as is true in most cases, you will rarely face any hurdles. That holds as long as the data was obtained correctly. However, using scraped data commercially may be a different ball game. If it is proved that you are using the scraped data without any changes, in the same way that the original web page showcased it, you may be fined and may have to pay damages to the website that you scraped the data from.