Data Scraping - Guidelines from Dutch Data Protection Authority

Recent guidelines from the Dutch Data Protection Authority (“Dutch DPA”) state that, in most situations, it may be challenging for private entities to find a suitable legal basis under the GDPR for general data scraping that involves the collection of personal data.

Data scraping (including web crawling) is a process that automatically extracts large amounts of information from websites or other online sources. It is a common practice in various industries and applications. For instance, scraped data is used to expand sales and marketing databases. Scraped data can also be used for training of artificial intelligence algorithms, which has led to a number of lawsuits over various legal issues, including lawsuits around violation of intellectual property rights.

In terms of privacy, it is important to note that scraped personal data remains subject to the GDPR even when it is publicly available online. Consequently, collecting and using such data requires a valid legal basis, and notice must be provided to the relevant individuals. Typically, companies collecting personal data through data scraping will rely on legitimate interest as the legal basis.

When relying on legitimate interest, the data controller must (i) demonstrate a legitimate interest of the controller or a third party, (ii) verify that collecting the data is necessary to achieve this interest, and (iii) ensure that this interest is not overridden by the data subject’s rights and freedoms. In the context of data scraping, the Dutch DPA has issued the following guidelines:

Only interests protected by law will be accepted as legitimate interests. This narrows the scope of the legal basis compared to previous practices. Additionally, the guidelines have specified that interests that are solely commercial will not override the data subject’s rights and freedoms and will therefore not be considered legitimate.
Several factors must be considered when assessing whether the collection of personal data is necessary, in particular the scope and nature of the processing, including the types of personal data collected, the number of sources from which data is collected, and the duration of the scraping. The broader these parameters, the greater the risk of privacy violations.
The guidance recommends that companies (i) determine whether the data subject or a third party published the personal data; (ii) evaluate the data subject’s expectations; (iii) review the terms of the website from which the personal data is scraped; and (iv) assess the consequences of the data scraping for the data subject.

Legitimate interest may serve as a valid legal basis in limited, specific scenarios, such as scraping news sites for items relevant to the controller’s industry or organization, analyzing feedback posted on the controller’s own website (such as reviews), or collecting data from public sources for the organization’s own security purposes. However, legitimate interest is unlikely to be a valid basis for using scraped personal data to create profiles for sales, scraping non-public social media accounts, groups, or forums, or using social media data to assess the data subject’s eligibility for insurance.

Furthermore, scraping activities must comply with all other GDPR requirements, such as the requirements to provide notice, to limit the personal data processed to what is necessary, and to ensure data accuracy. Using scraped data for AI training purposes introduces further risks, such as discrimination and the use of misleading or incorrect data.

While Europe tends to adopt a restrictive stance on data scraping due to privacy considerations, recent case law in the US points to a more lenient approach to the practice generally, although such US case law does not address privacy concerns.

Companies engaging in data scraping that includes the collection of personal data should, in addition to other considerations (such as intellectual property infringement), carefully consider the legal basis for their activities, as well as the potential risks and consequences for data subjects. When the processing is likely to present a high risk to individuals, as will often be the case with large-scale data scraping and processing, a Data Protection Impact Assessment is required to identify and mitigate potential risks. Additionally, companies should closely monitor developments in this area in all relevant jurisdictions, as regulations and guidelines may continue to evolve and affect their practices.