Web Crawling and Screen Scraping
As the importance and value of big data continue to rise, so does the number of companies using web crawling services (or “spiders”) to obtain such data. Companies use spiders to screen scrape websites: the spider copies or extracts information from the site, which the company then analyses or publishes on its own website.
This practice is understandably divisive. The website owners whose sites are scraped do not want their content to be taken and used without their consent, whilst the companies undertaking the scraping argue that they should be free to make use of information which is already in the public domain.
The best examples of screen scraping are price comparison sites, such as airline flight comparison sites. The comparison site uses a spider to scan the websites of the different airlines. The data scraped from those websites is then compiled on the comparison site, providing consumers with a very handy tool.
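The compilation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the airline names, the `fare` class and the `data-route` attribute are hypothetical markup invented for the example, and the pages are hard-coded strings rather than downloads from real sites.

```python
from html.parser import HTMLParser


class FareParser(HTMLParser):
    """Collects (route, price) pairs from elements marked up as
    <span class="fare" data-route="...">123.45</span> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.fares = []
        self._route = None  # route of the fare element currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "fare":
            self._route = attrs.get("data-route")

    def handle_data(self, data):
        # Only text inside an open fare element is treated as a price.
        if self._route is not None and data.strip():
            self.fares.append((self._route, float(data.strip())))

    def handle_endtag(self, tag):
        if tag == "span":
            self._route = None


def scrape_fares(html):
    """Extract every (route, price) pair from one page of HTML."""
    parser = FareParser()
    parser.feed(html)
    return parser.fares


def cheapest_by_route(pages):
    """pages maps airline name -> HTML; returns route -> (airline, price)."""
    best = {}
    for airline, html in pages.items():
        for route, price in scrape_fares(html):
            if route not in best or price < best[route][1]:
                best[route] = (airline, price)
    return best


# Stand-in pages; a real spider would fetch these from each airline's website.
PAGES = {
    "AirOne": '<div><span class="fare" data-route="LHR-DUB">49.99</span></div>',
    "AirTwo": '<div><span class="fare" data-route="LHR-DUB">42.50</span>'
              '<span class="fare" data-route="LHR-AMS">61.00</span></div>',
}

if __name__ == "__main__":
    for route, (airline, price) in sorted(cheapest_by_route(PAGES).items()):
        print(f"{route}: {airline} at {price:.2f}")
```

The sketch deliberately leaves out the download step, since whether a given site may be fetched and its data re-used is exactly the legal question this article is concerned with.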
Legal Status of Screen Scraping
The legal status of scraping is not straightforward. Whilst there is no specific law prohibiting scraping, in recent years two prominent cases have delivered differing verdicts on the matter. In both cases the decision as to whether screen scraping was against the law hinged on: (i) whether intellectual property rights subsisted in the data which was mined, (ii) whether the scraping infringed those rights and (iii) whether the website owner could limit the re-use of the data through its T&Cs.
In Ryanair Ltd v PR Aviation BV [2015] the Court of Justice of the European Union (“CJEU”) dealt with scraped data (Ryanair’s database of flight times and prices) in which no intellectual property rights were taken to subsist, so the company scraping the data had not infringed Ryanair’s IP. This was because the database was not the result of the creative input required for copyright protection.
The CJEU made it clear, however, that a website owner can restrict the re-use of the mined data through its terms and conditions. Companies should therefore bear in mind that if they access a website and consent to terms of use which restrict the re-use of the website’s data, they may be liable for breach of contract if they go on to re-use that data.
The UK courts took a different approach from the CJEU in the NLA v Meltwater litigation, which reached the Supreme Court in 2013. There it was held that Meltwater’s use of news headlines, which it had scraped from news websites and used as links to the relevant news articles, was enough to amount to copyright infringement because, unlike the database of flight times, the news headlines did require a certain amount of creative input. Meltwater subsequently went on to obtain the express consent of the NLA to mitigate its losses.
As explained above, there is no specific law against scraping or against using publicly available information which has been obtained through scraping. However, the owner of the website may have a claim against the user if the scraping and subsequent use of the information infringes the website owner’s intellectual property rights, or if the user is in breach of any terms and conditions of website use.
Most Common IP Rights
The most common IP rights which may be held to subsist in such information are copyright and database rights. Copyright protection is afforded to original works and is intended to prevent copying. As the two cases above demonstrate, whether the scraped content is protected by copyright will depend on the facts of the case: to what extent is the data the result of creative input and therefore protected by copyright, and how much is being copied?
For example, if significant portions (“a substantial copy”) of text from a blog (i.e. creative material) are being scraped then this may well amount to copyright infringement.
Furthermore, it is possible for website owners to prohibit companies from scraping information from their sites through the use of contractual restrictions. If the user agrees to a website’s terms of use which include a restriction on scraping and using the publicly available information on the website, but goes ahead and scrapes and uses that information anyway, the website owner may be able to claim against the user for breach of contract.
Another consideration when screen scraping is data protection. If the information being gathered contains personal data (which may be something less obvious than a name and address, such as a username or email address), the user will need to ensure that they are compliant with data protection legislation.
Currently the relevant data protection law in the UK is the Data Protection Act 1998 (“DPA”). The new European General Data Protection Regulation (“GDPR”) applies from 25 May 2018 and, despite Brexit, it will become law in the UK. The GDPR is far more extensive than the DPA and the penalties for non-compliance are far greater.
Under the GDPR, personal data may only be processed on a lawful basis, the most obvious of which is the consent of the individual to whom the data relates. That consent must be freely given, specific to a particular purpose, informed and unambiguous. It is hard to find examples of how scraping data which includes personal data, without the individual’s consent, could fall within the law.
Another consideration is the management of the mined data. Managing big data is an increasingly common discussion point given the significant increase in its use and value. The Information Commissioner’s Office has published some guidance for organisations that handle big data:
- Anonymisation – although it may not be possible to fully anonymise data, companies should try to mitigate the risk of re-identification to the point where the chance is extremely remote.
- Privacy impact assessments – companies should carry out such assessments before processing data to assess how their use of the data is likely to impact on the individuals whose data is being analysed.
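As a rough illustration of reducing the risk of re-identification in scraped records, the Python sketch below replaces usernames with keyed hashes and strips email addresses from free text. The record fields (`author`, `text`), the key and the helper names are all hypothetical. Note that keyed hashing is pseudonymisation rather than full anonymisation: anyone holding the key can re-link the records, so this is one mitigation step and not a way of taking the data outside data protection law.

```python
import hashlib
import hmac
import re

# Simple pattern for spotting email addresses in free text (not exhaustive).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-kept-apart-from-the-data"


def pseudonymise(value, key=SECRET_KEY):
    """Replace an identifier with a stable keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


def scrub_record(record, key=SECRET_KEY):
    """Return a copy of a scraped record with direct identifiers reduced."""
    return {
        "author": pseudonymise(record["author"], key),
        "text": EMAIL_RE.sub("[email removed]", record["text"]),
    }


if __name__ == "__main__":
    scraped = {"author": "jane_doe", "text": "Contact me at jane@example.com please"}
    print(scrub_record(scraped))
```

Using a keyed hash rather than a plain hash means that someone without the key cannot confirm a guess simply by hashing a known username and comparing the result.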
Web Crawler Summary
In summary, there is no specific piece of legislation which restricts the use of a web crawler to gather information. The website owners, however, may have legal rights against the company doing the scraping under intellectual property law and contract law.
Each case will turn on its own facts, and much depends on what information is scraped from the websites. Companies should beware of contractual provisions which they have agreed to in a website’s terms of use, as these may prohibit the user from taking data from the site and using it.
If the data being scraped includes personal data, then compliance with data protection law must also be borne in mind.
The only way to be truly certain that the rights of a website owner have not been infringed is to obtain their express consent to the screen scraping and subsequent use of the information.