JISC UK Web Domain Dataset Crawled URL Index. 1996 – 2013. CDX.
The dataset comprises original compound index (CDX) files that have been re-assembled into 18 separate CDX files, one for each year of crawling activity represented (1996–2013). Please note that the individual CDX files are not sorted. To enable access to web archives, UKWA uses CDX files as indexes, making it possible to look up which ARC or WARC files contain which URLs and responses. In partnership with the Internet Archive and JISC, UKWA obtained access to the subset of the Internet Archive’s web collection that relates to the UK. The JISC UK…
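The lookup described above can be sketched in Python. This is a minimal illustration, not UKWA's or the Internet Archive's tooling. It assumes the common 11-field CDX layout declared by a header line such as ` CDX N b a m s k r M S V g`, where `N` is the URL key, `b` the timestamp, `V` the compressed offset and `g` the ARC/WARC file name. The parser reads any header line it finds, so other layouts are picked up too. Keys are matched exactly as they appear in the `N` column, and input URLs are not canonicalized. Because the dataset's CDX files are unsorted, the records are sorted by (key, timestamp) before the binary search. The helper names (`parse_cdx`, `build_index`, `lookup`) and the sample records are hypothetical.

```python
import bisect
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed default: the common 11-field layout " CDX N b a m s k r M S V g".
# N = URL key, b = timestamp, a = original URL, m = MIME type, s = status,
# k = checksum, r = redirect, M = meta tags, S = record size,
# V = compressed offset, g = ARC/WARC file name.
DEFAULT_FIELDS = "N b a m s k r M S V g".split()


def parse_header(line: str) -> Optional[List[str]]:
    """Return the field letters if `line` is a CDX header line, else None."""
    parts = line.split()
    if parts and parts[0] == "CDX":
        return parts[1:]
    return None


def parse_cdx(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CDX lines into dicts keyed by field letter."""
    fields = DEFAULT_FIELDS
    records = []
    for line in lines:
        header = parse_header(line)
        if header is not None:
            fields = header  # a header line overrides the assumed layout
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip blank or malformed lines in this sketch
        records.append(dict(zip(fields, values)))
    return records


def build_index(
    records: List[Dict[str, str]],
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Sort records by (URL key, timestamp); return keys and records in that order."""
    ordered = sorted(records, key=lambda r: (r["N"], r["b"]))
    return [r["N"] for r in ordered], ordered


def lookup(
    keys: List[str], records: List[Dict[str, str]], key: str
) -> List[Tuple[str, int, str]]:
    """Binary-search the sorted keys; return (file, offset, timestamp) per capture."""
    hits = []
    i = bisect.bisect_left(keys, key)
    while i < len(keys) and keys[i] == key:
        r = records[i]
        hits.append((r["g"], int(r["V"]), r["b"]))
        i += 1
    return hits


# Usage with a small hypothetical sample (not real dataset records):
SAMPLE = """\
 CDX N b a m s k r M S V g
uk,co,example)/ 20040101120000 http://www.example.co.uk/ text/html 200 - - - 1234 5678 EXAMPLE-2004.arc.gz
uk,ac,example)/index.html 19990505000000 http://www.example.ac.uk/index.html text/html 200 - - - 900 100 EXAMPLE-1999.arc.gz
uk,co,example)/ 20030101000000 http://www.example.co.uk/ text/html 200 - - - 1100 42 EXAMPLE-2003.arc.gz
"""

records = parse_cdx(SAMPLE.splitlines())
keys, records = build_index(records)
hits = lookup(keys, records, "uk,co,example)/")
print(hits)
# prints: [('EXAMPLE-2003.arc.gz', 42, '20030101000000'), ('EXAMPLE-2004.arc.gz', 5678, '20040101120000')]
```

Sorting in memory only suits small extracts. For full-size yearly files, an external byte-order sort such as `LC_ALL=C sort` is a common way to produce a sorted CDX before binary searching it.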
Additional information
Field | Value |
---|---|
UniqueID | 3c39a755-5e3d-405b-9944-b13e76a87ad8 |
BL Dataset Provider | |
User Access Level | |
BL Labs Assistance | |
Contributors | Jackson, Andrew |
Institution | UKWA Open Data |
Language | |
Contact Person | British Library Labs |
Location | Repository Cloud |
Official URL | |
Is It Being Updated | |
Any Issues With Access | No |
T&C Needed | |
Rights Assessment | |