JISC UK Web Domain Dataset Geoindex. 1996 – 2010. TSV.

The dataset comprises ~2.5 billion 200 OK responses in the 1996 – 2010 tranche of the JISC UK Web Domain Dataset that have been scanned for geographic references, specifically postcodes. This set of postcode citations, found at particular URLs and crawled at particular times, forms an historical geoindex of the UK web. In partnership with the Internet Archive and JISC, the UK Web Archive (UKWA) obtained access to the subset of the Internet Archive’s web collection that relates to the UK. The JISC UK Web Domain Dataset (1996 – 2013) contains all of the resources from the Internet Archive that were hosted on domains ending in ‘.uk’, or that are required in order to render those UK pages. The geoindex dataset is composed of 700,641,549 lines of Tab Separated Values (TSV) data, each asserting that a given web page, crawled at a given date, contained one or more references to a given postcode. Uncompressed, this is a total of 61 GB of text, so care should be taken before downloading or attempting to use this dataset. For more details about how the data was created, its format, and how to use it, see https://github.com/ukwa/opendata/tree/master/ukwa.ds.2/geo. For more information: http://data.webarchive.org.uk/opendata/ukwa.ds.2/geo/
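
Because the uncompressed data is around 61 GB, it is usually safer to stream it line by line than to load it into memory. The following is a minimal sketch of that approach in Python, not the project's own tooling. The column position of the postcode (`POSTCODE_COLUMN`), the function name `count_postcodes`, and the sample rows are all assumptions made for illustration; the actual column layout is described in the GitHub repository linked above and should be checked before use.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: the postcode is taken to be the third tab-separated field.
# This index is hypothetical; confirm the real layout in the format notes.
POSTCODE_COLUMN = 2


def count_postcodes(lines, postcode_column=POSTCODE_COLUMN):
    """Count postcode occurrences from an iterable of TSV lines.

    Rows are read one at a time, so memory use depends on the number of
    distinct postcodes rather than on the size of the input.
    """
    counts = Counter()
    # QUOTE_NONE stops quote characters inside URLs from being treated
    # as field delimiters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= postcode_column:
            continue  # skip blank or truncated lines
        counts[row[postcode_column]] += 1
    return counts


# Illustrative rows only (crawl date, URL, postcode); not taken from the dataset.
sample = io.StringIO(
    "20050101120000\thttp://example.co.uk/\tSW1A 1AA\n"
    "20060101120000\thttp://example.co.uk/contact\tSW1A 1AA\n"
    "20070101120000\thttp://example.org.uk/\tNW1 2DB\n"
)
postcode_counts = count_postcodes(sample)
```

For a real file, the same function can be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` is a placeholder for wherever the data has been downloaded.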

Additional information

UniqueID

503d1871-0c37-4584-ab10-95a344bd0395

BL Dataset Provider

User Access Level

BL Labs Assistance

Contributors

Jackson, Andrew N.

Institution

UK Web Archive

Language

Year Added

Contact Person

British Library Labs

Location

Repository Cloud

Official URL

https://doi.org/10.5259/ukwa.ds.2/geo/1

Is It Being Updated

Any Issues With Access

No

T&C Needed

Rights Assessment
