Digitised Books. c. 1510 – c. 1900. JSONL (OCR derived text + metadata)

The dataset comprises metadata and OCR-generated text from 49,455 digitised books published between c. 1510 and c. 1900. The books cover a wide range of subject areas, including philosophy, history, poetry and literature. The dataset is in JSON Lines (JSONL) text format.
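
Because JSON Lines stores one JSON value per line, the records can be read one at a time without loading a whole file into memory. The following is a minimal sketch in Python, not a documented loader for this dataset: the helper name `iter_jsonl` is hypothetical, and it assumes each non-empty line is a single JSON object encoded as text. The field names inside each record are not listed here, so the sketch does not rely on any of them.

```python
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines stream.

    Assumption: every non-empty line holds exactly one JSON object.
    """
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed so a damaged record is easy to locate.
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
```

A typical use would be to open an extracted file with `open(path, encoding="utf-8")` and loop over `iter_jsonl(handle)`, inspecting the keys of the first record to discover the available fields.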

Additional information

UniqueID

7bf6279d-b8b1-45f4-8fe4-a0c06fdba87c

BL Dataset Provider

User Access Level

BL Labs Assistance

Contributors

van Strien, Daniel, and Filipe Bento

Institution

British Library Labs

Language

Year Added

Contact Person

British Library Labs

Location

Repository Cloud

Official URL

https://doi.org/10.23636/r7w6-zy15

Is It Being Updated

Any Issues With Access

No

Files

1510_1699.tar.gz, 1700_1799.tar.gz, 1800_1809.tar.gz, 1810_1819.tar.gz, 1820_1829.tar.gz, 1830_1839.tar.gz, 1840_1849.tar.gz, 1850_1859.tar.gz, 1860_1869.tar.gz, 1870_1879.tar.gz, 1880_1889.tar.gz, 1890_1899.tar.gz, unk.tar.gz
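
The files are gzip-compressed tar archives, so their contents can be streamed with Python's standard `tarfile` module rather than unpacked to disk first. The sketch below is illustrative only. The internal layout of the archives is not described here, so it assumes (without confirmation) that each archive contains plain UTF-8 JSON Lines members whose names end in `.jsonl`; the function name `iter_archive_records` and the `suffix` parameter are hypothetical and would need adjusting if the members are named or compressed differently.

```python
import io
import json
import tarfile
from typing import Iterator


def iter_archive_records(archive_path: str, suffix: str = ".jsonl") -> Iterator[dict]:
    """Yield parsed JSON objects from matching members of a .tar.gz archive.

    Assumption: members ending in `suffix` are uncompressed UTF-8 JSON Lines files.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories, links, and files that do not match the suffix.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decode the member lazily, one line (one record) at a time.
            with io.TextIOWrapper(raw, encoding="utf-8") as text:
                for line in text:
                    if line.strip():
                        yield json.loads(line)
```

For example, `sum(1 for _ in iter_archive_records("1800_1809.tar.gz"))` would count the records in one archive, provided the layout assumption above holds.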

T&C Needed

Rights Assessment
