Québec magazines and newspapers

“Part of the digital collection of Bibliothèque et Archives nationales du Québec (BAnQ), the subset “Revues et journaux québécois” is particularly rich. From BAnQ’s patrimonial collections, these magazines and newspapers bear witness to the daily, cultural, political, economic and scientific life of Quebec.” (Description taken from the data provider website)

Visit the data provider website for more information.

General Information

Added to the Research Data: August 2017

Last update: 2024

Update periodicity: No update plans at the moment

Available formats: JPG, PDF, TXT, TIF, XLS

Documents’ assets available? No

Documents’ metadata available? No metadata available

Data size: 18.4 TB

Number of files: 4,627,040

Copyright: The data and all related documentation are subject to copyright. Please consult the data provider website for more details.

Data Subject Area: Various subjects. Concerns mainly the geographical area of Quebec

The Data in Graphs

../_images/overview_banq.png

All content available in the “BAnQ - Revues et journaux québécois” dataset is in proprietary formats, with no metadata available.

Available formats in “BAnQ - Revues et journaux québécois”

../_images/document_type_banq.png

The Data Structure

The BAnQ data is not structured in a fully homogeneous way: the layout varies from one journal/newspaper to another and, for a given journal/newspaper, may also vary over time. Despite this variation, the content of this dataset should be organized by journal/newspaper, then by year, month and issue. The issue level is the most variable, ranging from one file per issue to one file per page.
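Because the layout differs between titles, it can help to survey a local copy before processing it. The following is a minimal sketch, assuming only that the first folder level under the dataset root corresponds to a journal/newspaper; the function name `count_files_by_extension` and the root path are hypothetical, and deeper levels (year, month, issue) are deliberately not interpreted.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(root):
    """Count files under `root`, grouped by top-level folder and extension.

    Assumption: each top-level folder is one journal/newspaper. The levels
    below it are walked recursively without assuming a fixed depth.
    """
    root = Path(root)
    counts = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Files sitting directly in the root belong to no journal folder.
        if len(relative.parts) < 2:
            continue
        journal = relative.parts[0]
        extension = path.suffix.lower() or "(none)"
        counts.setdefault(journal, Counter())[extension] += 1
    return counts
```

Calling `count_files_by_extension("/path/to/dataset")` (a placeholder path) returns one `Counter` per top-level folder, which gives a quick indication of whether a title is stored as PDFs, page images, or a mix.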

PDF Availability

Some documents are available in PDF format. The other files are images (mostly .jpg).

Full-text availability

Files in .TXT format, obtained by optical character recognition (OCR), are available for some magazines/newspapers, allowing a full-text search.

There is a folder called “OCR_corpus_data” in the dataset, which contains .tsv files with the full text of the documents. Not all documents have corresponding full text in the .tsv files.

The .tsv files have the following structure:

  • Column 1: file - the path to the file in the dataset

  • Column 2: page - the page number within the document

  • Column 3: text - the full text of the page
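The three-column layout lends itself to a simple full-text search. The sketch below is illustrative only and rests on several assumptions that are not confirmed by the data provider: that the files are UTF-8 encoded, tab-delimited without quoting, and start with a header row (pass `has_header=False` otherwise). The function name `search_ocr_tsv` is hypothetical.

```python
import csv

# OCR pages can be long; raise the csv module's per-field size limit.
csv.field_size_limit(10_000_000)


def search_ocr_tsv(tsv_path, term, has_header=True):
    """Yield (file, page) for each row whose text column contains `term`.

    Matching is case-insensitive. Rows with fewer than three columns
    are skipped.
    """
    needle = term.casefold()
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters in OCR text as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # assumption: first row holds column names
        for row in reader:
            if len(row) < 3:
                continue
            file_path, page, text = row[0], row[1], row[2]
            if needle in text.casefold():
                yield file_path, page
```

Each hit is a `(file, page)` pair, so the `file` value can be used to locate the corresponding image or PDF in the dataset. Since OCR output often contains recognition errors, exact substring matching will miss some occurrences.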

Metadata Availability

There is no metadata available for the “BAnQ - Revues et journaux québécois” dataset.

Document’s Bibliographic References

There are no references available for the “BAnQ - Revues et journaux québécois” dataset.