Wikidata: Main Page
Welcome to Wikidata
the free knowledge base with 113,194,393 data items that anyone can edit.
Wikidata is a free and open knowledge base that can be read and edited by both humans and machines.
Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.
Wikidata also provides support to many other sites and services beyond just Wikimedia projects! The content of Wikidata is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web.
Learn about Wikidata
- What is Wikidata? Read the Wikidata introduction.
- Explore Wikidata by looking at a featured showcase item for author Douglas Adams (Q42).
- Get started with Wikidata's SPARQL query service.
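The query service speaks standard SPARQL over HTTP. A minimal sketch of calling it from Python with only the standard library (the endpoint URL is the public one; the example query and the User-Agent string are illustrative):

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

def sparql_select(query: str) -> list[dict]:
    """Run a SPARQL SELECT against the Wikidata endpoint and return bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Wikidata asks clients to send a descriptive User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "wikidata-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

# Occupations of Douglas Adams (Q42); P106 is "occupation".
QUERY = """
SELECT ?occLabel WHERE {
  wd:Q42 wdt:P106 ?occ .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
# for row in sparql_select(QUERY):
#     print(row["occLabel"]["value"])
```

The same query can be prototyped interactively at query.wikidata.org before scripting it.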
Contribute to Wikidata
- Learn to edit Wikidata: follow the tutorials.
- Work with other volunteers on a subject that interests you: join a WikiProject.
- Individuals and organizations can also donate data.
Meet the Wikidata community
- Visit the community portal or attend a Wikidata event.
- Create a user account.
- Talk and ask questions on the Project chat, Telegram groups, or the live IRC chat.
Use data from Wikidata
- Learn how you can retrieve and use data from Wikidata.
- 2024-07-10: Wikidata records its 2,200,000,000th edit.
- 2024-07-10: The Wikidata development team held the Q3 Wikidata+Wikibase office hour on July 10th at 16:00 UTC. They presented their work from the past quarter and discussed what's coming next for Q3. Find the session log here.
- 2024-05-07: Wikidata records its 2³¹st edit; revision IDs no longer fit into a 32-bit signed integer.
- 2024-04-10: The development team at WMDE held the 2024 Q2 Wikidata+Wikibase office hour in the Wikidata Telegram group. You can read the session log.
- 2024-04: Wikidata held the Leveling Up Days, an online event focused on learning more about how to contribute to Wikidata, from the 5th to 7th and 12th to 14th of April.
More news...
New to the wonderful world of data? Develop and improve your data literacy through content designed to get you up to speed and feeling comfortable with the fundamentals in no time.
- Tropical Storm Jongdari (Q129393829)
- Tropical Storm Shanshan (Q129555713)
- Mike Lynch (Q6833839) (pictured)
- Delta State College of Physical Education, Mosogar (Q107478063)
- Rabea Rogge (Q128996418)
- It Ends With Us (Q118641054)
- Christopher J. Morvillo (Q129401605)
Innovative applications and contributions from the Wikidata community
Featured WikiProject: WikiProject Music
WikiProject Music is home to editors who help add data about artists, music releases, tracks, awards, and performances! The project also focuses on importing from, and linking Wikidata with, the many music databases and streaming services. Read about our data model on our project page and come chat with us on Telegram.
- Check out Wikidata:Tools for some of our best tools and gadgets for using and exploring Wikidata.
Know of an interesting project or research conducted using Wikidata? You can nominate content to be featured on the Main page here!
- Wikidata mailing list
- Wikidata technical mailing list
- Discussion requests for specific topics
- Facebook, Mastodon, X/Twitter
- Leave a message at project chat
- Telegram General Chat, Telegram Help, or IRC
- Report a technical problem
- Keep up-to-date: Weekly summaries
Datasets
Various places that have Wikimedia datasets, and tools for working with them.
Also, you can now store table and map data using Commons Datasets, and use them from all wikis via Lua and Graphs.
Dataset Description | Last Updated | URL |
---|---|---|
Official Wikipedia database dumps | Present | |
Parsoid exposes the semantics of content in fully rendered HTML, and is available for various languages and projects (the prefix pattern is the Wikimedia database name, e.g. dewikibooks). Users include VE, Flow, Kiwix and Google. Parsoid also supports the conversion of (possibly modified) HTML back to wikitext without introducing dirty diffs. | Dead | |
Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species | Dead | |
Wikipedia³ is a conversion of the English Wikipedia into RDF. It's a monthly updated dataset containing around 47 million triples | Dead | |
DBpedia: Facts extracted from Wikipedia infoboxes and link structure in RDF format (Auer et al., 2007) | 2019 | |
Multiple data sets (English Wikipedia articles that have been transformed into XML) | Dead | |
This is an alphabetical list of film articles (or sections within articles about films). It includes made for television films | Dead | |
Using the Wikipedia page-to-page link database | Dead | |
Wikipedia: Lists of common misspellings/For machines | Dead | |
Apache Hadoop is a powerful open source software package designed for sophisticated analysis and transformation of both structured and unstructured complex data. | Dead | |
Wikipedia XML Data | 2015 | |
Wikipedia Page Traffic Statistics (up to November 2015) | 2015 | |
Complete Wikipedia edit history (up to January 2008) | 2008 | |
Wikitech-l page counters | 2016 | |
MusicBrainz Database | Dead | |
Datasets of network extracted from User Talk pages | 2011 | |
Wikipedia Statistics | Present | |
List of articles created last month/week/day with most users contributing to article within the same period | Dead | |
Wikipedia Taxonomy automatically generated from the network of categories in Wikipedia (RDF Schema format) (Ponzetto and Strube, 2007a–c; Zirn et al., 2008) | Dead | |
Semantic Wikipedia: A snapshot of Wikipedia automatically annotated with named entity tags (Zaragoza et al., 2007) | Dead | |
Cyc to Wikipedia mappings: 50,000 automatically created mappings from Cyc terms to Wikipedia articles (Medelyan and Legg, 2008) | Dead | |
Topic indexed documents: A set of 20 Computer Science technical reports indexed with Wikipedia articles as topics. 15 teams of 2 senior CS undergraduates have independently assigned topics from Wikipedia to each article (Medelyan et al., 2008) | Dead | |
Wikipedia Page Traffic API | Present | |
Articles published using the tool. Both detailed lists and summary statistics are available. | 2022 | |
Tools to extract data from Wikipedia:
This table might be migrated to the Knowledge Extraction Wikipedia Article
Tool | Description | Last Updated | URL |
---|---|---|---|
Wikilytics | Extracting the dumps into a NoSQL database | 2017 | |
Wikipedia2text | Extracting Text from Wikipedia | 2008 | |
Traffic Statistics | Wikipedia article traffic statistics | Dead | |
Wikipedia to Plain text | Generating a Plain Text Corpus from Wikipedia | 2009 | |
DBpedia Extraction Framework | The DBpedia software that produces RDF data from over 90 language editions of Wikipedia and Wiktionary (highly configurable for other MediaWikis also). | 2019 | |
Wikiteam | Tools for archiving wikis including Wikipedia | 2019 | |
History Flow | History flow is a tool for visualizing dynamic, evolving documents and the interactions of multiple collaborating authors | Dead | |
WikiXRay | This tool includes a set of Python and GNU R scripts to obtain statistics, graphics and quantitative results for any Wikipedia language version | 2012 | |
StatMediaWiki | StatMediaWiki is a project that aims to create a tool to collect and aggregate information available in a MediaWiki installation. Results are static HTML pages including tables and graphics that can help to analyze the wiki status and development, or a CSV file for custom processing. | Dead | |
Java Wikipedia Library (JWPL) | An open-source, Java-based application programming interface that allows access to all information contained in Wikipedia | 2016 | |
Wikokit | Wiktionary parser and visual interface | 2019 | |
wiki-network | Python scripts for parsing Wikipedia dumps with different goals | 2012 | |
Pywikipediabot | Python Wikipedia robot framework | 2019 | |
WikiRelate | API for computing semantic relatedness using Wikipedia (Strube and Ponzetto, 2006) | 2006 | |
WikiPrep | A Perl tool for preprocessing Wikipedia XML dumps (Gabrilovich and Markovitch, 2007) | 2014 | |
W.H.A.T. Wikipedia Hybrid Analysis Tool | An analytic tool for Wikipedia with two main functionalities: an article network and extensive statistics. It contains a visualization of the article networks and a powerful interface to analyze the behavior of authors | 2013 | |
QuALiM | A question answering system. Given a question in natural language, it returns relevant passages from Wikipedia (Kaisser, 2008) | 2008 | |
Koru | A demo of a search interface that maps topics involved in both queries and documents to Wikipedia articles. Supports automatic and interactive query expansion (Milne et al., 2007) | 2007 | |
Wikipedia Thesaurus | A large scale association thesaurus containing 78M associations (Nakayama et al., 2007a, 2008) | Dead | |
Wikipedia English–Japanese dictionary | A dictionary returning translations from English into Japanese and vice versa, enriched with probabilities of these translations (Erdmann et al., 2008) | Dead | |
Wikify | Automatically annotates any text with links to Wikipedia articles (Mihalcea and Csomai, 2007) | Dead | |
Wikifier | Automatically annotates any text with links to Wikipedia articles describing named entities | Dead | |
Wikipedia Cultural Diversity Observatory | Creates a dataset named Cultural Context Content (CCC) for each language edition with the articles that relate to its cultural context (geography, people, traditions, history, companies, etc.). | 2019 | |
Time-series graph of Wikipedia | Wikipedia web network stored in Neo4J database. Pagecounts data stored in Apache Cassandra database. Deployment scripts and instructions use corresponding Wikimedia dumps. | 2020 | |
Basic python parsing of dumps | A guide for how to parse Wikipedia dumps in python | 2017 | |
Wiki Dump Reader | A python package to extract text from Wikipedia dumps | 2019 | |
MediaWiki Parser from Hell | A python library to parse MediaWiki wikicode. | 2020 | |
Mediawiki Utilities | A collection of utilities for interfacing with MediaWiki | 2020 |
qwikidata | A python utility for interacting with Wikidata | 2020 |
Namespace Database | A python utility for working with namespace data | 2020 |
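Several of the dump-parsing tools above boil down to streaming pages out of the XML dumps. A minimal standard-library-only sketch (the export namespace version string is an assumption; check the `<mediawiki>` root element of your dump file):

```python
import xml.etree.ElementTree as ET

# Namespace used by recent MediaWiki export dumps; older or newer dumps may
# carry a different version string -- verify against your file's root element.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def iter_pages(source):
    """Stream (title, wikitext) pairs from a MediaWiki XML dump file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            yield title, text
            elem.clear()  # release the finished subtree so memory stays bounded
```

For the real multi-gigabyte dumps, pass in `bz2.open("…-pages-articles.xml.bz2", "rb")` rather than decompressing to disk first.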
- Research:Index
- Research:Query Library
- en:Category:Websites which use Wikipedia
- Data dumps/Other tools
- Research:Data
- Data dumps/More resources
- Help:Export
- CD and paper
WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset
Luyu Wang, Yujia Li, Ozlem Aslan, Oriol Vinyals
[WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset](https://aclanthology.org/2021.textgraphs-1.7) (Wang et al., TextGraphs 2021)
- Luyu Wang, Yujia Li, Ozlem Aslan, and Oriol Vinyals. 2021. WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset . In Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15) , pages 67–82, Mexico City, Mexico. Association for Computational Linguistics.
Wikidata5m is a million-scale knowledge graph dataset with aligned corpus. This dataset integrates the Wikidata knowledge graph and Wikipedia pages. Each entity in Wikidata5m is described by a corresponding Wikipedia page, which enables the evaluation of link prediction over unseen entities.
The dataset is distributed as a knowledge graph, a corpus, and aliases. We provide both transductive and inductive data splits used in the original paper.
Setting | Split | #Entity | #Relation | #Triplet |
---|---|---|---|---|
Transductive | Train | 4,594,485 | 822 | 20,614,279 |
Transductive | Valid | 4,594,485 | 822 | 5,163 |
Transductive | Test | 4,594,485 | 822 | 5,133 |
Inductive | Train | 4,579,609 | 822 | 20,496,514 |
Inductive | Valid | 7,374 | 199 | 6,699 |
Inductive | Test | 7,475 | 201 | 6,894 |
- Knowledge graph: Transductive split, 160 MB. Inductive split, 160 MB. Raw, 168 MB.
- Corpus, 991 MB.
- Entity & relation aliases, 188 MB.
The raw knowledge graph may also contain entities that do not have corresponding Wikipedia pages.
Wikidata5m follows the identifier system used in Wikidata. Each entity and relation is identified by a unique ID. Entity IDs are prefixed by Q, while relation IDs are prefixed by P.
Knowledge Graph
The knowledge graph is stored in the triplet list format. For example, the following line corresponds to <Donald Trump, position held, President of the United States>.
Each line in the corpus is a document, indexed by entity ID. The following line shows the description for Donald Trump.
Each line lists the aliases for an entity or relation. The following line shows the aliases of Donald Trump.
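The three file formats above can be read with a few lines of Python. This sketch assumes tab-separated lines (check the downloaded files for the exact layout); the sample IDs Q22686 (Donald Trump), P39 (position held) and Q11696 (President of the United States) match the example triplet described above:

```python
def parse_triplets(lines):
    """Yield (head, relation, tail) ID triples from a triplet-list file."""
    for line in lines:
        head, rel, tail = line.rstrip("\n").split("\t")
        yield head, rel, tail

def parse_aliases(lines):
    """Yield (ID, [aliases...]); the first field is the entity/relation ID."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1:]

# Example: one triplet line and one alias line.
triples = list(parse_triplets(["Q22686\tP39\tQ11696\n"]))
aliases = list(parse_aliases(["Q22686\tDonald Trump\tTrump\n"]))
```

The corpus file follows the same pattern, with an entity ID followed by the document text on each line.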
Publications
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, Juanzi Li, Jian Tang. TACL 2021. arXiv BibTeX
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
google-research-datasets/wit
WIT: Wikipedia-based Image Text Dataset
Wikipedia-based Image Text (WIT) Dataset is a large multimodal multilingual dataset. WIT is composed of a curated set of 37.6 million entity rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal machine learning models.
Key Advantages
A few unique advantages of WIT:
- The largest multimodal dataset (publicly available at the time of this writing) by the number of image-text examples.
- A massively multilingual dataset (first of its kind) with coverage for 108 languages.
- First image-text dataset with page-level metadata and contextual information.
- A diverse collection of concepts and real-world entities.
- Brings forth challenging real-world test sets.
You can learn more about WIT Dataset from our arXiv paper .
Latest Updates
2021 April: Happy to share the good news that our paper got accepted at the SIGIR Conference. From the ACM site, you can find our paper, slides and presentation.
2021 September: WIT Image-Text Competition is live on Kaggle. Our collaborators from Wikimedia Research blogged about this, and they have made available the raw pixels and ResNet-50 embeddings for the images in this set. Here is our Google AI blog post.
2022 April: We are happy to share that the WIT paper and dataset was awarded the Wikimedia Foundation's Research Award of the Year (tweet 1, tweet 2). We are deeply honored and thank you for the recognition.
2022 May: We have released the WIT validation set and test set. Please see the data page for download links.
2022 Oct: Authoring Tools for Multimedia Content proposal accepted at TREC 2023.
2023 Apr: AToMiC accepted at SIGIR 2023.
2023 Apr: WikiWeb2M Dataset released.
2023 May: Accepted submissions at WikiWorkshop 2023.
- WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset (pdf, arXiv)
- Building Authoring Tools for Multimedia Content with Human-in-the-loop Relevance Annotations (pdf)
- Characterizing Image Accessibility on Wikipedia across Languages (pdf)
WIT Example
Wikipedia Page
For example, let's take the Wikipedia page for Half Dome, Yosemite in CA.
From the Wikipedia page for Half Dome : Photo by DAVID ILIFF. License: CC BY-SA 3.0
Wikipedia Page with Annotations of what we can extract
From this page, we highlight the various key pieces of data that we can extract - images, their respective text snippets and some contextual metadata.
By extracting and filtering these carefully, we get a clean, high quality image-text example that can be used in multimodal modeling.
Multimodal visio-linguistic models rely on a rich dataset to help them learn to model the relationship between images and texts. Having large image-text datasets can significantly improve performance, as shown by recent works. Furthermore, the lack of language coverage in existing datasets (which are mostly only in English) also impedes research in the multilingual multimodal space – we consider this a lost opportunity given the potential shown in leveraging images (as a language-agnostic medium) to help improve our multilingual textual understanding.
To address these challenges and advance research on multilingual, multimodal learning we created the Wikipedia-based Image Text (WIT) Dataset. WIT is created by extracting multiple different texts associated with an image (e.g., as shown in the above image) from Wikipedia articles and Wikimedia image links. This was accompanied by rigorous filtering to only retain high quality image-text sets.
The resulting dataset contains over 37.6 million image-text sets – making WIT the largest multimodal dataset (publicly available at the time of this writing) with unparalleled multilingual coverage – with 12K+ examples in each of 108 languages (53 languages have 100K+ image-text pairs).
WIT: Dataset Numbers
Type | Train | Val | Test | Total / Unique |
---|---|---|---|---|
Rows / Tuples | 37.13M | 261.8K | 210.7K | 37.6M |
Unique Images | 11.4M | 58K | 57K | 11.5M |
Ref. Text | 16.9M | 150K | 104K | 17.2M / 16.7M |
Attr. Text | 34.8M | 193K | 200K | 35.2M / 10.9M |
Alt Text | 5.3M | 29K | 29K | 5.4M / 5.3M |
Context Texts | - | - | - | 119.8M |
WIT: Image-Text Stats by Language
Image-Text | # Lang | Uniq. Images | # Lang |
---|---|---|---|
total > 1M | 9 | images > 1M | 6 |
total > 500K | 10 | images > 500K | 12 |
total > 100K | 36 | images > 100K | 35 |
total > 50K | 15 | images > 50K | 17 |
total > 14K | 38 | images > 13K | 38 |
We believe that such a powerful diverse dataset will aid researchers in building better multimodal multilingual models and in identifying better learning and representation techniques leading to improvement of Machine Learning models in real-world tasks over visio-linguistic data.
WIT Dataset is now available for download. Please check the data page.
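The downloads ship as sharded, gzipped TSV files. A hedged sketch of streaming one shard with the standard library; the filename pattern and the `language`/`page_title` column names below are assumptions, so verify them against the schema on the data page:

```python
import csv
import gzip

def iter_wit(path, language=None):
    """Stream rows (dicts keyed by header column) from one WIT TSV shard,
    optionally filtering to a single language code."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            if language is None or row.get("language") == language:
                yield row

# Hypothetical shard name -- check the data page for the real file names:
# for row in iter_wit("wit_v1.train.all-00000-of-00010.tsv.gz", language="en"):
#     print(row["page_title"])
```

Streaming row by row keeps memory flat even though individual shards are large; very long caption fields may also require raising `csv.field_size_limit`.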
If you use the WIT dataset, you can cite our work as follows.
This data is available under the Creative Commons Attribution-ShareAlike 3.0 Unported license.
Projects using WIT
MURAL (Multimodal, Multitask Retrieval Across Languages), a paper accepted at EMNLP 2021.
For any questions, please contact [email protected] .
If WIT dataset is useful to you, please do write to us about it. Be it a blog post, a research project or a paper, we are delighted to learn about it.
WikiQA (Wikipedia open-domain question answering)
The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. In order to reflect the true information need of general users, Bing query logs were used as the question source. Each question is linked to a Wikipedia page that potentially has the answer. Because the summary section of a Wikipedia page provides the basic and usually most important information about the topic, sentences in this section were used as the candidate answers. The corpus includes 3,047 questions and 29,258 sentences, where 1,473 sentences were labeled as answer sentences to their corresponding questions.
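Since each question is paired with many candidate sentences and only some carry a positive label, a common first step is filtering candidates to the labeled answers. A minimal sketch over rows in the corpus's question/sentence/label shape (the dict field names here are illustrative; if loading via a hub such as Hugging Face, check that dataset card for the actual schema):

```python
def answer_sentences(rows):
    """Keep only candidate sentences labeled as answers (label == 1)."""
    return [r["answer"] for r in rows if r["label"] == 1]

# Toy rows in the question / candidate-sentence / binary-label shape:
sample = [
    {"question": "how are glacier caves formed?",
     "answer": "A glacier cave is a cave formed within the ice of a glacier.",
     "label": 1},
    {"question": "how are glacier caves formed?",
     "answer": "Glacier caves are often called ice caves.",
     "label": 0},
]
positives = answer_sentences(sample)
```

With the real corpus, the same filter recovers the 1,473 labeled answer sentences out of the 29,258 candidates.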