Introducing MathQA: a Math-Aware question answering system

Moritz Schubotz (Chair of Digital Media, University of Wuppertal, School of Electrical Information and Media Engineering, Wuppertal, Germany)
Philipp Scharpf (Information Science Group, University of Konstanz, Department of Computer and Information Science, Konstanz, Germany)
Kaushal Dudhat (Information Science Group, University of Konstanz, Department of Computer and Information Science, Konstanz, Germany)
Yash Nagar (Information Science Group, University of Konstanz, Department of Computer and Information Science, Konstanz, Germany)
Felix Hamborg (Universitat Konstanz Fachbereich Wirtschaftswissenschaften, Konstanz, Germany)
Bela Gipp (Chair of Digital Media, University of Wuppertal, School of Electrical Information and Media Engineering, Wuppertal, Germany)

Information Discovery and Delivery

ISSN: 2398-6247

Publication date: 19 November 2018



This paper aims to present an open source math-aware Question Answering System based on Ask Platypus.


The system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge-base Wikidata. The authors translate these formulae to computable data by integrating the calculation engine sympy into the system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata.


In a user study, this system outperformed a commercial computational mathematical knowledge engine by 13 per cent. However, the performance of this system heavily depends on the size and quality of the formula data available in Wikidata. As only a few items in Wikidata contained formulae when the project started, the authors facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is significant for the article, 80 per cent of the suggestions were correct.


This research was presented at the JCDL17 KDD workshop.



Schubotz, M., Scharpf, P., Dudhat, K., Nagar, Y., Hamborg, F. and Gipp, B. (2018), "Introducing MathQA: a Math-Aware question answering system", Information Discovery and Delivery, Vol. 46 No. 4, pp. 214-224.




Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited

ACM Reference Format:

2018. Introducing MathQA - A Math-Aware Question Answering System. In Proceedings of The 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL'18). ACM, NY, NY, USA, 11 pages.

1. Introduction

Question answering (QA) systems are information retrieval (IR) systems that allow the user to pose questions in natural language and that provide quick and succinct answers, in contrast to search engines, which deliver ranked lists of documents. In this project, we developed an open source QA system that is publicly available. Our system can answer mathematical questions posed in natural language, yielding a formula that is retrieved from Wikidata. Wikidata is a free and open knowledge-base that can be read and edited by humans and machines. It serves as a common data source for other Wikimedia projects, especially for Wikipedia infoboxes. In addition, our system enables the user to perform arithmetic operations using the retrieved formula.

We developed three modules (Figure 1). The Question Parsing Module (1) transforms questions into a triple representation and produces a simplified dependency tree. The Formula Retrieval Module (2) then queries the Wikidata knowledge-base for the requested formula and presents the result to the user. The user can subsequently choose values for the occurring variables and order a calculation that is done by a Calculation Module (3). If available, the system retrieves the identifier names and values from Wikidata, so that the user can understand their meaning (Figure 4). Moreover, we developed a module which can answer questions in the Hindi language. In contrast to the English module, which exploits the dependency graph of Stanford NLP, the Hindi module uses regular expressions to parse the questions before passing the triple representation to the Formula Retrieval Module.

Our QA system builds upon Ask Platypus (S. by Lexistems SAS and E. de Lyon, 2018), an existing QA engine that can answer English questions using Wikidata. We chose Ask Platypus because we considered it the best among the Wikidata-based QA systems, and we extended its functionality to include mathematical questions. Finally, we evaluated the system's performance and the quality of results in comparison to a commercial computational mathematical knowledge engine. Our system outperforms the reference engine on some definition and geometry type questions, and we conjecture that this result can be extended to the whole domain.

Before building the QA system, we performed a seeding of all currently available mathematical formulae (labeled by math-tags) from Wikipedia into Wikidata. Each section of this paper is divided into two parts: the first part describes the process of seeding the Wikidata knowledge-base with mathematical formulae from Wikipedia as a separate project, laying the foundation for the second part, its application in the QA system.

1.1 Vision

The mathematical QA system is a first motivating application that exploits the mathematical knowledge seeded into Wikidata. It is a first step toward our long-term goal of building a collaborative, semi-formal, language-independent mathematics encyclopedia hosted by Wikimedia (Corneli and Schubotz, 2017). Using the popular Wikipedia framework as frontend will help popularize the project and motivate many experts from the mathematical sciences to contribute. We envisage a future centralized, machine-readable repository for mathematical world-knowledge that can be used to enable cross-article queries, e.g. to automate proofs of mathematical theorems. A crucial foundation for a path towards this long-term goal is having a large amount of mathematical data in Wikidata. This paper is a starting point for the development of effective methods to automatically seed Wikidata with mathematical formulae from Wikipedia or STEM documents.

1.2 Problem setting

Wikipedia consists of many pages related to mathematics. However, promptly grasping the essence of an article can be a difficult task as many pages contain a lot of information. Using Wikipedia means reading articles, and there currently is no way to automatically gather information scattered across multiple articles (Krötzsch et al., 2007). To overcome this problem, Wikidata can be used as a source. Wikidata provides machine-readable content that can automatically be interpreted by a computer and queried to access specific information. Thus, there is a huge potential in adding formulae related to all mathematical topics as items to Wikidata, enabling direct access to the defining formula of a requested mathematical concept. The first goal of this project was to enrich Wikidata with mathematical knowledge it currently lacks. Adding this information into Wikidata will not only increase the content of these items but also make them more meaningful and useful. Furthermore, these formulae will be machine-interpretable and can be used in many applications in the STEM disciplines. Most importantly, we are then able to develop the mathematical QA system which can directly answer mathematical questions provided by the user, using the mathematical formulae and relations available on Wikidata. As a result, instead of retrieving a whole Wikipedia page which is full of text, users will directly get the desired piece of information, the formula they are looking for.

1.3 Research objectives

Motivated by the lack of mathematical knowledge in Wikidata, the following research objective was defined:

Identify and extract defining formulae from all the available mathematical articles on Wikipedia to seed them into the Wikidata knowledge-base.

To achieve this objective, the following tasks were performed:

  • identification of mathematical articles from the Wikipedia data dump;

  • manual analysis to determine the defining formula of an individual article;

  • seeding of the retrieved formulae into Wikidata using the Primary sources tool [S. S. (Google), 2018]; and

  • evaluation of the overall correctness and accuracy of the data migration by precision, recall and f-measure.

Subsequently, we capitalized on the formulae seeded into Wikidata to:

Build a math-aware QA system, processing a mathematical natural language question to retrieve a formula from Wikidata and allow a calculation based on input values for the occurring variables provided by the user.

We performed the following subtasks:

  • development of a Question Parsing Module that determines a triple representation of the user's input;

  • development of a Formula Retrieval Module to query Wikidata using pywikibot (1. G. contributors, 2018);

  • development of a Calculation Module that performs a calculation based on the retrieved formula for the question and input values for the variables provided by the user;

  • evaluation of the overall performance and comparison to a commercial computational mathematical knowledge-engine; and

  • development of regular expressions to maximize the number of answerable questions provided by the user in the Hindi language.

1.4 Section outline

This paper is organized as follows: Section Background contains details about the Wikimedia sister projects Wikipedia and Wikidata and the concept of QA systems. Section Implementation describes our approach of transferring formulae from Wikipedia to Wikidata and the structure of the QA system which uses the seed. In Section Evaluation, we describe the construction of a random sample to assess the quality of the data transfer by precision, recall and f-measure. Subsequently, we evaluate the performance of the QA system and discuss its limitations. Finally, we conclude with a summary and suggested improvements for future work.

2. Background

2.1 Wikipedia and Wikidata

Started in 2001, mainly as a text-based resource, Wikipedia[1] is the world's largest online encyclopedia, which allows its users to edit articles and add new information (W. Foundation, 2018a, 2018b). Wikipedia has collected a rapidly increasing amount of information, including numbers, coordinates, dates and other types of relationships among different domains of knowledge. Denny Vrandecic, ontologist at Google, claims that "It has become a resource of enormous value, with potential applications across all areas of science, technology and culture" (Vrandecic and Krötzsch, 2014).

Wikipedia is open and welcomes everyone who wants to make a positive contribution. Ward Cunningham, the inventor of Wiki, describes Wikipedia as "The simplest online database that could possibly work" (Leuf and Cunningham, 2001).

The following are some characteristics of Wikipedia which enable it to manage its data on a global scale:

  • Open and instantaneous editing: Wikipedia allows its users to extend and edit the available information even without creating an account. All changes are instantaneously released online.

  • Record of editing history: Wikipedia keeps a record of all the changes made to a page. The page history can be viewed by everyone. Each time a page is edited, the new version is released, and the old version is saved in the page history.

  • Linked pages: Wikipedia pages link to other, related Wikipedia pages, which results in a web of interlinked pages.

  • Multilingual: Wikipedia exists in many languages. Every article on Wikipedia provides a list of the languages it is available in.

  • Content standard: the information contributed to Wikipedia must be encyclopedic, neutral and verifiable.

  • Community control: Wikipedia is always supported by a team of dedicated volunteers who take the responsibility of developing content, policies and practices.

  • Continuous evolution: Wikipedia is always in a state of continuous growth. Information is continuously being added and updated and new features are added into Wikipedia to make it more useful.

  • Totally free: all Wikipedia content is free to use, anyone is free to contribute, and the content is released under a free license which means anyone may reuse it elsewhere. Wikipedia is a non-commercial project, and it has no advertisements.

Although Wikipedia comprises a huge amount of data, it does not provide direct access to specific facts, as its content is still unstructured, which is unfortunate for anyone who wants to retrieve information systematically. To remedy this shortcoming, the Wikimedia Foundation launched Wikidata[2] in October 2012. Wikidata is a free and open knowledge-base that can be read and edited by humans and machines. It acts as a common source of data which can be used by Wikimedia projects such as Wikipedia, Wikivoyage, Wikisource and others. As Wikidata is one of the most recent projects of the Wikimedia Foundation, it is still in an early phase of development. Therefore, it encourages users and organizations to donate data so that it can grow and distribute open source, multilingual and educational content free of charge.

Wikidata's data model basically consists of items and statements. Each item represents an entity, such as a person, a scientific term or a mathematical theorem, and is identified by a unique number prefixed by the letter Q. For instance, the item number (QID) for the topic Computer science is Q21198. Additionally, items may have labels, descriptions and aliases in multiple languages. Information is added to items by creating statements, which are stored as key-value pairs: each statement consists of a property as the key and a value associated with that property.

Figure 2 illustrates the data model used in Wikidata. In this example, London is the main item with a statement which consists of one claim and one reference for the claim. The claim itself contains a main property-value pair which represents the main fact, in this case the population and its corresponding value. Optional qualifiers can be added to the claim to append additional information related to the main property. In this example, the time at which the population was recorded and the determination method are the qualifiers, with their corresponding values June 2012 and estimation, respectively. Wikidata does not claim to provide users with true facts because many facts are disputed or uncertain. Therefore, Wikidata allows such conflicting facts to coexist so that different opinions can be expressed properly. The content of Wikidata is in the public domain under the Creative Commons CC0 license, which allows every user to use, extend and edit the stored information.

The Wikidata requirements (W. Foundation, 2018a, 2018b) state that the data uploaded to Wikidata must not overwhelm the community. Research on how systems for collaborative knowledge creation are impacted by events like data migration is still in its early stages (Moskaliuk et al., 2012), in particular for structured knowledge (Horridge et al., 2014). Most of the research is focused on Wikipedia (Flöck et al., 2015), which is understandable considering the availability of its data sets, in particular the whole edit history (Schindler and Vrandecic, 2011), and the availability of tools for working with Wikipedia (Milne and Witten, 2013).
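The item, statement, claim, qualifier and reference structure described for the London example can be sketched as plain Python data classes. This is only an illustration of the described model, not Wikidata's actual storage format: the class and field names are our own, and the population value and the reference are placeholders because the text does not state them.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    prop: str                                        # main property, e.g. "population"
    value: str                                       # value of the main property
    qualifiers: dict = field(default_factory=dict)   # optional extra information


@dataclass
class Statement:
    claim: Claim
    references: list = field(default_factory=list)   # sources backing the claim


@dataclass
class Item:
    qid: str                                         # unique number prefixed by Q
    label: str
    statements: list = field(default_factory=list)

    def values(self, prop):
        """Return all values stated for a property (conflicting facts may coexist)."""
        return [s.claim.value for s in self.statements if s.claim.prop == prop]


# London is item Q84 on Wikidata; value and reference below are placeholders.
london = Item(
    qid="Q84",
    label="London",
    statements=[
        Statement(
            claim=Claim(
                prop="population",
                value="<population value>",
                qualifiers={
                    "point in time": "June 2012",
                    "determination method": "estimation",
                },
            ),
            references=["<reference>"],
        )
    ],
)

print(london.values("population"))
```

Because `values` returns a list, several statements for the same property can coexist, which mirrors the way Wikidata tolerates disputed or conflicting facts.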

2.2 Question answering systems

There has been a significant rise in the usage of open-domain QA systems since the establishment of the question answering track in the Text Retrieval Conferences, beginning with TREC-8 in 1999 (Voorhees, 2001). QA research itself is considerably older, however: in 1965, Simmons published the survey article Answering English Questions by Computer, which describes more than 15 English language QA systems that were implemented in the preceding five years (Simmons, 1965; Kwok et al., 2001). QA systems which rely on database approaches are briefly described in Hirschman and Gaizauskas (2001). The quality of a query for an IR system has a direct impact on the success of the search outcome. In fact, one of the most important, but frustrating tasks in IR is query formulation (French et al., 2001). Relevance feedback strategies, which select important terms from documents that the user has identified as relevant, are frequently used (Salton and Buckley, 1997).

Various tools are available which answer queries from diverse fields. The START (Katz et al., 2018) Natural Language Question Answering System aims to supply users with just the right information instead of merely providing a list of hits. The NSIR (Radev et al., 2002) web-based question answering system implemented a method called Probabilistic Phrase Reranking (PPR), where a potential answer that is spatially close to the question words gets a higher score. Some QA systems are domain-specific and focus on a specialized area, such as HONQA (Health On the Net Foundation) [H. O. the Net Foundation (HON), 2018] for quality health care questions. LAMP (Zhang and Lee, 2003) is a QA system which uses snippets from the results returned by a search engine like Google. This property is called snippet tolerant and is useful because it can be time-consuming to download and analyze the original Web documents.

There are also tools available online that perform arithmetic operations and answer general and mathematical questions, namely commercial computational mathematical knowledge-engines. A search engine which harvests the web for content representations of mathematical formulae and indexes them with substitution tree indexing was implemented by Kohlhase and Sucan (2006). There has been a lot of research related to the extraction of mathematical formulae and the discovery of relations between a formula and its surrounding text (Pagel and Schubotz, 2014; Lee et al., 2006; Yokoi et al., 2011; Quoc et al., 2010). Our QA system for mathematical problems using Wikidata is the first of its kind. However, there are already existing QA systems which use Wikidata to answer questions from other domains, such as Ask Wikidata! (W. user Bene, 2018) and Ask Platypus (S. by Lexistems SAS and E. de Lyon, 2018). These are similar tools that can be used to find general information in Wikidata by parsing the natural language query entered by the user.

3. Implementation

This section describes the implementation details of the formula seeding and the QA system. Figure 3 shows the workflow of the data transfer from Wikipedia to Wikidata. The first task was to download the Wikipedia data dump and identify the articles that are related to mathematics. Subsequently, we needed to determine and extract the defining formulae from each article. The extraction process was divided into two categories: we distinguished articles related to geometry from the rest of general mathematics. After the extraction, the formulae were added into Wikidata using its Primary sources tool [S. S. (Google), 2018], which allows users to approve or reject a claim and its reference.

3.1 Seeding math into Wikidata

As mentioned before, Wikipedia consists of millions of documents in various languages. Since its content is barely machine-interpretable, we needed to find a way to distinguish mathematical articles from the rest. To achieve this, the first step was to reduce the number of articles. We only considered English Wikipedia as the primary dataset to profit from two advantages. First, it reduced the total number of articles from over 40 million to around 5.5 million. Second, English Wikipedia contains the highest number of articles compared to other languages. So, we assume that a mathematical article available in any other language is also available in English Wikipedia, whereas the converse might not be true in numerous cases. To recover the mathematical formulae, we needed to distinguish formulae from their surrounding text by identifying the <math> tags. Performing the math tag search, we found 32,682 pages containing math formulae.

After the discovery of mathematical articles in the English Wikipedia data set, we were confronted with a more significant challenge: how to determine the defining formula within a mathematical article. A Wikipedia page contains all the information related to a particular topic. Mathematical pages often contain derivations along with an equation. If we extract all the equations included in math tags, we get a lot of unrelated and irrelevant formulae. We found a simple yet effective solution to this problem after manually analyzing a set of random mathematical articles: we observed that in most of the mathematical Wikipedia articles, the first formula was the most important one related to that topic. As this approach gave false results for the Wikipedia articles related to geometry, we divided the math articles into the categories general mathematics and geometry. We then used different approaches to math extraction for the two categories.
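The math tag search and the first-formula heuristic can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions rather than the code used in the project: the function names are hypothetical, the article text is a made-up example, and a regular expression stands in for whatever dump-processing tooling was actually used.

```python
import re

# Matches <math>...</math>, also when the opening tag carries attributes.
MATH_TAG = re.compile(r"<math\b[^>]*>(.*?)</math>", re.DOTALL | re.IGNORECASE)


def contains_math(wikitext):
    """Return True if the article text contains at least one <math> tag."""
    return MATH_TAG.search(wikitext) is not None


def first_formula(wikitext):
    """Return the content of the first <math> tag, or None if there is none."""
    match = MATH_TAG.search(wikitext)
    return match.group(1).strip() if match else None


# Made-up article text for illustration only.
article = (
    "In mathematics, the '''Pythagorean theorem''' states that "
    "<math>a^2 + b^2 = c^2</math>. One proof expands <math>(a + b)^2</math>."
)

print(contains_math(article))
print(first_formula(article))
```

Applied to a data dump, `contains_math` would select the candidate mathematical articles and `first_formula` would yield the candidate defining formula of each one.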

3.1.1 Geometry questions

The main reason behind separating geometry related articles from the rest is that the structure of these articles is different from that of general mathematical articles. A geometric object has various properties such as volume, area and perimeter. Thus, a single topic can have more than one property and multiple defining formulae. The Wikipedia articles on such geometric shapes (cube, circle, ellipse, etc.) may contain the formulae of these properties within different subsections of the article. To solve this problem, we first identified all the pages related to geometry by using a list of 16 Wikipedia geometry categories: Elementary geometry, Theorems in geometry, Polygons, Elementary shapes, Quadrilateral, Area, Volume, Conic sections, Geometric centers, Circles, Curves, Surfaces, Cubes, Platonic solids, Polytopes and Euclidean plane geometry. We discovered 292 pages belonging to these categories, each containing multiple relevant formulae. We subsequently retrieved the first formula from each subsection of these pages. However, not all the subsections of a page provided defining formulae related to the topic. For further refinement, we used a simple keyword-based filter on the following property names, which we considered most important for describing 2- and 3-dimensional shapes: Area, Volume, Circumference, Perimeter, Circumradius, Inradius and Median. These properties have a unique defining formula that can easily be checked for its correctness in the evaluation. We strictly limited ourselves to adding only one defining formula for each property into Wikidata. As a result, we got 65 formulae for the properties mentioned above, belonging to 49 Wikipedia articles related to geometry.
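The subsection-wise extraction with keyword filtering can be sketched as follows. This is a hedged illustration, not the original implementation: the function name, the heading pattern and the sample text are our own assumptions, and we assume that the keyword is matched against the subsection heading.

```python
import re

# Property names used as keywords, taken from the list above.
GEOMETRY_PROPERTIES = (
    "area", "volume", "circumference", "perimeter",
    "circumradius", "inradius", "median",
)

# Wikitext section headings such as "== Volume ==" (same marker on both sides).
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
MATH_TAG = re.compile(r"<math\b[^>]*>(.*?)</math>", re.DOTALL)


def property_formulae(wikitext):
    """Map each property keyword to the first formula of a matching subsection."""
    result = {}
    headings = list(HEADING.finditer(wikitext))
    for i, heading in enumerate(headings):
        title = heading.group(2).lower()
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        formula = MATH_TAG.search(wikitext[heading.end():end])
        if formula is None:
            continue
        for prop in GEOMETRY_PROPERTIES:
            # Keep only one defining formula per property.
            if prop in title and prop not in result:
                result[prop] = formula.group(1).strip()
    return result


# Made-up article text for illustration only.
cube = (
    "A cube has edge length <math>a</math>.\n"
    "== Volume ==\n"
    "The volume is <math>V = a^3</math>, since <math>a \\cdot a \\cdot a</math>.\n"
    "== History ==\n"
    "Studied since antiquity <math>x</math>.\n"
    "== Surface area ==\n"
    "<math>A = 6a^2</math>\n"
)

print(property_formulae(cube))
```

Subsections whose heading matches none of the keywords, such as History in the sample, are skipped even if they contain formulae.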

3.1.2 General formulae

As stated previously, we discovered 32,682 pages in English Wikipedia which are related to mathematics. Out of these, 292 were filtered out as a separate geometry category. For the remaining pages, we chose a different formula retrieval approach. We extracted the first formula from each Wikipedia article, as in most cases this, in fact, yielded the defining formula instead of, e.g. parts of a derivation or proof. After the discovery and extraction of math formulae from Wikipedia, we handed the list to Wikimedia, who seeded the formulae to the Primary sources tool [S. S. (Google), 2018], where they can now be approved or rejected by Wikidata users.

3.2 Building the math QA system

Having the formulae seeded into Wikidata, we could build our Math-aware QA system. It consists of three modules written in Python that will be described in the following.

The main aim of the Question Parsing Module is to transform questions into a tree of triples - producing a simplified and well-structured tree, without losing relevant information about the original question that was provided by the user. This is done by analyzing the grammatical structure of the question, mapping it into a normal form. For our module, we used the simplified dependency tree representation output of the Stanford Parser (T. S. N. L. P. Group, 2018).

Receiving the triple representation from the Question Parsing Module, the Formula Retrieval Module is responsible for extracting formulae from Wikidata using Pywikibot (1. G. contributors, 2018), a Python library and collection of tools that automate work on MediaWiki sites. Typically, the triple representation (subject, predicate, object) is incomplete, with either a missing predicate or object. Once the Wikidata item for the subject is available, the module tries to retrieve the value of the predicate. There are two cases for the values of the predicates.

In the first case, if the value of the predicate is formula, Pywikibot looks for the value of the Wikidata property named defining formula (P2534) and, if available, replaces the triple object with its value. For instance, "What is the formula for Pythagorean theorem?" has the triple representation (Pythagorean theorem, formula, ?). The module maps the subject of the triple to the Wikidata item and returns the value of the defining formula property as object.

In the second case, if the value of the predicate is in our list of geometry properties (volume, area, radius, etc.), Pywikibot looks for the value of the predicate in the has quality property (P1552) of the subject and, if available, replaces the triple object with the defining formula (P2534) value. For instance, "What is the volume of a sphere?" has the triple representation (sphere, volume, ?). The module maps the subject to the Wikidata item and the predicate to its has quality property, and returns the value of the defining formula property as object.
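The two retrieval cases can be sketched in Python as follows. To keep the sketch self-contained, a small in-memory dictionary stands in for Wikidata and no Pywikibot calls are made; the dictionary layout, the function name and the way a formula is attached to a quality are our own assumptions, not the actual Wikidata structure or the module's code.

```python
# Geometry properties named in the text; the real list is longer.
GEOMETRY_PROPERTIES = {"volume", "area", "radius"}

# Hypothetical stand-in for Wikidata: subject label -> property ID -> value.
KNOWLEDGE_BASE = {
    "pythagorean theorem": {
        "P2534": "a^2 + b^2 = c^2",  # defining formula
    },
    "sphere": {
        # has quality (P1552): quality label -> its defining formula (P2534)
        "P1552": {"volume": {"P2534": r"V = \frac{4}{3} \pi r^3"}},
    },
}


def retrieve_formula(subject, predicate):
    """Fill the missing object of the triple (subject, predicate, ?)."""
    item = KNOWLEDGE_BASE.get(subject.lower())
    if item is None:
        return None
    predicate = predicate.lower()
    if predicate == "formula":
        # Case 1: return the defining formula of the subject itself.
        return item.get("P2534")
    if predicate in GEOMETRY_PROPERTIES:
        # Case 2: look up the predicate among the subject's qualities.
        quality = item.get("P1552", {}).get(predicate)
        return quality.get("P2534") if quality else None
    return None


print(retrieve_formula("Pythagorean theorem", "formula"))
print(retrieve_formula("sphere", "volume"))
```

Returning `None` for an unknown subject or a missing property corresponds to the case in which the triple object cannot be filled.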

The Calculation Module is responsible for calculating the result of the formula, with values for the occurring variables provided by the user. If the names and values of the identifiers are available on Wikidata as has part (P527) property, they are automatically retrieved and displayed, so that the user can understand their meaning before entering values. Once the formula is received from the Formula Retrieval Module, it is parsed from LaTeX to Sympy form using the process sympy parser (Trollbäck, 2018). Its identifiers are then extracted for the calculation, which is done using the Python library Sympy (S. D. Team, 2018). In addition to the definition and geometry questions, our system also allows a formula as a question input to provide a calculation based on values for the identifiers.
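The identifier extraction and the calculation step can be sketched with Sympy as follows. The sketch assumes that the formula is already available as a Sympy expression, so the LaTeX parsing step is skipped and the expression for the volume of a sphere is built by hand; the function names are hypothetical and not taken from the system's code.

```python
import sympy


def identifiers(formula):
    """Return the names of the free identifiers the user has to supply."""
    return sorted(str(symbol) for symbol in formula.free_symbols)


def calculate(formula, values):
    """Substitute user-provided values and evaluate the formula numerically."""
    substitutions = {sympy.Symbol(name): value for name, value in values.items()}
    return float(formula.subs(substitutions).evalf())


# Right-hand side of V = 4/3 * pi * r^3, built directly instead of parsed from LaTeX.
r = sympy.Symbol("r")
sphere_volume = sympy.Rational(4, 3) * sympy.pi * r**3

print(identifiers(sphere_volume))
print(calculate(sphere_volume, {"r": 2}))
```

The constant pi is not reported as an identifier because Sympy treats it as a known number rather than a free symbol, so only r has to be entered by the user.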

Figure 4 shows the user web interface (GUI) for English and Hindi questions, as well as a direct formula question.

4. Evaluation

Finally, the individual success of the Wikidata seeding and our QA system was evaluated by the standard Information Retrieval measures precision, recall and the combined f-measure.

4.1 Evaluation of the Wikidata seeding

The main goal of the evaluation was to determine how effectively and accurately we were able to retrieve the mathematical formulae from Wikipedia. The evaluation was carried out separately for general mathematics and geometry.

4.1.1 General mathematics

We evaluated the success of the data transfer by precision and recall. A retrieved result was classified as relevant if and only if it was judged to be the one and only general mathematical representation of the Wikipedia article it was extracted from, and as non-relevant if it was not the defining formula or was incomplete. As the formulae were extracted from Wikipedia, which is not machine-interpretable, we needed to check manually whether a formula was relevant or non-relevant. Since it would have been exhausting and time-consuming to check all the formulae extracted from 32,682 Wikipedia pages, we chose a random sample of an arbitrarily chosen size of 100, which can be evaluated in a moderate amount of time. We manually examined the Wikipedia articles to find out whether defining formulae were available for the given mathematical concepts and compared them to the alleged defining formulae of the Wikidata items. To calculate precision and recall, we classified a result as relevant if there was a defining formula of the mathematical concept in its Wikipedia article, and as retrieved if it was the first formula, which was subsequently seeded to Primary Sources. Table I shows a snapshot of the evaluation of our random sample comprising 100 formulae with their contingencies, whereas Table II contains the evaluation results for the general formula seeding. The complete list is available online.

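We calculated the precision of the data transfer as:

Precision = Number of relevant retrieved results/Number of retrieved results = 0.80

We conclude that 80 per cent of the retrieved results were relevant. Furthermore, we calculated the recall as:

Recall = Number of relevant retrieved results/Number of relevant results = 0.88

We conclude that 88 per cent of the total relevant documents were successfully retrieved.

Finally, the combined (equally weighted) f-measure is:

F-measure = 2 · Precision · Recall/(Precision + Recall) = 2 · 0.80 · 0.88/(0.80 + 0.88) = 0.84

From this result, we can conclude that the seeding of general mathematical formulae from English Wikipedia articles to Wikidata yielded an overall accuracy, in terms of the f-measure, of 84 per cent.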
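The three measures follow the standard Information Retrieval definitions and can be reproduced with a few lines of Python. The sketch below is only a worked check: the function and parameter names are our own, and since the underlying contingency counts are given in Table II rather than in the text, only the f-measure is recomputed from the reported precision and recall.

```python
def precision(true_positives, false_positives):
    """Share of the retrieved results that are relevant."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of the relevant results that were retrieved."""
    return true_positives / (true_positives + false_negatives)


def f_measure(p, r):
    """Equally weighted harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Reported values for the general mathematics sample.
print(round(f_measure(0.80, 0.88), 2))
```

With the reported precision of 0.80 and recall of 0.88, the harmonic mean is approximately 0.838, which rounds to the stated 84 per cent.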

4.1.2 Geometry questions

Finally, we evaluated the accuracy of the formulae extracted from geometry related articles. The evaluation was carried out similarly to the evaluation of general mathematics. However, due to the much smaller number of items, we did not choose a random sample but evaluated all the retrieved results, i.e. 65 formulae belonging to 49 Wikipedia articles.

Table III contains some of the extracted formulae with their Wikipedia title and the corresponding contingency, whereas Table IV shows our evaluation results for the geometry formula seeding. The complete list is available online.

Based on these values, we calculated the precision of the data transfer as:

Precision = Number of relevant retrieved results/Number of retrieved results = 0.98

We conclude that 98 per cent of the retrieved results were relevant. Furthermore, we calculated the recall as:

Recall = Number of relevant retrieved results/Number of relevant results = 0.81

We conclude that 81 per cent of the total relevant documents were successfully retrieved.

Finally, the combined (equally weighted) f-measure is:

F-measure = 2 · Precision · Recall/(Precision + Recall)

From this result, we can conclude that the seeding of geometry formulae from English Wikipedia articles to Wikidata yielded an overall accuracy of 87 per cent.

Issues.

Evaluating the seeding sample, we could observe some illustrative issue cases (see Figure 5: Issues we observed evaluating the seeding), which will be briefly discussed in the following:

There were some Wikipedia articles that did not contain a mathematical concept in the strong sense, but instead an algorithm (# 4, 7, 48), measurement device (# 35, 38), mathematical method or field (# 25, 30, 33), a set (# 49) or even a scientist (# 9) or historical topic (# 19). Some retrieved formulae (# 6, 23) were only a part of the definition or statement which also contained natural language terms. We conclude that the <math> tag is not a sufficient marker to find mathematical concepts within the bulk of Wikipedia articles. For future work, better filters will have to be developed that discard the articles mentioned above and possibly also other types we are currently not aware of.

4.2 Evaluation of the math question answering system

Our math-aware QA system can answer mathematical questions in English and the Hindi language or use a direct formula input to deliver a calculation based on input values for the occurring identifiers.

4.2.1 Evaluation of the formula retrieval module

We evaluated our system on the basis of all formulae that were seeded correctly (true positives), determining whether a formula was retrieved (true or false) from Wikidata by the Formula Retrieval Module. The evaluation lists are available online. The retrieval of general mathematical formulae yielded 34 true and 35 false results. The accuracy of the system is:

Accuracy = Number of true results/Total size of the sample = 34/(34+35)=0.49

From the results, we can conclude that our system retrieved general mathematical formulae with an accuracy of 49 per cent. The retrieval of geometry formulae yielded an accuracy of 31 per cent. Our system can successfully answer questions provided by the user in the Hindi language. So far, there is no other tool available that can answer mathematical questions written in the Hindi language, so we could not compare our results to any other tool.

Issues.

Evaluating the MathQA sample, we could observe some illustrative issue cases (see Figure 6: Issues we observed evaluating the QA system), which will be briefly stated in the following.

Wikidata users renamed the “has quality” value “area” to “area of plane shape”, which prevented our system from retrieving the respective formula. In some cases, there were too many synonymous Wikidata items available, so that the system could not filter out the requested mathematical concept. Furthermore, when processing the request “Volume of a prism”, the system found prism, the transparent optical element (Q165896), instead of prism, the geometric shape (Q180544). Finally, if the name of the requested item contained an apostrophe (') or a hyphen (-), it could not be processed properly.
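One conceivable mitigation for the disambiguation issue is to prefer, among items sharing a label, the one that actually carries a formula statement. The sketch below is an assumption for illustration only and not the authors' method: the Candidate class, the has_formula flag and its values are hypothetical toy data, and only the two item identifiers are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    qid: str
    label: str
    description: str
    has_formula: bool  # hypothetical flag: the item carries a formula statement


def pick_candidate(candidates):
    """Return the first candidate that has a formula, or None if none has one."""
    with_formula = [c for c in candidates if c.has_formula]
    return with_formula[0] if with_formula else None


# Toy data: the has_formula values are assumed, not read from Wikidata.
candidates = [
    Candidate("Q165896", "prism", "transparent optical element", False),
    Candidate("Q180544", "prism", "geometric shape", True),
]

chosen = pick_candidate(candidates)
print(chosen.qid if chosen else "no candidate with a formula")
```

Such a filter would not help when several synonymous items all carry formulae; that case needs additional context from the question.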

4.2.2 Comparison to a commercial computational mathematical knowledge-engine

Currently, there is no known tool available which delivers a direct formula answer and performs a calculation using input values for the identifiers. So we could not fully compare our results to other systems. However, we studied the ability of our system to successfully answer questions compared to a selected computational knowledge engine that performs arithmetic operations and answers questions from different fields of general knowledge.

Quantitatively, we used 30 mathematical questions from the NTCIR-12 Task (Schubotz et al., 2016) to evaluate the performance of the two systems. After approving or seeding 5 missing formulae manually, our system yielded more suitable answers than the commercial engine (denoted by > in the Performance (Perf.) column of Figure 7) in 10/30 of the cases. The reference engine performed better than our system in only 6/30 (denoted by <), and in 14/30 questions both systems provided answers that were estimated to be equally suitable (denoted by =). All in all, our system was able to outperform the commercial engine on the NTCIR-12 sample. Nevertheless, it should be mentioned that the reference engine is continuously being improved on mathematical topics. For example, the question “What is the formula for Logical equivalence?” yielded “Additional functionality for this topic is under development […]”, and we suspect that there are more of these cases.

Qualitatively, we observed that our system is more powerful than the reference engine when answering definition questions. As an example, the question “What is the formula for gas?” is answered by PV = nRT, whereas the reference engine only returns a list of gaseous compounds. Furthermore, our system can successfully answer geometry questions, whereas the reference engine provides all formulae with unit edge length and gives no option to enter a customized edge length. For example, the question “What is the surface area of triangular cupola?” is answered as $A = \left(3 + \frac{5\sqrt{3}}{2}\right) a^2$, whereas the reference engine only displays $3 + \frac{5\sqrt{3}}{2} \approx 7.33013$. However, our QA system can answer only mathematical questions, whereas the reference engine can answer questions from many other fields such as people and history, health and medicine, materials, dates and times, engineering, earth science, etc.

Limitations. Our system depends critically on the knowledge stored in Wikidata. If the item or the formula we are looking for is not available in Wikidata, we are unable to answer a given question. Furthermore, we are using the question parsing module developed by Platypus (T. S. N. L. P. Group, 2018), which is limited to nouns in singular form and therefore cannot answer a question containing a plural noun. For instance, the question “What is the formula for Maxwell's equations?” is parsed as “Maxwell's equation”, such that the item cannot be retrieved. Besides, our parser does not support specific LaTeX tags (\displaystyle, \frac, \left, \right, \bigg, \mathrm, etc.) or punctuation symbols (, ; ! etc.), nor does it support integration, summation, scalar products or more complex formulae. For Hindi questions, we are limited to the Wikidata items that include Hindi labels. Consequently, not every Wikidata item that is available in English can be processed in Hindi.
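Some of the unsupported LaTeX tags listed above are purely presentational, so one possible workaround is to remove them before parsing. The following sketch is an assumption for illustration and not part of MathQA; the function name strip_presentational and the chosen macro list are hypothetical. Structural macros such as \frac cannot be dropped this way and would still require parser support.

```python
import re

# Presentational macros from the limitations paragraph; they change layout only.
PRESENTATIONAL = ("displaystyle", "left", "right", "bigg")

# The lookahead (?![A-Za-z]) keeps longer macros such as \leftarrow intact.
_MACRO_PATTERN = re.compile(r"\\(?:" + "|".join(PRESENTATIONAL) + r")(?![A-Za-z])")

# Matches \mathrm{...} when the braces contain no nested braces.
_MATHRM_PATTERN = re.compile(r"\\mathrm\{([^{}]*)\}")


def strip_presentational(latex: str) -> str:
    """Remove layout-only macros and unwrap \\mathrm{...} groups."""
    cleaned = _MACRO_PATTERN.sub("", latex)
    cleaned = _MATHRM_PATTERN.sub(r"\1", cleaned)
    # Collapse the whitespace left behind by the removed macros.
    return re.sub(r"\s+", " ", cleaned).strip()


example = r"\displaystyle A = \left( 3 + \frac{5\sqrt{3}}{2} \right) a^{2}"
print(strip_presentational(example))
```

The sketch deliberately leaves \frac and \sqrt untouched, since they carry the mathematical structure of the formula.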

5. Conclusion

The overall goal of this research project was to extract mathematical knowledge in the form of formulae from the English Wikipedia and seed it into Wikidata. This served as a necessary preparation for building a QA system that can answer mathematical questions in English and Hindi. Additionally, the user can perform arithmetic calculations using the retrieved formula after providing input values for the detected identifiers. We have been able to provide the Wikidata community with more than 17 thousand new Wikidata statements containing formulae. Our seeding achieved a precision of 80 per cent, a recall of 88 per cent and a combined f-measure of 84 per cent for general mathematical formulae. For the geometry formulae, the precision was 98 per cent, the recall 81 per cent and the f-measure 87 per cent. The Formula Retrieval Module of our QA system achieved an accuracy of 49 and 31 per cent for general and geometry formulae, respectively. To the best of our knowledge, our QA system is the only one available that answers mathematical questions in Hindi.

5.1 Future work

Wikipedia is the world's largest online encyclopedia and contains a massive amount of information in numerous languages. There are many possibilities when it comes to migrating data from Wikipedia to Wikidata. This research project dealt only with mathematical knowledge in Wikipedia. However, similar techniques can also be employed to migrate knowledge from other fields, such as geography, computer science and politics, between these Wiki sister projects. We propose the following possibilities for future work:

  • using another database or knowledge-base than Wikidata or adding a module that can use another database if the requested item is not available in Wikidata;

  • seeding Wikidata with more mathematical formulae to enable answering more questions;

  • seeding Wikidata with more Hindi labels for Wikidata items to improve the performance of our Hindi language module;

  • developing a new LaTeX parser that can parse any LaTeX formula without restriction;

  • improving the Formula Retrieval Module allowing for plots and more information regarding the formula; and

  • improving the Formula Calculation Module such that it delivers the calculated result including units of the formula and the identifiers.


Figure 1. Screenshot of MathQA

Figure 2. Wikidata statement terminology illustrated by an example

Figure 3. Workflow of our extraction and loading process

Figure 4. MathQA GUI for (a) English and (b) Hindi questions, and (c) a direct formula question

Figure 5. Issues we observed evaluating the seeding

Figure 6. Issues we observed evaluating the QA system

Figure 7. NTCIR-12 (Schubotz et al., 2016) example questions in which our system is able to outperform the selected commercial engine

Evaluation of the seeding of general formulae on the basis of a random sample comprising 100 formulae

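No. | Wikipedia title | Wikidata item | Retrieved result | Contingency
1 | Holonomy | Q907926 | $\mathrm{Hol}_x(\nabla) = \{P_\gamma \in \mathrm{GL}(E_x) \mid \gamma \text{ is a loop based at } x\}$ | fp
2 | Nome | Q7048497 | $f(z) = z^T (Mz + q)$ | fn
3 | Jordan's lemma | Q1816932 | $C_R = \{R e^{i\theta} \mid \theta \in [0, \pi]\}$ | fp
49 | Matching (graph theory) | Q1065144 | $\lvert A \setminus B \rvert \le 2 \lvert B \setminus A \rvert$ | fp
50 | Gaussian function | Q1054475 | $f(x) = a e^{-\frac{(x-b)^2}{2c^2}}$ | tp
51 | Riesz mean | Q2152569 | not available | fn
98 | Plastic number | Q2345603 | $x^3 = x + 1$ | fp
99 | Hyperfocal distance | Q253164 | $H = \frac{f^2}{Nc} + f$ | tp
100 | Coefficient of variation | Q623738 | $c_v = \frac{\sigma}{\mu}$ | tp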

Contingency matrix of the general formula seeding

 | Relevant | Non-relevant
Retrieved | 71 (tp) | 17 (fp)
Not retrieved | 10 (fn) | 2 (tn)
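The precision, recall and f-measure reported for the general formula seeding can be recomputed from this contingency matrix. The short sketch below uses the standard definitions; the function name seeding_metrics is an assumption for illustration.

```python
def seeding_metrics(tp, fp, fn):
    """Return (precision, recall, f-measure) from contingency counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts from the contingency matrix of the general formula seeding.
precision, recall, f_measure = seeding_metrics(tp=71, fp=17, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f-measure={f_measure:.3f}")
# prints: precision=0.807 recall=0.877 f-measure=0.840
```

These values are close to the 80, 88 and 84 per cent stated in the conclusion; the precision computes to 80.7 per cent, which the paper reports as 80 per cent.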

Evaluation of the seeding of geometry formulae from Wikipedia to Wikidata

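Number | Wikipedia title | Property | Retrieved result | Contingency
1 | Antiprism | Volume | $V = \frac{n \sqrt{4\cos^2\frac{\pi}{2n} - 1}\, \sin\frac{3\pi}{2n}}{12 \sin^2\frac{\pi}{n}}\, a^3$ | tp
2 | Circle | Circumference | $C = 2\pi r = \pi d$ | tp
 | | Area | $\text{Area} = \pi r^2$ | tp
 | | Area | $A_{\text{ellipse}} = \pi ab$ | tp
48 | Law of cosines | Areas | $a^2 + b^2 = c^2 + 2ab\cos\gamma$ | fp
49 | Pentagon | Area | $A = \frac{1}{2} P r$ | tp
 | | Circumradius | (a2 + b2c2)2 ≤ (4A)6; R+r<a+b2 | fp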

Contingency matrix of the geometry formula seeding

 | Relevant | Non-relevant
Retrieved | 52 (tp) | 1 (fp)
Not retrieved | 12 (fn) | 0 (tn)





Corneli, J. and Schubotz, M. (2017), “math.wikipedia.org: a vision for a collaborative semi-formal, language independent math(s) encyclopedia”, Proc. CAITP.

Flöck, F., Laniado, D., Stadthaus, F. and Acosta, M. (2015), “Towards better visual tools for exploring wikipedia article development-the use case of ‘gamergate controversy’”, Ninth International AAAI Conference on Web and Social Media, pp. 48-55.

French, J.C., Powell, A.L., Gey, F.C. and Perelman, N. (2001), “Exploiting a controlled vocabulary to improve collection selection and retrieval effectiveness”, Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management, Atlanta, ACM, 5-10 November 2001, pp. 199-206, doi: 10.1145/502585.502619.

G. contributors (2018), Pywikibot: Python library to automate work on MediaWiki sites, available at: (accessed 4 November 2018).

H. O. the Net Foundation (HON) (2018), HON's Question Answering tool, available at: (accessed 4 November 2018).

Hirschman, L. and Gaizauskas, R.J. (2001), “Natural language question answering: the view from here”, Natural Language Engineering, Vol. 7 No. 4, pp. 275-300, doi: 10.1017/S1351324901002807.

Horridge, M., Tudorache, T., Nyulas, C., Vendetti, J., Noy, N.F. and Musen, M.A. (2014), “WebProtégé: a collaborative web-based platform for editing biomedical ontologies”, Bioinformatics, Vol. 30 No. 16, pp. 2384-2385, doi: 10.1093/bioinformatics/btu256.

Katz, B., Felshin, S. and Barbu, A. (2018), “START natural language question answering system”, available at: (accessed 4 November 2018).

Kohlhase, M. and Sucan, I. (2006), “A search engine for mathematical formulae”, in Calmet, J., Ida, T. and Wang, D. (Eds), Artificial Intelligence and Symbolic Computation, 8th International Conference, AISC 2006, Beijing, 20-22 September, 2006, Proceedings, Vol. 4120, Lecture Notes in Computer Science, Springer, pp. 241-253, doi: 10.1007/11856290_21.

Krötzsch, M., Vrandecic, D., Völkel, M., Haller, H. and Studer, R. (2007), “Semantic wikipedia”, Journal of Web Semantics, Vol. 5 No. 4, pp. 251-261, doi: 10.1016/j.websem.2007.09.001.

Kwok, C.C.T., Etzioni, O. and Weld, D.S. (2001), “Scaling question answering to the web”, ACM Transactions on Information Systems, Vol. 19 No. 3, pp. 242-262, doi: 10.1145/502115.502117.

Lee, M., Cimino, J., Zhu, H.R., Sable, C., Shanker, V., Ely, J. and Yu, H. (2006), “Beyond information Retrieval - Medical question answering”, AMIA Annual Symposium Proceedings / AMIA Symposium, pp. 469-473.

Leuf, B. and Cunningham, W. (2001), “The wiki way: quick collaboration on the web”.

Milne, D.N. and Witten, I.H. (2013), “An open-source toolkit for mining wikipedia”, Artificial Intelligence, Vol. 194, pp. 222-239, doi: 10.1016/j.artint.2012.06.007.

Moskaliuk, J., Kimmerle, J. and Cress, U. (2012), “Collaborative knowledge building with wikis: the impact of redundancy and polarity”, Computers & Education, Vol. 58 No. 4, pp. 1049-1057, doi: 10.1016/j.compedu.2011.11.024.

Pagel, R. and Schubotz, M. (2014), “Mathematical language processing project”, in England, M., Davenport, J.H., Kohlhase, A., Kohlhase, M., Libbrecht, P., Neuper, W., Quaresma, P., Sexton, A.P., Sojka, P., Urban, J. and Watt, S.M. (Eds), Joint Proceedings of the MathUI, OpenMath and ThEdu Workshops and Work in Progress track at CICM co-located with Conferences on Intelligent Computer Mathematics (CICM 2014), Coimbra, 7-11 July 2014, CEUR Workshop Proceedings, CEUR-, Vol. 1186.

Quoc, M.N., Yokoi, K., Matsubayashi, Y. and Aizawa, A. (2010), “Mining coreference relations between formulas and text using Wikipedia”, Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010), pp. 69-74.

Radev, D.R., Qi, H., Wu, H. and Fan, W. (2002), “Evaluating web-based question answering systems”, Proceedings of the Third International Conference on Language Resources and Evaluation, LREC 2002, 29-31 May, 2002, European Language Resources Association, Las Palmas, Canary Islands.

S. by Lexistems SAS and E. de Lyon (2018), Ask Platypus, available at: (accessed 4 November 2018).

S. D. Team (2018), SymPy: Python library for symbolic mathematics, available at: (accessed 4 November 2018).

S. S. (Google) (2018), Wikidata: Primary sources tool, available at: (accessed 4 November 2018).

Salton, G. and Buckley, C. (1997), “Improving retrieval performance by relevance feedback”, Readings in Information Retrieval, Vol. 24 No. 5, pp. 355-363.

Schindler, M. and Vrandecic, D. (2011), “Introducing new features to wikipedia: case studies for web science”, IEEE Intelligent Systems, Vol. 26 No. 1, pp. 56-61, doi: 10.1109/MIS.2011.17.

Schubotz, M., Meuschke, N., Leich, M. and Gipp, B. (2016), “Exploring the one-brain barrier: a manual contribution to the NTCIR-12 MathIR Task”, in Kando, N., Sakai, T. and Sanderson, M. (Eds), Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, 7-10 June, 2016, National Institute of Informatics (NII).

Simmons, R.F. (1965), “Answering English questions by computer: a survey”, Communications of the ACM, Vol. 8 No. 1, pp. 53-70, doi: 10.1145/363707.363732.

T. S. N. L. P. Group (2018), Stanford Parser, available at: (accessed 4 November 2018).

Trollbäck, A. (2018), LaTeX to SymPy parser, available at: (accessed 4 November 2018).

Voorhees, E.M. (2001), “The TREC question answering track”, Natural Language Engineering, Vol. 7 No. 4, pp. 361-378, doi: 10.1017/S1351324901002789.

Vrandecic, D. and Krötzsch, M. (2014), “Wikidata: a free collaborative knowledgebase”, Communications of the ACM, Vol. 57 No. 10, pp. 78-85, doi: 10.1145/2629489.

W. Foundation (2018a), Wikidata/Notes/Requirements, available at: (accessed 4 November 2018).

W. Foundation (2018b), Wikipedia, The Free Encyclopedia (accessed 4 November 2018).

W. user Bene (2018), Ask Wikidata! - Wikimedia Tool Labs, available at: (accessed 4 November 2018).

Yokoi, K., Nghiem, M.-Q., Matsubayashi, Y. and Aizawa, A. (2011), “Contextual analysis of mathematical expressions for advanced mathematical search”, Polibits, Vol. 43, pp. 81-86.

Zhang, D. and Lee, W.S. (2003), “A web-based question answering system”.


The authors would like to thank Akiko Aizawa for her advice and for hosting them as visiting researchers in her lab at the National Institute of Informatics (NII) in Tokyo. Furthermore, they thank the Wikimedia Foundation and Wikimedia Deutschland for providing cloud computing facilities and a research visit. Besides many Wikimedians, Lydia Pintscher and Jonas Kress were a great help in getting started with Wikidata. This work was supported by the FITWeltweit program of the German Academic Exchange Service (DAAD) as well as the German Research Foundation (DFG grant GI-1259-1). ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the US Government. As such, the US Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for government purposes only.

Corresponding author

Moritz Schubotz can be contacted at: