Make custom queries to the wikipedia api¶
Le toolkit wekeypedia inclut une classe qui permet de passer des requêtes plus fines et adaptées à des recherches d’information spécifiques et peu généralisables. Par exemple, la plupart des classes implémentées gèrent des objets à une échelle individuelle alors que pour des raisons d’optimisation, il est parfois nécessaire d’affiner les requêtes afin d’en réduire leur nombre.
- class api(lang='en')¶
Parameters: lang (string, optional) – - get(query, method='get')¶
Parameters: query (dict) – Returns: result Return type: dict
Examples¶
Here is piece of code that retrieve all links included in the Wisdom page and check if all these links (n=184) have an equivalent in the french wikipedia. It does so by asking for langlinks of 50 pages at once instead of building one query per links. In this case, the network load reduction goes from 184 queries to 4. #win
from __future__ import division
from math import ceil
from collections import defaultdict
import wekeypedia
from wekeypedia.wikipedia.api import api as api
def api_bunch(page_titles, lang, req):
results = defaultdict(list)
param = req
w = api(lang)
for i in range(0,int(ceil(len(page_titles)/50))):
param["titles"] = "|".join(page_titles[i*50:i*50+50-1])
while True:
r = w.get(param)
results.update({ p["title"]: p['langlinks'] for pageid, p in r["query"]["pages"].items() if 'langlinks' in p })
if "continue" in r:
param.update(r["continue"])
else:
break
return results
def get_lang_projection(pages, source, target):
"""
Retrieve all correspondance from a set of pages into another language
Parameters
----------
pages : list
List of page titles
Returns
-------
correspondances : list
List of `(redirect(initial page), corresponding page)`
"""
params = {
"redirects": "",
"format": "json",
"action": "query",
"prop": "info|langlinks",
"lllimit": 500,
"lllang": target,
"continue":""
}
r = api_bunch(pages, source, params)
return [ (page, t["*"]) for page,tt in r.items() for t in tt if t["lang"] == target ]
u = wekeypedia.WikipediaPage("Wisdom")
pages = list(set([ x["title"] for x in u.get_links() ]))
get_lang_projection(pages, "en", "fr")