I have a fairly large wiki (20K pages) and I want to do a number of things by automation. I'm using Python 3.5 and the PyCharm IDE. (I'm in the c/c++/c# tradition, but I have a working knowledge of Python.)
So far, I've been working with a zipped backup — I have code which will analyze the entire site, resolve all indirect references and provide lists of most wanted pages and most referenced pages, but this approach doesn't let me get at things like tags, so I've started digging into the API. (BTW, I'd be happy to post the code if anyone is interested.)
(I started asking questions in the Community forum and got some useful help, but not enough. But one of the pieces of help was a pointer to this site.)
Problem with Whiffle
I started trying to use Whiffle, but have been entirely unable to get it to work. I fixed a few import issues which seem to be related to Whiffle still targeting Python 2.7 libraries, but it still simply raises exceptions. (It's possible it's failing on some subtler incompatibility, of course.) Tracing through the code, it appears to fail on a call to Wikidot using the following URL:
'mlo1:rdSY.......80g@www.wikidot.com'
I got this response back:
'<?xml version="1.0" encoding="UTF-8"?>\n<methodResponse><fault><value><struct><member><name>faultCode</name><value><int>406</int></value></member><member><name>faultString</name><value><string>Site does not exist</string></value></member></struct></value></fault></methodResponse>\n'
(The "…" is an elided section of the ID. I checked to make sure it is correct.) As far as I can see, there is nothing there to specify the wiki to be accessed, and — so far, anyway — I can't figure out why.
At any rate, I messed about with Whiffle some more, finally gave up, and tried using ServerProxy directly, with immediate success.
Using ServerProxy Directly
For example,
s = client.ServerProxy('https://fancyclopedia:rdSY...n80g@www.wikidot.com/xml-rpc-api.php')
onepage = s.pages.get_one({"site": "fancyclopedia", "page": "_default:fapa"})
yielded exactly what I hoped for. But I still ran into problems:
First, a second call through the proxy object "s" (above in the code box) raises an exception (http.client.RemoteDisconnected). It appears that I'm missing something here, but what?
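One guess (an assumption, not something I've confirmed against Wikidot's server behavior): the server may be closing the keep-alive connection between calls, in which case catching the exception and retrying the call works around it. This is a minimal sketch; call_with_retry is a hypothetical helper, and the commented-out usage reuses the proxy object and page names from the example above.

```python
from xmlrpc import client
import http.client

def call_with_retry(func, *args, retries=2):
    """Call func(*args), retrying if the server has dropped the
    keep-alive connection since the last request."""
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except (http.client.RemoteDisconnected, ConnectionResetError):
            if attempt == retries:
                raise  # still failing after retries: give up

# usage sketch (credentials elided as in the example above):
# s = client.ServerProxy('https://fancyclopedia:API-KEY@www.wikidot.com/xml-rpc-api.php')
# page = call_with_retry(s.pages.get_one,
#                        {"site": "fancyclopedia", "page": "_default:fapa"})
```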
Second, how would I get a list of all pages? A call to pages.select returns a list of around 250 pages and then fails with an error message.
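One approach worth trying, assuming categories.select works as the Wikidot API documentation describes: request the page list one category at a time so each response stays small, then batch any follow-up metadata calls. chunks is a hypothetical helper for that batching; the network calls are left commented out since they need real credentials.

```python
def chunks(seq, size):
    """Yield successive slices of seq, each no longer than size.
    Useful for splitting a long page list into small API batches."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# usage sketch, assuming categories.select returns category names:
# all_pages = []
# for cat in s.categories.select({"site": "fancyclopedia"}):
#     all_pages.extend(s.pages.select({"site": "fancyclopedia",
#                                      "categories": [cat]}))
# for batch in chunks(all_pages, 10):
#     meta = s.pages.get_meta({"site": "fancyclopedia", "pages": batch})
```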
Third, with 20K pages, walking the whole site as a prelude to every test or analysis is going to be pretty slow. I'm thinking it may make sense to build a local database of the site and then keep it up to date by using the Recent Changes feature to download only the pages that have changed. But how do I get at the Recent Changes list? (I could write code to scrape the wiki's Recent Changes page, but that would necessarily be rather ugly and fragile code.)
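Even without a Recent Changes call, the comparison step of such a cache can be sketched as a pure function, assuming pages.get_meta returns a per-page updated_at timestamp (worth verifying against the API docs). pages_to_refresh is a hypothetical helper: it compares cached timestamps against fresh metadata and returns only the pages that need re-downloading.

```python
def pages_to_refresh(local, remote):
    """Return page names whose remote copy is newer than, or missing
    from, the local cache.

    Both arguments map page name -> last-updated timestamp (e.g. the
    updated_at field from pages.get_meta). A page is stale when its
    timestamp differs or it has never been cached."""
    return [name for name, stamp in remote.items()
            if local.get(name) != stamp]
```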
Fourth, is there a way to get the rendered content of pages that embed things like "[[include …]]" and, similarly, "[[listpages …]]"?
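Since pages.get_one appears to return the raw wiki source, one workaround (an assumption, not an API feature) is to fetch the public rendered HTML over plain HTTP, where the server has already expanded include and listpages modules. rendered_url is a hypothetical helper; the site and page names are the ones from the example above.

```python
from urllib.parse import quote

def rendered_url(site, page):
    """Build the public URL for a page's rendered HTML, where
    [[include]] and [[listpages]] have been expanded server-side."""
    return 'https://{}.wikidot.com/{}'.format(site, quote(page))

# usage sketch:
# import urllib.request
# html = urllib.request.urlopen(rendered_url('fancyclopedia', 'fapa')).read()
```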
Many thanks for any help!