For the last couple of days, I've been receiving HTTP 429 (Too Many Requests) responses when sending API requests, which is causing a lot of regular operations to fail. I've never had this issue before; has something changed?
We have limited API requests to 240 req/min (it's mentioned here: http://www.wikidot.com/doc:api-methods). We introduced the limit because of abusive API usage by scripts, which increases latency and harms regular users.
You need to handle HTTP 429 responses or add some wait between requests in your scripts.
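For example, something like this (a rough Python sketch; it assumes the standard XML-RPC endpoint with an application key, and the app name, key, and site below are placeholders):

```python
# Rough sketch: retry a Wikidot API call with backoff when HTTP 429 is returned.
# The app name, API key, and site name below are placeholders.
import time
import xmlrpc.client

server = xmlrpc.client.ServerProxy(
    "https://my-app:MY_API_KEY@www.wikidot.com/xml-rpc-api.php"
)

def call_with_retry(method, params, max_tries=5):
    """Call an API method, sleeping and retrying whenever the limit is hit."""
    delay = 5  # seconds; doubled after each 429
    for attempt in range(max_tries):
        try:
            return method(params)
        except xmlrpc.client.ProtocolError as err:
            if err.errcode != 429 or attempt == max_tries - 1:
                raise
            time.sleep(delay)
            delay *= 2

# e.g. fetch a single page, waiting out the limit if necessary
page = call_with_retry(server.pages.get_one, {"site": "my-site", "page": "start"})
```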
Bartłomiej Bąkowski @ Wikidot Inc.
Considering that some of the jobs I am running already take several hours to complete, is there any way to get an exemption for certain accounts? It's very difficult to run jobs on very large sites (such as the SCP Wiki) with this kind of restriction in place.
The site that I help support, Fancy III, occasionally (about once a year) requires that I download all of its pages to ensure that the site and my copy of it are in sync. Normally I use the watch e-mails to keep up to date, but because e-mails occasionally go astray, it must be done. Since the site had 13,242 pages as of two days ago, and is steadily growing, this could be a pain in the butt for me too.
Jack Weaver, Fanac Fan History Project
We can surely find a compromise. Describe your typical use case for me: how many requests, and of which types (select/get/save), are you performing per unit of time? How many threads perform those operations?
Bartłomiej Bąkowski @ Wikidot Inc.
Our bot checks for updates every 15 minutes (though the timer starts after the previous check completes, so it's more like 16.5), which entails a large-scale select over the entire site, about half a dozen pages.get_one requests to fetch the contents of several of our index pages, and then pages.get_meta requests for all the pages on the site (based on the initial select) to update the bot's internal database. With about 4500 indexed pages on the site and a limit of ten pages per get request, that comes to a total of about 460-470 requests, which completed within 105 seconds before the request limit was introduced. Under the limit, this causes HTTP 429s on almost every update pass.
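Roughly, a single pass looks like this (a simplified sketch; names are placeholders, error handling is omitted, and it assumes pages.select returns a list of fullnames and pages.get_meta a mapping keyed by fullname):

```python
# Simplified sketch of one update pass. The endpoint credentials, site, and
# index names are placeholders.
import xmlrpc.client

server = xmlrpc.client.ServerProxy(
    "https://my-app:MY_API_KEY@www.wikidot.com/xml-rpc-api.php"
)
SITE = "my-site"
INDEX_PAGES = ["index-1", "index-2"]  # really about half a dozen pages

def update_pass():
    # 1 request: one big select over the whole site (~4500 fullnames).
    fullnames = server.pages.select({"site": SITE})

    # ~6 requests: fetch the index pages individually.
    indexes = [server.pages.get_one({"site": SITE, "page": p}) for p in INDEX_PAGES]

    # ~450 requests: metadata for every page, ten fullnames per call,
    # used to refresh the bot's internal database.
    meta = {}
    for i in range(0, len(fullnames), 10):
        batch = fullnames[i:i + 10]
        meta.update(server.pages.get_meta({"site": SITE, "pages": batch}))
    return indexes, meta
```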
The bot also fields user-triggered requests. Searches by keyword, tag, or title are served from the internal database, but site links pasted in chat trigger a pages.get_meta request to verify that the page exists. This is done because the most common reason for pasting such a link is that someone has just submitted a new page, which is unlikely to be in the internal database yet. Users can also ask for statistical pages to be generated for certain authors, which uses the internal database but triggers a pages.save_one to the support site. The request overhead of these operations is generally negligible.
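The API touch points for those user-triggered features are small; roughly (site names are placeholders, and the save_one field names should be checked against doc:api-methods):

```python
# Sketch of the two user-triggered API calls. Site names are placeholders,
# and the save_one field names should be checked against doc:api-methods.
import xmlrpc.client

server = xmlrpc.client.ServerProxy(
    "https://my-app:MY_API_KEY@www.wikidot.com/xml-rpc-api.php"
)

def page_exists(site, fullname):
    """Verify a pasted link by asking the API for that page's metadata."""
    meta = server.pages.get_meta({"site": site, "pages": [fullname]})
    return fullname in meta

def publish_stats(support_site, fullname, title, source):
    """Write a generated statistics page to the support site (one pages.save_one)."""
    server.pages.save_one({
        "site": support_site,
        "page": fullname,
        "title": title,
        "content": source,  # field name assumed; verify against the API docs
    })
```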
Twice a month, we also do a deep backup of the site using the API. This is done for two reasons: first, author provenance is important for our community and the default backup scheme doesn't save any metadata at all, much less author information; second, we run extensive statistics on the body of work on the site, including text analysis. The portion of this operation that uses the Wikidot API generally took around 2-3 hours before the request limit and generates at least as many requests over time as the periodic scan, since it pulls not only page contents but comments as well. This runs alongside the continued periodic update scans, which results in spikes of well over double the normal API traffic and triggers the usual HTTP errors.
There's also a non-API component to the backup, since the API isn't robust enough to gather all the data we want; that can take an additional six hours, but it's irrelevant to this discussion.
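For reference, the API half of that backup is essentially this (a simplified sketch; comment threads go through posts.select / posts.get, which I've left out, and names are placeholders):

```python
# Sketch of the API half of the deep backup: full source for every page.
# Comment threads are fetched through posts.select / posts.get (omitted here;
# see doc:api-methods for their parameters). Names are placeholders.
import xmlrpc.client

server = xmlrpc.client.ServerProxy(
    "https://my-app:MY_API_KEY@www.wikidot.com/xml-rpc-api.php"
)
SITE = "my-site"

def backup_pages():
    fullnames = server.pages.select({"site": SITE})   # 1 request
    pages = {}
    for name in fullnames:                            # ~4500 requests, one per page
        pages[name] = server.pages.get_one({"site": SITE, "page": name})
    return pages
```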
We have over 13,000 pages and are still growing, but my usual API use happens only every few days, depending on site activity, and involves downloading the pages that have been added or updated and then uploading fewer than 30 rebuilt indices. That is obviously not a problem. Once in a while, however, I need to download all the pages on the site to check each one against the copy that I keep, and that will be a problem. See Access a site's "Activities" log via API for an alternative…
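One way I could make that full check cheaper would be to compare metadata first and only re-download pages that have changed (a sketch of that alternative, not what I do today; names are placeholders and it assumes pages.get_meta reports an updated_at field):

```python
# Sketch of a cheaper full check: batch-fetch metadata for every page and
# flag anything whose updated_at differs from the local copy.
import xmlrpc.client

server = xmlrpc.client.ServerProxy(
    "https://my-app:MY_API_KEY@www.wikidot.com/xml-rpc-api.php"
)
SITE = "my-site"

def stale_pages(local_updated):
    """local_updated maps fullname -> updated_at string from my local copy."""
    fullnames = server.pages.select({"site": SITE})
    stale = []
    for i in range(0, len(fullnames), 10):
        batch = fullnames[i:i + 10]
        remote = server.pages.get_meta({"site": SITE, "pages": batch})
        for name, meta in remote.items():
            if meta.get("updated_at") != local_updated.get(name):
                stale.append(name)
    return stale  # re-download just these with pages.get_one
```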
Jack Weaver, Fanac Fan History Project
I've changed the limit to 512 (get/select) or 256 (save); save operations count as 2 requests. Hopefully it won't impact our servers' latency and will be better suited to your needs.
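If it helps, a client-side way to stay under that is to treat it as one rolling budget of 512 units per minute, with a save costing 2 units, e.g.:

```python
# Client-side sketch: keep a rolling one-minute budget of 512 units, where a
# get/select costs 1 unit and a save costs 2 (so at most 256 saves per minute).
import time
from collections import deque

class RequestBudget:
    def __init__(self, units_per_minute=512):
        self.cap = units_per_minute
        self.window = deque()  # (timestamp, cost) of calls in the last 60 s

    def spend(self, cost):
        """Block until `cost` units fit inside the rolling window, then record them."""
        while True:
            now = time.time()
            while self.window and now - self.window[0][0] > 60:
                self.window.popleft()
            if sum(c for _, c in self.window) + cost <= self.cap:
                self.window.append((now, cost))
                return
            time.sleep(1)

budget = RequestBudget()
budget.spend(1)   # before each pages.select / pages.get_* call
budget.spend(2)   # before each pages.save_one call
```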
Bartłomiej Bąkowski @ Wikidot Inc.