
User:Harej/sandbox

From Wikipedia, the free encyclopedia

{{Pageset definition
| namespaces     =
| categories     =  
| category-depth = 
| wdq1           =
| petscan1       =
| domain-links1  =
| sql1           =
| links-here1    =
| transclusions1 =
| links-on-page1 =
}}

Notes

Missing articles

Citation watchlist script

https://en.wikipedia.org/w/index.php?title=Capital_punishment_in_the_United_States&diff=prev&oldid=1203024750

https://en.wikipedia.org/w/api.php?action=compare&fromrev=1203018841&torev=1203024750&format=json

<a class="mw-changeslist-diff" href="/w/index.php?title=Zoology&amp;curid=34413&amp;diff=1203018841&amp;oldid=1203024750">diff</a>
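Per the notes below, on watchlist entries the diff= parameter carries the previous revision ID and oldid= the current one. A minimal sketch of pulling that pair out of a diff link's href (the function name is mine, not the script's):

```javascript
// Sketch: extract the (previous, current) revision pair from a
// watchlist diff link href like the one above.
function revisionPairFromHref(href) {
  // A dummy base lets relative hrefs parse with the WHATWG URL API.
  const params = new URL(href, "https://en.wikipedia.org").searchParams;
  return {
    prevRev: params.get("diff"),  // previous revision ID
    currRev: params.get("oldid"), // current revision ID
  };
}

const href =
  "/w/index.php?title=Zoology&curid=34413&diff=1203018841&oldid=1203024750";
console.log(revisionPairFromHref(href));
// prevRev: "1203018841", currRev: "1203024750"
```

Those two IDs are exactly the fromrev/torev values fed to the compare API call above.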

This diff adds a new sentence to the article and also adds a new link to a source.

In this one diff these two sources are cited:

Given a watchlist:

  1. Isolate each revision ID and its previous revision ID from each line in the watchlist.
  2. Every five seconds, check whether there is a revision ID / previous ID pair that hasn't been checked yet.
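The two steps above could be sketched like this; the names and the deduplication approach are my assumptions, and the five-second timer is shown in a comment so the sketch stays self-contained:

```javascript
// Sketch of the watchlist scan: collect revision-ID pairs from diff
// link hrefs and dedupe them, so the five-second poll only processes
// pairs that haven't been checked yet.
const checkedPairs = new Set();

function newPairs(diffHrefs) {
  const pairs = [];
  for (const href of diffHrefs) {
    const params = new URL(href, "https://en.wikipedia.org").searchParams;
    const prev = params.get("diff");
    const curr = params.get("oldid");
    if (!prev || !curr) continue;        // skip "new page" entries
    const key = prev + "/" + curr;
    if (checkedPairs.has(key)) continue; // already checked this pair
    checkedPairs.add(key);
    pairs.push({ prev, curr });
  }
  return pairs;
}

// Step 2, in the script itself, would be something like:
// setInterval(() => process(newPairs(collectDiffHrefs())), 5000);
```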

Given a pair (or batch of them):

  1. Use the "action=compare" endpoint.
  2. Pull URLs out of the diff with a regular expression (cue the joke about now having one more problem to solve).
  3. Isolate domain names from the URLs.
  4. Check those domains against an internal representation of RSP (hardcoded in the script for now).
  5. If there's a hit, add an indicator next to the diff link (red triangle "!" for the warn list, yellow circle "?" for the caution list).
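Assuming the compare response HTML is already in hand, steps 2–5 could be sketched like this; the RSP entries below are hardcoded stand-ins for illustration (mirroring the two test domains later on this page), not the real list:

```javascript
// Steps 2-5 in miniature: regex out URLs, reduce them to bare domains,
// and look each domain up in a hardcoded stand-in for the RSP list.
const RSP = {
  "dailymail.co.uk": "warn",    // would get the red-triangle "!" indicator
  "avensonline.org": "caution", // would get the yellow-circle "?" indicator
};

function rspHits(diffHtml) {
  const urls = diffHtml.match(/https?:\/\/[^\s"'<>]+/g) || [];     // step 2
  const domains = urls.map(
    (u) => new URL(u).hostname.replace(/^www\./, "")               // step 3
  );
  return domains
    .map((domain) => ({ domain, level: RSP[domain] }))             // step 4
    .filter((hit) => hit.level !== undefined);                     // feeds step 5
}

console.log(rspHits('cites <a href="https://www.dailymail.co.uk/news/x">this</a>'));
// one hit: dailymail.co.uk at level "warn"
```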

The problems I have with this approach:

  • Each user is doing the lookups and computations themselves, rather than going through a centralized service that does it for them

In the future, when we have a centralized service doing this work (because we will be doing something more complicated than screens against RSP), the user script:

  1. Seeks consent to access the external service where data is coming from
  2. Scans each revision ID / prev ID on a watchlist
  3. Submits them to the service in batch
  4. Retrieves data
  5. Adds to HTML based on retrieved data
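Steps 3 and 4 above could be sketched as follows; the endpoint URL and response shape are invented for illustration, since the service doesn't exist yet:

```javascript
// Build the batch request for the hypothetical checking service.
// Splitting request construction from the network call keeps this testable.
function buildBatchRequest(pairs) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ pairs }), // step 3: submit the whole batch at once
  };
}

// Step 4 would then be a single call, e.g.:
// const res = await fetch("https://citation-watchlist.example/check",
//                         buildBatchRequest(pairs));
// const data = await res.json(); // step 5: decorate diff links from data
console.log(buildBatchRequest([{ prev: "1203018841", curr: "1203024750" }]).body);
```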

What about this "service"? If I set up WRDB as an ongoing, self-updating service, then all the service would need to do is check the revision ID in WRDB. At the moment, however, WRDB only supports a one-time build, and domain information is not directly stored in the database. On the other hand, going through WRDB would help with support for non-URL references in the future.

Citation Watchlist testing

https://dailymail.co.uk

https://avensonline.org

Diff, hist, prev, cur

| Extracts URL from | Revision | Link label | "type" | Old revision ID | New revision ID | Notes |
|---|---|---|---|---|---|---|
| Page history | First revision; no subsequent revisions | none | new | none | none | Currently invisible to Test Wikipedia branch |
| Page history | First revision | cur | new | none (curid:) | oldid= | It was the "curid" when it was new |
| Page history | Subsequent revision | prev | diff (diff:) | extract previous revision ID from oldid= | (oldid:) oldid= | |
| Watchlist and Recent Changes | First revision | hist | new | none (curid:) | curid= | |
| Watchlist and Recent Changes | Subsequent revision | diff | diff (diff:) | extract previous revision ID from diff= | (oldid:) diff= | |
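The table above could be condensed into one dispatch function; the label-to-parameter mapping follows the table, but the function itself is a sketch, not the script's actual code:

```javascript
// Map a history/watchlist link label plus its href to the revision pair
// the script should compare. Per the table, "prev" links only carry the
// newer revision in oldid=; the older one must be resolved server-side.
function revisionPairForLink(label, href) {
  const p = new URL(href, "https://en.wikipedia.org").searchParams;
  switch (label) {
    case "cur":  // page history, first revision
      return { oldRev: null, newRev: p.get("oldid") };
    case "hist": // watchlist / recent changes, first revision
      return { oldRev: null, newRev: p.get("curid") };
    case "prev": // page history, subsequent revision (diff=prev in the URL)
      return { oldRev: "prev", newRev: p.get("oldid") };
    case "diff": // watchlist / recent changes, subsequent revision
      return { oldRev: p.get("diff"), newRev: p.get("oldid") };
    default:
      return null; // not a revision link the script cares about
  }
}
```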