Wikisource:WikiProject DNB/Progress

From Wikisource

This project page is for reference on general cleanup of the pagespace bot postings of DNB volumes.

Progress and troubleshooting table

  1. Sandbox1.djvu [264 pages]
  2. Sandbox2.djvu [264 pages]
  3. Sandbox3.djvu [264 pages]
Vol. Index AL? % text[1] Best scan[2] Offset[3] Glitches[4] Comment[5]
1 index 100 [1] (poor) 14 Replaced with djvu from Google Books PDF. Good condition. Like the scan it replaced, without errata.
2 index 100 [2] (good) 12 New djvu, 100 dpi grayscale. Stray note stuck to one of the pages gone, but recorded. (may have had copyrighted material) No errata. Index restored
3 index 100 [3] (good) 6 New djvu file, while readable throughout, contains isolated blurred text. March 25, 3rd djvu version introduced replacing previous one; assumed all correct. 2nd and 3rd versions both have errata introduced into the text apparently by making text on such pages a little smaller so there is no shifting forward of the text onto the next page. Text and images aligned. Done All text images present.
4 index 100 [4] (good) 4 Better scan, but now index gone.
5 index 100 [5] (good) 8 This 2011 text file includes 1904 Errata corrections, but needs index pages inserted into final six djvu pages of this file. Listing. Replace text. Done Without terminating indices
6 index 100 [6] (good) 12 Deleted bot applied pages, text and image align, so undeletions possible if poor image scan Better quality version uploaded. Listing.
7 index 100 [7] (good) 6 30 DEC scan requires index pages - currently available on some pages from previous file. Text replaced. Done Needs index pgs.
8 index 100 [8] (good) 4 Replaced with best scan.
9 index 100 [9] (poor) 6
10 index 100 [10] (fair) 8 Needs new text. Current text is best available (30 Jan 11). Will keep looking… All 15 problematic pages are identified on the index page. The text has been refreshed, but will require an alternative source to proofread and validate pending locating better source. May 1, 2014:Smudged text replaced with Palo Alto scans. Text refreshed for all problematic pages.
11 index 100 [11] (good) 6 Text replaced with the good version identified. Most red pages deleted, though may be some that were not meant to be deleted.
12 index Text complete 100 [12] (good) 6 Text images are reasonably good prior to page 368 (with a few that may have blurred sections). All replacement pages >367 are marked and have had text refreshed. Better text needed.
13 index Text complete 100 [13] (good) 6 Better text needed. Found. (1-31-11) April 7, 2013: As a caution, neither of the scans contains errata, but someone added errata to a page I was proofreading. Replace text. Done
14 index Text complete 100 [14] (good) 6 Better text needed.
15 index 100 [15] (good) 6
16 index Text complete 100 [16] (good) 7
17 index 100 [17] (OK) 6
18 index Text complete 100 [18] (poor) 6
19 index 100 [19] (good) 6 Done to this recommended volume, keep same pagination
20 index 24 [20] (good) 6 There is duplication after this to 27; two pages missing after djvu.216; two pages missing after djvu.321; one missing after djvu.345. Better text found (all pages). (1-31-11) Will upload once pages 95-97 are validated with existing images. Adding templates. Replaced djvu file. Done
21 index Text complete 7 [21] (good) 6 One or two pages are missing after each of djvu.231, 343, 377, 382, 386, 389, 392, 395, 396, 407, 414, 417, 427. This is a weird image, apparently mixing two pages. Better text found (all pages). (1-31-11) Text replaced. Done
22 index Text complete 11 [22] (good) 6 The following five pages are torn and have missing characters adjacent to the column edge: pp. 51, 88-90, & 143. Text replaced. Done
23 index 15 [23] (good) in place 8 Better text needed. Done. Awaiting validation.
24 index [24] (good), metadata says wrong volume 14 Done to best quality volume
25 index Text complete 98 [25] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
26 index Text complete 98 [26] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
27 index Text complete 98 [27] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
28 index 4 [28] (good) 6 All text images replaced. Text replaced. Done Index pages awaiting validation.
29 index Text complete 100 [29] (good, but breaks off p.279) 6 PDF vol. 29: 452 pages and complete index. Vol. 29 djvu: a few pages missing or with blurred images.
30 index Text complete 99 [30] (poor) 6 Two pages no longer missing after djvu/33 (09-21-2012).
31 index Text complete 99 [31] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in early 2013. Google Books PDF (used as base to replace previously flawed DjVu source file).
32 index Text complete 98 [32] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
33 index Text complete 25 [33] (OK) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in early 2013. Google Books PDF (used as base to replace previously flawed DjVu source file).
34 index Text complete 100 [34] (poor) 6
35 index Text complete 99 [35] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
36 index Text complete 21 [36] (good) 4 Two pages missing after djvu/255. Page missing after djvu/392.
37 index Text complete 100 [37] (good) 14
38 index Text complete 19 [38] (good) 6 Text layer present; complete pages.
39 index Text complete 13 [39] (good) 6 P.275-6 missing characters where ripped, see this; pp. 301, 369, & 373 text images are crowded on one margin; missing characters. New text needed. Done
40 index Text complete 74 [40] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in early 2013. But the new version incorporates the errata, while the previous version did not. Google Books PDF (used as base to replace previously flawed DjVu source file).
41 index 17 [41] (good) 6 Duplicate pair: djvu/94 and 95 duplicate the two previous pages. Better text found (all pages). (1-31-11) AI vol 41 All pages are present, text near one margin on pages 175-78 will be challenging. Replaced File Done
42 index Text complete 98 [42] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
43 index Text complete 100 [43] (good) 6 Poor scans throughout the work, mark problematic. Replaced text with new AI source file. Done.
44 index 15 [44] (good) 12 12 "workable" problematic pages remain and are identified. Replace text. Done
45 index Text complete 9 [45] (good) 8 OCR layer is best scan. Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
46 index Text complete 12 [46] (good) 6 In later half of book identified Problematic scans, may be more in first half. Page 176 is quite blank. Pages 408 and 409 duplicate 406 and 407. Text layer present, realigned latter pages; complete pages. All bios transferred to Page: and converted to <pages>
47 index 100 [47] (good) 6 Numerous pages marked as problematic. Candidate for replacement Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
48 index 11 [48] (good) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
49 index Text complete 11 [49] (good) 6 Numerous problematic pages; replace file. Updated source file. Done
50 index Text complete 12 [50] (good) 12 (new scan 20091125) Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
51 index Text complete 6 [51] (OK) 8 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
52 index 5 [52] (poor) 10 Numbers of illegible pages identified in 2nd half, presumably similar in first half. Available copy poor, rescue job performed. Will need later review when work done to determine further needs. All bios transferred to Page: and converted to <pages>
53 index Text complete 98 [53] (good) 6 All previous issues listed here have been resolved as the DjVu source file was replaced in late 2012. Google Books PDF (used as base to replace previously flawed DjVu source file).
54 index Text complete 7 [54] (good) 7 some scans may be indistinct
55 index Text complete 13 [55] (good) 6 replace file to fix text misalignment.
56 index Text complete 8 [56] (good), found to be missing pages 165-172 6 Generated a new djvu version for Commons that is a mix of both files. Version based on the good version with inserts from the other available source. All bios transferred to Page: and converted to <pages>
57 index Text complete 15 [57] (good) 6 Existing text pages may need to be replaced (<20091107). All bios transferred to Page: and converted to <pages>. Scan replaced note.
58 index 9 [58] (good) 8 Better text found (all pages). (1-31-11) Uploaded new volume which was intact. Previously proofread text aligns with recently added text images. Volume replaced with complete version. Done Completed page alignment.
59 index Text complete 2 [59] (good) 6 Text layer present; complete pages. Listing. Text layer is not best scan. All bios transferred to Page: and converted to <pages>
60 index Text complete 3 [60] (good) 6 DjVu source file replaced. Two pages that were missing are now present: pages 88 and 89 duplicated pages 86 and 87, and the two djvu images that should have been there instead are no longer missing (08-30-2012). Text layer present; better text was found for all pages (1-31-11). All bios transferred to Page: and converted to <pages>.
61 index Text complete 100 [61] (good) 6 Templates added, complete pages. All bios transferred to Page: and converted to <pages>
62 index Text complete 14 [62] (OK) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
63 index 100 [63] (good) 24 Better text needed; complete pages, though p. xviii needs rescan. All bios transferred to Page: and converted to <pages>

Notes

  1. Percentage text added. Apart from vol. 1, text completed generally ranges between 2% and 4%. Text verified and marked is currently negligible. One point is that the project page suggests ligatures should be added; the æ ligature is very common in the DNB. Another point that needs to be visited is the use of strike-through text to show a "diff" for a later edition.
  2. The code here is "good" for the Toronto scans, "poor" for the Google scans, and "OK" for the Hyderabad scans, which are of intermediate quality. This is a rule of thumb only: in some cases the Toronto scan for a page may be so corrupt that another scan works better.
  3. The offset is the djvu file number minus the page number printed in the volume. This ought to be consistent throughout the volume: if it is currently known not to be, the entry is "n/a" and the next column gives details.
  4. The bot-generated initial postings have imperfections, to be noted here.
  5. Points include: the best scan may not have been used by the bot ("better text needed"); progress in formatting at least all the author templates.
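
The offset rule in note 3 can be sketched in a few lines of Python. This is only an illustration, assuming the offset is the djvu file number minus the printed page number; the function names (`page_offset`, `djvu_for_page`, `volume_offset`) are hypothetical and are not part of any Wikisource tool.

```python
from typing import Iterable, Optional, Tuple


def page_offset(djvu_number: int, printed_page: int) -> int:
    """Offset as read from note 3: djvu file number minus printed page number."""
    return djvu_number - printed_page


def djvu_for_page(printed_page: int, offset: int) -> int:
    """Locate the djvu file number for a printed page, given a volume's offset."""
    return printed_page + offset


def volume_offset(samples: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the single offset shared by all (djvu_number, printed_page) samples.

    Returns None when the samples disagree (or when there are none), which
    corresponds to an "n/a" entry in the Offset column.
    """
    offsets = {page_offset(djvu, page) for djvu, page in samples}
    if len(offsets) == 1:
        return offsets.pop()
    return None
```

For a volume with offset 6, `djvu_for_page(368, 6)` points at djvu file 374. If a scan has missing or duplicated pages, samples taken before and after the fault give different offsets and `volume_offset` returns `None`, the case the table records as "n/a" with details in the Glitches column.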