with additional specifications/clarifications/corrections.
This document serves 3 purposes:
rtd18_109_1860_11_06.xml
rtd18_109_1860_11_07.xml
rtd18_109_1860_11_09.xml
All files for the University of Richmond digital library repository are prepared as follows:
In working with keyboarding companies, the working assumption is that keyboarders will work from archival quality tif images. While full-color images are preferable in all other instances, for keyboarding projects (i.e., DDD) bi-tonal tiffs are preferred.
Images should consistently be 8 bit bi-tonal tiffs, shot at 600 dpi.
JPGs should also be converted from the tiffs, at similar resolution.
These entity declarations will look slightly different, as the file naming convention is slightly different. The elements are composed of the following:
| Media Type_ | Author-Title_ | Volume_ | Number |
| NW_ | RichTimesD_ | 018_ | 110 |
For the issue referenced in the previous example, the file name for the XML file would be:
NW_RichTimesD_018_110.xml
Here “NW” stands for “newspaper” and “RichTimesD” for “Richmond Times-Dispatch,” followed by the Volume and Number of the issue, with each element separated by underscores. Image files simply have an additional three-digit number at the end, referencing the page number:
NW_RichTimesD_018_110_001.jpg
Therefore, the jpg declarations from the new template look like this:
<!NOTATION jpg SYSTEM "JPEG">
<!ENTITY NW_RichTimesD_018_110_001 SYSTEM "NW_RichTimesD_018_110_001.jpg" NDATA jpg>
<!ENTITY NW_RichTimesD_018_110_002 SYSTEM "NW_RichTimesD_018_110_002.jpg" NDATA jpg>
<!ENTITY NW_RichTimesD_018_110_003 SYSTEM "NW_RichTimesD_018_110_003.jpg" NDATA jpg>
<!ENTITY NW_RichTimesD_018_110_004 SYSTEM "NW_RichTimesD_018_110_004.jpg" NDATA jpg>
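The naming convention and the resulting declarations can be generated rather than typed by hand. The following is only a sketch under stated assumptions: the function names base_name and entity_declarations are hypothetical, and zero-padding the volume, number and page to three digits is inferred from the example above ("018", "110", "001"), not stated elsewhere in this specification.

```python
def base_name(media, title, volume, number):
    """Join the four naming elements with underscores.

    Assumption: volume and number are zero-padded to three digits,
    as in the example NW_RichTimesD_018_110.
    """
    return f"{media}_{title}_{volume:03d}_{number:03d}"


def entity_declarations(base, pages):
    """Build the NOTATION line plus one ENTITY declaration per page image."""
    lines = ['<!NOTATION jpg SYSTEM "JPEG">']
    for page in range(1, pages + 1):
        # The entity name is the image file name without the .jpg extension.
        name = f"{base}_{page:03d}"
        lines.append(f'<!ENTITY {name} SYSTEM "{name}.jpg" NDATA jpg>')
    return "\n".join(lines)


base = base_name("NW", "RichTimesD", 18, 110)
print(base + ".xml")                  # name of the XML file for the issue
print(entity_declarations(base, 4))   # declarations for a four-page issue
```

Generating the declarations from one base name keeps the entity name and the file name in step, which is the point of the convention.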
All textual data specified must be included in the transcription, and all non-textual data must be included in the mark-up as figures. For the DDD project, content to be keyboarded will be restricted as follows. All content (including advertisements, etc.) already keyboarded by DDD should be brought into conformity with TEI lite and specifications listed here. For page images not yet keyboarded, content should be restricted to all news stories, plus content under the headings (or headings very similar to):
DDD will keyboard the entirety of one issue for each month of the paper's publication: the issue from the Monday of the second week, for all months specified for a job.
NOTE: For all other newspaper issues, content division placeholders should be keyboarded without content, with a generic entity placed inside a comment, as in the following example:
<div1 type="notices">
<head></head>
<div2 type="millinery">
<head>MILLINERY & DRESS-MAKING</head>
<div3 type="ad-blank" n="1">
<head></head><p><!-- &advertid; --></p></div3>
<div3 type="ad-blank" n="2">
<head></head><p><!-- &advertid; --></p></div3>
<div3 type="ad-blank" n="3">
<head></head><p><!-- &advertid; --></p></div3>
</div2>
Follow the basic structural mark-up common to most TEI texts:
<TEI.2>
<teiHeader>
. . . [metadata section supplied by URL to the keyboarding vendor]
</teiHeader>
<text>
<front>
. . . [front matter: masthead etc.]
</front>
<body>
. . . [main body of text]
</body>
<back>
. . . [back matter: final advertisements]
</back>
</text>
</TEI.2>
The TEI Header must be included in all XML files. The TEI Header+Front template for the IMLS Civil War Newspaper Project should be copied and pasted into each document; the only elements needing to be edited are marked in red in the file, and those elements are listed below as well:
NOTE: DDD did not use the TEI Header from the file spec_rtd18_112_1860_11_09.xml, sent 19 June 2004, as instructed. We are providing a new file called "CivilWarHeader+Front.txt" that will be a template for you to copy and paste into all xml files. The assumption is that UR will only be charged for keyboarding changes to this template, not for the characters simply copied and pasted into the file. DDD should account for this in its billing.
NOTE: this file will also include the <front> division, as well as what is, strictly speaking, the TEI Header. The entire file can be copied and pasted into each XML file that constitutes an issue of the newspaper; portions of the header and <front> div must be edited as indicated in the template, including declaration of any foreign language tags used in the file.
NOTE: the Header + Front template includes what could be construed as the top of two different columns in the <front>. Be sure not to repeat this as part of the column in the <body>. In terms of its information, it belongs to the masthead and preliminaries.
All the information found in the image below is included in the <front> division:
<front>
</front>
NOTE: Should the layout or content of the masthead change in future issues, DDD will be responsible for noting those changes and reflecting them in the mark-up.
NOTE: Error in DDD files: incorrect jpg declarations. Two of the files sent by DDD failed to parse. The error was the same in both files, near the top of the TEI Header. The DDD file (rtd18_110_1860_11_07.xml) had the following:
DDD mark-up:
<!NOTATION jpg SYSTEM "JPEG">
<!ENTITY rtd18_110_1860_11_07 SYSTEM "rtd18_110_1860_11_07_01.jpg" NDATA jpg>
<!ENTITY rtd18_110_1860_11_07 SYSTEM "rtd18_110_1860_11_07_02.jpg" NDATA jpg>
<!ENTITY rtd18_110_1860_11_07 SYSTEM "rtd18_110_1860_11_07_03.jpg" NDATA jpg>
<!ENTITY rtd18_110_1860_11_07 SYSTEM "rtd18_110_1860_11_07_04.jpg" NDATA jpg>
Corrected mark-up:
<!NOTATION jpg SYSTEM "JPEG">
<!ENTITY rtd18_110_1860_11_07_01 SYSTEM "rtd18_110_1860_11_07_01.jpg" NDATA jpg>
<!ENTITY rtd18_110_1860_11_07_02 SYSTEM "rtd18_110_1860_11_07_02.jpg" NDATA jpg>
<!ENTITY rtd18_110_1860_11_07_03 SYSTEM "rtd18_110_1860_11_07_03.jpg" NDATA jpg>
<!ENTITY rtd18_110_1860_11_07_04 SYSTEM "rtd18_110_1860_11_07_04.jpg" NDATA jpg>
The information to the right of ENTITY should be identical to that following SYSTEM, with the exception that in the latter case the information is in quotes and has a jpg file extension. Each entity name must also be unique; in the DDD mark-up all four declarations used the same name. See the section on the milestone element for the errors in referencing the images in the body of the text. See above, I-B. File naming conventions, for the correct, current file-naming conventions.
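A quick check of the declarations before sending a file can catch this class of error. The sketch below is an illustration, not part of the project tooling: the function name check_entities and the regular expression are assumptions, and it only looks at jpg NDATA declarations written on the pattern shown above.

```python
import re
from collections import Counter

# Matches declarations of the form: <!ENTITY name SYSTEM "file.jpg" NDATA jpg>
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]+)"\s+NDATA\s+jpg\s*>')


def check_entities(dtd_text):
    """Return a list of problems found in the jpg entity declarations."""
    problems = []
    pairs = ENTITY_RE.findall(dtd_text)

    # Each entity name may be declared only once.
    counts = Counter(name for name, _ in pairs)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"duplicate entity name: {name} ({count} declarations)")

    # The name must equal the SYSTEM file name minus the .jpg extension.
    for name, system_id in pairs:
        if system_id != name + ".jpg":
            problems.append(f"entity {name} does not match file {system_id}")
    return problems
```

Run against the DDD mark-up above, this reports the repeated name once and each of the four mismatched file names; run against the corrected mark-up, it reports nothing.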
starting with <div1>: Although TEI Lite allows them, URL does not use the <div> or <div0> elements. Instead, top-level structural divisions must be encoded as <div1>.
divs and <head> tags: All div tags must have associated head tags.
type attribute: Each <div#> tag must have a type attribute indicating the kind of division. If the division has no obvious type, the generic term “section” may be used; if “section” has already been used for a higher-level division, use “subsection.” Newspapers usually have information grouped in such a way that an appropriate designation can be found.
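The three rules above (no <div> or <div0>, a <head> in every division, a type attribute on every division) lend themselves to a mechanical check. The following is a minimal sketch, assuming the fragment being checked is well-formed and contains no undeclared entity references; the function name check_divs is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches div, div0, div1, div2 ... but not other element names starting with "div".
DIV_RE = re.compile(r"div(\d*)$")


def check_divs(xml_text):
    """Return a list of violations of the division rules in an XML fragment."""
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        match = DIV_RE.match(elem.tag)
        if match is None:
            continue
        if match.group(1) in ("", "0"):
            problems.append(f"<{elem.tag}> is not used; start at <div1>")
        if not elem.get("type"):
            problems.append(f"<{elem.tag}> has no type attribute")
        # An empty <head></head> satisfies the rule; a missing one does not.
        if elem.find("head") is None:
            problems.append(f"<{elem.tag}> has no <head> child")
    return problems


sample = (
    '<body><div1 type="notices"><head></head>'
    '<div2 type="hats"><head>HATS</head>'
    '<div3 n="1"><p>text</p></div3>'
    '</div2></div1></body>'
)
for problem in check_divs(sample):
    print(problem)
```

In the sample only the <div3> is reported, because it lacks both a type attribute and a <head>.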
Most actual articles and advertisements will be on div level 3. The highest level, div level 1, will be either “news” or “notices.” The latter will include all advertisements, Runaways, Military notices, etc. The former will include “telegraphic news,” “news from the north,” etc. The “type” designations for level 2 divs should be drawn from the content, but should also be standardized. What appears in the head tag should be what appears in the text (not standardized). If there is no collective designation for a group of material, it should have empty head tags. For the sake of standardization, all discrete articles and advertisements should be on div level 3. An empty div level 2 should be added if needed. See above for changes to markup of empty ad divs.
<div1 type="news">
<div1 type="notices">
<div2 type="telegraphic">
<div2 type="hats">
<div3 type="article">
<div3 type="advert">
<back>
All final advertisements (at the end of the paper) should be put in the <back> section.
Typical (generic) type values for divisions within <body>, especially for newspapers, are:
n attribute: If the division is numbered or otherwise labeled in the print source, record the number or label in the n attribute.
NOTE: For newspaper mark-up, all articles and advertisements are to be numbered, typically, as they appear in order within a div2 or div3.
Although it will rarely be needed, the <foreign> tag is part of the TEI lite standard. It is used to indicate that a language other than English is witnessed. DDD mark-up from rtd18_110_1860_11_07.xml has:
DDD mark-up:
seen, simultaneously exclaimed, <q>"<hi rend="italic">Le voila, le voila</hi>"</q> and swayed to and fro a few moments, eagerly looking up at Brainerd, who...
Since “Le voila, le voila” is French, it must be tagged to indicate this, "fre" standing for French:
Corrected mark-up:
seen, simultaneously exclaimed, <q>&quot;<foreign lang="fre"><hi rend="italic">Le voila, le voila</hi></foreign>&quot;</q> and swayed to and fro a few moments, eagerly looking up at Brainerd, who...
NOTE: the three-character codes for foreign languages are an ISO 639-2 standard, including:
| fre | French. |
| ger | German. |
| grc | Greek, ancient (to 1453). |
| gre | Greek, modern (1453- ). |
| heb | Hebrew. |
| ita | Italian. |
| lat | Latin. |
| rus | Russian. |
| spa | Spanish. |
Greek: Use the Unicode characters listed alongside the isogrk1 (iso-grk1.ent) character entities at http://www.oasis-open.org/docbook/specs/wd-docbook-xmlcharent-0.3.html, supplemented as needed by the accented characters in iso-grk2.ent.
Hebrew: Use the Hebrew block of Unicode (0590 - 05FF), e.g. א for aleph.
Russian: Use the Cyrillic block of Unicode (0400 - 04FF).
19th century newspapers in particular make more extensive use of abbreviations, especially for names, than is currently customary, and the usage is not standardized. We would like to capture all these abbreviations with the <abbr> tag; attributes or values will be added by UR.
E. WARREN, <abbr>Sec'y </abbr>
JULES ROBIN & <abbr>CO,'S </abbr> COGNAC.
John M. Gregory, <abbr>Lieut. </abbr> and acting Governor.
Any character that is currently marked with a # sign to indicate that it was unreadable should instead be marked with the <unclear> element. The “#” sign is not an “industry standard” for unclears! Also, the entire word must be marked, not just the illegible character.
DDD mark-up:
and Shoes. Call at 119 Main street, Richmond, opposite Mitchell & #yler's, and get supplied SIMPSON &
Corrected mark-up:
and Shoes. Call at 119 Main street, Richmond, opposite Mitchell & <unclear>yler's</unclear>, and get supplied SIMPSON &
Do not key in or otherwise indicate the ambiguous character: instead, put <unclear> tags around the entire word. When there is a spelling or typographical error in the newspaper original that is clearly discernible in the image, the error should be marked with <sic> tags. The entire word should be enclosed by the tags; tags should not normally break up words.
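For files already keyboarded with the # convention, the conversion to <unclear> can be roughed out automatically and then proofread. This is a sketch only: the function name mark_unclear is hypothetical, a "word" is taken to be any run of non-space characters containing a #, and trailing commas, periods, semicolons and colons are left outside the tag as in the corrected example above. Other punctuation (quotation marks, brackets) would need hand correction.

```python
import re

# A run of non-space characters containing '#', followed by optional
# trailing punctuation that should stay outside the <unclear> tag.
MARKED_WORD = re.compile(r"(\S*#\S*?)([,.;:]*)(?=\s|$)")


def mark_unclear(text):
    """Wrap each word containing '#' in <unclear> tags and drop the '#'."""
    def replace(match):
        word = match.group(1).replace("#", "")
        return f"<unclear>{word}</unclear>{match.group(2)}"
    return MARKED_WORD.sub(replace, text)


print(mark_unclear("opposite Mitchell & #yler's, and get supplied"))
```

The output should always be checked against the page image, since the script cannot tell how much of a word was actually illegible.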
It is more important to have the word intact than to record the information about hyphenation. Therefore, hyphens should be eliminated, and words should not break between hyphens. This is especially true when there is also a tag breaking a word. For example:
<p><hi rend="bold">FOR THE PURPOSE OF FA</hi>CILITATING THE TRADE of this city,
should instead be keyed as:
<p><hi rend="bold">FOR THE PURPOSE OF FACILITATING</hi> THE TRADE of this city,
Similarly, words separated by a forward slash, as in “either/or” should always be separated by spaces, and so keyed instead as “either / or”.
Occasionally, decisions have to be made as to how to mark up a problematic section. It is industry standard practice to provide notes in a separate file when significant decisions have been made in keyboarding. We will expect such files, one for each XML file, but a file is only to be generated where a problem or ambiguity has been encountered. The files should be named along the same convention as the XML file, with “_DDD_notes” added and a “.txt” file extension. The notes file for NW_RichTimesD_018_110.xml would be NW_RichTimesD_018_110_DDD_notes.txt.
In the following article, the headline is: “DEATH OF THE DUKE OF RICHMOND”—despite the fact that it appears in the paragraph of the article:
When a keyboarder needs to make mark-up decisions, information should always be chosen over appearance, i.e., it is more important to know that “DEATH OF THE DUKE OF RICHMOND” is a headline, than it is to know that it originally appeared within the body of the paragraph. People will want to search headlines, and for that to be possible, they must be designated as such.
DDD mark-up:
<div3 type="article" n="3">
<head></head>
<p><hi rend="small caps"><hi rend="bold">Death Of The Duke Of Richmond.</hi></hi>
—The Lord of the Goodwood races is dead. He was the Duke of Richmond, whose.
...
Corrected mark-up:
<div3 type="article" n="3">
<head rend="smallcaps">Death Of The Duke Of Richmond.</head>
<p rend="nobreak">—The Lord of the Goodwood races is dead.
He was the Duke of Richmond, whose
...
Note that the head element can take a rend attribute; also, we will eliminate <hi rend="bold"> for all head tags, as we will assume they are in bold of some kind. The rend attribute with the <p> tag (<p rend="nobreak">) will indicate that the head does not break from the paragraph. Notice that this will affect many advertisements as well (when those are occasionally tagged).
It is assumed that the contents of all <head> tags will be bold, so this should not be keyed in; other formatting, such as italics, may be indicated by, e.g., <head rend="italics">.
When using the <head> tag, it is very important to discern what element the tag belongs to.
In the XML file rtd18_110_1860_11_07, and the image file rtd18_110_1860_11_07_03.jpg, (NOV 7, 1860), we see a continuation of a <div2 type="auction"> from column 5:
DDD mark-up:
</div3>
<milestone unit="column"n="6"/>
<div3 type="advert"n="30">
<head>AUCTION SALES.</head>
<p><hi rend="italic"><hi rend="bold">FUTURE DAYS.</hi></hi></p>
<p><hi rend="bold">By Goddin & Apperson, Auct's.</hi></p>
<p><hi rend="bold">VALUABLE FARM IN HENRICO, FIVE</hi>MILES ABOVE RICHMOND, SLAVES AND STOCK, FOR SALE.—Having determined to remove to Richmond to reside I will sell at public auction. on the premises, on THURSDAY, the 8th November, 1860, at 11o'clock A M., (if fair; if not on first fair day thereafter,) the Farm on ...
(Error: please also note the typo: no space between “11” and “o’clock”.)
(Error: please note that the date should be keyed in with the ISO format; see the corrected version below.)
It is clear from the image itself (below), however, that the category heading “AUCTION SALES” does not belong to the ad, but is a category supplied by the newspaper to organize its ads. The heading “FUTURE DAYS” also is a sub-category supplied by the newspaper, and similarly does not belong to the ad which follows it. Therefore these items should not be included in the head tag associated with the advertisement's division. NOTE: Whenever a heading at the top of a column applies to the whole column (or to more than the ad or article immediately below it), a new division must be started, and the ads or articles within it renumbered starting again from 1:
Corrected mark-up:
</div3>
</div2>
<milestone unit="column" n="6"/>
<div2 type="auction">
<head>AUCTION SALES.<lb/><hi rend="italics">FUTURE DAYS.</hi></head>
<div3 type="advert" n="1">
<head>By Goddin & Apperson, Auct's.</head>
<p><hi rend="bold">VALUABLE FARM IN HENRICO, FIVE</hi> MILES ABOVE RICHMOND, SLAVES AND STOCK, FOR SALE.—Having determined to remove to Richmond to reside I will sell at public auction. on the premises, on THURSDAY, the <date value="1860-11-08">8th November, 1860</date>, at 11 o'clock A M., (if fair; if not on first fair day thereafter,) the Farm on ...
NOTE: the ads at the bottom of the prior column (5) for cigars, chewing tobacco, lye, etc., do not fall within the category of “auction sales” in any case, so the division should be changed.
NOTE: see also (F) below on associating <head> tags with tables.
For page-breaks as well as column-breaks, the milestone tag is to be used (not the <pb/> tag). Note that some DDD files used the milestone tag for page-breaks and some used <pb/>. Only the milestone tag should be used to indicate page-breaks!
The first milestone tag in each issue (referencing page one) will be in the <front> section, not within <body>. The corresponding page image reference using the <figure entity> tag for the first page will also be in the <front> section. Here is the correct rendering of the <front>:
<text id="NW_RichTimesD_018_110T">
<front>
<milestone unit="page" n="1"/>
<div1 type="page-image">
<head></head>
<p><figure entity="NW_RichTimesD_018_110_001">
<figDesc>Page image of Daily Dispatch, Volume 18, Number 110, page 1.</figDesc>
</figure></p>
</div1>
On all subsequent pages (any page 2, 3, 4 etc.) the milestone unit for the page-break, the milestone unit for the column-break, and the figure entity for the page-image will all be clustered together, as follows:
<milestone unit="page" n="2"/>
<div1 type="page-image">
<head></head>
<p><figure entity="NW_RichTimesD_018_110_002">
<figDesc>Page image of Daily Dispatch, Volume 18, Number 110, page 2.</figDesc>
</figure></p>
</div1>
<milestone unit="column" n="1"/>
Since the milestone unit for the page-break marks the point at which the page starts, the milestone unit indicating the page should come first, followed by the reference to the page-image (which is not itself a representation of the page, but only links to the page-image).
NOTE: the entities outside the TEI Header do not take jpg filename extensions!
NOTE: the page number should correlate to the image number, and those to the figDesc!
NOTE: the formatting of this cluster for any page 2, 3, 4 etc. (anything beyond the <front> section) should always be exactly as in the model above, i.e. there should be line breaks after:
Also, the language for the <figDesc> should always follow the format of this example.
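Because the cluster must be identical on every page after the first, it can be produced from a template. The sketch below is illustrative only: the function name page_cluster and its parameters are assumptions, and it simply reproduces the model above for a given page number, with the page number, image number and figDesc kept in agreement.

```python
def page_cluster(base, title, volume, number, page):
    """Return the page-break cluster for page 2, 3, 4 etc. of an issue.

    base is the issue's file name without extension, e.g. NW_RichTimesD_018_110.
    """
    # The entity reference in the body carries no .jpg extension.
    entity = f"{base}_{page:03d}"
    return "\n".join([
        f'<milestone unit="page" n="{page}"/>',
        '<div1 type="page-image">',
        '<head></head>',
        f'<p><figure entity="{entity}">',
        f'<figDesc>Page image of {title}, Volume {volume}, Number {number}, page {page}.</figDesc>',
        '</figure></p>',
        '</div1>',
        '<milestone unit="column" n="1"/>',
    ])


print(page_cluster("NW_RichTimesD_018_110", "Daily Dispatch", 18, 110, 2))
```

Called with page 2, this prints the same eight lines as the model for page 2 shown above.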
Errors:
In the XML file rtd18_110_1860_11_07, the <figure> tag was used inconsistently. It correctly referenced the page image for page 2 of the issue near the milestone tag indicating the page-break, but then no page-image was referenced for pages three or four. In the XML file rtd18_112_1860_11_09, the only page-image correctly referenced was the one copied from the <front> section. Some entity references for page-images included the jpg file name extensions, which is incorrect.
When encoding letters, prefaces, and other personal writings, use the following elements:
Openers and closers, in turn, contain one or more of these elements:
<date value="1900-09-22">22d of September, 1900</date>
<date value="09-22">22d of September</date>
Here is an example of standard encoding for a letter:
<div3 type="letter">
<head>TO MRS. H. LINCOLN.</head>
<opener>
<dateline> <name type="place">Weymouth</name>, <date value="1761-10-05">5 October, 1761.</date> </dateline>
<salute>MY DEAR FRIEND, </salute>
</opener>
<p>
<hi rend="smallcaps">Does</hi> not my friend think me a stupid girl, when she has kindly offered to correspond with me, that I should be so senseless as not to accept the offer?
</p><p>
I can say, in the length of this epistle, I've made the golden rule mine. Pray, my friend, do not let it be long before you write to your ever affectionate.
</p>
<closer>
<signed>A. S.</signed>
</closer>
</div3>
For letters cited in their entirety within a division, such as a complete poem cited within an essay, or a letter cited in its entirety within an article, the citation should take its own division:
<div3 type="advert" n="4">
<head>Baker's Premium Bitters.</head>
<div4 type="letter">
<opener>
<dateline>Henrico County, <date value="1860-10-01"><hi rend="smallcaps">Oct. 1, 1860</hi></date>.</dateline>E. <hi rend="smallcaps">Baker,</hi> Esq:
<salute>Dear Sir</salute>
</opener>
<p rend="nobreak">—My wife has been suffering with Dyspepsia and Nervous Affection for several years, during which time she
...
my family physician's, that BAKER'S PREMIUM BITTERS is the best medicine now before the public for the above-mentioned diseases.
</p>
<closer>
<salute>Yours, most truly, </salute>
<signed>P. W. J. <hi rend="smallcaps">Quarles.</hi></signed>
</closer>
</div4>
<div4><p>These Bitters can be had of all the Druggists in this city ... Orders filled promptly by addressing.
</p><p>
E. BAKER, Proprietor,.
</p><p>
oc 30—ts Richmond, Va.
</p>
</div4>
</div3>
Additional elements:
Some of the following elements (e.g., <name type="value">, <dateRange>, and <timeRange>) are part of the broader TEI and not TEI lite, so their use will cause files not to parse. Consequently, additional elements are to be used sparingly, only where such information is crucial (e.g., for the time-frame of a battle) and never in advertisements (except for standard dates). If ambiguities arise as to coding in an ad, specific date and address tagging may be replaced by generic coding, i.e. simple <p> tags. These elements, or elements with extended attributes, should be used only when critical information is divulged on a serious (news) topic:
<address>
<street>110 Southmoor Road</street>
<name type="city">Oxford</name>
<postCode>OX2 6RB</postCode>
<name type="country">United Kingdom</name>
</address>
The order of elements within an address varies from nation to nation, and in different time periods, and so may appear in any order:
<address>
<name type="org">Università di Bologna</name>
<name type="country">Italy</name>
<postCode>40126</postCode>
<name type="city">Bologna</name>
<street>via Marsala 24</street>
</address>
The following example from the most recent DDD files is basically correct in terms of the use of the <opener>, <dateline>, <name>, <closer> and <signed> tags in an article in which some of these elements appear. Please note, however, two keyboarding errors in this article. Also modified as a correction in the sample is an example of how the <sic> tag should be used.
<div3 type="article" n="14">
<head>
Correspondence of the Richmond Dispatch.<lb/>
<hi rend="italic">Politics—Religious Conference—Crops, &c.</hi>
</head>
<opener>
<dateline><name type="place"><hi rend="smallcaps"><hi rend="bold"> Idle Of Wight Co., Va.,</hi></hi></name>
<date value="1860-11-03"> Nov. 3d, 1860.</date></dateline>
</opener>
<p>
Notwithstanding the extreme inclemency of the weather, there was a large attendance of the sovereigns at the great Breckinridge mass meeting at the Court-House, on the 30th <sic>ult</sic>. Unfortunately, however, the distinguished speakers who had been announced were deterred by adverse provinences from being present—ex-Gov. Wise by an accident on the railroad, and Mr. Leake, we learn, by illness in his family. The only address on the occasion was by Dr. Rives, of Surry.
...
perhaps, an average—in some districts very good; in others, impaired by the drought, &c. More wheat will probably be sown with us than over before.
</p>
<closer>
<signed><hi rend="smallcaps">Rusticus.</hi></signed>
</closer>
</div3>
NOTE—Spec Change: The source of a citation is sometimes not formally declared but only indicated informally, for example in a sentence preceding the citation. To identify the source in such cases without adding text where it is not in the original, the <bibl> tag will be used at the end of the citation (prior to the closing </cit> tag), with the attribute “rend” and value “hidden.” This should allow us to keep the enclosed text from displaying, while still identifying the source of the citation.
NOTE: The <q> tag normally indicates a block element typographically offset from surrounding text. To distinguish this from the use of the <q> tag to represent quotation marks in a normal paragraph, in the latter case the element should take the rend attribute with the value “inline”: <q rend="inline">.
The general format for (block) citations will be the following:
<cit><q>
</q><bibl></bibl></cit>
The following example of the <cit>, <q> and <bibl> tags from the recent DDD files is basically correct. Again, this mark-up has also been modified to reflect the changes indicated above concerning headlines occurring within the article paragraph and placement of the <bibl>.
(Note that the images above are part of a single column, which has been split for easier viewing.)
<div1 type="news">
<head>Richmond Dispatch<lb/>
</head>
<div2 type="morning">
<head><date value="1860-11-07">WEDNESDAY MORNING . . . . . . . . . . NOV 7, 1860</date>
</head>
<div3 type="article" n="1">
<head rend="smallcaps">Execution Of A Matricide.</head> <p rend="nobreak">—Ezra Brainard was hung at Three Rivers, C. W., on the 25th <sic>ult.</sic>, for having murdered his mother some months since.
...
A correspondent of the Montreal Gazette sketches the closing scene of his life. He writes:
</p>
<cit><q>
<p>
Shortly before 11 o'clock, a door leading to the convict's cell
...
chest and the agitation of the limbs. A few gasps, and in a few moments Ezra Brainerd had expiated his crime.
</p>
</q><bibl rend="hidden">Montreal Gazette</bibl></cit>
</div3>
NOTE—Spec Change: Since the bibliographic reference will be captured in the <bibl> tag, but masked by the hidden attribute, do not use the bibl tag outside the cit tag.
NOTE: In most cases, a paragraph should not close before entering a citation. In the example above, as no text from the primary speaker occurs after the citation close, the paragraph closes.
NOTE—Spec Change: Since the <q> tag is being used to replace quotation marks, the default for rendering the <q> tag will be none. Off-set block quotes of only one paragraph should use the rend attribute to indicate a break from the paragraph in which it occurs:
<q rend="break">
NOTE—Spec Change: double quotation marks are to be encoded as XML entities (&quot;), which is reflected in the “Corrected mark-up” below:
DDD mark-up: (correct according to previous spec; no longer correct).
<milestone unit="column"n="4"/>
<div3 type="article" n="10">
<head>Interesting Sketches.</head>
<p>
<bibl>The Alexandria Sentinel</bibl> has been publishing some sketches of Virginia history <q>"for children"</q> which are very interesting to grown people. The last of the number contains the following facts about the Old Dominion: <lb/>
<cit><q>The boundaries of Virginia have undergone great changes. The charters
...
he knew not how distant the Pacific Ocean lay from the Atlantic.
</q></cit>
</p>
</div3>
Corrected mark-up:
<milestone unit="column" n="4"/>
<div3 type="article" n="10">
<head>Interesting Sketches.</head>
<p>
The Alexandria Sentinel has been publishing some sketches of Virginia history &quot;for children&quot; which are very interesting to grown people. The last of the number contains the following facts about the Old Dominion:
<cit><q>
The boundaries of Virginia have undergone great changes. The charters given
...
he knew not how distant the Pacific Ocean lay from the Atlantic.
</q><bibl rend="hidden">The Alexandria Sentinel</bibl></cit>
</p>
</div3>
Note reference vs. note body: “Note reference” means the anchor point for the annotation, typically indicated with a superscript number or symbol, using the ref element:
<ref target="n8.1">1</ref>
“Note body” means the content of the annotation, contained in the note element, on this model:
<note id="n8.1" rend="foot"><p>1 See <hi rend="italics">Squier's Monograph ... </hi></p></note>
The most common locations for the note body are:
Required attributes: Always include the id attribute, which must contain an ID that is unique within the XML document, and the rend attribute, to indicate its placement on the printed page:
NOTE: When creating IDs for notes, use a simple, human-readable numbering scheme. For notes that are already numbered in the print source, include the number in the ID. For example:
If the notes are numbered sequentially throughout the entire work, use "n1", "n2", etc. (where “n” is short for “note”).
books, of the like kind, have been published in the extensive provinces of Peru, in South America. <ref target="n8.1">1</ref>
<note id="n8.1" rend="foot">1<p>See <hi rend="italics">Squier's Monograph of Central American Authors</hi>, 1861, pp. 70.—<hi rend="italics">M</hi>.
</p><p>
An excellent little volume by the learned and reliable bibliographer, Don Joaquin Garcia Icazbalceta, on the subject of books on the American aboriginal languages has lately appeared.
. . .
</p>
</note>
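The pairing of note references and note bodies is easy to get wrong in a long file, so a cross-check can help. The following is a minimal sketch under the assumption that the text being checked is a well-formed fragment without undeclared entity references; the function name check_notes is hypothetical. It verifies that every note has a unique id and a rend attribute, and that every ref target points to an existing note.

```python
import xml.etree.ElementTree as ET


def check_notes(xml_text):
    """Return a list of problems with note ids, rend attributes and ref targets."""
    root = ET.fromstring(xml_text)
    problems = []
    seen = set()

    for note in root.iter("note"):
        note_id = note.get("id")
        if not note_id:
            problems.append("note without id")
        elif note_id in seen:
            problems.append(f"duplicate note id: {note_id}")
        else:
            seen.add(note_id)
        if not note.get("rend"):
            problems.append(f"note {note_id} has no rend attribute")

    # Every reference must point at one of the note ids collected above.
    for ref in root.iter("ref"):
        target = ref.get("target")
        if target not in seen:
            problems.append(f"ref target {target} has no matching note")
    return problems
```

For the example above, with one ref to "n8.1" and one note with id "n8.1" and rend "foot", the function returns an empty list.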
Most tables will need only the following elements, with the following attributes (indented):
In some cases (see below) the cell element should also take the attribute "role" and value "label". However, this additional mark-up should only be used when clearly indicated. Note that tables must be contained within <p> tags. The following DDD example was good:
DDD mark-up:
<p><hi rend="bold">WEEKLY PAPERS.</hi></p>
<p><table rows="8" cols="2">
<row><cell>Illustrated London News</cell><cell>3,393,151</cell></row>
<row><cell>News of the World</cell><cell>2,885,000</cell></row>
<row><cell>Weekly Times</cell><cell>1,993,853</cell></row>
<row><cell>Weekly Dispatch</cell><cell>1,052,450</cell></row>
<row><cell>Bell's Life in London</cell><cell>466,500</cell></row>
<row><cell>Bell's Weekly Messenger</cell><cell>#04,000</cell></row>
<row><cell>Record</cell><cell>205,000</cell></row>
<row><cell>Athenæum</cell><cell>81,000</cell></row>
</table></p>
The <table> tag can also take a <head> tag, and in this example, the information put in a simple paragraph really belongs to the table, and should be associated with it as a table header. Our specifications require that all tables have an associated <head>, even if it is empty.
Corrected mark-up:
<p></p>
<p><table rows="8" cols="2">
<head>WEEKLY PAPERS.</head> <row><cell>Illustrated London News</cell><cell>3,393,151</cell></row>
<row><cell>News of the World</cell><cell>2,885,000</cell></row>
<row><cell>Weekly Times</cell><cell>1,993,853</cell></row>
<row><cell>Weekly Dispatch</cell><cell>1,052,450</cell></row>
<row><cell>Bell's Life in London</cell><cell>466,500</cell></row>
<row><cell>Bell's Weekly Messenger</cell><cell><unclear>404,000</unclear></cell></row>
<row><cell>Record</cell><cell>205,000</cell></row>
<row><cell>Athenæum</cell><cell>81,000</cell></row>
</table></p>
Only in those cases where a clear description of data is being provided should the attribute "role" and value "label" be used. The default value for cells is “data,” so that need never be tagged. In most cases, there will be no need to use the "role" and "label" attributes and values in tables; when in doubt, the keyboarder should refrain from this extended tagging for tables.
<p><table rows="4" cols="6">
<head>IN THE KINGDOM OF PRUSSIA, AND DUKEDOM OF LITHUANIA.</head><lb/>
<row>
<cell role="label">Annual Average.</cell>
<cell role="label">Births.</cell>
<cell role="label">Burials.</cell>
<cell role="label">Marriages.</cell>
<cell role="label">Proportion<lb/> of Births to <lb/>Marriages.</cell>
<cell role="label">Proportion <lb/>of Births to <lb/>Burials.</cell>
</row><row>
<cell role="label">10 Yrs to 1702</cell><cell>21,963</cell><cell>14,718</cell><cell>5,928</cell><cell>37 to 10</cell><cell>150 to 100</cell>
</row><row>
<cell role="label">5 Yrs to 1716</cell><cell>21,602</cell><cell>11,984</cell><cell>4,968</cell><cell>37 to 10</cell><cell>180 to 100</cell>
</row><row>
<cell role="label">5 Yrs to 1756</cell><cell>28,392</cell><cell>19,154</cell><cell>5,599</cell><cell>50 to 10</cell><cell>148 to 100</cell>
</row>
</table></p>
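Table mark-up can be checked for internal consistency before delivery. This sketch rests on an assumption drawn from the two examples above rather than from an explicit rule: that the rows and cols attributes are meant to equal the actual number of <row> elements and the number of cells in the widest row. The function name check_table is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_table(table_xml):
    """Return a list of problems for a single <table> element."""
    table = ET.fromstring(table_xml)
    problems = []

    # Every table needs a <head>, even if it is empty.
    if table.find("head") is None:
        problems.append("table has no <head>")

    rows = table.findall("row")
    declared_rows = table.get("rows")
    if declared_rows != str(len(rows)):
        problems.append(f"rows={declared_rows} but {len(rows)} <row> elements found")

    widest = max((len(row.findall("cell")) for row in rows), default=0)
    declared_cols = table.get("cols")
    if declared_cols != str(widest):
        problems.append(f"cols={declared_cols} but widest row has {widest} cells")
    return problems
```

A table that declares rows="3" but contains two rows and no <head> would produce two messages; the corrected WEEKLY PAPERS table above would produce none.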
<list type="ordered">
<head></head>
<item n="1">
1. &quot;To choose our own governors.&quot;
</item><item n="2">
2. &quot;To cashier them for misconduct.&quot;
</item><item n="3">
3. &quot;To frame a government for ourselves.&quot;
</item>
</list>
The following mark-up has two problems: 1) it does not identify the list heading, and 2) it does not identify the material following the first list as another list, partly owing to its typographical rendering, i.e., its use of hanging indents. The typography can be accommodated, however, and the identification of data is always the priority, which here means identification as a list:
DDD mark-up:
<p>The following is a list of the officers and companies who have reported at the Camp:</p>
<p>ROSTER OF FIELD, STAFF AND OFFICERS OF THE</p>
<p>LINE.</p>
<list><item>Col. Sherwin McRac, Commanding.</item>
<item>Maj. J. J. Werth, 1st Major.</item>
<item>Maj. Thos. G. Armstead, 2d Major.</item>
<item>John F. Wren, Adjutant.</item>
<item>Daniel E. Gardner, Quartermaster.</item>
<item>F. W. Hancock, Assistant Surgeon.</item>
<item>Edmund Fontaine, Sergeant Major.</item>
<item>Walter K. Martin, Paymaster.</item>
<item>Miles C. Selden, Assistant Commissary.</item></list>
<p>COMPANIES.</p>
<p>Hanover Troop—Capt. Wms. C. Wickham, Lieut. Wm. B. Newton, Lieut, B. H. Bowlse.</p>
...
<p>Goochland Troop—Capt. Julian Harrison, Lieut. T. P. Hobson, Lieut. Geo. F. Harrison.</p>
<p>Powhatan Troop—Lieut. Com'g John F. Lay, Lieut. Chas. Old, Lieut. T. P. Skipwith.</p>
<p>King William Troop—Capt. Beverly B. Douglass, Lieut. Wm. Gregory, Lieut. W. V. Croxton, Lieut. Thos. Gregory.</p>
<p>Surry Troop—Capt. T. W. Taylor, Lieut. Wm. Allen. (We regret to learn that Lieut. A. was disabled by a kick from a horse on the way to Richmond.)</p>
<p>A few members of the Essex Troop are here. The officers are—Capt. R. S. Cauthorn, Lieut. Aubrey H. Jones,
Lieut. Wm. Oliver.</p>
Error: mis-keyboarding of “McRae”
Error: failure to identify “ROSTER OF FIELD, STAFF AND OFFICERS OF THE LINE.” as header that needs to be associated with the list that follows.
Error: <p> tags are used to indicate paragraphs, not line breaks. In this case, what was marked as a <p> should be identified as the head of the list, but in such cases, whether in a <p> or <head> tag, an <lb/> tag should be used to identify the break.
Correction: according to new specifications (in this doc.) all abbreviated words are to be marked with <abbr>. Initials in a person's name do not need to take the tag.
NOTE: in the DDD sample files, many lists were identified with <list>, others with <list type="simple">. The attribute need only be added if it is an ordered list, but additionally, tags must be applied consistently.
Corrected mark-up:
<p>The following is a list of the officers and companies who have reported at the Camp:</p>
<list>
<head>ROSTER OF FIELD, STAFF AND OFFICERS OF THE<lb/> LINE.</head>
<item><abbr>Col.</abbr> Sherwin McRae, Commanding.
</item><item><abbr>Maj.</abbr> J. J. Werth, 1st Major.
</item><item><abbr>Maj.</abbr> <abbr>Thos.</abbr> G. Armstead, 2d Major.
</item><item>John F. Wren, Adjutant.
</item><item>Daniel E. Gardner, Quartermaster.
</item><item>F. W. Hancock, Assistant Surgeon.
</item><item>Edmund Fontaine, Sergeant Major.
</item><item>Walter K. Martin, Paymaster.
</item><item>Miles C. Selden, Assistant Commissary.
</item>
</list>
<list>
<head>COMPANIES.</head>
<item rend="hang">
Hanover Troop—<abbr>Capt.</abbr> <abbr>Wms.</abbr> C. Wickham, <abbr>Lieut.</abbr> <abbr>Wm.</abbr> B. Newton,
<abbr>Lieut.</abbr> B. H. Bowlse.
...
</item><item rend="hang">
Surry Troop—<abbr>Capt.</abbr> T. W. Taylor, <abbr>Lieut.</abbr> <abbr>Wm.</abbr> Allen. (We regret to learn that <abbr>Lieut.</abbr> A. was disabled by a kick from a horse on the way to Richmond.)
</item>
</list>
<p>
A few members of the Essex Troop are here. The officers are—<abbr>Capt.</abbr> R. S. Cauthorn,
<abbr>Lieut.</abbr> Aubrey H. Jones, <abbr>Lieut.</abbr> <abbr>Wm.</abbr> Oliver.
</p>
For more formal block text—typically verse or song—the element <lg> should be used instead of <list>. The preference is for opening and closing tags to be grouped at the start of a line:
<p>
<cit><q>
<lg type="stanza">
<l>&#34;Know, first, that heaven, and earth's compacted frame,
</l><l>And flowing waters, and the starry flame,
</l><l>And both the radiant lights, one common soul
</l><l>Inspires and feeds—and animates the whole.
</l><l>This active mind, infused through all the space,
</l><l>Unites and mingles with the mighty mass:
</l><l>Hence, men and beasts the breath of life obtain,
</l><l>And birds of air, and monsters of the main.
</l><l>Th' ethereal vigour is in all the same,
</l><l>And every soul is filled with equal flame.&#34;
</l></lg>
</q><bibl></bibl></cit>
</p>
Note in this case that the quotation marks do not receive additional <q> tags, since those quotes are identical with the passage as a whole being quoted, and in this case the <q> tags cannot be placed directly next to the quotation marks. Note also that the <q> tag does not need the rend attribute, since the default formatting of the <lg> tag will apply.
While TEI Lite allows for standard character entities (e.g., &amp; for the ampersand), UR is adopting the Unicode standard for all entities. Punctuation characters requiring Unicode encoding are:
| double quotation mark | " | &#34; |
| single quotation mark | ' | &#39; |
| ampersand | & | &#38; |
| em dash (long dash) | — | &#8212; |
| em space (see below) |   | &#8195; |
| at sign | @ | &#64; |
| ligatures (example) | æ | &#230; |
NOTE (Spec Change): single (') and double (") quotation marks are now to be encoded as XML entities.
The rationale for this change is to reduce confusion in text manipulation and other programming that searches for quotations: in-text double quotes, in particular, can then be distinguished from the double quotes that enclose attribute values in TEI tagging. It additionally provides a means of distinguishing quotation marks associated with the <q> element from those in TEI tagging.
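The distinction drawn above can be illustrated with a short Python sketch. It is only an illustration, not part of the UR specification: the function name encode_text_quotes is hypothetical, the decimal reference &#34; is assumed as the entity for the double quotation mark, and no attribute value is assumed to contain a literal ">".

```python
import re

# Matches a single tag. The capturing group makes re.split keep the tags
# in the resulting list instead of discarding them.
TAG = re.compile(r"(<[^>]*>)")


def encode_text_quotes(line: str, entity: str = "&#34;") -> str:
    """Encode double quotes found in text content, leaving tags untouched.

    The entity value is an assumption made for this sketch.
    """
    parts = TAG.split(line)
    # After the split, odd positions hold tags and even positions hold text.
    return "".join(
        part if index % 2 else part.replace('"', entity)
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(encode_text_quotes('<item n="1">"To choose our own governors."</item>'))
```

Because the quotes that enclose attribute values sit inside tags, they are never touched; only quotation marks in the running text are converted.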
Most common punctuation characters can and should be represented using their normal keyboard characters:
| apostrophe | ' |
| exclamation point | ! |
| dollar sign | $ |
| percent sign | % |
| asterisk | * |
| opening and closing parentheses | ( ) |
| hyphen | - |
| opening and closing square brackets | [ ] |
| opening and closing braces | { } |
| colon | : |
| semicolon | ; |
| comma | , |
| period | . |
| solidus (forward slash) | / |
| question mark | ? |
An ellipsis (a series of dots or asterisks indicating deliberately omitted text) is indicated by a series of keyboard-character periods. Use the same number of characters as appear in the print source.
If the print source contains a long space that needs to be preserved (for example, to indicate a word deliberately omitted by the author), use a series of &#8195; (em space) Unicode entities.
For other punctuation, Unicode entities are preferred. Entity codes may be found here:
XML Character Entities
http://www.oasis-open.org/docbook/specs/wd-docbook-xmlcharent-0.3.html
Conveniently lists the characters in the standard ISO 8879 entity sets, with graphics.
Unicode Code Charts
http://www.unicode.org/charts/
Contains links to PDF code charts of Unicode characters.
Attributes and values of elements should never contain spaces.
NOTE (Spec Change): This is a change in specification for the rend value “small caps,” which is now written without a space:
correct:
<hi rend="smallcaps">
incorrect:
<hi rend="small caps">
Similarly, made-up values for divisions should never contain two words separated by a space; if necessary, two (short) words may be used, connected by a dash.
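A check of this kind lends itself to automation. The following Python sketch is one possible approach, not a prescribed tool: the function name values_with_spaces is hypothetical, and the regular expressions assume that attribute values are always enclosed in double quotes and never contain ">".

```python
import re
from typing import List, Tuple

# Opening or empty tags only; closing tags, declarations and processing
# instructions are skipped by excluding "/", "!" and "?" after "<".
TAG = re.compile(r"<[^/!?][^>]*>")
# name="value" pairs inside a tag (double-quoted values assumed).
ATTRIBUTE = re.compile(r'([\w:-]+)\s*=\s*"([^"]*)"')


def values_with_spaces(markup: str) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs whose value contains a space."""
    found = []
    for tag in TAG.findall(markup):
        for name, value in ATTRIBUTE.findall(tag):
            if " " in value:
                found.append((name, value))
    return found


if __name__ == "__main__":
    sample = '<hi rend="small caps">A</hi> <hi rend="smallcaps">B</hi>'
    for name, value in values_with_spaces(sample):
        print("space in value:", name, "=", value)
```

Only the first <hi> in the sample is reported, since its rend value contains a space.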
Long dashes in headlines: These newspapers often use a long dash to separate a headline from the body of an article while keeping both in the same paragraph. Since UR has specified that headlines should be separated out and identified with head tags, the placement of the long dash should be consistent: long dashes separating headlines from the main text of a division should be placed within the first paragraph, as below:
<head rend="smallcaps">Execution Of A Matricide.</head>
<p rend="nobreak">—Ezra Brainard was hung at Three Rivers, C. W., on the 25th <sic>ult.</sic>, for having murdered his mother some months since.</p>
Spacing between sentences: Use one space character between sentences, not two, regardless of the apparent spacing in the print source.
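As a minimal sketch of how this rule might be applied to text that has already been keyboarded, the Python function below collapses runs of spaces after sentence-ending punctuation. The function name single_space_sentences is hypothetical, and the sketch only handles sentences ending in a period, exclamation point, or question mark; a sentence ending in an encoded quotation mark would need an extended pattern.

```python
import re


def single_space_sentences(text: str) -> str:
    """Collapse two or more spaces after '.', '!' or '?' into one space."""
    # The lookbehind keeps the punctuation mark; only the spaces are replaced.
    return re.sub(r"(?<=[.!?]) {2,}", " ", text)


if __name__ == "__main__":
    print(single_space_sentences("First sentence.  Second one!   Third?"))
```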
A major goal of XML was to create files that are human- and machine-readable. Extra returns at significant points would be helpful in off-setting major breaks in the document structure.
Major divisions should be offset: each closing div tag should be on its own line, below the closing tag of the prior div, with an extra blank line following the close of the highest div level.
Milestone units, whether they indicate page breaks or column breaks, should always have one return above and one return below:
</div3>
</div2>

<milestone unit="column" n="6"/>

<div2 type="auction">
<head>AUCTION SALES.<lb/><hi rend="italics">FUTURE DAYS.</hi></head>
<div3 type="advert" n="1">
<head>By Goddin &#38; Apperson, Auct's.</head>
<p><hi rend="bold">VALUABLE FARM IN HENRICO, FIVE</hi> MILES ABOVE RICHMOND,
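The milestone spacing rule could be applied mechanically along the following lines. This Python sketch rests on two assumptions: that "one return above and one return below" means one blank line on each side, and that every milestone tag already sits on a line of its own. The function name pad_milestones is hypothetical.

```python
def pad_milestones(markup: str) -> str:
    """Ensure one blank line above and below every <milestone .../> line."""
    out = []
    for line in markup.splitlines():
        if line.strip().startswith("<milestone"):
            # Add a blank line above unless one is already there.
            if out and out[-1].strip():
                out.append("")
            out.append(line)
            out.append("")
        else:
            # Collapse consecutive blank lines so padding is never doubled.
            if not line.strip() and out and not out[-1].strip():
                continue
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = '</div2>\n<milestone unit="column" n="6"/>\n<div2 type="auction">'
    print(pad_milestones(sample))
```

Running the function a second time on its own output leaves the text unchanged, so it can be applied safely to files that are already partly formatted.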
Additionally, formatting should be ideal for manipulation by text processors, whether to transform the text into another text file, to output it in a different format, or to draw select information out of a file. Since many processors work by reading line by line (e.g., Perl), it is ideal to have certain containers grouped together. Here are some general guidelines:
Less preferable:
<lg type="stanza">
<l>&#34;Know, first, that heaven, and earth's compacted frame,</l>
<l>And flowing waters, and the starry flame,</l>
<l>And both the radiant lights, one common soul</l>
<l>Inspires and feeds—and animates the whole.</l>
<l>This active mind, infused through all the space,</l>
<l>Unites and mingles with the mighty mass:</l>
<l>Hence, men and beasts the breath of life obtain,</l>
<l>And birds of air, and monsters of the main.</l>
<l>Th' ethereal vigour is in all the same,</l>
<l>And every soul is filled with equal flame.&#34;</l>
</lg>
More preferable:
<lg type="stanza">
<l>&#34;Know, first, that heaven, and earth's compacted frame,
</l><l>And flowing waters, and the starry flame,
</l><l>And both the radiant lights, one common soul
</l><l>Inspires and feeds—and animates the whole.
</l><l>This active mind, infused through all the space,
</l><l>Unites and mingles with the mighty mass:
</l><l>Hence, men and beasts the breath of life obtain,
</l><l>And birds of air, and monsters of the main.
</l><l>Th' ethereal vigour is in all the same,
</l><l>And every soul is filled with equal flame.&#34;
</l></lg>
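Converting the less preferable layout into the more preferable one is a simple text substitution. The Python sketch below shows one way this might be done; the function name group_line_tags is hypothetical, and it assumes that each <l> element occupies exactly one line with no indentation and that lines end in a plain newline.

```python
def group_line_tags(stanza: str) -> str:
    """Move each closing </l> to the start of the following line."""
    # "</l>" + newline + "<l>" becomes newline + "</l><l>".
    grouped = stanza.replace("</l>\n<l>", "\n</l><l>")
    # The last line's closing tag is grouped with the closing </lg>.
    return grouped.replace("</l>\n</lg>", "\n</l></lg>")


if __name__ == "__main__":
    sample = (
        '<lg type="stanza">\n'
        "<l>And flowing waters, and the starry flame,</l>\n"
        "<l>And both the radiant lights, one common soul</l>\n"
        "</lg>"
    )
    print(group_line_tags(sample))
```

In the converted layout the tags for adjacent lines share the start of a physical line, as in the more preferable example above.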