About the Project

History of the Dictionary

Publication of the Royal Irish Academy’s Dictionary of the Irish Language began in 1913 with the appearance of the first fascicle (D-degóir) under the editorship of Carl Marstrander. The next fascicle (E) did not appear until 1932, and in 1936 the Academy moved to expedite publication under the revised title, Contributions to a Dictionary of the Irish Language. However, work on several other fascicles was then already at an advanced stage and despite the changes in title and format the Contributions closely followed the original plan. Subsequent fascicles appeared at more or less regular intervals, and the Dictionary was completed under the general editorship of E.G. Quin with the publication of H in 1976. In all, the original Dictionary comprises 2,525 pages in 23 fascicles and approximately 35,000 entries.

The Dictionary was digitised under the direction of Professor Gregory Toner at the University of Ulster between 2003 and 2007 with funding from the Arts and Humanities Research Council. This edition, published in 2007, is still available on the website. The text of this edition is identical to that of the Academy’s Dictionary, except that obvious errors were corrected and that the published Additions and Corrections for the letters A-C and F were incorporated. The original format of the Dictionary was preserved in this edition, and the original column and line numbers were retained so as to allow references given in this form to be located in the electronic version.

Two further major grants from the Arts and Humanities Research Council have allowed us to update the material of the Dictionary in line with the most recent scholarship available. An interim edition, incorporating changes to 4,000 entries based on a reading of the secondary literature published since 1932 (the date of the second fascicle), was published online in 2013. Work has continued since then on primary sources and is expected to be completed by 2019. As a result of significant changes to entries, future editions will not follow the column and line layout of the original hard-copy edition.

The text of the electronic edition has been tagged in XML according to the guidelines of the Text Encoding Initiative (TEI) for Print Dictionaries. Adherence to TEI guidelines is intended to ensure that the text of the Dictionary will remain accessible for future generations. We have tagged numerous data types so that they can be searched discretely, including headwords, definitions, internal cross-references, grammatical information (case, stem, number, etc.), citations and their translations, references (including title of work and page reference), language of the text, and lemmas. Parts of speech are not routinely given in DIL and these have been added where they can be determined.

We have generally distinguished between definitions of headwords (including sub-senses) and translations of citations given as examples. Users wishing to find the medieval Irish equivalent of an English word can, therefore, search through definitions alone, as this will lead them to the equivalent headword. Searching on the translation will enable the user to consider a wider range of equivalents from among the Dictionary’s many citations. For entries which provide no formal definition of the headword we have selected a word, words or phrase from one or more of the citation translations to stand for a definition.
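The difference between the two kinds of search can be illustrated with a minimal sketch. It assumes a TEI-style layout in which definitions sit in def elements and citation translations in cit elements of type "translation"; these element names, the placeholder headwords alpha and beta, and the function search_english are assumptions made for illustration and do not reproduce the Dictionary's actual schema or search engine.

```python
import xml.etree.ElementTree as ET

# Invented sample data: "alpha" and "beta" are placeholder headwords,
# and the element names are assumed, not taken from the eDIL files.
SAMPLE = """
<body>
  <entry>
    <form><orth>alpha</orth></form>
    <sense>
      <def>horse</def>
      <cit type="translation"><quote>the steed ran</quote></cit>
    </sense>
  </entry>
  <entry>
    <form><orth>beta</orth></form>
    <sense>
      <def>goat</def>
      <cit type="translation"><quote>a white horse</quote></cit>
    </sense>
  </entry>
</body>
"""

# Map each search scope to the (assumed) path of the elements it covers.
PATHS = {
    "definitions": ".//def",
    "translations": ".//cit[@type='translation']/quote",
}


def search_english(root, word, scope="definitions"):
    """Return the headwords of entries whose chosen scope contains `word`."""
    if scope not in PATHS:
        raise ValueError(f"unknown scope: {scope}")
    word = word.lower()
    hits = []
    for entry in root.iter("entry"):
        texts = [(el.text or "").lower() for el in entry.findall(PATHS[scope])]
        if any(word in text for text in texts):
            hits.append(entry.findtext("form/orth"))
    return hits


root = ET.fromstring(SAMPLE)
print(search_english(root, "horse"))                  # definitions only
print(search_english(root, "horse", "translations"))  # citation translations
```

In this sample the definition search leads straight to the entry defined as "horse", while the translation search surfaces a second entry in which the word occurs only in the rendering of a citation.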

The aim of the revision of the Dictionary is to provide as much useful and up-to-date information as possible, including the creation of new entries for words that were not known to the original compilers of the Dictionary, and the addition of new and revised definitions, additional or corrected citations, and new or amended grammatical information. The revision of entries is driven by material that we collect and a revision to one part of an entry does not mean that an entire entry has been revised, although we endeavour to ensure that the revisions are consistent with the whole entry. We have not attempted to iron out inconsistencies in the original Dictionary: our aim has been to use the time available to add new information rather than reorder existing material.

We have attempted to follow the editorial policy of the original Dictionary throughout where this could be ascertained. This has a particular bearing on the treatment of etymologies. DIL generally provides etymologies only where a word is a borrowing from another language (such as Latin or Anglo-Saxon) or where it is derived from another, extant early Irish word (for example, diminutives). We followed this policy throughout, save that we permitted reference to Indo-European etymologies where DIL a) had proposed a borrowing for which an inherited etymology could be argued or b) had itself proposed an Indo-European etymology. While we have not set out to revise DIL as an etymological dictionary, it seemed perverse not to warn readers of errors or draw their attention to alternative proposals.

Changes to headwords are made where the evidence is unequivocal. Where a new headword form is established but the existing headword is also attested, both forms will be presented with the older form being placed first. No attempt is made to impose consistency on the spelling of headwords throughout the Dictionary, however. Where we have added new entries, the headword generally reflects the period of language of the earliest attestation. Where new headwords or subsections of entries are added, they are entered at the appropriate point according to sense or age. Where homonym numbers or numbered subsections already existed we have attempted to avoid renumbering by using superscript numbers/characters to distinguish the new sections. Where no numbers previously existed in DIL, we have introduced numbering starting with 1 for the existing entry: e.g. foich > 1 foich. Where a new entry appears in a sequence of homonyms, we have given it the same homonym number as the preceding entry but with the addition of a letter (e.g. 2b).

Where we have discovered ghost words in the original Dictionary, the headword has been allowed to stand if the word appears in a printed edition and so might be encountered by a user of the Dictionary: in such cases, it is simply marked as a ghost. Where words that appear in manuscripts are likely to be erroneous and cannot be assigned to another entry, we have entered them in the Dictionary as headwords preceded by a question mark. Where they can be assigned to an entry with some certainty, we have done so, preceding the citation with a query.

Compound words have normally been placed under the relevant section of the first element of the compound. Compounds which are sufficiently well established to be considered as separate words with a distinctive form have been given a separate headword, whereas compounds which are ad hoc or formally identical to an ad hoc compound have been placed under the first element. For example, ardri g.sg. ardrach is given as a headword, whereas ardrí is placed under ard alongside ardescop etc.

We have sought to add citations that give more information on the form, inflection or meaning of the headword or where the citation provides a demonstrably earlier example than those already listed in DIL. Additional citations are interpolated in broadly chronological order (where this can be established) within the relevant section of the entry, although we follow the tendency in DIL to allow glosses which illustrate meaning to take precedence. We have followed normal practice in DIL in redividing words in citations printed as single phrases in the source text, and we have occasionally amended punctuation for the sake of clarity and consistency. We have allowed existing citations from earlier editions to remain where they are substantially correct as revising these would have taken up too much of our time. Where errors of substance occur in citations and translations, however, we have corrected these following the most recent or most authoritative edition.

Translations of citations taken from the source are shown in italic in single inverted commas; our own translations of citations are printed in italic without inverted commas. Translations added to an existing citation in DIL are placed after the citation or after the accompanying bibliographic reference, whichever is neater. The latter strategy is generally pursued where an existing translation in DIL is retained; where the former approach is adopted, the source, if there is one, is given immediately after the existing bibliographic reference.

History of the Project

Digitisation. Funding for the digitisation was provided by the Arts and Humanities Research Council. This award allowed us to commission outside contractors to capture the text and build the search engine, and to employ two full-time research associates, Dr Maxim Fomin (2003-07) and Dr Tom Torma (2003-05), the latter being succeeded by Dr Grigory Bondarenko (2006-07). The Royal Irish Academy generously gave permission to digitally capture the text of the Dictionary of the Irish Language, and copyright of the original text resides with it.

The text of the Dictionary was digitally captured by Archive Quest Limited, an external agency with expertise in this area. The text was both scanned and triple-keyed: the outputs of the three typists were compared with one another and with the scanned version, and any discrepancies were flagged for further attention. This method, which is commonly used for capturing legal documents, produced an accuracy rate of 99.992%, that is, fewer than one error in every 10,000 characters. Many remaining errors, including errors in the original Dictionary, were corrected during the subsequent mark-up stage. During final editing, the text was digitally compared with the original captured text to ensure that no additional errors had unintentionally been introduced during the mark-up phase.
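The comparison step and the accuracy figure can be sketched as follows. This is only an illustration of the general idea, not the contractor's software: the function names are hypothetical, and the sketch assumes that the three keyed versions and the scan have already been aligned to the same length, which a real pipeline would have to arrange first.

```python
def flag_discrepancies(keyed_versions, scanned):
    """Return (position, characters) wherever the versions disagree.

    Assumption: every version is already aligned character for character.
    """
    versions = list(keyed_versions) + [scanned]
    length = len(versions[0])
    if any(len(v) != length for v in versions):
        raise ValueError("versions must be aligned to the same length")
    flagged = []
    for position, chars in enumerate(zip(*versions)):
        if len(set(chars)) > 1:  # at least one version differs here
            flagged.append((position, chars))
    return flagged


def errors_per_10k(accuracy_percent):
    """Convert an accuracy percentage into expected errors per 10,000 characters."""
    return (100 - accuracy_percent) / 100 * 10_000


# Invented example: the third typist keyed the digit 1 for the letter l.
keyed = ["fascicle", "fascicle", "fascic1e"]
print(flag_discrepancies(keyed, "fascicle"))
print(round(errors_per_10k(99.992), 3))
```

An accuracy of 99.992% leaves 0.008% of characters in error, or about 0.8 per 10,000, which is where the figure of fewer than one error in every 10,000 characters comes from.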

A structural analysis of DIL revealed that typefaces were used consistently enough to allow automatic XML tagging of a significant portion of the text. Formatting and structural layout of the hard copy, including fonts, line breaks, and column and line numbers, were coded as HTML tags during the capture phase so that these could be used to automate some of the generation of more meaningful XML tags. Bold print is used in the hard copy for headwords and to mark section letters/numbers, so it was a relatively straightforward task to convert the HTML tag for bold to the XML tag for headword once the section markers had been converted.
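The two-step conversion of bold text can be sketched roughly as below. The tag names sec and orth, the shape assumed for section markers (Roman numerals, digits, or a bracketed lower-case letter) and the function convert_bold are all assumptions for illustration; the project's actual conversion rules and tag set are not described here.

```python
import re

# Assumed shapes of bold section markers: I, II, 1, 2, (a), (b), optionally
# followed by a full stop. This is a heuristic, not the project's rule set.
SECTION_MARKER = re.compile(r"<b>\s*((?:[IVX]+|\d+|\([a-z]\))\.?)\s*</b>")
BOLD = re.compile(r"<b>(.*?)</b>")


def convert_bold(html):
    """Turn bold section markers into <sec/> tags, then remaining bold into <orth>."""
    # Step 1: section markers are converted first so they are not mistaken
    # for headwords in step 2.
    with_sections = SECTION_MARKER.sub(r'<sec n="\1"/>', html)
    # Step 2: whatever is still bold is treated as a headword.
    return BOLD.sub(r"<orth>\1</orth>", with_sections)


sample = "<b>foich</b> <b>I</b> first sense <b>(a)</b> sub-sense"
print(convert_bold(sample))
```

The order of the two substitutions matters: running the headword rule first would wrap every section marker in a headword tag. A heuristic of this kind would still misread a headword that happens to look like a marker, which is one reason manual checking remains necessary.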

Similarly, italic is used in the hard copy almost exclusively for definitions, translations and lemmas, so once the definitions had been marked manually, we were able to tag translations and lemmas automatically. A certain degree of manual manipulation was then required, for example, to expand those headwords which are partially contracted in the hard copy. Much of the grammatical information (gender, stem, case, person, number, tense, mood etc.) was automatically tagged at this stage, and again visual inspection and manual correction were often required.

During the second phase of the digitisation project, Old Irish citations and variants of the headword in the body of the entry were manually tagged, along with definitions. Parts of speech, largely absent from the original, were also added at this stage.

In the third phase, translations of citations were tagged as described above and the language of the text was marked where this deviated from the norm (Irish for citations, English for translations, and Latin for lemmas). Tagging of sources and accompanying page references proved more difficult to automate than might at first appear because of the considerable inconsistency of the abbreviations used in DIL, the bewildering array of possible formats of the page/line references, and the breaking of abbreviation and reference over two lines. Nevertheless, Julianne Nyhan was able to use pattern matching to identify more than 95% of references and tag them appropriately. Her program also tagged ‘orphans’, that is, page references for which the source was elided, and linked them to the previous title through a unique identifier. The remaining references and sources were tagged manually. The inconsistency of the abbreviations has been allowed to stand in eDIL, but each source has been assigned a standardised abbreviation which is read by the search engine but is not visible to the user.
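The general approach of matching references, linking orphans and standardising abbreviations can be sketched as follows. This is not Julianne Nyhan's program, whose patterns are not given here: the abbreviations Src., Sr. and Txt., the locator format and the function tag_references are all invented for illustration, and the real reference formats are far more varied.

```python
import re

# Invented variant abbreviations mapped to one standardised form each.
STANDARD = {"Src.": "SRC", "Sr.": "SRC", "Txt.": "TXT"}

REF = re.compile(
    r"(?:(?P<title>[A-Z][a-z]+\.)\s+)?"  # optional source abbreviation
    r"(?P<loc>\d+[a-z]?\d*(?:\.\d+)?)"    # locator such as 12a3 or 45.6
)


def tag_references(text):
    """Return one record per reference; orphans point back to the last title."""
    refs = []
    last = None  # most recent reference that carried its own title
    for n, match in enumerate(REF.finditer(text), start=1):
        title, loc = match.group("title"), match.group("loc")
        if title is not None:
            last = {"id": f"ref{n}", "source": STANDARD.get(title, title)}
            refs.append({"id": last["id"], "source": last["source"],
                         "loc": loc, "orphan": False})
        elif last is not None:
            # Orphan: a locator with no title, linked to the previous title.
            refs.append({"id": f"ref{n}", "source": last["source"],
                         "loc": loc, "orphan": True, "prev": last["id"]})
    return refs


text = "first citation Src.\n12a3 . second citation 14b1 . third Txt. 45.6"
for ref in tag_references(text):
    print(ref)
```

Because the pattern allows any whitespace between abbreviation and locator, a reference broken over two lines is still matched, and the standardised form is stored alongside the record while the printed abbreviation in the text is left untouched.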

In the final phase of the digitisation project, outstanding problems were addressed and both the data and the mark-up were systematically checked. Each file was digitally compared to the original files to ensure that no errors had crept in during the mark-up phases. Common mark-up and layout errors were identified by visual inspection and corrected. Some mark-up errors certainly remain, but they are of limited extent and significance. For example, where two citations from a single source appear close together without an intervening reference, they have sometimes been inadvertently treated as a single citation. Variant or inflected forms of the headword are supposed to be marked where they appear in the grammatical section of an entry, but they have sometimes been overlooked, and occasionally a word other than a variant has been incorrectly tagged as a variant. In the current interface, words marked as variants are extracted from the body of the text and displayed in a list in the left-hand column; users should be aware that these lists may not be complete and may occasionally contain forms that are not forms of the headword. Nevertheless, we thought it useful to display the variants in this way. It is hoped that these errors can be corrected in a future edition. The marked variants are also used in searches on Irish words to prioritise the list of search results: hits are sorted in priority order of headwords, marked variants, and finally any other occurrences in citations. Existing errors will have a minimal effect on this search function.
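The priority ordering of search results can be illustrated with a small sketch. The Entry class, the function names and the sample forms are hypothetical and exact whole-word matching is assumed; the actual search engine's data model and matching rules are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    headword: str
    variants: list = field(default_factory=list)   # forms marked as variants
    citations: list = field(default_factory=list)  # citation texts


def rank(entry, query):
    """0 = headword match, 1 = marked variant, 2 = citation only, None = no hit."""
    if entry.headword == query:
        return 0
    if query in entry.variants:
        return 1
    if any(query in citation.split() for citation in entry.citations):
        return 2
    return None


def search_irish(entries, query):
    """Return headwords of matching entries, best-ranked first."""
    ranked = [(rank(e, query), e.headword) for e in entries]
    hits = [pair for pair in ranked if pair[0] is not None]
    # sorted() is stable, so entries of equal rank keep their original order.
    return [headword for _, headword in sorted(hits, key=lambda pair: pair[0])]


# Invented placeholder forms, not real Dictionary data.
entries = [
    Entry("gamma", citations=["some xform here"]),
    Entry("beta", variants=["xform"]),
    Entry("xform"),
    Entry("delta"),
]
print(search_irish(entries, "xform"))
```

Under this scheme an untagged variant is simply demoted from the second group to the third rather than lost, which is consistent with tagging errors having only a minimal effect on search.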

Revision. A five-year project funded by the AHRC was established in 2007 to revise existing DIL entries. Given the limited resources available, a targeted approach to the revision of the Dictionary was adopted. The research team identified relevant articles in major Celtic Studies and linguistics journals through a comprehensive reading programme. Published bibliographies provided a considerable body of references, but it was clear from the start that they were incomplete and that only a comprehensive programme of reading would enable us to uncover all the relevant material in the journals. The revisions of the Dictionary from this phase are based primarily on our reading of the major relevant journals for the period 1932 to the present. The start date was chosen as the year when the second fascicle was published. The first fascicle was severely criticised on publication and we have offered revisions based on contemporary reviews, but because of the project's limited scope we did not attempt to cover all journals for the years 1913-1931. Naturally, any later material relevant to the first fascicle was incorporated into the revisions. The work of this project was published as a revised edition (2013), along with a separate Supplement PDF.

The work of revision continues with a further AHRC-funded project (2014-19). This project is focussed on a reading of scholarly editions published since 1932, particularly those dealing with the Old and Middle Irish periods. Data will be excerpted from an extensive range of texts during the first three years of the project. In the final two years, entries will be drafted, checked and edited before publication in 2019.

Thanks

We are grateful to the many individuals and organisations who have helped us during the work of this project. The Arts and Humanities Research Council provided the financial support without which the project would not have been possible. We have been ably and generously guided throughout by an advisory panel (listed below). Many other scholars provided advice on their own areas of speciality as required and/or allowed us to consult unpublished work, including Dr Pádraic Moran, Prof. Colm Ó Baoill and Dr Kaarina Hollo. We would like to thank Dr Gerald Manning who assisted with inputting material from PACDIL. Finally, we wish to acknowledge the work of generations of Irish scholars on whose work the current revision is built. Tosach eolais imcomarc do grés.

Gregory Toner

Members of Staff

Queen’s University, Belfast

  • PROFESSOR GREGORY TONER (PI)
  • DR SHARON ARBUTHNOT

University of Cambridge

  • PROFESSOR MÁIRE NÍ MHAONAIGH (CI)
  • DR DAGMAR WODTKO

Advisory Panel

  • PROFESSOR LIAM BREATNACH
  • DR ANTHONY HARVEY
  • PROFESSOR SÉAMUS MAC MATHÚNA
  • PROFESSOR RUAIRÍ Ó HUIGINN
  • PROFESSOR ROIBEARD Ó MAOLALAIGH
  • DR JURGEN UHLICH

Former Members of Staff

  • DR GRIGORY BONDARENKO
  • DR GIUSEPPINA SIRIU
  • DR MAXIM FOMIN
  • DR THOMAS TORMA
  • DR CAOIMHÍN Ó DÓNAILL
  • MISS HILARY LAVELLE

Web Development

  • CHRIS YOCUM
  • GAVIN MITCHELL