Summary LingDoc Clarity

What is Clarity?

Clarity is a set of tools for analyzing natural language, maintaining lexicons, authoring controlled language and, optionally, translating the result. Natural languages are described by grammars and lexicons (the lingware).

The system consists of three main modules: Author, Capri and LexBench.

Author assists an author during normal editing tasks with correction and standardization of spelling, terminology, grammar and style. If errors are encountered, suggestions for corrections are given to the author. Author comes in three versions:

o           LEditor, a standalone editor

o           FmEdit, an add-in for FrameMaker

o           WoEdit, an add-in for Microsoft Word

Capri is an engine which performs the analysis, correction and translation; it runs in the background of Author or as a standalone program.

Lexbench is a workbench for the creation and maintenance of lexicons and thesauruses.

What tasks are performed with Clarity?

Clarity helps authors to write clearly and consistently, according to a predefined standard, described by lingware.

It offers built-in support for the correction of technical language on the level of terminology, grammar and style.
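To make the idea of terminology correction concrete, here is a minimal sketch in Python. It is not how Clarity itself works (Clarity relies on its own grammars and lexicons); the term list, the function name check_terminology and the example sentence are all invented for illustration. The sketch only shows the general pattern: find a non-standard term and offer one or more approved alternatives.

```python
import re

# Hypothetical house terminology: deprecated term -> approved alternatives.
# These entries are invented for illustration; they are not Clarity lingware.
DEPRECATED_TERMS = {
    "switch on": ["start"],
    "utilize": ["use"],
    "terminate": ["stop", "end"],
}


def check_terminology(sentence):
    """Return (position, found_term, suggestions) tuples, sorted by position."""
    findings = []
    for term, suggestions in DEPRECATED_TERMS.items():
        # Match the term as a whole word or phrase, ignoring case.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            findings.append((match.start(), match.group(0), suggestions))
    return sorted(findings)


if __name__ == "__main__":
    sentence = "Utilize the key to terminate the pump."
    for position, found, options in check_terminology(sentence):
        print(f"column {position}: '{found}' -> try {' or '.join(options)}")
```

A real checker would also have to handle inflected forms and grammatical context, which is where a grammar-based engine goes beyond a simple word list like this one.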

Clarity can also improve the output quality of commercial machine translation systems. If the language is sufficiently restricted, Clarity can perform the translation by itself. In that case, the form and meaning of sentences have been restricted to such an extent that each sentence has only one possible meaning and applies strictly to the domain for which it was intended.
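As a rough illustration of what "only one possible meaning" implies at the word level, the following Python sketch tests a sentence against a tiny domain lexicon. The lexicon, its senses and the function names are assumptions made up for this example. A real controlled language also restricts sentence structure through a grammar, which this sketch does not attempt.

```python
# Hypothetical domain lexicon: word -> senses allowed in this domain.
# Invented for illustration; not taken from any Clarity lexicon.
LEXICON = {
    "close": ["shut"],
    "the": ["definite article"],
    "valve": ["flow-control device"],
    "bank": ["financial institution", "river edge"],
}


def ambiguity_report(sentence):
    """List words that are unknown or that have more than one sense."""
    words = sentence.lower().rstrip(".").split()
    unknown = [w for w in words if w not in LEXICON]
    ambiguous = [w for w in words if len(LEXICON.get(w, [])) > 1]
    return {"unknown": unknown, "ambiguous": ambiguous}


def is_unambiguous(sentence):
    """True if every word is in the lexicon and has exactly one sense."""
    report = ambiguity_report(sentence)
    return not report["unknown"] and not report["ambiguous"]


if __name__ == "__main__":
    print(is_unambiguous("Close the valve."))
    print(ambiguity_report("Close the bank door."))
```

Only sentences that pass a check of this kind, at both word and sentence level, are candidates for translation without human review.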

Authors have to be trained. Some authors may resist using restricted language (as is the case with authors of XML documents).

Of course, Clarity (or one of its components) can be used for more mundane tasks where subsets of natural language are used, such as the analysis of forms and questions.

What is the difference between Clarity and similar systems?

Several commercial systems offer similar functionality. See for instance the “Best Practices Guide” of LISA.

Clarity is open source and is proven technology.

Who uses Clarity?

Clarity is intended to function in the environment of a documentation department where natural language is used in a standardized way. A precondition is that texts are of an informative nature. Examples of such texts are:

o           help texts

o           user manuals

o           maintenance manuals

o           operation instructions.

Clarity has been used, for instance, in the following types of projects at Cap Gemini:

o           Rewriting of Dutch manuals for automatic translation by Capri into German

o           Rewriting of training materials in Dutch for fully automatic translation by Capri into English

o           For the EU: project Docstep I

o           Authoring of AECMA Simplified English

o           A number of pilot projects for customers in the following sectors: finance, banking, software development and engineering.

Why is Clarity interesting?

o           Clarity enables precise descriptions of errors in the input text, with multiple options for correction.

o           Clarity enables organizations to maintain their standards for writing and terminology. Texts also become more readable, which makes them accessible to a larger audience.

o           Output is guaranteed to be correct, and is therefore suitable for further processing without human intervention.

o           The quality of subsequent commercial automatic translation will be improved. If the language is sufficiently restricted, fully automatic translation can be performed by the system itself.

o          From a technical viewpoint, the Clarity software can also serve as a working example of an add-in for Microsoft Word. It performs all kinds of operations on Word's internal buffers and extends Word's menu items and dialogue boxes. The use of Add-in Express is intended to keep the add-in compatible with future versions of Word.

o           Clarity is open source.

History

Gert van der Steen started the development of Clarity in 1990 at Vleermuis Software Research in Utrecht, the Netherlands. The system had to be created from scratch in order to avoid property-rights claims; therefore, the engine of LingDoc Transform (at that time called Parspat) could not be used. The project was taken over in 1992 by Cap Volmac, which later became Cap Gemini. Its research organizations were named PSD (Professional Service Development) and ATS (Advanced Technology Services). The project group itself went by several names, such as Lingware Services and Active Documentation. The main technical developer was Gert van der Steen, who left in 1998 after the completion of Capri. Linguistic services were developed by Bert-Jan Dijenborg, Michiel de Koning and Pim van der Eijk. Many people contributed as authors of software, of lingware and of documents which had to be rewritten. Their names are listed on the website.

In 2000 the project ended. The system was stable and Cap Gemini concluded that it was time to sell it to third parties, because the development and maintenance of lingware did not belong to its core business. However, during a major internal move of the company, at a time when the project was temporarily unmanaged, all materials were lost.

Gert van der Steen was able to reconstruct most of Clarity as it stood in 1998.

In 2007 Cap Gemini granted Gert van der Steen the right to publish the reconstructed Clarity as open source.

Platforms

Author:

o      WoEdit, the add-in for Microsoft Word, is written in ANSI C. It works with Microsoft Word 2007 and 2010 (32-bit) under Windows 7 (32-bit and 64-bit).

o      LEditor, the standalone editor, is written in Borland Pascal. It is no longer maintained, but the source code is available.

o      FmEdit, the add-in for FrameMaker, is written in ANSI C. It is no longer maintained, but the source code is available.

Capri is written in ANSI C and is maintained in Microsoft Visual Studio 2010.

Lexbench is written in FoxPro and is maintained in Microsoft Visual FoxPro 9.

More information can be found in the position papers.