lt-proc(1)
NAME
lt-proc - This application is part of the lexical processing modules
and tools (lttoolbox).
This tool is part of the Apertium machine translation architecture:
http://www.apertium.org.
SYNOPSIS
lt-proc [ -a | -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v |
-h -z -w ] fst_file [input_file [output_file]]
lt-proc [ --analysis | --bilingual | --surf-bilingual |
--case-sensitive | --debugged-gen | --decompose-compounds |
--generation | --non-marked-gen | --tagged-gen | --post-generation |
--sao | --transliteration | --null-flush --dictionary-case |
--version | --help ] fst_file [input_file [output_file]]
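The synopsis can be illustrated with a small Python wrapper. This is only a sketch: the helper names build_argv and run_lt_proc are hypothetical, the mode names are a subset of the long options listed above, and it assumes that lt-proc reads standard input and writes standard output when input_file and output_file are omitted.

```python
import subprocess

# Long option names copied from the SYNOPSIS; this sketch accepts only these.
MODES = {
    "analysis", "bilingual", "surf-bilingual", "generation",
    "non-marked-gen", "post-generation", "transliteration",
}


def build_argv(mode, fst_file, input_file=None, output_file=None,
               null_flush=False):
    """Return the argument vector for one lt-proc call (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode}")
    if output_file is not None and input_file is None:
        raise ValueError("output_file requires input_file")
    argv = ["lt-proc", f"--{mode}"]
    if null_flush:
        argv.append("--null-flush")
    argv.append(fst_file)
    # Optional positional arguments, in the order given by the SYNOPSIS.
    argv += [name for name in (input_file, output_file) if name is not None]
    return argv


def run_lt_proc(text, mode, fst_file):
    """Pipe text through lt-proc; needs lt-proc on PATH (assumed stdin/stdout use)."""
    result = subprocess.run(build_argv(mode, fst_file), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, build_argv("analysis", "es.automorf.bin") yields the arguments for an analysis run, where es.automorf.bin is a placeholder name for a dictionary compiled with lt-comp(1).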
DESCRIPTION
lt-proc is the application responsible for providing the four lexical
processing functionalities:
· morphological analyser ( option -a )
· lexical transfer ( option -b )
· morphological generator ( option -g )
· post-generator ( option -p )
It accomplishes these tasks by reading binary files containing a
compact and efficient representation of dictionaries (a class of
finite-state transducers called augmented letter transducers). These
files are generated by lt-comp(1).
Note that some characters (`[', `]', `$', `^', `/', `+') are special
characters used for format and encapsulation. They must be escaped if
they are to be used literally. For instance, text enclosed in `['...`]'
is ignored, and each lexical unit is delimited as `^...$'.
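Escaping of the reserved characters can be sketched in Python as follows. The text above does not say which escape character is used; this sketch assumes a backslash prefix, and it also escapes the backslash itself so that the operation can be reversed. The function names are hypothetical.

```python
# Reserved characters listed above, plus the backslash itself so that
# escaping stays reversible (the backslash prefix is an assumption).
RESERVED = set("[]$^/+\\")


def escape_reserved(text):
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape_reserved(text):
    """Undo escape_reserved: drop each escaping backslash, keep what follows."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))  # keep the escaped character as-is
        else:
            out.append(ch)
    return "".join(out)
```

Text that passes through escape_reserved and then unescape_reserved comes back unchanged.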
OPTIONS
-a, --analysis
Tokenizes the text into surface forms (lexical units as they appear
in texts) and delivers, for each surface form, one or more lexical
forms consisting of lemma, lexical category and morphological
inflection information. Tokenization is not straightforward due to
the existence, on the one hand, of contractions, and, on the other
hand, of multi-word lexical units. For contractions, the system reads
in a single surface form and delivers the corresponding sequence of
lexical forms. Multi-word surface forms are analysed in a
left-to-right, longest-match fashion. Multi-word surface forms may be
invariable (such as a multi-word preposition or conjunction) or
inflected (for example, in es, "echaban de menos", "they missed", is
a form of the imperfect indicative tense of the verb "echar de
menos", "to miss"). Limited support for some kinds of discontinuous
multi-word units is also available. Analysis of single-word surface
forms produces output like these examples: "cantar" ->
`^cantar/cantar<vblex><inf>$' or "daba" ->
`^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'.
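The output format shown in these examples can be read back with a few lines of Python. This is a minimal sketch: parse_unit is a hypothetical helper, and it assumes the unit contains no escaped special characters.

```python
import re

# One morphological symbol such as <vblex> or <sg>.
TAG = re.compile(r"<([^<>]+)>")


def parse_unit(unit):
    """Split '^surface/analysis/...$' into (surface, [(lemma, [tags]), ...]).

    Assumes no escaped '/', '^' or '$' inside the unit.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]  # text before the first tag
        parsed.append((lemma, TAG.findall(analysis)))
    return surface, parsed
```

Applied to the "daba" example, the function returns the surface form "daba" together with two analyses of the lemma "dar", one for first person and one for third person singular.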
-b, --bilingual
Performs lexical transfer, appending any trailing morphological
symbols ("queues") that are not specified in the dictionaries. Like
the analysis mode, it supports multiple lexical forms in the target
language for a given lexical form in the source language. It
typically works with the output of apertium-pretransfer.
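The handling of trailing symbols can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the actual transducer lookup: BIDIX is a hypothetical two-entry dictionary, transfer is a hypothetical helper, and an empty list stands in for whatever lt-proc emits when a lexical form is not found.

```python
import re

TAG = re.compile(r"<([^<>]+)>")

# Hypothetical toy bilingual dictionary: source prefix -> target alternatives.
BIDIX = {
    "gato<n>": ["cat<n>"],
    "banco<n>": ["bank<n>", "bench<n>"],
}


def transfer(lexical_form, bidix=BIDIX):
    """Translate the longest known tag prefix and carry the remaining tags over."""
    lemma = lexical_form.split("<", 1)[0]
    tags = TAG.findall(lexical_form)
    # Try the full tag sequence first, then progressively shorter prefixes.
    for n in range(len(tags), -1, -1):
        key = lemma + "".join(f"<{t}>" for t in tags[:n])
        if key in bidix:
            queue = "".join(f"<{t}>" for t in tags[n:])  # unmatched trailing tags
            return [target + queue for target in bidix[key]]
    return []  # not found; placeholder for the real unknown-word handling
```

With this toy dictionary, transfer("banco<n><m><sg>") gives two target-language alternatives, each with the trailing tags <m><sg> attached.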
-o, --surf-bilingual
As with -b, but takes input from apertium-tagger -p, with surface
forms, and if the lexical form is not found in the bilingual
dictionary, it outputs the surface form of the word.
-c, --case-sensitive
Use the literal case of the incoming characters
-d, --debugged-gen
Morphological generation (like -g) with all debugging information
kept in the output.
-e, --decompose-compounds
Try to treat unknown words as compounds, and decompose them.
-w, --dictionary-case
Use the case information contained in the lexicon, instead of
the surface case (only applied in analysis mode).
-g, --generation
Delivers a target-language surface form for each target-language
lexical form, by suitably inflecting it.
-n, --non-marked-gen
Morphological generation (like -g) but without unknown word
marks (asterisk `*').
-l, --tagged-gen
Morphological generation (like -g) but retaining part-of-speech
tags.
-p, --post-generation
Performs orthographical operations such as contractions and
apostrophations. The post-generator is usually dormant (just
copies the input to the output) until a special alarm symbol
contained in some target-language surface forms wakes it up to
perform a particular string transformation if necessary; then it
goes back to sleep.
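The dormant and wake-up behaviour can be sketched in Python. Everything specific here is an assumption made for illustration: the alarm symbol is taken to be `~', the two Spanish contraction rules are hypothetical, and the real post-generator uses a compiled dictionary rather than a Python table.

```python
ALARM = "~"  # assumed wake-up symbol; the text above does not name it
RULES = {"de el": "del", "a el": "al"}  # hypothetical contraction rules


def post_generate(text, rules=RULES, alarm=ALARM):
    """Copy text unchanged, applying a rule only right after an alarm symbol."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != alarm:
            out.append(text[i])  # dormant: copy input to output
            i += 1
            continue
        i += 1  # awake: consume the alarm symbol itself
        for src in sorted(rules, key=len, reverse=True):
            end = i + len(src)
            at_boundary = end == len(text) or not text[end].isalnum()
            if text.startswith(src, i) and at_boundary:
                out.append(rules[src])  # perform the string transformation
                i = end
                break
        # If no rule matched, nothing is emitted here and copying resumes.
    return "".join(out)
```

Without an alarm symbol the input passes through untouched, so "de el" stays "de el", while "~de el" is contracted to "del".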
-s, --sao
Input processing is in orthoepikon (previously `sao') annotation
system format: http://orthoepikon.sf.net.
-t, --transliteration
Apply a transliteration dictionary
-z, --null-flush
Flush output on the null character
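The idea behind null flushing can be sketched in Python: a long-running process handles one NUL-terminated chunk at a time and flushes its output after each one, so a caller can read the result without closing the stream. The function process_null_delimited and its handle callback are hypothetical, and echoing the NUL back to the output is an assumption of this sketch.

```python
import io


def process_null_delimited(reader, writer, handle):
    """Apply handle to each NUL-terminated chunk and flush after each one."""
    buffer = bytearray()
    while True:
        byte = reader.read(1)
        if not byte:  # end of input
            break
        if byte == b"\0":
            # Echoing the NUL lets the caller detect the end of each result.
            writer.write(handle(bytes(buffer)) + b"\0")
            writer.flush()  # output becomes visible at every NUL
            buffer.clear()
        else:
            buffer += byte
    if buffer:  # trailing data without a final NUL
        writer.write(handle(bytes(buffer)))
        writer.flush()


# Usage with in-memory streams; bytes.upper stands in for real processing.
source = io.BytesIO(b"one\0two\0")
sink = io.BytesIO()
process_null_delimited(source, sink, bytes.upper)
```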
-v, --version
Display the version number.
-h, --help
Display this help.
FILES
fst_file The compiled dictionary (binary file generated by lt-comp(1)).
SEE ALSO
lt-expand(1), lt-comp(1), apertium-tagger(1), apertium(1).
BUGS
Lots of...lurking in the dark and waiting for you!
AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.
2006-03-23 lt-proc(1)