On this page you will find a number of problems for which Suares & Co looked for solutions.
Technology
On this page are some solutions we found for some uncommon issues.
Creating a hunspell dictionary for use as spellcheck in Open Office
25-10-2008
Cool! OpenOffice uses hunspell as its spellchecker. This means you can create your own wordlist and affix file for your language if none exists yet. Well, for Papiamentu, Curaçao's native language, no such dictionary exists. I had a hard time finding out how to create these files, so I am documenting it here...
Installing hunspell on Ubuntu 8.04
Of course, you need hunspell installed. I am sure you have it already, but if not, the following might help:
sudo apt-get install hunspell hunspell-tools
The wordlist
A wordlist is just a list of words, one per line. It's the base for the dictionary. A wordlist can have 8,000 words, or 100,000, or whatever seems reasonable for your language. Papiamentu has about 30,000 words at the moment.
Here's a very simple example wordlist:
love
lover
lovers
beer
beers
office
open
opened
chair
Save that list as wordlist, just for now. Make sure there's a newline at the end of the file, or else the last character will be eaten!
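As a minimal sketch, the example list can also be written from the shell with a here-document, which always ends the file with a newline. The file name wordlist comes from the text above; the check at the end is an extra precaution and not part of the original procedure.

```shell
# Write the example words to a file named "wordlist".
cat > wordlist <<'EOF'
love
lover
lovers
beer
beers
office
open
opened
chair
EOF

# If the last byte is not a newline, append one so the last word stays intact.
if [ -n "$(tail -c 1 wordlist)" ]; then
    echo >> wordlist
fi

# Count the words (one per line).
wc -l < wordlist
```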
The Dictionary File
A dictionary file is just a wordlist preceded by the number of words, here also sorted and with duplicates removed.
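The command itself did not survive on this page. One way to do it, as a sketch with standard tools only, assuming the list was saved as wordlist and the dictionary is to be called yourlang.dic (the name used further down):

```shell
# For a standalone run only: create the sample wordlist if it is not there yet.
[ -f wordlist ] || printf '%s\n' love lover lovers beer beers office open opened chair > wordlist

# Sort the words, remove duplicates and drop empty lines.
sort -u wordlist | grep -v '^$' > wordlist.sorted

# The first line of the dictionary file is the number of words that follow.
wc -l < wordlist.sorted | tr -d ' ' > yourlang.dic
cat wordlist.sorted >> yourlang.dic
rm wordlist.sorted

# Show the count and the first two words.
head -n 3 yourlang.dic
```

Counting after sorting matters: the number on the first line should match the de-duplicated list, not the raw one.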
How to create the Affix File
An affix file is ehmm... quite complex. It took me a couple of hours to understand that it can just be an empty file... if you have a relatively small wordlist (let's say fewer than 100,000 words).
Create an empty affix file:
touch yourlang.aff
In the Right Place
On Ubuntu 8.04, the dictionaries are kept in /usr/share/myspell/dicts/. So copy our language there:
sudo cp yourlang.* /usr/share/myspell/dicts/
Testing the Hunspell
hunspell -d yourlang
This should give you a prompt. Enter a word and you will see a result:
Hunspell 1.1.9
love
*
This means that the word 'love' is spelled correctly according to your language.
Now try a non-existent word:
bove
& bove 1 0: love
This means that 'bove' is not a word, but 'love' comes close.
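For longer lists, typing words one at a time gets tedious. As a sketch, hunspell's -l option prints only the words from its input that it does not recognise. The scratch directory and the two-word input below are made up for illustration, and -d is given a path so the files do not have to be installed system-wide.

```shell
# Build a throwaway copy of the example dictionary in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 9 beer beers chair love lover lovers office open opened > "$workdir/yourlang.dic"
: > "$workdir/yourlang.aff"    # empty affix file, as described above

# Feed words on standard input; -l lists only the ones that are rejected.
if command -v hunspell >/dev/null 2>&1; then
    misspelled=$(printf '%s\n' love bove | hunspell -d "$workdir/yourlang" -l)
    echo "not in dictionary: $misspelled"
else
    echo "hunspell is not installed"
fi
```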
More Affixion
You have to read the manual on the affix file format. There is also a less comprehensible but more comprehensive one. Just a small example:
SET UTF-8
SFX P Y 1
SFX P 0 s
It says that the rule 'P' adds an 's' to the end of a word without removing any characters. It's a totally bogus rule for the English language, but it'll work fine with our example dictionary.
I also added the SET UTF-8 line because in Papiamentu my accents got garbled without it.
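If accents still come out garbled, the wordlist itself may not be saved as UTF-8. A sketch of converting it with iconv, assuming the list was written in ISO-8859-1 (that is a guess; check what your editor actually produced). The file names here are placeholders.

```shell
# Create a one-word sample in ISO-8859-1: \347 is the single byte for "ç".
printf 'Cura\347ao\n' > wordlist.latin1

# Convert to UTF-8 so the words match "SET UTF-8" in the affix file.
iconv -f ISO-8859-1 -t UTF-8 wordlist.latin1 > wordlist.utf8

# Show the bytes: "ç" is now the two-byte sequence c3 a7.
od -An -tx1 wordlist.utf8
```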
Munch the .dic and the .aff
Munching applies the affix rules to the dictionary and produces a smaller dictionary. Here it replaces 'lover' with 'lover/P' and removes 'lovers', because the rule covers both words and 'lover/P' is the more efficient way to store them. One word less. It also removes 'beers' and replaces 'beer' with 'beer/P'. Munching also adds the word count at the beginning of the .dic file.
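The munch command itself did not survive on this page. As a sketch: munch, from the hunspell-tools package installed earlier, takes the dictionary file and the affix file as arguments and writes the munched dictionary to standard output. The scratch directory and the output name munched.dic are made up here, and the rule is written with an explicit '.' (meaning "no condition") as its last field, which is my addition to the example above.

```shell
# Work in a scratch directory so nothing existing gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The example dictionary (count on the first line) and the example affix file.
printf '%s\n' 9 beer beers chair love lover lovers office open opened > yourlang.dic
printf '%s\n' 'SET UTF-8' 'SFX P Y 1' 'SFX P 0 s .' > yourlang.aff

# munch prints the compressed dictionary on standard output.
if command -v munch >/dev/null 2>&1; then
    munch yourlang.dic yourlang.aff > munched.dic
    cat munched.dic
else
    echo "munch is not installed (it ships with hunspell-tools)"
fi
```

Keep the original yourlang.dic around until you have checked the munched result.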
Ah! In the hunspell output, 'love' and 'lover' are accepted as expected, and 'lovers' has 'lover' as its base; that's what the '+' means. But remember, for small wordlists, the affix file can be empty!