Welcome to palaso.org
Website of the Payap Language Software Development Group
WeSay on OLPC
Author admin | 07.03.2008 | Category WeSay
We have gotten a lot of fixes into Mono recently and have now successfully got WeSay running on our OLPC.
There are definitely still visual issues. I will be checking to see if they are related to the dpi. So far, in normal operation, it seems responsive enough.
Formatting dictionaries with CSS
Author admin | 26.02.2008 | Category Typesetting, Dictionary, Developers
In evaluating CSS as a stylesheet language for formatting dictionaries, I started putting PrinceXML through its paces. I tried what I considered to be the hardest dictionary layout, and while I think I have matched many of the features, the sidenotes are just not going to happen without specialized support for them in CSS. (The closest I could get was a float, but of course if you have more than one within a line, they just write on top of each other.) That result is here. I then switched to a more typical layout, which had no problems at all. That result is here. You can get all the files to reproduce this exercise here.
Types of style
There are really a number of items which contribute to the style of a dictionary:
- Selection of fields
- Order of fields
- Textual markup - characters or text that is added before, after, or around items to distinguish a field from surrounding text
- Character styles - font changes
- Paragraph styles
- Page layout - columns
CSS3 Selectors
Another interesting behavior of CSS 3 is that you cannot select the first element having a class containing the word ‘pronunciation’:
.pronunciation:first-of-type
You can only use the :first-of-type selector to select the first element with a particular name, so general div and span elements with class attributes would have to be converted to named XML elements instead. There is a way around this, given that our document will be generated from another format: actually add the classes first-of-type and last-of-type to the data. Then the data becomes:
<span class="pronunciation first-of-type">...</span><span class="pronunciation">...</span><span class="pronunciation last-of-type">...</span>
<span class="pronunciation first-of-type last-of-type">...</span>
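Since the document is generated from another format, the generator can add these classes itself. Here is a minimal sketch in Python of that step. The function name mark_first_last is hypothetical, and it assumes you already have the class names of the sibling elements in document order; it is not taken from any actual export code.

```python
def mark_first_last(sibling_classes):
    """Append first-of-type / last-of-type to sibling class names.

    sibling_classes: class names of sibling elements, in document order.
    Returns the class attribute value to write out for each sibling.
    """
    first_seen = {}
    last_seen = {}
    for index, cls in enumerate(sibling_classes):
        first_seen.setdefault(cls, index)  # remember only the first position
        last_seen[cls] = index             # keep overwriting: ends at the last

    marked = []
    for index, cls in enumerate(sibling_classes):
        parts = [cls]
        if first_seen[cls] == index:
            parts.append("first-of-type")
        if last_seen[cls] == index:
            parts.append("last-of-type")
        marked.append(" ".join(parts))
    return marked
```

An element that is the only one of its class among its siblings gets both extra classes, which matches the single-span case above.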
Column-span
The only other problem I ran into was that Prince does not yet support the column-span property. This ended up not being a big problem, since I just wanted the heading to span both columns and was able to work around this by giving the first page of the section a 12cm top margin and floating the heading into this space.
Configuring where Enchant looks for files
Author admin | 22.02.2008 | Category Spelling, Developers
So far, I have covered how to get started using Enchant and how to set up dictionaries. This post will cover more advanced concepts that let an application developer or a user take more control over Enchant.
Where Enchant looks for providers
Enchant looks for which providers are available when the enchant_broker_init function is called.
Providers can be installed on the machine for all users to use on the system or can be installed for only one user. If Enchant finds a particular provider as a system provider and as a user provider, the user provider is used.
Enchant looks for system providers in the following locations:
- The value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Module_Dir, if any
- Otherwise, the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Config\Module_Dir, if any
- Otherwise, in %enchant%\lib\enchant, where %enchant% is the location of libenchant.dll.
The provider location for the user is determined by:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant, where %APPDATA% is shorthand for the C:\Users\[user name]\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\[user name]\Application Data\ folder (Windows XP/2000).
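The lookup order above is a "first value that is set wins" rule. Here is a minimal sketch of it in Python, purely as a model: the function names are hypothetical, and the registry is stood in for by a plain dict keyed by the full registry path, so this is not how Enchant itself is implemented.

```python
def first_set(*candidates):
    """Return the first candidate that is neither None nor empty."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def system_provider_dir(registry, enchant_dir):
    """Model of the system provider lookup; registry is a dict stand-in."""
    return first_set(
        registry.get(r"HKEY_CURRENT_USER\Software\Enchant\Config\Module_Dir"),
        registry.get(r"HKEY_LOCAL_MACHINE\Software\Enchant\Config\Module_Dir"),
        enchant_dir + r"\lib\enchant",  # enchant_dir: where libenchant.dll lives
    )


def user_provider_dir(registry, appdata):
    """Model of the user provider lookup; appdata is the value of %APPDATA%."""
    return first_set(
        registry.get(r"HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir"),
        appdata + r"\enchant",
    )
```

With an empty dict for the registry, both functions fall through to the default directory, which is the usual situation on a freshly set up machine.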
How Enchant decides which provider to load for a given language
The provider that is used for a given language is determined by the provider ordering. This can be set programmatically by using the enchant_broker_set_ordering function. Enchant initializes the ordering by looking in the enchant.ordering file. There is a system ordering file as well as a user ordering file. A user entry overrides a system entry.
Enchant looks for the system enchant.ordering file in the following locations:
- The value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Config\Data_Dir, if any
- Otherwise, in %enchant%\share\enchant, where %enchant% is the location of libenchant.dll.
Enchant looks for the user enchant.ordering file in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant, where %APPDATA% is shorthand for the C:\Users\[user name]\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\[user name]\Application Data\ folder (Windows XP/2000).
If Enchant doesn’t find any ordering files and the ordering is not overridden programmatically, then the ordering is system dependent (though I think that means the providers will be ordered alphabetically by filename).
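To make the "user entry overrides a system entry" rule concrete, here is a small Python model. It assumes each ordering file has already been read into a dict mapping a language tag to a list of provider names; the parsing of enchant.ordering is not shown, and the function names and the "first available provider wins" step are illustrative assumptions rather than Enchant's actual code.

```python
def merge_orderings(system_entries, user_entries):
    """Combine two orderings; a user entry replaces the system entry."""
    merged = dict(system_entries)
    merged.update(user_entries)
    return merged


def pick_provider(language, ordering, available):
    """Return the first provider in the ordering that is actually available."""
    for name in ordering.get(language, []):
        if name in available:
            return name
    return None  # no entry for this language: left to the system default
```

For example, if the system ordering prefers aspell for en_US but the user ordering lists myspell first, the merged ordering picks myspell for that user.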
Where Enchant looks for Ispell dictionaries
Enchant looks for user Ispell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant\ispell, where %APPDATA% is shorthand for the C:\Users\[user name]\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\[user name]\Application Data\ folder (Windows XP/2000).
Enchant looks for system Ispell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Ispell\Data_Dir, if there is one.
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Ispell\Data_Dir, if there is one.
- Otherwise, in %enchant%\share\enchant\ispell, where %enchant% is the location of libenchant.dll.
Where Enchant looks for MySpell dictionaries
Enchant looks for user MySpell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Myspell\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant\myspell, where %APPDATA% is shorthand for the C:\Users\[user name]\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\[user name]\Application Data\ folder (Windows XP/2000).
Enchant looks for system MySpell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Myspell\Data_Dir, if there is one.
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Myspell\Data_Dir, if there is one.
- Otherwise, in %enchant%\share\enchant\myspell, where %enchant% is the location of libenchant.dll.
Where Enchant looks for the Aspell library
Enchant looks for the aspell-15.dll using the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Aspell\Module, if there is one (this value should include the filename and not just the path).
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Aspell\Module, if there is one (this value should include the filename and not just the path).
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Aspell\Path, if there is one, as the path to find aspell-15.dll (this is set by the Aspell installer for Windows).
- Otherwise, in the same directory as libenchant_aspell.dll.
- Otherwise, it uses the normal Windows search strategy, which includes looking in the path.
Setting up dictionaries for Enchant
Author admin | 21.02.2008 | Category Spelling, Developers
In my last post, I gave some tips for getting started with Enchant but you really can’t get anywhere until you have properly configured the providers and installed some dictionaries.
ASpell
The ASpell provider for Enchant requires aspell-15.dll. The easiest way to get started with ASpell is to use the installer for ASpell and for dictionaries.
- Be sure you have the ASpell provider, libenchant_aspell.dll (you can list it with enchant-lsmod).
- Download the installer and run it to install ASpell.
- Download a dictionary installer from here and run the installer.
- Verify that it has been installed correctly by running enchant-lsmod.exe -list-dicts. You should see something like: en_US (aspell) but with the language code for the language you installed instead of en_US.
- You can also test it using enchant -d en_US -a (again using the language code for the language you installed). Then you can type words which are or aren’t in the dictionary and see suggestions when they aren’t.
It is possible to use ASpell by including the aspell-15.dll in the same directory as libenchant_aspell.dll or it can be somewhere in the path. If you install aspell using the Windows installer, it will write a registry entry that points to where it was installed and Enchant will use that to find the dependency.
MySpell/Hunspell (OpenOffice format)
Enchant doesn’t require any additional dependencies other than the MySpell provider for MySpell dictionaries but it does require you to copy the dictionary files to the right place.
- Be sure you have the MySpell provider, libenchant_myspell.dll (you can list it with enchant-lsmod).
- Download a dictionary that you want: You can get any of the dictionaries from OpenOffice.org.
- Unzip (or otherwise uncompress) the package and copy the contents into %APPDATA%\enchant\myspell (you may need to create the enchant and myspell directories the first time). %APPDATA% is shorthand for the C:\Users\[user name]\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\[user name]\Application Data\ folder (Windows XP/2000), but you can type %APPDATA% in the Explorer address bar and it will go to the right place.
- Verify that it has been installed correctly by running enchant-lsmod.exe -list-dicts. You should see something like: en_US (myspell) but with the language code for the language you installed instead of en_US.
- You can also test it using enchant -d en_US -a (again using the language code for the language you installed). Then you can type words which are or aren’t in the dictionary and see suggestions when they aren’t.
Note: if you install MySpell and ASpell dictionaries for the same language, the ASpell dictionaries will be used instead of the MySpell dictionaries (this can be changed but I’ll leave that for another post)
If you are feeling really adventurous and would like to create your own, you can see the directions here.
ISpell
Enchant’s Ispell provider also doesn’t have any dependencies (the dictionaries are read directly by Enchant).
- Be sure you have the ISpell provider, libenchant_ispell.dll (you can list it with enchant-lsmod).
- Download a dictionary from here (at the bottom of the page).
- Unzip (or otherwise uncompress) the package and copy the contents into %APPDATA%\enchant\ispell (you may need to create the enchant and ispell directories the first time).
- Verify that it has been installed correctly by running enchant-lsmod.exe -list-dicts. You should see something like: en_US (ispell) but with the language code for the language you installed instead of en_US.
- You can also test it using enchant -d en_US -a (again using the language code for the language you installed). Then you can type words which are or aren’t in the dictionary and see suggestions when they aren’t.
Empty dictionaries
An easy way to get spell checking for a language that doesn’t have a dictionary is to create an empty MySpell dictionary. First, decide on the language code to be used. (You should use the ISO 639 code or the IETF language tag. For our example we will use qaa, the first of the private use language codes.) Two files are required: the affix file, qaa.aff, and the dictionary file, qaa.dic. They should both be put in %APPDATA%\enchant\myspell.
The qaa.aff file should contain the following line:
SET UTF-8
The qaa.dic file should contain the following line (it’s a zero, the number of items in the dictionary):
0
Of course, you won’t have any items in your empty dictionary so all the words will be marked as misspelled. As you add items to the dictionary using Enchant, the words will be stored in %APPDATA%\enchant\qaa.dic.
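If you need to do this for several languages, the two files are easy to generate. Here is a minimal Python sketch; the function name is hypothetical, and the file contents are exactly the two lines described above.

```python
import os


def create_empty_myspell_dictionary(directory, lang_code):
    """Write an empty MySpell dictionary (.aff and .dic) for lang_code."""
    os.makedirs(directory, exist_ok=True)  # create enchant\myspell if missing
    aff_path = os.path.join(directory, lang_code + ".aff")
    dic_path = os.path.join(directory, lang_code + ".dic")
    with open(aff_path, "w", encoding="utf-8", newline="\n") as aff:
        aff.write("SET UTF-8\n")
    with open(dic_path, "w", encoding="utf-8", newline="\n") as dic:
        dic.write("0\n")  # zero: the number of items in the dictionary
    return aff_path, dic_path
```

On Windows you would call it with something like os.path.join(os.environ["APPDATA"], "enchant", "myspell") as the directory and "qaa" as the language code.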
Using Enchant in a Windows App: Getting Started
Author admin | 20.02.2008 | Category Spelling, Developers
The following are notes toward getting started with incorporating Enchant into a Windows app.
Enchant is a spell-checking framework that allows you to use many different spell-checking backends, including Aspell, Hunspell, and Ispell.
You can get the source here. Building using MSVC is not difficult once all the dependencies are provided. The full build notes are here.
If you don’t want to bother with building it yourself, you can get binaries here.
libenchant.dll is the main library. It uses backend adapters for the providers: libenchant_aspell.dll, libenchant_ispell.dll, and libenchant_myspell.dll to proxy spell checking requests. (There are others available but if you want others, you will have to build it yourself.) There is also a .Net binding (Enchant.Net.dll) that can sit on top of libenchant.dll. libenchant_aspell.dll only works if you have aspell installed as well. If aspell-15.dll is not in your path, you must specify the dll file location in the registry key: HKCU or HKLM \Software\Enchant\Aspell\Module
By default, the providers (the backend adapters) are put into the subdirectory lib\enchant underneath the location of libenchant.dll.
By default, you put dictionaries (like ispell and myspell) into the user’s appdata\enchant\[Provider Name], where [Provider Name] is MySpell or ISpell. (But aspell gets its dictionary from its installation location.)
You can check your setup by running enchant-lsmod.exe. It will list the providers it finds and the dictionaries as well.
I’ll add more later.
Custom merging with Mercurial, on Windows
Author John Hatton | 07.02.2008 | Category Developers, WeSay
Today I had one of those half-hour projects that turn into most of the day. I set out to try out Mercurial and see how difficult it would be to get it to use a custom program to do a merge, based on the file extension. When the file type is .lift (which is XML), I want it to run my program that knows about LIFT files. Simple enough. The web was littered with pages claiming ways to accomplish this… but as so often with open source stuff, a given page is for some past version on some other platform, though you as a newbie have no idea of this, so you spend all day trying things.
I’ll spare you the rest of the sob story. Here’s what finally worked:
1) I installed a recent tortoisehg distribution.
2) I installed python 2.5, made sure python was part of my PATH
3) I grabbed the hgmerge.py python script
4) I learned enough python to make the script actually work (must have worked in some previous version of python… surely!)
The part of the script that inserts the values for the file paths had lines like this:
cmd.replace('$base', base)
Which I had to change to:
self.cmd = string.replace(self.cmd, '$base', base)
Also the part that recognizes the “external” word didn’t work, so I rewrote it as:
elif string.find(inter, 'external') > -1:
    start = string.find(inter, 'external')
    tool = External()
    tool.set_command(inter[9+start:])
    return (noninteractive, tool, ext_filter)
5) in the .hg folder which is the mercurial repository, I edited the hgrc file:
[ui]
merge = python E:\Users\John\Documents\WeSay\hgTest1\hgmerge.py
6) I edited the E:\Program Files\TortoiseHg\mercurial.ini file:
[hgmerge]
ext.lift = external C:\WeSay\lib\LIFT\LIFTDotNet\output\debug\LiftMerge.exe $base $local $other $output
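For anyone hitting the same thing in step 4: Python strings are immutable, so replace returns a new string and leaves the original untouched, which is why the result has to be assigned back. Here is a minimal sketch of the placeholder substitution in current Python. The function name fill_in_paths is hypothetical and this is not the hgmerge.py code itself.

```python
def fill_in_paths(command, base, local, other, output):
    """Substitute the merge placeholders in a command template."""
    substitutions = (
        ("$base", base),
        ("$local", local),
        ("$other", other),
        ("$output", output),
    )
    for placeholder, path in substitutions:
        # str.replace returns a new string, so keep the result.
        command = command.replace(placeholder, path)
    return command
```

Calling command.replace(...) on its own line, without the assignment, silently does nothing useful, which is exactly the behavior of the original script line.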
Beating the Disposal Blues
Author John Hatton | 18.01.2008 | Category Developers
This blog is largely a note to my future self, as I know I’ll be here again, and hopefully will google for the right terms to find this entry.
I had just added our standard code to ensure that a control was disposed of properly:
~DetailList()
{
    if (!this._disposed)
    {
        throw new InvalidOperationException("Disposed not explicitly called on " + GetType().FullName + ".");
    }
}
Using Resharper’s Unit Test Runner, all my tests passed. I checked it in, and fired up the build machine. The build uses NUNIT console to run its tests; they all passed there, too, but the build failed because this code was throwing this InvalidOperationException all over!
Notice, the key problem here is that this exception is thrown on the finalization thread, at indeterminate times. So ReSharper might not notice it at all, and no test runner really knows which test caused it. Even with Nunit, the message was listed under the wrong test method a couple of times; this is quite understandable; suddenly this exception comes through, how’s the test runner supposed to know that the fault was actually 8 tests back? If you have a thousand or so unit tests, it can be tough to figure out which one forgot to dispose of the object.
I got out of this mess by storing the stack trace, at the time of construction, with the object:
public DetailList()
{
#if DEBUG
    // StackTrace lives in System.Diagnostics
    _stackAtConstruction = new StackTrace();
#endif
    // ... rest of the constructor
}
Then, in the finalizer, I just cough that back up:
~DetailList()
{
    if (!this._disposed)
    {
        string trace = "Was not recorded.";
        if (_stackAtConstruction != null)
        {
            trace = _stackAtConstruction.ToString();
        }
        throw new InvalidOperationException("Disposed not explicitly called on " + GetType().FullName + ". Stack at creation was " + trace);
    }
}
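The trick is not specific to .NET. As an illustration only, here is a rough Python analogue of the same idea (not the WeSay code): the stack is captured at construction and reported from the finalizer. The leaks list is a hypothetical stand-in for throwing, since exceptions raised inside a Python finalizer are not propagated to the caller.

```python
import traceback


class DetailList:
    """Toy stand-in for the control: remembers where it was created."""

    leaks = []  # leak reports are collected here instead of being raised

    def __init__(self):
        self._disposed = False
        # Capture the call stack now, while we still know who created us.
        self._stack_at_construction = "".join(traceback.format_stack())

    def dispose(self):
        self._disposed = True

    def __del__(self):
        if not self._disposed:
            DetailList.leaks.append(
                "Dispose not explicitly called on DetailList. "
                "Stack at creation was:\n" + self._stack_at_construction
            )
```

Each report then names the function that created the object, rather than whatever happened to be running when the finalizer fired.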
Using WeSay from other applications
Author John Hatton | 16.01.2008 | Category WeSay
Recently, we were asked to make a way for a user of a translation program to make use of WeSay, without leaving the program they’ve been trained on. The native speaker-user will want to:
- See which words are missing from the dictionary, and add them along with a definition.
- Jump into the entry screen for a word in WeSay to do more advanced editing.
- Point to a word and see a list of similar words they might choose instead (thesaurus lookup).
A linguist working with the group will want to:
- Click on an unfamiliar word and see the full dictionary article for it.
If the language has any affixation, both users will need to be able to:
- See a list of entries, ordered by how similar their spellings are to the word being investigated.
- Find words based on their inflected/derived forms, not just by the citation form in the dictionary.
- Add variants to the word so that it is clear that this form is covered by the dictionary, and make it easier to look up next time.
The first round of this work is now available for other application developers to use (the italicized bullet items above will come in some future version).
To help developers add these features to their programs, I’ve built a little sample application so they can see what’s possible and how to do it. Here’s a little crummy video showing it:
A few technical details for developers
Currently, I’ve implemented support for .net applications to make use of these services. But support via any language, via xml-rpc, should be easy to add when needed. All .net applications need to do is get our Palaso library and use the DictionaryAccessor class. You currently need to tell it where on the user’s machine to find WeSay, and where the dictionary is that you’ll be accessing.
Here’s some code to show what it takes to add this ability to a .net application:
Getting some HTML of matching entries to show in a WebBrowser control
DictionaryAccessor dictionary = new DictionaryAccessor(@"c:\docs\noosupu\noosupu.lift", @"c:\program files\wesay\wesapp.exe");
string[] forms;
string[] ids;
dictionary.GetMatchingEntries(writingSystemIdForWords, "foobar",
    FindMethods.Exact, out ids, out forms);
string html = dictionary.GetHtmlForEntries(ids);
Adding a new word
dictionary.AddEntry(writingSystemIdForWords, wordBox.Text,
writingSystemIdForDefinitions,definitionBox.Text,
writingSystemIdForWords, exampleBox.Text);
Ok, so you get the idea that this will be a very easy service to add to your .net program.
A plug here for .net 3’s WCF (Windows Communication Foundation), which made implementing this a very nice experience. I actually used Mono’s Olive implementation of it, to be sure we can do this on Linux.
Solid 0.8.4.3 Released
Author cambell | 16.01.2008 | Category Solid
A new release of Solid is now available for download.
A couple of minor issues have been fixed relating to inference when more than one parent node is inferred and also navigation using the error report.
Full details of changes made in this release are available on the project site:
Solid 0.8.4.2 Released
Author cambell | 13.12.2007 | Category Solid
A new release of Solid is now available for download.
A major issue relating to opening files has been fixed. Progress bars have also been added to open and export processes.
Full details of changes made in this release are available on the project site:
