Please don't ask me to create a dictionary for your language if it isn't already defined, as I cannot make them. I'm only doing this to lend a hand to those who don't know how.
If your language isn't listed below, then either I haven't made an apk for it yet or your language isn't supported.
Hi Bill
For months I've been struggling to build the world's first Swiss German dictionary from scratch for the ICS/JB LatinImeGoogle.apk.
I've posted in every XDA thread on the topic, but all of them are outdated and/or nobody answers.
So far I've managed everything else the build requires:
- Created my own Swiss German text corpus of more than 20'000 words (no Swiss German text corpus is available)
- Cleaned the corpus of errors, typos, different spellings of the same word, etc.
- Generated the weighted dictionary XML file
- Manually added common city names, street names, first and last names, websites, etc.
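For anyone repeating the corpus and XML steps above, here is a minimal sketch of how a raw text corpus could be turned into a weighted wordlist. It assumes the older makedict-style XML schema (a `<wordlist>` root with `<w f="...">word</w>` entries and frequencies in 0..255); that schema, the tokenizing rule, and the linear scaling of counts are assumptions to check against the makedict version you actually use, not a confirmed spec.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Assumption: a "word" is a run of Unicode letters (umlauts included), optionally
# joined by apostrophes or hyphens, e.g. "z'Bärn". Digits and punctuation are dropped.
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def count_words(text: str) -> Counter:
    """Count lower-cased word tokens in a raw text corpus."""
    return Counter(m.group(0).lower() for m in WORD_RE.finditer(text))


def to_wordlist_xml(counts: Counter, max_freq: int = 255) -> str:
    """Render word counts as a weighted wordlist in the ASSUMED makedict schema.

    Counts are scaled linearly so the most common word gets max_freq;
    every word gets at least 1.
    """
    lines = ["<wordlist>"]
    top = max(counts.values(), default=1)
    # Most frequent first; ties sorted alphabetically for a reproducible file.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        freq = max(1, round(n * max_freq / top))
        lines.append(f'  <w f="{freq}">{escape(word)}</w>')
    lines.append("</wordlist>")
    return "\n".join(lines) + "\n"
```

Usage would be something like `to_wordlist_xml(count_words(open("corpus.txt", encoding="utf-8").read()))`, with `corpus.txt` being a hypothetical file name, and the result written back out as UTF-8. Lower-casing merges "Oktober" and "oktober", so capitalized entries such as the manually added city and person names would have to be appended separately.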
Now the part where I'm stuck:
From that XML we need to generate the binary dictionary file "main.dict". For that purpose, the source of older versions of Google's "Softkeyboard" contains a "makedict.jar".
But the binary dictionary format seems to change slightly from release to release (2.x, 4.0, 4.1). My makedict.jar is from 2.x, and the "main.dict" it produces is invalid and can't be used in the 4.0 LatinImeGoogle.apk.
I can't find a more recent version of makedict.jar, and my attempt to build it from the Android 4.1.1 source code failed.
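Because the 2.x-built file is rejected by 4.0, one quick diagnostic is to read the first bytes of each main.dict and see which format family it claims. The sketch below assumes, from memory of the AOSP LatinIME `BinaryDictInputOutput` sources, that version-1 files start with the 2-byte magic `0x78B1` followed by a 1-byte version, and newer files with the 4-byte magic `0x9BC13AFE` followed by a 2-byte version, all big-endian. Treat those constants as assumptions and verify them against the source of the exact release you target.

```python
import struct
from typing import Optional, Tuple

# ASSUMED magic numbers, recalled from AOSP LatinIME sources; verify them
# against the release you are targeting before trusting the result.
V1_MAGIC = 0x78B1      # 2-byte magic, then a 1-byte format version
V2_MAGIC = 0x9BC13AFE  # 4-byte magic, then a 2-byte format version


def dict_format(header: bytes) -> Optional[Tuple[str, int]]:
    """Guess which binary format family a main.dict header belongs to.

    Reads big-endian values and returns (family, version), or None when
    neither assumed magic number matches (e.g. the file is not a dictionary).
    """
    if len(header) >= 6 and struct.unpack(">I", header[:4])[0] == V2_MAGIC:
        return ("v2", struct.unpack(">H", header[4:6])[0])
    if len(header) >= 3 and struct.unpack(">H", header[:2])[0] == V1_MAGIC:
        return ("v1", header[2])
    return None


def check_file(path: str) -> Optional[Tuple[str, int]]:
    """Read the first few bytes of a dictionary file and classify them."""
    with open(path, "rb") as f:
        return dict_format(f.read(8))
```

Running `check_file` on the 2.x-built main.dict and on a main.dict from a working 4.0 keyboard would show whether the failure is simply a format-version mismatch.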
Long story short:
I can provide all the files; I'd just appreciate your help generating the main.dict. Then you could create an additional LatinIme.apk with Swiss German, which would be the world's first genuine Swiss German keyboard.
Background info about Swiss German:
It's not an official language in Switzerland, but EVERYBODY uses Swiss German on smartphones, in text messages, in conversation, etc. Official documents, however, are written in German. So 90% of the time, Swiss people write in Swiss German dialect. One problem with its unofficial status is that there is no official or correct spelling. Everybody just writes the way it sounds, so different people spell the same word differently. "oktober", the month October, can be written "okktobr", "oktobr", "oktoober", etc. That's why I built a large word corpus of 20'000 words from all my saved conversations; I hope it mostly contains the most commonly used spellings. Another problem is that the dialect varies very strongly from one part of Switzerland to another. My Swiss German is very, very different from that of someone from the north or south, far more different than British English is from US English.
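To find candidates for merging spelling variants like "oktober" / "oktobr" / "okktobr" / "oktoober" during cleanup, one generic option is Levenshtein edit distance: map each rarer spelling to a more frequent word within a small distance and review the pairs by hand. This is a technique I'm suggesting, not something LatinIME or makedict does; the distance threshold of 2 and the "keep the most frequent spelling" rule are assumptions.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def variant_candidates(counts: Counter, max_dist: int = 2) -> dict:
    """Map each word to the most frequent other word within max_dist edits.

    The result is only a list of suggestions to review by hand: short,
    unrelated words can also be within a distance of 1 or 2.
    """
    words = sorted(counts, key=lambda w: (-counts[w], w))
    suggestions = {}
    for i, w in enumerate(words):
        for canon in words[:i]:  # only words ranked as more frequent
            if abs(len(w) - len(canon)) <= max_dist and levenshtein(w, canon) <= max_dist:
                suggestions[w] = canon
                break
    return suggestions
```

The length check is a cheap filter: two words whose lengths differ by more than `max_dist` can never be within that distance, so the quadratic loop skips most pairs without computing the full table.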
We categorize each Swiss dialect by the region where it is spoken. Since we have 26 cantons (cantons are similar to states in the US), we have 26 dialects. Every canton has a two-letter code. The canton containing the capital city is BERN (BE); another big canton is ZURICH (ZH). I come from BE, so my dialect is called BE. Someone from ZH speaks a "ZH dialect". Easy, right?
Just as British and US English are both versions of English, my BE dialect is a version of German.
British English is usually written EN_GB and US English EN_US, i.e. [language]_[country].
Since we have one more layer, a BE dialect dictionary should be called DE_CH_BE:
DE: "Deutsch", the German word for "German"
CH: "Confoederatio Helvetica", the Latin name of Switzerland
BE: the dialect spoken in the canton of Bern
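The three-layer naming can be captured in a tiny helper that builds and splits such identifiers. For reference, Java's `java.util.Locale` (used by Android) already has a third "variant" field, and `new Locale("de", "CH", "BE").toString()` renders as `de_CH_BE`, with the language in lowercase. The helper below follows that casing; `DictId`, `make_dict_id` and `parse_dict_id` are hypothetical names for illustration only.

```python
from typing import NamedTuple


class DictId(NamedTuple):
    language: str  # ISO 639-1 language code, e.g. "de"
    country: str   # ISO 3166-1 country code, e.g. "CH"
    dialect: str   # canton code used as the extra layer, e.g. "BE"


def make_dict_id(language: str, country: str, dialect: str) -> str:
    """Build a language_COUNTRY_DIALECT identifier in Java Locale casing."""
    return f"{language.lower()}_{country.upper()}_{dialect.upper()}"


def parse_dict_id(name: str) -> DictId:
    """Split an identifier such as "DE_CH_BE"; raises ValueError if not 3 parts."""
    language, country, dialect = name.split("_")
    return DictId(language.lower(), country.upper(), dialect.upper())
```

With this, `make_dict_id("de", "CH", "ZH")` gives the Zurich dictionary its own distinct identifier.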
The main reason I shouldn't provide a single dictionary for all Swiss citizens is clear: someone from ZH (canton of Zurich) writes the same words very differently and would hate the autocorrection within 5 minutes.
Process Explained:
The Swissgerman BE dialect XML: DE_CH_BE
Some posts where I tried to get help in the past: modaco | xda1 | xda2