I am trying to consolidate all the different types of language and interaction tools required to make computers better at understanding and interacting in human languages. I have captured them as a mind map in XMind. This will be a living document, and I welcome any suggestions on tools or technologies that I have missed.
Interestingly, most of these technologies, like NLP, speech recognition and speech translation, are still at a very nascent stage. Sadly, there is hardly any research being done for Indian (Indic) languages like Hindi and Tamil. Some technologies are relevant only to Indic languages, for example symbol translation. Languages like Tamil, owing to their ancient nature, have had different scripts at different stages of their evolution, namely the Tamil-Brahmi script, Vatteluttu (வட்டெழுத்து) and the modern script. Symbol translators convert text from an ancient script to the modern script of the same language.
Interacting with computers in one's native language takes the technology closer to the masses. There are many ways to type in native languages. On the Windows platform, Microsoft Input Method Editors (IMEs) can be used to type in non-Latin scripts such as Indic or CJK. A newer alternative is the Google IMEs, though they support only transliteration. On Linux there are several alternatives for typing in non-Latin scripts, viz. SCIM, XIM, uim etc. SCIM was the most popular IME on Linux until recently. However, SCIM is older and has its own disadvantages, so a newer architecture called IBus was developed.
The Intelligent Input Bus (IBus, pronounced as I-Bus) is an input method (IM) framework for multilingual input in Unix-like operating systems. It’s called “Bus” because it has a bus-like architecture.
The latest Linux releases, including Ubuntu 11.04, come with IBus installed. Below are the steps to configure Indic languages like Tamil, Hindi and Kannada on a KDE or GNOME desktop on Ubuntu Linux 11.04.
Open a terminal and type the following commands. Alternatively, you can select these packages from the Synaptic package manager on (K)Ubuntu. The first command installs IBus itself if it’s not already there.
sudo apt-get install ibus
sudo apt-get install ibus-m17n # this package contains the tables for Indic languages
sudo apt-get install ibus-qt4 # if you are using the KDE desktop
sudo apt-get install ibus-gtk # if you are using the GNOME desktop
sudo apt-get install im-config
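Since only one of ibus-qt4 and ibus-gtk is needed, the package list can be built from the running session. The snippet below is a minimal sketch, not part of any IBus tooling: the function name ibus_packages is made up for illustration, and it assumes that a KDE session exports KDE_FULL_SESSION=true and that any other session is GTK-based.

```shell
# Print the IBus packages for the current desktop session.
# Assumption: KDE sessions export KDE_FULL_SESSION=true; everything else
# is treated as a GTK-based desktop such as GNOME.
ibus_packages() {
  pkgs="ibus ibus-m17n im-config"
  if [ "${KDE_FULL_SESSION:-}" = "true" ]; then
    pkgs="$pkgs ibus-qt4"
  else
    pkgs="$pkgs ibus-gtk"
  fi
  echo "$pkgs"
}

# Review the list first, then install it in a single apt-get call:
#   sudo apt-get install $(ibus_packages)
ibus_packages
```

Printing the list before installing keeps the sketch harmless to run and lets you correct the guess if your desktop is detected wrongly.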
Now run im-config from the command line or using your favorite app launcher. Select ibus as the input method, and accept whatever the pop-up dialog says.
Restart the PC and log in to your desktop. You should see a keyboard icon in the taskbar. If not, type ‘ibus’ in the terminal and press Enter. Now you can add the input methods you want by right-clicking on the icon and selecting Preferences.
Now press Ctrl+Space to enable the IME, select the language you want to input, and start typing in that language. ibus-m17n supports transliteration for a few Indic languages like Tamil and Hindi.
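If the keyboard icon never appears or Ctrl+Space has no effect, a quick check is whether the session actually points the toolkits at IBus. The sketch below assumes the usual environment variables for this (GTK_IM_MODULE, QT_IM_MODULE and XMODIFIERS) and the values commonly set for IBus; the function name check_ibus_env is hypothetical, and your distribution may configure input methods differently.

```shell
# Report whether the environment variables that GTK, Qt and X11 applications
# consult are pointing at IBus. The expected values are assumptions based on
# the common IBus setup; adjust them if your system differs.
check_ibus_env() {
  mismatches=0
  for pair in GTK_IM_MODULE=ibus QT_IM_MODULE=ibus XMODIFIERS=@im=ibus; do
    name=${pair%%=*}                       # text before the first "="
    want=${pair#*=}                        # text after the first "="
    have=$(printenv "$name" || true)       # empty if unset or not exported
    if [ "$have" = "$want" ]; then
      echo "ok: $name=$have"
    else
      echo "check: $name is '$have', expected '$want'"
      mismatches=$((mismatches + 1))
    fi
  done
  return "$mismatches"                     # 0 means every variable matched
}

check_ibus_env || echo "Some variables differ; re-run im-config and log in again."
```

If the variables look right but the icon is still missing, the daemon itself may not be running; it can usually be started by hand with ibus-daemon -d -x (daemonize and start the XIM server).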
The Internet will become really friendly to the masses only when they can use it in their native language. Although there are plenty of non-English content sites on the Internet, you need to type an English website address (e.g. www.yahoo.com) to reach them.
Almost 30 years after the Internet was invented, ICANN (Internet Corporation for Assigned Names and Numbers) has started issuing website addresses (domain names) in non-English languages. Egypt, Russia, Saudi Arabia and the United Arab Emirates can now begin creating online addresses in their native languages.
India should also convince ICANN to issue domain names in Indian languages. This will bring the Internet closer to our rural population.
It’s pretty simple to install input support for Indian languages (Hindi, Tamil, Kannada) on Windows XP.
Open Control Panel –> Regional and Language Options. Select the Languages tab. Check “Install files for Complex scripts and right-to-left languages”. When you press OK, Windows will ask for the installation media to install the necessary language fonts and a few keyboard layouts. The keyboard layouts installed by this pack are not comprehensive. For example, the Tamil99 layout widely used by Tamilians is not available in this pack. To install all keyboard layouts for your language, follow the steps below.
Now download the Indic IME (Input Method Editor) for your language from here and install it.
After that, open Control Panel –> Regional and Language Options. Select the Languages tab and click on the Details button. Here you can add the required keyboard layouts as below. I recommend setting hotkeys for each keyboard layout because the language bar is not reliable and may not appear for all applications.
When a non-English language is selected, the language bar appears on the screen (somewhere near the bottom left). You can view the virtual keyboard by selecting the menu as shown in the snapshot.
This is the virtual keyboard for the Tamil99 layout.
The Tamil transliteration layout allows you to enter Tamil text by typing it phonetically in English. For example, typing ammaa will enter அம்மா.
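To make the ammaa example concrete, here is a toy sketch of how phonetic transliteration can work. It is only an illustration under stated assumptions, not the Indic IME's actual scheme: it knows just the letters a, aa and m, the names transliterate and flush_pending are made up, and the rules are simply that a vowel after a consonant becomes a vowel sign, a vowel elsewhere becomes an independent letter, and a consonant with no vowel after it gets the pulli (்).

```shell
# Toy phonetic transliterator: handles only "a", "aa" and "m".
# The letter tables are illustrative assumptions, not the IME's real scheme.
PULLI='்'     # Tamil virama: marks a consonant with no vowel after it
AA_SIGN='ா'   # dependent sign for the long vowel "aa"

# Emit a consonant that turned out to have no vowel after it.
flush_pending() {
  if [ -n "$pending" ]; then
    out="$out$pending$PULLI"
    pending=''
  fi
}

transliterate() {
  rest=$1; out=''; pending=''
  while [ -n "$rest" ]; do
    case $rest in
      aa*)  # long vowel: sign after a consonant, independent letter otherwise
        if [ -n "$pending" ]; then out="$out$pending$AA_SIGN"; else out="${out}ஆ"; fi
        pending=''; rest=${rest#aa} ;;
      a*)   # short "a" is inherent in a Tamil consonant, so no sign is written
        if [ -n "$pending" ]; then out="$out$pending"; else out="${out}அ"; fi
        pending=''; rest=${rest#a} ;;
      m*)   # hold the consonant until we know what follows it
        flush_pending
        pending='ம'; rest=${rest#m} ;;
      *)    # anything else is copied through unchanged
        flush_pending
        first=${rest%"${rest#?}"}
        out="$out$first"; rest=${rest#?} ;;
    esac
  done
  flush_pending
  printf '%s\n' "$out"
}

transliterate ammaa   # prints அம்மா
```

Matching aa before a is what lets the longer spelling win, which is why ammaa ends in the long vowel while amma would end in the bare consonant ம.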