Implementations in Python and PHP

MLphone

MLphone is a phonetic algorithm for indexing Malayalam words by their pronunciation, like Metaphone for English. The algorithm generates three Romanized phonetic keys (hashes) of varying phonetic affinities for a given Malayalam word.

The algorithm takes into account the context sensitivity of sounds, syntactic and phonetic gemination, compounding, modifiers, and other known exceptions to produce Romanized phonetic hashes of increasing phonetic affinity that closely follow the pronunciation of the original Malayalam word.

MLphone was created to aid spelling-tolerant Malayalam word search, but may be useful in tasks like spell checking, word suggestion, etc.


Examples

Word key0 key1 key2 Transliteration Metaphone
നീലക്കുയില്‍ NLKYL NLKYL N4LK25Y4L Neelakkuyil‍ NLKYL
മൃഗം MRK3 MRK3 MRK3 Mrugam MRKM
മ്രിഗം MRK3 MRK3 MRK3 Mrigam MRKM
ഉത്സവം U0SV3 U0SV3 U0SV3 Uthsavam U0SFM
ഉല്‍സവം U0SV3 U0SV3 U0SV3 Ul‍savam ULSFM
വാഹനം VHN3 VHN3 VHN3 Vaahanam FHNM
വിഹനനം VHNN3 VHNN3 V4HNN3 Vihananam FHNNM
രാഷ്ട്രീയം RSTRY3 RS1TRY3 RS1TR4Y3 Raashtreeyam RXTRYM
കണ്ണകി KNK KNK KN2K4 Kannaki KNK
കന്യക KNYK KNYK KNYK Kanyaka KNYK
മനം MN3 MN3 MN3 Manam MNM
മണം MN3 MN13 MN13 Manam MNM
വിഭക്ത്യാഭാസം VBK0YBS3 VBK0YBS3 V4BK0YBS3 Vibhakthyaabhaasam FBHK0YBHSM
വലയം VLY3 VLY3 VLY3 Valayam FLYM
വളയം VLY3 VL1Y3 VL1Y3 Valayam FLYM
രഥം R03 R03 R03 Ratham R0M
രദം R03 R03 R03 Radam RTM
രത്തം R03 R03 R03 Rattham RTM
രധം R03 R03 R03 Radham RTHM
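
Because words that sound alike share a key, spelling-tolerant search can be reduced to grouping words by key and looking up the query's key. The following is a minimal sketch, assuming the keys have already been computed; here they are copied from the table above instead of being generated, so it runs without the library. `build_index` and `EXAMPLES` are hypothetical names for illustration. The sketch also shows the affinity levels: key0 groups മനം with മണം, while key1 keeps them apart.

```python
from collections import defaultdict

# (key0, key1, key2) copied from the Examples table. In practice these would
# be produced by MLphone for every word being indexed (assumption).
EXAMPLES = {
    "മനം": ("MN3", "MN3", "MN3"),
    "മണം": ("MN3", "MN13", "MN13"),
    "വലയം": ("VLY3", "VLY3", "VLY3"),
    "വളയം": ("VLY3", "VL1Y3", "VL1Y3"),
    "മൃഗം": ("MRK3", "MRK3", "MRK3"),
    "മ്രിഗം": ("MRK3", "MRK3", "MRK3"),
}


def build_index(entries, level):
    """Group words by their phonetic key at `level` (0, 1 or 2)."""
    index = defaultdict(set)
    for word, keys in entries.items():
        index[keys[level]].add(word)
    return index


loose = build_index(EXAMPLES, 0)   # key0: drops the 1 that separates ണ/ന, ള/ല
strict = build_index(EXAMPLES, 1)  # key1: keeps those distinctions

print(sorted(loose["MN3"]))   # both മനം and മണം
print(sorted(strict["MN3"]))  # only മനം
```

A search would compute the query's key at the chosen level and return the matching set; key0 tolerates more spelling variation, key1 and key2 less.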

Usage

The algorithm is available in Python 3 and PHP.

Python

from mlphone import MLphone

ml_str = "നീലക്കുയില്‍"  # the Malayalam word to index
converter = MLphone()
keys = converter.compute(ml_str)  # the three phonetic keys for ml_str

PHP

<?php
	require 'mlphone.php';

	$ml_str = 'നീലക്കുയില്‍'; // the Malayalam word to index
	$keys = MLphone::compute($ml_str); // the three phonetic keys for $ml_str
?>

Background

The keys used in the steps below are listed in the Substitution map.

Consider the word നീലക്കുയില്‍:

1. Discard all non-Malayalam characters
2. Group modified entities
	2.1 Group compounds from the compounds table along with their modifiers 	നീല{ക്കു}യില്‍
	2.2 Group non-compounds along with their modifiers 				{നീ}ല{ക്കു}{യി}ല്‍
3. Group unmodified entities
	3.1 Group compounds from the compounds table 					{നീ}{ല}{ക്കു}{യി}{ല്‍}
	3.2 Group non-compounds								{നീ}{ല}{ക്കു}{യി}{ല്‍}
4. Substitute individual modified and unmodified entity groups with their corresponding keys {Nീ}{L}{K2ു}{Yി}{L്}
5. Substitute the modifiers in the groups with numeric modifier keys to get key2 	N4LK25Y4L
6. Remove numeric modifiers 2 and 4-9 from key2 to obtain key1 			NLKYL
7. Remove numeric modifiers 1, 2 and 4-9 from key2 to obtain key0			NLKYL
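
The steps above can be sketched in Python for a few of the example words. This is a simplified, hypothetical reconstruction, not the reference implementation: the tables hold only the handful of Substitution map entries these words need, and separate replacement passes (chills, then compounds, then single letters) stand in for the grouping logic of steps 2 and 3. It also assumes that ZWJ (U+200D) survives step 1, since old-style chillus such as ല്‍ contain it, and that ് and ാ produce no key, as the worked example and വാഹനം → VHN3 suggest. `sketch_key2` and `sketch_keys` are illustrative names.

```python
import re

# Hypothetical subsets of the Substitution map, not the complete tables.
CHILLS = {
    "\u0d32\u0d4d\u200d": "L",  # ല്‍ (old-style chillu: ല + ് + ZWJ)
    "\u0d7d": "L",              # ൽ (atomic chillu)
}
COMPOUNDS = {"\u0d15\u0d4d\u0d15": "K2"}  # ക്ക
CONSONANTS = {"ക": "K", "ഗ": "K", "ണ": "N1", "ന": "N", "മ": "M",
              "യ": "Y", "ല": "L", "ള": "L1", "വ": "V"}
VOWELS = {"അ": "A", "ഉ": "U"}
# Assumption: virama (U+0D4D) and ാ produce no key.
MODIFIERS = {"ൃ": "R", "ം": "3", "ി": "4", "ീ": "4", "ു": "5",
             "\u0d4d": "", "ാ": ""}


def sketch_key2(word):
    # Step 1: discard everything outside the Malayalam block, keeping ZWJ
    # because old-style chillus depend on it (assumption).
    s = re.sub(r"[^\u0d00-\u0d7f\u200d]", "", word)
    # Steps 2-4, simplified: substitute longer units first (chills, then
    # compounds, then single letters) so ക്ക becomes K2, not K + ് + K.
    for table in (CHILLS, COMPOUNDS, VOWELS, CONSONANTS):
        for src in sorted(table, key=len, reverse=True):
            s = s.replace(src, table[src])
    # Step 5: turn the remaining vowel signs into numeric modifier keys.
    for src, dst in MODIFIERS.items():
        s = s.replace(src, dst)
    return s


def sketch_keys(word):
    """Return (key0, key1, key2) for `word` using the subset tables above."""
    key2 = sketch_key2(word)
    key1 = re.sub(r"[24-9]", "", key2)   # step 6: drop 2 and 4-9
    key0 = re.sub(r"[124-9]", "", key2)  # step 7: drop 1, 2 and 4-9
    return key0, key1, key2


# നീലക്കുയില്‍, spelled out with escapes so the invisible ZWJ is explicit
WORD = "\u0d28\u0d40\u0d32\u0d15\u0d4d\u0d15\u0d41\u0d2f\u0d3f\u0d32\u0d4d\u200d"
print(sketch_keys(WORD))  # ('NLKYL', 'NLKYL', 'N4LK25Y4L')
```

Note that 0 and 3 are never stripped: 0 is the key for ത, ഥ, ദ and ധ, and 3 is the key for ം, so both survive into key0.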

Substitution map

Vowels

അA ആA ഇI ഈI ഉU ഊU ഋR എE ഏE ഐAI ഒO ഓO ഔO

Consonants

കK ഖK ഗK ഘK ങNG ചC ഛC ജJ ഝJ ഞNJ ടT ഠT ഡT
ഢT ണN1 ത0 ഥ0 ദ0 ധ0 നN പP ഫF ബB ഭB മM യY
രR ലL വV ശS1 ഷS1 സS ഹH ളL1 ഴZ റR1

Chills

ൽL ൾL1 ൺN1 ൻN ർR1 ൿK

Compounds

ക്കK2 ഗ്ഗK ങ്ങNG ച്ചC2 ജ്ജJ ഞ്ഞNJ ട്ടT2 ണ്ണN2 ത്ത0 ദ്ദD ദ്ധD ന്നNN ന്തN0 ങ്കNK ണ്ടN1T ബ്ബB
പ്പP2 മ്മM2 യ്യY ല്ലL2 വ്വV ശ്ശS1 സ്സS ള്ളL12 ഞ്ചNC ക്ഷKS1 മ്പMP റ്റT ന്റNT ്രിR ്രുR

Modifiers

ൃR ം3 ി4 ീ4 ു5 ൂ5 െ6 േ6 ൈ7 ൊ8 ോ8 ൌ9 ൗ9

Kailash Nadh, November 2012.