GoVarnam is a cross-platform transliteration library.



Varnam is an Indian language transliteration library. GoVarnam is a Go port of libvarnam with some core architectural changes. Not every part of libvarnam is ported.

It is stable enough for daily use as an input method. See it in action here:

An Input Method Engine for Linux operating systems via IBus is available here:


You will need to install the GoVarnam library on your system before any app can use Varnam.

  • Download a recent GoVarnam version.
  • Extract the zip file.
  • Open a terminal and go to the extracted folder:
cd Downloads/govarnam
  • Run this command to install GoVarnam:
sudo ./ install

Enter your password when prompted.

  • The installation is now finished.

To check whether the installation succeeded, run:

varnamcli -s ml enthaanu

If it succeeded, this prints Malayalam output.

  • To get better suggestions from Varnam, import some words. Download a .vlf (Varnam Learnings File) from here [TODO LINK].
  • Import it:
varnamcli -s ml -import file.vlf

Now you can install the IBus engine to use Varnam system-wide:


Test it out:

varnamcli -s ml namaskaaram

Learn a word:

varnamcli -s ml -learn കുന്നംകുളം

Train a word with a particular pattern:

varnamcli -s ml -train college കോളേജ്

Learning Words From A File

You can learn words of a language from any text file. Varnam separates English words from non-English words and learns each accordingly.

varnamcli -s ml -learn-from-file file.html

You can download news articles or Wikipedia pages in HTML format to learn words from them.



This repository contains three things:

  1. GoVarnam library
  2. GoVarnam Command Line Utility (CLI)
  3. Go bindings for GoVarnam

GoVarnam is written in Go, but so that it can be used as a standard library from any other programming language, we compile it to a C shared library. This is done with:

go build -buildmode "c-shared" -o

(A shortcut for the above is make library.)

The output is a shared library that can be dynamically linked from any other programming language. Some examples:

  • Go bindings for GoVarnam: See govarnamgo folder in this repo
  • Java bindings for GoVarnam: IN PROGRESS

Wait, this means we need separate Go code (govarnamgo) to interface with the GoVarnam library! That's because we're interfacing with the shared library, not with the Go library directly.

Files & Folders

  • govarnam - The library files
  • main.go, c-shared* - Files for building govarnam as a C shared library
  • govarnamgo - Go bindings for the library. For use with other Go projects
  • cli - A CLI tool for varnam. Uses govarnamgo to interface with the library.
  • symbol-frequency-calculator - For populating the weight column in VST files

CLI (Command Line Utility)

The command line utility (CLI) is written in Go and uses govarnamgo to interface with the library.

You need to separately build the CLI:

cd cli

# Point the dynamic linker at the repository root, where the shared library is built
export LD_LIBRARY_PATH=$(realpath ../):$LD_LIBRARY_PATH

go build -o varnamcli .


This section gets you straight into development. An explanation of how GoVarnam works is at the bottom.

  • Clone the repository
  • Run go get
  • You will need a .vst file. Get one from the schemes folder of a release and paste it into the schemes folder
  • Run make library to compile

Whenever you change the govarnam source code, run make library again to rebuild it, then test with the CLI.

You can run tests (to make sure nothing broke) with:

make test

GoVarnam BTS

Read GoVarnam Spec:

Changes from libvarnam

  • ml.vst has been changed to add a new weight column to the symbols table. Get the new ml.vst here. A symbol with a lower weight has more significance; weights are calculated from symbol popularity in a corpus. You can populate an ml.vst with weight values using a Python script (see the symbol-frequency-calculator subfolder). The VST itself is still made with the same Ruby script as before. An ml.vst from libvarnam is incompatible with govarnam.

  • patterns_content is renamed to patterns in GoVarnam

  • The patterns table in the learnings DB no longer stores Malayalam patterns. Instead, for each input, all possible Malayalam words are generated (from symbols, using VARNAM_MATCH_ALL) and looked up in the words table; those found are returned as suggestions. Previously, the patterns table stored every pattern of a word (English => Malayalam).

  • In govarnam, patterns is used solely for English words, e.g. Computer => കമ്പ്യൂട്ടർ. Such words don't work with our VST tokenizer because they aren't really transliterable in our language: the tokenizer would need input like kambyoottar, not Computer, to produce കമ്പ്യൂട്ടർ.


To build without SQLite:

go build -tags libsqlite3 -buildmode=c-shared -o

Release Process

  • git tag
  • make build release

Pack ibus engine:

  • make build-ubuntu18 release
  • Give priority to greedy tokenized if there are no exact matches

    For this, dictionary results will have to be separated into:

    • ExactMatches
    • PatternDictionaryMatches
    • DictionaryMatches
    • ...

    The order should be:

    • ExactMatches
    • PatternDictionaryMatches: show the first result if ExactMatches is NULL
    • DictionaryMatches: show the first result if ExactMatches is NULL
    • GreedyTokenized if ExactMatches is NULL
    • PatternDictionaryMatches: show the rest if ExactMatches is NULL
    • DictionaryMatches: show the rest if ExactMatches is NULL
    • PatternDictionaryMoreMatches
    • DictionaryMoreMatches
    • ...

    It would also help to split DictionaryMatches into DictionaryMatches and DictionaryMoreMatches, and to split PatternDictionaryMatches the same way.

    Use case: the dictionary has the word പാവയ്ക്ക. If I type "pavanaayi", the first suggestion is "പാവനായി", which is wrong. Shouldn't the scheme pattern take priority? Or at least its result should be shown near the top, not wayyy down.

    opened by subins2000 2
  • - character in input string is causing an FTS5 error

    2022/08/19 06:08:10 fts5: syntax error near "*"
    2022/08/19 06:08:10 fts5: syntax error near "*"
    2022/08/19 06:08:11 fts5: syntax error near "*"

    From varnamd logs:

    [2022-08-19T06:08:10.771881527Z] status: 200, latency_human: 2.561737ms, error: <nil>, user_agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36, uri: /tl/ml/-4i
    [2022-08-19T06:08:10.828157178Z] status: 200, latency_human: 1.249368msm error: <nil>, user_agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36, uri: /tl/ml/-4it
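    The report doesn't show how GoVarnam builds its FTS5 MATCH expression. Assuming the raw input is used with a * appended, characters such as - get parsed as FTS5 query syntax. One common fix, sketched here with a hypothetical helper name, is to wrap the input in double quotes so FTS5 treats it as a string (embedded quotes are escaped by doubling them, as the FTS5 query syntax specifies) and to put the * after the closing quote to keep the prefix search.

```go
package main

import (
	"fmt"
	"strings"
)

// ftsPrefixQuery (hypothetical helper) quotes raw user input as an FTS5
// string and appends * so the last token is still matched as a prefix.
// Embedded double quotes are escaped by doubling them.
func ftsPrefixQuery(input string) string {
	return `"` + strings.ReplaceAll(input, `"`, `""`) + `"*`
}

func main() {
	// Pass the result as a bound parameter, e.g. "... MATCH ?",
	// rather than formatting it into the SQL text.
	fmt.Println(ftsPrefixQuery("-4i"))      // "-4i"*
	fmt.Println(ftsPrefixQuery(`say "hi"`)) // "say ""hi"""*
}
```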
    bug priority 
    opened by subins2000 1
  • Improve search symbol table

    • Adds varnam_new_search_symbol:

    Symbol table search had a fatal flaw:

    var searchCriteria Symbol
    searchCriteria.Pattern = "a"
    searchCriteria.AcceptCondition = 0

    Varnam skipped a search criterion whenever its value equalled Go's default (zero) value for that field. For an int field that value is 0, so a criterion with value 0 could never be applied. This PR fixes it by having NewSymbol() (which returns a Symbol) set such fields to -1 by default. So now:

    searchCriteria := NewSymbol()
    searchCriteria.Pattern = "a"
    searchCriteria.AcceptCondition = 0
    • Added all Varnam configuration variables to varnam_config(). Previously used functions like varnam_set_dictionary_suggestions_limit are now DEPRECATED.
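    The search-criteria fix described above can be illustrated with a small, self-contained sketch. The Symbol type here is a cut-down hypothetical stand-in (only Pattern and AcceptCondition come from the snippets above; MatchType is an illustrative extra field), and matches is a hypothetical filter, not GoVarnam's query-building code. It shows why -1 works as an "unset" marker where Go's zero value 0 cannot.

```go
package main

import "fmt"

// Symbol is a cut-down, hypothetical stand-in for a symbol-table row.
type Symbol struct {
	Pattern         string
	AcceptCondition int
	MatchType       int
}

// NewSymbol returns search criteria in which every numeric field is -1,
// meaning "do not filter on this field".
func NewSymbol() Symbol {
	return Symbol{AcceptCondition: -1, MatchType: -1}
}

// matches reports whether row satisfies every criterion that was set. An
// empty Pattern or a -1 field is ignored, so 0 is now a usable filter value.
func matches(criteria, row Symbol) bool {
	if criteria.Pattern != "" && criteria.Pattern != row.Pattern {
		return false
	}
	if criteria.AcceptCondition != -1 && criteria.AcceptCondition != row.AcceptCondition {
		return false
	}
	if criteria.MatchType != -1 && criteria.MatchType != row.MatchType {
		return false
	}
	return true
}

func main() {
	rows := []Symbol{
		{Pattern: "a", AcceptCondition: 0, MatchType: 1},
		{Pattern: "a", AcceptCondition: 1, MatchType: 1},
	}

	criteria := NewSymbol()
	criteria.Pattern = "a"
	criteria.AcceptCondition = 0

	for _, row := range rows {
		fmt.Println(row.AcceptCondition, matches(criteria, row))
	}
	// 0 true
	// 1 false
}
```

    Under the old zero-value convention, AcceptCondition = 0 would have been treated as "not set", and both rows would have matched.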
    opened by subins2000 1
  • LearnFromFile didn't process all words from file

    There's a bug in learning from a frequency report file. If a word's position in the file contains hidden characters like <0xa0>, GoVarnam mistakenly takes the number next to it as a word. Because of this, the rest of the words fail. Here's a sample:

    പേരിൽ 254
    ചെയ്യുന്നു 254
    നിരവധി 254
    പുതിയ 254
    വിവിധ 254
    കേരളത്തിലെ 254
    കേരള 254
    ചെറിയ 254

    Words from വിവിധ onward weren't learned because of a hidden <0xa0> character on the previous line.

    opened by subins2000 1
  • One Click Install Script

    Currently, installation is a 3-step process:

    • Install GoVarnam (this repo)
    • Install language support needed by user:
    • Install IBus Engine for GoVarnam:

    Turn this 3-step process into a single one-click installer script. Expected result:

    curl http://raw.githubuser....../ | bash
    Welcome to Varnam Installer. This installation is a 3-step process.
    Step 1: Install GoVarnam
    Start step 1. ? (yes/NO): yes
    Downloading GoVarnam version 1.5.0...
    <Download progress>
    Installing GoVarnam...
    Step 2: Install your language support
    | as | Assamese |
    | ml | Malayalam |
    Which language would you like to install ? (Separate by comma): ml,as
    <Download progress>
    Installed ml
    Same but installed as.
    Looks like `ml` has words to import. Import words for "ml" ? (yes/no): yes
    <Importing progress>
    Step 3: Install Varnam IBus Engine
    Proceed ? (yes/NO): yes
    <download progress>
    Varnam installation finished.
    Telegram Group: 
    Matrix Group: 
    opened by subins2000 1
  • Dictionary DB goes corrupt after unlearning a word

    Steps to reproduce:

    1. Import many words
    2. Unlearn a word
    3. Try transliterating a word: varnamcli -s ml ennum
    4. "Database image is malformed" error is in output

    On investigation, the bug is caused by a syncing problem between the table and its FTS table: the DELETE trigger is faulty.

    Bug discovered thanks to this meme


    opened by subins2000 0
  • Exact words

    Fixes #21

    • Adds a new, higher-priority ExactWords result alongside ExactMatches. How they differ:
      • ExactWords - Words found exactly in the dictionary, if there are any.
      • ExactMatches - Dictionary words that start with the exact input, if there are any. Not applicable to the patterns dictionary.
    • Avoid the item variable in range loops wherever possible to save memory copying: change for i, item := range to for i := range
    opened by subins2000 0
  • Weird exact matches result for a non-existing word

    Bug obtained from mwordle. Type "param". The API gives {exact_matches: [word: "പരമ്", weight: 253]}, but the DB doesn't have the single word പരമ്. There are words like പരമ്പര in the DB, which is basically പരമ് + പര. Is Varnam getting confused by it?

    The ideal output is പരം. It's in the exact_matches list but second with weight 252.

    opened by subins2000 0
  • Add varnam_get_suggestions()

    Add a new function varnam_get_suggestions(string word) that returns all dictionary suggestions starting with a particular word. This word won't be English but a word of the language itself.


    // Gives മലയാളം മലയാളചലച്ചിത്രം മലപ്പുറം മലയാളത്തിൽ

    How is it useful ?

    One use case is Inscript engines. Inscript input is letter-by-letter, with no Manglish, so once a word has been typed, suggestions can help complete it. Currently govarnam-ibus shows the Inscript English key output first, and the user has to pick a suggestion to complete the word. This is bad practice because the English key output of Inscript is hard to understand.

    opened by subins2000 0
  • Marathi reverse transliterated sequence not giving correct output

    Namaskar! 😃

    The reverse transliteration of "प्रयत्न" is shown as p~ryt~n but if the same sequence is typed into the web editor then "प्रयत्न" doesn't get shown in any of the options. In fact all 3 options are the same.

    Why is this happening and how can this be fixed?


    opened by sanketgarade 0
  • Allow symbol removal from VST in VST Maker

    VST Maker should allow removing a symbol from the VST using a matching condition. In the Malayalam scheme:

    anusvara [["m"]] => ["ം","ം","മ"]
    anusvara "m_" =>  ["ം","ം","മ"]
    anusvara({:accept_if => :ends_with}, "m" => ["ം","ം","മ"])
    anusvara({:accept_if => :in_between}, "m" => ["ം","ം","മ"])
    consonants ["ma"] => "മ"

    The CV generation produces m => മ്, but മ് is never used at the end of a string; anusvara is used instead. So the generated m => മ് needs to be removed and then added back with custom conditions:

    anusvara({:accept_if => :starts_with}, "m" => ["മ്"])
    anusvara({:accept_if => :in_between}, "m" => ["മ്"])
    opened by subins2000 0
  • Porting from libvarnam Status

    • [x] Transliteration

    • [x] Reverse Transliteration

    • [x] Learning, Training from CLI

    • [x] Learning, Training from file

    • [x] VST Creation

    • [ ] Stem rules

      This may not be needed because there are better stemming tools from SMC. Besides, stem rules are only set for Malayalam in varnam.

    • [ ] Using flags column in symbol table (This may not be needed because govarnam works just fine without using flags)

    opened by subins2000 0
  • Tamil Letter Suggestion

    Ok, the word is சொல்லாமல் (sollamal).

    But while typing, Varnam gives the other l (ள).

    Typing il gives இல், which is correct, but typing ill gives இள் when it should give இல்.

    Typing ILL in capitals does the same.

    opened by josephmiller2000 0
  • v1.9.0(Feb 20, 2022)

    IMPORTANT: Existing users of Varnam should run this command in a terminal:

    varnamcli -s ml -reindex
    • Fixes a serious bug in user dictionary database (database disk image getting corrupt after unlearning a word #24)
    • User dictionary database upgrades will now be automatically done on init
    • Added -reindex option to varnamcli
  • v1.8.0(Feb 5, 2022)

  • v1.7.1(Nov 6, 2021)

    • Ported Scheme -> VST maker to GoVarnam from libvarnam #17
    • New VSTs are made with GoVarnam:
    • Greedy suggestions will always be shown in the 2nd position

    There might be some issues that weren't present in the previous VST files made by libvarnam. Please report any errors.

  • v1.7.0(Oct 30, 2021)

    • Reduced filesize (over 50%)
    • Improved performance, speed (over 500%)
    • Get version with varnamcli -version
    • Suggestions will always show the greedy tokenized result as the 2nd result. The first result and all results after the 2nd come from the dictionary.

    varnam-ibus-engine v1.6.1 release (no significant changes except for filesize reduction):

  • v1.6.0(Sep 25, 2021)

    • #9 New option varnam_set_pattern_dictionary_suggestions_limit for result limit from patterns dictionary
    • #10 GetRecentlyLearnedWords now has pagination
    • #11 Improved Inscript support. Added varnam_get_suggestions() and varnam_transliterate_greedy_tokenized()
    • A ZWNJ is no longer added after an explicit virama typed with ~. Use _ to insert a ZWNJ explicitly

    To install or upgrade, use the same installer:

    bash <(curl -s
  • v1.5.0(Sep 11, 2021)

    Installation instructions updated. You need to install these together: 1: GoVarnam (this) 2: Language Support: 3: IBus Engine:

    If you update GoVarnam, YOU WILL also have to update the IBus engine. By default, GoVarnam has no language support; you will have to install it separately from here:


    • Export file format changed: importing older exports won't work (though there is a workaround)
    • Export now exports as multiple files. Default: 30,000 words per file
    • Import now supports importing multiple files using wildcards:
    varnamcli -s ml -import "*.vlf"

    The double quotes (") are important: they stop the shell from expanding the wildcard itself.

    • Use environment variables VARNAM_VST_DIR and VARNAM_LEARNINGS_DIR to override path
    • Many bugfixes
    • API changes :
      • transliterate() output is now an array (varray)
      • New method transliterateAdvanced (output struct TransliterationResult)
      • Get recently learnt words
      • Result now obtained via pointer like libvarnam
  • v1.4.0(Aug 24, 2021)

    GoVarnam is now stable for production use. It is being used live in

    This version includes these schemes :

    • Assamese
    • Bengali
    • Gujarati
    • Hindi
    • Kannada
    • Malayalam
    • Malayalam Enhanced Inscript
    • Marathi
    • Nepali
    • Odia
    • Punjabi
    • Sanskrit
    • Tamil
    • Telugu

    The download zip has GoVarnam with all the above language support, Varnam IBus Engine for all GNU/Linux systems and a ready-to-import Learnings file for Malayalam.

“Varnam” is an open source, cross platform transliterator for Indian languages