
This article was originally published in Rest of World, which covers technology’s impact outside the West.
When Amrith Shenava began experimenting with large language models shortly after the launch of ChatGPT, he quickly realized that Tulu – the language he and some 2 million others speak in the southern Indian state of Karnataka – had virtually no digital data set. He decided to build one.
Shenava, who has a degree in computer science from Kent State University in Ohio, had earlier launched a translation app and a language-learning app for Tulu. To build the data set for the LLM, he had to collect voice and text data from native speakers, including teachers, professionals, homemakers, and members of the Tulu diaspora.
“Most AI systems are built in the US. They don’t understand Indian languages or contexts,” Shenava, the 27-year-old founder of TuluAI, told Rest of World. “We need our own models that represent us.”
India has more than 1,600 languages and dialects, but most artificial intelligence systems cater to those that are widely spoken. OpenAI’s ChatGPT supports more than a dozen Indian languages including Hindi, Tamil, and Kannada, the dominant language in Karnataka. Google’s Gemini can chat with users in nine Indian languages.
Spurred by their success, and keen to be a part of the rapid global transition to AI,...
from Scroll.in https://scroll.in/article/1089340/tulu-bodo-kashmiri-startups-are-teaching-ai-models-indian-dialects?utm_source=rss&utm_medium=public