A library sans books: The Hindu
Thiruvananthapuram, Sept 08: It is a library with books in all the Indian languages - from Sanskrit texts and palm leaf manuscripts to more recent scholastic works. It adds 1,50,000 new books each year and is accessible from any part of the country.
The Digital Library of India will be inaugurated by the President, A.P.J. Abdul Kalam, in New Delhi on Monday.
N. Balakrishnan of the Indian Institute of Science, who is the project coordinator, said 30,000 books, including ancient palm leaf manuscripts, had been scanned for the library. It has books in English, Sanskrit, Telugu, Urdu, Kannada and Tamil and hopes to have one lakh books online by the end of the year. The target of one million books will be reached only by 2008.
Scanning books and making them freely available over the Internet is only part of the task, maybe even the easier part of it. "This huge database of Indian language texts is a wonderful test-bed for developing software so that knowledge created in one language is accessible to all," Mr. Balakrishnan said. "We still do not have good optical character recognition (OCR) software for any Indian language." As a result, the photographs of all pages of the books have been put online as image files.
OCR software, on the other hand, can read these images character by character. That paves the way for words and sentences to be recognised, which in turn makes it possible to search the text and translate it into other languages.
The hope is that users will be able to search the entire collection in any language of their choice. When users pick out the documents they wish to see, those in other languages can be translated immediately. As a result, a Hindi speaker would be able to find material in, say, Kannada and Bengali books, whose existence he may not have known about earlier.
The Digital Library of India will be part of the Universal Library initiative promoted by Raj Reddy and other scientists at Carnegie Mellon University (CMU) in the United States. India and China have both joined this programme. The library's web address is www.dli.gov.in.
Bureau Report