Multilingual Systems - CMedPort


CMedPort was built to provide medical and health information services to both researchers and the general public. It is a prototype for testing whether an integrated set of techniques can improve Internet searching and browsing with Chinese search engines. Because users in mainland China, Hong Kong and Taiwan use different forms of Chinese characters (Simplified Chinese and Traditional Chinese), CMedPort provides two versions of its interface, one for each form. CMedPort indexes more than 300,000 medical-related pages from mainland China, Hong Kong and Taiwan, collected with “SpidersRUs”, a spidering toolkit developed by the AI Lab. It also meta-searches six major search engines from those three regions. At search time, an encoding conversion program lets users search all three regions simultaneously and see the result list in the form of Chinese characters they are familiar with. Once the results are returned, CMedPort provides summarization and categorization functions for post-retrieval analysis. The Chinese summarizer is adapted from TXTRACTOR, an English summarizer developed at the AI Lab. It uses cue phrases and tf*idf scores to select summary sentences from the original document. The categorizer extracts the most frequent key phrases from the titles and summaries of the returned documents and uses those phrases as folder topics, thus giving an overview of the documents.
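The sentence-selection idea behind the summarizer can be illustrated with a minimal Python sketch. It is not the TXTRACTOR or CMedPort code. The cue-phrase list, the bonus weight, the averaging of tf*idf over sentence length and the function names are all assumptions made for illustration. English text is used to keep the example short; Chinese text would first need word segmentation, which is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical cue phrases and bonus weight; the values actually used
# by the system are not given in the text.
CUE_PHRASES = ("in summary", "in conclusion", "results show")
CUE_BONUS = 1.0


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(document, background_docs, num_sentences=2):
    """Return the top-scoring sentences of `document` in original order."""
    # Document frequencies are computed over the background documents
    # plus the document itself, so every term has df >= 1.
    corpus = [set(tokenize(d)) for d in background_docs + [document]]
    n_docs = len(corpus)

    def idf(term):
        df = sum(1 for doc_terms in corpus if term in doc_terms)
        return math.log(n_docs / df)

    tf = Counter(tokenize(document))

    scored = []
    for position, sentence in enumerate(split_sentences(document)):
        terms = tokenize(sentence)
        if not terms:
            continue
        # Average tf*idf over the sentence so long sentences are not favored.
        score = sum(tf[t] * idf(t) for t in terms) / len(terms)
        # Add a fixed bonus when the sentence contains a cue phrase.
        if any(cue in sentence.lower() for cue in CUE_PHRASES):
            score += CUE_BONUS
        scored.append((score, position, sentence))

    top = sorted(scored, key=lambda item: -item[0])[:num_sentences]
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


if __name__ == "__main__":
    doc = ("Aspirin reduces fever. The weather was nice. "
           "In summary, aspirin reduces fever and pain.")
    background = ["The weather was nice today.", "The market was open."]
    for line in summarize(doc, background):
        print(line)
```

In this toy input the second sentence scores lowest, because its words also appear in the background documents and so have low idf, while the third sentence gains the cue-phrase bonus. The same scoring scheme could rank any candidate text units, for example key phrases rather than sentences.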


