CBizPort (Chinese Business Intelligence Portal) is an Internet search portal for Chinese business information. It searches for business information in major Chinese search engines and business portals in mainland China, Taiwan and Hong Kong.
CBizPort is powered by a set of meta-search engines that integrate several high-quality online information resources. It converts between Simplified Chinese and Traditional Chinese encodings to support cross-regional search. Post-retrieval analysis functions, including summarization and categorization, are also provided.
Major components of CBizPort:
User Interface: CBizPort has two interface versions, one in Simplified Chinese and one in Traditional Chinese, each designed for users of the corresponding character set. Both versions have the same look and feel, and each uses its own character encoding when processing queries.
Encoding Converter: The encoding converter relies on a conversion dictionary with 6,737 Chinese characters in each of the two encodings (Big5 and GB2312). The dictionary includes the most commonly used characters in the Chinese language. Encoding conversion is performed when the portal sends queries to search engines that use a different encoding from its own, and again when it collects results from those search engines.
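The dictionary-based conversion described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the four-entry mapping table stands in for the 6,737-character dictionary, the function names are hypothetical, and Python's built-in `gb2312` and `big5` codecs handle the byte-level decoding and encoding. A plain one-to-one table also ignores the fact that some Simplified characters correspond to more than one Traditional character.

```python
# Illustrative subset only; the real dictionary is described as holding
# 6,737 characters per encoding. These names are hypothetical.
SIMPLIFIED_TO_TRADITIONAL = {"国": "國", "业": "業", "经": "經", "济": "濟"}
TRADITIONAL_TO_SIMPLIFIED = {t: s for s, t in SIMPLIFIED_TO_TRADITIONAL.items()}


def _map_chars(text: str, table: dict) -> str:
    # Characters missing from the table (shared forms, Latin text) pass through.
    return "".join(table.get(ch, ch) for ch in text)


def gb2312_to_big5(data: bytes) -> bytes:
    """Convert GB2312-encoded Simplified Chinese to Big5-encoded Traditional Chinese."""
    text = data.decode("gb2312")
    return _map_chars(text, SIMPLIFIED_TO_TRADITIONAL).encode("big5", errors="replace")


def big5_to_gb2312(data: bytes) -> bytes:
    """Convert Big5-encoded Traditional Chinese to GB2312-encoded Simplified Chinese."""
    text = data.decode("big5")
    return _map_chars(text, TRADITIONAL_TO_SIMPLIFIED).encode("gb2312", errors="replace")


if __name__ == "__main__":
    query = "中国经济".encode("gb2312")   # query typed in the Simplified interface
    outgoing = gb2312_to_big5(query)      # sent to a Big5 search engine
    print(outgoing.decode("big5"))
```

In this sketch the same pair of functions would serve both directions of the flow: outgoing queries are converted to the target engine's encoding, and returned results are converted back to the encoding of the interface in use.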
Meta Search: Authoritative information sources are selected for meta-searching, including major Chinese search engines and business-related portals from the three regions. General search engines include Baidu, Yahoo Hong Kong and Yam. Business-related portals include several commercial and government Web sites.
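As a rough illustration of meta-searching, the following Python sketch sends one query to several sources and merges the results. Everything here is an assumption made for the example: the source does not describe how CBizPort merges or de-duplicates results, the stub engines return canned data instead of contacting real sites, and the `meta_search` function and its result fields are hypothetical.

```python
from typing import Callable, Dict, List

Result = Dict[str, str]                  # assumed fields: "title", "url"
Engine = Callable[[str], List[Result]]   # a source maps a query to results


def meta_search(query: str, engines: Dict[str, Engine]) -> List[Result]:
    """Query every engine, tag each hit with its source, drop duplicate URLs."""
    merged: List[Result] = []
    seen_urls = set()
    for name, search in engines.items():
        try:
            hits = search(query)
        except Exception:
            continue  # one failing source should not sink the whole search
        for hit in hits:
            if hit["url"] in seen_urls:
                continue
            seen_urls.add(hit["url"])
            merged.append({**hit, "source": name})
    return merged


# Stub sources with canned results (hypothetical, no network access).
def engine_a(query: str) -> List[Result]:
    return [{"title": "A1", "url": "http://example.com/1"},
            {"title": "A2", "url": "http://example.com/2"}]


def engine_b(query: str) -> List[Result]:
    return [{"title": "B1", "url": "http://example.com/2"},
            {"title": "B2", "url": "http://example.com/3"}]


def broken_engine(query: str) -> List[Result]:
    raise TimeoutError("source unavailable")


if __name__ == "__main__":
    sources = {"engine_a": engine_a, "broken": broken_engine, "engine_b": engine_b}
    for hit in meta_search("投资", sources):
        print(hit["source"], hit["url"])
```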
Categorization: The CBizPort categorizer organizes the documents retrieved by meta-searching into categories based on the occurrence of keywords extracted from each document's title and introduction. Two Chinese business lexicons, one for Simplified Chinese and one for Traditional Chinese, are used to extract keywords from Web pages. Categorized documents are placed in folders labeled with the key phrases to make the results easier to browse.
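The folder-building idea can be sketched in Python as below. This is a simplified illustration, not the CBizPort categorizer itself: the three-phrase lexicon, the `title` and `intro` field names, and the plain substring match are all assumptions, and a real system would likely need proper Chinese phrase extraction.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical miniature lexicon (investment, trade, bank) standing in for
# the business lexicons described above.
LEXICON = ["投资", "贸易", "银行"]


def categorize(documents: List[Dict[str, str]],
               lexicon: List[str]) -> Dict[str, List[str]]:
    """Group document titles into folders keyed by the lexicon phrases they contain."""
    folders: Dict[str, List[str]] = defaultdict(list)
    for doc in documents:
        # Only the title and introduction are inspected, as described above.
        text = doc["title"] + " " + doc["intro"]
        for phrase in lexicon:
            if phrase in text:
                folders[phrase].append(doc["title"])
    return dict(folders)


if __name__ == "__main__":
    docs = [
        {"title": "香港银行业报告", "intro": "银行与投资趋势"},
        {"title": "两岸贸易统计", "intro": "贸易额上升"},
        {"title": "投资指南", "intro": "外商投资政策"},
    ]
    for phrase, titles in categorize(docs, LEXICON).items():
        print(phrase, titles)
```

A document can land in more than one folder in this sketch, since each folder only records which results mention its key phrase.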
Summarization: The CBizPort summarizer is adapted from an English summarizer called TXTRACTOR, which uses sentence-selection heuristics to rank text segments. These heuristics strive to reduce redundancy of information in a query-based summary. The summarizer can flexibly summarize Web pages in one to five sentences.
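To make the idea of query-based sentence selection with redundancy reduction concrete, here is a minimal Python sketch. It is not TXTRACTOR's actual heuristic set, which the text above does not spell out. The scoring rule (character overlap with the query, minus a penalty for overlap with sentences already chosen), the `redundancy_penalty` parameter and all function names are assumptions made for illustration.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    # Split on Chinese sentence-ending punctuation, keeping the punctuation.
    return [s.strip() for s in re.findall(r"[^。！？]+[。！？]?", text) if s.strip()]


def chars(text: str) -> Set[str]:
    # Character-level "tokens" avoid the need for a Chinese word segmenter.
    return {ch for ch in text if ch.isalnum()}


def summarize(text: str, query: str, max_sentences: int = 3,
              redundancy_penalty: float = 0.5) -> List[str]:
    """Greedily pick sentences relevant to the query while penalizing repetition."""
    if not 1 <= max_sentences <= 5:
        raise ValueError("max_sentences must be between 1 and 5")
    query_chars = chars(query)
    candidates = list(enumerate(split_sentences(text)))
    chosen = []           # (original position, sentence)
    covered: Set[str] = set()

    def score(item) -> float:
        tokens = chars(item[1])
        if not tokens:
            return 0.0
        relevance = len(tokens & query_chars) / max(len(query_chars), 1)
        redundancy = len(tokens & covered) / len(tokens)
        return relevance - redundancy_penalty * redundancy

    while candidates and len(chosen) < max_sentences:
        best = max(candidates, key=score)   # ties go to the earliest sentence
        candidates.remove(best)
        chosen.append(best)
        covered |= chars(best[1])
    return [sentence for _, sentence in sorted(chosen)]  # restore document order


if __name__ == "__main__":
    page = "香港银行公布年度业绩。香港银行公布年度业绩报告。台湾贸易额上升。今天天气晴朗。"
    print(summarize(page, "银行贸易", max_sentences=2))
```

In this example the second sentence nearly repeats the first, so the penalty steers the second pick toward the trade sentence instead.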