CBizPort (Chinese Business Intelligence Portal)
is an Internet search portal for Chinese
business information. It searches
for business information in major
Chinese search engines and business
portals in mainland China, Taiwan
and Hong Kong.
CBizPort is powered
by a set of meta-search engines that
integrate several high-quality online
information resources. It enables
encoding conversion between Simplified
Chinese and Traditional Chinese to
support cross-regional search. Post-retrieval
analysis functions, including summarization
and categorization, are also provided.
in the CBizPort:
CBizPort has two interface versions,
one in Simplified Chinese and one in
Traditional Chinese. Each is designed
for users of the corresponding script.
Both versions have the same look and
feel, and each version uses its
respective character encoding when
processing queries.
The encoding converter relies on a
conversion dictionary with 6,737 Chinese
characters in each of the two encodings
(Big5 and GB2312). The dictionary
includes the most commonly used characters
in the Chinese language. Encoding
conversion is performed when the portal
sends queries to search engines that
use a different encoding from its own,
and when it collects results from
those search engines.
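
The dictionary-based conversion described
above can be sketched in Python as follows.
This is a minimal illustration, not the
CBizPort implementation: the six-entry
table and the function names convert,
big5_to_gb2312 and gb2312_to_big5 are
assumptions made for the example, and a
one-to-one character mapping is assumed
throughout.

```python
# Hypothetical miniature conversion dictionary (Traditional -> Simplified).
# The dictionary described in the text holds 6,737 characters; these few
# entries are illustrative only.
TRAD_TO_SIMP = {
    "國": "国",
    "業": "业",
    "資": "资",
    "訊": "讯",
    "經": "经",
    "濟": "济",
}
# Reverse table, valid here because the sample mapping is one-to-one.
SIMP_TO_TRAD = {simp: trad for trad, simp in TRAD_TO_SIMP.items()}


def convert(text: str, table: dict) -> str:
    """Map each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def big5_to_gb2312(data: bytes) -> bytes:
    """Decode Big5 bytes, map Traditional to Simplified, re-encode as GB2312."""
    return convert(data.decode("big5"), TRAD_TO_SIMP).encode("gb2312")


def gb2312_to_big5(data: bytes) -> bytes:
    """Decode GB2312 bytes, map Simplified to Traditional, re-encode as Big5."""
    return convert(data.decode("gb2312"), SIMP_TO_TRAD).encode("big5")


# Example: a Traditional Chinese query converted before being sent to a
# search engine that expects GB2312.
query_big5 = "中國商業資訊".encode("big5")
query_gb = big5_to_gb2312(query_big5)
```

In practice some Simplified characters
correspond to more than one Traditional
character, so a real reverse table would
need a rule for choosing among candidates.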
Authoritative information sources
are selected for meta-searching. They
include major Chinese search engines
and business-related portals from
the three regions. General search
engines include Baidu, Yahoo Hong
Kong and Yam. Business-related portals
include several commercial and government
The CBizPort categorizer organizes
the documents retrieved from the meta-searching
into different categories based on
the occurrence of keywords extracted
from the title and introduction of
the documents. Two Chinese business
lexicons, one for Simplified Chinese
and one for Traditional Chinese, are
used to extract keywords from Web pages.
Categorized documents are put into
folders labeled with the key phrases
to help users browse the results.
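
The keyword-based foldering described
above might look roughly like the Python
sketch below. It is an illustration under
stated assumptions, not the CBizPort
categorizer: the four-phrase lexicon, the
document fields title and intro, the
max_folders parameter, and the choice of
the most frequent phrases as folder labels
are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical business lexicon (Simplified Chinese), for illustration only.
LEXICON = ["银行", "股票", "保险", "贸易"]


def extract_keywords(doc: dict) -> list:
    """Return the lexicon phrases that occur in the title or introduction."""
    text = doc["title"] + " " + doc["intro"]
    return [phrase for phrase in LEXICON if phrase in text]


def categorize(docs: list, max_folders: int = 3) -> dict:
    """Group document titles into folders labeled by the most common phrases."""
    # Count in how many documents each lexicon phrase occurs.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(extract_keywords(doc))

    # Assumption: the most frequent phrases become the folder labels.
    labels = [phrase for phrase, _ in doc_freq.most_common(max_folders)]

    # A document goes into every folder whose label it contains.
    folders = defaultdict(list)
    for doc in docs:
        keywords = extract_keywords(doc)
        for label in labels:
            if label in keywords:
                folders[label].append(doc["title"])
    return dict(folders)


docs = [
    {"title": "中国银行年报", "intro": "银行业绩与股票表现"},
    {"title": "股票市场分析", "intro": "本周股票走势"},
    {"title": "保险行业新闻", "intro": "保险公司动态"},
]
folders = categorize(docs)
```

Because a document is filed under every
label it contains, the same result can
appear in more than one folder.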
The CBizPort summarizer is modified
from an English summarizer called
TXTRACTOR that uses sentence-selection
heuristics to rank text segments.
These heuristics strive to reduce redundancy
of information in a query-based summary.
The summarizer can flexibly
summarize Web pages using one to five