* The COPLINK system was initially developed, beginning in 1997, by the University of Arizona Artificial Intelligence Lab with funding from the National Institute of Justice and the National Science Foundation. With additional venture funding and product development, Knowledge Computing Corporation (KCC) now distributes, maintains, and updates the commercially available COPLINK Solution Suite.
Demo: COPLINK Authorship Analysis
Fingerprint-based identification is the oldest biometric technique successfully used in conventional crime investigation. The unique, immutable patterns of a fingerprint can help a crime investigator infer the identities of suspects. However, circumstances have changed with the emergence and rapid proliferation of cybercrime, which generally includes Internet fraud, computer hacking and network intrusion, cyber piracy, and the spreading of malicious code. Cyber criminals post online messages over various Web-based channels to distribute illegal materials, including pirated software, child pornography, and stolen property. Moreover, international criminals and terrorists such as Osama bin Laden use online messages as one of their major communication media. Because people are usually not required to provide their real identities in cyberspace, this anonymity makes identity tracing a critical problem in cybercrime investigation. The problem is further complicated by the sheer number of users and the volume of their online activity.
Fortunately, there is another type of print, which we call a “writeprint,” hidden in people’s writings. Like a fingerprint, a writeprint is composed of multiple features, such as vocabulary richness, sentence length, use of function words, paragraph layout, and key words. These features represent an author’s writing style, which is usually consistent across his or her writings; they can therefore serve as the basis for authorship attribution and facilitate identity tracing in cybercrime investigation.
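As a rough illustration of how such features can be turned into numbers, here is a minimal sketch that computes a handful of them from one message. It is illustrative only: the helper `extract_writeprint`, the feature names, and the tiny function-word and key-word lists are hypothetical, and the grouping into lexical, syntactic, structural, and content-specific features is our assumption about how these examples map onto the four feature types described in the next paragraph. Real writeprints use far larger feature sets.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists used only for illustration.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "is", "it")
KEY_WORDS = ("software", "price", "download")


def tokenize(text):
    """Lower-case the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_writeprint(text, function_words=FUNCTION_WORDS, key_words=KEY_WORDS):
    """Return a dict of simple (hypothetical) writeprint features for one message."""
    words = tokenize(text)
    n_words = len(words) or 1  # avoid division by zero on empty messages
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    counts = Counter(words)
    features = {
        # Lexical: vocabulary richness (type-token ratio) and words per sentence.
        "type_token_ratio": len(counts) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Structural: paragraph layout, here just the number of paragraphs.
        "n_paragraphs": len(paragraphs),
    }
    # Syntactic: relative frequency of each function word.
    for fw in function_words:
        features["fw_" + fw] = counts[fw] / n_words
    # Content-specific: relative frequency of each key word.
    for kw in key_words:
        features["kw_" + kw] = counts[kw] / n_words
    return features


message = "The price is low. Download the software today!\n\nIt is a deal."
f = extract_writeprint(message)
print(f["n_paragraphs"], f["avg_sentence_length"], round(f["type_token_ratio"], 3))
# 2 4.0 0.833
```

Relative frequencies rather than raw counts keep messages of different lengths comparable, which matters when the vectors are later fed to a classifier.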
We developed a framework for authorship identification of online messages to address the identity tracing problem. In this framework, four types of writing-style features (lexical, syntactic, structural, and content-specific) are extracted, and inductive learning algorithms are used to build feature-based classification models that identify the authors of online messages. To evaluate this framework, we conducted experiments on English and Chinese online newsgroup messages. We compared the discriminating power of the four feature types and the predictive power of three classification techniques: decision trees, back-propagation neural networks, and support vector machines. The experimental results showed that the proposed approach identified the authors of online messages with satisfactory accuracy. All four types of features contributed to discriminating between authors, and support vector machines outperformed the other two classification techniques in our experiments. The high performance on the Chinese dataset shows the potential of applying this approach in a multilingual context. Our proposed framework and techniques are promising for automatic identity tracing of cyber criminals.
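The classification step can be sketched as follows. This is a minimal, hypothetical sketch, not the implementation used in the experiments: it trains one linear support vector machine per author (one-vs-rest) with the Pegasos stochastic sub-gradient algorithm in pure Python and attributes a message to the author whose model scores it highest. The function names, the hyperparameters `lam` and `epochs`, and the three-feature toy vectors for authors A, B, and C are all assumptions made for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM trained with Pegasos (stochastic sub-gradient
    descent on the L2-regularized hinge loss). X is a list of feature
    vectors, y a list of labels in {+1, -1}. A constant 1.0 is appended
    to each vector so the bias is learned as an ordinary weight.
    Pegasos' optional projection step is omitted."""
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Shrink the weights (gradient step on the regularization term).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge loss is active: move w toward classifying x_i correctly.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def score(w, x):
    """Signed decision value w . [x, 1]."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_one_vs_rest(X, authors, **svm_kwargs):
    """Train one binary SVM per author: that author's messages vs. all others."""
    models = {}
    for author in sorted(set(authors)):
        y = [1 if a == author else -1 for a in authors]
        models[author] = train_linear_svm(X, y, **svm_kwargs)
    return models


def predict_author(models, x):
    """Attribute x to the author whose SVM gives the highest decision value."""
    return max(models, key=lambda a: score(models[a], x))


# Hypothetical toy data: three normalized style features per message.
X = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1],   # author A
     [0.1, 0.9, 0.2], [0.2, 0.8, 0.1],   # author B
     [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]   # author C
authors = ["A", "A", "B", "B", "C", "C"]
models = train_one_vs_rest(X, authors)
print(predict_author(models, [0.85, 0.15, 0.15]))  # A
```

One-vs-rest is one common way to extend a binary SVM to many candidate authors; in practice an established SVM library would be a more robust choice than this hand-rolled trainer.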
The following figure shows the differences in writing style based on the values of key features. The two authors shown in red and blue have very similar writing styles, but both differ significantly from the third author (green).
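One simple way to quantify a comparison like this is to average each author's feature vectors and measure the distance between the averages. The sketch below assumes the features have already been scaled to comparable ranges; the helper names and the numbers in the usage example are hypothetical and are not the data behind the figure, and Euclidean distance is only one of several reasonable measures.

```python
import math


def centroid(vectors):
    """Component-wise mean of an author's feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def style_distances(author_vectors):
    """Pairwise distances between authors' mean (pre-scaled) feature vectors."""
    centroids = {a: centroid(vs) for a, vs in author_vectors.items()}
    names = sorted(centroids)
    return {
        (a, b): euclidean(centroids[a], centroids[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical, pre-scaled feature vectors (two messages per author).
authors = {
    "red":   [[0.80, 0.40, 0.10], [0.78, 0.42, 0.12]],
    "blue":  [[0.79, 0.41, 0.11], [0.81, 0.39, 0.13]],
    "green": [[0.30, 0.90, 0.60], [0.32, 0.88, 0.58]],
}
distances = style_distances(authors)
closest_pair = min(distances, key=distances.get)
print(closest_pair)  # ('blue', 'red')
```

A small distance between two centroids corresponds to the visual closeness of two authors' profiles; without scaling, features with large ranges (such as sentence length in words) would dominate the distance.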
For additional information, please contact us.