Resources Page for MIS 611D and MIS 496A

Class Resources for MIS 496A, Special Topics in Data Analytics, and MIS 611-D, Topics in Data and Web Mining (Spring 2016)

Instructor:  Hsinchun Chen, Ph.D., Professor, Management Information Systems Dept, Eller College of Management, University of Arizona

TOPIC 1: Introduction

  1. University of Arizona MIS Program: Overview, by Dr. Hsinchun Chen
  2. Ideas for the Future of the IS Field, by G. B. Davis, P. Gray, S. Madnick, J. F. Nunamaker, R. Sprague, and A. Whinston.  Transactions on Management Information Systems, Volume 1, Issue 1, pp. 2:1 - 2:15. [PDF copy here]
  3. Design Science, Grand Challenges, and Societal Impacts, by Hsinchun Chen. Transactions on Management Information Systems, Volume 2, Issue 1, pp. 1:1 - 1:10. [PDF copy here]
  4. Journals, Conferences, and Funding Sources for MIS Researchers and Educators: A Resource Guide, by Dr. Hsinchun Chen (846K, 2003)
  5. ISI Ranking of Top Computer Science and Information Systems Journals, from the ISI Web of Knowledge, 2004-2007
  6. The H Index for MIS, April 2015
  7. Template for Producing IT Research and Publication, by Dr. Hsinchun Chen
    1. IEEE Template - Word Document
    2. IEEE Paper Examples
      1. Exploring Threats and Vulnerabilities in Hacker Web: Forums, IRC and Carding Shops - PDF (Benjamin et al., 2015)
      2. Developing Understanding of Hacker Language through the use of Lexical Semantics - PDF (Benjamin and Chen, 2015)
      3. Exploring Hacker Assets in Underground Forums - PDF (Samtani et al., 2015)
      4. Exploring Hacker Assets in Underground Forums - Word Document (Samtani et al., 2015)
  8. Design Science in Information Systems Research, by Alan R. Hevner, Salvatore T. March, Jinsoo Park, and Sudha Ram. MIS Quarterly, Volume 28, Number 1, pp. 75-105, March 2004.
  9. Predictive Analytics in Information Systems Research, by Galit Shmueli and Otto R. Koppius. MIS Quarterly, Volume 35, Number 3, pp. 553-572, September 2011.
  10. Positioning and Presenting Design Science Research for Maximum Impact, by Shirley Gregor and Alan R. Hevner. MIS Quarterly, Volume 37, Number 2, pp. 337-355, June 2013.
  11. MISQ BI Special Issue: Business Intelligence and Analytics: From Big Data to Big Impact, by Hsinchun Chen et al. (2012).

TOPIC 2: Data Mining

  1. Top 10 Algorithms in Data Mining (PDF)
  2. WEKA Overview (Sagar Samtani, Weifeng Li, and Hsinchun Chen, 2016)
    1. iris-train, iris-test, houses-train, houses-test
  3. Analytical and Visual Data Mining (5.29M)
  4. ID3 Handout
  5. Backpropagation Neural Network Handout
  6. Self-organizing maps: an introduction
  7. K-means algorithm
  8. Knowledge Management Systems: Development and Applications Part II: Techniques and Examples (2.46M)
  9. Expert Prediction, Symbolic Learning, and Neural Networks: An Experiment on Greyhound Racing, by Hsinchun Chen et al., IEEE Expert (December 1994)
  10. Introduction to Support Vector Machine (SVM) and Conditional Random Field (CRF) (Long Version, Short Version)
  11. Pattern Recognition using Support Vector Machine and Principal Component Analysis (Ahmed Abbasi)
  12. Predictive Analytics - Regression and Classification (Weifeng Li, Sagar Samtani, Hsinchun Chen, 2016)
  13. Logistic Regression and Elastic Net (Weifeng Li, Hsinchun Chen, 2016)
  14. Publicly Available Datasets (Sagar Samtani, Hsinchun Chen, 2016)
  15. Google masters Go (Nature, Elizabeth Gibney, January 28, 2016)
  16. Artificial Intelligence Go Showdown (The Economist, March 12, 2016)
  17. Detecting Fake Websites: The Contribution of Statistical Learning Theory, by Abbasi et al., September 2010 (MISQ) - PDF
  18. CyberGate: A Design Framework and System for Text Analysis of Computer-Mediated Communication, by Abbasi and Chen, December 2008 (MISQ) - PDF
  19. CyberGate: A Design Framework and System for Text Analysis of CMC, by Abbasi and Chen, 2008 - PPT
  20. Artificial Intelligence - Million Dollar Babies - The Economist, April 2, 2016
  21. Mastering the game of Go with deep neural networks and tree search (Nature, Silver et al., 2016)
  22. Deep Learning (Nature, LeCun et al., 2015)
  23. Deep Learning: An Overview (Weifeng Li, Victor Benjamin, Xiao Liu, and Hsinchun Chen, 2016)
  24. Topic Modeling and Latent Dirichlet Allocation: An Overview (Weifeng Li, Sagar Samtani, and Hsinchun Chen, 2016)
  25. Cybercriminal Jargon Identification and Analysis using Unsupervised Learning (Kangzhi Zhao and Hsinchun Chen, 2016)
  26. Exploring Topics and Key Hackers in Chinese Hacker Communities (Zhen Fang, Xinyi Zhao, and Hsinchun Chen, 2016)

TOPIC 3: Text Mining

  1. Information Visualization
  2. Information Visualization for Digital Library (2.21M)
  3. Visualizing Data (Hongyi Zhu, Sagar Samtani, Hsinchun Chen, 2016)
    1. Tableau Overview (Sagar Samtani and Hsinchun Chen, 2016)
    2. Sample NFL Dataset for Visualization
  4. Text Mining: Techniques, Tools, Ontologies and Shared Tasks, by Xiao Liu

TOPIC 4: Web Mining

  1. Inside Internet Search Engines, by Jan Pedersen and William Chang (SIGIR 1999)
    1. Fundamentals, by Jan Pedersen and William Chang (SIGIR 1999) (398K)
    2. Spidering and Indexing, by Jan Pedersen and William Chang (SIGIR 1999) (41K)
    3. Search, by Jan Pedersen and William Chang (SIGIR 1999) (553K)
    4. Products, by William Chang and Jan Pedersen (SIGIR 1999) (75K)
    5. Business, by William Chang and Jan Pedersen (SIGIR 1999) (37K)
  2. The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin and L. Page (1998)
  3. Page Rank and Google Story, by Vise and Malseed, 2005
  4. AI, Chapter 4, Winston (1984)
  5. GA Handout (27M)
  6. A Smart Itsy Bitsy Spider for the Web, Hsinchun Chen, et al. (1998)
  7. Optimal Search-Based Gene Subset Selection for Gene Array Cancer Classification, Jiexun Li, Hua Su, Hsinchun Chen (2007)
  8. Network Science (Sagar Samtani, Weifeng Li, Hsinchun Chen, 2016)
  9. The Great Giveaway (25M), by Erick Schonfeld, Business 2.0 (April 2005)
  10. The Long Tail, by Chris Anderson, WIRED Magazine (December 2004)
  11. Web 2.0 ... The Machine is Us/ing Us (YouTube)
  12. What Is Web 2.0? Design Patterns and Business Models for the Next Generation of Software, by Tim O'Reilly (2005)
  13. Facebook Story (2012)
  14. Communications of the ACM (2011):
    1. Reflecting on the DARPA Red Balloon Challenge, by John C. Tang et al. (April 2011)
    2. Crowdsourcing Systems on the World-Wide Web, by Anhai Doan et al. (April 2011)
    3. An Overview of Business Intelligence Technology, by Surajit Chaudhuri et al. (August 2011)
  15. Android Overview (Josh Dehlinger and Siddharth Kaza)
  16. World (Patent) War, from the Bloomberg Businessweek Technology section, March 12, 2012.
  17. The Netflix Recommender System: Algorithms, Business Value, and Innovation (Gomez-Uribe and Hunt, 2015)
  18. Data Science and Prediction, by Vasant Dhar (2013)
  19. Harvard Business Review (October 2012)
    1. Big Data: The Management Revolution (from HBR 12/12)
    2. Data Scientist: The Sexiest Job Of the 21st Century (from HBR 12/12)
    3. Making Advanced Analytics Work For You (from HBR 12/12)
  20. Hype Cycle for Business Intelligence, 2011, by Andreas Bitterer, Gartner Report (Aug. 12 2011)
  21. Magic Quadrant for Business Intelligence Platforms, by Rita L. Sallam et al., Gartner Report (Jan. 27 2011)
  22. The 2011 IBM Tech Trends Report, by IBM (Nov. 15th, 2011)
  23. The Economist, A Special Report on Social Networking: A World of Connections (January 30th 2010):
    1. A world of connections (from The Economist 1/30/10)
    2. Global swap shops (from The Economist 1/30/10)
    3. Twitter's transmitters (from The Economist 1/30/10)
    4. Profiting from friendship (from The Economist 1/30/10)
  24. The Economist, Data, Data, Everywhere: A Special Report on Managing Information (February 25th 2010); includes the following pieces:
    1. The data deluge
    2. Data, data everywhere
    3. All too much
    4. A different game
    5. Show me
    6. Needle in a haystack
    7. New rules for big data
    8. Clicking for gold
    9. Handling the cornucopia
    10. The open society
    11. Sources and acknowledgments
  25. The Economist, A Special Report on Personal Technology (October 8th 2011).  Includes the following sections:
    1. Beyond the PC
    2. The Power of Many
    3. The Beauty of Bite-sized Software
    4. IT's Arab Spring
    5. Up Close
  26. The Economist, Special Report, Cyber-Security, July 12, 2014: Defending the Digital Frontier.  Includes the following sections:
    1. Cybercrime: Hackers, Inc.
    2. Vulnerabilities: Zero-day game
    3. Business: Digital disease control
    4. Critical infrastructure: Crashing the system
    5. Market failures: Not my problem
    6. The Internet of Things: Home, hacked home
    7. Remedies: Prevention is better than cure
  27. Introduction to Web Application and APIs (Revised by Jonathan Jiang and Julian Guo):
    1. Flickr Photo Search API Sample Code
    2. Amazon Product Advertising API Sample Code
    3. YouTube Data API Sample Code
    4. Yelp API Sample Code
  28. Big Data Technology - Hadoop, MapReduce, and Spark (Jonathan Jiang, with updates from Sagar Samtani, 2016)

TOPIC 5: Emerging Research in Data and Web Mining (for MIS 611D)

  1. COPLINK, Dark Web, and Hacker Web: A Research Path in Security Informatics, by Dr. Hsinchun Chen
  2. Criminal Network Analysis and Visualization, by Jennifer Xu and Hsinchun Chen
  3. The Topology of Dark Networks, by Jennifer Xu and Hsinchun Chen
  4. An SIR Model for Violent Topic Diffusion in Social Media, by J. Woo, J. Son, and H. Chen (2011)
  5. Cybersecurity Project Overview, by Victor Benjamin
  6. Cybersecurity Research Overview, by Victor Benjamin
  7. CyberGate: A Design Framework and System for Text Analysis of CMC, by Ahmed Abbasi and Hsinchun Chen
  8. Optimal Search-Based Gene Subset Selection for Gene Array Cancer Classification, by Jiexun Li, Hua Su, Hsinchun Chen, and Bernard W. Futscher (2007)
  9. MedTime: A Temporal Information Extraction System for Clinical Narratives, by Yu-Kai Lin, Hsinchun Chen and Randall A. Brown (2013)
  10. Smart and Connected Health: Guest Editors' Introduction, by Gondy Leroy, Hsinchun Chen, and Thomas C. Rindflesch (2014)
  11. Time-To-Event Predictive Modeling for Chronic Conditions Using Electronic Health Records, by Yu-Kai Lin, Hsinchun Chen, Randall A. Brown, Shu-Hsing Li, and Hung-Jen Yang (2014)
  12. Identifying Adverse Drug Events from Patient Social Media: A Case Study for Diabetes, by Xiao Liu and Hsinchun Chen (2015)
  13. HackerWeb and Shodan Access (Jonathan Jiang)
    1. Hacker Web Sample Code
    2. Shodan Sample Code
  14. Homeland Security Data Mining using Social (Dark) Network Analysis, ISI 2008, Keynote Address, by Dr. Chen (18.4M)
  15. Health Big Data Analytics: Clinical Decision Support and Patient Empowerment, by Dr. Hsinchun Chen
  16. IEEE Intelligent Systems, Trends & Controversies; with introductions by Dr. Hsinchun Chen (2009, 2010, 2011):
    1. AI and Global Science and Technology Assessment, by Hsinchun Chen (July/August 2009)
    2. AI, E-Government, and Politics 2.0, by Hsinchun Chen (September/October 2009)
    3. AI for Global Disease Surveillance, by Hsinchun Chen and Daniel Zeng (November/December 2009)
    4. Business and Market Intelligence 2.0, by Hsinchun Chen (January/February 2010)
    5. AI and Opinion Mining, by Hsinchun Chen and David Zimbra (May/June 2010)
    6. AI and Security Informatics, by Hsinchun Chen (September/October 2010)
    7. AI, Virtual Worlds, and Massively Multiplayer Online Games, by Hsinchun Chen and Yulei Zhang (January/February 2011)
    8. Smart Health and Wellbeing, by Hsinchun Chen (September/October 2011)
    9. Smart Market and Money, by Hsinchun Chen (November/December 2011)

TOPIC 6: AI Lab Cybersecurity Papers: Journal and Conferences

  1. 2016 Benjamin et al. Examining Hacker Participation Length in Cybercriminal Internet-Relay-Chat Communities
  2. 2016 Li et al. Identifying and Profiling Key Sellers in Cyber Carding Community: AZSecure Text Mining System
  3. 2016 Li et al. Identify Key Data Breach Services with Nonparametric Supervised Topic Model
  4. 2016 Li et al. Identify High Quality Carding Services in Underground Economy using Nonparametric Supervised Topic Model
  5. 2016 Samtani et al. Identifying SCADA Vulnerabilities Using Passive and Active Vulnerability Assessment Techniques
  6. 2016 Samtani et al. AZSecure Hacker Assets Portal: Cyber Threat Intelligence and Malware Analysis
  7. 2016 Samtani and Chen Using Social Network Analysis to Identify Key Hackers for Keylogging Tools in Hacker Forums
  8. 2016 Li et al. Targeting Key Data Breach Services in Underground Supply Chain
  9. 2016 Grisham et al. Identifying Top Listers in Alphabay Using Latent Dirichlet Allocation
  10. 2016 Jicha et al. SCADA Honeypots: An In-depth Analysis of Conpot
  11. 2016 Jicha et al. Identifying Devices across the IPv4 Address Space
  12. 2016 Rohrmann et al. Anonymous Port Scanning: Performing Network Reconnaissance Through Tor
  13. 2016 Ercolani et al. Shodan Visualized
  14. 2016 Zhao et al. Chinese Underground Market Jargon Analysis Based on Unsupervised Learning
  15. 2016 Fang et al. Exploring Key Hackers and Cybersecurity Threats in Chinese Hacker Communities
  16. 2016 Huang and Chen Exploring the Online Underground Marketplaces through Topic-Based Social Network and Clustering
  17. 2016 Benjamin and Chen Identifying Language Groups within Multilingual Cybercriminal Forums
  18. 2015 Benjamin and Chen Developing Understanding of Hacker Language through the use of Lexical Semantics
  19. 2015 Benjamin et al. Exploring Threats and Vulnerabilities in Hacker Web: Forums, IRC and Carding Shops
  20. 2015 Samtani et al. Exploring Hacker Assets in Underground Forums
  21. 2014 Abbasi et al. Descriptive Analytics: Investigating Expert Hackers in Hacker Forums
  22. 2014 Patton et al. Uninvited Connections: A Study of the Vulnerable Devices on the Internet of Things (IoT)
  23. 2014 Benjamin and Chen Time-to-event Modeling for Predicting Hacker Community Participant Trajectory
  24. 2014 Li and Chen Identifying Top Sellers in Underground Economy Using Deep Learning-based Sentiment Analysis
  25. 2013 Benjamin et al. Evaluating text visualization: An experiment in authorship analysis
  26. 2013 Benjamin and Chen Machine learning for attack vector identification in malicious source code
  27. 2012 Benjamin and Chen Securing Cyberspace: Identifying Key Actors in Hacker Communities

MISCELLANEOUS RESOURCES


Return to Class Page for MIS 496A, Special Topics in Data Analytics

Return to Class Page for MIS 611-D, Topics in Data and Web Mining

Photo provided through courtesy of DARPA and available through Wikimedia Commons.