MIS 510, Web Computing and Mining

Course Description

The MIS 510 course introduces web computing and mining techniques, systems, and applications that are suitable for developing web-based information systems in e-commerce, knowledge management systems, web/data/text mining, business intelligence, security informatics, and health informatics. The course contains lectures, readings, lab sessions, and two hands-on group system development projects. The course will cover web mining, data mining, and text mining. In web mining, we will introduce web architecture, search engines, search algorithms, web services/APIs, Web 2.0/3.0, cloud computing, and mobile web. State-of-the-art data and text mining algorithms are discussed in the context of modern and emerging information systems in business, security, and health informatics. Selected data mining algorithms such as neural networks, decision trees, statistical learning, and social network analysis will be presented for clustering, classification, and predictive analytics problems. Information retrieval, natural language processing, sentiment analysis, authorship analysis, and information visualization will be discussed in text mining, especially for emerging business intelligence and big data applications.

Selected algorithms will be introduced in the classroom using English-like pseudo-code. The course requires hands-on web-based system development and business analytics work, with a focus on web computing and mining concepts and applications. Two TAs with strong technical skills will assist with guest lectures and with lab sessions on system development and analytics. Members of the highly regarded Tucson Desert Angels will also help develop and evaluate the final group web mining project ideas and business models. The class will give students cutting-edge web computing and mining knowledge and hands-on experience that are critical for future careers at leading Internet companies (e.g., Google, Facebook, Twitter, Amazon, Expedia, Microsoft, IBM) and/or for future web entrepreneurial activities.

Students work in teams on a variety of class projects.

Web Mining Project Resources (Past Classes)

Syllabus and Other Important Materials (Spring 2014)

  1. MIS 510 Syllabus
  2. TA Office Hours: Jonathan: Room 424, Tu/Th 1:30-3:30 PM; Julian: Room 424, Mo/We 1:30-3:30 PM. Please note that office hours for MIS 510 are held only during the following periods: Feb. 5th to Feb. 26th and Apr. 9th to Apr. 30th.
  3. Hsinchun Chen, (2001), Knowledge Management Systems: A Text Mining Perspective
  4. Hsinchun Chen, (2002), Trailblazing a Path Towards Knowledge and Transformation
  5. IEEE Intelligent Systems, Trends & Controversies; with introductions by Dr. Hsinchun Chen (2009, 2010, 2011):
    1. AI and Global Science and Technology Assessment, by Hsinchun Chen (July/August 2009)
    2. AI, E-Government, and Politics 2.0, by Hsinchun Chen (September/October 2009)
    3. AI for Global Disease Surveillance, by Hsinchun Chen and Daniel Zeng (November/December 2009)
    4. Business and Market Intelligence 2.0, by Hsinchun Chen (January/February 2010)
    5. AI and Opinion Mining, by Hsinchun Chen and David Zimbra (May/June 2010)
    6. A Lexicon Enhanced Method for Sentiment Classification, by Yan Dang, Yulei Zhang, and Hsinchun Chen (July/August 2010)
    7. AI and Security Informatics, by Hsinchun Chen (September/October 2010)
    8. AI, Virtual Worlds, and Massively Multiplayer Online Games, by Hsinchun Chen and Yulei Zhang (January/February 2011)
    9. Smart Health and Wellbeing, by Hsinchun Chen (September/October 2011)
    10. Smart Market and Money, by Hsinchun Chen (November/December 2011)
  6. MISQ BI Special Issue: Business Intelligence and Analytics: From Big Data to Big Impact, by Hsinchun Chen et al. (2012)
  7. Sports Data Mining Book (Dr. Hsinchun Chen)

Other Course-Related Materials (Papers)

  1. Harvard Business Review (October 2012)
    1. Big Data: The Management Revolution
    2. Data Scientist: The Sexiest Job of the 21st Century
    3. Making Advanced Analytics Work for You
  2. The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin and L. Page (1998)
  3. AI, Chapter 4, Winston (1984)
  4. GA Handout (27M)
  5. Assignment 1: GA (Spring 2012)
  6. A Smart Itsy Bitsy Spider for the Web
  7. Web 2.0 ... The Machine is Us/ing Us (YouTube)
  8. What Is Web 2.0? Design Patterns and Business Models for the Next Generation of Software, by Tim O'Reilly (2005)
  9. Web 2.0 (Wikipedia)
  10. The Long Tail, by Chris Anderson, WIRED Magazine (December 2004)
  11. The Great Giveaway (25M), by Erick Schonfeld, Business 2.0 (April 2005)
  12. Using Open Web APIs in Teaching Web Mining, by Hsinchun Chen et al. (2001)
  13. The Economist, A Special Report on Social Networking---A World of Connections (January 30th 2010):
    1. A world of connections
    2. Global swap shops
    3. Twitter's transmitters
    4. Profiting from friendship
  14. The Economist, Data, Data, Everywhere: A Special Report on Managing Information (February 25th 2010); includes the following pieces:
    1. The data deluge
    2. Data, data everywhere
    3. All too much
    4. A different game
    5. Show me
    6. Needle in a haystack
    7. New rules for big data
    8. Clicking for gold
    9. Handling the cornucopia
    10. The open society
    11. Sources and acknowledgments
  15. The Economist, A Special Report on Personal Technology---Beyond the PC (October 8th 2011):
    1. The Power of Many
    2. The Beauty of Bite-sized Software
    3. It's Arab Spring
    4. Up Close
  16. Communications of the ACM (2011):
    1. Reflecting on the DARPA Red Balloon Challenge, by John C. Tang et al. (April 2011)
    2. Crowdsourcing Systems on the World-Wide Web, by Anhai Doan et al. (April 2011)
    3. An Overview of Business Intelligence Technology, by Surajit Chaudhuri et al. (August 2011)
  17. IEEE Spectrum (June 2011): 5 Technologies That Will Shape the Web, by Elise Ackerman and Erico Guizzo
    1. China's Social Networking Problem, by Sky Canaves
    2. Welcome to the Surveillance Society, by Siva Vaidhyanathan
    3. The Revolution Will Not Be Monetized, by Bob Garfield
  18. Big Data, by Doug Henschen, InformationWeek (Oct. 2011)
  19. Magic Quadrant for Business Intelligence Platforms, by Rita L. Sallam et al., Gartner Report (Jan. 27 2011)
  20. Hype Cycle for Business Intelligence, 2011, by Andreas Bitterer, Gartner Report (Aug. 12 2011)
  21. The 2011 IBM Tech Trends Report, by IBM (Nov. 15th, 2011)
  22. MISQ BI Special Issue: Business Intelligence and Analytics: From Big Data to Big Impact, by Hsinchun Chen et al. (2012)
  23. Data Science and Prediction, by Vasant Dhar (2013)
  24. The Scientific Research Potential of Virtual Worlds, by William Sims Bainbridge, Science (July 27th 2007)
  25. Web Mining: Machine Learning for Web Applications, by Hsinchun Chen and Michael Chau (2004)
  26. Top 10 Algorithms in Data Mining (PDF)
  27. ID3 Handout
  28. Backpropagation Neural Network Handout
  29. Expert Prediction, Symbolic Learning, and Neural Networks: An Experiment on Greyhound Racing, by Hsinchun Chen et al., IEEE Expert (December 1994)
  30. Self-organizing Maps Handout
  31. SIGKDD Explorations (2009):
    1. What is Analytic Infrastructure and Why Should You Care?, by Robert L. Grossman
    2. What's PMML and What's New in PMML 4.0?, by Rick Pechter
    3. The WEKA Data Mining Software: An Update, by Mark Hall et al.
  32. Reynard: Broad Agency Announcement IARPA-BAA-09-05, issued by the Intelligence Advanced Research Projects Activity (IARPA), Incisive Analysis Office. This funding opportunity description "sets forth research areas of interest in the area of identifying behavioral indicators in Virtual Worlds (VWs) and Massive Multiplayer Online Games (MMOGs) that are predictive of real world characteristics of the users." (April 2009)
  33. Assignment 2: Neural Network (Spring 2009)
  34. Assignment 2: Iris dataset (Spring 2009)
  35. Prim's and Kruskal's Minimum Spanning Tree Algorithms
  36. Credit Rating Analysis with Support Vector Machines and Neural Network: A Market Comparative Study, by Zan Huang et al. (PPT)
  37. An Automatic Classification Approach to Business Stakeholder Analysis on the Web, by Wingyan Chung et al. (PPT)
  38. Major Web Intelligence Tools, by AI Lab
  39. Web Marketing Research (Dr. Hsinchun Chen)

Guest Lectures (Slides)

  1. Introduction to Web Application and APIs (Revised by Jonathan Jiang and Julian Guo)
  2. Cloud Computing Platforms (Jonathan Jiang and Julian Guo)
  3. Introduction to Weka and NetDraw
  4. Introduction to Support Vector Machine (SVM) and Conditional Random Field (CRF) (Long Version, Short Version)
  5. Programming with Amazon, Google, and eBay (slide set 1) (Chun-Ju Tseng)
  6. Programming with Amazon, Google, and eBay (slide set 2) (Chun-Ju Tseng)
  7. Software Agents, Multi-Agent Systems, and Data Mining (Dr. Daniel Zeng)
  8. Pattern Recognition using Support Vector Machine and Principal Component Analysis (Ahmed Abbasi)
  9. TimelyBid (Sean Humphreys)
  10. iDog (Chris Chang)
  11. Smart Gift Card (Gavin Zhang)
  12. Introduction to Web APIs (T.J. Fu)
  13. MapReduce/Hadoop (Jonathan Jiang)
  14. Android Overview (Josh Dehlinger and Siddharth Kaza)

Tutorials and Lab Sessions

  1. Data Collection and Web Crawling
    1. Data Collection Sample Code
  2. Tutorial for Cloud Computing Platforms
    1. Google App Engine Sample Code
  3. HackerWeb and Shodan Access (Jonathan Jiang)
    1. Hacker Web Sample Code
    2. Shodan Sample Code
  4. Introduction to Web Application and APIs
    1. Flickr Photo Search API Sample Code
    2. Amazon Product Advertising API Sample Code
    3. YouTube Data API Sample Code
    4. Yelp API Sample Code
  5. Recorded Videos
    1. HackerWeb and Data Collection
    2. API Examples

Class Lectures (Slides)

  1. UA MIS Program Overview (846K)
  2. Journals, Conferences, and Funding Sources for MIS Researchers and Educators: A Resource Guide, by Dr. Hsinchun Chen (846K)
  3. Cloud Computing Overview: Big Data and Business Analytics, by Dr. Hsinchun Chen, January 2014
  4. Page Rank and Google Story, by Vise and Malseed, 2005
  5. Facebook Story (2012)
  6. Inside Internet Search Engines: Fundamentals, by Jan Pedersen and William Chang (SIGIR 1999) (398K)
  7. Inside Internet Search Engines: Spidering and Indexing, by Jan Pedersen and William Chang (SIGIR 1999) (41K)
  8. Inside Internet Search Engines: Search, by Jan Pedersen and William Chang (SIGIR 1999) (553K)
  9. Inside Internet Search Engines: Products, by William Chang and Jan Pedersen (SIGIR 1999) (75K)
  10. Inside Internet Search Engines: Business, by William Chang and Jan Pedersen (SIGIR 1999) (37K)
  11. Introduction to Web Applications & APIs
  12. Web 2.0: Introduction, by Dr. Hsinchun Chen (2009)
  13. From Search Engines to Web Mining, by Dr. Hsinchun Chen
  14. An Introduction to Virtual World: Second Life and Beyond
  15. World (Patent) War, from the BloombergBusinessweek Technology section, March 12, 2012.
  16. Taiwan Semiconductor Manufacturing Company: Competitor Analysis, by Dr. Hsinchun Chen
  17. COPLINK, Dark Web, and Hacker Web: A Research Path in Security Informatics, by Dr. Hsinchun Chen
  18. Dark Web: Collection, Search, and Analysis, by Dr. Hsinchun Chen
  19. Cybersecurity Project Overview, by Victor Benjamin
  20. Cybersecurity Research Overview, by Victor Benjamin
  21. Shodan Introduction, from DefCon 18
  22. The National Cybersecurity Workforce Framework: Interactive How-To and Implementation Guide, by NICE (National Initiative for Cybersecurity Education) (a 2015 web update is also available)
  23. CyberGate: A Design Framework and System for Text Analysis of CMC, by Ahmed Abbasi and Hsinchun Chen
  24. Detecting Fake Websites: The Contribution of Statistical Learning Theory, by Abbasi, Zhang, Zimbra, Chen, and Nunamaker
  25. Web Mining: Machine Learning for Web Applications, by Hsinchun Chen and Michael Chau
  26. Challenges and Opportunities with Big Data, by Hammou Messatfa
  27. A Graph-based Recommender System, by Zan Huang, Wingyan Chung, Thian-Huat Ong, and Hsinchun Chen
  28. A Lexicon Enhanced Method for Sentiment Classification, by Yan Dang, Yulei Zhang, and Hsinchun Chen (2010)
  29. Business Intelligence and Analytics: Overview and Examples, by Dr. Hsinchun Chen
  30. Analytical and Visual Data Mining (5.29M)
  31. Homeland Security Data Mining using Social (Dark) Network Analysis, ISI 2008, Keynote Address, by Dr. Chen (18.4M)
  32. Text Mining: Techniques, Tools, Ontologies and Shared Tasks, by Xiao Liu
  33. Healthcare Informatics, by Dr. Chen (2012)
  34. Health Big Data Analytics: Clinical Decision Support and Patient Empowerment, by Dr. Hsinchun Chen
  35. Infectious Disease Informatics: Overview and The BioPortal Experience, by Dr. Chen (2012)
  36. Predicting Market Movements: From Breaking News to Emerging Social Media, by Dr. Chen (2012)
  37. Information Visualization for Digital Library (2.26M)
  38. Information Visualization
  39. Data Mining: Part I (1.96M)
  40. Data Mining: Part II (3.83M)
  41. Data Mining: Part III (3.35M)
  42. Knowledge Management Systems: Development and Applications Part I: Overview and Related Fields (1.91M)
  43. Knowledge Management Systems: Development and Applications Part II: Techniques and Examples (2.46M)
  44. Knowledge Management Systems: Development and Applications Part III: Case Studies and Future (13.91M)
  45. Internet Searching and Browsing in a Multilingual World (2.23M)
  46. An Automatic Text Mining Framework for Knowledge Discovery on the Web (3.43M)
  47. Achieving Information Resources Empowerment: A Digital Library and Knowledge Management Perspective (10.9M)
  48. Digital Library Development in the Asia Pacific (16.9M)
  49. What is Visual Analytics? Part I, by Jim Thomas (7 MB)
  50. What is Visual Analytics? Part II, by Jim Thomas (3 MB)