Commercial Internet Filters: Perils and Opportunities

Organizations are becoming increasingly aware of Internet abuse in the workplace. Such abuse results in loss of workers' productivity, network congestion, security risks, and legal liabilities. To address this problem, organizations have started to adopt Internet usage policies, management training, and filtering software. Several commercial Internet filters are experiencing an increasing number of organizational adoptions. These products mainly rely on black lists, white lists, and keyword/profile matching to filter out undesired web pages. In this paper, we describe three top-ranked commercial Internet filters – CYBERsitter, Net Nanny, and CyberPatrol – and evaluate their performance in the context of an Internet abuse problem. We then propose a text mining approach to address the problem and evaluate its performance using six different classification algorithms: naïve Bayes, multinomial naïve Bayes, support vector machine, decision tree, k-nearest neighbor, and neural network. The evaluation results point to the perils of using commercial Internet filters on the one hand, and to the prospects of using text mining on the other. The proposed text mining approach outperforms the commercial filters. We discuss the possible reasons for the relatively poor performance of the filters and the steps that could be taken to improve their performance.
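
As an illustration of the black list, white list, and keyword matching idea mentioned in the abstract, the following is a minimal sketch in Python. It is not how any of the named products work internally; the host names, the keyword set, the `is_blocked` function, and the `keyword_threshold` parameter are all hypothetical and chosen only for the example.

```python
from urllib.parse import urlparse

# Hypothetical lists, invented for this sketch only.
WHITELIST = {"intranet.example"}   # hosts that are always allowed
BLACKLIST = {"blocked.example"}    # hosts that are always blocked
KEYWORDS = {"casino", "poker"}     # terms counted in the page text


def is_blocked(url, page_text, keyword_threshold=2):
    """Return True if the page should be filtered out.

    Order of checks (an assumption of this sketch): white list first,
    then black list, then a simple keyword count on the page text.
    """
    host = urlparse(url).hostname or ""
    if host in WHITELIST:
        return False
    if host in BLACKLIST:
        return True
    # Lowercase the text, split on whitespace, strip basic punctuation.
    words = (w.strip(".,;:!?\"'()") for w in page_text.lower().split())
    hits = sum(1 for w in words if w in KEYWORDS)
    return hits >= keyword_threshold
```

A rule of this kind is easy to state but brittle: a single keyword in a news article may or may not trip it depending on the threshold, which hints at why a trained text classifier can behave differently from list-based filtering.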

Publication

Decision Support Systems 48 (2010) 521–530

Authors

Chen-Huei Chou, Atish P. Sinha, and Huimin Zhao

About Decision Support Systems

The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs). Manuscripts may draw from diverse methods and methodologies, including those from decision theory, economics, econometrics, statistics, computer supported cooperative work, data base management, linguistics, management science, mathematical modeling, operations management, cognitive science, psychology, user interface management, and others. However, a manuscript focused on direct contributions to any of these related areas should be submitted to an outlet appropriate to the specific area.

Source Normalized Impact per Paper (SNIP): 2.271
SCImago Journal Rank (SJR): 2.262
Impact Factor: 2.604
5-Year Impact Factor: 3.271