%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Dreuw & Deselaer's Poster
% LaTeX Template
% Version 1.0 (11/04/13)
%
% Created by:
% Philippe Dreuw and Thomas Deselaers
% http://www-i6.informatik.rwth-aachen.de/~dreuw/latexbeamerposter.php
%
% This template has been downloaded from:
% http://www.LaTeXTemplates.com
%
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
\documentclass[final,hyperref={pdfpagelabels=false}]{beamer}
\usepackage[orientation=portrait,size=a0,scale=1.4]{beamerposter} % Use the beamerposter package for laying out the poster with a portrait orientation and an a0 paper size
\usetheme{I6pd2} % Use the I6pd2 theme supplied with this template
\usepackage[english]{babel} % English language/hyphenation
\usepackage{amsmath,amsthm,amssymb,latexsym} % For including math equations, theorems, symbols, etc
%\usepackage{times}\usefonttheme{professionalfonts} % Uncomment to use Times as the main font
%\usefonttheme[onlymath]{serif} % Uncomment to use a Serif font within math environments
\boldmath % Use bold for everything within the math environment
\usepackage{booktabs} % Top and bottom rules for tables
\graphicspath{{figures/}} % Location of the graphics files
\usecaptiontemplate{\small\structure{\insertcaptionname~\insertcaptionnumber: }\insertcaption} % A fix for figure numbering
%----------------------------------------------------------------------------------------
% TITLE SECTION
%----------------------------------------------------------------------------------------
\title{\huge Anti-scraping Tool: A system that blocks automatic scrapers through Computer Test and Human Test} % Poster title
\author{Paul Alejandro, Algene de la Paz, John Guevara, John Ong David, Alexis Pantola} % Author(s)
\institute{Computer Technology Department, De La Salle University} % Institution(s)
%----------------------------------------------------------------------------------------
% FOOTER TEXT
%----------------------------------------------------------------------------------------
\newcommand{\leftfoot}{Anti-scraping Tool} % Left footer text
\newcommand{\rightfoot}{Sentinel} % Right footer text
%----------------------------------------------------------------------------------------
\begin{document}
\addtobeamertemplate{block end}{}{\vspace*{2ex}} % White space under blocks
\begin{frame}[t] % The whole poster is enclosed in one beamer frame
\begin{columns}[t] % The whole poster consists of two major columns, each of which can be subdivided further with another \begin{columns} block - the [t] argument aligns each column's content to the top
\begin{column}{.02\textwidth}\end{column} % Empty spacer column
\begin{column}{.465\textwidth} % The first column
%----------------------------------------------------------------------------------------
% INTRODUCTION
%----------------------------------------------------------------------------------------
\begin{block}{Introduction}
\begin{itemize}
\item E-commerce is defined as the dealings and transactions that occur over the Internet [1]. Because product information and prices are openly available online, they have become a target for competing companies. Competitors use a method called web scraping, in which they extract information from a website for data manipulation [2]. Automatic scrapers, called bots, run programs that harvest large amounts of information at a rapid rate. Because they send requests so quickly, they also degrade the performance of the website. Anti-scraping tools and techniques were developed to counter them.
\end{itemize}
\end{block}
%----------------------------------------------------------------------------------------
% SENTINEL ANTI-SCRAPING TOOL
%----------------------------------------------------------------------------------------
\begin{block}{Sentinel Anti-scraping Tool}
\begin{columns} % Subdivide the first main column
\begin{column}{.54\textwidth} % The first subdivided column within the first main column
\begin{itemize}
\item Sentinel is an anti-scraping tool that does not depend solely on predetermined IP addresses. It also reduces the occurrence of scrapers attacking from new, authentic IP addresses. This paper discusses three of the five main modules of the system. The first, the Rate Limiter Module, limits the requests coming from users and checks whether the speed of the requests is suspicious. The second and third are the Computer Test Provider Module and the Computer Test Checker Module, respectively. These modules verify whether a user is an automatic scraper by means of a reverse CAPTCHA called the HoneyPot CAPTCHA, a CAPTCHA that is hidden from legitimate users.
\end{itemize}
\end{column}
\begin{column}{.43\textwidth} % The second subdivided column within the first main column
\centering
\begin{figure}
\includegraphics[width=15cm,height=30cm,keepaspectratio]{archi.jpg}
\caption{Architectural Design}
\end{figure}
\end{column}
\end{columns} % End of the subdivision
\end{block}
%----------------------------------------------------------------------------------------
% RATE LIMITER MODULE
%----------------------------------------------------------------------------------------
\begin{block}{Rate Limiter Module}
\begin{itemize}
\item This module limits the rate at which a user is allowed to send HTTP requests. It uses the Token Bucket algorithm, which can be implemented in any language; in this case the module is written in Java. Each user is given a ``bucket'' holding a set number of ``tokens''. Sending a request costs one token, and the system periodically adds a set number of tokens to each bucket so that the supply is not permanently depleted. If a user's bucket becomes empty, that user cannot send requests until the system refills the bucket.
\end{itemize}
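\begin{itemize}
\item The refill-and-spend rule described above can be written compactly. This is only a minimal sketch: the symbols are assumptions introduced here for illustration and are not taken from Sentinel's implementation. Let $n$ be the number of tokens in a user's bucket, $C$ the bucket capacity, and $k$ the number of tokens added every $T$ seconds.
\begin{align*}
\text{every } T \text{ seconds:} \quad & n \leftarrow \min(C,\; n + k) \\
\text{on each request:} \quad & \text{if } n \geq 1 \text{, serve it and set } n \leftarrow n - 1 \text{; otherwise reject it}
\end{align*}
Under these assumptions a user can send a burst of at most $C$ requests at once, and the sustained request rate cannot exceed $k/T$ requests per second.
\end{itemize}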
\begin{figure}
\centerline{\includegraphics[width=15cm,height=20cm,keepaspectratio]{nani.jpg}}
\caption{Token Bucket Algorithm [3]}
\end{figure}
\end{block}
%----------------------------------------------------------------------------------------
% COMPUTER TEST PROVIDER & COMPUTER TEST CHECKER MODULE
%----------------------------------------------------------------------------------------
\begin{block}{Computer Test Provider and Computer Test Checker Modules}
\begin{itemize}
\item These modules insert tests into HTTP reply packets to verify a user's authenticity and check whether the tests are answered correctly. The Computer Test Provider Module inserts an additional test depending on the level of the user involved, and the Computer Test Checker Module labels a user as a scraper once a specific test is failed. The tests implemented in these modules are CAPTCHAs.
\end{itemize}
\end{block}
%----------------------------------------------------------------------------------------
\end{column} % End of the first column
\begin{column}{.03\textwidth}\end{column} % Empty spacer column
\begin{column}{.465\textwidth} % The second column
%----------------------------------------------------------------------------------------
% RESULTS
%----------------------------------------------------------------------------------------
\begin{block}{Results: Table}
\begin{itemize}
\item Results of Rate Limiter Only (RLO)
\end{itemize}
\begin{table}
\begin{tabular}{l l l}
\toprule
\textbf{Requests/sec} & \textbf{Result: RLO} & \textbf{Avg. No. of Replies}\\
\midrule
10 & Detected & 12.8 \\
9 & Detected & 12 \\
8 & Detected & 12 \\
7 & Detected & 12.6 \\
6 & Detected & 11.6 \\
5 & Detected & 13.8 \\
4 & Detected & 14.6 \\
3 & Detected & 17.4 \\
2 & Detected & 21.8 \\
1 & Not Detected & 100* \\
\bottomrule
\end{tabular}
\caption{Test results: RLO (* 100 because the undetected scraper scraped every web page)}
\end{table}
\begin{itemize}
\item Results of Sentinel
\end{itemize}
\begin{table}
\begin{tabular}{l l l}
\toprule
\textbf{Requests/sec} & \textbf{Result: Sentinel} & \textbf{Avg. No. of Replies}\\
\midrule
10 & Detected & 1 \\
9 & Detected & 1 \\
8 & Detected & 1 \\
7 & Detected & 1 \\
6 & Detected & 1 \\
5 & Detected & 1 \\
4 & Detected & 1 \\
3 & Detected & 1 \\
2 & Detected & 1 \\
1 & Detected & 1 \\
\bottomrule
\end{tabular}
\caption{Test results: Sentinel}
\end{table}
\end{block}
%------------------------------------------------
\begin{block}{Results: Figure}
\begin{figure}
\includegraphics[width=25cm,height=20cm,keepaspectratio]{graph.jpg}
\caption{Graph of the test results}
\end{figure}
\end{block}
%----------------------------------------------------------------------------------------
% CONCLUSION
%----------------------------------------------------------------------------------------
\begin{block}{Conclusion}
\begin{itemize}
\item Rate limiting alone detects a scraper only when a specified speed limit is exceeded, so scraping tools that stay within the limit go undetected (false negatives). Sentinel adds the Computer Test Provider and Computer Test Checker Modules, which use the HoneyPot CAPTCHA to verify a user's authenticity, and this proved more effective at detecting scraping tools. In the tests, this approach detected the scraping tool at every tested rate, from 1 to 10 requests per second. It also kept the number of scraped web pages to a minimum compared with using the rate limiter alone.
\end{itemize}
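\begin{itemize}
\item As a rough worked check on the two result tables, the replies can be averaged over the ten tested rates. This averaging is illustrative and was not part of the original evaluation; $\bar{R}$ denotes the mean of the average number of replies, and the starred RLO entry is taken as 100.
\begin{align*}
\bar{R}_{\mathrm{RLO}} &= \frac{12.8 + 12 + 12 + 12.6 + 11.6 + 13.8 + 14.6 + 17.4 + 21.8 + 100}{10} = \frac{228.6}{10} \approx 22.9 \\
\bar{R}_{\mathrm{Sentinel}} &= \frac{10 \times 1}{10} = 1
\end{align*}
Under that reading, a scraper received about 23 replies on average with rate limiting alone, against 1 with Sentinel.
\end{itemize}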
\end{block}
%----------------------------------------------------------------------------------------
% REFERENCES
%----------------------------------------------------------------------------------------
\begin{block}{References}
[1] What is electronic commerce?, Webopedia, [online] n.d., http://www.webopedia.com/TERM/E/electronic\_commerce.html (Accessed: 28 January 2013).\\
[2] What is Web Scraping?, Webopedia, [online] n.d., http://www.webopedia.com/TERM/W/Web\_Scraping.html (Accessed: 28 January 2013).\\
[3] QoS Introduction, H3C, [online] n.d., http://www.h3c.com/portal/res/200705/31/20070531\_107793\_image004\_195599\_57\_0.jpg (Accessed: 30 October 2013).
\end{block}
%----------------------------------------------------------------------------------------
\end{column} % End of the second column
\begin{column}{.015\textwidth}\end{column} % Empty spacer column
\end{columns} % End of all the columns in the poster
\end{frame} % End of the enclosing frame
\end{document}