<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
	<title>Converting Adblock Plus filters to Epiphany's Adblock format</title>
	<link rel="stylesheet" href="http://kouznetsov.awardspace.com/styles_text.css" type="text/css" />
</head>
<body>
<h1>Converting <span class="emph">Adblock Plus</span> filters to <span class="emph">Epiphany's Adblock</span> format</h1>

<div class="credentials">
Andrei Kouznetsov
<p style="font-style: italic; margin: 0; font-size: smaller; margin-top: 0.5em;">Department of Mathematics</p>
<p style="font-style: italic; margin: 0; font-size: smaller;">Washington State University</p>
<p style="font-style: italic; margin: 0; font-size: smaller;">Pullman, WA, 99163</p>
<p style="font-style: italic; margin: 0; font-size: smaller;">USA</p>
<p style="font-style: italic; margin: 0; font-size: smaller; margin-top: 0.5em;">
	email: akouznet<span style="display: none;">&lt;EMPTY></span>@<span style="display: none;">&lt;EMPTY></span>math.wsu.edu
</p>
</div>

<div class="text">
<p>
I am not happy with Epiphany's standard <span class="emph">Adblock</span> filter because it misses many ads.
On the other hand, there is a great <span class="emph">Firefox</span> extension called <span class="emph">Adblock Plus</span> that catches almost
everything (the official site is <a href="http://adblockplus.org/en/">here</a>).
So I decided to take the filter list from <span class="emph">Adblock Plus</span> and convert it
into a form that can be fed to <span class="emph">Epiphany</span>.
</p>
<p>
Unfortunately, it does not seem possible to convert every rule.
The script below simply skips the rules that cannot be converted (or, at least, that I do not know how to convert).
</p>
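
<p>
As a rough illustration of which rules get skipped, the sketch below applies the same test the script uses
(a rule containing #, |, $ or ^ is treated as not convertible) to a few sample rules.
The helper name is_convertible and the sample rules are made up for this example; they are not taken from any real filter list.
</p>
<pre class="code"><![CDATA[
#!/usr/bin/perl
use strict;
use warnings;

# Same test as in the conversion script: a rule that contains
# '#', '|', '$' or '^' is treated as not convertible.
# (is_convertible is a hypothetical helper name.)
sub is_convertible {
	my $rule = shift;
	return $rule !~ /#|\||\$|\^/ ? 1 : 0;
}

# Made-up sample rules, for illustration only.
my @samples = (
	'/banners/*',
	'||ads.example.com^',
	'example.com##.advert',
	'/adframe.$script',
);

for my $rule (@samples) {
	printf "%-24s %s\n", $rule, is_convertible($rule) ? 'converted' : 'skipped';
}
]]></pre>
<p>
Of these four samples, only the first one passes the test; the other three would be skipped.
</p>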

<p>
Applying the <a href="convert">script</a> to the filter list from
<a href="http://easylist.adblockplus.org/morpeh+easylist.txt" rel="nofollow">http://easylist.adblockplus.org/morpeh+easylist.txt</a>
yields the <a href="blacklist">blacklist</a> and the <a href="whitelist">whitelist</a>. Just put both files in
</p>
<pre class="code">~/.gnome2/epiphany/extensions/data/adblock/</pre>

<p>
I have noticed that with a long blacklist the <span class="emph">Adblock</span> GUI takes longer to open.
This should be acceptable, though, since one rarely needs to open it.
</p>

<p> And here is the script for those who are interested. </p>

<pre class="code"><![CDATA[
#!/usr/bin/perl

##########################################################
# this script works as a filter
# usage: cat list.txt | convert > blacklist
#     or cat list.txt | convert --white > whitelist
##########################################################

use strict;
use Getopt::Long;

# if $white is set to true, then the whitelist is printed
# otherwise print the blacklist
my $white = 0;
GetOptions('white' => \$white);

# the first line is a header, skip it
<>;

while (my $line = <>){
	chomp $line; # remove trailing EOL
	next if $line =~ /^!/; # if it is a comment, go to the next line
	if ($white){
		# keep only whitelist rules and strip their @@ prefix
		next unless $line =~ s/^@@//;
	} else {
		next if $line =~ /^@@/; # skip whitelist rules
	}
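	# print the rule only if it could be converted to a non-empty pattern
	my $converted = convert($line);
	print "$converted\n" if defined $converted && length $converted;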
}

sub convert {
	my $pattern = shift;
	
	# rules that use element hiding (#), anchors (|), options ($)
	# or the separator placeholder (^) are not converted: return undef
	return if $pattern =~ /#|\||\$|\^/;
	
	# first, quote the dot and the hyphen so that they match literally
	$pattern =~ s/([.\-])/\\$1/g;
	
	# add the start and the end matchers if necessary
	$pattern =~ s/^(?!\*)/\^/;
	$pattern =~ s/(?<!\*)$/\$/;
	
	# remove * from the start and the end of the pattern
	$pattern =~ s/^\*|\*$//g;
	
	# replace * with .*
	$pattern =~ s/\*/\.\*/g;
	
	# replace ? with .
	$pattern =~ s/\?/\./g;
	return $pattern;
}
]]></pre>
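
<p>
To see what the conversion does, here is a small sketch that runs a condensed copy of the convert subroutine on a few sample rules.
The back-reference is written here as $1. The sample rules are made up for this example.
</p>
<pre class="code"><![CDATA[
#!/usr/bin/perl
use strict;
use warnings;

# Condensed copy of the convert subroutine from the script above.
sub convert {
	my $pattern = shift;
	return if $pattern =~ /#|\||\$|\^/;   # rule cannot be converted
	$pattern =~ s/([.\-])/\\$1/g;         # quote the dot and the hyphen
	$pattern =~ s/^(?!\*)/\^/;            # anchor the start unless it begins with *
	$pattern =~ s/(?<!\*)$/\$/;           # anchor the end unless it ends with *
	$pattern =~ s/^\*|\*$//g;             # drop leading and trailing *
	$pattern =~ s/\*/\.\*/g;              # * becomes .*
	$pattern =~ s/\?/\./g;                # ? becomes .
	return $pattern;
}

# Made-up sample rules, for illustration only.
for my $rule ('/banners/*', '*/ad-server.example.com/*', '*/ads/*.gif') {
	print "$rule  =>  ", convert($rule), "\n";
}
# /banners/*                 =>  ^/banners/
# */ad-server.example.com/*  =>  /ad\-server\.example\.com/
# */ads/*.gif                =>  /ads/.*\.gif$
]]></pre>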
</div>

</body>
</html>

