Ben Armstrong wrote:
I solved it with HTML::TokeParser.
On Sat, 2004-06-05 at 19:17, robin wrote:
I've just updated that program for comparing texts to the Academic Word List. Changes:
1. No longer attempts to deal with HTML files - I'm putting this off for a while.
This certainly isn't the most efficient or elegant way to handle it, but what about filtering the HTML through lynx -dump -nolist? Otherwise, HTML::Parser, I guess (libwww-perl). Anyway, I've had fun playing with it. Nice work!
Ben
As a bonus, the program now allows you to check web pages by pasting in the URL. However, as with short text files, short web pages are likely to give artificially high scores (presumably because a lot of webspeak is in the AWL, e.g. "link").
Robin
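
For anyone wondering what the HTML step amounts to, here is a rough sketch of the general idea: pull the visible text out of a page with a token-style parser, then work out what share of the words is on a word list. This is not the program discussed above. It is a Python analogue that uses the standard library's html.parser in place of HTML::TokeParser, and the names TextExtractor, html_to_text and awl_share, as well as the three-word sample list, are made up for illustration.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the page's visible text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def awl_share(text, word_list):
    """Fraction of the words in `text` that appear in `word_list`."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in word_list)
    return hits / len(words)


if __name__ == "__main__":
    # Tiny illustrative word set, NOT the real Academic Word List.
    sample_list = {"link", "analyse", "data"}
    page = (
        "<html><head><title>T</title><style>p{color:red}</style></head>"
        "<body><p>Click the <a href='x'>link</a> to analyse data.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    text = html_to_text(page)
    print(text)
    print(round(awl_share(text, sample_list), 3))
```

On a page this short, the single word "link" already accounts for one word in seven, which is the kind of inflated score mentioned above for short web pages.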