prazim
Posts: 263
Joined: Mon Aug 08, 2005 4:17 pm

Postby prazim » Tue Aug 09, 2005 5:48 pm

I am trying to install the WebScrape plugin so that I can view a number of different news release pages from our customers and competitors. I followed the directions that came with the plugin and pointed to it in my channels window, but that didn't install the plugin.

Here is one link I need to view in Awasu:
http://www.editorialmanager.com/homepage/press.html

I can provide more.

Thank you in advance for your help with this. This feature is a key reason why I am using Awasu.
Sue

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Fri Aug 12, 2005 4:27 am

There are two plugins available for monitoring web pages.

--- UrlMonitor ---

This is the simpler one. You just enter the URL of the page you want to monitor and it will check it regularly to see if anything has changed. This is how you use it:

- Download the plugin and unpack the ZIP file somewhere (e.g. C:\Program Files\Awasu\ChannelPlugins).
- In Awasu, start the Channel Wizard and browse to the new awasuMonitorURLs.exe file.
- On the next page, replace the awasu.com URL with the URL you want to monitor. You can monitor up to 10 URLs with a single channel.
- Finish the Channel Wizard.

You now have a channel that will monitor the URL(s) you specified and will let you know when changes are detected.

IMPORTANT NOTE: Items in this channel should never be marked as read, since Awasu will then not flag them as new.
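
For anyone curious what "checking whether anything has changed" can look like, here is a minimal Python sketch. It is not how awasuMonitorURLs.exe is implemented (that isn't described here); it just assumes one common approach, which is to store a hash of each page and compare it on the next check. The function names are made up for illustration.

```python
import hashlib


def fingerprint(page_bytes: bytes) -> str:
    """Return a SHA-256 digest of the raw page content."""
    return hashlib.sha256(page_bytes).hexdigest()


def changed_urls(previous: dict, current_pages: dict) -> list:
    """Compare stored fingerprints with freshly fetched pages.

    previous      maps URL -> fingerprint recorded at the last check
    current_pages maps URL -> page bytes fetched just now
    Returns the URLs whose content differs and updates `previous` in place.
    A URL that has never been seen before is reported as changed.
    """
    changed = []
    for url, body in current_pages.items():
        digest = fingerprint(body)
        if previous.get(url) != digest:
            changed.append(url)
        previous[url] = digest
    return changed
```

The page bytes could come from something like urllib.request.urlopen(url).read(). Note that a byte-for-byte comparison like this reports any difference at all, including trivial ones.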

--- WebScrape ---

This is more sophisticated but requires some technical knowledge. It extracts content from the page you are interested in and generates an RSS feed from that content. This is how you use it:

- Download the plugin and unpack the ZIP file somewhere (e.g. C:\Program Files\Awasu\ChannelPlugins).
- Copy the text below and save it in a file somewhere. This file controls what page will be monitored and what information will be extracted from it.

[ChannelParameters]
URL=http://www.editorialmanager.com/homepage/press.html
Title=Editorial Manager
Description=
BaseUrl=http://editorialmanager.com/homepage/
MaxItems=15
Shorthand=
SectionPattern=class="interiorcontent"(.*)</table>
ItemPattern-1=<div><a>(?P<T>.*?)</a>
ItemPattern-2=.*?<a href=".*?>(?P<D>.*?)</font>
ItemPattern-3=

- In Awasu, start the Channel Wizard and browse to the new WebScrape.exe file.
- In the next page, enter the path to the config file you saved above.
- Finish the Channel Wizard.

You now have a channel that will extract information off the page you are interested in and return it to you in an RSS feed.
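
To give a feel for what the patterns in the config file do, here is a rough Python sketch of section-then-item matching. It is an illustration only, not the plugin's code: the sample HTML is invented, the item pattern is a simplified single pattern rather than the ItemPattern-1/ItemPattern-2 pair above, and it assumes the named groups T and D stand for an item's title and description.

```python
import re

# Hypothetical page fragment; the real press page will look different.
SAMPLE_HTML = """
<html><body>
<table class="interiorcontent">
<tr><td><div><a href="pr1.html">First release</a></div><font>Summary one</font></td></tr>
<tr><td><div><a href="pr2.html">Second release</a></div><font>Summary two</font></td></tr>
</table>
</body></html>
"""

# Same SectionPattern as in the config file: narrows the page to one region.
SECTION_PATTERN = r'class="interiorcontent"(.*)</table>'

# Simplified item pattern (an assumption for this sketch, not the real one).
ITEM_PATTERN = r'<a href=".*?">(?P<T>.*?)</a>.*?<font>(?P<D>.*?)</font>'


def scrape(html, max_items=15):
    """Return up to max_items dicts with 'title' and 'description' keys."""
    section = re.search(SECTION_PATTERN, html, re.DOTALL)
    if section is None:
        # The layout changed (or this is the wrong page): nothing to extract.
        return []
    items = []
    for match in re.finditer(ITEM_PATTERN, section.group(1), re.DOTALL):
        items.append({"title": match.group("T"),
                      "description": match.group("D")})
        if len(items) >= max_items:
            break
    return items
```

Calling scrape(SAMPLE_HTML) returns one dict per press release in the sample. If the section pattern no longer matches, the result is an empty list.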

IMPORTANT NOTE: This process of "scraping" web pages is inherently fragile. The plugin is looking for certain things on the page so if the publisher changes the way the web page is laid out, the plugin won't be able to find the information it is looking for and will have to be updated for the new layout. You can see this happening already: they changed the layout slightly after Feb 25 2004 and so the plugin doesn't find any entries on or before that date.

Nevertheless, this plugin is a good way to monitor web sites that don't offer an RSS feed. The tricky bit is writing the config files. A utility is included in the release ZIP (WebScrapeSettings.exe) to help with this process or you can contact us and we'll do it for you.
Last edited by support on Fri Apr 20, 2007 10:51 am, edited 1 time in total.

Postby prazim » Fri Aug 12, 2005 12:43 pm

Thank you for the information. What extension should the config file for the WebScrape plugin have? Can it be a .txt file, or does it need to be an .ini file?

thanks in advance for your help,
Sue

Postby support » Fri Aug 12, 2005 12:51 pm

They're INI files but the extension doesn't matter. The plugin will read them regardless of what you call the file :-)
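
As a quick illustration that INI content doesn't depend on the file name, here is a small Python sketch that saves a cut-down version of the config above with a .txt extension and reads it back with the standard configparser module. This says nothing about how WebScrape.exe itself parses the file. The file name and function names are made up for the example.

```python
import configparser
import os
import tempfile

# A shortened copy of the config from earlier in this thread.
CONFIG_TEXT = """[ChannelParameters]
URL=http://www.editorialmanager.com/homepage/press.html
Title=Editorial Manager
MaxItems=15
SectionPattern=class="interiorcontent"(.*)</table>
"""


def load_channel_parameters(path):
    """Read the [ChannelParameters] section of an INI-style file."""
    # interpolation=None keeps regex text literal; only "=" separates key/value.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key names exactly as written
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return dict(parser["ChannelParameters"])


def demo():
    """Write the config to a .txt file and parse it as INI."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "editorialmanager.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(CONFIG_TEXT)
        return load_channel_parameters(path)
```

All values come back as strings, so MaxItems would need an int() conversion before being used as a number.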

Postby prazim » Fri Aug 12, 2005 4:31 pm

I tried using the webscrape plug in and got this error:
auto-discovery failed. Can't create the URL: rc=87

I was able to install the UrlMonitor plugin and load the URLs into it, and it's working.

I want to be using the best tool and am not certain what webscrape would give me that url monitor doesn't.

Thanks in advance,
Sue

Postby support » Fri Aug 12, 2005 6:11 pm

sue wrote:I tried using the webscrape plug in and got this error:
auto-discovery failed. Can't create the URL: rc=87


This is a problem in 2.1 and has already been fixed in a later release. You can ignore the error and continue on with the Channel Wizard.

sue wrote:I want to be using the best tool and am not certain what webscrape would give me that url monitor doesn't.


UrlMonitor just tells you if a page has changed or not. WebScrape extracts content from the page and presents it to you in a feed. In the case of your editorialmanager.com example, it pulls out the summary of each press release so that you can see it in the channel.

abwilson
Posts: 247
Joined: Sun Feb 09, 2003 12:36 am
Location: San Francisco, CA -- USA

Postby abwilson » Fri Aug 12, 2005 6:17 pm

(I've replied separately to your e-mail, but for everyone else:)

The URL Monitor plug-in just tells you when a conventional Webpage has changed -- hopefully with new, interesting content, but perhaps they just fixed some spelling mistakes... :)

WebScrape, on the other hand, reads a Webpage and -- based on text patterns ("regular expressions") you give it when you create the channel -- will match Webpage text against your patterns and generate an RSS feed, which Awasu then displays as a news feed.

So, to use WebScrape you have to analyze the actual HTML source of each Web page you are interested in and determine what text patterns will match the RSS title, link, and description for news items; with these coded patterns, the plug-in can scrape the page and generate the RSS Awasu uses to display the news.
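
To make the "generate the RSS" step concrete, here is a minimal Python sketch that turns already-extracted items into an RSS 2.0 document using the standard library. It is only an illustration under my own assumptions, not WebScrape's actual output code. The function name and the item dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build an RSS 2.0 document from dicts with title/link/description keys."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = ""
    for entry in items:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            # Missing fields become empty elements; special characters
            # such as & are escaped automatically on serialization.
            ET.SubElement(item, field).text = entry.get(field, "")
    return ET.tostring(rss, encoding="unicode")
```

A feed reader only ever sees the resulting XML, which is why the regular expressions have to capture each title, link, and description cleanly.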

In summary, URL Monitoring basically lets you know when it's time to manually check out a (changed) Webpage. Once it has the right regular expressions, WebScrape will dynamically show you news extracted from a particular page. However, if the format of the Webpage changes enough to "break" your regular expressions, you'll have to figure out what changed and update your regular expressions so WebScrape knows how to scrape.

Allan

