support
Site Admin
Posts: 3022
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Sun Apr 15, 2007 11:42 am

diabloNL wrote:The script failed: rc=0

Normally, you get this error message if the script has a non-zero exit code or writes something to stderr (which is included in the error message).

The return code is 0 and there's no stderr output logged, but looking at the code, there is one additional case where this error message is raised: if nothing is written to stdout (I'll change things to output a more helpful message). So I think what's happened is that the plugin hasn't written anything at all to either stdout or stderr.
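The three failure conditions described above can be summed up in a few lines of Python. This is only a model of the rules as stated in this post, not Awasu's actual code; the function name `plugin_error` is hypothetical.

```python
def plugin_error(rc, stdout, stderr):
    """Return the error message for a plugin run, or None if it succeeded.

    Hypothetical model of the rules above: a non-zero exit code, anything on
    stderr, or an empty stdout all count as a failure.
    """
    if rc != 0 or stderr or not stdout:
        msg = "The script failed: rc=%d" % rc
        if stderr:
            # stderr output is included in the error message
            msg += "\n" + stderr
        return msg
    return None

# rc=0 and nothing on stderr, but nothing on stdout either: still a failure
print(plugin_error(0, "", ""))        # The script failed: rc=0
print(plugin_error(0, "<rss/>", ""))  # None
```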

Did you write the intermediary script I mentioned in the earlier post? You need to make sure it calls WebScrape.exe after logging the input HTML file so that Awasu receives some output; otherwise, this error is probably what you will see.

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Sun Apr 15, 2007 12:00 pm

support wrote:Did you write the intermediary script I mentioned in the earlier post?


I'm trying to figure out how to make this script. :P

support
Site Admin
Posts: 3022
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Mon Apr 16, 2007 5:39 am

diabloNL wrote:I'm trying to figure out how to make this script. :P

OK, I had a bit of a play around with this and it's not as simple as I first thought.

It seems that although WebScrape offers the DownloadUrl parameter, it ignores it and only downloads whatever is specified in its own INI file :-( So the trick of writing a replacement plugin may not tell you what you need to know, since it's the plugin that's doing the download, not Awasu.

If you still want to try, this is what to do:

(*) We need a plugin that looks the same as WebScrape, so take a copy of WebScrape.plugin and call it, say, foobar.plugin (in the same directory as WebScrape).

(*) Create a file called foobar.py (in the same directory) that looks like this:

Code:

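import os
import subprocess
import sys

import win32api  # from the pywin32 package

# get the path to the HTML file Awasu downloaded for us
configFilename = sys.argv[1]
htmlFile = win32api.GetProfileVal( "DownloadUrl Response" , "DownloadUrlFile" , "" , configFilename )

# dump the HTML file (binary mode, so the page is copied byte-for-byte)
scriptDir = os.path.dirname( os.path.abspath( __file__ ) )
with open( htmlFile , "rb" ) as fpIn:
    buf = fpIn.read()
with open( os.path.join( scriptDir , "foobar.log" ) , "wb" ) as fpOut:
    fpOut.write( buf )

# invoke the real WebScrape (same directory as this script); its stdout goes straight to Awasu
rc = subprocess.call( [ os.path.join( scriptDir , "WebScrape.exe" ) , configFilename ] )
sys.exit( rc )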

This dumps the HTML file Awasu downloaded and then invokes the real WebScrape.

(*) Exit Awasu, find your channel's .CHANNEL file, and edit its ScriptFilename parameter to point to foobar.py instead of WebScrape.exe.

(*) Restart Awasu and update the channel. You should find a file called foobar.log that contains a copy of the downloaded HTML file.
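`win32api.GetProfileVal` comes from the pywin32 package and only works on Windows. If the config file Awasu passes to the plugin is a standard INI file (an assumption; the section and key names are simply the ones used in the script above), the same value can be read with Python 3's built-in `configparser`. The helper name `get_download_file` is hypothetical.

```python
import configparser

def get_download_file(config_filename):
    """Return the path of the HTML file Awasu downloaded, or "" if it isn't set.

    Assumes a plain INI file with a [DownloadUrl Response] section containing
    a DownloadUrlFile key (hypothetical helper, not part of Awasu or WebScrape).
    """
    # interpolation=None so that '%' characters in paths or URLs are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_filename)
    # fallback covers both a missing section and a missing key
    return parser.get("DownloadUrl Response", "DownloadUrlFile", fallback="")
```

Like `GetProfileVal` with a default of `""`, this returns an empty string when the value is missing, so the caller still has to check for that before opening the file.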

As I said, WebScrape is doing the download, not Awasu, and IIRC it doesn't honor any of IE's cookies or other settings. So maybe the best way to simulate what's happening is to clear all your IE cookies and cached files, then download the HTML page and play around with that in WebScrapeSettings.

You could also run Ethereal or some other HTTP monitor and watch what WebScrape is downloading... :-)

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Mon Apr 16, 2007 7:20 am

Thanks for the effort Taka, I will have a play tonight. ;)

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Mon Apr 16, 2007 5:10 pm

Taka, I did exactly what you said but got an error, so I added a new channel connected directly to foobar.py. This is the error I get:



Code:

<HEAD>
<META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=utf-8'>
</HEAD>
<HTML><BODY>
<P>The script caused an error:
<PRE style='margin-left:30px'>
Traceback (most recent call last):
  File "D:\Multimedia\Awasu\ChannelPlugins\Webscraper\foobar.py", line 10, in ?
    fp = open( htmlFile , "r" )
IOError: [Errno 2] No such file or directory: ''

</PRE>
</BODY></HTML>




Any idea what's going wrong?

support
Site Admin
Posts: 3022
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Mon Apr 16, 2007 7:56 pm

diabloNL wrote:No such file or directory

Sigh, my bad :oops: I was in a bit of a hurry and forgot to mention you need to set the DownloadUrl parameter in the channel's Properties dialog as well.
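Without DownloadUrl set, `DownloadUrlFile` comes back empty and the script ends up calling `open('')`, which is exactly the `[Errno 2] No such file or directory: ''` in the traceback. A small guard turns that into a clearer message. This is a sketch; the helper name `read_downloaded_html` is hypothetical, and it writes to stderr and exits non-zero so that Awasu reports the problem instead of the bare traceback.

```python
import sys

def read_downloaded_html(html_file):
    """Return the bytes of the page Awasu downloaded, or exit with a clear error.

    html_file is the DownloadUrlFile value read from the plugin's config file;
    it is empty when the channel has no DownloadUrl parameter set.
    """
    if not html_file:
        sys.stderr.write("DownloadUrlFile is empty: set the DownloadUrl parameter "
                         "in the channel's Properties dialog.\n")
        sys.exit(1)
    with open(html_file, "rb") as fp:
        return fp.read()
```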

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Mon Apr 16, 2007 8:59 pm

No problem Taka. :wink:

Anyway, I got the foobar.log/html and saved it to my desktop. I tried it with my config file in webscrapersettings and it works fine, but still nothing shows up in Awasu.

Config file for local foobar.html:

Code:

[ChannelParameters]
URL=file:///C:/Documents and Settings/Bobby/Desktop/foobar.html
Title=Test
Description=
BaseUrl=file:///C:/Documents and Settings/Bobby/Desktop
MaxItems=15
Shorthand=
SectionPattern=<!-- show threads -->(.*)<!-- end show threads -->
ItemPattern-1=
ItemPattern-2=firstnew.gif.*?<a href="(?P<L>.*?)".*?">
ItemPattern-3=(?P<T>.*?)</a>(?P<D>)



Here you can find the foobar.html:

CLICK

This is what webscrapersettings generates:

Code:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
   <generator>Scrape Web page and convert to RSS, by Allan B. Wilson; abwilson@awasu.com</generator>
   <title>Test</title>
   <link>file:///C|\DOCUME~1\Bobby\LOCALS~1\Temp\awasu50</link>
   <description></description>
<item>
      <title>Mooiste vrouw/man op dit moment?</title>
      <link>file:///C:/Documents and Settings/Bobby/Desktop/showthread.php?t=3408</link>
      <guid>file:///C:/Documents and Settings/Bobby/Desktop/showthread.php?t=3408</guid>
      <description>
</description>
</item>
</channel>
</rss>



So I really am lost here and have no clue what's happening. :(

support
Site Admin
Posts: 3022
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Tue Apr 17, 2007 3:37 am

diabloNL wrote:So I really am lost here and have no clue what's happening. :(

Your config isn't working because you're looking for the wrong thing :-)

There's no reference to firstnew.gif in the current HTML, so either they've changed the structure of the page since you started working on this, or you saw something that only happens sometimes.

I've been having a bit of a play with it and got this far (angle brackets replaced with curly braces):

Code:

[ChannelParameters]
URL=http://www.gamingonly.nl/forum/forumdisplay.php?f=20
Title=Test
Description=
BaseUrl=http://www.gamingonly.nl/forum/
MaxItems=15
Shorthand=
SectionPattern={!-- show threads --}(.*){!-- end show threads --}
ItemPattern-1={a href="(?P{L}showthread.php.*?)" id=".*?"}(?P{T}.*?){/a}(?P{D})
ItemPattern-2=
ItemPattern-3=

Basically, I'm looking for <a> tags that have an id attribute. Unfortunately, it's still not quite working right, since the RE doesn't seem to stop when an <a> tag has no id attribute, but you can have a play with it.
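Here is one way to see why the pattern doesn't stop, using Python's `re` module (WebScrape is written in Python, but that it applies the pattern exactly like this is an assumption). The lazy `.*?` inside the L group is still allowed to run past the closing quote, across `</a>`, and into the next link until it finds `" id="`. Restricting the group to `[^"]*` keeps each match inside a single attribute value. The HTML sample and `id` values are made up.

```python
import re

# Made-up sample: the first link has no id attribute, the second one does.
html = ('<a href="showthread.php?t=1">No id here</a> '
        '<a href="showthread.php?t=2" id="thread_title_2">Title two</a>')

# The pattern from the config above, with the curly braces turned back into angle brackets.
loose = r'<a href="(?P<L>showthread.php.*?)" id=".*?">(?P<T>.*?)</a>(?P<D>)'

# Same idea, but neither the link nor the id value may contain a double quote,
# so a link without an id attribute simply fails to match.
strict = r'<a href="(?P<L>showthread\.php[^"]*)" id="[^"]*">(?P<T>.*?)</a>(?P<D>)'

m = re.search(loose, html)
print(m.group("L"))  # showthread.php?t=1">No id here</a> <a href="showthread.php?t=2

for m in re.finditer(strict, html):
    print(m.group("L"), "|", m.group("T"))  # showthread.php?t=2 | Title two
```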

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Tue Apr 17, 2007 6:23 am

The "firstnew.gif" is an icon that appears in front of topics with new posts, so in foobar.html there is one topic with a link to this icon. I used it because I want to show the titles of topics with new posts in Awasu. And like I said, with webscrapersettings it works perfectly every time.

If you tried that URL yourself you won't get that "firstnew.gif" since you are not a member. :wink:

support
Site Admin
Posts: 3022
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Tue Apr 17, 2007 11:07 am

diabloNL wrote:If you tried that URL yourself you won't get that "firstnew.gif" since you are not a member. :wink:

Then this gets back to what I said earlier. The WebScrape tool doesn't use IE's cookies, so when it downloads the page, the web site has no way of knowing you're a member and you get the page that has no firstnew.gif in it.
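To see the effect, here is the item pattern from diabloNL's config (ItemPattern-2 and ItemPattern-3 joined into one regex, assuming that is how WebScrape combines the numbered patterns) applied to two made-up snippets: one as a logged-in member might see it, with the `firstnew.gif` icon, and one as an anonymous visitor such as WebScrape would see it. The markup is illustrative, not copied from the real forum.

```python
import re

# ItemPattern-2 + ItemPattern-3 from the config above, joined into one regex
# (assumption: WebScrape concatenates the numbered item patterns like this).
item_pattern = (r'firstnew.gif.*?<a href="(?P<L>.*?)".*?">'
                r'(?P<T>.*?)</a>(?P<D>)')

# Made-up HTML fragments: the member sees the "first new post" icon, the guest doesn't.
member_page = ('<a href="showthread.php?goto=newpost&amp;t=3408">'
               '<img src="images/buttons/firstnew.gif" alt="Go to first new post" /></a> '
               '<a href="showthread.php?t=3408" id="thread_title_3408">Mooiste vrouw/man op dit moment?</a>')
guest_page = ('<a href="showthread.php?t=3408" id="thread_title_3408">'
              'Mooiste vrouw/man op dit moment?</a>')

def titles(page):
    """Return the T group of every item the pattern finds in the page."""
    return [m.group("T") for m in re.finditer(item_pattern, page)]

print(titles(member_page))  # ['Mooiste vrouw/man op dit moment?']
print(titles(guest_page))   # []
```

The same config therefore produces items in WebScrapeSettings (which sees the member's page) and nothing at all in WebScrape (which sees the guest's page).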

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Tue Apr 17, 2007 12:03 pm

support wrote:The WebScrape tool doesn't use IE's cookies


So you're saying that webscrapersettings.exe does use the cookies but webscrape.exe doesn't? Because with webscrapersettings.exe it does find the "firstnew.gif".

But if "firstnew.gif" is inside foobar.html, why doesn't webscrape.exe find it, when the config file works fine with webscrapersettings.exe?

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Tue Apr 17, 2007 12:26 pm

I'm thinking... yes, it's possible :P Anyway, I will make a config file for webscrape.exe that just gets the first 10 topic titles. Since it's a forum, topics with a new post jump to the top of the page, so I can monitor it that way. If I think about it, in the end it will work the same as looking for "firstnew.gif" when you are logged in to the website. I presume Awasu will still show me when a topic has changed and moved to the top of the page, right?

support
Site Admin
Posts: 3022
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Tue Apr 17, 2007 12:32 pm

diabloNL wrote:So you're saying that webscrapersettings.exe does use the cookies but webscrape.exe not?

That's correct :-)

WebScrapeSettings was written by me using the core Awasu libraries, which use Wininet, the part of Windows that handles Internet access. IE also uses Wininet, so the two programs can share the same cookies.

WebScrape was written in Python by someone who doesn't work for us (Allan Wilson), and Python does its own internet handling, i.e. it doesn't have access to IE's Wininet cookies.
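Because a Python program makes its own HTTP requests, the forum only sees a logged-in member if the program sends the login cookie itself. As a rough illustration with Python 3's standard `urllib.request` (this is not WebScrape's code), a request carrying a cookie can be built like this. The cookie name and value are placeholders; a real session cookie would have to be copied from a logged-in browser and would eventually expire.

```python
import urllib.request

def build_forum_request(url, cookie):
    """Build (but don't send) a GET request that carries a session cookie.

    `cookie` is a placeholder "name=value" string: Python's HTTP libraries
    do not pick up IE's Wininet cookies automatically.
    """
    return urllib.request.Request(url, headers={"Cookie": cookie})

req = build_forum_request("http://www.gamingonly.nl/forum/forumdisplay.php?f=20",
                          "session=PLACEHOLDER")
print(req.get_header("Cookie"))  # session=PLACEHOLDER
# urllib.request.urlopen(req) would then fetch the page as that session.
```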

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Tue Apr 17, 2007 12:53 pm

So in our little test, foobar.html was downloaded by Awasu, but webscrape.exe downloaded its own version without "firstnew.gif"?

Ladies and gentlemen, we got him (the cause that is) :P

Thanks for all your help Taka!

diabloNL
Posts: 55
Joined: Mon Feb 26, 2007 6:08 am

Postby diabloNL » Tue Apr 17, 2007 8:32 pm

Well, I got it working so that it shows the titles of the last 10 topics. But as soon as someone posts in a thread and the thread moves to the top of the page, nothing changes in Awasu. If I select "show last feed", the thread is indeed on top of the others, but in Awasu's item/summary pane it's not on top, or not even shown if the topic is past the first 10 items.

Is there a way to force Awasu to rebuild the "item/summary" window every time it updates the feed, so that it will take the items and order of the last feed downloaded?
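One possible explanation, assuming Awasu (like many feed readers) treats an item as new only when it hasn't seen its guid/link before: a thread that gets bumped to the top keeps the same link, e.g. showthread.php?t=3408 in the feed above, so a re-ordered feed contains nothing new. The sketch below is a hypothetical model of that behaviour, not Awasu's code, and the thread IDs other than 3408 are made up.

```python
def new_items(seen_guids, feed_guids):
    """Return the guids in the latest feed that haven't been seen before,
    in feed order (hypothetical model of guid-based de-duplication)."""
    return [g for g in feed_guids if g not in seen_guids]

first_update = ["showthread.php?t=3401", "showthread.php?t=3402", "showthread.php?t=3408"]
seen = set(first_update)

# Someone posts in t=3408 and it jumps to the top of the page:
second_update = ["showthread.php?t=3408", "showthread.php?t=3401", "showthread.php?t=3402"]
print(new_items(seen, second_update))  # []
```

Under this model, only a thread that was not in an earlier download would show up as a new item; a change in order alone would not.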

