David Edwards
Posts: 3
Joined: Mon Oct 10, 2005 4:32 pm

Postby David Edwards » Mon Oct 10, 2005 4:43 pm

I have had good success with WebScrape, but I've hit a roadblock with one page. The "preview" button in WebScrapeSettings returns a valid feed, but I get the following error from Webscrape.exe:

Code:

Traceback (most recent call last):
  File "<string>", line 95, in ?
  File "<string>", line 34, in encodeText
LookupError: unknown encoding: none


Here is the ini file I'm using:

Code:

[ChannelParameters]
URL=http://www.buzzflash.com/?time=12
Title=BuzzFlash
Description=BuzzFlash Headlines
BaseUrl=http://www.buzzflash.com/
MaxItems=15
Shorthand=
SectionPattern=newstip.html(.*?)links-bar
ItemPattern-1=href="(?P<L>.*?)".*?>(?P<T>.*?)</a>(?P<D>)
ItemPattern-2=
ItemPattern-3=


Can anyone point me in the direction of a solution for this?

Thanks!
David

abwilson
Posts: 247
Joined: Sun Feb 09, 2003 12:36 am
Location: San Francisco, CA -- USA

Postby abwilson » Mon Oct 10, 2005 11:36 pm

The problem is that the web page is served with this Content-Type header: "text/html; charset=none".

WebScrape tries to use the charset given for the page encoding. As reported, "none" is not valid.
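The failure can be reproduced outside WebScrape. This is a minimal sketch, not WebScrape's own code; the helper name `is_known_encoding` is hypothetical and only illustrates how Python rejects a charset name it has no codec for.

```python
import codecs

def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

# "none" is the charset the server declares; Python has no such codec,
# so any attempt to encode or decode with it raises LookupError.
try:
    "BuzzFlash".encode("none")
except LookupError as err:
    print("LookupError:", err)

print(is_known_encoding("none"))        # False
print(is_known_encoding("iso-8859-1"))  # True
```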

Taka, does WebScrapeSettings simply ignore encoding issues or is it picking up the encoding elsewhere (for example, in the META tag)?

Thanks

Allan

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Tue Oct 11, 2005 8:01 am

abwilson wrote:does WebScrapeSettings simply ignore encoding issues or is it picking up the encoding elsewhere (for example, in the META tag)?


Cute problem :-)

WebScrapeSettings downloads the page once and caches it in a temp file, then passes the location of that temp file to the plugin. This is why WebScrapeSettings works: no encoding information is passed along with the file.

Obviously this can be fixed by either getting the webmaster to fix their page (if it's incorrect) or tweaking the plugin to work around this problem.

However, you might be able to fudge a solution like this: write an intermediate plugin that downloads the page, saves it in a temp file and then spawns the real plugin (passing it the location of the temp file, not the real URL). How's that for creative? :-) Just call me the HackMeister :cool:
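A rough sketch of such an intermediate plugin might look like the following. Everything here is an assumption for illustration: the function names are hypothetical, and it assumes the real plugin can be launched from a command line that takes the page location as its last argument, which may not match how Awasu actually invokes WebScrape.

```python
import subprocess
import tempfile
import urllib.request

def fetch_bytes(url):
    """Download the page as raw bytes, ignoring any declared charset."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def save_to_temp(data):
    """Write the bytes to a temp file and return the file's path."""
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as f:
        f.write(data)
        return f.name

def run_real_plugin(plugin_cmd, page_path):
    """Spawn the real plugin, passing the temp file instead of the URL.

    Assumption: plugin_cmd is a list such as ["Webscrape.exe", ...] and the
    plugin accepts the page location as its final argument.
    """
    result = subprocess.run(plugin_cmd + [page_path], capture_output=True)
    return result.stdout
```

The wrapper would call `fetch_bytes` on the channel URL, hand the result to `save_to_temp`, and then pass the returned path to `run_real_plugin`, so the real plugin never sees the bad Content-Type header.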

David Edwards
Posts: 3
Joined: Mon Oct 10, 2005 4:32 pm

Postby David Edwards » Tue Oct 11, 2005 5:46 pm

support wrote:However, you might be able to fudge a solution like this: write an intermediate plugin that downloads the page, saves it in a temp file and then spawns the real plugin (passing it the location of the temp file, not the real URL). How's that for creative? :-) Just call me the HackMeister :cool:


Thanks! I'll give it a try.

abwilson
Posts: 247
Joined: Sun Feb 09, 2003 12:36 am
Location: San Francisco, CA -- USA

Postby abwilson » Tue Oct 11, 2005 6:28 pm

Glad you have something to try for now... :)

I have revised WebScrape encoding handling as follows:

First, it tries the encoding specified in the Content-Type header; if that encoding is one it recognizes, it uses it. If it is not (as in your case, where the charset is "none"), WebScrape then looks in the web page for a META tag containing a charset spec and uses that; this change makes your scrape work. If there is no META tag charset spec, it defaults to iso-8859-1.

Therefore WebScrape is now more tolerant of a bad Content-Type charset spec; however, if it's bad AND there is a META tag charset that's also bad, WebScrape will fail. I believe this is reasonable behavior.
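The fallback order described above can be sketched roughly as follows. This is not the actual WebScrape source; the function names and regular expressions are assumptions chosen for illustration, and real pages may need more careful META parsing.

```python
import codecs
import re

DEFAULT_ENCODING = "iso-8859-1"

# charset=... inside a Content-Type header value (text).
HEADER_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
# charset=... inside a META tag in the raw page (bytes).
META_CHARSET_RE = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)

def is_known(name):
    """Return True if Python has a codec for this encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

def pick_encoding(content_type, body):
    """Choose an encoding for body (bytes) given the Content-Type header value."""
    # 1. Use the header charset if it names a recognized encoding.
    match = HEADER_CHARSET_RE.search(content_type or "")
    if match and is_known(match.group(1)):
        return match.group(1)
    # 2. Otherwise use a META tag charset if the page has one.
    match = META_CHARSET_RE.search(body)
    if match:
        name = match.group(1).decode("ascii")
        codecs.lookup(name)  # a bad META charset still fails with LookupError
        return name
    # 3. No usable header charset and no META charset: fall back to the default.
    return DEFAULT_ENCODING
```

With a header of "text/html; charset=none" and a page whose META tag declares a valid charset, `pick_encoding` returns the META charset, which corresponds to the case reported in this thread.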

Unfortunately since I last released WebScrape, my delivery process and procedure have changed, and WebScrape needs some fairly significant code reorganization to fit the new style. This will take me some time, so I can't get a test version to anyone until I've made those changes. I'll try to get to it over the next day or two.

Allan

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Fri Oct 14, 2005 11:12 am

A new version of the WebScrape plugin is now available here.

