support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Sun Jul 30, 2017 11:59 pm

awasu.user wrote:first channel finish - get data from here to Awasu
n channel finish - get data from here to Awasu
end of the script execution

Write your script to generate one RSS file for each channel.
Create a channel in Awasu for each file.
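
A minimal sketch of that approach, assuming the script collects its items per channel in a dict. The `channels` layout, the `build_rss()` and `write_feeds()` helpers and the file names are all made up for illustration:

```python
import os
import xml.etree.ElementTree as ET

def build_rss(title, link, description, items):
    """Build one RSS 2.0 document (as an ElementTree) for a single channel."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(node, tag).text = item[tag]
    return ET.ElementTree(rss)

def write_feeds(channels, out_dir):
    """Write one .xml file per channel and return the list of file paths."""
    paths = []
    for name, info in channels.items():
        path = os.path.join(out_dir, name + ".xml")
        tree = build_rss(info["title"], info["link"],
                         info["description"], info["items"])
        # ElementTree escapes special characters itself, so no CDATA is needed.
        tree.write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Each file written this way can then be set up as its own channel in Awasu.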

awasu.user wrote:In the generated code I put this between the tags:

Code: Select all

 <![CDATA[  some text  ]]>

Creating a CDATA block is not 100% reliable, e.g. if the content is talking about how to use CDATA blocks, it will contain the ]]> terminator and you will end up with nested CDATA blocks, which doesn't work :) Microsoft smart quotes will also break things if they arrive in a different encoding from the one the feed declares. Post the failing bit of content and I'll see if I can spot the error.
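
One way around the nesting problem, sketched here rather than taken from the script in question: split every "]]>" inside the content across two CDATA sections, or skip CDATA and escape the text instead. The helper names `to_cdata()` and `to_escaped()` are hypothetical:

```python
from xml.sax.saxutils import escape

def to_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def to_escaped(text):
    """Alternative without CDATA: escape &, < and > as entities."""
    return escape(text)
```

Either result can be placed between the opening and closing tags of an element, and a parser should hand back the original text.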

awasu.user
Posts: 81
Joined: Fri Jan 06, 2017 12:50 pm

Postby awasu.user » Mon Jul 31, 2017 5:19 am

For testing I use this code:

rss_data = u'''<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
<title><![CDATA[W3Schools Home Page]</title>
<link><![CDATA[https://www.w3schools.com]</link>
<description><![CDATA[PL characters: ąęłńóźć! ?!Free web building tutorials]</description>
<item>
<title><![CDATA[RSS Tutorial]</title>
<link><![CDATA[https://www.w3schools.com/xml/xml_rss.asp]</link>
<description><![CDATA[New RSS tutorial on W3Schools]</description>
</item>
<item>
<title><![CDATA[XML Tutorial]</title>
<link><![CDATA[https://www.w3schools.com/xml]</link>
<description><![CDATA[New XML tutorial on W3Schools]</description>
</item>
</channel>

</rss>
'''
print(rss_data)


In Awasu I get:

Code: Select all

XML parse failed (4:L7:C39): not well-formed (invalid token)


Data from the Channel Feed in Awasu:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
<title><![CDATA[W3Schools Home Page]</title>
<link><![CDATA[https://www.w3schools.com]</link>
<description><![CDATA[PL characters: �����! ?!Free web building tutorials]</description>
<item>
<title><![CDATA[RSS Tutorial]</title>
<link><![CDATA[https://www.w3schools.com/xml/xml_rss.asp]</link>
<description><![CDATA[New RSS tutorial on W3Schools]</description>
</item>
<item>
<title><![CDATA[XML Tutorial]</title>
<link><![CDATA[https://www.w3schools.com/xml]</link>
<description><![CDATA[New XML tutorial on W3Schools]</description>
</item>
</channel>

</rss>


I tried

Code: Select all

print(bytes(rss_data, "utf-8"))
but then the output starts with the Python bytes indicator (b'). I also tried:

Code: Select all

from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(rss_data)


and I get the same problem.
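
For reference, a small sketch of why the b' prefix appears. `io.BytesIO` stands in for a binary output stream here, which is an assumption made only so the example is self-contained:

```python
import io

text = u"PL characters: ąęłńóźć"
encoded = text.encode("utf-8")

# print() calls str() on a bytes object, which gives its repr,
# so the b'...' wrapper and the \x.. escapes end up in the output.
as_printed = str(encoded)

# Writing to a binary stream sends the raw bytes with nothing added.
stream = io.BytesIO()
stream.write(encoded)
```

So the bytes themselves are fine; it is print() that turns them into their textual representation.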

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Mon Jul 31, 2017 6:25 am

You haven't closed the CDATA sections properly :-)

When you're having problems like this, one trick you can use is to save the output to a file, then open it in a browser - it might give you more clues as to what the problem is. Also, if I open the XML up in Notepad++, the syntax highlighting tells me something is wrong.
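
The same kind of check can be scripted. A minimal sketch using the standard library's xml.etree.ElementTree; the `check_xml()` helper is a made-up name, not something Awasu provides:

```python
import xml.etree.ElementTree as ET

def check_xml(xml_text):
    """Return None if xml_text is well-formed, else a message with line and column."""
    try:
        # Parse the UTF-8 bytes, i.e. what would actually be sent to the reader.
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        line, column = exc.position
        return "line %d, column %d: %s" % (line, column, exc)
    return None
```

Running the feed text through this before printing it gives a line and column to look at, much like the message Awasu shows.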

awasu.user
Posts: 81
Joined: Fri Jan 06, 2017 12:50 pm

Postby awasu.user » Mon Jul 31, 2017 7:37 am

Oops! You're right! :oops: I closed the tags correctly and still can't get the right output. The problem is with writing UTF-8 to the console. On the test characters Awasu still reports an error... :wall: I'm looking for an alternative to print for writing the output.

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Mon Jul 31, 2017 7:50 am

awasu.user wrote:The problem is with writing UTF-8 to the console. On the test characters Awasu still reports an error... :wall: I'm looking for an alternative to print for writing the output.

I talked about this in the big Unicode tutorial. Your safeprint() function is not really the way to go; take a look at my print_utf8() instead.

awasu.user
Posts: 81
Joined: Fri Jan 06, 2017 12:50 pm

Postby awasu.user » Mon Jul 31, 2017 9:20 am

Your function in my code:

Code: Select all

import sys

def print_utf8( val ) :
    sys.stdout.buffer.write( val.encode( "utf-8" ) )
    sys.stdout.buffer.write( b"\n" )
   
rss_data = u'''<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
  <title><![CDATA[W3Schools Home Page]]></title>
  <link><![CDATA[https://www.w3schools.com]]></link>
  <description><![CDATA[PL characters: ąęłńóźć! ?!Free web building tutorials]]></description>
  <item>
    <title><![CDATA[RSS Tutorial]]></title>
    <link><![CDATA[https://www.w3schools.com/xml/xml_rss.asp]]></link>
    <description><![CDATA[New RSS tutorial on W3Schools]]></description>
  </item>
  <item>
    <title><![CDATA[XML Tutorial]]></title>
    <link><![CDATA[https://www.w3schools.com/xml]]></link>
    <description><![CDATA[New XML tutorial on W3Schools]]></description>
  </item>
</channel>

</rss>
'''
print_utf8(rss_data)


gives me this error:
sys.stdout.buffer.write( val.encode( "utf-8" ) )
AttributeError: 'PseudoOutputFile' object has no attribute 'buffer'


I'm starting to dig through the Python docs to find out more...

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Mon Jul 31, 2017 10:06 am

Works for me (Python 3.6.1).

awasu.user
Posts: 81
Joined: Fri Jan 06, 2017 12:50 pm

Postby awasu.user » Mon Jul 31, 2017 10:38 am

That's strange. I use the same version on Win7 and this code gives me an error :(

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Mon Jul 31, 2017 10:49 am

Do you have the "official" Python distribution (from python.org), or something else, e.g. ActiveState's?

Start Python from the command line, and tell me what this gives you:

Code: Select all

import sys
sys.version
type(sys.stdout.buffer)

awasu.user
Posts: 81
Joined: Fri Jan 06, 2017 12:50 pm

Postby awasu.user » Mon Jul 31, 2017 12:44 pm

The official one from python.org, plus ipython installed with pip (for jupyter notebook).

The result of typing that is:

Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys
>>> sys.version
'3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)]'
>>> type(sys.stdout.buffer)
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    type(sys.stdout.buffer)
AttributeError: 'PseudoOutputFile' object has no attribute 'buffer'
>>>

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Mon Jul 31, 2017 1:00 pm

My misteak, I should've asked for

Code: Select all

type(sys.stdout)
I get io.TextIOWrapper, you seem to have PseudoOutputFile, and given that the traceback points to a file called pyshell, I rather suspect it's the shell you're running Python in that's getting in the way: those names come from IDLE rather than ipython (at a guess, I'd say it's capturing the output so it can do something with it).
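
A small sketch for checking what a script's stdout actually is at run time. The `describe_stream()` helper is a hypothetical name:

```python
import sys

def describe_stream(stream):
    """Summarise the properties of an output stream that matter for UTF-8 output."""
    return {
        "type": type(stream).__name__,
        "has_buffer": hasattr(stream, "buffer"),
        "encoding": getattr(stream, "encoding", None),
    }

print(describe_stream(sys.stdout))
```

Running it once from the command line and once from wherever the script normally runs shows whether stdout has been replaced and whether it still has a .buffer to write bytes to.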

There are other ways of outputting UTF-8; just Google around a bit.
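
Two such ways, as a rough sketch only: fall back to a plain text write when stdout has no .buffer, or bypass stdout and write a UTF-8 file. The names `write_utf8()` and `save_utf8()` are made up, and this is a variation on the idea rather than the print_utf8() from the tutorial:

```python
import io
import sys

def write_utf8(val, stream=None):
    """Write val plus a newline as UTF-8 bytes, falling back if there is no .buffer."""
    stream = sys.stdout if stream is None else stream
    raw = getattr(stream, "buffer", None)
    if raw is not None:
        # Normal console or pipe: bypass the text layer and send bytes.
        raw.write((val + u"\n").encode("utf-8"))
    else:
        # Replaced stdout with no binary layer: only text can be written.
        stream.write(val + u"\n")

def save_utf8(val, path):
    """Alternative: skip stdout entirely and write a UTF-8 encoded file."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(val)
```

The fallback branch depends on whatever the replacement stream does with the text, so it only avoids the AttributeError; it does not guarantee UTF-8 bytes on the other end.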

awasu.user
Posts: 81
Joined: Fri Jan 06, 2017 12:50 pm

Postby awasu.user » Mon Jul 31, 2017 1:10 pm

support wrote:My misteak, I should've asked for


I changed it and got:

Code: Select all

type(sys.stdout.buffer)
<class '_io.BufferedWriter'>


Eh, googling again.

awasu.user
Posts: 81
Joined: Fri Jan 06, 2017 12:50 pm

Postby awasu.user » Mon Jul 31, 2017 1:56 pm

The only working solution: change the charset to a Windows one. No errors in Awasu, and the character encoding is OK.
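
Changing the charset presumably works because the encoding named in the XML declaration then matches the bytes that actually reach Awasu. A sketch of keeping the two in step; the thread does not say which Windows charset was used, so windows-1250 below is only an assumption, and `encode_feed()` is a hypothetical helper:

```python
def encode_feed(body, charset):
    """Prefix body with an XML declaration naming charset and encode it to match."""
    declaration = u'<?xml version="1.0" encoding="%s" ?>\n' % charset
    return (declaration + body).encode(charset)

# Assumed charset for the Polish test characters; adjust to the one really in use.
feed_bytes = encode_feed(u"<title>ąęłńóźć</title>", "windows-1250")
```

Because the declaration and the encoding come from the same argument, they cannot drift apart.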

