Lossadil
Posts: 15
Joined: Fri Jul 09, 2010 12:11 pm

Postby Lossadil » Tue Nov 29, 2011 1:01 pm

Hello,

It seems that Awasu doesn't support the Cyrillic alphabet. Is this normal, or is there something to configure on my computer or in Awasu? (http://en.wikipedia.org/wiki/Windows-1251)

The problem occurs when I use the export function to write a .txt file.

For example, take the characters in this URL (paste it into Firefox to see the decoded text):

http://ru.wikipedia.org/wiki/%D0%90%D0% ... 0%BE%D0%B2

This is a problem for my database... Because of this, some URLs need fields of more than 500 characters to store them.
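The %D0%90-style sequences are percent-encoding (as defined in RFC 3986) of the UTF-8 bytes of each Cyrillic letter. One letter becomes two bytes and each byte becomes three ASCII characters, so the Cyrillic part of a URL stored in this form is six times longer than the readable text. A minimal Python sketch using only the standard library; the sample word is an illustrative assumption, not the (truncated) URL above.

```python
from urllib.parse import quote, unquote

# Illustrative sample word ("Cyrillic" in Russian); not taken from the post.
word = "Кириллица"

# Percent-encode the UTF-8 bytes, which is the form seen in the exported URLs.
encoded = quote(word)

# Decoding reverses the steps: %XX escapes -> UTF-8 bytes -> Unicode text.
decoded = unquote(encoded)

print(encoded[:12])             # %D0%9A%D0%B8  (the letters "К" and "и")
print(len(word), len(encoded))  # 9 54
```

Storing the decoded text in a Unicode column would need one sixth of the characters for the Cyrillic part of each URL.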

Another question: is it possible to export this report directly in UTF-16 or "Unicode"? Microsoft SQL Server hasn't supported UTF-8 for 4 years now...

Thanks.

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Tue Nov 29, 2011 1:25 pm

When you say "export", what exactly do you mean?

If you're talking about getting content out of Awasu via a report, yes, it will be UTF-8. But how are you then getting it into SQL Server? The easiest way would probably be to take the UTF-8 report and convert it into something SQL Server can handle before importing it.
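A minimal sketch of that conversion step in Python, assuming the SQL Server import accepts UTF-16 little-endian with a byte-order mark (the usual meaning of "Unicode" on Windows). The function name and file paths are hypothetical.

```python
import codecs


def utf8_to_utf16le(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-8 text file as UTF-16LE with a byte-order mark."""
    # "utf-8-sig" also accepts (and drops) a leading UTF-8 BOM if there is one;
    # newline="" keeps the original line endings (e.g. \r\n) unchanged.
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(codecs.BOM_UTF16_LE)       # bytes FF FE
        dst.write(text.encode("utf-16-le"))  # this codec adds no BOM itself
```

Usage: `utf8_to_utf16le("report.txt", "report_utf16.txt")`. Writing the BOM and the "utf-16-le" codec explicitly, rather than using Python's generic "utf-16" codec, avoids depending on the machine's native byte order.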

As an aside, I'm completely astounded that SQL Server doesn't support UTF-8. When I read your post, my first thought was "nah, he must be mistaken", but a bit of research suggests that you're right :bigshock:

Lossadil

Postby Lossadil » Tue Nov 29, 2011 2:43 pm

By "export", I mean generating a "report" of a channel's results into a .txt file.

With UTF-8, the .txt file contains this kind of text instead of Cyrillic: D0%B5%D0%BA%D1%81%D0%

Besides that, I use a script with iconv.exe to convert from UTF-8 to UTF-16, so it's not the end of the world... But of course, that can't turn the previous codes back into Cyrillic.

And yes, Microsoft loves us so much... ;-)

support

Postby support » Tue Nov 29, 2011 6:40 pm

OK, leave it with me and I'll take a look at it.

BTW, when you say "D0%B5%D0%BA%D1%81%D0%", do you mean you're seeing these exact characters (with %'s), or do you mean binary bytes? Wouldn't these be the UTF-8-encoded Cyrillic characters you're expecting to see? What are you using to look at the generated output?
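One way to answer that question without relying on a particular viewer is to inspect the file's contents directly. A minimal Python sketch; `inspect_report` is a hypothetical helper and assumes the report file decodes as UTF-8.

```python
import re

# A literal "%" followed by two hex digits.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def inspect_report(raw: bytes) -> dict:
    """Count literal %XX escapes and real Cyrillic characters in a report."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    return {
        "percent_escapes": len(PERCENT_ESCAPE.findall(text)),
        # U+0400..U+04FF is the basic Cyrillic block.
        "cyrillic_chars": sum(1 for ch in text if "\u0400" <= ch <= "\u04ff"),
    }


# The Cyrillic letter "е" (U+0435) in the two forms being asked about:
as_bytes = "\u0435".encode("utf-8")  # b'\xd0\xb5': two binary bytes
as_text = b"%D0%B5"                  # six literal ASCII characters

print(inspect_report(as_bytes))  # {'percent_escapes': 0, 'cyrillic_chars': 1}
print(inspect_report(as_text))   # {'percent_escapes': 2, 'cyrillic_chars': 0}
```

With a real file, one would pass the bytes read in binary mode, e.g. `open("report.txt", "rb").read()` (hypothetical file name).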

Lossadil

Postby Lossadil » Wed Nov 30, 2011 2:35 pm

After some testing, I can tell you it's a really strange problem, even within Windows...

For example, if I take this URL (see the conversion when you copy/paste it into Firefox):

http://ru.wikipedia.org/w/index.php?tit ... d=39639343

...and copy/paste it directly from Firefox (where it shows as real Cyrillic) into SQL (into a Unicode field), it fails: I get these characters "%B1%D1%81%D1%83%D0%" instead of Cyrillic characters.

But when I copy/paste only part of the URL from Firefox, it works: I get real Cyrillic text in my SQL field. ???

Second thing: in the export from Awasu (a simple report to a .txt file), I get real Cyrillic for the "description", "title" and "author" fields, but never for the "url" field...

So apparently Awasu converts the Cyrillic only in the URL...

Anyway, it's not the end of the world... Maybe it's normal... It works for the most important fields...

Thanks.

support

Postby support » Wed Nov 30, 2011 6:56 pm

It sounds like something is not quite right here. Please send me the template you are using to generate your report and I'll do some testing with that.

Lossadil

Postby Lossadil » Thu Dec 01, 2011 10:13 am

Here it is:

{%REPEAT% FeedItems}xyx614xyx{%ITEM-METADATA% name}xyx624xyx{%ITEM-METADATA% url}xyx634xyx{%ITEM-METADATA% Author}xyx644xyx{%ITEM-METADATA% timestamp format="%Y-%m-%d" noCaption}xyx644xyx{%ITEM-METADATA% timestamp format="%H:%M" noCaption}xyx644xyx{%ITEM-METADATA% description}xyx694xyx{%/REPEAT%}
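If the "xyx...xyx" markers are intended as field delimiters for a later import step (an assumption), the report can be split back into fields and the url field decoded in the same post-processing pass. A minimal Python sketch; `parse_report` and the field names are hypothetical, and it assumes the markers never occur inside the item text.

```python
import re
from urllib.parse import unquote

# One match per {%REPEAT%} iteration of the template above. DOTALL lets a
# description span several lines; the lazy groups stop at the next marker.
RECORD = re.compile(
    r"xyx614xyx(.*?)xyx624xyx(.*?)xyx634xyx(.*?)"
    r"xyx644xyx(.*?)xyx644xyx(.*?)xyx644xyx(.*?)xyx694xyx",
    re.DOTALL,
)
FIELDS = ("name", "url", "author", "date", "time", "description")


def parse_report(text: str) -> list[dict]:
    """Split a report produced by the template into per-item dicts."""
    items = []
    for match in RECORD.finditer(text):
        item = dict(zip(FIELDS, match.groups()))
        # Turn %D0%9A-style escapes in the URL back into readable Cyrillic.
        item["url"] = unquote(item["url"])
        items.append(item)
    return items
```

Typical use would be `parse_report(open("report.txt", encoding="utf-8").read())` (hypothetical file name). Note that `unquote()` decodes every escape, including ones such as %2F or %3F that carry meaning inside a URL, so the decoded value suits display and storage better than fetching the page again.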

support

Postby support » Thu Dec 01, 2011 6:39 pm

Thanks for this, I'll take a look at it tomorrow.

It shouldn't really make a difference, but do you really have all those "xyx" thingies in your template...?

support

Postby support » Fri Dec 02, 2011 8:52 am

Things seem to be working correctly.

I created a feed with some Cyrillic text in the item title, description and URL, then generated a report using your template, writing to an output file with a .TXT extension. The output was generated using UTF-8, and after I told IE to interpret it as UTF-8, everything displayed correctly.

Can you send me an example of one of your output files, and what you're expecting it to contain?

Just as background information, Awasu always encodes everything it outputs as UTF-8, since I figured (incorrectly, as it turns out - thanks Microsoft :roll:) that everyone understands UTF-8. It doesn't check whether it's outputting item titles, descriptions, URLs, etc. - everything gets UTF-8'ed. Awasu was written long before internationalized domain names came out and, quite frankly, the whole idea gives me the screaming heebie-jeebies :bigshock:

If you really are having to deal with internationalized URLs, I think your best bet would be to take the UTF-8 output generated by Awasu, then run a post-processing step (which you're doing anyway) to encode the URLs appropriately. I had a quick look at the IDNA spec and it's definitely not UTF-8. It looks horrendous... :bah: But depending on your circumstances, it may not be necessary to encode them like this; UTF-16 might be enough.
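For reference, IDNA (the "xn--" Punycode form) only covers the host name; non-ASCII characters in the path and query, as in the ru.wikipedia.org links in this thread, are normally percent-encoded UTF-8. A minimal Python sketch of such a post-processing step; `to_ascii_url` is a hypothetical helper that uses Python's built-in "idna" codec (which implements the older IDNA 2003 rules), drops any user:password@ part, and is not a full URL normalizer.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of a URL (simplified sketch)."""
    parts = urlsplit(url)

    # Host name: IDNA/Punycode, so a Cyrillic label becomes "xn--...".
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Path, query and fragment: percent-encoded UTF-8. "%" is left alone so
    # sequences that are already escaped are not encoded a second time.
    path = quote(parts.path, safe="/:@%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, host, path, query, fragment))
```

Usage: `to_ascii_url("http://пример.рф/wiki/Кириллица")` returns a URL whose host is in "xn--" form and whose path is percent-encoded, while an already-ASCII URL such as `http://ru.wikipedia.org/wiki/%D0%9A` passes through unchanged.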

