Revision as of 07:56, 21 January 2010

Automatic Feed Translation

Using the XSLT file listed below, you can automatically translate a foreign-language feed into your language with Google's Translation API whenever Awasu updates the associated Channel. The supported language codes are listed at http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray.

Customising the XSLT file

Your language

There is a constant in the XSLT file called "yourLanguage" which you can change to the two-character language code of your language. It is set to "en", so if your language is English, the XSLT file below should work without any modification.

Feed language

There is a constant in the XSLT file called "feedLanguage" which you can change to a specific two-character language code if you know the language of the feed; this will improve Google's translation accuracy. Leaving the "feedLanguage" constant blank (empty quotation marks: "") should still work, as Google will attempt to guess the feed's language. More information about setting the feed language is provided below under the "Error Reporting" section.
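The two constants end up in the "langpair" parameter of the request that the stylesheet sends to Google. The following is a minimal sketch of that step, not part of AutoTranslate.xsl: the helper name buildRequestBody is hypothetical, and the request format is copied from the xmlhttp.send() call in the listing.

```javascript
// Sketch only: buildRequestBody is a hypothetical helper, not part of AutoTranslate.xsl.
// It mirrors the request body that the stylesheet passes to xmlhttp.send().
function buildRequestBody(text, feedLanguage, yourLanguage) {
  // An empty feedLanguage produces "|en", leaving the source language for Google to guess.
  var langpair = feedLanguage + "|" + yourLanguage;
  return "v=1.0&q=" + encodeURIComponent(text) + "&langpair=" + langpair;
}

var autoDetect = buildRequestBody("Hola mundo", "", "en");   // feed language unknown
var spanish    = buildRequestBody("Hola mundo", "es", "en"); // feed language set to Spanish
```

With a blank "feedLanguage" the pair is "|en"; with "es" it becomes "es|en", which is the form that avoids language detection.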

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                              xmlns:ktc="http://www.awasu.com/forums/profile.php?mode=viewprofile&amp;u=24618"
                              xmlns:atom="http://www.w3.org/2005/Atom"
                              xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <msxsl:script language="JScript" implements-prefix="ktc">
  <![CDATA[
  /*
  Name: AutoTranslate.xsl

  Description: Automatically translates Atom and RSS feed items received in Awasu to your language using 
  Google's translation service.

  Author: kevotheclone (http://www.awasu.com/forums/profile.php?mode=viewprofile&u=24618)

  For additional documentation: http://www.awasu.com/wiki/Automatic_Feed_Translation
  */

  // Constants...

  // Don't edit these two constants until Google says otherwise.
  var baseURL = "http://ajax.googleapis.com/ajax/services/language/translate";
  var version = "v=1.0";

  /*
  Change this constant (feedLanguage) to the two-character language code of the feed (if known).
  This will improve Google language translation accuracy.
  
  Leave it blank (empty quotation marks: "") if the feed language is unknown or 
  the feed contains multiple languages. Google will attempt to guess the language
  each time the translateText() function is called.
  
  Supported language codes are listed here:
  http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray
  */
  var feedLanguage = "";
  
  /*
  Change this constant (yourLanguage) to your desired two-character language code.
  Supported language codes are listed here:
  http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray
  */
  var yourLanguage = "en";

  var xmlhttp = new ActiveXObject("Msxml2.XMLHTTP.4.0");
  var cache = {}; // Case-sensitive results cache.
  var chunks;     // Array of Strings, for breaking up large text into multiple HTTP calls. 
  var chunk;      // A single chunk of text taken from the chunks array.

  var WshShell = new ActiveXObject("WScript.Shell"); // For logging error messages.

  function translateText(textToTranslate) 
  {
    var translatedText = "";
    try
    {
      //textToTranslate = textToTranslate.replace(/^\s+|\s+$/g,"");           // Remove leading and trailing whitespace.
      //textToTranslate = textToTranslate.replace(/^\s*|\s(?=\s)|\s*$/g," "); // Replace repeated spaces, newlines and tabs with a single space.

      if (cache[textToTranslate])      // If it's in the cache,
        return cache[textToTranslate]; // return it.

      if (textToTranslate)
      {
        /*
        Split the text up into 2000-4000 character "chunks", only breaking on a word boundary,
        and don't forget the final chunk of text regardless of its size (.+$).
        */
        chunks = textToTranslate.match(/.{2000,4000}\b|.+$/g);
        for (var i=0; i<chunks.length; i++)
        {
          chunk = chunks[i];
          if (chunk)
          {
            xmlhttp.open("POST", baseURL, false);
            xmlhttp.setRequestHeader("Referer", "http://www.awasu.com/forums/profile.php?mode=viewprofile&u=24618");
            xmlhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
            xmlhttp.send(version + "&q=" + encodeURIComponent(chunk) + "&langpair=" + feedLanguage + "|" + yourLanguage);

            /*
            If successful, concatenate the translated chunk of text to our output buffer,
            else, if any chunk fails return the original, complete untranslated text.
            */
            if (xmlhttp.status == 200)
            {
              // Parse the JSON reply only after a successful HTTP call; an HTTP error page is not valid JSON.
              eval("var response = " + xmlhttp.responseText);
              if (response.responseStatus == 200) // check responseStatus, 200=success, 400=error
                translatedText += encodeURI(response.responseData.translatedText);
              else
              {
                WshShell.LogEvent(1, "Application: Awasu Auto Translate XSLT\r\n\r\n" +
                                     "Error from Google's Translate service.\r\n\r\nStatus code and description:\r\n" + 
                                     response.responseStatus + " - " + response.responseDetails + 
                                     "\r\n\r\nText to translate:\r\n" + textToTranslate);
                return textToTranslate;
              }
            }
            else
            {
              WshShell.LogEvent(1, "Application: Awasu Auto Translate XSLT\r\n\r\n" +
                                   "Error connecting to Google's translate service.\r\n\r\nHTTP Status code and description:\r\n" + 
                                   xmlhttp.status + " - " + xmlhttp.statusText);
              return textToTranslate;
            }
          }
          else
            return "";
        }
        cache[textToTranslate] = decodeURI(translatedText); // Add the results to the cache.
        return decodeURI(translatedText);
      }
      else // The element's value is a null string.
      {
        cache[textToTranslate] = "";
        return "";
      }
    }
    catch(e)
    {
      WshShell.LogEvent(1, "Application: Awasu Auto Translate XSLT\r\n\r\n" +
                           "Script error code - type - description:\r\n" + 
                           e.number + " - " + e.name + " - " + e.message);
      cache[textToTranslate] = textToTranslate; // Cache the untranslated text so repeat calls return the same fallback.
      return textToTranslate;
    }
  }
  ]]>
  </msxsl:script>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  
  <!-- Atom-based template match patterns -->
  <xsl:template match="/atom:feed/atom:title | 
                       /atom:feed/atom:subtitle | 
                       /atom:feed/atom:entry/atom:title | 
                       /atom:feed/atom:entry/atom:content | 
                       /atom:feed/atom:entry/atom:category/@term">
    <xsl:call-template name="translateText"/>
  </xsl:template>

  <!-- RSS-based template match patterns -->
  <xsl:template match="/rss/channel/title | 
                       /rss/channel/description | 
                       /rss/channel/item/title | 
                       /rss/channel/item/description | 
                       /rss/channel/item/content:encoded | 
                       /rss/channel/item/category">
    <xsl:call-template name="translateText"/>
  </xsl:template>

  <!-- Named template that calls the embedded JScript function that translates the text -->
  <xsl:template name="translateText">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:value-of select="ktc:translateText(normalize-space(.))"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
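The chunking step in translateText() can be tried outside the stylesheet. The following is a minimal sketch, not part of AutoTranslate.xsl: the helper name splitIntoChunks and its minSize/maxSize parameters are hypothetical, added only so the same pattern can be demonstrated on a short string instead of 2000-4000 characters.

```javascript
// Sketch only: splitIntoChunks is a hypothetical helper, not part of AutoTranslate.xsl.
// It builds the same kind of pattern as /.{2000,4000}\b|.+$/g with adjustable sizes.
function splitIntoChunks(text, minSize, maxSize) {
  // First alternative: minSize..maxSize characters ending on a word boundary.
  // Second alternative: whatever text is left at the end, regardless of its size.
  var pattern = new RegExp(".{" + minSize + "," + maxSize + "}\\b|.+$", "g");
  return text.match(pattern) || []; // match() returns null for an empty string
}

var sample = "the quick brown fox jumps over the lazy dog";
var chunks = splitIntoChunks(sample, 10, 15);

// Joining the chunks gives back the original text, so nothing is lost between HTTP calls.
var rejoined = chunks.join("");
```

Note that "." does not match line breaks. The stylesheet passes the text through normalize-space() before calling translateText(), so the text it splits contains no line breaks.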

Error Reporting

Due to the nature of XSLT processors, it is impossible to output error information in such a way that Awasu could capture it and display it in the Channel Properties dialog box. As an alternative, errors are logged in the Windows "Application" Event Log. Three types of errors are reported: 1) problems connecting to the Google Translation service, 2) problems reported by the Google Translation service, and 3) script errors raised inside the translation function.

These events will have "WSH" as the "Source" (screenshot: AutoTranslateEventViewer.jpg).

Double-clicking on one of the events will show the detailed error code and description of the problem (screenshot: AutoTranslateEventProperties.jpg).

The most common problem I encountered when testing this XSLT was a "400" error code from Google with the description "could not reliably detect source language". By setting the feed language to a specific two-character language code I was able to eliminate this type of error completely. If a feed item fails to translate, you can manually translate it using these [Send to/User tools].
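The fallback behaviour for this kind of error can be sketched as follows. This is not part of AutoTranslate.xsl: the helper name pickTranslation is hypothetical, the two reply strings are made-up samples built from the fields the stylesheet reads (responseData.translatedText, responseDetails, responseStatus), and JSON.parse is used in place of the stylesheet's eval() on the assumption that the script engine provides it.

```javascript
// Sketch only: pickTranslation is a hypothetical helper, not part of AutoTranslate.xsl.
function pickTranslation(responseText, originalText) {
  // The stylesheet uses eval(); JSON.parse is assumed to be available in this environment.
  var response = JSON.parse(responseText);
  if (response.responseStatus == 200) {
    return response.responseData.translatedText;
  }
  // Any other status (for example 400) falls back to the untranslated text.
  return originalText;
}

// Made-up sample replies, for illustration only.
var okReply = '{"responseData": {"translatedText": "Hello world"}, "responseDetails": null, "responseStatus": 200}';
var failedReply = '{"responseData": null, "responseDetails": "could not reliably detect source language", "responseStatus": 400}';

var translated = pickTranslation(okReply, "Hola mundo");
var untouched  = pickTranslation(failedReply, "Hola mundo");
```

In the failed case the item keeps its original text, which is why an untranslated item in a channel is a hint to check the Event Log.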

So you may want to keep the AutoTranslate.xsl file with the feed language blank, but make copies of it and set "feedLanguage" in each copy to the two-character code of a language that you translate often. You might name the copied files something like this: AutoTranslate_ES_to_EN.xsl (Spanish to English), AutoTranslate_PL_to_EN.xsl (Polish to English), etc.