marc.e

Postby marc.e » Fri Oct 17, 2008 4:30 am

I am just looking at your software with the intention of starting a news-flow based service for our clients. I would like to be able to screen certain feeds for a combination of keywords.

Example:

1. I would like to be able to define a basket of ‘trigger’ words, such as {fraud, litigation, class action}
2. Then I would like to set up a universe of ‘combination’ words, such as {renaissance, farallon, caxton}

Now I would like to create a filtered channel which only contains articles that include BOTH at least one trigger word AND at least one combination word.

However, as far as I understand, even your advanced search engine would require me to spell out every possible combination, which can become extremely cumbersome and hard to maintain.

How, and with which version of your software, could I implement the above task?

Secondly, if I wanted to provide such a filtered stream to our clients, what would a customized version of Awasu cost?

support
Site Admin
Posts: 3021
Joined: Fri Feb 07, 2003 12:48 pm
Location: Melbourne, Australia

Postby support » Fri Oct 17, 2008 4:42 am

marc.e wrote: Now I would like to create a filtered channel which only contains articles that include BOTH at least one trigger word AND at least one combination word.

However, as far as I understand, even your advanced search engine would require me to spell out every possible combination, which can become extremely cumbersome and hard to maintain.

You should be able to do it with a query like this:


( fraud OR litigation OR "class action" ) AND ( renaissance OR farallon OR caxton )


Note the use of parentheses to control the order in which each part of the query is processed.
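To make the semantics of that query concrete, here is a minimal Python sketch of the logic it expresses: an item matches only if it contains at least one trigger term AND at least one combination term. It illustrates the boolean structure only and is not Awasu's search engine. The case-insensitive substring matching and the function names are assumptions made for illustration.

```python
def contains_any(text, terms):
    """Return True if any term (word or phrase) occurs in text, ignoring case."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)

def matches(text, triggers, combinations):
    """Equivalent of ( t1 OR t2 ... ) AND ( c1 OR c2 ... )."""
    return contains_any(text, triggers) and contains_any(text, combinations)

triggers = ["fraud", "litigation", "class action"]
combinations = ["renaissance", "farallon", "caxton"]

print(matches("Farallon named in class action suit", triggers, combinations))    # True
print(matches("Renaissance fund posts record returns", triggers, combinations))  # False
```

Because this sketch uses plain substring matching, "fraud" would also match "fraudulent"; a real search engine may tokenize words differently.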

This will work in all versions of Awasu (although search channels are only available in the Pro Edition).

marc.e wrote: Secondly, if I wanted to provide such a filtered stream to our clients, what would a customized version of Awasu cost?

It depends on how you want to send the information to your clients. The easiest way would be to create a report and FTP it up to a web server. The latest beta release (2.3.4) also allows reports to be emailed out. Both these features are available in the Advanced and Pro Editions, i.e. you don't need a special customized version.

zakky
Posts: 27
Joined: Fri Oct 17, 2008 10:55 am

Postby zakky » Fri Oct 17, 2008 11:03 am

Hi Taka,

Thanks for answering my question. However, this does not solve the maintenance problem: I would have about 500 combination words. It would be nice if there were a way to define lists.

Example:

1) Define list one of type OR
a. List members: fraud, litigation, "class action"
2) Define list two of type OR
a. List members: renaissance, farallon, caxton
3) Define a query which uses list one and two and criteria 'AND'
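The list idea above can be sketched in a few lines of Python. Each named list is OR'd internally, and the lists are then combined with whatever operators you choose, which produces the same kind of query string Awasu already accepts. The list names and the `or_group`/`build_query` helpers are hypothetical. Operators are passed through verbatim, so whether a particular combination (for example one using NOT) is valid depends on Awasu's query syntax.

```python
# Hypothetical named keyword lists; each one is maintained separately.
lists = {
    "list_one": ["fraud", "litigation", "class action"],
    "list_two": ["renaissance", "farallon", "caxton"],
}

def or_group(terms):
    """OR a list of terms together, quoting multi-word phrases."""
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(*parts):
    """Expand list names into OR groups; pass anything else (AND, OR, ...) through."""
    return " ".join(or_group(lists[p]) if p in lists else p for p in parts)

print(build_query("list_one", "AND", "list_two"))
# (fraud OR litigation OR "class action") AND (renaissance OR farallon OR caxton)
```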

This way I can separately handle any number of lists by topic, and then create queries by simply combining any number of lists I want using criteria keywords (AND, OR, NOT, ...).

If this is not possible, which would be a pity, would it then be possible to have one agent run a query on list one and then automatically run a second query on the resulting feed with list two? That way I could 'drill down' within the news flow and condense the results to only the relevant material.
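The two-stage "drill down" described above is logically the same as a single (list one) AND (list two) query. The following Python sketch shows this on a few made-up headlines. The headlines and the substring matching are illustrative assumptions, not Awasu behaviour.

```python
def contains_any(text, terms):
    """True if any term occurs in text, ignoring case."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)

list_one = ["fraud", "litigation", "class action"]
list_two = ["renaissance", "farallon", "caxton"]

# Made-up sample items for illustration.
items = [
    "Caxton settles fraud probe",
    "Farallon hires new CFO",
    "Class action filed against retailer",
]

# Stage 1: keep items matching any term from list one.
stage_one = [item for item in items if contains_any(item, list_one)]
# Stage 2: run list two over the stage-1 results only.
stage_two = [item for item in stage_one if contains_any(item, list_two)]

# A single AND filter over the original items gives the same result.
combined = [item for item in items
            if contains_any(item, list_one) and contains_any(item, list_two)]

print(stage_two)  # ['Caxton settles fraud probe']
```

So chaining two filters needs no extra machinery: one search channel with the AND query gives the same result.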

Thanks again,

support

Postby support » Fri Oct 17, 2008 11:35 am

zakky wrote: This way I can separately handle any number of lists by topic, and then create queries by simply combining any number of lists I want using criteria keywords (AND, OR, NOT, ...).

I'm guessing you want to be able to use any given list in multiple channels? Otherwise, it'd be fairly easy - just create a channel with one massive query in it. It'd be a bit of a pain to edit but I would just copy it out into Notepad, edit it and then copy it back.

We could whip up something that hacked the channel config files (where the search query is stored) but "hack" would definitely be the operative word here :|

A list-based feature such as the one you suggested is pretty esoteric and I can't imagine too many other people wanting such a thing. For specialized information processing like this, your best bet would probably be to save the content in a MySQL database as it arrives, then write your own tools to query the data there.
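A rough sketch of the database route in Python. It uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the `items` table layout is a hypothetical example. The idea is the same with MySQL: each keyword list becomes an OR'd group of `LIKE` conditions, and the groups are AND'd together.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("Caxton settles fraud probe",),
     ("Farallon hires new CFO",),
     ("Class action filed against retailer",)],
)

def any_term_clause(terms):
    """Build '(title LIKE ? OR title LIKE ? ...)' plus its parameters."""
    clause = "(" + " OR ".join(["title LIKE ?"] * len(terms)) + ")"
    params = ["%" + t + "%" for t in terms]  # note: % and _ in terms are not escaped
    return clause, params

triggers = ["fraud", "litigation", "class action"]
combinations = ["renaissance", "farallon", "caxton"]

c1, p1 = any_term_clause(triggers)
c2, p2 = any_term_clause(combinations)
sql = "SELECT title FROM items WHERE " + c1 + " AND " + c2
rows = [r[0] for r in conn.execute(sql, p1 + p2)]
print(rows)  # ['Caxton settles fraud probe']
```

SQLite's `LIKE` is case-insensitive for ASCII by default. MySQL's behaviour depends on the column collation.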

zakky wrote: If this is not possible, which would be a pity, would it then be possible to have one agent run a query on list one and then automatically run a second query on the resulting feed with list two? That way I could 'drill down' within the news flow and condense the results to only the relevant material.

You might be able to do something using XSLT which can be used to analyze and munge the incoming XML (where in your case, "munge" = remove any non-matching feed items). XSLT is not exactly my strong point so the MySQL solution would be the way I'd go :-) :oops:
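For comparison, here is the same "munge" done in Python with the standard-library `xml.etree.ElementTree` rather than XSLT, so it is a substitute technique and not an XSLT stylesheet. It assumes a plain RSS 2.0 feed with no namespaces (Atom feeds would need different element names). It removes every `<item>` whose title and description don't match both keyword lists. The sample feed is made up.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample.
RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>Caxton settles fraud probe</title></item>
<item><title>Farallon hires new CFO</title></item>
</channel></rss>"""

def contains_any(text, terms):
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)

def filter_feed(xml_text, list_one, list_two):
    """Drop <item> elements that don't match at least one term from each list."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    for item in list(channel.findall("item")):
        text = " ".join(item.findtext(tag) or "" for tag in ("title", "description"))
        if not (contains_any(text, list_one) and contains_any(text, list_two)):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")

filtered = filter_feed(RSS, ["fraud", "litigation"], ["caxton", "farallon"])
print(filtered.count("<item>"))  # 1
```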

zakky

Postby zakky » Mon Oct 20, 2008 8:33 am

Thanks for the reply, Taka.

Hm... first please note that I'm really new to this.

If possible, I would like to solve the task within Awasu and not have to store the data in a database for processing first.

What I suggest (it looks like it's a suggestion now) is that I would like to have several channels (feeds) running whose new items are screened not with one very large query but by applying lists of keywords.

By defining lists, it becomes possible to group related keywords in a list, say " 'General Electric' 'Jeffrey Immelt' ". Jeffrey Immelt belongs with General Electric because he's its CEO and chairman. This way related keywords can be grouped.

Now I would like to be able to create a second list of keywords that relate to fraudulent activity, such as 'fraud, theft, litigation'.

Now, I would like to be able to define a search agent rule that says: search all my channels for at least one keyword of list one AND at least one keyword of list two.

That would be very cool, because the resulting 'result channel' could be fed directly to clients, and it's very easy to maintain.

Why do you think such a feature wouldn't be popular? Maybe I am still misunderstanding something.

thanks,
Marc

support

Postby support » Mon Oct 20, 2008 9:19 am

zakky wrote: Now, I would like to be able to define a search agent rule that says: search all my channels for at least one keyword of list one AND at least one keyword of list two.

You can do this using the example query I gave in my first reply above. If each channel has its own different list of keywords, it's no problem. You have to maintain a search query rather than a list of keywords but that's not drastically different (and there's little chance of us ever implementing a list-based search like the one you're suggesting :-) see below).

If it really is an issue, we could write you a utility program that manages the Awasu config files based on lists of keywords. This would be a bit hacky but would work. Send us an email if you're interested and we'll put together a quote.

zakky wrote: That would be very cool, because the resulting 'result channel' could be fed directly to clients, and it's very easy to maintain.

If you want to send the results somewhere, search channels would be the way to go. Search agents simply highlight matching items in the UI but there's no way to export this out of Awasu.

zakky wrote: Why do you think such a feature wouldn't be popular? Maybe I am still misunderstanding something.

Just guessing. Can you point me to a main-stream application or search engine that does such a thing? :-)

zakky

Postby zakky » Mon Oct 20, 2008 10:19 am

support wrote: If each channel has its own different list of keywords, it's no problem.

I would think that every channel will be searched for the same keywords.

support wrote: If you want to send the results somewhere, search channels would be the way to go.

I see, that's what I'll do then.

support wrote: Just guessing. Can you point me to a main-stream application or search engine that does such a thing?

No, that's why I asked if you can do it :)

kevotheclone
Posts: 239
Joined: Mon Sep 08, 2008 7:16 pm
Location: Elk Grove, California

Postby kevotheclone » Mon Oct 20, 2008 10:36 pm

I did a little Python scripting last night and I think I've got something that may do some of what zakky wants:

If you keep your two sets of words/phrases in two separate ASCII text files, this Python script will: 1) OR all of the words/phrases in file #1 together, 2) OR all of the words/phrases in file #2 together, 3) AND the results of file #1 and file #2, and 4) update the "QueryString" entry of the "[Search Channel Config]" section of the specified CHANNEL file.

This was only tested on my stock copy of Awasu 2.3 Pro.

File #1:


fraud
litigation
class action


File #2:


renaissance
farallon
caxton


The script builds this search query:


(fraud OR litigation OR "class action") AND (renaissance OR farallon OR caxton)


and updates the "QueryString" entry of the "[Search Channel Config]" section of the specified CHANNEL file with the built search criteria.


import win32api  # from the pywin32 package

List1       = "List1.txt"
List2       = "List2.txt"
ChannelFile = "C:\\Documents and Settings\\UserName\\Application Data\\Awasu\\Channels\\MySearchChannel.channel"
Criteria    = ""

def ReadListFile(FilePath):
  # read each line into a list, dropping the EOL character and skipping blank lines
  terms = []
  with open(FilePath, "r") as InputFile:
    for line in InputFile:
      term = line.replace("\n", "")
      if term:
        # if the line contains a phrase, wrap it in quotation marks
        if term.find(" ") > 0:
          term = "\"" + term + "\""
        terms.append(term)

  # join the words/phrases with " OR " and wrap with parentheses
  return "(" + " OR ".join(terms) + ")"

def UpdateChannelFile(ChannelFilePath, NewCriteria):
  # write QueryString into the [Search Channel Config] section of the INI-style channel file
  win32api.WriteProfileVal("Search Channel Config", "QueryString", NewCriteria, ChannelFilePath)

# main body of program
if __name__ == '__main__':
  Criteria = ReadListFile(List1) + " AND " + ReadListFile(List2)
  UpdateChannelFile(ChannelFile, Criteria)


You'll need to change the file path values of the "List1", "List2" and "ChannelFile" variables (they're basically constants in this script).

zakky, I think you'll find that Awasu will get you much closer to your desired solution than any other product on the market at a similar low cost.

support

Postby support » Tue Oct 21, 2008 1:07 am

kevotheclone wrote:I did a little Python scripting last night and I think I've got something that may do some of what zakky wants

Hey, that's cool :cool:

Thanks :-)

kevotheclone

Postby kevotheclone » Tue Oct 21, 2008 5:54 pm

Thank you Sensei!

There's an old saying about Perl which I think is attributed to Randal L. Schwartz:
Making Easy Things Easy and Hard Things Possible


That's kind of how I feel about Awasu compared to other feed readers. :D

After I posted my code I thought about another variation that kicks the coolness up a notch. Instead of hardcoding in a couple of "Search Criteria" file paths and calling ReadListFile() for each hardcoded "Search Criteria" file path, why not run the script against a single directory containing as many "Search Criteria" files as needed. Each file's search terms/phrases are OR'd together and the results are AND'd together before updating the "QueryString" entry of the "[Search Channel Config]" section of the specified CHANNEL file.

Also, a previous omission on my part was not trimming the leading and trailing whitespace from each line of the file(s). I normally trim leading and trailing whitespace on any user-supplied values, but I'm new to Python and I forgot about it. :oops:

This update includes the trimming code via Python's str.strip() method.

So without further ado:


import os
import win32api  # from the pywin32 package

# update "StartingDir" and "ChannelFile" with the correct file paths

StartingDir = "X:\\Awasu\\SearchCriteriaWordLists\\"
ChannelFile = "C:\\Documents and Settings\\UserName\\Application Data\\Awasu\\Channels\\MySearchChannel.channel"
Criteria    = ""

def ReadListFile(FilePath):
  # read each line into a list, strip leading and trailing whitespace
  # (which also removes the EOL character), and skip blank lines
  terms = []
  with open(FilePath, "r") as InputFile:
    for line in InputFile:
      term = line.strip()
      if term:
        # if the line contains a phrase, wrap it in quotation marks
        if " " in term:
          term = "\"" + term + "\""
        terms.append(term)

  # join the words/phrases with " OR " and wrap with parentheses
  return "(" + " OR ".join(terms) + ")"

def UpdateChannelFile(ChannelFilePath, NewCriteria):
  win32api.WriteProfileVal("Search Channel Config", "QueryString", NewCriteria, ChannelFilePath)

# main body of program
if __name__ == '__main__':
  groups = []
  # sorted() gives a predictable order; skip subdirectories and other non-files
  for filename in sorted(os.listdir(StartingDir)):
    filefullpath = os.path.join(StartingDir, filename)
    if os.path.isfile(filefullpath):
      groups.append(ReadListFile(filefullpath))

  # AND the OR'd groups together
  Criteria = " AND ".join(groups)

  # update the channel file
  UpdateChannelFile(ChannelFile, Criteria)

support

Postby support » Wed Oct 22, 2008 8:19 am

kevotheclone wrote:There's an old saying about Perl which I think is attributed to Randal L. Schwartz:
Making Easy Things Easy and Hard Things Possible

That's kind of how I feel about Awasu compared to other feed readers. :D

I always try to make my software flexible and it's always a good sign when people use your software in ways that you never even thought of :cool:

kevotheclone wrote:After I posted my code I thought about another variation that kicks the coolness up a notch.

zakky is going to have quite a few channels like this which means he would either need one script per channel, or the script would need to accept some parameters to specify what channel it should work on and where to find the query files.
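A minimal sketch of the "accept some parameters" option, using Python's standard `argparse` module. The argument names are hypothetical. The parsed values would take the place of the hardcoded List1/List2/ChannelFile constants in kevotheclone's script.

```python
import argparse

def parse_args(argv=None):
    """Read the channel file and keyword-list files from the command line."""
    parser = argparse.ArgumentParser(
        description="Rebuild a search channel's QueryString from keyword list files.")
    parser.add_argument("channel_file",
                        help="path to the .channel file to update")
    parser.add_argument("list_files", nargs="+",
                        help="keyword files; terms in each are OR'd, the files are AND'd")
    return parser.parse_args(argv)

args = parse_args(["MySearchChannel.channel", "List1.txt", "List2.txt"])
print(args.channel_file)  # MySearchChannel.channel
print(args.list_files)    # ['List1.txt', 'List2.txt']
```

With this in place, one copy of the script could serve every channel, e.g. `python update_query.py MySearchChannel.channel List1.txt List2.txt` (the script name is hypothetical).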

But your idea gave me a really cool idea. What if the script iterates through all Awasu channels (by processing each .CHANNEL file in a directory), looking for search channels? For each search channel it finds, it checks the channel's description for lines that look like this (for example):


QUERY-FILE = c:\data\file1.txt
QUERY-FILE = c:\data\file2.txt
etc...

It reads the query terms from the specified files, generates the search query and updates the .CHANNEL file.

This means that zakky can then manage where the query terms come from for each channel from the Awasu UI. He can update the channel description (in the channel's Properties dialog) for each channel, exit Awasu and run your script, and each channel gets its search query updated. Now, that would be seriously cool :-)
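The parsing half of that idea might look like the Python sketch below. It pulls the `QUERY-FILE = ...` paths out of a description string, reads the terms from each file, and ORs/ANDs them into a query. Which key in the .CHANNEL file holds the description isn't stated in this thread, so reading the description out of the file is deliberately left out. The function names are hypothetical.

```python
import re

# Matches lines such as "QUERY-FILE = c:\data\file1.txt" (the convention proposed above).
QUERY_FILE_LINE = re.compile(r"^[ \t]*QUERY-FILE[ \t]*=[ \t]*(.+?)[ \t\r]*$", re.MULTILINE)

def query_files_from_description(description):
    """Return the paths named on QUERY-FILE lines, in the order they appear."""
    return QUERY_FILE_LINE.findall(description)

def read_terms(path):
    """Read one word or phrase per line, ignoring blank lines."""
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]

def build_query(term_lists):
    """OR the terms within each list, then AND the lists together."""
    groups = []
    for terms in term_lists:
        quoted = ['"%s"' % t if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

description = "Fraud watch\nQUERY-FILE = c:\\data\\file1.txt\nQUERY-FILE = c:\\data\\file2.txt\n"
print(query_files_from_description(description))
# ['c:\\data\\file1.txt', 'c:\\data\\file2.txt']
```

The full query for a channel would then be `build_query([read_terms(p) for p in query_files_from_description(description)])`.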

zakky

Postby zakky » Wed Oct 22, 2008 11:21 am

@kevotheclone
@support

Hey, cool, guys! Thanks a lot! I do appreciate your efforts! Now let me try to make sense of what you wrote and how to apply it :)

In any case, that's a really cool, open architecture which lets us adjust Awasu to our needs. If only I were used to working with Python. I am happy I can handle VB.NET well enough :)

support

Postby support » Wed Oct 22, 2008 11:58 am

zakky wrote: If only I were used to working with Python.

It's not that different. If you install the Python interpreter, everything should Just Work (TM).

zakky

Postby zakky » Wed Oct 22, 2008 3:09 pm

I'm slowly warming to the issue.

If I write my own piece of code, which I could actually add to our analysis software, wouldn't it be sufficient for me to amend the query strings in the channel files?

Idea:

Step 1: The user sets up search channels in Awasu, without providing a query string

Step 2: From within my existing software I provide access to the channel folder, browse through all of them, and provide a GUI where my people can set up the necessary queries easily

Step 3: I change the QueryString line in the respective channel file

Step 4: The search channel is updated in Awasu, and new items are stored as HTML or plain text

Step 5: The HTML report is emailed to the client
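Step 3 only needs the one line changed. Here is a line-based Python sketch that rewrites the QueryString entry inside the [Search Channel Config] section and leaves every other line untouched; the same logic ports directly to VB.NET. The section and key names come from kevotheclone's script. It assumes the .channel file is a plain-text, INI-style file readable with the default text encoding, which this thread doesn't confirm. Unlike win32api.WriteProfileVal, it raises an error if the entry is missing instead of adding one. As with any hand edit of Awasu's config files, Awasu must not be running at the time.

```python
def set_query_string(channel_path, new_query,
                     section="Search Channel Config", key="QueryString"):
    """Replace only the QueryString line in [Search Channel Config]."""
    with open(channel_path, "r") as f:
        lines = f.readlines()

    in_section = False
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # INI section and key names are treated case-insensitively here
            in_section = stripped[1:-1].lower() == section.lower()
            continue
        if in_section and "=" in stripped and \
                stripped.split("=", 1)[0].strip().lower() == key.lower():
            lines[i] = "%s=%s\n" % (key, new_query)
            replaced = True
            break

    if not replaced:
        raise ValueError("no %s entry in [%s]" % (key, section))

    with open(channel_path, "w") as f:
        f.writelines(lines)
```

Usage: `set_query_string(path_to_channel_file, '(fraud OR litigation) AND (caxton)')`.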

I have a few questions, though:

1) Does this make sense, or am I missing something obvious? I do not want to reinvent the wheel.

2) Is it sufficient to simply change the QueryString in each channel file, or do I have to change any other lines, such as SearchLocations and so on?

Yeah, well, I think that's it for the moment. Thank you!

@kevotheclone: I would like to express my sincerest thanks for the code you wrote! I would very much love for such functionality to be built into Awasu in future. However, if I have to rely on external code, I think it makes more sense to write my own, which means I can create a GUI directly within my existing software.

support

Postby support » Wed Oct 22, 2008 3:41 pm

zakky wrote:Step 2: From within my existing software I provide access to the channel folder, browse through all of them, and provide a GUI where my people can set up the necessary queries easily

Should be OK, as long as you give the user some way of knowing which channel they are working on.

Also, be aware that if you're going to hack config files like this, Awasu must not be running when you do it.

zakky wrote: 2) Is it sufficient to simply change the QueryString in each channel file, or do I have to change any other lines, such as SearchLocations and so on?

Yep, unless you actually want to change the search locations :roll:


