Aigents News Monitoring — Tips and Tricks

Aigents with Anton Kolonin
13 min read · Mar 17, 2018

Introduction

Let’s talk about how to use the Aigents engine for custom news monitoring. The Aigents engine can be used either on the demo site https://aigents.com/ or installed at any other site by following the instructions available at http://aigents.com/download/latest/readme.html

To do news monitoring with Aigents, a few things may be required, as follows:

  • Create Topics and Patterns
  • Create Sites
  • Understand Searching and Monitoring
  • Configure Settings
  • Set-up Integration with Social Networks
  • Control Monitoring
  • Export Channel and RSS feed
  • Refer to References

Aigents Topics and Patterns

Aigents does news monitoring on the basis of what we call Topics, where each Topic may have one or more Patterns. Historically, topics in Aigents are identified by the word “knows” in Aigents Language, as a topic refers to something that is “known” by the user.

Three ways to create Patterns for Topics

There are three ways to create alternative keywords or keyword combinations, as follows. Each of the three is possible through chat-style interaction with the Aigents engine in Aigents Language, which can be done in the “Chat” view of the Aigents Demo Web UI at https://aigents.com/. Two of them are also possible in the “Topics” view of the Aigents Demo Web UI.

  1. Create a separate topic for each keyword or keyword combination (also doable via Web UI in “Topics” view), e.g.:

My knows “Artificial General Intelligence”, trusts “Artificial General Intelligence”.

My knows AGI, trusts AGI.

2. Create a single topic for all keywords or keyword combinations listed in a disjunctive set (also doable via Web UI in “Topics” view), e.g.:

My knows “{[Artificial General Intelligence] AGI}”, trusts “{[Artificial General Intelligence] AGI}”.

3. Create a single topic with all keywords or keyword combinations as separate patterns attached to the topic body (not doable via Web UI), e.g.:

My knows ArtificialGeneralIntelligence, trusts ArtificialGeneralIntelligence.

ArtificialGeneralIntelligence patterns AGI, “Artificial General Intelligence”.

In cases 1 and 2 above, the “name” of a topic is the “pattern” itself. In case 3, the “name” is just a label and the actual patterns are listed in the “patterns” property of the topic.
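The three styles above differ only in how the Aigents Language statements are assembled, so they can be sketched as small string builders. The Python below is an illustrative helper and not part of Aigents: the function names are hypothetical, and straight double quotes are assumed in place of the typographic quotes shown in this article.

```python
def quote_if_needed(pattern):
    """Quote anything that is not a single plain word (an assumption of this sketch)."""
    return pattern if pattern.isalnum() else '"' + pattern + '"'

def separate_topics(patterns):
    """Way 1: one topic per keyword or keyword combination."""
    return ["My knows {0}, trusts {0}.".format(quote_if_needed(p)) for p in patterns]

def disjunctive_topic(patterns):
    """Way 2: one topic whose name is a disjunctive set of all patterns."""
    # Multi-word combinations are framed with brackets as ordered sets.
    parts = ["[" + p + "]" if " " in p else p for p in patterns]
    name = '"{' + " ".join(parts) + '}"'
    return "My knows {0}, trusts {0}.".format(name)

def labeled_topic(label, patterns):
    """Way 3: a label topic with patterns listed in its "patterns" property."""
    listed = ", ".join(quote_if_needed(p) for p in patterns)
    return [
        "My knows {0}, trusts {0}.".format(label),
        "{0} patterns {1}.".format(label, listed),
    ]

for line in labeled_topic("ArtificialGeneralIntelligence",
                          ["AGI", "Artificial General Intelligence"]):
    print(line)
```

The generated lines can be pasted into the “Chat” view one by one; only the third form keeps the topic name independent of its patterns.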

Each topic is referred to by both the “knows” and “trusts” words for a reason. The “knows” list keeps the topic among the items that may be actually searched (if also associated with “trusts”) or suspended for search (if associated with “knows” only). So many topics can be listed under “knows”, but only a certain subset of them is actively used for search and respective monitoring: the items associated with “trusts” as well.

Pattern Case Insensitivity

Aigents patterns are case insensitive. Case sensitivity for patterns may be implemented later, at the cost of some complication of the syntax.

Pattern Structure

Aigents patterns may include the following.

  1. Keywords as individual tokens, e.g.:

intelligence

2. Ordered sets of keywords (skip-grams, with unlimited skip distance within a single sentence), separated by spaces and optionally framed with brackets, e.g.:

artificial intelligence

[artificial intelligence]

3. Disjunctive sets of keywords separated by spaces and framed with braces, e.g.:

{ai agi}

4. Disjunctive sets of keywords or ordered sets separated by spaces and framed with braces with ordered sets framed with brackets, e.g.:

{ai agi [artificial intelligence]}

5. Ordered sets of anything of the above, e.g.:

[artificial {psychology intelligence}]

[{strong general} artificial intelligence]

6. Disjunctive sets of anything of the above, e.g.:

{agi [{strong general} artificial intelligence] [artificial general intelligence]}

7. Anything of the above with some words replaced with regular expressions, e.g.:

/artificial(|ly)$/ /intelligen(t|ce)$/

[{agi [{strong general} artificial intelligence] [artificial general intelligence]} /appl(y|ies|ied|icable)$/ to robotics]

8. Anything of the above with some words replaced with named variables (prefixed with dollar sign), e.g.:

artificial $subject

[{agi [{strong general} artificial intelligence] [artificial general intelligence]} /appl(y|ies|ied|icable)$/ to $application]

9. Anything involving variables above, with variables bound to a few pre-defined scopes. The following scopes are pre-defined: email (an email address); word (a single word, and not any text); daytime (a specific time of the day); number (a single decimal or floating point number); money (a number prefixed with currency symbols). For instance, to capture a numeric year into a named variable year, the following pattern can be used with a scope applied:

singularity will come in $year

In order to apply a scope to the variable $year, the following Aigents Language statements have to be issued, to create the scoped variable explicitly and to assign a scope to it using the “is” property of the named variable:

My knows “singularity will come in $year”, trusts “singularity will come in $year”.

There is year.

Year is number.
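To make the pattern structure above more concrete, here is a minimal Python sketch of a matcher for a subset of it: keywords, ordered sets (skip-grams within one sentence), disjunctive sets, nesting, and /regular expressions/ in place of words. It is not the Aigents implementation. Variables and scopes are left out, sentence splitting is naive, and the function names are hypothetical.

```python
import re

def parse(pattern):
    """Parse a pattern into a tree of ("seq" | "any" | "word", value) nodes."""
    tokens = re.findall(r"/[^/\s]+/|[\[\]{}]|[^\s\[\]{}]+", pattern)
    pos = 0

    def items(closer):
        nonlocal pos
        out = []
        while pos < len(tokens) and tokens[pos] != closer:
            tok = tokens[pos]
            pos += 1
            if tok == "[":
                out.append(("seq", items("]")))   # ordered set
            elif tok == "{":
                out.append(("any", items("}")))   # disjunctive set
            else:
                out.append(("word", tok))
        pos += 1  # step over the closing bracket or brace
        return out

    # A top-level pattern without brackets is treated as an ordered set.
    return ("seq", items(None))

def find(node, words, start):
    """Return the earliest index just past a match at or after start, else None."""
    kind, value = node
    if kind == "word":
        is_regex = len(value) > 2 and value[0] == "/" and value[-1] == "/"
        for i in range(start, len(words)):
            if is_regex:
                hit = re.search(value[1:-1], words[i], re.IGNORECASE) is not None
            else:
                hit = value.lower() == words[i]
            if hit:
                return i + 1
        return None
    if kind == "any":
        ends = [find(child, words, start) for child in value]
        ends = [e for e in ends if e is not None]
        return min(ends) if ends else None
    pos = start  # "seq": children must match in order, gaps are allowed
    for child in value:
        pos = find(child, words, pos)
        if pos is None:
            return None
    return pos

def matches(pattern, text):
    """True if the pattern matches inside any single sentence of the text."""
    tree = parse(pattern)
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"\w+", sentence.lower())
        if find(tree, words, 0) is not None:
            return True
    return False

print(matches("{ai agi [artificial intelligence]}", "We discuss AGI today."))
```

Taking the earliest possible match for each element is enough here, because any later element may skip over an arbitrary number of words within the sentence.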

Escaping Quotes

Aigents patterns expressed in Aigents Language can be bounded with either single or double quotes (‘ or “), so if such quotes are needed as part of a pattern, they have to be escaped with a backslash, e.g. “\”Google\”” or “\’Google\’” or ‘\”Google\”’ or ‘\’Google\’’.
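The escaping rule can be sketched in a couple of lines of Python. This helper is hypothetical and assumes straight quotes on input. It escapes both kinds of quotes regardless of the bounding quote, as in the examples above, and whether backslashes themselves need escaping is not covered here.

```python
def quote_pattern(pattern, bound='"'):
    """Escape single and double quotes with a backslash, then bound the pattern."""
    escaped = pattern.replace('"', '\\"').replace("'", "\\'")
    return bound + escaped + bound

print(quote_pattern('"Google"'))
print(quote_pattern("'Google'", bound="'"))
```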

Auto Patterns

If there are no multiple patterns attached to a topic explicitly, and if there are no variables in the pattern represented by the topic name, automatic “extended” patterns are produced from the “basic” pattern by adding the variable $context in front of the “basic” pattern and the variable $about after it. During the pattern matching process, this extension is applied incrementally: first the full-blown pattern with both variables is tried, and if no matches are found, the matching is repeated with weaker patterns that have only one variable.

For example, for the pattern “artificial intelligence”, the extended pattern “$context artificial intelligence $about” will be created and searched first. If a match is found for it, nothing more happens; otherwise the relaxed pattern “$context artificial intelligence” is created and searched. If the latter fails, the last attempt is made with the “artificial intelligence $about” pattern.
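The order of attempts can be written down as a short Python sketch. The function name and the explicit_patterns parameter are hypothetical; the sketch only lists the candidate patterns in the order described and leaves the actual matching to the engine.

```python
def extended_patterns(basic, explicit_patterns=()):
    """List the patterns to try, in order, for a topic named by its basic pattern."""
    # No auto-extension when patterns are attached explicitly
    # or when the basic pattern already contains a variable.
    if explicit_patterns:
        return list(explicit_patterns)
    if "$" in basic:
        return [basic]
    return [
        "$context " + basic + " $about",  # full extension, tried first
        "$context " + basic,              # relaxed: leading variable only
        basic + " $about",                # last attempt: trailing variable only
    ]

for candidate in extended_patterns("artificial intelligence"):
    print(candidate)
```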

See the quick introductory video, which shows how to set up the patterns for news monitoring.

Aigents Sites

Aigents does news monitoring for sites specified in the “Sites” view of the Aigents Demo Web UI, where each site corresponds to a starting URL for site crawling. Sites in Aigents are identified by the word “sites” in Aigents Language.

Each site is referred to by both the “sites” and “trusts” words for a reason. The “sites” list keeps the URL among the items that may be actually searched (if also associated with “trusts”) or suspended for search (if associated with “sites” only). So many URLs can be listed under “sites”, but only a certain subset of them is actively used for search and respective monitoring: the items associated with “trusts” as well.

Aigents Searching and Monitoring

On each Aigents monitoring cycle, for each of the sites and each of the topics set up as described in the sections above, the following logic is executed, in simple terms. The page is searched for occurrences of any pattern of a topic, and all pattern matches are returned, with variables filled in with the corresponding text.

If no match is found on the starting URL, an adaptive targeted search algorithm is executed. Simply speaking, this algorithm runs in one or two passes. On the first pass (called the PathTracker phase), if there are known sequences of navigable Web links from the root page, all of the known link paths are followed in order to get the matches. If no matches are found on the first pass, or there are no known sequences of navigable Web links, the second pass is performed. On the second pass (called the PathFinder phase), all possible Web link sequences from the starting URL are followed, restricted to Web pages under the same Web domain and limited by a configurable “hop limit”. Once a page matching any number of patterns is found, the search is completed and the successful sequence of links is remembered as associated with the given topic for future use.

The default configurable “hop limit” is set to 1 for regular searches and to 3 for “forced” searches, discussed in the section dedicated to Aigents Monitoring Control.
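The two passes can be illustrated with a small Python sketch over an in-memory site. This is a simplification and not the Aigents code: pages are given as a dictionary instead of being fetched, the PathFinder pass is assumed to proceed breadth-first, the PathTracker pass stops at the first known path that still matches, and all names and the example.org URLs are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

def same_domain(a, b):
    return urlparse(a).netloc == urlparse(b).netloc

def targeted_search(start, pages, has_match, known_paths, hop_limit=1):
    """pages maps url -> (text, [links]); known_paths holds link sequences
    that led to matches earlier and is extended when a new one is found."""
    if has_match(pages[start][0]):
        return [start]
    # Pass 1 (PathTracker): follow link paths that are already known.
    for path in known_paths:
        end = path[-1]
        if end in pages and has_match(pages[end][0]):
            return list(path)
    # Pass 2 (PathFinder): explore links within the same domain,
    # going at most hop_limit hops away from the starting URL.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= hop_limit:
            continue
        for link in pages[path[-1]][1]:
            if link in seen or link not in pages or not same_domain(start, link):
                continue
            seen.add(link)
            new_path = path + [link]
            if has_match(pages[link][0]):
                known_paths.append(new_path)  # remember the successful path
                return new_path
            queue.append(new_path)
    return None

site = {
    "https://example.org/": ("home", ["https://example.org/news", "https://other.org/x"]),
    "https://example.org/news": ("news index", ["https://example.org/news/1"]),
    "https://example.org/news/1": ("agi arrives", []),
    "https://other.org/x": ("agi elsewhere", []),
}
paths = []
print(targeted_search("https://example.org/", site, lambda t: "agi" in t, paths, hop_limit=3))
```

With a hop limit of 1 the same search finds nothing on the first run, but it succeeds once the path has been remembered by an earlier search with a larger limit.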

While the matching is performed, besides evaluating the patterns of each topic against the text, an attempt is made to suppress repeated news. To do so, the text obtained from the non-variable part of the pattern together with the contents of the variables filled from the page is checked for whether identical text has already been matched on the same day, or on any preceding day if no other text for the same topic has been matched since. If such a repeated match is found, the new match is discarded. Note that in Aigents version 1.2.3 this feature does not work quite properly, so duplicates appear in monitoring results; this is expected to be fixed shortly.

Finally, for each of the non-repetitive matches, a news object is created with the following properties filled:

  • is: refers to the topic;
  • sources: refers to the matching page URL, which can be either the starting URL corresponding to a value in the list of “sites” or a URL in the same domain as the starting URL;
  • times: contains the day when the news object was created;
  • new: user-specific attribute, with the default true value indicating that the news item has been found and can be displayed in the “News” view of the Aigents Demo Web UI; the user may set it to false to hide the item;
  • trust: user-specific attribute, with the default false value indicating that the user has not yet explicitly indicated the relevance and trustworthiness of this news item; the user may set it to true to provide feedback for further use by machine learning.

Further, on each Aigents monitoring cycle for each of the users, or on explicit request by a user, the total number of news items per user is compacted down to “news limit” items per topic. To do so, machine learning is applied to rank all news items for each of the topics per user according to their “personal relevance”, computed on the basis of positive feedback given to earlier news by the same user (rendered as a blue bar in the Aigents Demo Web UI). Besides the “personal relevance”, if the user has trusted friend connections, “social relevance” is computed for the same news items, based on the respective feedback given by the user’s friends. After that, both relevances are combined and all news items that have the property new=true are sorted accordingly. From the sorted list, only the top “news limit” items with new=true are retained per topic and the rest are discarded.
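The compaction step can be sketched in Python as follows. How the two relevances are actually combined is not specified in this article, so a plain sum is assumed here; the dictionary keys and the function name are hypothetical, and items with new=false are simply left out of the sketch.

```python
from collections import defaultdict

def compact_news(items, news_limit):
    """Keep the top news_limit new items per topic, ranked by combined relevance."""
    by_topic = defaultdict(list)
    for item in items:
        if item["new"]:
            by_topic[item["topic"]].append(item)
    retained = []
    for group in by_topic.values():
        # Assumption: combined relevance is the sum of personal and social relevance.
        group.sort(key=lambda it: it["personal"] + it["social"], reverse=True)
        retained.extend(group[:news_limit])
    return retained

news = [
    {"topic": "agi", "text": "a", "new": True, "personal": 0.2, "social": 0.1},
    {"topic": "agi", "text": "b", "new": True, "personal": 0.9, "social": 0.0},
    {"topic": "agi", "text": "c", "new": True, "personal": 0.1, "social": 0.5},
    {"topic": "ai", "text": "d", "new": True, "personal": 0.0, "social": 0.0},
    {"topic": "ai", "text": "e", "new": False, "personal": 1.0, "social": 1.0},
]
print([item["text"] for item in compact_news(news, news_limit=2)])
```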

See the following video for details on how the adaptive targeted search is performed behind the scenes.

Aigents Settings

Aigents settings for a user profile involve four properties affecting the monitoring experience for the end user:

  • news limit — maximum number of news items with new=true property allowed to keep for given user;
  • items limit — maximum number of either topics or sites allowed to keep for given user;
  • trusts limit — maximum number of either topics or sites marked with trust (listed in “trusts” list for a user) allowed to keep for given user;
  • check cycle — number of hours between monitoring cycles for given user.

The limits for items and trusts are introduced for the case when Aigents Integration with Social Networks (described further) proactively searches for new topics and sites and adds them to the user profile behind the scenes.

Aigents Integration with Social Networks

Any user can have their Aigents account integrated with their account on one of the social networks, which in version 1.2.3 include Facebook, Google+, VKontakte, Steemit and Golos. For Facebook, Google+ and VKontakte, it can be done via the Aigents Demo Web UI upon registration via the social network or after registration by email, using the corresponding icons in the top-right corner of the Web UI. For Steemit and Golos, it can be done via user settings.

If users have their accounts integrated with one of these social networks, the lists of topics and sites are updated from these social networks on a daily basis, based on the posts and comments that the users write, like or vote for. Within this process, posts and comments authored by users, or marked by them as liked or voted for, are used to extract URLs and add them to the list of sites. Also, patterns of words found in these posts and comments are used to create topics and their patterns. All additions are marked as trusted if limits permit. Such automatic addition of topics and sites is limited by settings such as “trusts limit” and “items limit”.

See more on Aigents integration with social networks in the following video.

Aigents Monitoring Control

By default, the Aigents server takes the minimum “check cycle” setting across all registered users and crawls all sites for all topics of all users that were active during the last month. Also, for any site whose URL is clicked by a user in the Aigents Demo Web UI, the spidering is executed ad hoc, without waiting for completion of the “check cycle”. More control over the crawling, searching and monitoring for particular sites and topics is discussed below.

It should be noted that in version 1.2.3 the site spidering behavior in the “manual” mode described below differs from the crawling behavior in the regular “automated” mode. Specifically, in “manual” mode the “hop limit” for site spidering is equal to 3, while in “automated” mode it is equal to 1, so “manual” mode may potentially find more news items than the “automated” one.

The following interactive communications in Aigents Language can be performed to do the following things manually:

  • Crawl/spider particular site for all topics “trusted” by user
  • Crawl/spider particular site for specific topic
  • Get current text of the site
  • Get news for particular site and/or topic
  • Clear found news

Crawl/spider particular site for specific topic

To do so, both the topic and the site need to be listed and trusted in the “Topics” and “Sites” views of the Aigents Demo Web UI. Then enter the statement, with the site URL quoted in single or double quotes:

You read <quoted topic> in <site url>!

For instance:

You read ‘intelligent’ in ‘https://aigents.com/’!

You will get reply:

My reading <quoted topic> in <site url>.

For instance:

My reading intelligent in https://aigents.com/.
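When the same request has to be issued for many topics and sites, the statement can be assembled programmatically. The Python below is only a sketch with hypothetical function names; it builds the statement and the reply in the forms shown above, using straight single quotes, and does not talk to an Aigents server.

```python
def escape(text):
    """Escape quotes inside a topic, as required for quoted patterns."""
    return text.replace('"', '\\"').replace("'", "\\'")

def read_topic_statement(topic, site_url):
    """Build the request to read one topic in one site."""
    return "You read '{}' in '{}'!".format(escape(topic), site_url)

def expected_reply(topic, site_url):
    """Build the reply that acknowledges the request."""
    return "My reading {} in {}.".format(topic, site_url)

print(read_topic_statement("intelligent", "https://aigents.com/"))
print(expected_reply("intelligent", "https://aigents.com/"))
```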

Crawl/spider particular site for all topics “trusted” by user

To do so, all topics and the site need to be listed and trusted in the “Topics” and “Sites” views. Then enter the statement, with the site URL quoted in single or double quotes:

You reading url <site url>!

For instance:

You read url “https://aigents.com/”!

You will get reply:

My reading site <site url>.

For instance:

My reading site https://aigents.com/.

Note that in this case the spidering runs longer in the background, so results may appear later. In this case, a greater value of “hop limit” is used, compared to the case with a specific topic described in the previous section.

Get current text of the site

To do so, just enter the statement:

What is <site url> text?

For instance:

What is https://aigents.com/ text?

You will get reply:

There text <quoted text>.

For instance:

There text ‘aigents topics sites news …’.

Get news for particular site and/or topic

Normally, all found news will be rendered in the “News” view and can be filtered with the text input field. However, more fine-grained control over the news returned may be achieved by making direct requests involving the following properties of news items, used both as query parameters and as output properties:

  • times: date in YYYY-MM-DD format; the words “today” and “yesterday” may also be used;
  • sources: URL of the site containing the news text;
  • new: true or false value representing whether the news item appears in the “News” view of the Aigents Demo Web UI;
  • trust: true or false value indicating whether the news item is “trusted” so it can be used to compute “personal relevance” for other items;
  • text: text of the news item;
  • variables of the news objects, depending on actual patterns, with “context” and “about” variables used silently for patterns that don’t have explicit variables.

For instance, to get properties of times, new, trust and text for all news items corresponding to topic named ‘intelligent’, ask the following:

What is ‘intelligent’ times, new, trust, text?

As another example, to get just the text of all news items corresponding to the topic named ‘intelligent’ for today only and appearing in the “News” view, ask this:

What is ‘intelligent’, times today, new true text?

As a third example, to get all values of the (implicit) “about” variable for all news under the https://aigents.com/ site, ask the following question:

What sources https://aigents.com/ about?
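The three questions above follow one shape: conditions separated by commas, then the requested properties. A small Python builder can reproduce them. This generalization is inferred from the three examples only, not from a language specification, and the function name is hypothetical.

```python
def news_query(conditions, outputs):
    """conditions: list of (property, value) pairs; outputs: property names to return."""
    where = ", ".join("{} {}".format(prop, value) for prop, value in conditions)
    return "What {} {}?".format(where, ", ".join(outputs))

print(news_query([("is", "'intelligent'")], ["times", "new", "trust", "text"]))
print(news_query([("is", "'intelligent'"), ("times", "today"), ("new", "true")], ["text"]))
print(news_query([("sources", "https://aigents.com/")], ["about"]))
```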

Clear found news

In some cases, especially when testing topics and patterns for sites, it may be useful to remove news items currently present and displayed in “News” view of Aigents Demo Web UI. To remove news items, two things are required:

  1. First, news items should be removed from “News” view of given user so they can be garbage collected.
  2. Second, the actual garbage collection should be enforced manually.

The two operations above can be performed in a few different ways.

For one example, let us remove news items for a particular topic. The first statement below makes the news items not new and not visible to the user, and the second statement makes them non-existent.

Is ‘intelligent’ new false.

No there is ‘intelligent’.

For another example, if there are multiple news items across different topics for a specific site, the removal can be done on the basis of the site rather than a topic, as in the following example:

Sources https://aigents.com/ new false.

No there sources https://aigents.com/.

Note that removal of news items in the “News” view of the Aigents Demo Web UI only clears the “new” attribute of the news objects, while the objects themselves remain under the hood, invisible to the user. That is, if you remove them from the user interface and try to find them again, they will not appear, because existing objects marked as not new are not found over and over again. So you need to explicitly force garbage collection of the objects with a “no there …” statement so they can be completely forgotten and found again on the next crawl operation.
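Both removal examples above share the same two-step form, which can be captured in a short Python sketch. The helper name is hypothetical and the pattern is inferred from the two examples only.

```python
def clear_news_statements(prop, value):
    """Build the hide-then-forget pair of statements for news selected by prop/value."""
    selector = "{} {}".format(prop, value)
    hide = selector[0].upper() + selector[1:] + " new false."  # step 1: mark as not new
    forget = "No there " + selector + "."                      # step 2: force garbage collection
    return [hide, forget]

for statement in clear_news_statements("is", "'intelligent'"):
    print(statement)
for statement in clear_news_statements("sources", "https://aigents.com/"):
    print(statement)
```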

In a real production multi-user environment, clearing found news may not be trivial if the same topics and sites are used by different users and/or you are sharing your news with other users (via the “Friends” view in the Aigents Demo Web UI) and these users are “trusting” you to get your news shared. In such cases, the news items may get shared across users and garbage collector protection may prevent clearing such news using the above directives. Under such circumstances, the news items can be removed from the “News” view only but cannot be deleted from memory.

Aigents Channel and RSS feed

Any Aigents user can create a personalized news channel and RSS feed for public use on any site powered by Aigents. To do so, the user just adds an area with a unique name to the “areas” property and indicates that he or she “shares” this area. This can be done using Aigents Language only, for instance via the “Chat” view of the Aigents Demo Web UI.

For example, if a user wants to create a channel and RSS feed for a certain area of interest, say “agi”, they can do that easily on the Aigents Demo Website https://aigents.com/. To do so, one would just create topics and sites as described above and then issue the two following commands in the “Chat” view.

My areas agi.

My shares agi.

After that, https://aigents.com/#agi would serve as a public news channel for non-authorized users, and https://aigents.com/al?rss%20agi would serve as the source of the respective RSS feed.
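For an arbitrary area name, the two statements and the two resulting addresses can be derived with a few lines of Python. The helper name is hypothetical; the URL forms simply follow the “agi” example above, with the area name percent-encoded by the standard library.

```python
from urllib.parse import quote

def channel_info(area, base="https://aigents.com/"):
    """Return the chat statements plus the channel and RSS addresses for an area."""
    return {
        "statements": ["My areas {}.".format(area), "My shares {}.".format(area)],
        "channel": base + "#" + area,
        "rss": base + "al?" + quote("rss " + area),  # the space becomes %20
    }

info = channel_info("agi")
print(info["channel"])
print(info["rss"])
```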

See the short demonstration video on making an Aigents channel and RSS feed quickly.

Aigents References

The following references provide more information on Aigents capabilities for news monitoring.

Read more on Aigents at https://medium.com/@aigents and https://steemit.com/@aigents, view Aigents video at https://www.youtube.com/aigents, and follow https://www.facebook.com/aigents.
