Accession Number:



Monitoring Business Activity

Descriptive Note:

Final technical rept. Sep 2001-Oct 2005

Corporate Author:


Personal Author(s):

Report Date:


Pagination or Media Count:



Under this project, the authors studied and developed technologies for building scoring models, which estimate the likelihood that an entity exhibits some characteristic. For example, a social network may include malicious individuals; suspicion scoring assigns a numeric value to each entity in the network, representing the estimated likelihood that the entity is malicious. The authors focused on scoring entities that are interconnected in some sort of network, and on techniques for building and using scoring models when important information is unknown but may be acquired at a cost.

This project built an integrated toolkit, called NetKit, of methods for scoring networked entities that relaxes the standard assumption that the entities to be scored are independent. NetKit has been applied to various benchmark networked data sets, showing that simple methods alone can produce remarkably good scores. Additional development and experimentation were conducted with NetKit's Relational Neighbor (RN) algorithms. These combine a form of guilt-by-association with collective inference, in which the entire network is scored simultaneously so that the scores of related entities can affect each other. The RN algorithms were applied to the terrorist-world simulation data produced under another project within this program.

The Automated Construction of Relational Attributes (ACORA) system addresses a particular aspect of building and using scoring models with networked data and other relational data. Under this project the authors introduced techniques for automatically constructing attributes from high-dimensional categorical attributes. They showed that these techniques can consistently, and sometimes dramatically, improve modeling and scoring. The authors have also produced a collection of techniques and results on how to use information-gathering resources most cost-effectively when building and using classification/scoring models.
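The Python sketch below roughly illustrates guilt-by-association combined with collective inference. Each unlabeled entity's score is the edge-weighted average of its neighbors' current scores. All unlabeled entities are then updated together, round after round, until the scores stop changing.

This is a simplified, hypothetical example, not NetKit's code. The following are all assumptions made for illustration:

- the function name `relational_neighbor_scores`;
- its parameters (`prior`, `iterations`, `tol`);
- the 0-to-1 score scale;
- the plain synchronous update rule (NetKit's own inference procedures may differ).

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def relational_neighbor_scores(
    edges: List[Tuple[Node, Node, float]],
    known: Dict[Node, float],
    prior: float = 0.5,
    iterations: int = 50,
    tol: float = 1e-6,
) -> Dict[Node, float]:
    """Hypothetical guilt-by-association scorer (not NetKit's implementation).

    Each unlabeled node's score is the edge-weighted average of its
    neighbors' current scores. All unlabeled nodes are updated together in
    each round, so scores of related entities influence each other.
    """
    # Build an undirected, weighted adjacency list.
    neighbors: Dict[Node, List[Tuple[Node, float]]] = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    for n in known:
        neighbors.setdefault(n, [])

    # Labeled nodes keep their known scores; unlabeled nodes start at the prior.
    scores = {n: known.get(n, prior) for n in neighbors}

    for _ in range(iterations):
        new_scores = dict(scores)
        max_change = 0.0
        for n, nbrs in neighbors.items():
            if n in known:
                continue  # labeled entities are never re-scored
            total = sum(w for _, w in nbrs)
            if total == 0:
                continue  # no weighted neighbors: keep the prior
            # Weighted vote of neighbors, computed from the previous round.
            new_scores[n] = sum(w * scores[m] for m, w in nbrs) / total
            max_change = max(max_change, abs(new_scores[n] - scores[n]))
        scores = new_scores
        if max_change < tol:
            break  # scores have stabilized
    return scores


edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
scores = relational_neighbor_scores(edges, known={"a": 1.0, "d": 0.0})
print(round(scores["b"], 3), round(scores["c"], 3))
```

In this chain, `a` is labeled suspicious (1.0) and `d` benign (0.0). The interior entities converge to about 0.667 and 0.333, so each entity's score reflects how closely it is tied to each labeled entity.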

Subject Categories:

  • Information Science
  • Psychology
  • Statistics and Probability
  • Cybernetics

Distribution Statement: