Improving Anonymized Search Relevance with Natural Language Processing and Machine Learning

Petrocelli, Niko A


Active / Technical Report | Accession Number: AD1166917


Abstract:

Users often sacrifice personal data in exchange for more relevant search results, which presents a problem for communities that want both search anonymity and relevant results. To balance these priorities, this research examines the impact of using Siamese networks to extend word embeddings into document embeddings and to detect similarities between documents. The predicted similarity can be used to locally re-rank search results retrieved from various sources, which limits the amount of information a search engine collects from the user. A prototype is produced by applying this methodology in a real-world search environment. The prototype also yielded an additional capability: finding new documents related to a provided sample document. The prototype is evaluated using real-world search examples. Results indicate that the Siamese network can produce document embeddings superior to those of current encoders such as the Universal Sentence Encoder. Results also show promising performance of the prototype in improving search relevance while limiting user data transmission.
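The abstract describes the pipeline only at a high level. The sketch below illustrates the general idea under stated assumptions and is not the thesis's implementation. A Siamese network applies one encoder with shared weights to both documents in a pair, so their embeddings are directly comparable. It is trained with a triplet loss (listed in the keywords), and the resulting similarity scores are used to re-rank results on the client. The toy word vectors, the identity projection standing in for learned weights, mean pooling, cosine similarity, the 0.2 margin, and the names `encode`, `triplet_loss`, and `rerank` are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy word embeddings; a real system would load pretrained vectors.
WORD_VECTORS: Dict[str, Vector] = {
    "siamese": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "embedding": [0.7, 0.3, 0.0],
    "recipe": [0.0, 0.1, 0.9],
    "cake": [0.1, 0.0, 0.8],
}

# Hypothetical projection shared by both branches of the Siamese pair.
# Identity here for readability; in a trained model these weights are learned.
SHARED_WEIGHTS: List[Vector] = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def encode(text: str) -> Vector:
    """Mean-pool known word vectors, then apply the shared projection."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * len(SHARED_WEIGHTS)
    dim = len(vectors[0])
    pooled = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in SHARED_WEIGHTS]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Triplet loss max(0, d(a, p) - d(a, n) + margin) with Euclidean d."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def rerank(sample: str, results: List[str]) -> List[Tuple[str, float]]:
    """Re-rank search results locally by similarity to a sample document."""
    query = encode(sample)
    scored = [(doc, cosine(query, encode(doc))) for doc in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = rerank("siamese network embedding", ["cake recipe", "network embedding"])
    print([doc for doc, _ in ranked])  # ['network embedding', 'cake recipe']
```

Because scoring happens on the client, the search engine only needs to see a generic query. The sample document used for re-ranking never has to be transmitted, which matches the privacy goal the abstract describes.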

Author(s):

Petrocelli, Niko A

Author Organization(s):

AIR FORCE INSTITUTE OF TECHNOLOGY WRIGHT-PATTERSON AFB OH

Funding Organization(s):

AIR FORCE INSTITUTE OF TECHNOLOGY WRIGHT-PATTERSON AFB OH, WRIGHT-PATTERSON AFB, OH

Document Type:

Technical Report/Master's Thesis

Publication Date:

2022 Mar 24

Pagination:

104

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution Code:

A - Approved For Public Release

Distribution Statement: Public Release

RECORD

Collection: TRECMS

Identifying Numbers

Report Number(s):

AFIT-ENG-MS-22-M-055

Subject Terms

Modernization Areas:

Autonomy

Communities of Interest:

Autonomy

Descriptor(s):

supervised machine learning, computational science, machine learning, neural networks, unsupervised machine learning, bayesian networks, computer languages, artificial intelligence software, dimensionality reduction, information science, artificial intelligence, language, air force, natural language processing, natural languages, engineering, information processing

Keyword(s):

Anonymized Search Relevance, Siamese Networks, document embeddings, NLP (natural language processing), triplet loss functions, LDA (latent Dirichlet allocation), ML (machine learning), similarity, text classification, ANN (artificial neural networks), text pre-processing

Subject Categories:

Mathematical and Computer Sciences

Creation Date:

2022 Apr 19

Update Date:

2022 Jun 07