Uncovering and Managing the Impact of Methodological Choices for the Computational Construction of Socio-Technical Networks from Texts
CARNEGIE-MELLON UNIV PITTSBURGH PA INST OF SOFTWARE RESEARCH INTERNAT
This thesis is motivated by the need for scalable and reliable methods and technologies that support the construction of network data from text data. The resulting data can ultimately be used to answer substantive questions about socio-technical networks. One main limitation of this approach is that validating the resulting network data can be difficult or even infeasible, e.g., for covert, past, and large-scale networks. This thesis addresses this problem by identifying how the coding choices that must be made when extracting network data from text data affect the structure of networks and the results of network analysis. The findings suggest that conducting reference resolution on the text data can alter the identity and weight of 76% of the nodes and 23% of the links, and can cause major changes in the values of commonly used network metrics. Also, completely different sets of key nodes are found when reference resolution is applied to the text data prior to relation extraction. Based on the outcome of these experiments, I recommend strategies for avoiding or mitigating these issues in practical applications.
- Sociology and Law