Utilizing public repositories to improve the decision process for security defect resolution and information reuse in the development environment
MetadataShow full item record
- Master theses 
Security risks are contained in solutions in software systems that could have been avoided if the design choices were analyzed by using public information security data sources. Public security sources have been shown to contain more relevant and recent information on current technologies than any textbook or research article, and these sources are often used by developers for solving software related problems. However, solutions copied from public discussion forums such as StackOverflow may contain security implications when copied directly into the developers environment. Several different methods to identify security bugs are being implemented, and recent efforts are looking into identifying security bugs from communication artifacts during software development lifecycle as well as using public security information sources to support secure design and development. The primary goal of this thesis is to investigate how to utilize public information sources to reduce security defects in software artifacts through improving the decision process for defect resolution and information reuse in the development environment. We build a data collection tool for collecting data from public information security sources and public discussion forums, construct machine learning models for classifying discussion forum posts and bug reports as security or not-security related, as well as word embedding models for finding matches between public security sources and public discussion forum posts or bug reports. The results of this thesis demonstrate that using public information security sources can provide additional validation layers for defect classification models, as well as provide additional security context for public discussion forum posts. The contributions of this thesis are to provide understanding of how public information security sources can better provide context for bug reports and discussion forums. Additionally, we provide data collection APIs for collecting datasets from these sources, and classification and word embedding models for recommending related security sources for bug reports and public discussion forum posts.