Metadata field | Value | Language
dc.contributor.author | Nyero, Walter | eng
dc.date.accessioned | 2013-09-05T12:13:22Z |
dc.date.available | 2013-09-05T12:13:22Z |
dc.date.issued | 2009-11-20 | eng
dc.date.submitted | 2009-11-20 | eng
dc.identifier.uri | https://hdl.handle.net/1956/7058 |
dc.description.abstract | Today many organisations and enterprises use data from several sources, either for strategic decision making or for other business goals such as data integration. Data quality problems are always a hindrance to the effective and efficient use of such data. Tools have been built to clean and standardize data; however, there is a need to pre-process the data by applying techniques and processes from statistical semantics, NLP, and lexical analysis. Data profiling employed these techniques to discover and reveal commonalities and differences in the inherent data structures, to present ideas for the creation of a unified data model, and to provide metrics for data standardization and verification. The IBM WebSphere tool was used to pre-process the dataset/records through the design and implementation of rule sets developed in QualityStage and tasks created in DataStage. The data profiling process generated a set of statistics (frequencies), token/phrase relationships (RFDs, GRFDs), and other findings that provided an overall view of the data source's inherent properties and structures. Examining the dataset (identifying violations of the normal forms and other data commonalities) and collecting the desired information yielded useful statistics for data standardization and verification by enabling disambiguation and classification of the data. | en_US
dc.format.extent | 894898 bytes | eng
dc.format.mimetype | application/pdf | eng
dc.language.iso | eng | eng
dc.publisher | The University of Bergen | en_US
dc.title | Data Profiling to Reveal Meaningful Structures for Standardization | en_US
dc.type | Master thesis |
dc.rights.holder | Copyright the author. All rights reserved | en_US
dc.description.degree | Master i Informatikk - programutvikling (Master's in Informatics, Software Development) | en_US
dc.description.localcode | MAMN-INFPR |
dc.description.localcode | INFPR |
dc.subject.nus | 754115 | eng
fs.subjectcode | INFPR |

