Tuesday, May 21, 2019
Data Preprocessing Essay
Today's real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size (often several gigabytes or more) and their likely origin from multiple, heterogeneous sources. Low-quality data will lead to low-quality mining results. How can the data be preprocessed in order to help improve the quality of the data and, consequently, of the mining results? How can the data be preprocessed so as to improve the efficiency and ease of the mining process?

There are several data preprocessing techniques. Data cleaning can be applied to remove noise and correct inconsistencies in data. Data integration merges data from multiple sources into a coherent data store such as a data warehouse. Data reduction can reduce data size by, for instance, aggregating, eliminating redundant features, or clustering. Data transformations (e.g., normalization) may be applied, where data are scaled to fall within a smaller range like 0.0 to 1.0. This can improve the accuracy and efficiency of mining algorithms involving distance measurements. These techniques are not mutually exclusive; they may work together. For example, data cleaning can involve transformations to correct wrong data, such as by transforming all entries for a date field to a common format.

In Chapter 2, we learned about the different attribute types and how to use basic statistical descriptions to study data characteristics. These can help identify erroneous values and outliers, which will be useful in the data cleaning and integration steps. Data preprocessing techniques, when applied before mining, can substantially improve the overall quality of the patterns mined and/or the time required for the actual mining.
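Two of the transformations mentioned above, scaling values into the range 0.0 to 1.0 and bringing a date field to a common format, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `min_max_normalize` and `standardize_date`, the list of accepted input formats in `DATE_FORMATS`, and the choice of ISO 8601 as the common format are hypothetical and do not come from the text.

```python
from datetime import datetime


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale numeric values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        # Mapping everything to new_min avoids a division by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Assumed input formats for this sketch. A real data set needs its own list,
# and day/month order (assumed day first here) must be decided per source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def standardize_date(text):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    return None  # Leave unparseable entries for manual cleaning.


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))
    print([standardize_date(d) for d in ["2019-05-21", "21/05/2019", "May 21, 2019"]])
```

Returning None for an entry that matches no known format keeps the cleaning step honest: such values are flagged for inspection rather than silently guessed.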