Data Cleaning 101: Understanding Dirty Data And How To Form A Cleaning Strategy

2020-08-25
There are seven key areas of data management: creation, security, transmission, cleaning, analytics, storage, and sharing. Of these, cleaning is widely regarded as the most labor- and time-intensive for data scientists. According to a survey by CrowdFlower, up to 80% of a data scientist's time is spent finding, cleaning, and organizing data, leaving only 20% for actual analysis. More remarkably, close to 60% of the respondents considered data cleaning the least enjoyable part of their job.