We invite you to explore our latest knowledgebase articles and to join the Smarten user community on Smarten Support Portal.
1. Missing Data – Why does it matter so much?
Have you ever worked on an analytics project and found blank, NaN, or undefined values in the records, and needed to deal with them correctly? This is routine when working with real-world data. Choosing a sound technique to handle these missing values is a crucial step, and it should come after you understand what analysis the data must support, because data that is valuable to one party can be noise to another. Data can be missing because of corruption, an incomplete extraction process, data entry errors, or simply because the values are rare and genuinely absent! Handling such data well is a real challenge, yet it is essential for making the right decisions and building robust predictive models and reports. This article sums up the key steps for handling missing values using Smarten Augmented Analytics and illustrates them with the Employee Salary Prediction dataset.
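To make the idea concrete, here is a minimal sketch of one common way to handle missing values: count them, then fill the gaps with the median of the observed values. It is not Smarten's own method. The helper names (`is_missing`, `impute_median`) and the salary figures are hypothetical and invented purely for illustration.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None, blank or placeholder strings, and float NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().lower() in {"", "nan", "na", "null", "undefined"}
    return isinstance(value, float) and math.isnan(value)


def impute_median(values):
    """Replace missing entries with the median of the observed numeric values."""
    observed = [v for v in values if not is_missing(v)]
    fill = median(observed)
    return [fill if is_missing(v) else v for v in values]


# Hypothetical salary column with two gaps (illustrative values only).
salaries = [52000.0, None, 61000.0, float("nan"), 48000.0]

missing_count = sum(is_missing(v) for v in salaries)
filled = impute_median(salaries)

print("missing:", missing_count)
print("filled:", filled)
```

Median imputation is only one option. Depending on the analysis, dropping the affected records or using the mean or the most frequent value may be a better fit.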
1. The more the merrier, but the fewer the better!
Often, it’s difficult to determine the impact of an individual influencer on the response variable when multiple influencing factors move together and have more or less the same influence. Let’s make this concrete with a realistic example. Suppose we want to examine a child’s weight based on various influencing factors, including the child’s height and age. As children grow older, they get taller! Height and age are therefore highly correlated with each other, which makes it hard to separate the effect of one from the effect of the other on the child’s weight. So, this case study does indeed have a multicollinearity problem!
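The height and age example can be checked numerically. The sketch below computes the Pearson correlation between the two predictors and the variance inflation factor (VIF), which for exactly two predictors reduces to 1 / (1 - r²). The measurements are invented for illustration and the function names are hypothetical. A VIF above roughly 5 to 10 is a common rule of thumb for problematic multicollinearity, not a hard limit.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical measurements for six children (illustrative values only).
age_years = [2, 4, 6, 8, 10, 12]
height_cm = [86, 102, 115, 128, 138, 150]

r = pearson(age_years, height_cm)
vif = vif_two_predictors(age_years, height_cm)

print(f"correlation between age and height: {r:.3f}")
print(f"VIF for either predictor: {vif:.1f}")
```

With more than two predictors, the VIF of each predictor comes from regressing it on all the others, so the shortcut above no longer applies.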