Channel: KNIME RSS

Replace missing data based on mean of alternative column


Hi

I'm very new to KNIME and just about managing to get my head around it. So far its ease of use is great.

I am using the Titanic Kaggle competition as my data set to learn with, and have come to the point where I'm trying to implement my method for replacing missing age data.

I have a column called new.title, which contains the values Mr, Master, Mrs and Miss.

What I want to do is this: where there is a missing cell in age, look at the new.title cell for that row and determine its value. If it is Master, the missing value should be replaced with the mean age of all other rows where new.title = Master (and likewise for the other titles). E.g.:

row id   age   new.title
1        17    Master
2        2     Master
3        56    Mrs
4        ?     Master

The mean of all the Masters (17 and 2) is 9.5, so I want the age in row 4 to be replaced by 9.5.
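As a rough sketch of the intended logic outside KNIME, the calculation could look something like this in plain Python. It is not KNIME syntax; the function name `fill_missing_age_by_title` and the dictionary keys are made up for illustration, and `None` is assumed to stand in for a missing cell.

```python
from collections import defaultdict
from statistics import mean

# Sample rows from the example above; None stands for a missing age.
rows = [
    {"row_id": 1, "age": 17, "new.title": "Master"},
    {"row_id": 2, "age": 2, "new.title": "Master"},
    {"row_id": 3, "age": 56, "new.title": "Mrs"},
    {"row_id": 4, "age": None, "new.title": "Master"},
]


def fill_missing_age_by_title(rows):
    """Return copies of the rows with missing ages set to the per-title mean."""
    # Collect the known ages for each title.
    ages_by_title = defaultdict(list)
    for row in rows:
        if row["age"] is not None:
            ages_by_title[row["new.title"]].append(row["age"])

    # Mean age per title, computed from the non-missing values only.
    title_means = {title: mean(ages) for title, ages in ages_by_title.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        if new_row["age"] is None:
            # Stays None if no row with this title has a known age.
            new_row["age"] = title_means.get(new_row["new.title"])
        filled.append(new_row)
    return filled


filled = fill_missing_age_by_title(rows)
print(filled[3]["age"])  # 9.5, i.e. (17 + 2) / 2
```

The point of the sketch is that the means are computed once per title and then looked up for each missing row, so no splitting into separate files is needed.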

I have managed to do this in a very long-winded way: CSV Reader -> Row Splitter (new.title = Master) -> CSV Writer -> Row Splitter (new.title = Mr) -> CSV Writer -> Row Splitter (new.title = Miss) -> CSV Writer -> Row Splitter (new.title = Mrs) -> CSV Writer.

This gives me 4 CSVs, one per new.title value. I can then do the mean calculation and replace the missing data in each, and import the 4 CSVs back into one data set. It just seems a little long-winded to me.

I believe I should be able to achieve this in a better way, maybe using the Rule Engine? I had a look at it but I'm not sure about the syntax.

Any thoughts out there please on how I can do this more efficiently?

TIA
Jade

