Recursive case class to DataFrame
How do you make a DataFrame from a case class that references itself? Take the following:

    case class TestCase(id: Long, parent: Option[TestCase])

If I do:

    import sqlContext.implicits._
    val testCases = Seq(TestCase(1L, None), TestCase(2L, Some(TestCase(1L, None)))).toDF

it throws a big ol' ScalaReflection error. Of course, I could do:

    case class TestCase(id: Long, parentId: Option[Long])

but that's not what I want.
Incidentally, Avro has no issue encoding and decoding recursive schemas, so I don't think I'm asking for the impossible. This seems like a normal use case for handling parent-child relationships.
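For reference, here is a minimal sketch of what Avro accepts; the record and field names simply mirror the case class, and the point is only that Avro resolves the recursive reference by name:

    import org.apache.avro.Schema

    // Avro lets a field reference its enclosing record type by name,
    // so this recursive schema parses without complaint.
    val avroSchema = new Schema.Parser().parse("""
      {
        "type": "record",
        "name": "TestCase",
        "fields": [
          {"name": "id", "type": "long"},
          {"name": "parent", "type": ["null", "TestCase"], "default": null}
        ]
      }
    """)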
Update
I can manually create a schema, but as far as I can tell I have to hard-code how far the chain can go by repeatedly nesting StructTypes, like this:

    val schema = StructType(Array(
      StructField("id", LongType, false),
      StructField("parent", StructType(Array(
        StructField("id", LongType, false),
        StructField("parent", StructType(Array(
          StructField("id", LongType, false),
          StructField("parent", NullType)
        )))
      )))
    ))

Note the last parent in the chain: its type is NullType. With the above schema, the following all work:
df.select($"parent") df.select($"parent.parent") df.select($"parent.parent.parent") based on schema above, first 2 might return null, or parent. third 1 returns null.
Interestingly enough, to create the Row objects for the DataFrame, I have to do:

    val testCaseSeq = Seq[TestCase](...)
    val df = sqlContext.createDataFrame(
      sc.parallelize(testCaseSeq.map(tc => Row(tc.id, tc.parent))),
      schema
    )

I guess this more or less works, but I have to figure out in advance how many levels of parent-child hierarchy to support, which kind of sucks. Can I do better than this?
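If Spark's implicit conversion of the Option/case-class values ever balks, one workaround sketch is to flatten to nested Rows explicitly; toRow here is a hypothetical helper, not a Spark API:

    import org.apache.spark.sql.Row

    // Recursively converts a TestCase (and its chain of parents) into
    // nested Rows, with None becoming null to match the nullable field.
    def toRow(tc: TestCase): Row =
      Row(tc.id, tc.parent.map(toRow).orNull)

Rows produced this way line up with the hand-built schema, though the depth is still bounded by whatever the schema allows.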