Recursive case class to DataFrame
How do you make a DataFrame from a case class that references itself? Take the following:

    case class TestCase(id: Long, parent: Option[TestCase])

If I do:

    import sqlContext.implicits._
    val testCases = Seq(TestCase(1L, None), TestCase(2L, Some(TestCase(1L, None)))).toDF

it throws a big ol' ScalaReflection error. Of course, I could do:

    case class TestCase(id: Long, parentId: Option[Long])

but that's not what I want.
Incidentally, Avro has no issue encoding and decoding recursive schemas, so I don't think I'm asking for the impossible. This seems like a normal use case for handling parent-child relationships.
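For reference, here is a minimal sketch of what Avro accepts; the record and field names simply mirror the case class, and the point is only that Avro resolves the recursive reference by name:

    import org.apache.avro.Schema

    // Avro lets a field reference its enclosing record type by name,
    // so this recursive schema parses without complaint.
    val avroSchema = new Schema.Parser().parse("""
      {
        "type": "record",
        "name": "TestCase",
        "fields": [
          {"name": "id", "type": "long"},
          {"name": "parent", "type": ["null", "TestCase"], "default": null}
        ]
      }
    """)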
Update
I can manually create a schema, but as far as I can tell I have to hard-code how far the chain can go by repeatedly nesting StructTypes, like this:

    val schema = StructType(Array(
      StructField("id", LongType, false),
      StructField("parent", StructType(Array(
        StructField("id", LongType, false),
        StructField("parent", StructType(Array(
          StructField("id", LongType, false),
          StructField("parent", NullType)
        )))
      )))
    ))

Note the last parent in the chain: its type is NullType. With the above schema, the following all work:
df.select($"parent") df.select($"parent.parent") df.select($"parent.parent.parent") based on schema above, first 2 might return null, or parent. third 1 returns null.
Interestingly enough, to create the Row objects for the DataFrame, I have to do:

    val testCaseSeq = Seq[TestCase](...)
    val df = sqlContext.createDataFrame(
      sc.parallelize(testCaseSeq.map(tc => Row(tc.id, tc.parent))),
      schema
    )

I guess this more or less works, but I have to figure out in advance how many levels of parent-child hierarchy to support, which kind of sucks. Can I do better than this?
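If Spark's implicit conversion of the Option/case-class values ever balks, one workaround sketch is to flatten to nested Rows explicitly; toRow here is a hypothetical helper, not a Spark API:

    import org.apache.spark.sql.Row

    // Recursively converts a TestCase (and its chain of parents) into
    // nested Rows, with None becoming null to match the nullable field.
    def toRow(tc: TestCase): Row =
      Row(tc.id, tc.parent.map(toRow).orNull)

Rows produced this way line up with the hand-built schema, though the depth is still bounded by whatever the schema allows.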