python - Check Type: How to check if something is an RDD or a DataFrame?


I'm using Python with Spark RDDs/DataFrames.

I tried isinstance(thing, RDD), but RDD wasn't recognized.

The reason I need this:

I'm writing a function that accepts either an RDD or a DataFrame, so if a DataFrame is passed in I'll need to use input.rdd to get the underlying RDD.

isinstance will work just fine, as long as you import the classes first:

    from pyspark.sql import DataFrame
    from pyspark.rdd import RDD

    def foo(x):
        if isinstance(x, RDD):
            return "RDD"
        if isinstance(x, DataFrame):
            return "DataFrame"

    # sc is the SparkContext provided by the pyspark shell
    foo(sc.parallelize([]))
    ## 'RDD'
    foo(sc.parallelize([("foo", 1)]).toDF())
    ## 'DataFrame'
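Since the goal is to get at the underlying RDD, the isinstance check can be folded into a small normalizing helper. This is a minimal sketch under stated assumptions: FakeRDD and FakeDataFrame are hypothetical stand-ins for pyspark.rdd.RDD and pyspark.sql.DataFrame so it runs without Spark, and to_rdd is a hypothetical name. Unlike foo, which returns None for unrecognized input, it raises TypeError.

```python
# Hypothetical stand-ins so the sketch runs without Spark; in real code,
# import RDD from pyspark.rdd and DataFrame from pyspark.sql instead.
class FakeRDD:
    pass


class FakeDataFrame:
    def __init__(self, rdd):
        self.rdd = rdd  # mimics the DataFrame.rdd attribute


def to_rdd(x):
    """Return an RDD for either an RDD or a DataFrame input."""
    if isinstance(x, FakeRDD):
        return x
    if isinstance(x, FakeDataFrame):
        return x.rdd
    # Fail loudly instead of falling through and returning None.
    raise TypeError(f"expected an RDD or DataFrame, got {type(x).__name__}")


rdd = FakeRDD()
df = FakeDataFrame(rdd)
print(to_rdd(rdd) is rdd)  # True
print(to_rdd(df) is rdd)   # True
```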

But single dispatch is a more elegant approach:

    from functools import singledispatch
    from pyspark.sql import DataFrame
    from pyspark.rdd import RDD

    @singledispatch
    def bar(x):
        pass

    @bar.register(RDD)
    def _(arg):
        return "RDD"

    @bar.register(DataFrame)
    def _(arg):
        return "DataFrame"

    bar(sc.parallelize([]))
    ## 'RDD'

    bar(sc.parallelize([("foo", 1)]).toDF())
    ## 'DataFrame'
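The same idea applied to the original use case might look like the sketch below. As assumptions: FakeRDD and FakeDataFrame are hypothetical stand-ins for the PySpark classes, and to_rdd is a hypothetical name. The base function raises TypeError rather than silently returning None, and the implementations are registered through type annotations, which register() accepts on Python 3.7+. Because dispatch follows the MRO, a subclass of a registered class is routed to that class's implementation.

```python
from functools import singledispatch


# Hypothetical stand-ins for pyspark.rdd.RDD and pyspark.sql.DataFrame.
class FakeRDD:
    pass


class FakeDataFrame:
    def __init__(self, rdd):
        self.rdd = rdd  # mimics the DataFrame.rdd attribute


@singledispatch
def to_rdd(x):
    # Fallback used for any type without a registered implementation.
    raise TypeError(f"expected an RDD or DataFrame, got {type(x).__name__}")


@to_rdd.register
def _(x: FakeRDD):
    # Already an RDD: return it unchanged.
    return x


@to_rdd.register
def _(x: FakeDataFrame):
    # DataFrame: unwrap the underlying RDD.
    return x.rdd
```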

If you don't mind an additional dependency, multipledispatch is also an interesting option:

    from multipledispatch import dispatch
    from pyspark.sql import DataFrame
    from pyspark.rdd import RDD

    @dispatch(RDD)
    def baz(x):
        return "RDD"

    @dispatch(DataFrame)
    def baz(x):
        return "DataFrame"

    baz(sc.parallelize([]))
    ## 'RDD'

    baz(sc.parallelize([("foo", 1)]).toDF())
    ## 'DataFrame'

Finally, the most Pythonic approach is to check the interface:

    def foobar(x):
        if hasattr(x, "rdd"):
            ## It's a DataFrame
            pass
        else:
            ## It's (probably) an RDD
            pass
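For the use case in the question, this interface check collapses to a one-liner. A minimal sketch, where to_rdd is a hypothetical name and a SimpleNamespace and a bare object() are hypothetical stand-ins for a DataFrame and an RDD so the code runs without Spark:

```python
from types import SimpleNamespace


def to_rdd(x):
    """Unwrap x.rdd when present; otherwise assume x is already an RDD."""
    # getattr with a default only swallows AttributeError, like hasattr().
    return getattr(x, "rdd", x)


# Hypothetical stand-ins so the sketch runs without Spark:
rdd = object()                 # plays the role of an RDD (no .rdd attribute)
df = SimpleNamespace(rdd=rdd)  # plays the role of a DataFrame (has .rdd)

print(to_rdd(df) is rdd)   # True
print(to_rdd(rdd) is rdd)  # True
```

The trade-off is the "(probably)" in the comment above: anything without an rdd attribute passes through unchanged (to_rdd(5) simply returns 5), so a wrong input type surfaces later rather than at the call site.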
