python - Check type: how to check if something is an RDD or a DataFrame?
I'm using Python with Spark RDDs and DataFrames.
I tried isinstance(thing, RDD), but RDD wasn't recognized.
The reason I need this:
I'm writing a function that both RDDs and DataFrames can be passed into, so I'll need to call input.rdd to get the underlying RDD if a DataFrame is passed in.
isinstance will work just fine, as long as the classes are imported first:
    from pyspark.sql import DataFrame
    from pyspark.rdd import RDD

    def foo(x):
        # Check against the imported classes; the names are case-sensitive.
        if isinstance(x, RDD):
            return "RDD"
        if isinstance(x, DataFrame):
            return "DataFrame"

    foo(sc.parallelize([]))  ## 'RDD'
    foo(sc.parallelize([("foo", 1)]).toDF())  ## 'DataFrame'
But single dispatch is a more elegant approach:
    from functools import singledispatch

    # RDD and DataFrame are the classes imported in the isinstance example.
    @singledispatch
    def bar(x):
        pass

    @bar.register(RDD)
    def _(arg):
        return "RDD"

    @bar.register(DataFrame)
    def _(arg):
        return "DataFrame"

    bar(sc.parallelize([]))  ## 'RDD'
    bar(sc.parallelize([("foo", 1)]).toDF())  ## 'DataFrame'
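One detail of the single-dispatch version is worth spelling out: a type with no registered implementation falls through to the base function, which returns None silently when its body is just pass. Here is a minimal sketch of a stricter variant, assuming you would rather fail loudly. FakeRDD, FakeDataFrame and PipelinedFakeRDD are hypothetical stand-ins for the real PySpark classes so the snippet runs without Spark installed, and describe is an illustrative name.

```python
from functools import singledispatch


# Hypothetical stand-ins for pyspark.rdd.RDD and pyspark.sql.DataFrame,
# used only so this sketch runs without Spark installed.
class FakeRDD:
    pass


class FakeDataFrame:
    pass


@singledispatch
def describe(x):
    # Fallback for any type that has no registered implementation.
    raise TypeError(f"unsupported type: {type(x).__name__}")


@describe.register(FakeRDD)
def _(arg):
    return "RDD"


@describe.register(FakeDataFrame)
def _(arg):
    return "DataFrame"


class PipelinedFakeRDD(FakeRDD):
    # Subclasses dispatch to the closest registered base class.
    pass
```

Because dispatch follows the class hierarchy, registering the base class is enough to cover its subclasses. That matters in PySpark, where transformations return PipelinedRDD, a subclass of RDD.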
If you don't mind additional dependencies, multipledispatch is also an interesting option:
    from multipledispatch import dispatch

    @dispatch(RDD)
    def baz(x):
        return "RDD"

    @dispatch(DataFrame)
    def baz(x):
        return "DataFrame"

    baz(sc.parallelize([]))  ## 'RDD'
    baz(sc.parallelize([("foo", 1)]).toDF())  ## 'DataFrame'
Finally, the most Pythonic approach is to simply check the interface:
    def foobar(x):
        if hasattr(x, "rdd"):
            ## DataFrame: hand back the underlying RDD
            return x.rdd
        else:
            ## (probably) RDD
            return x
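Tying this back to the question, a duck-typed helper that always hands back an RDD might look like the sketch below. to_rdd is a hypothetical name, and the SimpleNamespace objects merely imitate a DataFrame (which has an rdd attribute) and an RDD (which does not) so the snippet runs without Spark.

```python
from types import SimpleNamespace


def to_rdd(x):
    """Return the underlying RDD for DataFrame-like input, else x itself."""
    # A DataFrame exposes its underlying RDD through the `rdd` attribute;
    # anything without that attribute is assumed to already be an RDD.
    return getattr(x, "rdd", x)


# Stand-ins for illustration only, not real Spark objects.
fake_rdd = SimpleNamespace(name="rdd")
fake_df = SimpleNamespace(rdd=fake_rdd)
```

With real objects, to_rdd(df) would give the same result as df.rdd, and to_rdd(rdd) would return rdd unchanged. The usual duck-typing caveat applies: any unrelated object that happens to have an rdd attribute is treated as a DataFrame.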