Identify the correct CSS selector of a URL for an R script
I am trying to obtain data from a website with the help of the following script:
    require(httr)
    require(rvest)
    res <- httr::POST(url = "http://apps.kew.org/wcsp/advsearch.do",
                      body = list(page = "advancedsearch",
                                  attachmentexist = "",
                                  family = "",
                                  placeofpub = "",
                                  genus = "arctodupontia",
                                  yearpublished = "",
                                  species = "scleroclada",
                                  author = "",
                                  infrarank = "",
                                  infraepithet = "",
                                  selectedlevel = "cont"),
                      encode = "form")
    pg <- content(res, as = "parsed")
    lnks <- html_attr(html_node(pg, "td"), "href")
However, in some cases, such as the example above, it does not retrieve the right link because, for some reason, html_attr does not find any URLs ("href") within the node detected by html_node. So far, I have tried different CSS selectors, such as "td", "a.onwardnav" and ".plantname", but none of them generates an object that html_attr can handle correctly. Any hints?
You are close to getting the answer you were expecting. If you want to pull the links off of the desired page, then:
lnks <- html_attr(html_nodes(pg,"a"), "href")
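will return a character vector of all of the links on the page, one for each "a" tag that has an "href" attribute. Notice that the command is html_nodes, not html_node: there are multiple "a" tags, hence the plural. Your original call returned nothing useful because html_node(pg, "td") selects only the first "td" element, and "href" is an attribute of the "a" tag, not of "td", so html_attr returns NA.
If you are looking for the information from the table in the body of the page, try this: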
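To make the difference concrete, here is a minimal sketch that runs against a small inline HTML fragment rather than the live site. The fragment, its "detail.html?id=..." links and the variable names are made up for illustration; only the "a.onwardnav" selector is taken from the question, and the real page may be structured differently.

```r
library(rvest)

# Hypothetical fragment standing in for the search results page;
# the href values are invented for this example.
html <- '<table>
  <tr><td><a class="onwardnav" href="detail.html?id=1">Name one</a></td></tr>
  <tr><td><a class="onwardnav" href="detail.html?id=2">Name two</a></td></tr>
</table>'
pg <- read_html(html)

# "href" is an attribute of <a>, not of <td>, so this is NA
td_href <- html_attr(html_node(pg, "td"), "href")

# html_node() keeps only the first match, so this is a single link
first_link <- html_attr(html_node(pg, "td a"), "href")

# html_nodes() keeps every match, so this holds both links
all_links <- html_attr(html_nodes(pg, "a.onwardnav"), "href")
```

If only some of the links on the real page are of interest, narrowing the selector (as with "a.onwardnav" here) is usually easier than filtering the full vector afterwards.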
    html_table(pg, fill = TRUE)
    # or
    html_nodes(pg, "tr")
The second line will return a list of the 9 rows of the table, which one can parse to obtain the row names ("th") and/or the row values ("td").
Hope this helps.
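As a rough sketch of that parsing step, the example below pairs each row's "th" with its "td". It assumes every row holds exactly one header cell and one value cell, which may not match the real page. The inline table is a made-up stand-in that reuses the genus and species from the question.

```r
library(rvest)

# Hypothetical two-row table; the real page reportedly has 9 rows.
html <- '<table>
  <tr><th>Genus</th><td>Arctodupontia</td></tr>
  <tr><th>Species</th><td>scleroclada</td></tr>
</table>'
pg <- read_html(html)

rows <- html_nodes(pg, "tr")

# html_node() applied to a node set returns the first match within each row
row_names  <- html_text(html_node(rows, "th"))
row_values <- html_text(html_node(rows, "td"))

# Named character vector: names are the row headers, values the cell text
info <- setNames(row_values, row_names)
```

With this layout, info["Genus"] looks up a value by its row name. Rows without a "th" or "td" would show up as NA and may need to be filtered out first.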