I would be happy if you could help me out.
I want to examine how preferences among Amazon's best-selling books change over time. At "https://www.amazon.com/gp/bestsellers/2017/books/" you can see that there is a different list of best-sellers for every year. By following the link of each book, you can check which category the book falls into. The first book, for example, is a “Children’s Book”, the second best-seller is “Literature & Fiction”, and so forth. In the end, I want to count and visualize the changes in each category and derive a hypothesis. But now, regarding the web-mining code:
I don’t know how to go to each link. (You can’t simply change part of the URL, can you?)
-> Nothing in a book's link shows that it is ranked first, second, and so on.
So it probably makes more sense to select the links via XPath.
-> This needs to be repeated for the remaining 80 items and for the years 2013-2017.
(The page only shows 20 best-sellers of a given year at once.)
How can I implement this with a while loop?
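To make the loop question concrete, here is a rough sketch of the iteration I have in mind. It only builds the list of pages to visit, 5 pages of 20 items for each year from 2013 to 2017. The "?pg=" page parameter and the helper name build_url are assumptions of mine; I have not verified that Amazon exposes the pages this way.

```r
years <- 2013:2017
pages_per_year <- 5  # 100 best-sellers per year, 20 shown per page

# Hypothetical helper: the "?pg=" parameter is an unverified assumption
build_url <- function(year, page) {
  paste0("https://www.amazon.com/gp/bestsellers/", year, "/books/?pg=", page)
}

urls <- character(0)
i <- 1
while (i <= length(years)) {
  page <- 1
  while (page <= pages_per_year) {
    # Collect one URL per (year, page) combination
    urls <- c(urls, build_url(years[i], page))
    page <- page + 1
  }
  i <- i + 1
}
```

Each entry of urls could then be read and parsed in turn, if the pages are reachable like this at all.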
So, the code should visit each of the links and extract basically everything, in particular the category.
Here's what I have so far:
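library(rvest)
amazon_link <- read_html("https://www.amazon.com/gp/bestsellers/2017/books/")
# Pass the parsed page as first argument; single quotes keep the inner double quotes intact
amazon_title <- html_nodes(amazon_link, xpath = '//*[@id="zg_centerListWrapper"]/div/div/div/a/div')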
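What I think I need next is the href of each item rather than only its title, so that every book page can be visited afterwards. Below is a minimal sketch on a made-up HTML fragment that only imitates the nesting of my XPath; the fragment, the "/dp/EXAMPLE" paths and the variable names are hypothetical, not real Amazon markup.

```r
library(rvest)

# Made-up fragment that imitates the list structure (not real Amazon markup)
sample_page <- read_html('<div id="zg_centerListWrapper">
  <div><div><div><a href="/dp/EXAMPLE1"><div>First title</div></a></div></div></div>
  <div><div><div><a href="/dp/EXAMPLE2"><div>Second title</div></a></div></div></div>
</div>')

# Select the <a> nodes instead of the inner <div>, so the href is available
link_nodes <- html_nodes(sample_page, xpath = '//*[@id="zg_centerListWrapper"]/div/div/div/a')

# Relative links need the site prefix before they can be requested
book_urls <- paste0("https://www.amazon.com", html_attr(link_nodes, "href"))
book_titles <- html_text(link_nodes, trim = TRUE)
```

Each element of book_urls could then be passed to read_html() to look up the category on the book's own page.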