I would be happy if you could help me out. I want to examine how preferences among Amazon's best-selling books change over time. The page "https://www.amazon.com/gp/bestsellers/2017/books/" shows a different set of best-sellers for each year. On each book's page, you can check which category the book falls into. The first book, for example, is a “Children’s Book”, the second best-seller is “Literature & Fiction”, and so forth. In the end, I want to count and visualize the changes in each category and derive a hypothesis. Now, regarding the web-mining code: I don’t know how to go to each link. (You can’t simply change the composition of the URL, can you?) For example, this is the link to the first book:
https://www.amazon.com/Wonder-R-J-Palacio/dp/0375869026/ref=zg_bsar_books_1?_encoding=UTF8&psc=1&refRID=WHP2CV9Z86NK5VYK3W27
-> Nothing in the link obviously indicates that the book is ranked first. So it probably makes more sense to extract the links via XPath.
-> This needs to be done for the next 80 items and for the years 2013-2017. (The page only shows 20 best-sellers of a given year at a time.) How can I implement this with a while loop?
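A rough sketch of how the while loop could look in Python, under assumptions that are not verified here: that the remaining pages of a year are reachable through a `pg` query parameter, that five pages of 20 items cover the 100 best-sellers, and that book links can be recognized by `/dp/` in their `href`. The names `listing_urls` and `product_links` are hypothetical helpers, and the actual download step (for example with `requests`) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.amazon.com/gp/bestsellers/{year}/books/"


def listing_urls(first_year=2013, last_year=2017, pages_per_year=5):
    """Build (year, page, url) tuples for every listing page."""
    urls = []
    year = first_year
    while year <= last_year:
        page = 1
        while page <= pages_per_year:
            # ASSUMPTION: pagination is controlled by a `pg` query parameter.
            urls.append((year, page, BASE.format(year=year) + f"?pg={page}"))
            page += 1
        year += 1
    return urls


class ProductLinkParser(HTMLParser):
    """Collect links that look like book pages, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # ASSUMPTION: book links contain "/dp/" followed by the product ID.
        if href and "/dp/" in href:
            url = urljoin(self.base_url, href)
            if url not in self.links:  # skip repeated links to the same book
                self.links.append(url)


def product_links(html, base_url):
    """Return the book links found in one listing page's HTML."""
    parser = ProductLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

If the links come back in the order they appear on the page, the rank could be derived as `(page - 1) * 20 + position`, so the rank would not need to be read from the link itself.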
So, the code should go to each of the links and extract basically everything:
Here's what I have started: