Web Scraping

WEB SCRAPING WITH SCRAPY AND VISUALIZING THE MATERIAL WITH WORDCLOUD
1. PROJECT PURPOSE AND STEPS
In this project, we will use Scrapy's CrawlerProcess to collect the articles an author has published on a website, then turn the text into a visual with WordCloud. We will do this in 3 stages:
- We will collect the addresses of the articles written by the author.
- Using these addresses, we will scrape the articles written by the author.
- We will visualize the pool of words from the articles with WordCloud.
2. MODULES WE NEED
We start the project by loading the modules we need: scrapy, wordcloud, pandas, and matplotlib.

3. COLLECTING THE ARTICLE ADDRESSES OF THE WRITER
We create an author class with three functions, then start the crawl:
- The first function, “start_requests”, specifies the URL address we scrape.
- The second function, “parse”, uses the XPath to collect the addresses of the author’s articles from that page into articleurls.
- The third function, “parse_pages”, saves the text of each article to the list allarticles.
- We start the crawler process on lines 30-31-32 of the script.

We scrape the article URLs with XPath, which we find by inspecting the page at the URL address.
('//h2[@class="title"]/a/@href')

We also use XPath to collect the article text found at each of the article URLs.
('//div[@class="post-content"]//p/text()')

5. WORDCLOUD PROCESSING
When using the word_cloud method, we remove English stopwords (very common words such as “the” and “and”) from the “allarticles” list so that we get a cleaner result.


As far as we can tell from the word cloud, Darren is a writer who explains how to make money by blogging. Words that appear more often in the articles are drawn larger in the word cloud.