Data Extraction - Get Policy

For more information, see this repository on my GitHub.

I developed this web scraper to extract information from a policy announcement website. The program navigates to each sub-item on the site and saves the data it extracts. A complete run saves up to 7,000 records and takes about 85 minutes (based on early 2021 statistics). It is written in Python and stores its results in a SQLite database, automating the whole extraction process.
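The navigation step can be illustrated with a minimal sketch: parse a listing page, collect the links that point to individual announcements, and visit each one. This is not the program's actual code. The site URL, the `/policy/` path fragment used to recognize announcement links, and the helper names `fetch_html` and `collect_item_links` are hypothetical, and only the Python standard library is used.

```python
from __future__ import annotations

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs of <a> tags whose href contains a path fragment."""

    def __init__(self, base_url: str, path_fragment: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.path_fragment = path_fragment  # hypothetical marker for announcement pages
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.path_fragment in href:
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:  # keep first occurrence, skip duplicates
                self.links.append(absolute)


def collect_item_links(listing_html: str, base_url: str, path_fragment: str) -> list[str]:
    """Return the sub-item (announcement) links found on one listing page."""
    parser = LinkCollector(base_url, path_fragment)
    parser.feed(listing_html)
    parser.close()
    return parser.links


def fetch_html(url: str) -> str:
    """Download a page and decode it using the charset the server declares."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `collect_item_links(fetch_html(listing_url), listing_url, "/policy/")` would return the announcement URLs on one listing page, which the scraper would then fetch one by one.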

When the program runs, it creates a SQLite database and stores each article with the following fields: id, title, post date, abstract, URL, body, and attachments. If an article contains attachments, the program also downloads them automatically.
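The storage and attachment steps can be sketched as follows, assuming one table whose columns mirror the fields listed above. The table name `policy`, the column names, the set of file extensions treated as attachments, and storing attachment URLs as newline-separated text are all hypothetical choices for illustration, not the original program's schema. The `UNIQUE` constraint on `url` lets a re-run skip articles that were already saved.

```python
import shutil
import sqlite3
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Hypothetical schema mirroring the fields: id, title, post date, abstract, URL, body, attachments.
SCHEMA = """
CREATE TABLE IF NOT EXISTS policy (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    post_date   TEXT,
    abstract    TEXT,
    url         TEXT UNIQUE NOT NULL,
    body        TEXT,
    attachments TEXT
)
"""

# Assumption: these extensions mark a link as a downloadable attachment.
ATTACHMENT_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".zip"}


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def attachment_links(links):
    """Keep only links whose URL path ends in a known document extension."""
    return [
        u for u in links
        if PurePosixPath(urlparse(u).path).suffix.lower() in ATTACHMENT_EXTENSIONS
    ]


def save_article(conn, title, post_date, abstract, url, body, attachments):
    """Insert one article; return False if its URL was already stored."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO policy "
            "(title, post_date, abstract, url, body, attachments) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (title, post_date, abstract, url, body, "\n".join(attachments)),
        )
    return cur.rowcount == 1


def download_attachment(url: str, dest_dir: str) -> Path:
    """Stream one attachment to dest_dir, named after the last URL path segment."""
    name = PurePosixPath(urlparse(url).path).name or "attachment"
    dest = Path(dest_dir) / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=30) as resp, open(dest, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return dest
```

In this sketch, a caller would run `attachment_links` on the links found in an article body, pass the result to `save_article`, and call `download_attachment` for each URL only when `save_article` returns `True`, so attachments are not downloaded twice.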

Here is a demo of the program running:

[Demo image]