Media Bias Analysis in Most Popular Indonesian Media
The bias analysis process is applied to articles relevant to the keyword "Presidential Candidate" on each designated news website.
Bias analysis is used to understand the media's bias toward presidential candidates. It is useful for observing whether the media favors a particular presidential candidate or presents information objectively.
Bias analysis is generally similar to sentiment analysis, as it falls within the realm of text mining and can analyze opinions, attitudes, judgments, and emotions of writers in news articles.
Additionally, bias analysis uses the Newspaper3k and BeautifulSoup libraries for web scraping, as well as TextBlob for text analysis.
The difference between bias analysis and sentiment analysis lies in the metrics used. Bias analysis employs the metrics "polarity" and "subjectivity" to determine bias categories.
On the other hand, sentiment analysis only uses polarity to determine positive, negative, or neutral sentiment. The polarity metric measures whether the text expresses happiness, disappointment, or neutrality towards an object in the news.
Polarity scores for each text can vary depending on user needs. In analyzing news, we have observed several samples of sentiment analysis results and further examined whether they align with the given polarity scores.
In our news analysis, positive polarity covers scores from 0.33 to 1, negative polarity covers scores from -1 to -0.33, and neutral polarity covers scores from -0.32 to 0.32.
In addition, subjectivity refers to the degree of emotional involvement or personal opinion in the text. Subjective news typically involves feelings, opinions, or individual judgments about a subject, while objective news focuses on facts or information without including emotions.
Subjectivity has four categories: highly objective with scores from 0 to 0.30, fairly objective with scores from 0.31 to 0.50, fairly subjective with scores from 0.51 to 0.75, and highly subjective with scores from 0.76 to 1.
Bias in news articles consists of three categories: objective, positively subjective (favoring positive), and negatively subjective (favoring negative).
In categorizing bias, we first look at the subjectivity score. If the news article falls under the fairly subjective or highly subjective categories, then the media is biased toward the subject being covered. If the news article falls under the fairly objective or highly objective categories, then it is categorized as objective.
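The threshold rules above can be sketched as a few small Python functions. This is a minimal sketch using the cutoff values stated in the text; the function names are illustrative, and the handling of a neutral polarity score inside a subjective article (treated here as "positively subjective") is an assumption, since the text does not specify it.

```python
# Categorization rules from the article; threshold values are taken
# directly from the text, function names are illustrative.

def polarity_category(polarity: float) -> str:
    """Map a polarity score (-1..1) to positive/negative/neutral."""
    if polarity >= 0.33:
        return "positive"
    if polarity <= -0.33:
        return "negative"
    return "neutral"

def subjectivity_category(subjectivity: float) -> str:
    """Map a subjectivity score (0..1) to one of four categories."""
    if subjectivity <= 0.30:
        return "highly objective"
    if subjectivity <= 0.50:
        return "fairly objective"
    if subjectivity <= 0.75:
        return "fairly subjective"
    return "highly subjective"

def bias_category(polarity: float, subjectivity: float) -> str:
    """Combine both scores into the three bias categories."""
    if subjectivity <= 0.50:  # fairly or highly objective
        return "objective"
    # Subjective article: direction of bias follows polarity.
    # Assumption: neutral polarity is grouped with "positively subjective".
    if polarity >= 0:
        return "positively subjective"
    return "negatively subjective"
```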
RSS Feeds and Google News in Data Retrieval
The initial step in conducting bias analysis begins with data retrieval from websites, similar to sentiment analysis.
Data retrieval is done using RSS (Really Simple Syndication) and Google News to easily gather current news articles.
In practice, we search Google News and can receive results in the form of an RSS feed by opening the URL and replacing 'news.google.com/' with 'news.google.com/rss/'.
We obtain Google News RSS feed URLs for top news based on topics, geographical locations, and languages. Data retrieval using RSS and Google News facilitates obtaining data more easily and comprehensively.
Unfortunately, each domain provides a maximum of only 100 URLs of data. However, we can work around this limitation by adjusting the timing of data retrieval.
RSS feeds have the advantage of well-structured and formatted article links, making them easier to find and extract compared to regular websites.
Another advantage is that all RSS feeds have the same standard format. Therefore, the same code can often be used to extract article links from multiple RSS feeds.
In the process, we set start and end dates for news articles to work around the limitation of RSS retrieving only 100 URLs per domain.
Additionally, if we want to retrieve news related to the presidential candidate "Prabowo Subianto," we would first specify an RSS feed with a URL containing the keyword "Prabowo Subianto."
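A feed URL like this can be assembled in a few lines. This is a hedged sketch: the `site:`, `after:`, and `before:` query operators and the Indonesian locale parameters (`hl`, `gl`, `ceid`) are assumptions based on common Google News RSS usage, and the site and dates are illustrative.

```python
# Sketch: build a Google News RSS search URL for one keyword, one news
# site, and one date window. Query operators and locale parameters are
# assumptions; the site and dates are illustrative.
from urllib.parse import quote

def google_news_rss_url(keyword: str, site: str,
                        date1: str, date2: str) -> str:
    query = f"{keyword} site:{site} after:{date1} before:{date2}"
    return ("https://news.google.com/rss/search?q=" + quote(query)
            + "&hl=id&gl=ID&ceid=ID:id")

url = google_news_rss_url("Prabowo Subianto", "kompas.com",
                          "2023-10-01", "2023-10-07")
```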
Web Scraping with Newspaper3k and BeautifulSoup
Web scraping is the process of automatically and systematically extracting data or information from web pages. This technique is used to gather data from various online sources such as websites, forums, blogs, or social media platforms.
Web scraping involves extracting text, images, links, tables, and other elements from web pages. The web scraping process utilizes Python libraries, including Newspaper3k and BeautifulSoup.
Newspaper3k is used to perform web scraping on news articles. This library utilizes the requests library and has BeautifulSoup as a dependency, while also performing parsing using lxml.
Newspaper3k not only extracts text data from articles but also other types of data such as publication dates, authors, URLs, images, and videos. Another reason for using Newspaper3k is its ability to understand article content without actually reading the entire article.
Newspaper3k can also perform more advanced functions, such as discovering RSS feeds and retrieving article URLs from primary news sources using the requests library.
In the process, we import the Article object from the Newspaper3k library and then extract its information. Additionally, we use the nlp() function to process keywords from the article using Natural Language Processing (NLP) and summarize the article.
In this context, we also include scraping code within a try/except block to handle the presence of potentially faulty article URLs that could disrupt the program.
The exception raised by Newspaper3k is 'ArticleException', which is caught in the 'except' block; 'ArticleException' itself is imported in the import section at the top of the script.
Bias Analysis with TextBlob and NLTK
After conducting web scraping using Newspaper3k and BeautifulSoup, we use the TextBlob and NLTK libraries to process and analyze the text of the retrieved news articles.
Bias analysis with TextBlob is part of the Natural Language Processing (NLP) process, which is used to understand, manipulate, and analyze human language by computers.
The NLP process involves a combination of linguistic techniques, statistics, and machine learning to achieve an accurate understanding and analysis of text and human language.
Traditionally, each NLP step was performed separately: Tokenization, Text Cleaning, Stopword Removal, Stemming and Lemmatization, Entity Recognition, Part-of-Speech (POS) Analysis, and so on.
However, the TextBlob library provides various text processing features, such as language detection, tokenization, phrase modeling, text analysis, part-of-speech analysis (POS), and other features within a single library, making it easy for users to perform these tasks automatically.
As a result, TextBlob can be used to analyze the bias of a text, calculate word frequencies, perform part-of-speech analysis to identify word classes in sentences, and perform spell-checking.
TextBlob analyzes bias by calculating average scores for various types of words in the text, then assigning polarity and subjectivity scores to the text.
Bias Analysis Process on News Articles in Sequential Steps
Several steps need to be taken to obtain the final results of bias analysis along with the categorization of news articles as objective, positively subjective (favoring positive), or negatively subjective (favoring negative).
This is done through the following steps:
- Install the newspaper3k, BeautifulSoup, and TextBlob libraries if not already available in the Python environment.
- Import the required libraries: requests; Article and ArticleException from newspaper3k; TextBlob; BeautifulSoup; and NLTK. NLTK is installed automatically as a dependency of TextBlob.
- Download the 'punkt' package to enable tokenization in TextBlob.
- Define the time range for retrieving news articles to be processed using RSS feeds. The dateutil.rrule library is used to set the appropriate time range.
- Use the zip() function to pair the appropriate start and end dates into tuples and place these pairs into a larger list with the 'datetime' format.
- Create a list of news websites to be scraped. Multiple news websites can be specified for bias analysis.
- Use a loop to iterate through each news website and the designated date ranges.
- Formulate Google News URLs with RSS feeds and search keywords corresponding to the name of the presidential candidate. In the URL formation, we set the date range (date1 and date2) and the previously noted news websites.
- Send an HTTP request to the URL and retrieve the results from the Google News RSS feed.
- Parse the RSS feed using BeautifulSoup to obtain all 'item' elements representing articles.
- Initiate a loop through the articles to retrieve the URL of each article and store them in an article list.
- For each article link, the code performs the following steps:
- Use the Newspaper3k library to download, parse, and analyze the article text.
- Check if the initial keyword for the presidential candidate is present in the article text.
- Perform bias analysis on the article text using TextBlob.
- The results of bias analysis using TextBlob, in the form of polarity and subjectivity, are stored as tuples and added to the data list.
- Categorize each article as objective, positively subjective (favoring positive), or negatively subjective (favoring negative), based on the polarity and subjectivity scores from TextBlob.
- Store the final results of the bias analysis and the media bias categories in a CSV file.
The purpose of this process is to gather news articles from designated news websites, conduct bias analysis on articles relevant to the keyword "Presidential Candidate," and store the results in a CSV file containing various information related to bias.
Article Source
- Newspaper3k, https://pypi.org/project/newspaper3k/
- TextBlob: Simplified Text Processing, https://textblob.readthedocs.io/en/dev/
- Scraping websites with Newspaper3k in Python, https://www.geeksforgeeks.org/scraping-websites-with-newspaper3k-in-python/