Adv Gen Pract Med
Big data are characterized by the eight Vs[1–5] (i.e., volume, variety, velocity, veracity, value, variability, volatility, and validity) and have shown great potential for forecasting and better decision making. Conventional methods are inadequate for handling such data, so healthcare research requires novel approaches and methods.
Online search queries have become popular in big data analytics for academic research[8–9]. Search traffic data from web-based sources can facilitate a better understanding of Web-based behavior and behavioral changes. Online search traffic data have been deemed a good indicator of internet behavior, while Google Trends acts as a reliable tool for predicting changes in human behavior and as an accurate measure of the public's interest. However, the most popular health behavior topics studied with Google Trends remain unknown. Monitoring web-based activity (or health behavior) can thus be a valid indicator of public behavior.
Healthcare is one of the fields to which big data are most widely applied[14–15], and the number of related publications has increased sharply. Researchers have also examined Web-based search queries for topics related to health behavior and medicine. However, few studies have reported on the most cited articles or on the topics most often examined with Google Trends. Which author affiliations or regions are most prominent worldwide in Google Trends research on health topics also remains to be investigated.
As the use of Google Trends in examining human behavior is relatively novel, new health-related search topics are constantly arising. Many topics have been examined, such as epilepsy[18–19], cancer, thrombosis, silicosis, and various medical procedures including cancer screening examinations[23–24], bariatric surgery, and laser eye surgery. We are thus interested in exploring (1) which topics are most searched in healthcare; (2) which types of MeSH terms most characterize Google Trends research; (3) which articles have been cited most; and (4) which regions, by research affiliation, make the most use of Google Trends worldwide.
2.1 Data Collection
This review aimed to include all articles on health and medicine topics in the literature that have used Google Trends. We searched PubMed for the term "Google Trends" in the article title for publications since 2009. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a total of 86 publications were included in this review.
2.2 Search in PubMed
First, we searched for the keyword "Google Trends" in the "Abstract-Title-Keywords" field for journal articles. The first two articles using Google Trends were published in 2009. The search returned 96 publications.
2.3 Social network analysis and Pajek software
Social network analysis (SNA) was applied to explore the pattern of entities in a system using the Pajek software. In keeping with the Pajek guidelines, we defined an author (or paper keyword) as a node that is connected to other nodes through an edge (i.e., a relation). Usually, the weight between two nodes is defined by the number of connections between them.
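The node and edge definitions above can be illustrated with a minimal sketch. It assumes that two keywords are connected whenever they appear in the same article and that the edge weight is the number of such co-occurrences; the function name and the sample keyword lists are hypothetical and are not taken from the study data or from Pajek.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count how often each unordered keyword pair appears in the same article."""
    weights = Counter()
    for keywords in keyword_lists:
        # De-duplicate and sort so ("a", "b") and ("b", "a") share one key.
        unique = sorted(set(k.strip().lower() for k in keywords))
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return weights


# Hypothetical keyword lists for three articles (illustrative only).
articles = [
    ["Google Trends", "infodemiology", "epilepsy"],
    ["Google Trends", "infodemiology"],
    ["Google Trends", "digital epidemiology"],
]
edges = cooccurrence_edges(articles)
for (a, b), weight in sorted(edges.items()):
    print(a, "--", b, "weight", weight)
```

The resulting weighted edge list is the kind of input a network tool needs; how it is loaded into Pajek is outside this sketch.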
Centrality is a vital index for analyzing a network. How close an individual or keyword lies to the center of the social network determines its influence on the network and the speed at which it gains information[30–32].
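As one concrete example of a centrality index, betweenness centrality counts how often a node lies on the shortest paths between other pairs of nodes. The sketch below uses Brandes' algorithm on an unweighted, undirected graph; it is an assumption-laden illustration (edge weights are ignored, and the toy star network is hypothetical), not the computation performed by Pajek in this study.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency maps each node to the set of its neighbours.
    """
    bc = {v: 0.0 for v in adjacency}
    for s in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}   # number of shortest paths from s
        dist = {v: -1 for v in adjacency}   # -1 marks "not yet visited"
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:                        # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical star network: one hub keyword linked to three others.
star = {
    "google trends": {"epilepsy", "cancer", "silicosis"},
    "epilepsy": {"google trends"},
    "cancer": {"google trends"},
    "silicosis": {"google trends"},
}
centrality = betweenness_centrality(star)
print(max(centrality, key=centrality.get))
```

In this toy network the hub lies on every shortest path between the other three keywords, so it receives the highest score while the peripheral keywords receive zero.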
2.4 Tasks for reporting the features of health behaviors in Google Trends research
2.4.1 Which author-defined keywords are most prominent in the network
Author-defined keywords were collected and analyzed using SNA to separate them into clusters. Within each cluster, the keyword with the highest betweenness centrality, which is also the most frequently occurring keyword in that cluster, was taken as the cluster representative.
2.4.2 Which MeSH terms are most prominent in the network
Similarly, MeSH terms were used to characterize each cluster. They differ substantially from the author-defined keywords and more objectively represent the feature clusters of articles using Google Trends. Furthermore, each article was assigned to a MeSH cluster through the maximum likelihood method (MLM); that is, an article belongs to the MeSH cluster in which it scores highest. Whether the MeSH clusters differ in number of citations was examined using one-way ANOVA. Article citations were retrieved from PubMed Central (PMC), and the most cited articles are reported in this study.
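The two steps described above, assigning each article to its highest-scoring cluster and comparing citation counts across clusters, can be sketched as follows. How the per-cluster score is computed is not specified here, so the scores are treated as given; the function names, cluster scores, and citation counts are hypothetical and serve only to illustrate the arithmetic of a one-way ANOVA.

```python
def assign_cluster(scores):
    """Return the cluster label with the highest score for one article.

    Ties go to the label that appears first in the mapping.
    """
    return max(scores, key=scores.get)


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores of one article against three MeSH clusters.
label = assign_cluster({"Internet": 3, "Trends": 1, "Web browser": 0})

# Hypothetical citation counts grouped by assigned cluster.
citations = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(citations)
print(label, round(f_stat, 2), df_b, df_w)
```

The F statistic is then compared with the F distribution on (df_between, df_within) degrees of freedom to obtain a p-value, a step omitted here to keep the sketch free of external dependencies.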
2.4.3 Which citing journals are most prominent for the topics searched in Google Trends research
After the citing journals for the articles were retrieved from PMC, the associations among journal citations were analyzed using SNA. In each cluster of citing journals, the journal with the highest betweenness centrality was taken to represent that cluster.
2.4.4 What are the most influential research regions regarding health behavior using Google Trends
The most influential research regions were identified from the order of author affiliations using the authorship-weighted scheme. That is, the first (i.e., primary) and the last (i.e., corresponding) authors receive higher weighted contributions to a given article[34,35]. The results are displayed visually on a dashboard built on Google Maps.
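The idea of weighting authors by byline position can be illustrated with a short sketch. The exact weights of the authorship-weighted scheme are not stated in this section, so the rule below is purely an assumption for illustration: the first and last authors each receive twice the raw weight of a middle author, and the weights of one article are normalized to sum to 1. All names and sample data are hypothetical.

```python
from collections import defaultdict


def author_weights(n_authors, end_weight=2.0, middle_weight=1.0):
    """Per-author credit shares for one article, summing to 1.

    ASSUMPTION: first and last authors get end_weight, all others
    middle_weight. These values are illustrative, not the cited scheme.
    """
    if n_authors < 1:
        raise ValueError("an article needs at least one author")
    raw = [middle_weight] * n_authors
    raw[0] = end_weight
    raw[-1] = end_weight
    total = sum(raw)
    return [w / total for w in raw]


def region_credit(articles):
    """Sum weighted credit per region.

    Each article is the list of its authors' regions in byline order.
    """
    credit = defaultdict(float)
    for regions in articles:
        for region, weight in zip(regions, author_weights(len(regions))):
            credit[region] += weight
    return dict(credit)


# Hypothetical bylines: a three-author article and a single-author article.
credit = region_credit([["US", "Italy", "US"], ["UK"]])
print(credit)
```

Because each article contributes exactly one unit of credit, the regional totals sum to the number of articles, which keeps heavily co-authored papers from dominating the ranking.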
2.5 Statistical tools and methods used in this study
Kendall's coefficient of concordance (W) was computed to examine the internal consistency (IC) of the data (i.e., the h, g, x, and L indexes as well as the author impact factor (AIF))[37–42] related to the MeSH clusters. If the agreement is statistically significant (p < 0.05), the subsequent one-way ANOVA for inspecting differences in the mean of the indices is meaningful.
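For readers unfamiliar with Kendall's W, the following sketch shows the standard computation without tie correction: each "rater" (here, a bibliometric index) ranks the same set of items (here, clusters), and W = 12S / (m²(n³ − n)), where m is the number of raters, n the number of items, and S the sum of squared deviations of the rank sums from their mean. The associated chi-square statistic is m(n − 1)W with n − 1 degrees of freedom. The function names and the sample values are hypothetical and do not reproduce the study data.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Return (W, chi-square, degrees of freedom), without tie correction.

    ratings[r][i] is the value that rater r assigns to item i.
    """
    m = len(ratings)
    n = len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w
    return w, chi2, n - 1


# Hypothetical values of three indices for four clusters, in perfect agreement.
ratings = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [5, 6, 7, 8],
]
w, chi2, df = kendalls_w(ratings)
print(w, chi2, df)
```

W ranges from 0 (no agreement) to 1 (identical rankings). The p-value follows from the chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf`, which is left out here so that the sketch needs only the standard library.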
3.1 Task 1: the most prominent author-defined keywords in the network
The top three author-defined keywords representing the clusters are Google Trends, digital epidemiology, and infodemiology (Figure 1). This implies that most representative keywords concern methodology, with cosmetic surgery being the only subject-matter topic in the network.
3.2 Task 2: the most prominent MeSH terms in the network
As to the MeSH terms, the top five topics are the internet, trends, statistics & numerical data, information seeking behavior, and web browser (Figure 2), which indicates that all articles regarding health behaviors in Google Trends research can be grouped into these five types of clusters.
The bibliometric indices (i.e., the h, g, x, and L indexes as well as the AIF) for the MeSH clusters are closely associated with each other (i.e., they have high correlation coefficients). Kendall's W is 0.67 with three degrees of freedom (χ² = 12.05, p = 0.01), indicating acceptable internal consistency (IC). The MeSH term represented by Internet earns the highest impact factor (IF), and the IF differs significantly among term clusters (F(3,20) = 15.79, p < 0.001).
3.3 Task 3: the most prominent citing journals
The journal that most frequently cites these articles is PLoS One (Figure 3), followed by JMIR Public Health Surveill and J Med Internet Res. The most cited articles are one from the US in 2009 (PMID = 19845471, cited 88 times) and one from the UK in 2013 (PMID = 23619126, cited 74 times)[44,45].
3.4 Task 4: the most influential research regions
Most author affiliations are from the US (Figure 4), followed by Italy, the UK, and China, indicating that the most influential research regions using Google Trends in healthcare are the US, Europe, and China.
This study found that (1) the most cited articles are one from the US in 2009 (PMID = 19845471, cited 88 times) and one from the UK in 2013 (PMID = 23619126, cited 74 times); (2) the MeSH term represented by Internet earns the highest impact factor (IF), which differs significantly among MeSH clusters (F(3,20) = 15.79, p < 0.001); (3) the journal that most frequently cites these articles is PLoS One; and (4) most author affiliations are from the US.
According to a previous study that assessed the methods, tools, and statistical approaches in Google Trends research, 23.1% (24/104) of studies used Google Trends data for examining seasonality, while 39.4% (41/104) and 32.7% (34/104) used correlations and modeling, respectively. Only about 8.7% (9/104) used the data for predictions and forecasting in health-related topics. All but two of the 104 examined papers included data visualization to present the study results. For instance, a worldwide map of the countries examined for health- and medicine-related issues showed that US data were employed in the most studies (60), followed by the UK (15), Australia (13), Canada (9), Germany (8), and Italy (7). These results differ somewhat from the visualization in our study (i.e., the US, Italy, the UK, and China, as shown in Figure 4), because we emphasize author collaboration rather than the regions to which Google Trends was applied.
Many studies have employed Google Trends for visualizing changes in interest or discussing peaks and spikes[44–47], such as the search volumes for related terms, terms related to the studied topic[48,49], and related internet searches. Others report polynomial trend lines or investigate statistically significant differences in yearly increases. Although Google Correlate has also been used to explore related terms, we found no study that applied author-defined keywords or MeSH terms to visualize related terms, as we do in Figure 1 and Figure 2.
The vast majority of studies using Google Trends in health assessment have employed data visualization, such as figures, maps, or screenshots. The most popular approach is to correlate Google Trends data with official data on disease occurrence, spread, and outbreaks. For instance, the assessment of suicide tendencies and of (prescription or illegal) drug-related queries has been popular in recent years. However, a gap in the existing literature is the limited use of Google Trends for predictions and forecasting in health-related topics, which could benefit the general public by using web-based data to provide insight into health issues and topics in healthcare settings.
Finally, we provided a citation analysis to illustrate the most prominent citing journals and showed that the MeSH term Internet earns the highest impact factor (IF), which differs significantly among MeSH clusters (F(3,20) = 15.79, p < 0.001). Another feature of this study is the use of Kendall's coefficient of concordance (W), which showed that the bibliometric indices in Table are closely related to each other; this approach is rarely seen in the literature, particularly for data that do not follow the normal distribution.
5. Limitations and Future study
The conclusions should be interpreted and generalized with caution. First, the data were extracted from Medline. Any generalization should therefore be limited to fields with similar paper-related content.
Second, several studies have used other sources of big data, namely, Google News[53–65,58], Twitter[56,57,59], Yandex, Baidu, Wikipedia, Facebook and Google+, and YouTube. Although Google is the most popular search engine[13], other Web-based sources are used or even preferred to Google in some regions (e.g., in China), which is worthy of study in the future.
Third, although the data were carefully extracted from Medline and handled with care at every step, symbol errors in the originally downloaded records might affect the reported results.
Fourth, many algorithms are used for SNA. We applied only community clustering and betweenness centrality with weighted degrees in the figures. Changing the algorithm would yield different patterns and inferences.
This review covers studies on Google Trends research published from 2009 to 2018 and indexed in the PubMed database, selected according to the stated criteria. It aims to provide a point of reference for future research on health behaviors using Google Trends. Google Trends data are commonly used in infodemiology research and have been shown to correlate empirically with official health data on many topics. Google Trends is becoming increasingly popular in health behavior assessment, where it can be crucial in monitoring and analyzing seasonal diseases as well as epidemics and outbreaks in healthcare settings.
7. List of abbreviations
AIF: author impact factor
BC: Betweenness centrality
IC: internal consistency
IF: impact factor
MeSH: Medical Subject Headings
PMC: PubMed Central
SNA: Social network analysis
8. Competing interests
The authors declare that they have no competing interests.
9. Authors' contributions
WC conceived and designed the study. TW performed the statistical analyses and was in charge of handling the data. HY and TW helped design the study, collected information, and interpreted data. WC monitored the research. All authors read and approved the final article.