HTML Text mining for Chronic Pain and Epigenetics

Vector Analysis of the Literature on Chronic Pain and Epigenetics – Trends, Convergences and Gaps.

Análise Vetorial da Estrutura da Literatura de Dor Crônica e Epigenética – Tendências, Convergências e Lacunas

About

Chronic pain is recognized as a disease in its own right, given its persistence beyond the resolution of the initial injury and its significant functional and psychological impact. Recent evidence indicates that epigenetic mechanisms, rather than fixed genetic mutations, play a central role in the development and maintenance of chronic pain. Acting as an interface between genes and the environment, epigenetics regulates gene expression without altering the DNA sequence, leaving potentially heritable marks such as DNA methylation, histone modifications, and RNA-based regulation. This study presents an analysis of the scientific literature available in the PubMed database, focusing on epigenetic learning processes related to pain. Titles and abstracts were vectorized using the SWeeP approach and analyzed with text-mining methods (HTML-TM), enabling semantic clustering and exploratory analysis. The findings revealed two major thematic axes that structure the field: intrinsic factors related to human development and aging, and acquired or contextual factors associated with chronic pain. The analysis also highlighted existing knowledge gaps and demonstrated how epigenetic mechanisms may serve as potential biomarkers and therapeutic targets. Overall, the integration of chronic pain and epigenetics, supported by bioinformatics tools, enhances the understanding of this complex phenomenon and offers valuable insights for advances in prevention, diagnosis, and treatment.

We provide three HTML files:

Texts - each line corresponds to an article
Words - each line corresponds to a term
Tree - phylogenetic tree of 1500 words (HTML of tree produced with Phy2HTML)

How to use the HTMLs?

TEXTS

TEXTS contains a list of all the articles analysed, as shown in the image below. Each article has a link to a list of the articles (title + abstract) that most closely match the query article.
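The lookup behind each link can be pictured as a nearest-neighbour search over the article vectors. The sketch below is only an illustration, assuming each article is already represented as a numeric vector. The function names, the toy vectors and the choice of cosine similarity are assumptions made for this example and are not taken from the SWeeP or HTML-TM code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, treat it as unrelated
    return dot / (norm_a * norm_b)

def closest_items(query_id, vectors, top_n=10):
    """Return the ids of the top_n items most similar to query_id, best first."""
    query = vectors[query_id]
    scored = [
        (cosine_similarity(query, vec), item_id)
        for item_id, vec in vectors.items()
        if item_id != query_id  # the query itself is not its own neighbour
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_n]]

# Hypothetical toy vectors standing in for vectorized title + abstract texts.
toy_vectors = {
    "article_a": [1.0, 0.0, 0.2],
    "article_b": [0.9, 0.1, 0.3],
    "article_c": [0.0, 1.0, 0.0],
}

print(closest_items("article_a", toy_vectors, top_n=2))
```

The same ranking idea would apply to word vectors when listing the closest words for a term.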

WORDS

WORDS contains: 1) Cod: the word id; 2) WORD: the query term; 3) Related words: a list of the 10 closest words; 4) Link title: a link to the articles (title + abstract) most related to the term; 5) Link abstracts: a link to the tree of terms rooted in the query term; and 6) Link graphic: a graph of the frequency of the term in the literature.

Each tree is rooted in the query term. The graph has two curves: one in blue, showing the frequency of the term yoga in the literature over time, and one in purple, showing the frequency with which the query term is found.
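A curve of this kind can be approximated by counting, for each publication year, the fraction of articles whose text mentions a term. The sketch below is a minimal illustration, assuming a list of (year, text) records. The record layout, the function name and the simple case-insensitive substring match are assumptions for this example, not the procedure used to build the HTMLs.

```python
from collections import Counter

def term_frequency_by_year(records, term):
    """Fraction of articles per year whose text mentions the term (case-insensitive)."""
    totals = Counter()  # number of articles per year
    hits = Counter()    # number of articles per year containing the term
    needle = term.lower()
    for year, text in records:
        totals[year] += 1
        if needle in text.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Hypothetical toy records standing in for (publication year, title + abstract).
toy_records = [
    (2019, "DNA methylation in neuropathic pain"),
    (2019, "Histone modifications and aging"),
    (2020, "Methylation markers of chronic pain"),
]

print(term_frequency_by_year(toy_records, "methylation"))
```

Calling the function once for a reference term and once for the query term would give the two series needed for a two-curve plot.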

TREE

TREE presents a phylogenetic tree of 1,500 words. The number before the word (label of branch) corresponds to the word id.

Authors

Maristela Palu Yamashita^1,2,3 ; Camilla R. De Pierri^1,2,3 ; Roberto T. Raittz^1,2,3,*

1. Programa de Pós Graduação em Bioinformática (SEPT), Universidade Federal do Paraná, Curitiba, Paraná, Brasil
2. Laboratório de Inteligência Artificial aplicada à Bioinformática (AIBIA), Curitiba, Paraná, Brasil
3. Programa de Pós-Graduação Associada em Bioinformática (SEPT), Universidade Federal do Paraná, Curitiba, Paraná, Brasil

Contact:

*raittz@ufpr.br