Abstract
Small and medium-sized enterprises (SMEs) are critical to the success of the green transition, as they are responsible for 60% of greenhouse gas
emissions by enterprises. Current European and national policies are designed to support SMEs in this process. However, the lack of data
available to researchers is a significant challenge when studying sustainability practices in SMEs. Many SMEs do not have the resources
or capacity to collect and report data on their sustainability performance, making it difficult for researchers to understand the full scope of
sustainability practices in SMEs and to identify best practices and areas for improvement. Additionally, the data that are collected lack
standardization, making it difficult to compare and analyze the sustainability practices of different SMEs.
Fresh and novel data sources are essential for assessing how policymakers' priorities are actually permeating society. In the current
digital age, some of the activities of individuals and businesses can be tracked online, as they leave what is known as a "digital footprint". This
digital footprint on the Internet has become the largest repository of fresh information about society. Recent advancements in Natural
Language Processing (NLP) technologies have significantly enhanced the ability to extract meaningful insights from digital footprints,
offering a deeper understanding of company behaviors and practices.
The primary goal of this research project is to examine the extent to which the sustainability transition of SMEs can be understood
through their digital footprint and how their positioning on sustainability is associated with the competitiveness of
industries and regions. Previous work by the research team established a methodological basis for studying simple associations between
the digital footprint and the business economy: monitoring demography and analyzing the relationship between Internet adoption and
productivity. The current proposal goes one step further and examines more complex relationships, which requires more
exhaustive web scraping and more advanced big data methods, including recent NLP models such as BERT and GPT. The focus on the
sustainability of the industrial ecosystem aligns with the current industrial strategy for Europe.