Owing to the rapid growth of electronically available publications, bibliographic databases have become widespread. They cover a large variety of knowledge fields and provide fast access to a wide range of data. At the same time, they contain a wealth of hidden knowledge that can only be inferred through additional processing. In this work we focus on extracting such meta knowledge from research bibliographic databases by looking at them from sociolinguistic, text mining and bibliometric perspectives. We choose the DBLP digital library and bibliographic database as a testbed for our experiments.
Within the sociolinguistic analysis, we build a statistical system for the language identification of personal names. We also show that extending a purely statistical model with the co-author network boosts the system's performance.
In the text mining scenario, we perform a number of experiments that focus on topic identification and ranking. While our topic detection approach remains generic and can be used for any kind of textual data, the topic ranking metrics are built upon the information provided by the bibliographic databases.
The goal of our bibliometric study is to create a researcher's profile on DBLP and to analyze some of the research communities defined by different conferences in terms of publication activity, interdisciplinarity of research, collaboration trends and population stability. We also explore to what extent these aspects correlate with conference rank.
Each of the above topics constitutes a method of meta information extraction from bibliographic databases and other similarly structured data sources.