Method

The corpus of this digital archive consists of academic works and cultural productions with mixed licensing permissions. Some works are freely available on the internet, while others are under copyright and therefore cannot be published or accessed through this archive. The metadata for each work records its licensing status.

To catalog the works, every production was read or viewed carefully, and either its entire Pajubá lexicon or a significant sample of it was annotated in a spreadsheet. CollectionBuilder was used as the framework for visualizing the metadata. The annotated metadata and the framework as adapted for this project are published under a Creative Commons CC-BY license; the adaptation derives from the original CollectionBuilder project, which is licensed under the MIT License.

While transcribing the lexicon found in written works, many spelling discrepancies emerged (e.g. alibã vs aliban). To make the overlap between these spelling variants easier to see, the metadata for cultural productions from before the 2010s records both the original spelling and a modernized spelling. The modernized spelling can support corpus-linguistic research, since it helps reveal patterns in the cryptolect.
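
As a minimal sketch of how paired spellings could be used, the Python example below groups original spellings under their modernized form. The column names (`work_id`, `original_spelling`, `modernized_spelling`), the work identifiers, and the choice of alibã as the modernized form are assumptions made for illustration only; they do not reflect the archive's actual spreadsheet schema or its editorial decisions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata rows. The column names, work identifiers, and the
# direction of the modernization (aliban -> alibã) are illustrative
# assumptions, not the archive's actual data.
SAMPLE_CSV = """work_id,original_spelling,modernized_spelling
work_a,aliban,alibã
work_b,alibã,alibã
"""


def group_variants(csv_text):
    """Map each modernized spelling to the set of original spellings recorded for it."""
    variants = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        variants[row["modernized_spelling"]].add(row["original_spelling"])
    return dict(variants)


variants = group_variants(SAMPLE_CSV)
for modern, originals in sorted(variants.items()):
    # Print each modernized form followed by the variants attested for it.
    print(modern, sorted(originals))
```

Keying on the modernized spelling means a frequency count or concordance can treat all variants of a word as one lexical item, while the original spellings remain available for anyone studying orthographic variation.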