Method

The corpus of this digital archive consists of academic works and cultural productions with mixed licensing terms. Some works are freely available online and reproduced here in full; others are under copyright and cannot be reproduced, so only their descriptive metadata is made available in this archive.

To catalog the works, each production was read or viewed closely, and its complete Pajubá lexicon, or a significant sample of it, was annotated in a spreadsheet. CollectionBuilder serves as the underlying framework for presenting the metadata. This project’s codebase is a fork of CollectionBuilder (MIT License) and of A Latino Children’s Book Collection (MIT License); the modifications and additions introduced by this project are released under the GNU General Public License v3.0 (GPL-3.0). The metadata, annotations, and all original intellectual contributions produced as part of this project are published separately under Creative Commons Attribution 4.0 International (CC BY 4.0), which permits reuse and adaptation with appropriate credit.

Transcribing the lexicon of written works revealed spelling discrepancies across sources (e.g., alibã vs. aliban). To capture this orthographic variation, the metadata for cultural productions predating the 2010s records both the original attested spelling and a standardized modern spelling. Pairing the two forms supports corpus linguistics research by making spelling patterns across the cryptolect more systematically visible.
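As an illustration only, pairing attested and standardized spellings could be queried as in the Python sketch below. The field names (`work_id`, `attested`, `standardized`), the work identifiers, and the choice of alibã as the standardized form are assumptions made for the example; they do not describe the archive's actual spreadsheet columns or editorial decisions.

```python
from collections import defaultdict

# Hypothetical rows: field names, work IDs, and the choice of
# standardized form are placeholders, not the archive's real schema.
records = [
    {"work_id": "work-a", "attested": "aliban", "standardized": "alibã"},
    {"work_id": "work-b", "attested": "alibã", "standardized": "alibã"},
]


def variants_by_standard_form(rows):
    """Group every attested spelling under its standardized form."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["standardized"]].add(row["attested"])
    # Sort each group so the result is stable across runs.
    return {form: sorted(spellings) for form, spellings in grouped.items()}


variants = variants_by_standard_form(records)
for form, spellings in variants.items():
    print(form, "<-", ", ".join(spellings))
```

Keeping the attested spelling in its own field, rather than overwriting it with the standardized one, is what allows a grouping like this to surface the variants side by side.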

The front end of the visualization tools was built with D3.js, with assistance from Claude.AI.