Generating implicit object fragment datasets for machine learning

Computers & Graphics

Alfonso López-Ruiz1,*, Antonio J. Rueda-Ruiz1,*, Rafael J. Segura-Sánchez1,*
Carlos J. Ogayar-Anguita1,*, Pablo Navarro1,*, José M. Fuertes1,*

1Computer Science Department, University of Jaén  
2Instituto Patagónico de Ciencias Sociales y Humanas. Centro Nacional Patagónico, CONICET, Puerto Madryn, Argentina  

* Equal contribution

[Paper]  [Code]  [Voxel data (3GB)]  [Complete dataset (450GB)] 

Dataset


Our fragment dataset was generated from 1,052 Iberian vessels by breaking every mesh into 2 to 10 fragments. The number of fragmentation iterations for each fragment count was linearly interpolated in [25, 15], so 25 iterations were run when breaking models into 2 pieces and 15 when breaking them into 10. Fragmentation of each mesh stopped once 1,000 fragments had been obtained. A total of 1,040,428 point clouds and triangle meshes have been released, together with 187,257 voxelizations; of these, 1,052 store the original meshes. Triangle meshes are saved in their original format, as obtained from marching cubes; point clouds were sampled with 1,024 points; and voxelizations have a resolution of up to 128³.
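The fragmentation schedule described above can be sketched as follows. This is a minimal illustration, not the authors' code. The helper names (`iterations_for`, `fragmentation_plan`) are hypothetical. The round-half-up rule and the ascending order in which fragment counts are processed are also assumptions. Under these assumptions, a single mesh reaches the 1,000-fragment cap after 180 fragmentation runs.

```python
import math
from typing import List


def iterations_for(num_pieces: int, min_pieces: int = 2, max_pieces: int = 10,
                   iters_at_min: int = 25, iters_at_max: int = 15) -> int:
    """Linearly interpolate the number of fragmentation runs for a given piece count."""
    t = (num_pieces - min_pieces) / (max_pieces - min_pieces)
    value = iters_at_min + t * (iters_at_max - iters_at_min)
    # Assumed rounding rule: round half up (e.g. 22.5 -> 23).
    return math.floor(value + 0.5)


def fragmentation_plan(min_pieces: int = 2, max_pieces: int = 10,
                       cap: int = 1000) -> List[int]:
    """Return the piece count of every fragmentation run applied to one mesh.

    Assumption: piece counts are processed in ascending order, and no new run
    starts once the mesh has already produced `cap` fragments.
    """
    plan: List[int] = []
    total = 0
    for pieces in range(min_pieces, max_pieces + 1):
        for _ in range(iterations_for(pieces, min_pieces, max_pieces)):
            if total >= cap:
                return plan
            plan.append(pieces)
            total += pieces
    return plan


if __name__ == "__main__":
    plan = fragmentation_plan()
    print(len(plan), sum(plan))  # 180 1000
```

The returned list holds one entry per run: 25 runs of 2 pieces, then 24 runs of 3 pieces, and so on. Summing it gives the number of fragments produced for that mesh under the assumed schedule.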


Accessing the dataset


The complete fragment data is available at our research institute's page. However, since the complete dataset is very large (450 GB), two lighter versions have also been released. We encourage readers whose work is centred on implicit data/voxels to primarily use the Zenodo dataset. In summary, these are the available datasets:


Decompress binary files


We provide sample scripts to decompress meshes, point clouds and voxels. Mesh and voxel decompression is implemented in Python, whereas point clouds are decompressed in C++ because this step requires the Point Cloud Library (PCL).


Vessel classification


The root name of each file in our dataset corresponds to a vessel category, as detailed in a previous work of ours. Vessel classification remains a largely unexplored branch, since archaeological artefacts are rarely found intact; indeed, it is common to find only small fragments. These factors hinder their digitization, and therefore any application operating on 3D archaeological artefacts is hard to reproduce in the real world. Nevertheless, we provide a class.csv file for any future work that may find this vessel categorization helpful.
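Looking up the category of a dataset file from its root name could look like the sketch below. It is hypothetical throughout. The CSV column names (`name`, `class`) and the naming rule (root name = file stem up to the first underscore) are assumptions, not the dataset's documented layout. Check them against the actual files before relying on this.

```python
import csv
import io
from pathlib import Path
from typing import Dict, TextIO


def load_classes(csv_file: TextIO) -> Dict[str, str]:
    """Map each vessel root name to its category.

    Hypothetical layout: a header row with 'name' and 'class' columns.
    """
    reader = csv.DictReader(csv_file)
    return {row["name"].strip(): row["class"].strip() for row in reader}


def root_name(file_path: str) -> str:
    """Hypothetical naming rule: the root name is the file stem up to the first '_'."""
    return Path(file_path).stem.split("_", 1)[0]


def vessel_class(file_path: str, classes: Dict[str, str]) -> str:
    """Return the category of the vessel a fragment file was generated from."""
    return classes[root_name(file_path)]


if __name__ == "__main__":
    # In-memory stand-in for the CSV file; names and classes are made up.
    sample = io.StringIO("name,class\nvessel001,3\nvessel002,7\n")
    classes = load_classes(sample)
    print(vessel_class("fragments/vessel001_2_frag05.ply", classes))  # 3
```

Loading the CSV once into a dictionary keeps each lookup constant-time, which matters when iterating over roughly a million fragment files.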



Eleven vessel profiles, as annotated in the provided class_vessel.csv file


Citation


                
    @article{LopezGenerating2024,
        title = {Generating implicit object fragment datasets for machine learning},
        journal = {Computers \& Graphics},
        pages = {104104},
        year = {2024},
        issn = {0097-8493},
        doi = {10.1016/j.cag.2024.104104},
        url = {https://www.sciencedirect.com/science/article/pii/S0097849324002395},
        author = {Alfonso López and Antonio J. Rueda and Rafael J. Segura and Carlos J. Ogayar and Pablo Navarro and José M. Fuertes}
    }