Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development"

This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are meant to be run in the following order:
1. SettingUpTextFiles.py and CopyingText.py, to get the raw text files
2. SentenceConversion.py
3. reference_remover.py
4. testing.py and testingavg.py
5. SentenceCreator.py
6. matscholar_model.py, to get the matscholar tags
7. training_model_gensim.py, to get the gensim model
8. word2vecscript.py and gensim_visual.py
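The run order above can be captured in a small driver script. This is a minimal sketch, not part of the published code. It assumes that every script sits in one directory, can be launched with no command-line arguments, and reads and writes its files relative to that directory. The dataset description documents none of their interfaces, so all three points are assumptions. The names `PIPELINE` and `run_pipeline` are hypothetical. When two scripts share a step, the sketch runs them in the order they are listed, which is also an assumption.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical stage table mirroring the documented run order.
# Scripts that share a step are grouped in one tuple and run in listed order.
PIPELINE = [
    ("SettingUpTextFiles.py", "CopyingText.py"),   # 1. raw text files
    ("SentenceConversion.py",),                    # 2.
    ("reference_remover.py",),                     # 3.
    ("testing.py", "testingavg.py"),               # 4.
    ("SentenceCreator.py",),                       # 5.
    ("matscholar_model.py",),                      # 6. matscholar tags
    ("training_model_gensim.py",),                 # 7. gensim model
    ("word2vecscript.py", "gensim_visual.py"),     # 8.
]


def run_pipeline(script_dir=".", dry_run=True):
    """Build (and optionally run) one command per script, in order.

    Assumes each script lives in script_dir and takes no arguments.
    With dry_run=True the commands are printed but not executed.
    Returns the list of commands.
    """
    base = Path(script_dir).resolve()
    commands = []
    for step, scripts in enumerate(PIPELINE, start=1):
        for name in scripts:
            cmd = [sys.executable, str(base / name)]
            commands.append(cmd)
            if dry_run:
                print(f"step {step}: {' '.join(cmd)}")
            else:
                # check=True raises CalledProcessError and stops the run
                # as soon as any stage exits with a non-zero status.
                subprocess.run(cmd, check=True, cwd=base)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

A dry run first is a cheap way to confirm the order before starting the slower model-training stages (steps 6 and 7).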

Data and Resources

Field Value
accessLevel public
accrualPeriodicity irregular
bureauCode {006:55}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier ark:/88434/mds2-3198
issued 2024-09-05
landingPage https://data.nist.gov/od/id/mds2-3198
language {en}
license https://www.nist.gov/open/license
modified 2021-12-31 00:00:00
programCode {006:045}
publisher National Institute of Standards and Technology
references {https://doi.org/10.1007/s40192-024-00378-y}
resource-type Dataset
source_datajson_identifier true
source_hash 4d49de9ba151d6163286ebf8918a4950a9a5bef84953c26c6001e7b87bb48600
source_schema_version 1.1
theme {"Information Technology:Data and informatics","Materials:Modeling and computational material science","Materials:Materials characterization"}
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • AmeriGEO
  • AmeriGEOSS
  • CKAN
  • GEO
  • GEOSS
  • National
  • North America
  • United States
  • controlled-vocabulary
  • electron-microscopy
  • natural-language-processing
  • nlp
  • ontology
isopen False
license_id other-license-specified
license_title other-license-specified
maintainer June W. Lau
maintainer_email june.lau@nist.gov
metadata_created 2025-09-25T00:07:51.470388
metadata_modified 2025-09-25T00:07:51.470399
notes This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are meant to be run in the following order: 1. SettingUpTextFiles.py and CopyingText.py, to get the raw text files; 2. SentenceConversion.py; 3. reference_remover.py; 4. testing.py and testingavg.py; 5. SentenceCreator.py; 6. matscholar_model.py, to get the matscholar tags; 7. training_model_gensim.py, to get the gensim model; 8. word2vecscript.py and gensim_visual.py.
num_resources 1
num_tags 13
title Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development"