Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development"

This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are to be used in the following order:

1. SettingUpTextFiles.py and CopyingText.py to get the raw text files
2. SentenceConversion.py
3. reference_remover.py
4. testing.py and testingavg.py
5. SentenceCreator.py
6. matscholar_model.py to get matscholar tags
7. training_model_gensim.py to get gensim model
8. word2vecscript.py and gensim_visual.py
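The run order above can be written down as a small driver script. The following is only a sketch and is not part of the published zip file: the script names and step numbers come from the description, while the `PIPELINE` table, the `build_commands` and `run_pipeline` helpers, and the assumption that each script runs without command-line arguments are hypothetical. It also assumes that, where a step lists two scripts, they run in the order written.

```python
import subprocess
import sys
from pathlib import Path

# Step numbers and script names are taken from the dataset description.
# Everything else in this file is an illustrative assumption.
PIPELINE = [
    (1, ["SettingUpTextFiles.py", "CopyingText.py"]),  # raw text files
    (2, ["SentenceConversion.py"]),
    (3, ["reference_remover.py"]),
    (4, ["testing.py", "testingavg.py"]),
    (5, ["SentenceCreator.py"]),
    (6, ["matscholar_model.py"]),                      # matscholar tags
    (7, ["training_model_gensim.py"]),                 # gensim model
    (8, ["word2vecscript.py", "gensim_visual.py"]),
]


def build_commands(script_dir="."):
    """Return one argv list per script, in the documented order."""
    base = Path(script_dir)
    return [
        [sys.executable, str(base / script)]
        for _step, scripts in PIPELINE
        for script in scripts
    ]


def run_pipeline(script_dir=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(script_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the run at the first script that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumption: scripts need no arguments; the real ones may expect
    # input paths or edited constants, so start with a dry run.
    run_pipeline(dry_run=True)
```

Starting with `dry_run=True` only lists the commands, which makes it easy to confirm the order before any script touches the corpus.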

Data and Resources

NLP code to produce words about electron microscopy (ZIP)
This zip file contains a set of scripts that extracts frequently occurring...

Field	Value
accessLevel	public
accrualPeriodicity	irregular
bureauCode	{006:55}
catalog_@context	https://project-open-data.cio.gov/v1.1/schema/data.json
catalog_conformsTo	https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy	https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier	ark:/88434/mds2-3198
issued	2024-09-05
landingPage	https://data.nist.gov/od/id/mds2-3198
language	{en}
license	https://www.nist.gov/open/license
modified	2021-12-31 00:00:00
programCode	{006:045}
publisher	National Institute of Standards and Technology
references	{https://doi.org/10.1007/s40192-024-00378-y}
resource-type	Dataset
source_datajson_identifier	true
source_hash	4d49de9ba151d6163286ebf8918a4950a9a5bef84953c26c6001e7b87bb48600
source_schema_version	1.1
theme	{"Information Technology:Data and informatics","Materials:Modeling and computational material science","Materials:Materials characterization"}
Groups	AmeriGEOSS National Provider North America
Tags	AmeriGEO, AmeriGEOSS, CKAN, GEO, GEOSS, National, North America, United States, controlled-vocabulary, electron-microscopy, natural-language-processing, nlp, ontology
isopen	False
license_id	other-license-specified
license_title	other-license-specified
maintainer	June W. Lau
maintainer_email	june.lau@nist.gov
metadata_created	2025-09-25T00:07:51.470388
metadata_modified	2025-09-25T00:07:51.470399
notes	This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are to be used in the following order: 1. SettingUpTextFiles.py and CopyingText.py to get the raw text files; 2. SentenceConversion.py; 3. reference_remover.py; 4. testing.py and testingavg.py; 5. SentenceCreator.py; 6. matscholar_model.py to get matscholar tags; 7. training_model_gensim.py to get gensim model; 8. word2vecscript.py and gensim_visual.py.
num_resources	1
num_tags	13
title	Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development"