

Domain ontology core entities for spectroscopic profiling data.

An ontology model for representing and integrating multiple types of spectral data

Building an Information Infrastructure of Spectroscopic Profiling Data for Food-Drug Quality and Safety Management [J]. Enterprise Information Systems, SCI, JCR Q2, 2019, doi: 10.1080/17517575.2019.1684567




In the ontology diagram, solid arrows indicate concrete foreign-key references in the underlying database, while dashed arrows indicate external references, e.g. a URL or resource path. Ellipses are entities that are persisted in the database. Document icons represent external or intermediate file objects.


The domain ontology defines the core entities and their associations. 
A.	The “DataSet” is a collection of multiple “Spectrum” instances. The spectra in one data set are generated for the same purpose (e.g. classify milk brands or identify a specific genuine geo herb), by the same measurement modality (e.g. Raman or MALDI-TOF-MS), with the same data preprocessing methods (e.g. filtering, averaging, peak identification, baseline drift removal), and must have the same data dimensions (i.e. the same number of peaks).
B.	The “Spectrum” is the basic unit that represents one piece of spectroscopic data. It is usually the final processed spectrum (e.g. averaged and filtered from multiple scans) that can be used directly for subsequent data analysis, rather than the original raw data.
A spectrum object contains an array of X values (e.g. wave number for Raman, or m/z for MALDI-TOF-MS) and an optional Y label (used for supervised data analysis).
C.	A “DataSet” can be exported in matrix or tabular form, which can be imported directly by major scientific data analysis platforms such as MATLAB, R, or Python. Later in the manuscript, we show how this intermediate data format drives the data analysis workflow.
D.	Each “Spectrum” instance has multiple “Log” items, which track the status change in its life cycle. The ontology defines several phases for the spectrum data life cycle, including generate, preprocess, curate, analyze and report. 
E.	Each “Spectrum” instance can be serialized to an mzML (for MS) or JCAMP-DX (for vibrational spectroscopy) file or deserialized from an external one. For third-party instruments and systems (e.g. Agilent, Bruker, Horiba, Shimadzu, Thermo, Waters, etc.), such standard file formats can be used to exchange and share spectrum data. Our team is also developing our own MALDI-TOF-MS hardware that will directly transmit data in the designed format.
F.	The “Pipeline” is a set of algorithm elements organized to achieve complex data analysis tasks. A typical pipeline for spectroscopic data contains several preprocessors (e.g. filter, normalization, dimension reduction) and one regressor or classifier.
G.	Each “Algorithm” instance represents a specific algorithm component used in the pipelines. The algorithms belong to several categories, such as baseline drift removal, averaging filter, feature scaling, feature selection, classifier, regressor, etc.
H.	Each algorithm can have multiple implementations on different scientific platforms and in different programming languages. An implementation can be a machine-interpretable script, source code, or a compiled binary file. Engineers can either call existing libraries to implement algorithms or upload their own implementations.
I.	Each “Pipeline” object targets a specific data set and analysis purpose. The final state of a pipeline instance is usually a statistical model (e.g. logistic regression, SVM or neural network) trained on the related data set. The model and its parameters are persisted to a model file (e.g. a .mat file for MATLAB or a pickle file for Python) and can be reloaded at runtime.
J.	A trained pipeline can analyze or make predictions for new sample data on the same topic, and generates both a human-readable report and a machine-processable structured report for further decision support.
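
The pipeline life cycle in items F, I and J (preprocess, train, persist, reload, predict, report) can be sketched as follows. This is only a minimal illustration using the Python standard library: the max-normalization preprocessor, the nearest-centroid classifier, the function names and the toy data are all hypothetical stand-ins, not the algorithms or data used by the system described here.

```python
import pickle
from collections import defaultdict


def normalize(spectrum):
    """Hypothetical preprocessor: scale intensities so the largest value is 1."""
    peak = max(spectrum) or 1.0  # avoid dividing by zero for an all-zero spectrum
    return [v / peak for v in spectrum]


def train(X, y):
    """Hypothetical classifier training: one mean spectrum (centroid) per Y label."""
    groups = defaultdict(list)
    for spectrum, label in zip(X, y):
        groups[label].append(normalize(spectrum))
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(model, spectrum):
    """Return the label whose centroid is closest to the preprocessed spectrum."""
    x = normalize(spectrum)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(model, key=lambda label: sq_dist(model[label]))


# Toy data set in matrix form: each row is one spectrum, all rows share the same X values.
X_train = [[1.0, 9.0, 2.0], [2.0, 8.0, 1.0], [9.0, 1.0, 2.0], [8.0, 2.0, 1.0]]
y_train = ["brand A", "brand A", "brand B", "brand B"]

model = train(X_train, y_train)

model_bytes = pickle.dumps(model)     # persist the trained model parameters
reloaded = pickle.loads(model_bytes)  # reload them at runtime

prediction = predict(reloaded, [1.5, 8.5, 1.5])
report = {"topic": "classify milk brands", "prediction": prediction}  # structured report
print(f"Predicted class: {prediction}")                               # human-readable report
```

In practice the model bytes would be written to a model file, and the preprocessing and classification steps would more likely come from an established library than from hand-written functions.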
                

Multi-modality Data


Rapid-test spectral data (imported)
Device                                                   DeviceCode  Count
Raman Spectrum                                           1           3121
Ion Mobility Spectrometry                                2           94
MALDI TOF                                                3           628
Single-Photon Ionization TOF                             6           799
Ultraviolet and visible spectrum                         11          40
High Performance Liquid Chromatography                   12          248
Fluorescent X-ray Spectrometry (Energy Dispersive Type)  121         15
Electronic Nose                                          200         71
Electronic Tongue                                        201         72
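Image data
Data type                              Application scenario
Raman pseudo-color microscopic images  Determining the distribution of substances
Microscope images                      Microorganism detection
X-ray images                           Detection of grain seed pulverization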

Text data
Data type        Application scenario
Online comments  Public opinion analysis, risk event mining

Structured Data


Data Set Count
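Raman spectra of online and offline samples of 4 milk powder brands 599
Raman spectra of online and offline samples of 4 milk powder brands (fat information) 373
Raman spectra of 4 yogurt brands 87
OvarianCancer-NCI-PBSII-061902 253
Raman spectra of three milk powder brands 46
Raman spectra of three compound premix feeds 46
Raman spectra of Tiepi Shihu (Dendrobium officinale) from different origins 120
Electronic nose data of different milk powder brands 47
Electronic tongue data of different milk brands 48
Raman spectra of different types of table salt 125
Raman spectra of lactose reference standard 3
Raman spectra of stage-3 formula milk powder from five brands 5
Mass spectra of wild and cultivated Chaihu (Bupleurum) from Inner Mongolia 120
Mass spectra of wild and cultivated Chaihu (Bupleurum) from Inner Mongolia (VALID) 120
Raman spectra of Qianhu (Peucedanum root) 9
HPLC (High Performance Liquid Chromatography) of Magnolia officinalis 40
Raman spectra of Gujinggong liquor of different ages 620
SPI-MS of Gujinggong baijiu of different ages 799
Mass spectra of 3 milk products from the same brand 135
Raman spectra of rice 69
Raman spectra of infant rice cereal 101
Raman spectra of Chuanbeimu (Fritillaria cirrhosa) and Zhebeimu (Fritillaria thunbergii) 90
Raman spectra of Zhiqiao (Aurantii Fructus) (532 nm) 18
Raman spectra of Zhiqiao (Aurantii Fructus) (785 nm) 25
Raman spectra of different Chaihu (Bupleurum) species 135
HPLC (High Performance Liquid Chromatography) of Bupleurum chinense 155
Raman spectra of Zhebeimu (Fritillaria thunbergii) 17
Raman spectra of Lingzhi (Ganoderma) slices (532 nm) 20
Raman spectra of Lingzhi (Ganoderma) slices (785 nm) 20
Ion mobility spectra of geo-authentic (daodi) medicinal herbs from Gansu Province 14
Raman spectra of Baizhu (Atractylodes macrocephala) 7
Raman spectra of Baishao (white peony root) 10
Raman spectra of root slices of red-stem and green-stem Huangjing (Polygonatum) 80
Raman spectra of goat milk powder 12
Raman spectra of Fupenzi (Rubus chingii) 8
Raman spectra of Lianqiao (Forsythia) 8
HPLC (High Performance Liquid Chromatography) of Forsythia suspensa 53
Electronic tongue data of Lianqiao (Forsythia) from different origins 24
Electronic nose data of Lianqiao (Forsythia) from different origins 24
Raman spectra of wine-processed Huangjing (Polygonatum) (532 nm) 20
Raman spectra of wine-processed Huangjing (Polygonatum) (785 nm) 20
Raman spectra of vinegar-processed Yanhusuo (Corydalis) 3
Raman spectra of vinegar-processed Chaihu (Bupleurum) 10
Raman spectra of wild and cultivated Huangjing (Polygonatum) 80
Raman spectra of root slices of wild and cultivated Huangjing (Polygonatum) 80
Raman spectra of Tiepi Shihu (Dendrobium officinale) (532 nm) 20
Raman spectra of Tiepi Shihu (Dendrobium officinale) (785 nm) 25
Food additive / contaminant detection 46
Raman spectra of table salt 24
Energy-dispersive X-ray spectra of table salt 15
Raman spectra of Maidong (Ophiopogon japonicus) (532 nm) 20
Raman spectra of Maidong (Ophiopogon japonicus) (785 nm) 20
ESI-IMS (electrospray ionization ion mobility spectra) of Huangqi (Astragalus) from different origins 40
Multimodal data set of Astragalus membranaceus from different habitats (Raman spectra) 40
Multimodal data set of Astragalus membranaceus from different habitats (IMS) 40
Multimodal data set of Astragalus membranaceus from different habitats (UV) 40
Raman spectra of 4 different milks from a dairy company in Heilongjiang 60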

Unstructured Data / Semi-structured Data - Image

Example: Raman microscopic image of a milk powder sample

Core Concepts and Terminology


Code Coding System Concept Type Digest Description
E0065 SPACS DataSet [Entity] 17DED3B A collection of spectrum data generated for the same purpose (e.g. classify milk brands or identify a specific genuine geo herb), by the same measurement modality (e.g. Raman or MALDI-TOF-MS), with the same data preprocessing methods (e.g. filtering, averaging, peak identification, baseline drift removal), and having the same data dimensions.
E0065.A0001 SPACS DataSet Id [Attribute] 8B3D52 Unique ID. Primary key.
E0065.A0002 SPACS DataSet Name [Attribute] 2866E67 Name of the data set
E0065.A0003 SPACS DataSet InputCode [Attribute] 1AB698B Acronym or abbreviation. Used for quick search.
E0065.A0004 SPACS Test Object [Attribute] 1657C57 The object under test, from which the sample is taken, e.g. infant milk, horse meat, a specific herb, etc. Using a public terminology such as FOODON to encode the object is recommended.
E0065.A0005 SPACS Test Topic [Attribute] 1B248DE The topic or target of the data set. E.g. classify milk brands or identify a specific genuine geo herb.
E0065.A0006 SPACS SOP [Attribute] 331E491 SOP (Standard Operation Procedures) to prepare the sample and get the spectrum data. SOP should be specific and detailed so that other researchers can reproduce the same result.
E0065.A0007 SPACS Modality [Attribute] 222E357 The test/detection modality. Should be one of the following enumerated values. The HUPO-PSI MS terminology can also be used if the modality is a kind of MS.
E0065.A0007.V0001 SPACS Raman [Value] 3BE230B Raman spectrometry
E0065.A0007.V0002 SPACS MS [Value] 217F6B4 Mass spectrometry in a general sense. Equal to MS:1000268 in HUPO-PSI MS.
E0065.A0007.V0003 SPACS MALDI_TOF_MS [Value] 2ABFAA Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry. Equal to MS:1000075 in HUPO-PSI MS.
E0065.A0007.V0004 SPACS SELDI_TOF_MS [Value] F345E5 Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry. Equal to MS:1000278 in HUPO-PSI MS.
E0065.A0007.V0005 SPACS IMS [Value] 2DAA7C5 Ion Mobility Spectrometry. Equal to MS:1000261 in HUPO-PSI MS.
E0065.A0007.V0006 SPACS NIRS [Value] 1C13486 Near-InfraRed Spectrometry
E0065.A0007.V0007 SPACS FIRS [Value] 77E3A2 Far-InfraRed Spectrometry
E0065.A0007.V0008 SPACS SPI_MS [Value] 2EF8E66 Single Photon Ionization Mass Spectrometry
E0065.A0007.V0009 SPACS unknown [Value] 38E07F6
E0065.A0008 SPACS Device [Attribute] 1815B03 The instrument and client software version that generates this data set.
E0065.A0009 SPACS FilePath [Attribute] 16EC369 A cached matrix or tabular file exported from this data set (not re-created if it already exists), which can be imported directly by major scientific data analysis platforms such as MATLAB, R, or Python.
E0065.A0010 SPACS Spectrums [Attribute] 2EC9D0A Navigation property for a collection of Spectrum objects.
E0065.A0011 SPACS Samples [Attribute] 35BE80F Total count of spectrum data samples.
E0065.A0012 SPACS XLabels [Attribute] 209DDDA The headers or X labels of the data set, stored as a comma-separated string. For example, the XLabels of a Raman data set would be the wave numbers "250, 251, 252 ... , 2338, 2339". For MS, XLabels could be m/z values: "M/Z 0.019054, 0.019869, 0.020702, 0.021552, 0.022419, 0.023303 ... 303.687942, 303.789856, 303.891787 ... 304.503730, 304.605781, 304.707848".
E0065.A0013 SPACS YLabels [Attribute] 1150EF The Y labels of the data set, used for training supervised learning models. Stored as JSON. For liquor year identification, YLabels can be ["5 years","8 years","16 years","26 years"].
E0065.A0014 SPACS YLabelSamples [Attribute] B06501 The sample count of each Y label. Stored as JSON. For liquor year identification, YLabelSamples can be {"5 years": 30, "8 years": 29, "16 years": 30, "26 years": 27}.
E0065.A0015 SPACS DataSet Timestamp [Attribute] 380037E Latest revision timestamp.
E0066 SPACS Spectrum [Entity] 184FBBD Represents a piece of spectroscopic data, which is usually in its final processed state (e.g. averaged and filtered from multiple scans or raw data) and can be used directly for subsequent data analysis.
E0066.A0001 SPACS Spectrum Id [Attribute] 29AED24 Unique ID. Primary key.
E0066.A0002 SPACS Spectrum FilePath [Attribute] 2E41C7B The original mzML (for MS) or JCAMP-DX (for vibrational spectroscopy) file from third-party instruments (e.g. Agilent, Bruker, Horiba, Shimadzu, Thermo, Waters, etc.). Used as a standard file format to import/export spectrum data.
E0066.A0003 SPACS Spectrum Digest [Attribute] 33C9CB5 The digital fingerprint or digest of the spectrum data.
E0066.A0004 SPACS YLabel [Attribute] 16B6E5B The category or Y label of this data. Used for training supervised learning models.
E0066.A0005 SPACS Sequence [Attribute] 22C57F8 A compressed byte array of the spectrum data.
E0066.A0006 SPACS Modality [Attribute] 1396CA9 The test/detection modality. Shares the same modality enumeration as E0065.A0007.
E0066.A0006.V0001 SPACS XAxisMeaning [Value] 2DFED22 The physicochemical meaning of the X axis. E.g. for Raman, the X axis means wave number. For MS, the X axis means m/z or time.
E0066.A0007 SPACS XAxisUnit [Attribute] 1451A63 Unit of the X axis, e.g. cm⁻¹ for Raman.
E0066.A0008 SPACS Logs [Attribute] 386A05A Navigation property for a collection of Log objects, which tracks the historical status change of the data.
E0066.A0009 SPACS Spectrum Metadata [Attribute] 2F2A1B9 Additional metadata. Can be a serialized json or xml object.
E0066.A0010 SPACS Spectrum Timestamp [Attribute] 3F09537 Latest revision timestamp.
E0071 SPACS Algorithm [Entity] 204BF87 Represents a specific algorithm component used in the pipelines. The algorithms belong to several categories, such as baseline drift removal, averaging filter, feature scaling, feature selection, classifier, regressor, etc.
E0071.A0001 SPACS Algorithm Id [Attribute] C5143C Unique ID. Primary key.
E0071.A0002 SPACS Algorithm Source [Attribute] FDDADC The source of the algorithm. If the algorithm is self-developed, the source should be "private". Otherwise, specify the fully qualified module or class name, e.g. "sklearn.manifold.TSNE".
E0071.A0003 SPACS Algorithm Name [Attribute] 1F10C62 Name of the algorithm
E0071.A0004 SPACS Algorithm InputCode [Attribute] AEA876 Acronym or abbreviation. Used for quick search.
E0071.A0005 SPACS Algorithm Category [Attribute] 106308D The category of the algorithm. Should be one of the following enumerated values.
E0071.A0005.V0001 SPACS Preprocessing [Value] 1D33A42
E0071.A0005.V0002 SPACS Dimension Reduction [Value] 22A1114
E0071.A0005.V0003 SPACS Feature Selection [Value] 1FA5ADF
E0071.A0005.V0004 SPACS Regression [Value] 1F3913D
E0071.A0005.V0005 SPACS Classification [Value] 46151B
E0071.A0005.V0006 SPACS Clustering [Value] 3EC18BE
E0071.A0005.V0007 SPACS Visualization [Value] 39F4774
E0071.A0006 SPACS Algorithm Tag [Attribute] 218E13C An additional tag for the algorithm.
E0071.A0007 SPACS Algorithm Reference [Attribute] 1792398 Published literature resource for the algorithm.
E0071.A0008 SPACS Algorithm Url [Attribute] 1B7BFCE Knowledge base URL. e.g. https://en.wikipedia.org/wiki/{Name}
E0071.A0009 SPACS Algorithm Description [Attribute] 2FC79E4 A brief description for the algorithm.
E0071.A0010 SPACS Algorithm Metadata [Attribute] 1F9CA7A Metadata about the algorithm. Can be a serialized json or xml object.
E0071.A0011 SPACS Algorithm Implementation [Attribute] 125B0E7 The programming language or script for algorithm implementation. Should be one of the following enumerated values.
E0071.A0011.V0001 SPACS Python [Value] 2BD39B6
E0071.A0011.V0002 SPACS C/C++ [Value] 3CB48DE
E0071.A0011.V0003 SPACS C# [Value] D6E5E7
E0071.A0011.V0004 SPACS Javascript [Value] 265CB39
E0071.A0011.V0005 SPACS R [Value] 322C0E4
E0071.A0011.V0006 SPACS Java [Value] 8D0614
E0071.A0011.V0007 SPACS Matlab [Value] 111AB73
E0071.A0011.V0008 SPACS Octave [Value] 23173BE
E0071.A0012 SPACS Algorithm Code [Attribute] 8206CA Code snippet or pseudo code for the algorithm.
E0071.A0013 SPACS Algorithm Timestamp [Attribute] 163AE35 Latest revision timestamp.
E0070 SPACS Pipeline [Entity] 31F620E The “Pipeline” is a set of algorithm elements organized to achieve complex data analysis tasks. A typical pipeline for spectroscopic data contains several preprocessors (e.g. filter, normalization, dimension reduction) and one regressor or classifier.
E0070.A0001 SPACS Pipeline Id [Attribute] 3BECEA6 Unique ID. Primary key.
E0070.A0002 SPACS Pipeline Name [Attribute] 30CA3D6 Name of the pipeline
E0070.A0003 SPACS Pipeline InputCode [Attribute] 10598F1 Acronym or abbreviation. Used for quick search.
E0070.A0004 SPACS Pipeline Reference [Attribute] FB0FA8 Literature or document describing the pipeline.
E0070.A0005 SPACS Pipeline Url [Attribute] 1F55322 Knowledge base URL, which provides a preview for the pipeline.
E0070.A0006 SPACS Pipeline Description [Attribute] 2C7877A A brief description for the pipeline.
E0070.A0007 SPACS Pipeline Metadata [Attribute] 27C28F7 Metadata about the pipeline. Can be a serialized json or xml object.
E0070.A0008 SPACS Pipeline Template [Attribute] 3066931 A pipeline template that can be populated with actual input data at runtime. The current implementation uses an .ipynb (IPython notebook) file as the template format.
E0070.A0009 SPACS Pipeline Timestamp [Attribute] 23FB60 Latest revision timestamp.
E0068 SPACS Log [Entity] 34D6824 Track the status change in the life cycle of a spectrum data. The ontology defines several phases for the spectrum data life cycle, including generate, preprocess, curate, analyze and report.
E0068.A0001 SPACS Log Id [Attribute] 15D7BF9 Unique ID. Primary key.
E0068.A0002 SPACS Operator [Attribute] 249C299 The operator that causes the status change. Must be one of the valid users in the Account data table.
E0068.A0003 SPACS Operation [Attribute] 32A7491 Should be one of the following enumerated values.
E0068.A0003.V0001 SPACS generate [Value] 3883345
E0068.A0003.V0002 SPACS preprocess [Value] 1311AA6
E0068.A0003.V0003 SPACS curate [Value] 302FED0
E0068.A0003.V0004 SPACS analyze [Value] 3454EE5
E0068.A0003.V0005 SPACS report [Value] 1D17D4E
E0068.A0004 SPACS Device [Attribute] 3AFA8AD The instrument (e.g. MALDI-TOF-MS or Raman Spectrometer) or client computer where the operation is performed. Should be the UID in the Device data table.
E0068.A0005 SPACS Location [Attribute] 1737FC6 The institute or laboratory that performs the operation. Can also be a 3rd-party testing organization.
E0068.A0006 SPACS Message [Attribute] 1AD2D8C Any messages or additional data that comes with the operation.
E0068.A0007 SPACS Spectrum ID [Attribute] 3E9F1AD A foreign key pointing to the related spectrum object. Spectrum and Log have one-to-many cardinality.
E0068.A0008 SPACS Log Timestamp [Attribute] 8D94C0 The creation timestamp of the log entry.
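
To make the relationships in the table above more concrete, the following sketch models a subset of the DataSet, Spectrum and Log attributes as Python dataclasses. It is an illustration only, not the system's actual schema: the class layout, the use of zlib for the compressed Sequence, SHA-256 for the Digest, and little-endian float64 packing are all assumptions chosen for the example.

```python
import hashlib
import json
import struct
import zlib
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Enumerated values of E0068.A0003 (Operation)
OPERATIONS = ("generate", "preprocess", "curate", "analyze", "report")


@dataclass
class Log:
    spectrum_id: int   # foreign key to the related Spectrum (one Spectrum, many Logs)
    operation: str
    operator: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation}")


@dataclass
class Spectrum:
    id: int
    y_label: str
    sequence: bytes    # compressed byte array of the spectrum data
    digest: str        # fingerprint of the uncompressed data
    logs: list = field(default_factory=list)

    @classmethod
    def from_values(cls, spectrum_id, y_label, values):
        # Assumption: values are packed as little-endian float64 before compression.
        raw = struct.pack(f"<{len(values)}d", *values)
        return cls(spectrum_id, y_label, zlib.compress(raw), hashlib.sha256(raw).hexdigest())

    def values(self):
        raw = zlib.decompress(self.sequence)
        return list(struct.unpack(f"<{len(raw) // 8}d", raw))


@dataclass
class DataSet:
    id: int
    name: str
    modality: str
    x_labels: str      # comma-separated string, e.g. "250, 251, 252"
    spectrums: list = field(default_factory=list)

    @property
    def samples(self):
        return len(self.spectrums)

    @property
    def y_labels(self):
        return json.dumps(sorted({s.y_label for s in self.spectrums}))

    @property
    def y_label_samples(self):
        return json.dumps(dict(Counter(s.y_label for s in self.spectrums)))


# Toy usage with made-up intensities
ds = DataSet(1, "Liquor year identification", "Raman", "250, 251, 252")
toy = [("5 years", [0.1, 0.5, 0.2]), ("8 years", [0.3, 0.4, 0.1]), ("5 years", [0.2, 0.6, 0.2])]
for i, (label, vals) in enumerate(toy, start=1):
    s = Spectrum.from_values(i, label, vals)
    s.logs.append(Log(spectrum_id=s.id, operation="generate", operator="demo-user"))
    ds.spectrums.append(s)

print(ds.samples, ds.y_labels, ds.y_label_samples)
```

Deriving Samples, YLabels and YLabelSamples from the spectrum collection, as done here, keeps them consistent with the data; a real database would more likely store them as columns and update them when spectra are added.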

Discovery paths of safety risks




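Take the adulteration of dairy products with melamine and sodium thiocyanate, two important risks among illegal additives, as an example. When the physicochemical indicators of a dairy product are entered, the system can determine whether a risk exists. If it does, the system reports the path along which the risk arises. For instance, the melamine adulteration risk may involve significant hazards at the dairy farmers' feeding stage, the milk station's collection stage, the dairy processing stage, and the government quality inspection stage, while sodium thiocyanate may pose safety hazards during transport by dairy farmers and by milk stations. In addition, the system presents the possible consequences of these risks together with related cases, providing a basis for subsequent policy guidance.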
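
The lookup from a detected risk to the supply-chain stages where it may arise can be sketched as a simple mapping. The stage lists come from the example above. Everything else is hypothetical: the function and variable names are invented for illustration, and the detectors are placeholder predicates supplied by the caller, because how the system decides from the physicochemical indicators that a risk exists is not described here.

```python
# Stages where each risk may arise, taken from the example in the text.
RISK_PATHS = {
    "melamine": [
        "feeding by dairy farmers",
        "collection at the milk station",
        "dairy processing",
        "government quality inspection",
    ],
    "sodium thiocyanate": [
        "transport by dairy farmers",
        "transport by the milk station",
    ],
}


def discover_risks(indicators, detectors):
    """Return {risk: stages} for every risk whose detector flags the indicators.

    `detectors` maps a risk name to a caller-supplied predicate over the
    indicator dictionary. The decision logic itself is an assumption.
    """
    return {
        risk: RISK_PATHS[risk]
        for risk, is_present in detectors.items()
        if risk in RISK_PATHS and is_present(indicators)
    }


# Placeholder detectors: they only read a boolean flag from the input.
detectors = {
    "melamine": lambda ind: ind.get("melamine_flag", False),
    "sodium thiocyanate": lambda ind: ind.get("thiocyanate_flag", False),
}

found = discover_risks({"melamine_flag": True}, detectors)
for risk, stages in found.items():
    print(f"{risk}: check {', '.join(stages)}")
```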
