title: ISCC-LAB - Semantic-Code Text
emoji: ▶️
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 4.41.0
app_file: ./iscc_sct/demo.py
pinned: true
license: CC-BY-NC-SA-4.0
short_description: Cross-Lingual Similarity-Preserving Text Simprints
description: >
  # ISCC-LAB - Semantic-Code Text

  `iscc-sct` is a **proof-of-concept implementation** of a semantic Text-Code for the
  [ISCC](https://core.iscc.codes) (*International Standard Content Code*). Semantic Text-Codes are
  short identifiers created from text documents. They preserve similarity (measured as Hamming
  distance) between semantically similar text inputs, even across languages.

  ## What is the ISCC

  The ISCC is a combination of various similarity-preserving fingerprints and an identifier for
  digital media content.

  ISCCs are generated algorithmically from digital content, just like cryptographic hashes. However,
  instead of using a single cryptographic hash function to identify data only, the ISCC uses various
  algorithms to create a composite identifier that exhibits similarity-preserving properties (soft
  hash or Simprint).

  The component-based structure of the ISCC identifies content at multiple levels of abstraction. Each
  component is self-describing, modular, and can be used separately or with others to aid in various
  content identification tasks. The algorithmic design supports content deduplication, database
  synchronization, indexing, integrity verification, timestamping, versioning, data provenance,
  similarity clustering, anomaly detection, usage tracking, allocation of royalties, fact-checking, and
  general digital asset management use cases.


  ## ISCC Status

  The [ISCC](https://iscc.codes) is an ISO standard published as
  [ISO 24138:2024](https://www.iso.org/standard/77899.html) - International Standard Content Code,
  developed within [ISO/TC 46/SC 9/WG 18](https://www.iso.org/committee/48836.html).

  The algorithms of `iscc-sct` are experimental and not (yet) part of the official standard.