VisualStringDistances
- VisualStringDistances.Glyph (Type)
- VisualStringDistances.Glyph (Method)
- VisualStringDistances.GlyphCoordinates (Type)
- VisualStringDistances.glyph! (Method)
- VisualStringDistances.printglyph (Function)
- VisualStringDistances.visual_distance (Method)
VisualStringDistances.Glyph — Type
Glyph <: AbstractArray{Bool,2}
Holds the bitmap associated with a Unifont glyph in a packed format.
VisualStringDistances.Glyph — Method
Glyph(s::AbstractString) --> Glyph
Construct a Glyph from a string.
Examples
julia> Glyph("abc")
------------------------
------------------------
------------------------
---------#--------------
---------#--------------
---------#--------------
--####---#-###----####--
-#----#--##---#--#----#-
------#--#----#--#------
--#####--#----#--#------
-#----#--#----#--#------
-#----#--#----#--#------
-#---##--##---#--#----#-
--###-#--#-###----####--
------------------------
------------------------
VisualStringDistances.GlyphCoordinates — Type
GlyphCoordinates{T} <: AbstractVector{T}
A sparse representation of a Glyph.
VisualStringDistances.glyph! — Method
glyph!(v::Vector{UInt8}) -> Glyph
Creates a Glyph from a vector of bytes, assuming the vector represents a single Unifont character. Modifies v and may share its memory.
VisualStringDistances.printglyph — Function
printglyph([io=stdout], g::Union{Char, AbstractString, Glyph})
Prints a visual representation of g to io.
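To make the '#' and '-' pictures on this page easier to read, here is a minimal sketch of how a Bool bitmap can be turned into that kind of text. It is not the package's implementation: render_bitmap is a hypothetical helper, the small matrix stands in for a real Glyph, and the choice of one text line per matrix row is an assumption.

```julia
# Hypothetical helper, not part of VisualStringDistances: prints one text line
# per matrix row, using '#' for true (black pixel) and '-' for false.
function render_bitmap(io::IO, bitmap::AbstractMatrix{Bool})
    for row in eachrow(bitmap)
        println(io, join(x ? '#' : '-' for x in row))
    end
end

# Convenience method that writes to stdout, mirroring the optional io argument.
render_bitmap(bitmap::AbstractMatrix{Bool}) = render_bitmap(stdout, bitmap)

# A tiny stand-in bitmap (a plus sign); a real glyph would be much larger.
bitmap = Bool[0 1 0; 1 1 1; 0 1 0]

# Capture the rendering as a string instead of printing it.
rendered = sprint(render_bitmap, bitmap)
print(rendered)
```

Since Glyph is documented as a subtype of AbstractArray{Bool,2}, a helper of this shape would presumably accept one as well, though that has not been checked here.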
VisualStringDistances.visual_distance — Method
visual_distance(::Type{T}, s::Union{Char,AbstractString},
                t::Union{Char,AbstractString}; D=KL(one(T)), ϵ=T(0.1),
                normalize=nothing) where {T}
Computes a measure of distance between the strings s and t in terms of their visual representation as rendered by GNU Unifont and quantified by an unbalanced Sinkhorn divergence from UnbalancedOptimalTransport.jl.
- The keyword argument D chooses the UnbalancedOptimalTransport.AbstractDivergence used to penalize the creation or destruction of "mass" (black pixels). For D = VisualStringDistances.KL(ρ) for some number ρ ≥ 0, the distance is non-negative and zero if and only if the two visual representations of the strings are the same, as is generally desired.
- The keyword argument ϵ sets the "entropic regularization" in the Sinkhorn divergence; see the documentation there for more information. In short, a smaller ϵ computes a quantity more directly related to the cost of moving mass, but takes longer to compute.
- The keyword argument normalize can be chosen to be a function which returns a normalizing constant given the maximum length of the two strings. The choice normalize=identity thus divides the result by the maximum length of the two strings. The choice normalize=sqrt has been found to give a good balance in some settings.
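The arithmetic behind normalize can be sketched without calling the package. The helper normalized_distance below is hypothetical and only mirrors the description above (divide the raw result by normalize applied to the maximum string length); how the package applies it internally is not shown here. The raw value is the one printed for visual_distance("abc", "abe") in the example on this page.

```julia
# Hypothetical helper, not part of VisualStringDistances: applies the
# normalization described above to an already-computed raw distance.
function normalized_distance(raw::Real, s::AbstractString, t::AbstractString;
                             normalize=nothing)
    # With normalize=nothing the raw distance is returned unchanged.
    normalize === nothing && return float(raw)
    n = max(length(s), length(t))  # maximum length of the two strings
    return raw / normalize(n)
end

# Raw distance taken from the example output for "abc" vs "abe".
raw = 4.979840716647487

d_none     = normalized_distance(raw, "abc", "abe")
d_identity = normalized_distance(raw, "abc", "abe"; normalize=identity)  # raw / 3
d_sqrt     = normalized_distance(raw, "abc", "abe"; normalize=sqrt)      # raw / sqrt(3)

println((d_none, d_identity, d_sqrt))
```

For strings longer than one character, sqrt shrinks the result less than identity does, which may be why it is described as a middle ground between no normalization and dividing by the full length.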
One may use printglyph to see the visual representation of the strings as rendered by GNU Unifont.
At the time of this writing, GNU Unifont is capable of rendering 57086 different Unicode characters. However, it renders some Unicode characters with the same graphical representation; specifically, 689 distinct Unicode characters have duplicate representations. Here's a set of six duplicates, for example:
- 'Ꮋ': Unicode U+13BB (category Lu: Letter, uppercase)
- 'Н': Unicode U+041D (category Lu: Letter, uppercase)
- 'ꓧ': Unicode U+A4E7 (category Lo: Letter, other)
- 'Ⲏ': Unicode U+2C8E (category Lu: Letter, uppercase)
- 'Η': Unicode U+0397 (category Lu: Letter, uppercase)
- 'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)
The visual distance between these, therefore, is returned as zero (up to numerical error).
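The following sketch shows why these duplicates matter: the six characters listed above are distinct code points, so ordinary string comparison treats them as different even though they render identically. It uses only Base Julia; the strings latin and mixed are illustrative. According to the text above, calling visual_distance on such a pair should return approximately zero, but that call is not run here.

```julia
# The six look-alike characters listed above, written as escapes so the
# code points are unambiguous regardless of font or editor.
lookalikes = ['\u13BB', '\u041D', '\uA4E7', '\u2C8E', '\u0397', '\u0048']

# All six are distinct characters as far as equality is concerned.
n_distinct = length(unique(lookalikes))

# Plain equality therefore treats "HELLO" and a copy whose first letter is
# Cyrillic U+041D as different strings, although they look the same.
latin = "HELLO"
mixed = string('\u041D', "ELLO")
same_string = latin == mixed

# Print each character next to its code point.
for c in lookalikes
    println(repr(c), " => U+", uppercase(string(codepoint(c), base=16, pad=4)))
end
println("distinct characters: ", n_distinct, ", strings equal: ", same_string)
```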
Example
julia> using VisualStringDistances
julia> printglyph("abc")
------------------------
------------------------
------------------------
---------#--------------
---------#--------------
---------#--------------
--####---#-###----####--
-#----#--##---#--#----#-
------#--#----#--#------
--#####--#----#--#------
-#----#--#----#--#------
-#----#--#----#--#------
-#---##--##---#--#----#-
--###-#--#-###----####--
------------------------
------------------------
julia> printglyph("def")
------------------------
------------------------
------------------------
------#-------------##--
------#------------#----
------#------------#----
--###-#---####-----#----
-#---##--#----#--#####--
-#----#--#----#----#----
-#----#--######----#----
-#----#--#---------#----
-#----#--#---------#----
-#---##--#----#----#----
--###-#---####-----#----
------------------------
------------------------
julia> visual_distance("abc", "def")
31.57060117541754
julia> visual_distance("abc", "abe")
4.979840716647487