Tool to convert CSV files into a number of destination formats, including XML, RDF, user-provided template, and MarkLogic sem:triple nodes.

masyukun/convert-csv

convert-csv

A tool that (1) converts relational schemas into MarkLogic sem:triples and (2) converts CSV files into several destination formats, including XML, RDF, a user-provided template, and MarkLogic sem:triple nodes.

##Background information

A PowerPoint presentation covers the project's technical background. It is updated periodically but may lag behind current functionality. The same folder contains a demo script with a sample workflow that you can adapt to run this tool on your own dataset: https://github.com/masyukun/convert-csv/blob/master/demo-ppt/

##Required libraries:

##Parameters, command-line and properties file

| Option | Meaning |
| --- | --- |
| `--csv-filename` | File name of the CSV file to read as input. |
| `--csv-input-directory` | The absolute path of a folder containing 1 or more CSV files for ingest. Overrides csv-filename! |
| `--database-name` | Name of database for output schema. (Default=[sql-file]) |
| `--define-header` | Specify the header by command line with comma separated list. "ColumnID,Column1,Column2,Column3" |
| `--generate-triples` | Should the code attempt to generate MarkLogic sem:triple nodes inside the resulting output document? (True/false) |
| `--has-header` | The first line of the CSV file contains a comma separated list of column names. |
| `--output-filename` | Filename to store converted content. |
| `--output-filename-auto` | Automatically generate output filenames based on the name of each input CSV file. |
| `--output-format` | Type to convert into: SEMTRIPLE, TEMPLATE, XML |
| `--output-path` | Output directory for output files as absolute filesystem path. |
| `--output-record-num` | Maximum number of transformed records to write to the output file before starting a new output file. |
| `--properties-file` | set properties file name [convertCsv.properties] |
| `--schema-output-filename` | Filename for output schema. (Default=[myfile.txt]) |
| `--schema-output-type` | Format of output schema: PLAINTEXT, SEMTRIPLES, SQLINSERT |
| `--sql-file` | Filename of MySQL SQL file containing CREATE TABLE statements to ingest. |
| `--template-file` | File name of template file to use. |
| `--template-footer` | Footer content to insert at the end of each template output file. |
| `--template-header` | Header content to insert at the beginning of each template output file. |
| `--xml-namespace` | Default namespace URI for XML output file. |
| `--xml-namespace-prefix` | Prefix name for XML default namespace. |
| `--xml-record` | Element name for each output CSV record. |
| `--xml-root` | Element name for root of XML output file. |
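
As a rough illustration of how the header and XML options above relate to each other, here is a minimal Python sketch of a CSV-to-XML conversion. It is not the tool's implementation and its output may differ from what convert-csv actually writes. The function name `csv_to_xml`, the one-child-element-per-column layout, and the rule that a `--define-header` style list is used only when the file has no header line are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, xml_root="root", xml_record="record",
               has_header=True, define_header=None):
    """Hypothetical helper: mimic the documented option meanings in Python."""
    # Parse the CSV text and drop completely empty lines.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]

    if has_header:
        # Like --has-header: the first line holds the column names.
        columns, rows = rows[0], rows[1:]
    elif define_header:
        # Like --define-header: names given as a comma-separated list.
        columns = define_header.split(",")
    else:
        raise ValueError("no column names: set has_header or define_header")

    # Like --xml-root: one root element for the whole output file.
    root = ET.Element(xml_root)
    for row in rows:
        # Like --xml-record: one element per CSV record.
        record = ET.SubElement(root, xml_record)
        # Assumption: each column becomes a child element named after it,
        # so column names must be valid XML element names.
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = value

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "ColumnID,Column1\n1,alpha\n2,beta\n"
    print(csv_to_xml(sample, xml_root="rows", xml_record="row"))
```

Passing `has_header=False` together with `define_header="ColumnID,Column1"` treats every line of the input as data, which is how the option descriptions above suggest a header-less file would be handled.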
