Because a CSV file does not necessarily use a comma as its delimiter, this app detects the delimiter automatically, on the assumption that every row contains the same number of delimiters. The app divides the file into 100 partitions and validates the first row of each partition, so you get 100 sample rows written to disk (the first 20 rows are also displayed on screen). A comment in the source code explains how to change the number of partitions from 100 to 1000.
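The detection rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code from main.go, main.rs or Peaks.py: the candidate delimiter list and the function names sample_rows and detect_delimiter are hypothetical, and quoted fields that contain a delimiter are not handled.

```python
from typing import List, Optional

# Assumed candidate set; the original tool may test other characters.
CANDIDATES = [",", "\t", ";", "|"]


def sample_rows(data: bytes, partitions: int = 100) -> List[str]:
    """Return the first complete row found in each partition of `data`."""
    rows = []
    seen = set()
    size = len(data)
    for i in range(partitions):
        start = size * i // partitions
        if start > 0:
            # Move forward to the start of the next complete row
            # (stay put if the boundary already sits on a row start).
            nl = data.find(b"\n", start - 1)
            if nl == -1:
                break
            start = nl + 1
        if start >= size or start in seen:
            continue  # nothing left, or the same row as an earlier partition
        seen.add(start)
        end = data.find(b"\n", start)
        if end == -1:
            end = size
        row = data[start:end].decode("utf-8", errors="replace")
        rows.append(row.rstrip("\r"))
    return rows


def detect_delimiter(rows: List[str]) -> Optional[str]:
    """Pick the candidate whose count is non-zero and equal in every row."""
    for cand in CANDIDATES:
        counts = {row.count(cand) for row in rows}
        if len(counts) == 1 and counts != {0}:
            return cand
    return None
```

For example, detect_delimiter(sample_rows(open("data.csv", "rb").read())) would return the delimiter for a hypothetical data.csv, or None when no candidate appears the same number of times in every sampled row.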
Download URL: https://github.com/hkpeaks/peaks-consolidation/tree/main/Documents/PreviewFile
- main.go is a Golang version
- main.rs is a Rust version
- Peaks.py is a Python version
Demo video: https://lnkd.in/gCTFR9rh
Note: Prefixing a command with "." to indicate that it is a member of your user-defined function is optional. The first line defines the data extraction and the data load. There are 3 possible scenarios:
UserDefineFunctionName = from Extraction to Load
Or
UserDefineFunctionName = from Extraction, Extraction, Extraction to Load
Or
UserDefineFunctionName = from Extraction to Load, Load, Load
From the second line onward, you can define query and data transformation functions.
Examples:
.ExpandFactor: 123
.Filter: Saleman(Mary,Peter,John)
.JoinTable: Product, Category => InnerJoin(Master.csv)
.AddColumn: Quantity, Unit_Price => Multiply(Amount)
.Filter: Amount(Float20000..29999)
.GroupBy: Saleman, Shop, Product => Count() Sum(Quantity) Sum(Amount)
.OrderBy: Saleman(A) Product(A) Date(D)
.CreateFolderLake: Shop
.Filter: Product(222..888) Style(=F)
.ReadSample: StartPosition%(0) ByteLength(100000)
.View
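To make the line format of these examples concrete, here is a minimal Python sketch that splits command lines into a command name and its parameter text. It assumes each command occupies one line, that the name ends at the first ":", and that the leading "." may be omitted as noted above. The function name parse_command_lines is hypothetical and is not part of Peaks.

```python
from typing import List, Tuple


def parse_command_lines(script: str) -> List[Tuple[str, str]]:
    """Split script lines into (command, parameters) pairs."""
    commands = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("."):
            line = line[1:]  # the leading "." is optional
        # Everything before the first ":" is the command name;
        # a command such as View has no parameters at all.
        name, _, params = line.partition(":")
        commands.append((name.strip(), params.strip()))
    return commands
```

Given the lines ".ExpandFactor: 123" and ".View", the sketch yields ("ExpandFactor", "123") and ("View", ""). It does not interpret the parameter text itself.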
AddColumn{Column, Column => Math(NewColName)}
where Math includes Add, Subtract, Multiply & Divide
BuildKeyValue{Column, Column ~ KeyValueTableName}
CurrentSetting{StreamMB(Number) Thread(Number)}
Distinct{Column, Column}
Filter{Column(CompareOperator Value) Column(CompareOperator Value)}
FilterUnmatch{Column(CompareOperator Value) Column(CompareOperator Value)}
where CompareOperator is one of >, <, >=, <=, =, != or a Range, e.g. 100..200
To compare values as integers or floats, add the Float prefix, e.g. Float > Number, Float100..200
GroupBy{Column, Column => Count() Sum(Column) Max(Column) Min(Column)}
JoinKeyValue{Column, Column => JoinType(KeyValueTableName)}
where JoinType includes AllMatch, Filter & FilterUnmatch
JoinTable{Column, Column => JoinType(KeyValueTableName)}
where JoinType includes AllMatch & InnerJoin
OrderBy{PrimaryCol(Sorting Order) SecondaryCol(Sorting Order)}
OrderBy{SecondaryCol(Sorting Order) => CreateFolderLake(PrimaryCol) ~ FolderName or FileName.csv}
where Sorting Order is A (ascending) or D (descending); to sort real numbers, use FloatA or FloatD
Read{FileName.csv ~ TableName}
ReadSample{StartPosition%(Number) ByteLength(Number)}
ReadSample{Repeat(Number) ByteLength(Number)}
Select{Column, Column}
SelectUnmatch{Column, Column}
SplitFile{FileName.csv ~ NumberOfSplit}
CreateFolderLake{Column, Column ~ SplitFolderName}
View{TableName}
Write{TableName ~ FileName.csv or %ExpandBy100Time.csv}
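As an illustration of the Filter syntax listed above, the Python sketch below turns the text inside Column(...) into a test on a single cell value. It is an interpretation, not the Peaks implementation. It assumes that ranges are inclusive, that values without the Float prefix are compared as text, and that a comma-separated list such as Mary,Peter,John matches any listed value. The function name build_predicate is hypothetical.

```python
import operator
from typing import Callable

# Two-character operators come first so ">=" is not read as ">".
OPS = [(">=", operator.ge), ("<=", operator.le), ("!=", operator.ne),
       (">", operator.gt), ("<", operator.lt), ("=", operator.eq)]


def build_predicate(spec: str) -> Callable[[str], bool]:
    """Build a cell test from a spec such as 'Float>100' or '222..888'."""
    spec = spec.strip()
    is_float = spec.startswith("Float")
    if is_float:
        spec = spec[len("Float"):].strip()
    conv = float if is_float else str  # assumption: text compare by default
    if ".." in spec:
        # Range such as 100..200, assumed inclusive on both ends.
        low, high = (conv(part.strip()) for part in spec.split("..", 1))
        return lambda cell: low <= conv(cell) <= high
    for symbol, fn in OPS:
        if spec.startswith(symbol):
            target = conv(spec[len(symbol):].strip())
            return lambda cell: fn(conv(cell), target)
    # No operator: treat the spec as a list of accepted values.
    choices = {conv(part.strip()) for part in spec.split(",")}
    return lambda cell: conv(cell) in choices
```

For instance, build_predicate("Float20000..29999") accepts the cell "25000.5" and rejects "30000", while build_predicate("=F") accepts only "F". A text value that itself starts with "Float" would be misread by this sketch.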