Sampling data from CSV to DBF

Posted: Sat Jun 23, 2018 10:57 pm
by Eugene Lutsenko
How do I copy a sample of data, row by row, from a CSV file into a DBF? The code below does not work: only one record ends up in the DBF.

Code:

   oScrn   := DC_WaitOn( L('Filling database "Inp_data.dbf" with data from file: "train.csv"' ))

   CLOSE ALL

   nHandle := DC_txtOpen( 'train.csv' )
   USE Inp_data EXCLUSIVE NEW
   SELECT Inp_data

   DO WHILE !DC_TxtEOF( nHandle )                   // Start of loop over lines

      mLine = DC_TxtLine( nHandle )                 // Extract a line from the text file

      APPEND BLANK

      mNFields = NUMTOKEN(mLine,",")
      FOR j=1 TO mNFields
          mWord = ALLTRIM(TOKEN(mLine,",",j))
          FIELDPUT(j, IF(j=1,mWord,VAL(mWord)))
      NEXT

      DC_TxtSkip( nHandle, 1 )
   ENDDO
   DC_TxtClose( nHandle )

   DC_Impl(oScrn)

Re: Sampling data from CSV to DBF

Posted: Sat Jun 23, 2018 11:10 pm
by sdenjupol148
Hi Eugene,

Have you looked at DC_Csv2Workarea() and DC_Csv2Array()?
You can find some samples in \exp19\samples\csv

Regards,

Bobby

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 2:22 am
by Eugene Lutsenko
Thanks, I'll take a look. I once looked at them and tried a lot of different options, but I have already forgotten the details.

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 3:57 am
by Eugene Lutsenko
It's still not working. The standard tools do not suit me, because the CSV file has about 5000 fields and I copy only a sample of them into a DBF, which can hold a maximum of about 1700 fields. The CSV files can have several million lines. I can't add a record to Inp_data.dbf!

Code:

   oScrn   := DC_WaitOn( L('Filling database "Inp_data.dbf" with data from file: "train.csv"' ))

   CLOSE ALL

   nHandle := DC_txtOpen( 'train.csv' )
   DC_TxtSkip( nHandle, 1 )

   USE Inp_data EXCLUSIVE NEW
   SELECT Inp_data

   DO WHILE !DC_TxtEOF( nHandle )                   // Start of loop over lines

      mLine = DC_TxtLine( nHandle )                 // Extract a line from the text file

      aFieldVol := {}
      mNFields = NUMTOKEN(mLine,",")
*     FOR j=1 TO mNFields
      FOR j=1 TO 1500
          mWord = ALLTRIM(TOKEN(mLine,",",j))
          AADD(aFieldVol, mWord)
      NEXT

*     SELECT Inp_data
*     APPEND BLANK
      Inp_data->(DBAPPEND())
      FOR j=1 TO LEN(aFieldVol)
*         FIELDPUT(j, IF(j=1,aFieldVol[j],VAL(aFieldVol[j])))
          mFN = 'N'+ALLTRIM(STR(j))
          REPLACE Inp_data->&mFN WITH IF(j=1,aFieldVol[j],VAL(aFieldVol[j]))
      NEXT

      DC_TxtSkip( nHandle, 1 )
   ENDDO
   DC_TxtClose( nHandle )

   DC_Impl(oScrn)
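
A note on the loop above, offered as a guess rather than a diagnosis: it builds the field names N1, N2, ... for every column, while the DbCreate() structure posted later in this thread names the first two fields from aFieldName[1] and aFieldName[2]. If Inp_data.dbf was created that way, N1 and N2 would not exist. Writing by field position avoids depending on the names. The sketch below assumes the same NumToken()/Token() functions used in the post; PutCsvLine() and nMaxFields are names invented for illustration.

```xbase
// Sketch only, not tested against the real files.
// NumToken() and Token() are the tokenizer functions already used in the
// post above; PutCsvLine() and nMaxFields are made up for this sketch.
PROCEDURE PutCsvLine( cLine, nMaxFields )
   LOCAL j, cWord
   LOCAL nCount := Min( NumToken( cLine, "," ), nMaxFields )

   // Never address a field position the work area does not have.
   nCount := Min( nCount, FCount() )

   DbAppend()
   FOR j := 1 TO nCount
      cWord := AllTrim( Token( cLine, ",", j ) )
      // Field 1 is the character ID, all other fields are numeric.
      FieldPut( j, IIf( j == 1, cWord, Val( cWord ) ) )
   NEXT
RETURN
```

With Inp_data selected, a call such as PutCsvLine( mLine, 1500 ) would stand in for the two FOR loops inside the DO WHILE.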

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 6:44 am
by rdonnay
If you can give me your CSV file and an empty DBF file, I will see what I can do.

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 7:29 am
by Eugene Lutsenko
rdonnay wrote: If you can give me your CSV file and an empty DBF file, I will see what I can do.
Greetings, Roger!
These are the "train.csv" and "test.csv" files, which can be downloaded here:
https://www.kaggle.com/c/santander-valu ... lenge/data
The DBF file has the same fields, 4993 in total. However, it is probably impossible to create a single DBF file with that many fields. The most I have reliably managed is 1,500 fields. So creating several DBF files of 1000 fields each, linked one-to-one, is probably how to do this. In the DBF file, all fields except the first are numeric with 1 decimal place. The first field is a text field of 30 characters.

Code:

   aStructure := { { aFieldName[1], "C", 30, 0 },;      // ID
                   { aFieldName[2], "N", 15, 1 } }      // TARGET

*  FOR j=3 TO LEN(aFieldName)-2
   FOR j=3 TO 1500
       mFN = 'N'+ALLTRIM(STR(j))
       AADD(aStructure, { mFN, "N", 15, 1 })
*      AADD(aStructure, { aFieldName[j], "N", 15, 1 })
   NEXT
   DbCreate( 'Inp_data.dbf', aStructure )
Is it perhaps possible to use some other database format (not DBF) that has no such hard limit on the number of fields?
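
The plan mentioned in this post, several DBF files of about 1000 fields each, could be sketched roughly as follows. This is only an illustration under assumptions: the file names Inp_1.dbf, Inp_2.dbf, ..., the procedure name CreateParts(), and the choice to repeat the 30-character ID in every part so that rows can be matched are all made up here and do not come from the thread.

```xbase
// Sketch only: one DBF per block of nPerFile numeric columns.
// CreateParts(), the Inp_<n>.dbf file names and the repeated ID field
// are illustrative assumptions.
PROCEDURE CreateParts( nTotal, nPerFile )
   LOCAL nPart := 0, nFirst, nLast, j, aStructure

   // Column 1 is the ID, so the numeric columns start at 2.
   FOR nFirst := 2 TO nTotal STEP nPerFile
      nPart++
      nLast := Min( nFirst + nPerFile - 1, nTotal )

      // Every part starts with the ID so that rows can be matched.
      aStructure := { { "ID", "C", 30, 0 } }
      FOR j := nFirst TO nLast
         AAdd( aStructure, { "N" + LTrim( Str( j ) ), "N", 15, 1 } )
      NEXT

      DbCreate( "Inp_" + LTrim( Str( nPart ) ) + ".dbf", aStructure )
   NEXT
RETURN
```

Called as CreateParts( 4993, 1000 ), this would produce five files covering columns 2-1001, 1002-2001, 2002-3001, 3002-4001 and 4002-4993. If every part is filled in the same pass over the CSV, record n in each file belongs to the same CSV line, which gives the one-to-one link.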

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 7:59 am
by Auge_Ohr
Eugene Lutsenko wrote: The DBF file has the same fields, 4993 in total.
DBF is the wrong database for so many fields ...

Nevertheless, I wonder why you have so many fields ... What about using an array and storing it in a memo of type "V" (Var2Bin)?
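
A minimal sketch of that suggestion, assuming a work area with a memo field: Var2Bin() turns the array into a binary string and Bin2Var() restores it. The field name VALS and the names SaveRow()/LoadRow() are hypothetical, and whether the memo format in use stores binary strings unchanged is an assumption that would need checking.

```xbase
// Sketch only. VALS is a made-up memo field name; whether the memo format
// in use stores binary strings unchanged is an assumption to check.
PROCEDURE SaveRow( aValues )
   DbAppend()
   FIELD->VALS := Var2Bin( aValues )     // array -> binary string
RETURN

FUNCTION LoadRow()
RETURN Bin2Var( FIELD->VALS )            // binary string -> array
```

The trade-off is that individual columns can no longer be indexed or filtered by the database engine; each row has to be unpacked before its values can be inspected.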

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 8:06 am
by Eugene Lutsenko
Auge_Ohr wrote:
Eugene Lutsenko wrote: The DBF file has the same fields, 4993 in total.
DBF is the wrong database for so many fields ...

Nevertheless, I wonder why you have so many fields ... What about using an array and storing it in a memo of type "V" (Var2Bin)?
Hi, Jimmy!
There are so many fields because I often solve high-dimensional problems: "big data". An array is not suitable, because there are a lot of observations, millions of them. With that many fields, such a database barely fits in 2 GB, and sometimes does not fit. The largest database I have processed on my computer had 100,000 records by 100,000 fields. It took a little more than half an hour to create and was 239 GB in size. I used my own database format to process such data, as ADS does not support it either.

PS
My colleague has developed a module for parallel processing of information for high-speed synthesis and verification of large-scale models. This module uses graphics cards with an NVIDIA chip for non-graphical computing. But I have not used this module yet. It is in the stage of fine-tuning to the level when you can actually use it.

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 11:47 am
by Auge_Ohr
Eugene Lutsenko wrote: An array is not suitable, because there are a lot of observations, millions of them.
If you have that much data, you should think about reducing/compressing it or changing the data format.

You can use a bitmap where each pixel can hold a value from 0 to 16777215 (2^24 values).
A "4K" bitmap of 2000x2000 has 4,000,000 pixels in a single bitmap and needs less space than a 2000x2000 array.
Eugene Lutsenko wrote:My colleague has developed a module for parallel processing of information for high-speed synthesis and verification of large-scale models. This module uses graphics cards with an NVIDIA chip for non-graphical computing. But I have not used this module yet. It is in the stage of fine-tuning to the level when you can actually use it.
Did he write an interface for Xbase++? :roll:

What about running multiple instances of your app, since modern PCs have more than one CPU?
Of course, your app must be network-capable if the instances share the same database.

Re: Sampling data from CSV to DBF

Posted: Sun Jun 24, 2018 8:52 pm
by Eugene Lutsenko
Hi, Jimmy!

Yes, he has written an interface that allows his module to be used from a program written in any language, or even manually.

My system is available online: http://lc.kubagro.ru/aidos/_Aidos-X.htm (use https://translate.yandex.ru/translate)

As for the use of video card memory - this is a good idea. But in this case it will not help. Another good idea is to treat all data of any nature as images in multidimensional space. I use it in my system.