Read data to databases from MS Excel-2010 files

This forum is for eXpress++ general support.
Message
Author
User avatar
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar
Contact:

Re: Read data to databases from MS Excel-2010 files

#11 Post by Eugene Lutsenko »

While here that turns out:

Code: Select all

/*
This test program will export Excel-Sheet to Work Area DBF
*/
#INCLUDE "inkey.ch"
#INCLUDE "dcdir.ch"
#INCLUDE "appevent.ch"
#INCLUDE "xbp.ch"
#INCLUDE "dcprint.ch"
#INCLUDE "dcmsg.ch"
#INCLUDE "dll.ch"
#include "dccursor.ch"
#Include "thread.ch"
#INCLUDE "class.CH"
#INCLUDE "dmlb.CH"
#INCLUDE "fileio.CH"
#INCLUDE "dctree.CH"
#INCLUDE "dcicon.CH"
#INCLUDE "dcdialog.CH"
#INCLUDE "outlook.CH"

#pragma Library("ascom10.lib")
#pragma library( "dclip1.lib" )
#pragma library( "dclip2.lib" )
#pragma library( "dclipx.lib" )
#pragma library( "xbtbase1.lib" )
#pragma library( "xbtbase2.lib" )
#pragma library( "xppui2.lib" )

FUNCTION Main()

 SET COLLATION TO SYSTEM   // Руссификация
*SET COLLATION TO ASCII    // Руссификация

PUBLIC cExcelFile := 'Inp_data'
PUBLIC cDbaseFile := cExcelFile

MsgBox('Преобразование Excel-файла в DBF-файл"')

IF LEN(ALLTRIM(cExcelFile)) = 0
   MsgBox('Задайте в качестве параметра имя конвертируемого Excel-файла без расширения: "XLS-DBF FileName"')
   RETURN NIL
ENDIF

***** В будущем сделать определение по расширению какой Excel: 2003 или 2010
***** Проверить, как будет работать для 2010
CLOSE ALL
cExcelFile = DC_CurPath() + '\' + ALLTRIM(cExcelFile) + '.xlsx'
LC_Excel2WorkArea( cExcelFile )
 
RETURN nil

* ---------------
PROC appsys ; RETURN
* -------------

FUNCTION LC_Excel2WorkArea( cExcelFile )

LOCAL lStatus := .f., oExcel, cPath, oSheet, oBook, aValues, i, j, ;
      aStru, xValue
LOCAL oProgressm, oDialogm, lOk

#if XPPVER > 1900000
  // Create the "Excel.Application" object
  oExcel := CreateObject("Excel.Application")
  IF Empty( oExcel )
    DC_WinAlert( "Excel is not installed" )
    RETURN .f.
  ENDIF
#else
  DC_WinAlert('This feature is available in Xbase++ 1.9 and later only!')
  RETURN .f.
#endif

oExcel:Visible := .f.

// Load a Workbook from an .XLS file
// Get path from ini file

IF !File(cExcelFile)
  DC_WinAlert( 'File does not exist:' + Chr(13) + cExcelFile )
  RETURN lStatus
ENDIF

oScrn := DC_WaitOn( 'Открытие XLS-файла' )
oBook   := oExcel:Workbooks:Open(cExcelFile)
oSheet  := oBook:activeSheet
aValues := oBook:workSheets(1):usedRange:value      // Загрузка 1-го листа Excel-книги
DC_Impl(oScrn)

*DC_DebugQout( { aValues[1], aValues[2], aValues[3] } )
*DC_DebugQout( { aValues } )
*MsgBox('Stop')

********* Сформировать файл Inp_name.txt            // Наименования классификационных и описательных шкал
aInp_name := aValues[1]                             // Имена колонок 1-го листа Excel-книги

N_Rec = LEN(aValues)                                // Количество строк   с данными в xls-файле
N_Col = LEN(aInp_name)                              // Количество колонок с данными в xls-файле
*MsgBox(STR(N_rec,15))

CrLf  = CHR(13)+CHR(10)                             // Конец строки (абзаца) (CrLf)
mCol_name = ""
FOR j=1 TO N_Col
    mNameJ = ALLTRIM(aInp_name[j])
    mNameJ = UPPER(SUBSTR(mNameJ,1,1)) + SUBSTR(mNameJ,2)   // Сделать первые символы заголовков колонок большими, а остальные оставить как есть
    mCol_name = mCol_name + mNameJ + CrLf
NEXT
StrFile(mCol_name, 'Inp_name.txt')


// Создание пустой dbf-таблицы для записи в нее данных из 1-го листа xls-файла ******************
// Определение структуры для dbf-таблицы по значениям всех строк (тип и формат данных в колонках)
// На основе просмотра ВСЕГО xls-файла сделать массивы описания полей и для создания структуры брать MAX из каждого массива
// В dbf-таблицу переносить данные, начиная со 2-й строки xls-листа
// 1-ю строку с наименованиями полей использовать вместо Inp_name.txt или чтобы автоматически его делать

aStructure := {}                      // Массив структуры dbf-таблицы

PRIVATE aFieldName[N_col]             // Наименования колонок для базы данных Inp_data.dbf
PRIVATE aFieldType[N_Col]             // Тип значений в j-й колонке (если есть хотя бы одно текстовое, то текстовое)
PRIVATE aFieldSize[N_Col]             // Максимальный размер значения в j-й колонке в символах
PRIVATE aFieldInt [N_Col]             // Максимальное количество значащих цифр до    десятичной точки в значениях j-й колонки
PRIVATE aFieldDeci[N_Col]             // Максимальное количество значащих цифр после десятичной точки в значениях j-й колонки
PRIVATE aFlag_neg [N_Col]             // Флаг наличия отрицательных значений в j-й колонке

AFILL(aFieldType, "X" )
AFILL(aFieldSize,  1  )
AFILL(aFieldInt ,  1  )
AFILL(aFieldDeci,  0  )
AFILL(aFlag_neg , .F. )

FOR j=1 TO N_Col
    aFieldName[j] = "N"+ALLTRIM(STR(j, 15))
NEXT

***** Отображение стадии исполнения в кратком варианте *****************************************
nMax  = N_Col + N_Rec - 1
nTime = 0
@ 4,5 DCPROGRESS oProgressm SIZE 70,1.1 MAXCOUNT nMax COLOR GRA_CLR_CYAN PERCENT EVERY 100
DCREAD GUI TITLE 'Идет преобразование XLS-файла в базу данных !!!' PARENT @oDialogm FIT EXIT
oDialogm:show()
DC_GetProgress(oProgressm,0,nMax)
************************************************************************************************


FOR j=1 TO N_Col                      // Цикл по колонкам

    FOR i=2 TO N_Rec                  // Цикл по строкам

        mFieldType = VALTYPE(aValues[i,j])

        DO CASE

           CASE mFieldType = "C"      // **************************************************************************
  
                aFieldType[j] = "C"   // Тип значений в j-й колонке (если есть хотя бы одно текстовое, то текстовое)
                aFieldSize[j] = MAX(aFieldSize[j], LEN(ALLTRIM(aValues[i,j])))
                aFieldDeci[j] = 0

           CASE mFieldType = "N"      // **************************************************************************

                IF aValues[i,j] < 0
                   aFlag_neg[j] = .T.
                ENDIF

                IF aFieldType[j] <> "C"
                   aFieldType[j] = mFieldType
                ENDIF

                mVal = DC_XtoC(aValues[i,j])                                      
                REMRIGHT(mVal,"0")                                                // Убрать подряд идущие нули справа до 1-й значащей цифры
                Pos  = AT('.', MVal)

                DO CASE
                   CASE Pos = 0               // Целое число

                        aFieldInt [j] = MAX(aFieldInt [j], LEN(ALLTRIM(mVal)))    // Определить число значащих цифр ДО запятой без ведущих нулей
                        aFieldDeci[j] = 0
                        // Если отрицательное, то символ на знак
                        aFieldSize[j] = MAX(aFieldSize[j], aFieldInt[j] + IF(aFlag_neg[j],1,0))

                   CASE Pos > 0               // Число с дробной частью

                        aFieldInt [j] = Pos - 1                                   // Определить число значащих цифр ДО запятой без ведущих нулей
                        mFieldDeci = LEN(ALLTRIM(mVal)) - aFieldInt [j] - 1
                        aFieldDeci[j] = MAX(aFieldDeci[j], mFieldDeci) 
                        // Если с дробной частью, то плюс символ на точку
                        // Если отрицательное, то символ на знак
                        aFieldSize[j] = MAX(aFieldSize[j], aFieldInt[j] + aFieldDeci[j] + 1 + IF(aFlag_neg[j],1,0))

                ENDCASE

           CASE mFieldType = "D"      // **************************************************************************

                aFieldType[j] = "C"
                aFieldSize[j] =  8
                aFieldDeci[j] =  0

        ENDCASE
    NEXT

    AADD(aStructure, { aFieldName[j] , aFieldType[j],  aFieldSize[j], aFieldDeci[j] })  // Добавить строку в структуру dbf-таблицы
    DC_GetProgress(oProgressm, ++nTime, nMax)

NEXT

*DC_DebugQout( { aStructure } )
*MsgBox('Stop')

cDbaseName = cDbaseFile
cDbaseFile = cDbaseFile + '.dbf'
DbCreate( cDbaseFile , aStructure )

***** Цикл переноса данных из xls-листа в dbf-таблицу

CLOSE ALL
cDbaseName = SUBSTR(cDbaseFile, 1, AT('.',cDbaseFile)-1)
USE (cDbaseName) EXCLUSIVE NEW;ZAP
SELECT (cDbaseName)

FOR i=2 TO N_Rec

    DBAPPEND()

    FOR j=1 TO N_Col

        // Преобразовывать значения с типами числа и даты в текст, если тип поля текст
        DO CASE

           CASE VALTYPE(aValues[i,j]) = 'C' .AND. aFieldType[j] = 'C'

                FIELDPUT(j, aValues[i,j])                                       // Просто занести

           CASE VALTYPE(aValues[i,j]) = 'N' .AND. aFieldType[j] = 'N'

                FIELDPUT(j, aValues[i,j])                                       // Просто занести

           CASE VALTYPE(aValues[i,j]) = 'N' .AND. aFieldType[j] = 'C'

                FIELDPUT(j, STR(aValues[i,j], aFieldSize[j], aFieldDeci[j] ))   // Преобразовать число к текстовому виду

           CASE VALTYPE(aValues[i,j]) = 'D' .AND. aFieldType[j] = 'C'

                FIELDPUT(j, DTOC(aValues[i,j]))                                 // Преобразовать дату к текстовому виду

        ENDCASE        

    NEXT

    DC_GetProgress(oProgressm, ++nTime, nMax)

NEXT

*    IF Valtype(xValue) == 'N' .AND. aStru[j,2] == 'C'
 *     xValue := DC_XtoC(xValue)
  *  ELSEIF aStru[j,2] == 'D' .AND. Valtype(xValue) == 'N'
   *   xValue := Str(xValue)
    *  xValue := StoD(xValue)
*    ELSEIF aStru[j,2] == 'D' .AND. Valtype(xValue) == 'C'
 *     xValue := Ctod(xValue)
  *  ELSEIF aStru[j,2] == 'C' .AND. Valtype(xValue) == 'D'
   *   xValue := DtoS(xValue)
    *ENDIF
*    MsgBox(DC_XtoC(000123.12345000))         // Надо убрать подряд идущие нули справа до 1-й значащей цифры

DC_GetProgress(oProgressm,nMax,nMax)
oDialogm:Destroy()

oBook:close()
oBook:destroy()

// Quit Excel
oExcel:Quit()
oExcel:Destroy()

RETURN lStatus

* -------------
To me it is unclear why numerical values of the xls-file are loaded always with two signs after a comma whereas them there can be a different quantity.

Can be to eat possibility of access to information on formats and types of data of the xls-file not to learn these parameters by the analysis of directly values. It would be very good since values can be much and it can take noticeable time.

Still I didn't lick into shape definition of formats of numerical values with fractional part.

In the future I want to make option of this function which can be started from a command line with parameter - the name xls or the xlsx-file.

For some reason doesn't write in headings of windows in Russian.

User avatar
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar
Contact:

Re: Read data to databases from MS Excel-2010 files

#12 Post by Eugene Lutsenko »

Files both from Excel-2003 and from Excel-2007 (2010) are equally well loaded. But only rather not so big (I checked till 65536 lines, everything works normally). At 881904 lines in the Excel-file when loading there is a mistake "There is no memory". Thus on the computer of 24 Gb of random access memory and 3 Tb of external memory. And after all in opportunity to use big files one of the main advantages of Excel-2010.

In this regard there is a question: whether it is possible to load somehow big Excel-files in parts (can be even in the lines). Can be to eat and any other option. Still there are the questions formulated in the last message.

The program with the executable module and a source text, and also the test big Excel-file are provided here:

http://lc.kubagro.ru/Dima/XLS-DBF.rar

skiman
Posts: 1199
Joined: Thu Jan 28, 2010 1:22 am
Location: Sijsele, Belgium
Contact:

Re: Read data to databases from MS Excel-2010 files

#13 Post by skiman »

Hi,

I'm reading xls files by 500 rows.

Code: Select all

....
for y = 1 to nMaxLines step 500
    if nMaxLines - y < 500
	nNumBlock := nMaxlines - y
    endif
    aData :=  oSheet:Range("A"+alltrim(str(y))+":"+cLastField+alltrim(str(y+nNUmBlock)) ):Value 
    .....
next
Best regards,

Chris.
www.aboservice.be

User avatar
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar
Contact:

Re: Read data to databases from MS Excel-2010 files

#14 Post by Eugene Lutsenko »

Thanks! I understood that something similar would have to be, but didn't know how to realize it. And how to learn value of the nMaxLines parameter, that is the actual quantity of the lines of an Excel-leaf filled with values? In this fragment of the program to me a lot of things are unclear. You couldn't explain, from where it is possible to learn values of variables entering into it, for example y. How to define, what Excel is installed on the computer: 2003 or 2007-2010? It is too much unknown to me functions in your fragment of the program. In particular that it for function: Range where it is described? It seems as in it text names of columns of an Excel-leaf ("A") are formed? Though its general sense is clear, but for me is problematic to use it specifically in practice. You couldn't provide the small really working program which is reading out the Excel-file in the massif on groups of lines from beginning to end. Very probably on VBA: http://www.rondebruin.nl/find.htm

skiman
Posts: 1199
Joined: Thu Jan 28, 2010 1:22 am
Location: Sijsele, Belgium
Contact:

Re: Read data to databases from MS Excel-2010 files

#15 Post by skiman »

Hi,

There is not very much to explain.

aData := oSheet:Range("A1:C5"):Value

This gives an array with the contents from A1 to C5. This way you get 5 rows.

Once you defined the maximum number of rows and colums, you can read them the way you want. In my sample I read it by 500 rows a time. Much faster that working row by row. Never had a memory problem with it.
Best regards,

Chris.
www.aboservice.be

User avatar
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar
Contact:

Re: Read data to databases from MS Excel-2010 files

#16 Post by Eugene Lutsenko »

Thanks!
And how to define quantity of rows (how to define quantity of columns to me clearly)

aData: = oSheet:Range ("A1:C5"):Value

skiman
Posts: 1199
Joined: Thu Jan 28, 2010 1:22 am
Location: Sijsele, Belgium
Contact:

Re: Read data to databases from MS Excel-2010 files

#17 Post by skiman »

Hi,

I don't understand your question.

Number of rows is as the n_row in your sample, columns as your n_col.
Best regards,

Chris.
www.aboservice.be

User avatar
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar
Contact:

Re: Read data to databases from MS Excel-2010 files

#18 Post by Eugene Lutsenko »

skiman wrote:Hi,

I don't understand your question.

Number of rows is as the n_row in your sample, columns as your n_col.
The problem in that to learn these N_Row and N_Col that way which is used in the program given above, it is necessary to load ALL Excel-leaf into memory in the form of the massif, and at a large number of lines just and arises an error of shortage of memory. Therefore I also became interested in possibility of loading of a leaf in parts. But for this purpose it is necessary to know how to learn quantity of columns and lines before leaf loading as a whole. It is possible to place, of course, information on dimension of a leaf in the first cell or to enter these "A1:C5" in dialogue, but it everything not absolutely that would be desirable. But probably initially will approach.

Code: Select all

oScrn := DC_WaitOn( 'Открытие XLS-файла' )
oBook   := oExcel:Workbooks:Open(cExcelFile)
oSheet  := oBook:activeSheet
aValues := oBook:workSheets(1):usedRange:value      // The error of shortage of memory arises on this line
DC_Impl(oScrn)

********* Сформировать файл Inp_name.txt            
// Наименования классификационных и описательных шкал
aInp_name := aValues[1]                             // Имена колонок 1-го листа Excel-книги

N_Rec = LEN(aValues)                                // Количество строк   с данными в xls-файле
N_Col = LEN(aInp_name)  

skiman
Posts: 1199
Joined: Thu Jan 28, 2010 1:22 am
Location: Sijsele, Belgium
Contact:

Re: Read data to databases from MS Excel-2010 files

#19 Post by skiman »

Hi,

This should help:
...
oUsedRange := oSheet:usedrange
nMaxLines := oUsedrange:rows:count()

There is also oUsedrange:Columns.Count
Best regards,

Chris.
www.aboservice.be

User avatar
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar
Contact:

Re: Read data to databases from MS Excel-2010 files

#20 Post by Eugene Lutsenko »

Similar it that is necessary. Thank you very much!

Post Reply