There are already many articles and blog posts that explain how to convert an Excel file to a comma-separated file using C#, and in all the cases I referred to, the Excel file is read from the hard drive of a local machine and the CSV file is saved back to the same hard drive. But in spite of knowing this, I'm going to draft another post.
Wondering why?
Well, this post is going to be slightly different in the way files are read and saved back. Below are the major offerings of this post:
- What if we have many Excel files to convert but the disk does not have enough space to hold all of them? The same goes for the conversion output.
- What if we don't have permission to save the converted files on the local machine?
- How can we run this conversion utility using WebJobs?
Problem Statement
Read Excel files from Azure Blob Storage, convert them to CSV format, and upload them back to Azure Blob Storage. This entire process has to run via a triggered WebJob, so that it can be repeated whenever Excel-to-CSV conversion is required.
Setting up environment
I'm using Visual Studio 2019 v16.4.0 and have an active Azure subscription.
High level steps
- Creating containers in Azure storage
- Reading Excel from Azure storage
- Converting Excel to CSV format
- Uploading CSV to Azure storage
- Creating Azure WebJob
- Triggering Azure WebJob
Creating containers in Azure storage
A container must be created under the Blob service to store all the Excel files which need to be converted to CSV format. There are two ways one can create a container – through the Azure portal, or by using C#. As both are well documented on MSDN, I'm not going to repeat the entire procedure. For detailed steps on how to create a container, please refer to the references section placed at the end of this article.
For our exercise, I've created two containers named excelcontainer and csvcontainer under one storage account, where:
excelcontainer – holds the Excel files which are to be converted to CSV
csvcontainer – holds the converted CSV files
Below is a screenshot of my excelcontainer, which holds 3 Excel workbooks:
Reading Excel from Azure storage
Now that we have excelcontainer ready with uploaded files, it's time to read the data from all those files, and here is the code to do that:
public async Task<List<BlobOutput>> Download(string containerName)
{
    var downloadedData = new List<BlobOutput>();
    try
    {
        // Create a container client for the given container
        BlobContainerClient blobContainerClient =
            _blobServiceClient.GetBlobContainerClient(containerName);

        // List all blobs in the container
        await foreach (BlobItem item in blobContainerClient.GetBlobsAsync())
        {
            // Download each blob's contents into a stream
            BlobClient blobClient = blobContainerClient.GetBlobClient(item.Name);
            BlobDownloadInfo downloadedInfo = await blobClient.DownloadAsync();
            downloadedData.Add(new BlobOutput
            {
                BlobName = item.Name,
                BlobContent = downloadedInfo.Content
            });
        }
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace
        throw;
    }
    return downloadedData;
}
Where BlobOutput is the DTO with below members.
public class BlobOutput
{
    public string BlobName { get; set; }
    public Stream BlobContent { get; set; }
}
Converting Excel to CSV format
In the above step, we collected the data from each blob into a stream. So, in this step, we will convert the streamed data into CSV format, and here is the code for that:
public static List<BlobInput> Convert(List<BlobOutput> inputs)
{
    var dataForBlobInput = new List<BlobInput>();
    try
    {
        foreach (BlobOutput item in inputs)
        {
            using (SpreadsheetDocument document =
                SpreadsheetDocument.Open(item.BlobContent, false))
            {
                foreach (Sheet _Sheet in
                    document.WorkbookPart.Workbook.Descendants<Sheet>())
                {
                    WorksheetPart _WorksheetPart =
                        (WorksheetPart)document.WorkbookPart.GetPartById(_Sheet.Id);
                    Worksheet _Worksheet = _WorksheetPart.Worksheet;

                    SharedStringTablePart _SharedStringTablePart =
                        document.WorkbookPart
                            .GetPartsOfType<SharedStringTablePart>().First();
                    SharedStringItem[] _SharedStringItem =
                        _SharedStringTablePart.SharedStringTable
                            .Elements<SharedStringItem>().ToArray();

                    StringBuilder stringBuilder = new StringBuilder();
                    foreach (var row in _Worksheet.Descendants<Row>())
                    {
                        foreach (Cell _Cell in row)
                        {
                            string Value = string.Empty;
                            if (_Cell.CellValue != null)
                            {
                                // Shared strings are stored in a lookup table;
                                // resolve them by index
                                if (_Cell.DataType != null &&
                                    _Cell.DataType.Value == CellValues.SharedString)
                                    Value = _SharedStringItem[
                                        int.Parse(_Cell.CellValue.Text)].InnerText;
                                else
                                    Value = _Cell.CellValue.Text;
                            }
                            stringBuilder.Append(string.Format("{0},", Value.Trim()));
                        }
                        stringBuilder.Append("\n");
                    }

                    byte[] data = Encoding.UTF8.GetBytes(
                        stringBuilder.ToString().Trim());
                    string fileNameWithoutExtn = item.BlobName.Substring(
                        0, item.BlobName.IndexOf("."));
                    string newFilename = $"{fileNameWithoutExtn}_{_Sheet.Name}.csv";
                    dataForBlobInput.Add(new BlobInput
                    {
                        BlobName = newFilename,
                        BlobContent = data
                    });
                }
            }
        }
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace
        throw;
    }
    return dataForBlobInput;
}
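One caveat with the comma-join above: cell values that themselves contain commas, quotes, or newlines would corrupt the resulting CSV. A small helper like the following (a sketch, not part of the original code) could be used in place of the plain string.Format call:

```csharp
using System;

static class CsvUtil
{
    // Quote a field if it contains a comma, quote, or newline (RFC 4180 style);
    // embedded quotes are doubled.
    public static string Escape(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
            return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(CsvUtil.Escape("plain"));      // plain
        Console.WriteLine(CsvUtil.Escape("a,b"));        // "a,b"
        Console.WriteLine(CsvUtil.Escape("say \"hi\"")); // "say ""hi"""
    }
}
```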
where BlobInput is the DTO with below members:
public class BlobInput
{
    public string BlobName { get; set; }
    public byte[] BlobContent { get; set; }
}
If a workbook contains multiple sheets, a separate CSV will be created for each sheet, with the file name format <ExcelFileName>_<SheetName>.csv.
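A side note on that naming: the Substring/IndexOf(".") trick in Convert picks the first dot, so a blob named Sales.Report.xlsx would be truncated to just "Sales". Path.GetFileNameWithoutExtension is a safer way to build the <ExcelFileName>_<SheetName>.csv name (a small standalone sketch; the file and sheet names are illustrative):

```csharp
using System;
using System.IO;

class CsvNameDemo
{
    // Strip only the final extension, then append the sheet name
    public static string CsvName(string blobName, string sheetName) =>
        $"{Path.GetFileNameWithoutExtension(blobName)}_{sheetName}.csv";

    static void Main()
    {
        Console.WriteLine(CsvName("EmpData.xlsx", "Sheet1"));    // EmpData_Sheet1.csv
        Console.WriteLine(CsvName("Sales.Report.xlsx", "FY20")); // Sales.Report_FY20.csv
    }
}
```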
Uploading CSV to Azure storage
Once the data is converted to CSV, we are good to go for uploading the CSV files back to a container, and here is the code to perform this:
public async Task Upload(string containerName, List<BlobInput> inputs)
{
    try
    {
        // Create a container client for the target container
        BlobContainerClient blobContainerClient =
            _blobServiceClient.GetBlobContainerClient(containerName);

        foreach (BlobInput item in inputs)
        {
            // Get a reference to the blob and upload the converted bytes
            BlobClient blobClient =
                blobContainerClient.GetBlobClient(item.BlobName);
            using (var ms = new MemoryStream(item.BlobContent))
            {
                await blobClient.UploadAsync(ms, overwrite: true);
            }
        }
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace
        throw;
    }
}
So far, we have read the Excel files from a container, converted them to CSV format, and uploaded the results back to another container. All good. The next task is to automate this using a triggered WebJob.
Creating Azure WebJob
A WebJob can be created using Visual Studio by right-clicking on the project and selecting Publish…
Apart from this, there are many other ways to create a triggered WebJob, and all of them are mentioned on MSDN.
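By default, a triggered WebJob runs only when you trigger it (from the portal or its endpoint). If you also want it to fire on a schedule, a settings.job file with a CRON expression can be deployed next to the WebJob executable; the schedule below (every day at 8 AM) is just an example:

```json
{
  "schedule": "0 0 8 * * *"
}
```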
Triggering Azure WebJob
If everything is set up correctly, you will be able to see the below screen on your Azure portal.
As this is a triggered WebJob, clicking the Run button will trigger the job and create output as shown below:
Takeaway
Using Azure storage and a WebJob, we have converted files from one format to another without utilizing local disk space to save files during the entire conversion process.