DNACloud: A Tool for Storing Big Data on DNA
Shalin Shah, Dixita Limbachiya and Manish K. Gupta
arXiv:1310.6992v2 [cs.ET] 16 May 2014
Laboratory of Natural Information Processing
Dhirubhai Ambani Institute of Information and Communication Technology
Gandhinagar, Gujarat, 382007 India
Abstract—The term Big Data is usually used to describe the huge amount of data generated by humans from digital media such as cameras, the internet, phones and sensors. By building advanced analytics on top of big data, one can predict many things about a user, such as behavior and interests. However, before the data can be used, many issues of big data storage have to be addressed. Two main issues are the need for large storage devices and the cost associated with them. Synthetic DNA storage seems to be an appropriate solution to these issues. Recently, in 2013, Goldman and his colleagues from the European Bioinformatics Institute demonstrated the use of DNA as a storage medium with a capacity of 1 petabyte of information per gram of DNA, and retrieved the data successfully with a low error rate [1]. This significant step shows promise for synthetic DNA as a useful technology for future data storage. Motivated by this, we have developed software called DNACloud, which makes it easy to store data on DNA. In this work, we present a detailed description of the software.
Keywords—DNA storage, Biostorage, DNA Computing, DNA codes, Huffman Coding, Software, Open source, DNA hard disk, Error correction, Synthetic DNA, Organic data storage.
I. INTRODUCTION
Storage has always been a fundamental human requirement. In the modern era of computing and communication, huge amounts of data are being generated, and there is a pressing need for a dense, cost-effective storage medium. Table I shows typical amounts of data generated and the kind of storage device required to store them. It is predicted that by 2015 the amount of data generated by the NSA (National Security Agency) will be so large that it may need 1000 billion terabytes of hard disk space, worth $1,000 trillion [2]. At present, the world produces 1 exabyte of data per day, and soon the devices, machines and sensors of the Internet of Things (IoT) will generate data on the order of brontobytes, where 1 brontobyte is 10^27 bytes [3]; such volumes require a dense storage medium. For the past 30 years, DNA, the blueprint of life, has been used as a storage medium. Unlike existing storage devices, DNA requires no maintenance and can be stored without electricity in a cold, dark place.
One early venture that used DNA as an artistic material, converting a graphic image into the language of the genetic code, was initiated by Joe Davis in the work Microvenus [4]. In 1999, Kac created a synthetic gene by translating a sentence from the biblical book of Genesis into Morse code and converting the Morse code into DNA base pairs according to a conversion principle [5]. In the years that followed, many researchers translated English text, mathematical equations [6], Latin text [7] and simple musical notations [8] into DNA using different DNA coding principles [9] [10] [11]. All of these efforts were successful on a small scale and gave birth to the idea of data storage on DNA. The most significant work, however, was done in 2012 by Church et al. [12] of Harvard University. They successfully encoded the entire book Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves [13], including 53,426 words, 11 JPG images and a JavaScript program, into DNA using a 1-bit-per-base encoding. The main drawback of their method was its high error rate [12]. In the following year, 2013, this limitation was overcome by Goldman and his group, who implemented a modified approach that includes error correction and scaled up DNA-based data storage [1]. Based on this method [1], in this work we present DNACloud, software that converts a data file to DNA sequences and vice versa. The reader is referred to the excellent short reviews of synthetic DNA storage [14], [15] for an overview of this new area.
This paper is organized as follows. Section II describes the algorithms used for encoding data into DNA and decoding it. Section III gives an overview of the Graphical User Interface (GUI). Section IV describes the GUI in detail, while Section V remarks on the limitations and assumptions of the software. Section VI concludes with challenges in the area of synthetic DNA hard drives, and the last section provides a link for downloading the software and related material.
II. ALGORITHMS FOR ENCODING AND DECODING DATA FILES
While implementing the methods of [1], we modified the algorithms slightly to make them memory efficient. For encoding, Algorithm 1 generates a DNA string from the given data file, and Algorithm 2 divides it into DNA chunks of length 117. For decoding, Algorithm 3 takes the DNA file containing DNA chunks of length 117 and produces a DNA string, which Algorithm 4 then decodes to recover the original data file. To describe these algorithms, we define the term index info and give remarks for Algorithms 3 and 4.
Definition 1. (Index Info): Index info is a base-3 string of length 15 with the format (ID : chunk number : parity of the chunk), where the ID has length 2, the chunk number has length 12 and the parity has length 1 [1]. In addition, every chunk is later prepended with 'A' or 'T' and appended with 'G' or 'C'. Together with the 100-base data segment produced in Algorithm 2, this gives chunks of 1 + 100 + 15 + 1 = 117 bases.
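Definition 1 fixes the layout of every chunk. The minimal Python sketch below builds the 15-trit index info and adds the two flanking bases. The parity rule (sum modulo 3 of the odd-positioned trits of ID and chunk number) is our reading of [1], not something this paper states, and the rule for choosing each flank base (never repeat the neighbouring base) is an assumption; all function names are hypothetical.

```python
# Minimal sketch of Definition 1 (index info) and the 117-base chunk layout.
# Assumptions, not taken from DNACloud's source: the parity trit is the sum
# mod 3 of the odd-positioned trits of ID:chunk number (our reading of [1]),
# and each flank base is chosen so that it differs from its neighbouring base.

ID_TRITS, NUMBER_TRITS = 2, 12
DATA_LEN, INDEX_LEN = 100, ID_TRITS + NUMBER_TRITS + 1   # 100 and 15


def to_trits(value: int, width: int) -> list[int]:
    """Big-endian base-3 digits of value, left-padded with zeros to width."""
    if not 0 <= value < 3 ** width:
        raise ValueError(f"{value} does not fit in {width} trits")
    digits = []
    for _ in range(width):
        value, r = divmod(value, 3)
        digits.append(r)
    return digits[::-1]


def index_info(file_id: int, chunk_no: int) -> list[int]:
    """15 trits: ID (2) : chunk number (12) : parity (1)."""
    ix = to_trits(file_id, ID_TRITS) + to_trits(chunk_no, NUMBER_TRITS)
    parity = sum(ix[0::2]) % 3          # trits at positions 1, 3, ..., 13
    return ix + [parity]


def add_flanks(body: str) -> str:
    """Prepend 'A' or 'T' and append 'C' or 'G', avoiding a repeat of the neighbour."""
    head = "T" if body[0] == "A" else "A"
    tail = "G" if body[-1] == "C" else "C"
    return head + body + tail


# A chunk = flank + 100 data bases + 15 index bases + flank.
assert 1 + DATA_LEN + INDEX_LEN + 1 == 117
```

For example, index_info(1, 5) gives the ID trits 01, the chunk number 000000000012 and a parity trit of 1. The 12-trit chunk number allows up to 3^12 = 531,441 chunks per file under this layout.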
Remark 2. (For * in Algorithm 3) Decoding is not always possible, since the format of the .dnac file is ['chunk1','chunk2',...,'chunkn']. While reading x chunks, it may happen that the last chunk is not read completely; hence we keep removing the last byte from the read string until we reach a ',', before which every chunk is complete and decodable.

TABLE I. HOW BIG IS THE BIG DATA?
Data Unit | Size | How big it is | On what it can be stored [2] | Remarks [16]
Terabyte (TB) | 1000 GB | 200000 photos | 1 TB hard disk | 400 terabytes: National Climatic Data Center (NOAA) database
Petabyte (PB) | 1000 TB | 3 years of EOS data (NASA's Earth Observing System) | 16 Backblaze storage pods racked in two datacenters | 200 petabytes: all printed material
Exabyte (EB) | 1000 PB | 2 exabytes: total volume of information generated in 1999 | A city block of 4-storey datacenters | 5 exabytes: all words ever spoken by humans
Zettabyte (ZB) | 1000 EB | 1.9 zettabytes of information sent through broadcast technology such as TV and GPS [17] | 20 percent of Manhattan, New York | 5 zettabytes: equal to the US NSA's Utah Data Center [17]
Yottabyte (YB) | 1000 ZB | 1 YB: total volume of government data of the NSA (National Security Agency) [18] | States of Delaware and Rhode Island with a million datacenters [18] | 1.3 zettabytes of annual internet traffic in 2016 [19]

Algorithm 1 Algorithm for generating the DNA string
Require: File size and chunk size
Ensure: DNA string for the file
1: if file size < chunk size then
     Do not divide the file into chunks
   else
     Divide the file into chunks, where no. of chunks = file size / chunk size + 1
   end if
2: if no. of chunks > 1 then
     Read the byte string of chunk one
     Convert the string to a list of ASCII values
     Convert the list of ASCII values to a base-3 string
     Convert the base-3 string to DNA bases and store the last base of the DNA string
3:   for chunk number 1 to total number of chunks - 1 do
       Read the byte string of the chunk
       Convert the byte string to a list of ASCII values
       Convert the list of ASCII values to a base-3 string
       Convert the base-3 string to DNA bases using the previous chunk's last base
       Concatenate this new DNA string with the original one
       Store the last base of the DNA string
     end for
4:   Read the byte string of the last chunk
     Convert the byte string to a list of ASCII values
     Convert the list of ASCII values to a base-3 string
     Convert the base-3 string to DNA bases using the previous chunk's last base
     Concatenate the new DNA string with the original one
   else
     if no. of chunks = 1 then
       Read the entire file and convert it directly to ASCII, then to base 3, then to DNA bases (trivial case)
     end if
   end if
5: Convert the length of the final DNA string to base 3 and add leading zeros until its length is 20
6: Insert zeros between the DNA string and the base-3 string from step 5 so that the total string length is divisible by 25
7: Convert the remaining base-3 string to DNA bases

Algorithm 2 Algorithm for generating DNA chunks
Require: DNA string for the file obtained from Algorithm 1
Ensure: DNA chunks of length 117
1: if file size < chunk size then
     Do not divide the file into chunks
   else
     Divide the file into chunks, where no. of chunks = file size / chunk size + 1
   end if
2: if no. of chunks > 1 then
     Read the DNA string of chunk one
     Divide the string into chunks of length 100 and add index info to these chunks
     Store the last 75 DNA bases of the DNA string read
3:   for chunk number 1 to total number of chunks - 1 do
       Read the DNA string of the chunk
       Append the DNA string read to the last stored temporary DNA string of 75 bases
       Again divide the string into chunks of length 100, add index info and store its last 75 DNA bases
       Concatenate the list of chunks to the original list of chunks
     end for
4:   Read the last chunk and append the DNA string read to the last stored temporary DNA string of 75 bases
     Divide the string into chunks of length 100 and add index info to these chunks
     Concatenate the list of chunks to the original list of chunks
   else
     if no. of chunks = 1 then
       Read the entire file, divide the string into chunks of length 100 and add index info to these chunks (trivial case)
     end if
   end if
The entire list obtained is stored in a .dnac file.
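Algorithms 1 and 2 can be condensed into a short Python sketch if the whole file is held in memory, so the file-chunking loops disappear. Two assumptions are ours: a fixed 6-trit code per byte stands in for the Huffman code of [1] (whose 5- and 6-trit codewords are not reproduced here), and the rotating code starts from a virtual preceding base 'A'. The function names are hypothetical.

```python
# Minimal sketch of Algorithms 1 and 2 (whole file in memory, no file chunking).
# Assumption: a fixed 6-trit code per byte stands in for the Huffman code of [1].
# Assumption: the rotating code starts from a virtual preceding base 'A'.

BASES = "ACGT"
LENGTH_TRITS = 20          # step 5: the string length is stored in 20 trits
SEGMENT, STRIDE = 100, 25  # Algorithm 2: 100-base segments, 75 bases shared with the next


def to_trits(value: int, width: int) -> list[int]:
    if not 0 <= value < 3 ** width:
        raise ValueError(f"{value} does not fit in {width} trits")
    digits = []
    for _ in range(width):
        value, r = divmod(value, 3)
        digits.append(r)
    return digits[::-1]


def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base."""
    out = []
    for t in trits:
        prev = BASES[(BASES.index(prev) + t + 1) % 4]
        out.append(prev)
    return "".join(out)


def encode_file(data: bytes) -> str:
    s1 = [t for byte in data for t in to_trits(byte, 6)]   # bytes -> base 3 (stand-in)
    s2 = to_trits(len(s1), LENGTH_TRITS)                    # step 5
    pad = -(len(s1) + LENGTH_TRITS) % STRIDE                # step 6: total length % 25 == 0
    return trits_to_dna(s1 + [0] * pad + s2)                # steps 2-4 and 7


def segments(dna: str) -> list[str]:
    """Overlapping 100-base segments; each base is covered by up to four of them."""
    if len(dna) < SEGMENT:
        raise ValueError("DNA string shorter than one 100-base segment")
    return [dna[i:i + SEGMENT] for i in range(0, len(dna) - SEGMENT + 1, STRIDE)]
```

Calling encode_file(bytes(range(20))) yields a 150-base string (120 data trits, 10 padding zeros and the 20-trit length), which segments() cuts into three overlapping segments. The 20-trit length field also caps the string at 3^20 - 1 = 3,486,784,400 trits, one base per trit; at no more than 6 trits per byte, that corresponds to about 3,486,784,400 / 6 ≈ 581,130,733 bytes. These figures coincide with the decoding and encoding limits reported in Section V, which suggests, although the text does not say so, that the length field is what sets those limits.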
Remark 3. (For ** in Algorithm 4) Here, too, decoding is not always possible, since Huffman codewords have length either 5 or 6. So we keep removing the last base from the read string and retry decoding until it succeeds.
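Remark 3 and the 'prepend string' of Algorithms 3 and 4 deal with codewords that straddle a chunk boundary. DNACloud does this by repeatedly removing the last base and retrying; the toy sketch below should reach the same split with a single greedy pass over a prefix-free code, returning the unfinished tail so that it can be prepended to the next chunk. TOY_CODE and decode_chunk are hypothetical and use 2- and 3-trit codewords for readability.

```python
# Toy illustration of Remark 3: decoding one chunk of a variable-length
# prefix code and carrying an incomplete trailing codeword into the next chunk.
# TOY_CODE is hypothetical (2- and 3-trit codewords); DNACloud's Huffman code
# uses 5- and 6-trit codewords instead.

TOY_CODE = {"00": "a", "01": "b", "02": "c", "10": "d", "11": "e",
            "120": "f", "121": "g", "122": "h", "20": "i", "21": "j", "22": "k"}


def decode_chunk(trits: str, carry: str = "") -> tuple[str, str]:
    """Decode carry + trits; return (decoded symbols, unfinished codeword)."""
    out, word = [], ""
    for t in carry + trits:
        word += t
        if word in TOY_CODE:          # prefix-free, so the first match is final
            out.append(TOY_CODE[word])
            word = ""
    return "".join(out), word


# "fab" is 120|00|01; split it across two chunks at an arbitrary point.
text, carry = decode_chunk("1200")
more, carry = decode_chunk("001", carry)
print(text + more, repr(carry))  # prints: fab ''
```

Because the code is prefix-free, the first codeword that matches is always the right one, so no backtracking is needed; the trailing "0" of the first chunk is simply held back until the next chunk completes it.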
Algorithm 3 Algorithm for regenerating the DNA string from DNA chunks
Require: .dnac file containing DNA chunks of length 117 obtained from Algorithm 2
Ensure: DNA string for the chunks
1: if file size < chunk size then
     Do not divide the file into chunks
   else
     Divide the file into chunks, where no. of chunks = file size / chunk size + 1
   end if
2: if no. of chunks > 1 then
     Decode the chunks read into the corresponding DNA string if possible*
     while not decoded do
       Remove the last base from the buffer of the .dnac file and try decoding again
       Store the removed base in the prepend string, keeping the original order
     end while
3:   for chunk number 1 to total number of chunks - 1 do
       Prepend the last stored string to the buffer read if the prepend string is not null
       while not decoded do
         Remove the last base from the buffer of the .dnac file and try decoding again
         Store the removed base in the prepend string, keeping the original order
       end while
       Write the DNA string obtained to the original
     end for
4:   Prepend the last stored string to the buffer read if the prepend string is not null, decode it and append it to the original DNA string
   else
     if no. of chunks = 1 then
       Trivial case: read the entire file at once and convert it to a DNA string
     end if
   end if
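Algorithm 3 and Remark 2 can be sketched in Python as two steps: pulling only complete quoted chunks out of a partially read .dnac buffer, and rebuilding the DNA string from the overlapping segments. The chunk layout (flank + 100 data bases + 15 index bases + flank), the assumption that chunks are already in chunk-number order, and the use of a per-position majority vote to exploit the fourfold overlap are all ours; the text only says that a deleted chunk or base "can be regenerated by reading that overlapped code sequence". The names are hypothetical.

```python
# Minimal sketch of Algorithm 3 and Remark 2. Assumptions (hypothetical, not
# DNACloud's code): each 117-base oligo is flank + 100 data bases + 15 index
# bases + flank, the oligos are already ordered by chunk number, and the
# fourfold overlap is exploited by a per-position majority vote.
import re
from collections import Counter

CHUNK = re.compile(r"'([ACGT]+)'")
DATA_LEN, STRIDE = 100, 25


def read_chunks(buffer: str, carry: str = "") -> tuple[list[str], str]:
    """Return the complete quoted chunks in carry + buffer and the unread tail."""
    text = carry + buffer
    chunks, end = [], 0
    for m in CHUNK.finditer(text):
        chunks.append(m.group(1))
        end = m.end()
    return chunks, text[end:]


def consensus(oligos: list[str]) -> str:
    """Rebuild the DNA string, letting overlapping segments vote on each base."""
    segments = [o[1:1 + DATA_LEN] for o in oligos]   # keep only the 100 data bases
    votes = [Counter() for _ in range(DATA_LEN + STRIDE * (len(segments) - 1))]
    for i, seg in enumerate(segments):
        for j, base in enumerate(seg):
            votes[i * STRIDE + j][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)
```

For instance, read_chunks("['ACGT','GGTA','TT") returns the two complete chunks and the tail ",'TT", which is carried into the next read. In consensus(), interior bases are covered by up to four segments, so a single corrupted copy is outvoted; the first and last 25 bases are covered only once and get no such protection.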
Algorithm 4 Algorithm for regenerating the original file from DNA chunks
Require: DNA string obtained from Algorithm 3
Ensure: Original computer file
1: if file size < chunk size then
     Do not divide the file into chunks
   else
     Divide the file into chunks, where no. of chunks = file size / chunk size + 1
   end if
2: if no. of chunks > 1 then
     Read the DNA string for chunk 1
     Convert the DNA string to a base-3 string
     Convert the base-3 string to a list of Huffman values if possible**
     while not decoded do
       Remove the last base and try decoding again
       Add the removed base to the 'prepend string'
     end while
     Convert the Huffman list to the corresponding ASCII list
     Convert the ASCII list to a string of bytes and write it to the file
3:   for chunk number 1 to total number of chunks - 1 do
       Read the DNA string for the chunk and prepend the 'prepend string' to it if not null
       Convert the DNA string to a list of Huffman values if possible
       while not decoded do
         Remove the last base and try decoding again
         Add the removed base to the 'prepend string' (after clearing it)
       end while
       Convert the Huffman list to the corresponding ASCII list
       Convert the ASCII list to a string of bytes and write it to the file
     end for
4:   Read the last chunk, convert it to a base-3 string, then to the corresponding Huffman list, then to the corresponding ASCII list, then to a string of bytes, and write these bytes to the file
   else
     if no. of chunks = 1 then
       Trivial case: process the entire file at once, converting it to base 3, then to a Huffman list, then to an ASCII list and finally to a stream of bytes, which is written to a file
     end if
   end if

III. GRAPHICAL USER INTERFACE (GUI) OVERVIEW
DNACloud has been developed primarily to facilitate the storage of data on DNA. The software converts any type of data (text, image, audio, video, etc.) into DNA strings so that it can be stored on DNA, and it helps to retrieve the data stored on DNA. The GUI of DNACloud is built around these features. Along with encoding and decoding, DNACloud provides the user with various estimates related to data storage on DNA, as shown in Figure 1. The software has three basic modules, described in subsections A, B and C below.
Fig. 1. Functionality of DNACloud: This flowchart represents the basic functions of DNACloud. It shows the three main modules: 1. Encode, 2. Decode and 3. Estimator. Encode takes a data file of any format as input and gives DNA sequences as output. Decode takes DNA sequences as input and converts them back to the original data. Estimator estimates certain numerical values, such as the memory of your system required for encoding, as well as the biochemical properties of the DNA used in wet-lab experiments.
A. DNA Encoder (File to DNA)
To store data on DNA, one has to find ways of encoding the given data into a DNA sequence. Many encoding techniques are available to convert data into DNA sequences using DNA codes [20]. Huffman coding, one of the most efficient source coding techniques, is well known for data compression [21], and DNA encoding with Huffman codes is uniquely decodable. In this software, a Huffman encoding similar to that of [1] is implemented. For error correction, the overlapping codes of [1] are implemented, so that data is retrieved from DNA with reduced error rates. The encoding module takes a data file of any format (.txt, .png, .jpg, .mp3, .mkv, etc.) as input. The encoded DNA sequence is divided into DNA chunks of fixed length, and the chunks overlap so as to provide fourfold redundancy for error correction. The original file is converted to a base-3 Huffman code (0, 1, 2) with codewords of 5 trits (or 6, see Remark 3), which are then converted to DNA according to the conversion principle of [1]: each trit is replaced by one of the three nucleotides that differ from the preceding one. For example, if the preceding base is G, then A, T or C is placed next. This ensures that no homopolymers are generated, which reduces sequencing errors. With this code, if any DNA chunk or base is deleted, it can be regenerated by reading the overlapping sequence. This module saves the encoded file with the extension "fileformatextension.dnac"; e.g., an encoded image file is saved as ".png.dnac".
B. DNA Decoder (DNA to File)
To retrieve the data stored on DNA, the data has to be decoded from the DNA. Decoding follows the reverse steps of encoding: the stored data is retrieved by excluding the index bits and converting the base-3 Huffman DNA codes back to the original data. This module takes a DNA sequence as input and gives the original stored data as output. The output of a sequencer can be used as input for this module; it takes a ".dnac" file as input.
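The core of the decoding direction can be sketched by inverting the rotating code and then using the 20-trit length field to separate the payload from the padding. The conventions here are assumptions of this sketch, not DNACloud's code: the rotating code is started from a virtual base 'A', the length field sits at the end of the string, and a fixed 6-trit code per byte stands in for the Huffman code of [1]. The function names are hypothetical.

```python
# Minimal sketch of the decoding direction (the core of Algorithm 4).
# Assumed conventions (ours, not DNACloud's): rotating code started from a
# virtual base 'A', a 20-trit length field at the end of the string, and a
# fixed 6-trit code per byte standing in for the Huffman code of [1].

BASES = "ACGT"
LENGTH_TRITS, TRITS_PER_BYTE = 20, 6


def dna_to_trits(dna: str, prev: str = "A") -> list[int]:
    """Invert the rotating code: each base's distance from the previous base, minus one."""
    trits = []
    for base in dna:
        step = (BASES.index(base) - BASES.index(prev)) % 4
        if step == 0:
            raise ValueError("repeated base: cannot come from the rotating code")
        trits.append(step - 1)
        prev = base
    return trits


def from_trits(trits: list[int]) -> int:
    value = 0
    for t in trits:
        value = value * 3 + t
    return value


def decode_string(dna: str) -> bytes:
    trits = dna_to_trits(dna)
    n = from_trits(trits[-LENGTH_TRITS:])      # payload length from the trailing field
    payload = trits[:n]                        # drop the zero padding and the length field
    return bytes(from_trits(payload[i:i + TRITS_PER_BYTE])
                 for i in range(0, n, TRITS_PER_BYTE))


# b"A" (65) is 002102 in base 3; with 24 padding zeros and the 20-trit length
# field, the rotating code turns it into this 50-base string.
print(decode_string("CGCTAT" + "ACGT" * 10 + "ACAC"))  # b'A'
```

Because the rotating code never repeats a base, a homopolymer in the input signals a read error; this sketch simply raises ValueError in that case rather than attempting a repair.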
C. Storage Estimator
This module gives various statistics and biochemical properties of the DNA for the encoded file. These estimates are helpful when performing experiments for storing data in DNA. The Estimator has two main sections.
1) Memory Required
The user selects the file to be encoded from the system, and the following values are estimated for it. They help the user decide how much of the system's memory will be occupied when encoding the file and storing it in DNA. These values are approximate.
a) File size in bytes
b) Size of DNA string
c) Free memory required
d) Amount of DNA required
2) Biochemical Properties and Cost
This estimates the biochemical properties of the DNA sequence used to store the data. The user selects a ".dnac" file from the system containing the DNA sequences whose properties are to be estimated, and enters the salt concentration (mM) and the cost per base. The module estimates the GC content of the DNA, its melting temperature and the total cost of storing the file in DNA. All values are approximate. This helps the user work out the budget for the experiments, depending on the total amount of DNA.
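The biochemical estimates can be sketched in a few lines of Python. GC content and cost follow directly from the description above; for the melting temperature we assume one commonly quoted salt-adjusted approximation, Tm = 81.5 + 16.6·log10([Na+]) + 0.41·(%GC) - 675/N with [Na+] in mol/L and N the sequence length. Its constants differ between sources, and the paper does not say which formula DNACloud uses. The function names are hypothetical.

```python
# Sketch of the biochemical estimates. The melting-temperature formula is one
# commonly quoted salt-adjusted approximation (constants differ between
# sources); the paper does not say which formula DNACloud uses.
import math


def gc_content(dna: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in dna) / len(dna)


def melting_temp(dna: str, na_mM: float) -> float:
    """Tm in deg C: 81.5 + 16.6*log10([Na+] in M) + 0.41*(%GC) - 675/N."""
    return (81.5 + 16.6 * math.log10(na_mM / 1000)
            + 0.41 * 100 * gc_content(dna) - 675 / len(dna))


def total_cost(oligos: list[str], cost_per_base: float) -> float:
    """Cost of synthesising every base of every oligo."""
    return cost_per_base * sum(len(o) for o in oligos)


oligo = "GC" * 50
print(round(gc_content(oligo), 2), round(melting_temp(oligo, 50), 1))  # 1.0 94.2
```

The salt concentration is taken in mM, as the GUI asks for it, and converted to mol/L inside melting_temp. For a .dnac file, the cost would be total_cost applied to all 117-base chunks.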
IV. DETAILED DESCRIPTION OF GUI
When the program is started, a dialogue box pops up asking for a workspace in which to save your work. All generated files are saved automatically in this workspace, and you can switch to another workspace. After this dialogue box, the user is asked for details: name, contact number, email address and the file being used as input. These details are saved, and the Generate Barcode button generates a barcode from them that biotech companies can use as a unique identifier when performing the experiments. Filling in the details is not mandatory but is recommended; otherwise the box will keep reminding you. To encode or decode a file, select the corresponding option from the File menu. The software includes options A to G in the menu.
A. Encode (File to DNA) Button
This option, available under the File menu, converts any type of data to DNA strings. The user selects the file to be converted by clicking the Choose File button. Once the file has been selected, the following information about the encoded file is displayed:
1) Length of DNA string
2) No. of DNA oligonucleotides (chunks)
3) Length of each DNA oligonucleotide
4) File size in bytes
To save the encoded file under a specific name and location, use the Encode your File button. It generates a file with the extension ".dnac" containing the DNA string for the selected file. The RESET button clears the selected file so that the user can select a new one; it works like a clear button.
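The four values shown after a file is chosen can be approximated from the file size alone. The Python helper below is hypothetical and rests on our own assumptions: at most 6 trits per byte (the longer Huffman codeword length mentioned in Remark 3), a 20-trit length field, padding to a multiple of 25, and 100-base segments shifted by 25 bases; DNACloud's own figures may differ.

```python
# Rough estimate of the four values shown after a file is chosen. Hypothetical
# helper under our own assumptions: at most 6 trits per byte, a 20-trit length
# field, padding to a multiple of 25, and 100-base segments shifted by 25 bases.

LENGTH_TRITS, DATA_LEN, STRIDE, OLIGO_LEN = 20, 100, 25, 117


def encode_summary(file_size: int, trits_per_byte: int = 6) -> dict[str, int]:
    raw = file_size * trits_per_byte + LENGTH_TRITS
    dna_len = -(-raw // STRIDE) * STRIDE               # round up to a multiple of 25
    if dna_len < DATA_LEN:
        raise ValueError("too small for a single 100-base segment in this sketch")
    return {
        "Length of DNA string": dna_len,
        "No of DNA oligonucleotides (chunks)": (dna_len - DATA_LEN) // STRIDE + 1,
        "Length of each DNA oligonucleotide": OLIGO_LEN,
        "File size in bytes": file_size,
    }


print(encode_summary(1000))
```

For a 1000-byte file this gives a 6025-base string and 238 oligonucleotides of 117 bases; since real Huffman codewords are often 5 trits long, the actual numbers should be somewhat lower.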
B. Decode (DNA to File) Button
This option, available under the File menu, retrieves the data stored in DNA. The user can enter the DNA string from which data is to be retrieved in the text box next to the "Please write DNA string" option, or decode an encoded file from the system with the "Select .dnac file" option. Once the file is selected, click the Decode option. To save the decoded file, give it a name and save it at a specific location. This generates the original file that was encoded in DNA. The RESET button clears the selected file, i.e., it works like a clear button.

TABLE II. COMPARISON OF THE FILE FORMATS ENCODED BY DNACLOUD. Different file types were encoded and decoded using DNACloud. For cost calculations see [1].
File Type | Limit of file size that can be encoded | Encoded file size (bytes) | Required amount of DNA | Cost of DNA (US $) | Memory required (DNA chunks) | Ref.
Text | 581130733 bytes (ASCII characters) | 15902545 | 3.4 × 10^-14 g | 197191.6 | 409 MB | [22]
Audio | 554 MB (around 50 songs) | 151391203 | 3.3 × 10^-13 g | 1877250.9 | 3896 MB | [23]
Video | 581130733.33 bytes (around 65 minutes) | 598292824 | 1.31 × 10^-12 g | 7418831.0 | 15400 MB | [24]
Image (HD) | 5.7 MB (100 HD images) | 23013231 | 5.051 × 10^-14 g | 285364.06 | 24 MB | [25]
C. Storage Estimator
As mentioned above, there are two estimators. To estimate the memory required, select File → Estimator → Memory Required. This takes the data file to be encoded as input, and the Calculate button estimates the values mentioned above. For the second estimator, select File → Estimator → Biochemical Properties. This asks for a ".dnac" file as input and gives the GC content, the melting temperature and the cost of the total DNA. The Save button saves the estimated information.
D. Export Button
This helps the user export generated files to different formats. DNA strings can be exported to a file format that a synthesizer accepts as input, and files can be exported to the format required by a sequencer. These options, available in the File menu, generate output of the DNA strings suitable for the respective machines: use the Export DNA synthesizer File option to export the DNA file for synthesis, and the Import DNA sequencer file option to obtain the DNA sequences to be decoded from the file stored in DNA. These two options will be available in the next version of the software. The .dnac file, with all software output details, can be exported to a single PDF or to a LaTeX file using the Export .dnac to PDF and Export to latex options, respectively.
E. Clear temp files
This clears the history of the software by removing all temporary files generated by it.
F. Exit
Exit quits the software.
G. User details
This option is available in the Preferences menu. For data security, the user has to enter their details, after which a dialogue box for entering a password appears. The password can be reset with the Change Password option in the same menu. This helps the user retrieve files stored in DNA safely. The generated barcode can be used by biotech companies to tag the DNA on which a particular file is stored.
V. COMPARISON OF DATA STORAGE ON DNA
The software cannot encode or decode files beyond a certain size. Table II compares the file size limits for different file types encoded by the software. At present, DNA strings of up to 3486784400 bytes (about 3.4 GB) can be decoded by the software, and files of up to 581130733.333 bytes (about 554 MB) can be encoded with DNACloud.
VI. CONCLUSION
Considering the current rate of data explosion, DNA storage is becoming an indispensable data storage medium because of its low maintenance cost, high data density, eco-friendliness and durability. However, the technology is still rudimentary, since the cost of sequencing and synthesizing DNA remains high. As this cost keeps decreasing, we expect that research on encoding and decoding algorithms will bring this technology to the common user within the next few years. DNACloud can thus be considered a potential tool for converting data files into DNA and vice versa. We intend to extend the software to encode large data files by implementing better encoding, decoding and error correction methods.
VII. SOFTWARE AVAILABILITY
The software source code, installers for Mac and Windows, user manual, product demo and other related materials can be downloaded from http://www.guptalab.org/dnacloud.
VIII. ACKNOWLEDGEMENT
We would like to thank Thorsten Weimann and Anand B Pillai, whose open source libraries python-barcode [26] and pytxt2pdf [27], respectively, are used in the software.
REFERENCES
[1] N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E. M. LeProust, B. Sipos, and E. Birney, "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA," Nature, 2013.
[2] G. Budman. NSA might want some Backblaze pods. [Online]. Available: http://blog.backblaze.com/2009/11/12/nsa-might-want-some-backblaze-pods/
[3] L. Dixita and M. K. Gupta, "Natural data storage on DNA: A review," 2013, preprint.
[4] J. Davis, "Microvenus," Art Journal, vol. 55, no. 1, pp. 70–74, 1996.
[5] E. Kac. (1999) Genesis-art of DNA. [Online]. Available: http://www.ekac.org/geninfo.html
[6] N. Yachie, K. Sekiyama, J. Sugahara, Y. Ohashi, and M. Tomita, "Alignment-based approach for durable data storage into living organisms," Biotechnology Progress, vol. 23, no. 2, pp. 501–505, 2007.
[7] N. G. Portney, Y. Wu, L. K. Quezada, S. Lonardi, and M. Ozkan, "Length-based encoding of binary data in DNA," Langmuir, vol. 24, no. 5, pp. 1613–1616, 2008.
[8] M. Ailenberg and O. D. Rotstein, "An improved Huffman coding method for archiving text, images, and music characters in DNA," Biotechniques, vol. 47, no. 3, p. 747, 2009.
[9] P. C. Wong, K.-k. Wong, and H. Foote, "Organic data memory using the DNA approach," Communications of the ACM, vol. 46, no. 1, pp. 95–98, 2003.
[10] M. Arita and Y. Ohashi, "Secret signatures inside genomic DNA," Biotechnology Progress, vol. 20, no. 5, pp. 1605–1607, 2004.
[11] G. M. Skinner, K. Visscher, and M. Mansuripur, "Biocompatible writing of data into DNA," Journal of Bionanoscience, vol. 1, no. 1, pp. 17–21, 2007.
[12] G. M. Church, Y. Gao, and S. Kosuri, "Next-generation digital information storage in DNA," Science, vol. 337, no. 6102, pp. 1628–1628, 2012.
[13] G. M. Church and E. Regis, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves. Basic Books, 2012.
[14] A. O'Driscoll and R. D. Sleator, "Synthetic DNA: the next generation of big data storage," Bioengineered, vol. 4, no. 3, pp. 123–125, 2013. [PubMed Central: PMC3669150] [DOI: 10.4161/bioe.24296] [PubMed: 23514938].
[15] S. Greengard, "A new approach to information storage," Commun. ACM, vol. 56, no. 8, pp. 13–15, Aug. 2013. [Online]. Available: http://doi.acm.org/10.1145/2492007.2492013
[16] K. Swearingen. How much information. [Online]. Available: http://chnm.gmu.edu/digitalhistory/links/pdf/preserving/8 5a.pdf
[17] M. Hilbert. How much information is there in the world? [Online]. Available: http://news.usc.edu/#!/article/29360/How-Much-Information-Is-There-in-the-World
[18] R. Thomchick. NSA (National Security Agency) or FBI (Federal Bureau of Investigation) will have one yottabyte. [Online]. Available: http://www.metaholic-musings.com/2013/03/20/brontobytes/
[19] S. Higginbotham. As data gets bigger, what comes after a yottabyte? [Online]. Available: http://gigaom.com/2012/10/30/as-data-gets-bigger-what-comes-after-a-yottabyte/
[20] M. Arita, "Writing information into DNA," in Aspects of Molecular Computing. Springer, 2004, pp. 23–35.
[21] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[22] P. Ribenboim, "How to recognize whether a natural number is a prime," in The New Book of Prime Number Records. Springer, 1996, pp. 19–178.
[23] J. Singh. Vakratunda mahakaya-prathameshwara ganadheeshwara. [Online]. Available: http://music.raag.fm/Bhakti Sangeet/songs-9797-Shri Ganesh-Jagjit Singh
[24] J. Rover. National Geographic television megastructures 53 ultimate skyscraper NYC. [Online]. Available: https://www.youtube.com/watch?v=7lV1SQTqhl0
[25] Wikipedia. DNA structure image. [Online]. Available: http://upload.wikimedia.org/wikipedia/commons/thumb/d/d8/Benzopyrene DNA adduct 1JDG.png/433px-Benzopyrene DNA adduct 1JDG.png
[26] T. Weimann. Code for barcode. [Online]. Available: https://bitbucket.org/whitie/python-barcode
[27] A. Pillai. Convert text to pdf. [Online]. Available: http://code.activestate.com/recipes/189858-python-text-to-pdf-converter/