Download mxODBC - Python ODBC Database Interface
Transcript
mxODBC - Python ODBC Database Interface 8.7 Unicode and String Data Encodings mxODBC also supports Unicode objects to interface with databases. As more databases and ODBC drivers support Unicode natively, using Unicode for text data stored in database becomes more attractive than ever and allows you to avoid the problems you typically face when having to deal with different text encodings and code pages in databases. Even if you don't have access to an ODBC capable of dealing with Unicode natively, you can still take advantage of the auto-conversion mechanisms in mxODBC to simulate Unicode capabilities. mxODBC provides several different run-time configurations to deal with passing Unicode to and fetching it from an ODBC driver. The .stringformat attribute of connection and cursor objects allows defining how to convert string data into Python objects and vice-versa. Unicode conversions to and from 8-bit strings in Python usually assume the Python default encoding (which is ASCII unless you modify the Python installation). Since the database may be using a different encoding, mxODBC allows defining the encoding to be used on a per-connection basis. The .encoding attribute of connection objects is writeable for this purpose. Its default value is None, meaning that Python's default encoding (usually ASCII) is to be used. You can change the encoding by simply assigning a valid encoding name to the attribute. Make sure that Python supports the encoding (you can test this using the unicode() built-in). The default conversion mechanism used in mxODBC is EIGHTBIT_STRINGFORMAT (Unicode gets converted to 8-bit strings before passing the data to the driver, output is always an 8-bit string), the default encoding Python's default encoding. To store Unicode in a database, one possibility is to use the UNICODE_STRINGFORMAT and set the encoding attribute to e.g. 'utf-8'. mxODBC will then convert the Unicode input data to UTF-8, store this in the database and convert it back to Unicode during fetch operations. Note however that UTF-8 encoded data usually takes up more room in the database than the Unicode equivalent, so may experience data truncations which then cause the decoding process to fail. Another possibility is to use the MIXED_STRINGFORMAT which allows mxODBC to interface to the database using the best suitable data type. For e.g. MS SQL Server this usually means passing all string data as Unicode data to and from the database. In MIXED_STRINGFORMAT mode mxODBC will return string data in the default format of the database driver, leaving the conversion to the Python program. Note: mxODBC only supports Unicode objects at the data storage interface level meaning that it can insert and fetch Unicode data from a database provided that the database can handle Unicode and that the used mxODBC subpackage was 116