Accessing the new replicas, changes from the previous cluster¶

  1. Old cluster, using environment variables
  2. New cluster, using credentials file and host names
  3. Diff between the two versions
  4. Other links and contact info

Old cluster, using environment variables¶

Previously, we would connect to the databases using information in environment variables, like so:

In [4]:
import os
import pymysql

host = os.environ['MYSQL_HOST']
user = os.environ['MYSQL_USERNAME']
password = os.environ['MYSQL_PASSWORD']

query = "SELECT page_title FROM page WHERE page_title LIKE %s LIMIT 5;"

conn = pymysql.connect(
    host=host,
    user=user,
    password=password
)
with conn.cursor() as cur:
    cur.execute("USE eswiki_p;")
    cur.execute(query, args=('%Alicante%',))
    data = cur.fetchall()
    for row in data:
        print(str(row[0], encoding='utf-8'))

conn.close()
500_Kilómetros_Nocturnos_de_Alicante
500_Km_Nocturnos_de_Alicante
Abubilla_(escultura_de_Alicante)
Acceso_Alicante_desde_Elche
Acceso_Noroeste_a_Alicante

Due to issues with the proxy that MYSQL_HOST pointed to, and due to the fact that now we have to connect to the DB we want to use directly, we won't be able to use the environment variables.

New cluster, using credentials file and host names¶

Instead of using the environment variables, the connection will be very similar to what is done in other environments like Toolforge (*):

  • Each user has credentials to connect to the databases in ~/.my.cnf
  • To connect to a database you use the hosts as specified in Help:Toolforge/Database
  • We have to specify the database to connect that matches the url

Here is the same example from above:

In [3]:
import pymysql

# Host urls are like {wiki}.{analytics,web}.db.svc.wikimedia.cloud
host = "eswiki.analytics.db.svc.wikimedia.cloud"
credentials = ".my.cnf"

query = "SELECT page_title FROM page WHERE page_title LIKE %s LIMIT 5;"

conn = pymysql.connect(
    host=host,
    read_default_file=credentials,
    database="eswiki_p"
)
with conn.cursor() as cur:
    cur.execute(query, args=('%Alicante%',))
    data = cur.fetchall()
    for row in data:
        print(str(row[0], encoding='utf-8'))

conn.close()
500_Kilómetros_Nocturnos_de_Alicante
500_Km_Nocturnos_de_Alicante
Abubilla_(escultura_de_Alicante)
Acceso_Alicante_desde_Elche
Acceso_Noroeste_a_Alicante

Diff between the two versions¶

Here is the diff of the two code snippets to help you see what changes:

- import os
  import pymysql

- host = os.environ['MYSQL_HOST']
- user = os.environ['MYSQL_USERNAME']
- password = os.environ['MYSQL_PASSWORD']
+ # Host urls are like {wiki}.{analytics,web}.db.svc.wikimedia.cloud
+ host = "eswiki.analytics.db.svc.wikimedia.cloud"
+ credentials = ".my.cnf"

  query = "SELECT page_title FROM page WHERE page_title LIKE %s LIMIT 5;"

  conn = pymysql.connect(
      host=host,
-     user=user,
-     password=password
+     read_default_file=credentials,
+     database="eswiki_p"
  )
  with conn.cursor() as cur:
-     cur.execute("USE eswiki_p;")
      cur.execute(query, args=('%Alicante%',))
      data = cur.fetchall()
      for row in data:
          print(str(row[0], encoding='utf-8'))

  conn.close()

Other links and contact info¶

If you want to see a small example connecting to different databases with the new cluster, please see the notebook Accessing Wikireplicas from PAWS

If you'd like to see more examples or have questions, reach out to #wikimedia-cloud on freenode IRC or email cloud@lists.wikimedia.org