MySQL to neo4j: A DBA Perspective - David Stern @ GraphConnect NY 2013
DESCRIPTION
This session walks through best practices for Neo4j administrators, from installation and initial setup, through maintenance and performance tuning, all the way to production use. It is one of a series of Neo4j learning opportunities for administrators.
TRANSCRIPT
(MySQL)-[:to]->(neo4j)
A DBA Perspective
Dave Stern
@davestern1
Dev Ops @ FiftyThree
MySQL user & admin since 1998
Multiple tiers of masters & slaves
Bare metal & AWS - EC2/RDS
MySQL & Percona
neo4j user & admin since 2012
neo4j 1.8, 1.9
AWS: Multiple 3-instance enterprise clusters
How do you use MySQL?
Single Instance
Master/Slave, Multi-master
MySQL Cluster
Have you tried neo4j yet?
Where does FiftyThree use neo4j?
Much more in development...
What is this talk about?
Comparison
Configuration
Use
Comparison
Logical Partitioning
http://www.mysql.com/products/workbench/
MySQL
Strictly enforced schema
neo4j
No logical databases
No tables
...no schema
...no joins
2.0: schema-optional
Physical Partitioning & Sharding
Improves write performance, usually disk I/O
MySQL
innodb_file_per_table
Databases on separate partitions or devices
Shard horizontally (e.g. by time range)
Shard vertically (e.g. by table or function)
Logs can be on separate partitions for I/O gain
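As a rough illustration of horizontal sharding by time range, the Python sketch below maps a timestamp to a per-month table name. The `events` prefix and the monthly granularity are assumptions made for this example; the slides do not say how the shards were named or sized.

```python
from datetime import datetime


def shard_for(ts: datetime, prefix: str = "events") -> str:
    """Return a hypothetical per-month shard (table) name for a timestamp.

    The naming scheme <prefix>_<YYYY>_<MM> is an assumption for illustration.
    """
    return f"{prefix}_{ts.year:04d}_{ts.month:02d}"


if __name__ == "__main__":
    # A row written at this time would be routed to the November 2013 shard.
    print(shard_for(datetime(2013, 11, 2, 5, 40, 10)))
```

Routing by time keeps recent, write-heavy data in a small table and lets old shards be archived or moved to slower storage.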
neo4j
No logical partitioning by DB or table
Highly connected data: no clear separation
Logs can be on separate partitions for I/O gain
SCALE UP!
Authentication & Authorization
MySQL
mysql> select Host, db, user, select_priv, insert_priv, update_priv, delete_priv from db;
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
| Host      | db      | user      | select_priv | insert_priv | update_priv | delete_priv |
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
| %         | test    |           | Y           | Y           | Y           | Y           |
| %         | test\_% |           | Y           | Y           | Y           | Y           |
| localhost | Orders  | admin     | Y           | Y           | Y           | Y           |
| localhost | Events  | admin     | Y           | Y           | Y           | Y           |
| localhost | Events  | events    | Y           | Y           | Y           | N           |
| 10.%      | Events  | events    | Y           | N           | N           | N           |
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
Authentication & Authorization
neo4j
No permissions
No users
How do you secure the DB?
1. Protect the database in a Private Network or VPC
2. Firewall: router, AWS Security Groups, iptables
3. Proxy requests via web server or Load Balancer
If you must allow access, use HTTPS & authenticate at the proxy.
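Because neo4j here has no users or permissions of its own, the authentication check has to live in the proxy layer. The Python sketch below shows one way such a check could look for HTTP Basic credentials. The function name, the user name, and the password are hypothetical, and a real deployment would more likely rely on the web server's or load balancer's built-in authentication than on hand-written code.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; load real ones from a secret store.
EXPECTED_USER = "graph-client"
EXPECTED_PASSWORD = "change-me"


def is_authorized(authorization_header):
    """Return True if the header carries the expected HTTP Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(authorization_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:
        # Covers malformed base64 (binascii.Error) and invalid UTF-8.
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and password_ok
```

Basic credentials are only base64-encoded, not encrypted, which is why the slide pairs authentication with HTTPS.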
Replication
http://www.mysqlperformanceblog.com/wp-content/uploads/2013/07/23.png
Replication
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
Replication vs. HA
MySQL
Free
Slaves pull updates
Eventual consistency
One-way, asynchronous
neo4j
Enterprise edition: can cost $ depending on use
Slaves can pull asynchronous updates
Eventual consistency, optimistic pushes to slaves are the default
Writes to any cluster member
JVM
Buffers & Memory management =~ JVM settings
The database itself is extendable via Java
... if you're into that sort of thing
Built-in Tools
Data Browser
Backup Script
neo4j
$ /opt/neo4j/bin/neo4j-backup -from single://10.66.182.177:6362 \
> -to /media/neo4j-backup/production/2013-11-02T05:40:10Z
Performing full backup from 'single://10.66.182.177:6362'
............................................
[44 Files copied]
Full consistency check
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................... 100%
Done
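The backup target in the example above ends in a UTC timestamp, which suggests a script that creates one directory per run. The Python sketch below assembles such a command line. It uses only the `-from` and `-to` options shown on the slide; the function name and the idea of one directory per run are assumptions, and the command is built but not executed.

```python
from datetime import datetime, timezone

NEO4J_BACKUP = "/opt/neo4j/bin/neo4j-backup"  # tool path as shown on the slide


def backup_command(source, backup_root, now=None):
    """Build the neo4j-backup argument list with a UTC-timestamped target directory."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return [NEO4J_BACKUP, "-from", source, "-to", f"{backup_root}/{stamp}"]


if __name__ == "__main__":
    cmd = backup_command("single://10.66.182.177:6362",
                         "/media/neo4j-backup/production")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a host where the backup tool is installed.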
MySQL
$ innobackupex --user=DBUSER --password=DBUSERPASS /path/to/BACKUP-DIR/
innobackupex: Backup created in directory '/path/to/BACKUP-DIR/2013-03-25_00-00-09'
innobackupex: MySQL binlog position: filename 'mysql-bin.000003', position 1946
111225 00:00:53 innobackupex: completed OK!
Visual Server Info
Configuration
MySQL
So many options...
mysql> SHOW VARIABLES;
+-----------------------------------------+---------------------------+
| Variable_name                           | Value                     |
+-----------------------------------------+---------------------------+
| auto_increment_increment                | 1                         |
| auto_increment_offset                   | 1                         |
| autocommit                              | ON                        |
| automatic_sp_privileges                 | ON                        |
| back_log                                | 50                        |
| basedir                                 | /home/mysql/bin/mysql-5.5 |
| big_tables                              | OFF                       |
| binlog_cache_size                       | 32768                     |
| binlog_direct_non_transactional_updates | OFF                       |
| binlog_format                           | STATEMENT                 |
| binlog_stmt_cache_size                  | 32768                     |
| bulk_insert_buffer_size                 | 8388608                   |
...
| max_allowed_packet                      | 1048576                   |
| max_binlog_cache_size                   | 18446744073709547520      |
| max_binlog_size                         | 1073741824                |
| max_binlog_stmt_cache_size              | 18446744073709547520      |
| max_connect_errors                      | 10                        |
| max_connections                         | 151                       |
| max_delayed_threads                     | 20                        |
| max_error_count                         | 64                        |
| max_heap_table_size                     | 16777216                  |
| max_insert_delayed_threads              | 20                        |
| max_join_size                           | 18446744073709551615      |
...
You can optimize dozens of settings like these...
MySQL Configuration
Buffers, Caching & I/O
innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 8
innodb_additional_mem_pool_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_file_size = 128M
innodb_log_buffer_size = 64M
innodb_file_per_table
innodb_io_capacity = 500
innodb_read_io_threads = 64
innodb_write_io_threads = 64
and these...
MySQL Configuration
Network & Concurrency
table_cache = 2048
max_connections = 1000
max_allowed_packet = 16M
and these...
MySQL Configuration
Replication
server-id = 2
master-host = db-master.mycompany.com
master-port = 3306
master-user = username
master-password = password
master-connect-retry = 60
And these, depending on version & hardware...
MySQL Configuration
Other
sort_buffer_size = 2M
tmp_table_size = 32M
join_buffer_size = 128k
query_cache_type = 1
query_cache_size = 64M
open_files_limit = 8192
....
neo4j Configuration Tuning
Simple Questions
How many nodes do you expect?
How many relationships do you expect?
Average number of properties per node and relationship?
Optional: How do you expect to traverse the graph?
Long paths and/or large result sets?
Short paths and/or small result sets?
3 things to calculate:
File Cache Mapped Memory & Object Caches
Heap Size
RAM for OS
neo4j Configuration
Store file                         Record size  Contents
neostore.nodestore.db              9 B          Nodes
neostore.relationshipstore.db      33 B         Relationships
neostore.propertystore.db          41 B         Properties for nodes and relationships
neostore.propertystore.db.strings  128 B        Values of string properties
neostore.propertystore.db.arrays   128 B        Values of array properties
Capacity Planning Estimates:
Node size (9 B; 14 B in 2.0) x expected nodes
Relationship size (33 B) x expected relationships
Property size (41B) x expected properties
Strings & Arrays
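The estimates above are simple multiplications, sketched here in Python. The record sizes come from the store file table; the function name and the sample counts are hypothetical. String and array values are stored in 128 B blocks, so the number of blocks (rather than the number of values) is passed in, since a long value can need more than one block.

```python
def estimate_store_bytes(nodes, relationships, properties,
                         string_blocks=0, array_blocks=0,
                         node_record_bytes=9):
    """Estimate on-disk store sizes in bytes from expected record counts.

    node_record_bytes defaults to 9 (neo4j 1.9); pass 14 for neo4j 2.0.
    """
    return {
        "neostore.nodestore.db": nodes * node_record_bytes,
        "neostore.relationshipstore.db": relationships * 33,
        "neostore.propertystore.db": properties * 41,
        "neostore.propertystore.db.strings": string_blocks * 128,
        "neostore.propertystore.db.arrays": array_blocks * 128,
    }


if __name__ == "__main__":
    # Hypothetical graph: 1M nodes, 5M relationships, 10M properties.
    for store, size in estimate_store_bytes(1_000_000, 5_000_000, 10_000_000).items():
        print(f"{store}: {size / 1024 / 1024:.1f} MiB")
```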
Configuration
Main config files
neo4j-wrapper.conf
neo4j.properties
neo4j-server.properties
Configuration
neo4j-wrapper.conf
Heap Size
GC method
Configuration
neo4j.properties
File Caches: Mapped memory
Object Caches
Indexes
HA
Backup
Configuration
neo4j-server.properties
HTTP/S
Admin client
REST
Database mode
Logging
Configuration
21.2. Server Configuration
25. Configuration & Performance
neo4j: Buffers, Caching & I/O
neo4j-wrapper.conf
# Initial Java Heap Size (in MB)
wrapper.java.initmemory=1024
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=1024
neo4j: Buffers, Caching & I/O
neo4j.properties
Two types of caches: file buffer and object cache
File Buffer Cache:
# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M
Object Cache:
node_cache_size=256M
relationship_cache_size=256M
# optional
node_cache_array_fraction=5
relationship_cache_array_fraction=5
# The GC resistant cache described below is only available in the
# Neo4j Enterprise Edition.
# cache_type values: soft (default), weak, strong
cache_type=gcr
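To connect the capacity estimates to the file buffer cache settings, the Python sketch below turns expected store sizes into `mapped_memory` lines. It assumes a rule of thumb that the slides do not state: give each store about as much mapped memory as its expected size, plus some headroom. The function name and the 20% default headroom are hypothetical, and the total still has to fit in RAM alongside the heap and the OS.

```python
import math

MIB = 1024 * 1024


def mapped_memory_settings(store_bytes, headroom=1.2):
    """Return neo4j.properties lines sizing mapped memory to the expected store sizes.

    store_bytes maps a store file name (e.g. "neostore.nodestore.db") to its
    expected size in bytes. headroom is an assumed growth factor.
    """
    lines = []
    for store, size in store_bytes.items():
        megabytes = max(1, math.ceil(size * headroom / MIB))
        lines.append(f"{store}.mapped_memory={megabytes}M")
    return lines


if __name__ == "__main__":
    expected = {
        "neostore.nodestore.db": 9_000_000,
        "neostore.relationshipstore.db": 165_000_000,
        "neostore.propertystore.db": 410_000_000,
    }
    print("\n".join(mapped_memory_settings(expected)))
```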
neo4j: Concurrency
neo4j.properties
# concurrent HTTP requests that the server will service.
org.neo4j.server.webserver.maxthreads=64
neo4j: HA
neo4j-server.properties
org.neo4j.server.database.mode=HA
neo4j.properties
ha.server_id=1
ha.initial_hosts=server1:5001,server2:5001
#ha.discovery.url=http://example.com/list
#Host & port to bind the cluster management communication.
ha.cluster_server=server1:5001
#Hostname and port to bind the HA server.
ha.server=my-domain.com:6001
##### Optional cluster strategies #####
# Interval of pulling updates from master.
ha.pull_interval=10s
#The amount of slaves the master will ask to replicate a committed
#transaction.
ha.tx_push_factor=1
#Push strategy of a transaction to a slave during commit: fixed or round_robin.
ha.tx_push_strategy=fixed
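Each cluster member needs the same `ha.initial_hosts` list but its own `ha.server_id` and bind addresses, so the per-instance settings are easy to generate. The Python sketch below does this for the property names shown above. The function name, the host names, and the reuse of ports 5001 and 6001 on every member are assumptions for illustration.

```python
def ha_properties(server_id, hosts, cluster_port=5001, ha_port=6001):
    """Return the per-instance HA lines of neo4j.properties for one cluster member.

    server_id is 1-based and selects this member's host from the hosts list.
    """
    if not 1 <= server_id <= len(hosts):
        raise ValueError("server_id must be between 1 and the number of hosts")
    me = hosts[server_id - 1]
    initial_hosts = ",".join(f"{host}:{cluster_port}" for host in hosts)
    return [
        f"ha.server_id={server_id}",
        f"ha.initial_hosts={initial_hosts}",
        f"ha.cluster_server={me}:{cluster_port}",
        f"ha.server={me}:{ha_port}",
    ]


if __name__ == "__main__":
    # Hypothetical 3-instance cluster.
    for member in (1, 2, 3):
        print("\n".join(ha_properties(member, ["server1", "server2", "server3"])))
        print()
```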
Use
File System
$PATH_TO_NEO4J = /opt/neo4j
/opt/neo4j/bin (/usr/bin/mysql)
  neo4j
  neo4j-backup
/opt/neo4j/conf (/etc/mysql)
  neo4j.properties
  neo4j-server.properties
  neo4j-wrapper.conf
/opt/neo4j/data (/var/lib/mysql)
/opt/neo4j/data/graph.db (/var/lib/mysql/data)
  The actual graph data
/opt/neo4j/data/log (/var/log/mysql)
  All logs
Use
Indexes
The database itself is a natural index
Lucene for searches
neo4j 2.0:
Nodes have labels: Person, Location, etc. that group them into sets
CREATE INDEX ON :Person(name)
Look familiar?
CREATE INDEX id_index ON Person (id);
Use
Indexes
neo4j 2.0:
Properties can have unique constraints
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
Look familiar?
CREATE UNIQUE INDEX email_index ON Person (email);
Use
Indexes
Current 1.9.x:
Auto indexing (deprecated):
one for nodes, one for relationships
off by default
Use
Querying
mysql> select * from graph_local limit 10;
+----+-------------------+---------+---------------+------------+
| id | graph_template_id | host_id | snmp_query_id | snmp_index |
+----+-------------------+---------+---------------+------------+
|  1 |                12 |       1 |             0 |            |
|  2 |                 9 |       1 |             0 |            |
|  3 |                10 |       1 |             0 |            |
|  4 |                 8 |       1 |             0 |            |
|  5 |                58 |       2 |             0 |            |
|  6 |                62 |       2 |             0 |            |
|  7 |                53 |       2 |             0 |            |
|  8 |                37 |       2 |             0 |            |
|  9 |                67 |       2 |             0 |            |
| 10 |                65 |       2 |             0 |            |
+----+-------------------+---------+---------------+------------+
10 rows in set (0.00 sec)
http://www.mysql.com/products/workbench/
Use
Querying via REST
Example request:
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "query" : "start x = node:node_auto_index(name={startName}) match path = (x-[r]-friend) where friend.name = {name} return TYPE(r)",
  "params" : {
    "startName" : "I",
    "name" : "you"
  }
}
Example response:
200: OK
Content-Type: application/json; charset=UTF-8
{
  "columns" : [ "TYPE(r)" ],
  "data" : [ [ "know" ] ]
}
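For readers who want to issue the same request from code, here is a minimal Python sketch using only the standard library. It builds the POST shown above with the query and parameters kept separate. The function name is hypothetical, and the request is constructed but not sent, so the sketch makes no claim about what a particular server returns.

```python
import json
import urllib.request

CYPHER_URL = "http://localhost:7474/db/data/cypher"  # endpoint from the slide


def build_cypher_request(query, params, url=CYPHER_URL):
    """Build (but do not send) a POST request for the Cypher REST endpoint."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json; charset=UTF-8",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_cypher_request(
        "start x = node:node_auto_index(name={startName}) "
        "match path = (x-[r]-friend) where friend.name = {name} return TYPE(r)",
        {"startName": "I", "name": "you"},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; that step needs a running neo4j server and is left out here. Keeping values in `params` rather than concatenating them into the query string is the same habit as using prepared statements in MySQL.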
DBA Perspective
Use the best database for the job, or both
neo4j ships with great tools
neo4j is easier to configure: fewer options, less complex, still flexible for optimization
HA is more robust and more opaque than basic replication
For better or worse, JVM handles a lot for you
Authorization - it's up to you
Scaling up is easier than changing your data model
We're [email protected]
Thank You!
Thanks to:
Aseem Kishore @aseemk
Chris Leishman @cleishm
Max De Marzi @maxdemarzi