Any thoughts of using sqlite?

Talk among developers: propose and discuss general development planning, features, and so on in this forum.
wowsher
Trader
Trader
Posts: 22
Joined: Sun Feb 04, 2007 5:16 pm
Contact:

Any thoughts of using sqlite?

Post by wowsher »

I was working on VSEditor and have started using sqlite for loading, matching, and sorting the data, and I was wondering if any thought has been given to moving the data from the csv files (master parts & units) into sqlite? So far it has gone pretty smoothly, it's fast, and it would allow SQL to be used to look up the data and make data changes much easier. I started looking for where units.csv is used, to see how hard it would be to switch to looking at a table or tables, and found only one reference to opening units.csv (vegastrike-0.5.0\cgi-accountserver\db.py). So, any thoughts? I would be willing to help out with this ...
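To give a feel for it, here is roughly what I have been doing, boiled down: a small sketch that loads a CSV with a header row into an in-memory sqlite table and then matches and sorts it with plain SQL. The "Name" column and the Llama pattern are just stand-ins, not the real units.csv schema.

Code:

import csv
import sqlite3

def load_csv_into_sqlite(path, table="units"):
    # Read the whole CSV, then mirror it into an in-memory sqlite table.
    conn = sqlite3.connect(":memory:")   # or a .db file on disk
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header = rows[0]
    # Skip blank or short lines so executemany gets uniform rows.
    data = [r for r in rows[1:] if len(r) == len(header)]
    columns = ", ".join('"%s" TEXT' % name for name in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    placeholders = ", ".join("?" * len(header))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, placeholders), data)
    conn.commit()
    return conn

conn = load_csv_into_sqlite("units.csv")
# Matching and sorting become a one-line query ("Name" is a stand-in column name):
for row in conn.execute('SELECT * FROM units WHERE "Name" LIKE ? ORDER BY "Name"', ("Llama%",)):
    print(row)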

Anyhow just thinking out loud here. Call me crazy if you want ;)

wowsher aka tony
ace123
Lead Network Developer
Lead Network Developer
Posts: 2560
Joined: Sun Jan 12, 2003 9:13 am
Location: Palo Alto CA
Contact:

Post by ace123 »

It's possible... but one nice thing about CSV is that there are hundreds of editors out there that edit the format (Spreadsheet programs).

Sqlite3 would require software to modify it unless it has a built-in database editor that works well.
javier
Bounty Hunter
Bounty Hunter
Posts: 137
Joined: Tue Jan 31, 2006 12:46 am

Post by javier »

While it could be a good idea, especially if the data structures keep growing, where I feel a database engine could be more useful is in replacing the current savefile and in keeping state data for non-active systems, to reduce the memory consumption of data structures that don't really need to be loaded at all times.
I agree that the main advantage of the CSV format is that it can be edited with a much wider range of tools, so a good multiplatform editor would be a must.
ace123
Lead Network Developer
Lead Network Developer
Posts: 2560
Joined: Sun Jan 12, 2003 9:13 am
Location: Palo Alto CA
Contact:

Post by ace123 »

I know what you mean about the size of the data, especially saved games -- I think a savegame holds at least tens of thousands of rows of flightgroup data, which is all read into a map (hash_map had some memory issues or something).

The only issue is that Python may constantly need to access data for flightgroups in all systems... (it might go through one system and a lot of flightgroups per Python frame)...

I also wouldn't really want to switch to a fully relational model (which could enforce a certain savegame format), unless we save the whole database in it and allow Python to create the database tables, which could be interesting.
javier
Bounty Hunter
Bounty Hunter
Posts: 137
Joined: Tue Jan 31, 2006 12:46 am

Post by javier »

Well, sqlite integrates well with python.
If there is some need for a SQL database (that's the first point), I think sqlite is a suitable candidate:
  • Source code released into the public domain
  • C++ and Python access. Also VB, Lua and almost every other language out there.
  • Each database is contained in a single file.
  • CLI interface by default, GUI interfaces available
I've been using it for small/medium-sized databases and found it simple and reliable, with no maintenance needed.
And, of course, you can create a new database straight from the Python modules.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

I think the main issue you have to consider is this.

Why do you want to change how we do csv files?

If your answer is "because it's too hard to read," then moving to a database is not the answer; you can't read those at all.

The real answer to that is to provide a cross-platform editor that will display and allow manipulation of the info just as easily as database editors do, probably more easily.

If your answer is "because loading a file takes up a lot of memory," then you also have to consider the memory sqlite will use. Either method has to load the file into RAM; the alternative is reading from disk as needed, and the latency would be outrageous. DBs will always consume more memory than reading a flat text file.

So what would be a good reason to move to a database?

If we used a lot of small files, a database would be better.
If we required some type of thread-safe access to files, a database would be better.
If we required complex data sorting, a database might make it easier.

I'm just not convinced the csv files are good candidates for databasing. What we need is a robust editor in Python's Tk. A single .py file would solve your issues. Such an editor is already in the works.


edit: just reread your first post. If your work is along the same lines, then we should definitely talk about that.
Ed Sweetman endorses this message.
javier
Bounty Hunter
Bounty Hunter
Posts: 137
Joined: Tue Jan 31, 2006 12:46 am

Post by javier »

The memory consumption argument is based on the fact that the data actually needed in simulations is a small subset of the full dataset being loaded. So, the cost of disk access can be balanced against the amount of data actually kept in memory.
What a database could really give is better support for persistence than plain files, plus the ability for multiple processes to access the data.

With ACID capability, which sqlite has, some of the heavier Python memory hogs could be split off into separate, low-priority processes. And it opens the door to concurrent processing.
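As a rough illustration of the transactional side, something like this is all it takes from Python; the table, column and file names here are made up, not anything from the game:

Code:

import sqlite3

def record_stock_update(db_path, unit, qty):
    # A low-priority background process could run this while the main
    # process reads the same file; sqlite serializes the writes.
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        with conn:  # commits on success, rolls back on exception (the "A" in ACID)
            conn.execute(
                "CREATE TABLE IF NOT EXISTS stock (unit TEXT PRIMARY KEY, qty INTEGER)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO stock (unit, qty) VALUES (?, ?)", (unit, qty)
            )
    finally:
        conn.close()

record_stock_update("universe_state.db", "llama.begin", 12)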
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

You really only have two advantages to DB use: lots of small files (which we don't have), or multiple processes writing files (which we don't have either).

We can read a single file's data from multiple processes without a database, and even share the data read in amongst all of them.

I think a DB is overkill for our use of tabled data. Its advantages don't outweigh its costs.

Perhaps it would have a place in the account server for managing user data.
Ed Sweetman endorses this message.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

By the way, is this VSEditor cross-platform? What language is it written in, what GUI toolkit does it use, etc.?

The VSEditor I was hoping would get done would have been Python-based, using tkinter as its GUI. This would yield one file that automatically runs on every OS we support without any changes.

It really shouldn't need to be any more complicated than pushing data into slots of vectors that get pre-defined based on the first line. Where all the thought goes is into the GUI: making editing, inserting, deleting, moving and similar operations as easy and intuitive as possible.
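Something along these lines, very roughly; the file name, column width and row cap are placeholders, and a real editor would add scrolling, row insert/delete/move and validation:

Code:

import csv
import tkinter as tk   # Tkinter on older Pythons

def edit_csv(path, max_rows=20):
    # Load the CSV, show the first max_rows rows in a grid of Entry widgets,
    # and write everything back out when the user hits Save.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    root = tk.Tk()
    root.title(path)
    cells = []
    for r, row in enumerate(rows[:max_rows]):      # cap rows for this sketch
        entries = []
        for c, value in enumerate(row):
            e = tk.Entry(root, width=14)
            e.insert(0, value)
            e.grid(row=r, column=c)
            entries.append(e)
        cells.append(entries)

    def save():
        edited = [[e.get() for e in entries] for entries in cells]
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(edited + rows[max_rows:])
        root.destroy()

    tk.Button(root, text="Save", command=save).grid(row=max_rows, column=0)
    root.mainloop()

if __name__ == "__main__":
    edit_csv("units.csv")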
Ed Sweetman endorses this message.
javier
Bounty Hunter
Bounty Hunter
Posts: 137
Joined: Tue Jan 31, 2006 12:46 am

Post by javier »

Right now, yes. But looking at future development, it offers a well-known, well-documented way of getting both persistence and data sharing. If, at some point, we want to distribute processes across a network, it could be even more convenient.

By the way, it's not the number of files you have to take into account, but the number of data structures you're managing. Reading through a large plain file to find some piece of data is usually slower than running a query against a well-defined database.

Obviously there is data that simply could not be accessed this way for performance reasons, but I can think of a lot of things that could be. If you can move things like unit stock lists or mission lists out of permanently allocated memory, that's an improvement, especially if you can have parallel background processes updating that kind of data.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

The game doesn't write to csv files; it only reads them. Reading is totally thread-safe. Databases are only faster at queries if our data pool is _huge_ or the data would otherwise be spread around many files.

I'd rather fix how we load csv files in game and make accessing data from that loaded file more efficient than move it all to a database backend.
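Roughly what I mean, sketched in Python for brevity (the in-game loader is C++, and the key column here is just an example): one pass over the file builds an in-memory index, after which lookups are constant-time with no database engine involved.

Code:

import csv

def load_indexed_csv(path, key_column=0):
    # One pass: keep the flat file as-is on disk, index it by one column in RAM.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        index = {row[key_column]: row for row in reader if row}
    return header, index

header, units = load_indexed_csv("units.csv")
# e.g. units["llama.begin"] returns that unit's row instantly,
# assuming such a key exists in the file.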

Databasing it just doesn't make sense. The data isn't big enough for the database to be faster. Everything a database does to query something we can do with much less overhead. The database wouldn't be any simpler than the flat file we use, since the data in the flat file is already fairly fixed in format.

The problem here isn't using csv files in the game. It's making them easy for developers/modders to edit. Changing the backend is not necessary for that; a robust cross-platform GUI is.
Ed Sweetman endorses this message.
javier
Bounty Hunter
Bounty Hunter
Posts: 137
Joined: Tue Jan 31, 2006 12:46 am

Post by javier »

Sorry, but I think there is a misunderstanding. I was talking about the save file, not the CSV.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

The thread is about csv files, but I can address the config file.

The config file is XML, not CSV. The config file is not written to by the game, though it could be in the future. In any case, there is still no need to worry about threading, because the only place you would be able to edit the config file in game is prior to actually starting it, so no code would be trying to read it while we edit.

There is only one config file in use at a time. If the csv files are too small to make sense in a DB, then the config file definitely is.

There is already a robust way of accessing XML data in VS. A database would only duplicate the function of that code, except you'd have to add the overhead of the DB storage engine on top of the XML parser. That's horribly inefficient for one file, a file that's not even large.
Ed Sweetman endorses this message.
javier
Bounty Hunter
Bounty Hunter
Posts: 137
Joined: Tue Jan 31, 2006 12:46 am

Post by javier »

Sigh.
I know the OP was about the CSV files. I'm not trying to hijack the thread, really. But I wasn't talking about the config file either. I was saying that where I could see a database being useful is the savegame file and the information it holds and updates during the game.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

OK, savegame now. I really don't know where you're going with that. You want VS to throw out all the data structures related to game state and replace them with a database backend? How exactly is that faster than reading and writing a variable in RAM?

I think you're confused as to how savegames are read and written. A savegame is not something that is kept up to date. When a user saves a game, the function just pulls data from everywhere it needs to and writes it to disk. When a user loads a game, the function tells VS to do things based on that data to "recreate" the state of that savegame.

I don't see how a database would make sense there either. To do anything but add a level of abstraction that would only serve to slow things down, you'd have to replace all access and storage of game state throughout the game with the database server. The latency and the scope of that change would both be fairly huge.
Ed Sweetman endorses this message.
javier
Bounty Hunter
Bounty Hunter
Posts: 137
Joined: Tue Jan 31, 2006 12:46 am

Post by javier »

This is going nowhere fast, so I'm going to leave it here. I have never said that ALL THE DATA should be stored in a database. As you said, that would carry an unacceptable performance penalty.
My point is that there is a lot of data hanging around that is seldom used, and this is where a database comes in handy, because putting that kind of data there would reduce the memory footprint of the main executable.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

If you don't replace it all with the database, then you have all the code we need now plus the database code. Not only would that increase the code size, it would decrease speed and increase complexity.

What data is hanging around that's seldom used? How would a database result in less memory being used? Are we talking about the savegame again? The savegame file is thrown out of memory after it's processed; when the user saves another game, the data is pulled from various places in the game.

I'm not seeing large amounts of memory being used to serve _just_ the save/load feature. I'm not seeing how adding an abstraction while retaining the legacy code saves space anywhere or speeds things up. Instead of reading in a file and pushing it into the data structures used by the game, we would have reading in a file via the DB engine, then querying the DB engine, then saving into the data structures.

The only time DBs make anything faster or better is when reading the data in from files is so complex that all the extra overhead of the DB engine costs less than processing the files into the data structures used by the game. Even then, it only helps if the DB representation is much closer to something directly translatable to the game's internal data structures.
Ed Sweetman endorses this message.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Post by klauss »

First of all, how big is the csv file?
Just under 1MB.
That's how much RAM it takes (plus some tiny overhead). Units are kept in that csv-table format until they get spawned, at which point they get inflated to a full Unit object with all the overhead that implies.

Do you want to optimize over 1MB? I don't think so.

How much do savegames take?
In unparsed form (on disk), about 8MB is the biggest one I've seen. If you want to gzip them, then a LOT less (gzipping savegames has been on my mind for a while :D ). But... can we use them in unparsed form? No... loading a savegame implies parsing it and putting all its stored content where it's needed, as safemode said. Databasing the savegame wouldn't help with that. Not without making every part of VS read directly from the DB as needed, which is a HUGE undertaking with unclear benefits.
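For the record, gzipping is about this much work (Python shown here for brevity; in the engine it would be zlib from C++, and the savegame name below is made up):

Code:

import gzip
import shutil

def compress_savegame(path):
    # Compress the on-disk savegame into a .gz alongside it.
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)

def read_savegame(path):
    # gzip.open decompresses transparently; text savegames stay text.
    with gzip.open(path, "rt") as f:
        return f.read()

compress_savegame("New_Game.save")            # hypothetical savegame name
print(len(read_savegame("New_Game.save.gz")))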

Now... databasing has been proposed before for other reasons. One is that it would somehow help with distributed online content creation. That is, a server is online somewhere accepting updates, which it stores in the db. VS can replicate the DB by pulling, or access the server directly, thereby propagating any online change. Campaign stuff is most useful this way, since there could be distributed creation of content through in-game actions (if a player does something, the effect is stored in the db and then replicated down to each client).

In short, putting things in a DB helps manage increasing complexity where that complexity is required. IF it were required. I don't think it's a topic for discussion right now, though ways to let mods support such a backend have been discussed, and that would be interesting. I.e., make data storage moddable.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
ace123
Lead Network Developer
Lead Network Developer
Posts: 2560
Joined: Sun Jan 12, 2003 9:13 am
Location: Palo Alto CA
Contact:

Post by ace123 »

I think his concern is that we take an 8 MB save file and load it all into memory (into a map of string to vector of string), which might take up even more space in RAM.

If we integrate it into the savegame system there's a possibility that we could allow for some cool things (economy?) at the Python level.
Speaking of economy, it might help to have the sectors file at the database level as well.

The main issue is that database software (even sqlite3) comes with overhead, so we have to figure out whether that overhead is worth it...

I am not saying I want to do this. I am only suggesting that if there is a *significant* (enough to run an advanced economy system in Python/C++) performance benefit from using SQL, we should do it. I fear, however, that the map is actually pretty fast, and that the slowness in Python is just caused by the language itself.

There is no reason IMO to attempt to do a huge optimization like this, and at the end of the day, all of it still gets loaded into RAM (since it is accessed every frame), and it does not improve loading/saving speed.

Optimizations need to make something better to be worthwhile.
legine
Bounty Hunter
Bounty Hunter
Posts: 139
Joined: Mon Sep 27, 2004 8:40 am
Location: Germany
Contact:

Post by legine »

Doesn't a DB bring transparency into the game?

Didn't the savegame loading produce uncatchable classes? (Or something similar... :P) Or... well, you know the problems within the code better than I do.

However, for instance, I would love to be able to access the game configuration from within the game whenever I want, e.g. for switching to fullscreen mode. I miss such a feature badly.
Of course that has nothing to do with a DB in the first place. :-D But if you think about it from a DB point of view, we automatically gain the ability to access the game's data if we organize it in one place and use central classes for reading.

By the way, a file system is a very fine DB, just optimized for files rather than data. And XML is a great concept for small tabular, structured data; we already use that. We use hashes too, as I learned just today.
We just need to pull it all together, and then we could drop in whatever DB we like, whenever we like, if we need it.
As a side point, DB-like thinking in data organization would also help split the game into a client-server structure, because the data is stored "independently" of the game.

I think those are a huge number of bonuses weighed against the speed loss and the extra memory consumption we might get.

(To be clear: I am not talking about actually using one specific DB. I am talking about being able to do so by simply exchanging some classes that provide the data.)

I hope my blabber is understandable; I am just the one-eyed among the two-eyed, not the king. (Ah, and I am sure we already have a lot/some of the needed structure.)
wowsher
Trader
Trader
Posts: 22
Joined: Sun Feb 04, 2007 5:16 pm
Contact:

Post by wowsher »

Wow -- sorry, I did not mean to cause problems ;) I was thinking along the lines of simplifying the data to enable easier modding and tool creation. So yes, it was a selfish question, but I was genuinely curious.

Warning --- personal opinions follow, do not read if easily offended ;)

I also do not entirely agree about the performance issues of sqlite versus csv files. I am somewhat prejudiced towards DBs since I work with them all day, but I have used the flat-file method in the old days, when it was faster than many of the available DB engines. And just because data is in a DB does not mean it has to be normalized; in fact, our testing showed it was faster to keep it flat, while still taking advantage of all the other features. I have also been talking with a buddy of mine who is designing the DB backend for MMO games; I know this is not an MMO, but some of the ideas are nice.

The main reason I asked is that the csv files are pretty complex: they require a great deal of code to read and write, and they are not very tolerant of misplaced characters or of missing or misplaced 'fixed' lines denoting sections. The main advantage is that the current format is already coded into the game engine. The use of quotes also seems to vary between the different subsets of data (or at least I do not see the reasoning behind it). Anyhow, I was thinking along the lines of making things much easier for very little cost in performance. I also understand that this is your creation, and I can tell you like it how it is today. No problem with that.
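On the quoting point, for what it's worth, a library csv layer applies one consistent rule (quote only when needed, double embedded quotes), which is the sort of uniformity a DB would enforce anyway. The row below is made up, but it shows the round trip:

Code:

import csv
import io

row = ["llama.begin", 'A "stock" trading vessel, cheap and cheerful', "3000"]

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
print(buf.getvalue())
# llama.begin,"A ""stock"" trading vessel, cheap and cheerful",3000

# And reading it back round-trips cleanly:
print(next(csv.reader(io.StringIO(buf.getvalue()))))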

The following is again safe for consumption:

thanks for an enlightening thread and
thanks for creating this game and continuing to improve it!!!!
wowsher aka tony
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

Bringing in another layer of abstraction will always meet resistance, but bringing in a new dependency will meet, and should meet, serious resistance. There needs to be a very good reason why the new dependency is needed, and I just don't think temporary storage between reading and saving csv files is a good enough reason.

The only real hope I can see for sqlite is that it's supported out of the box on all the OSes we run on and doesn't require a separate download or special configuration to set up. If that holds, you'd then be free to try to convince us that we won't be making more work for ourselves in interfacing Vega Strike with sqlite to read a DB-stored version of the csv files. Because the only way DB'ing these files makes any sense is if you don't have to translate to a flat file.

Using it merely as intermediary storage is just extra work.
Ed Sweetman endorses this message.
ace123
Lead Network Developer
Lead Network Developer
Posts: 2560
Joined: Sun Jan 12, 2003 9:13 am
Location: Palo Alto CA
Contact:

Post by ace123 »

I noticed this thread turned a bit into a war between relational DB vs. custom format. People tend to stick with what already works (our savegame/csv/xml formats) unless there is both a clear benefit and an easy way to do it.

If you are genuinely interested, I would recommend you look into integrating sqlite directly into Vega Strike yourself. It is open source, and including your changes is entirely optional. If it is successful, it also has the potential (especially if relational tables are used right) to allow for neat new features in VS!

The only main caveats are that you have to support existing code (unless you want to rewrite all the Python code :-D), so I would try rewriting savegame.cpp, while keeping the header as much the same as possible... same would go for galaxy_xml.cpp if you want that.

For CSV data, we don't have enough data for it to have a significant gain from a DB, so I would avoid working with CSV files until we see if the savegames or galaxies work better.

Would definitely be an interesting thing to try, and it sounds like you have a fair bit of experience with it. If you start working on this it would be possible for you to have a Subversion branch where you could make a Database version of Vega Strike.

However, also do not be disappointed if these efforts don't end up being used--it will still be an interesting experiment, and your code can be integrated in the future if new things are added.
legine
Bounty Hunter
Bounty Hunter
Posts: 139
Joined: Mon Sep 27, 2004 8:40 am
Location: Germany
Contact:

Post by legine »

Agreed with ace123. If it is a feature, and it can be additional to the basics we already have, it is the way to go.

Enjoy the choice :-D
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Post by klauss »

On the other side of things, I believe supporting such a customization is a noble goal. Right now, though it isn't pretty or anything, the units.csv codebase is quite modular, in that there's only a handful of places where you have to edit code (units_csv.cpp, in fact) to make it fetch data from somewhere else.

System information and other related stuff may not be as modular.

Perhaps the fact that implementing such a backend change is so complex is a symptom of a problem that needs solving. The priority of such a task is another matter though.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe