Translations of Simple English Wikipedia Articles into Typed Lambda Calculus
The text files below are annotated with cued-association sentence processing (CASP) markup, including associations for anaphoric inheritance (-n and -m tags) and quantifier scope (-s, -t, and -u tags).
Files:
- (v0.3) syntactic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
- (v0.3) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
- (v0.3) syntactic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the second 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
- (v0.3) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the second 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
The version 0.2 annotation files below must be translated into large lambda calculus text files (over 100M each) using the modelblocks software package.
After installing modelblocks, go to the modelblocks-release directory and create the workspace directory:
    make
Then, from the modelblocks-release/workspace directory:
    curl -O https://linguistics.osu.edu/sites/default/files/2021-06/wikisemc2.casp_.toktrees_0.txt
    mv wikisemc2.casp_.toktrees{_0.txt,}
    make wikisemc2.casp_.discexprs
If you have trouble running modelblocks, you can build the files manually:
    cat wikisemc2.casp_.toktrees | perl ../resource-linetrees/scripts/editabletrees2linetrees.pl > wikisemc2.casp_.senttrees
    cat wikisemc2.casp_.senttrees | sed 's/\^g//g' | python2 ../resource-gcg/scripts/senttrees2discgraphs.py -e > wikisemc2.casp_.discgraphs
    if [ ! -d ../../modelblocks-release/config ]; then mkdir ../config; fi
    echo '-DNDEBUG -O3' > ../config/user-cflags.txt
    if [ ! -d bin ]; then mkdir bin; fi
    g++ -I../resource-rvtl -Wall `cat ../config/user-cflags.txt` -g -lm ../resource-linetrees/src/indent.cpp -o bin/indent
    cat wikisemc2.casp_.discgraphs | python2 ../resource-gcg/scripts/discgraphs2discexprs.py | bin/indent > wikisemc2.casp_.discexprs
Files:
- (v0.2) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
- (v0.2) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the second 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
- (v0.2) semantic annotations for 3-sentence beginnings of the first 279 articles in a 2014 dump of Simple English Wikipedia which are not redundant with tranches C1 or C2.
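Because each stage of the manual build writes its output through a shell redirect, a failed stage can leave an empty file behind. The following is a minimal sketch of a check for that. It assumes the wikisemc2.casp_ file names used in the commands above; the function name check_stage_outputs is hypothetical and is not part of modelblocks.

```shell
# Sketch: report which stages of the manual build have produced non-empty output.
# check_stage_outputs is a hypothetical helper, not a modelblocks command.
# The body runs in a subshell (parentheses), so its variables stay local
# and "exit" only sets the function's return status.
check_stage_outputs() (
    dir="${1:-.}"              # directory to inspect; defaults to the current one
    base="wikisemc2.casp_"     # file prefix taken from the build commands
    status=0
    for ext in toktrees senttrees discgraphs discexprs; do
        if [ -s "$dir/$base.$ext" ]; then   # -s: file exists and is not empty
            echo "ok       $base.$ext"
        else
            echo "missing  $base.$ext"
            status=1
        fi
    done
    exit "$status"
)
```

Run from the workspace directory, `check_stage_outputs .` prints one line per stage and returns a non-zero status if any file is absent or empty, so the first "missing" line indicates which command to rerun.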