I'll start by explaining what works for me: I've loaded the model into the machine's RAM (no GPU, just CPU). It consumes the 32 GB of RAM and 17 GB of Swap. It takes 500 seconds (~8 min) to load the model, and then the RAM consumption drops to 24 GB of RAM and 14 GB of Swap. `nvidia-smi` shows the GPU is idle apart from desktop processes (gnome-shell and firefox holding a few MiB of GPU memory each).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
         "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
         "researchers was the fact that the unicorns spoke perfect English."
```

Sending an input and generating an output takes 2 minutes on average to return a response to the user.

First question: is the memory consumption I'm observing normal for this model? Do these times seem reasonable for this level of RAM and Swap usage?
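As a sanity check on those numbers: a dense model's steady-state RAM is roughly parameter count × bytes per parameter. Below is a minimal back-of-envelope sketch, assuming GPT-J-6B's published ~6.05B parameter count and transformers' default 32-bit float weights on CPU (activations and Python overhead are ignored, and peak usage during loading is higher because weights are briefly duplicated while deserializing):

```python
# Rough estimate of GPT-J-6B weight memory on CPU.
# Assumption: ~6.05e9 parameters (published figure for GPT-J-6B).
n_params = 6_050_000_000
gib = 1024 ** 3

fp32_gib = n_params * 4 / gib   # default CPU dtype is float32 (4 bytes/param)
fp16_gib = n_params * 2 / gib   # half precision would halve the footprint

print(f"fp32 weights: ~{fp32_gib:.1f} GiB")  # ~22.5 GiB, close to the ~24 GB observed after loading
print(f"fp16 weights: ~{fp16_gib:.1f} GiB")
```

So ~24 GB of steady-state RAM for fp32 weights looks expected rather than a leak. If your transformers version supports them, options like `torch_dtype=torch.float16` and `low_cpu_mem_usage=True` in `from_pretrained` can reduce both the footprint and the loading peak, at some cost in numerical precision on CPU.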